Sharding overview
Note
This feature is under active development. It's not ready production use.
Sharding PostgreSQL databases involves splitting the database between multiple machines and routing queries to the right machines using a sharding function. Like its predecessor, PgDog supports sharded PostgreSQL deployments and can route queries to the correct shards automatically, implemented as a plugin.
data:image/s3,"s3://crabby-images/834cf/834cfbd1620cda0c104a73e46334ebf45abd6207" alt="Sharding"
Sharded database routing.
Architecture
There are two ways for database clients to query sharded databases: by connecting to specific shard, or by querying all shards and aggregating the results. The former is commonly used in OLTP (transactional) systems, e.g. real time applications, and the latter is more commonly used in OLAP (analytical) databases, e.g. batch reports generation.
PgDog has good support for single shard queries, and adding support for aggregates over time1.
SQL parser
The pgdog-routing
plugin parses queries using pg_query
and can calculate the shard based on a column value specified in the query. This allows applications to shard their databases without code modifications. For queries where this isn't possible, clients can specify the desired shard (or sharding key) in a query comment.
Multi-shard queries
When the sharding key isn't available or impossible to extract from a query, PgDog can route the query to all shards and return results combined in a single response. Clients using this feature are not aware they are communicating with a sharded database and can treat PgDog connections like normal.
Learn more
-
Aggregation can get pretty complex and sometimes requires query rewriting. Examples can be found in the PostgreSQL's postgres_fdw extension. ↩