Sharding a real Rails app
Apr 30th, 2025
Lev Kokotov
People on the Internet say Rails apps don’t scale. That’s wrong. You can spin up as many Docker containers as you have CPUs (and money). What you really need is more databases and that’s where we come in.
If you’ve been here before, skip the intro. For everyone else, PgDog is a Postgres proxy that can shard databases. It parses SQL and figures out where queries should go automatically, using query parameters.
To prove that this actually works, I decided to take Mastodon, an app used by millions of people, and shard its database. Of course, it’s using Postgres.
This is the first in a two-part series. On the path to success, we introduce new features that make PgDog the scaling engine your database can’t go without.
Setting things up
First things first, we needed to deploy PgDog between Mastodon and the DB. If you’re a Rails/Postgres expert, feel free to skip to where things get more interesting.
I’m doing this on my local machine, so I just had to start the pooler and change the port in config/database.yml to 6432:
config/database.yml
development:
primary:
<<: *default
port: 6432 # Port where PgDog is running
pgdog.toml
[[databases]]
name = "mastodon_development"
host = "127.0.0.1"
port = 5432
PgDog now sits between Mastodon and Postgres and can see every single query flying through:

We’re ready to launch the app, but we are not sharding yet. All we’re doing is running a transactional pooler between the app and the DB. We don’t have a sharding key and all queries are just going to one database.
The sharding key
A sharding key is a column in a Postgres table. Specifically, it’s the value of that column for each row in that table. If you know that column (and its values), you can split the table’s rows into roughly equal groups and place them in different databases.
When searching for those rows (i.e., a SELECT query), you just need to know the value of the sharding key and you can find the database where the rows are stored.
Sharding a relational database with a bunch of tables requires us to understand how the app works. Picking the right sharding key is important: you can’t easily change it afterwards and, if chosen poorly, performance of your database can get worse.
If chosen well, though, the data will be split evenly between databases, each shard will handle 1/N of total application traffic, and your database will scale horizontally.
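To make the mechanics concrete, here’s a minimal sketch of key-based routing (illustrative only: the hash function, shard count, and method name are assumptions, not PgDog’s actual implementation):

require "zlib"

NUM_SHARDS = 3

# Map a sharding key (e.g. accounts.id) to a shard number.
# A real router uses a stable hash that resharding tools agree on;
# CRC32 is just a stand-in for illustration.
def shard_for(sharding_key)
  Zlib.crc32(sharding_key.to_s) % NUM_SHARDS
end

shard_for(114384243310209232) # always lands on the same shard

As long as the same key always hashes to the same shard, any query carrying that key can be routed without looking at the data itself.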
Finding the right key
This should ideally be done by someone who knows the app well. They will have a good intuition about where the bottleneck is and which table and column will split the database evenly. For everyone else (including me), the schema can help.
In theory, the ideal table is the one with the most foreign key references pointing to it: a row in that table is referenced by rows in many other tables.
For example, the accounts table has a primary key: id BIGINT. A foreign key to accounts in another table is a column that has the same value as accounts.id. For instance, the statuses table (where a lot of your Mastodon posts are stored) has a column called account_id, also a BIGINT, which is a reference to the account that posted that status update.
If we find a table that has many foreign key references to it in other tables, this means the sharding key is also present in those tables. As long as the app can pass it with most queries, PgDog can route them to a shard.
I’m not an expert on Mastodon, but Postgres is. If you look up the schema directly in the database (\d+ in mastodon_development), you’ll see that all foreign keys are explicitly listed. This is a Postgres feature that ensures data integrity and, in this case, will help us find the right sharding key.
In the best case, your schema looks a little bit like this:

The sharding key is the center of your app’s universe. Most other tables reference it in some way, so if you decide to move some of its rows to a different machine, you can pull the related rows from other tables along with them.
That’s enough foreshadowing. Using a simple Python script, I found that the accounts table is referenced by 75 other tables, which represents 36% of all tables (205 total) in the DB. While not perfect, that’s a great starting point.
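If you want to count these references yourself, the foreign key graph lives in pg_catalog. Here’s a small Ruby sketch of the idea (my script was a quick Python one-off; this is just to show the approach):

require "pg" # the 'pg' gem

conn = PG.connect(dbname: "mastodon_development")

# Count how many distinct tables have a foreign key pointing at each table.
result = conn.exec(<<~SQL)
  SELECT confrelid::regclass AS referenced_table,
         COUNT(DISTINCT conrelid) AS referencing_tables
  FROM pg_constraint
  WHERE contype = 'f'
  GROUP BY 1
  ORDER BY 2 DESC
SQL

result.each do |row|
  puts "#{row["referenced_table"]}: #{row["referencing_tables"]}"
end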
Transitive relationships
Let’s do a quick detour here. 36% seems a bit low; let’s see if we can improve that number.
While accounts are referred to directly from 75 other tables, it’s likely that those 75 tables have, themselves, foreign key references in other tables. Since that relationship is transitive, we can find rows in those tables using a join. When applied recursively, our search function found 123 tables that refer to accounts directly or through another table.
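Here’s a rough Ruby sketch of that recursive search, reusing the same pg_constraint edges (again illustrative, not the script I actually used):

require "pg"
require "set"

conn = PG.connect(dbname: "mastodon_development")

# Foreign key edges: referencing table -> referenced table.
edges = conn.exec(<<~SQL).map { |r| [r["child"], r["parent"]] }
  SELECT conrelid::regclass AS child, confrelid::regclass AS parent
  FROM pg_constraint
  WHERE contype = 'f'
SQL

# A table is "reachable" if it references accounts directly, or
# references another reachable table. Expand until nothing new is added.
reachable = Set.new(["accounts"])
loop do
  added = edges.select { |_child, parent| reachable.include?(parent) }
               .map(&:first)
               .reject { |child| reachable.include?(child) }
  break if added.empty?
  reachable.merge(added)
end

puts reachable.size - 1 # tables that can be sharded through accounts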

That’s 60% of all tables, which is much better. Our sharding key of choice seems good, but before we can just slap it into production, we need to be sure. And for this, we built dry run mode.
Dry run mode
In a production system, the only real way to know how it works is to observe it, in production. That’s often difficult. I didn’t know if the app passed the account_id in most queries and, since I’ve never run a Mastodon server, I had no intuition about it.
To find out, I created a feature called “dry run”. When enabled, every query that goes through PgDog will pass through its internal router. Even if the database isn’t sharded, PgDog will pretend that it is and make a decision about where it should go. If the query has a sharding key, the decision will be a shard number. If not, it will be a cross-shard (or multi-shard) query.
If you’re trying this at home, open up pgdog.toml and add this setting:
[general]
dry_run = true
When enabled, all queries are routed and recorded. The routing decision is ultimately ignored, but we now get to see the queries and the routing decisions PgDog made in the admin database:
admin=# SHOW QUERY_CACHE;

| query | hits | direct | cross |
|---|---|---|---|
| SELECT "settings".* FROM "settings" WHERE "settings"."var" = $1 LIMIT $2 | 31 | 0 | 31 |
| SELECT "mutes"."target_account_id" FROM "mutes" WHERE "mutes"."account_id" = $1 | 2 | 2 | 0 |
| SELECT "lists".* FROM "lists" WHERE "lists"."account_id" = $1 | 1 | 1 | 0 |
For every query that passes through the proxy, we get the query text, how many times we’ve seen it (hits), how many times it was routed to just one shard (direct), and how many times it was sent to multiple shards (cross).
Internally, the router query cache is quite efficient. If the client uses prepared statements (Mastodon, a Rails app, does), they are deduped and only one instance of the statement, with placeholders ($1, $2, etc.), is recorded. Even if your app has thousands of ActiveRecord call sites, the number of queries will be manageable and easily stored in memory.
If your app doesn’t use prepared statements or the extended protocol, PgDog “normalizes” the query using pg_query, removing parameters and replacing them with placeholders. This makes sure the query cache only stores unique queries and keeps the number of entries manageable.
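If you’re curious what that normalization looks like, the same Postgres parser library has Ruby bindings (the pg_query gem), so you can try it directly; the output comment is approximate and may vary by version:

require "pg_query" # Ruby bindings to the same Postgres parser

PgQuery.normalize(
  %(SELECT "lists".* FROM "lists" WHERE "lists"."account_id" = 12345)
)
# => SELECT "lists".* FROM "lists" WHERE "lists"."account_id" = $1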
Production metrics
Since we’re now “running in production”, we should be collecting some metrics. At the end of the day, we really just want to know, did we pick the right sharding key? The metrics we’re looking for are exported using a Prometheus endpoint. You can enable it in the config, like so:
[general]
prometheus_port = 9090
To visualize these, I installed a Datadog agent locally, signed up for an account, and created a dashboard. I’ve exported it as JSON, so you can set one up too. Our query router currently exposes 5 metrics:
| Metric | Description |
|---|---|
| query_cache_hits | Number of times a query has been found in the cache and required no parsing. |
| query_cache_misses | Number of times we’ve never seen a query and had to parse it using pg_query. This number should be low relative to query_cache_hits. |
| query_cache_direct | Number of queries routed to only one shard. We want this to be high. |
| query_cache_cross | Number of queries that had to hit all shards. Ideally, we want this to be low. |
| query_cache_size | Total number of queries in the cache. |
Absolute numbers are fine, but we can summarize them using a percentage. The formula we’ll be using to track how we’re doing is simple:
success_rate = direct / (direct + cross) * 100
Once that number reaches 95%, we’re good to go. If all queries have roughly equal cost, this means 95% of database utilization is spread evenly across shards. If we wanted to be more precise (always a good idea), we could weight each query by how expensive it is and compute the same ratio over total query cost instead of query counts. More on that in the next article.
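For example, at the target threshold the math works out like this (counter values are made up; the metric names are the ones from the table above):

direct = 95_000 # query_cache_direct
cross  =  5_000 # query_cache_cross

success_rate = direct.to_f / (direct + cross) * 100
# => 95.0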
It was time to measure how many queries were handled correctly. I wrote a small bash script to hit my local Mastodon instance with cURL, just to simulate a bunch of traffic, and started recording. The results, at first, were a bit underwhelming:

Womp womp. Only 26% of queries that Mastodon sent over to Postgres had a sharding key. It wasn’t zero, but 26% isn’t enough to make this work. If 74% of your queries have to hit all your shards, you haven’t sharded that much, really.
To find out what went wrong, I started browsing the query logs. Some queries had the account_id in them, but most were hitting some tables that aren’t referenced from anywhere. Tables like settings, terms_of_service and site_uploads. Every page load had at least one query referencing those tables (often more) and they had no obvious sharding key. Were we in trouble? Not quite.
Omnishards
Every app has something we call “metadata” tables. They store some info that’s rarely updated but frequently accessed. Back when I was at Instacart, we had tables that stored every single postal code (and their approximate coordinates) in the US and Canada. Those never changed, but we read them on each page load.
Given our sharding key (accounts.id), tables like terms_of_service and others cannot be sharded. However, if we copied them to all shards, we could join them from any query. If we need to read them directly, we can route that query to any shard, distributing traffic evenly between databases.
To make this work, we wrote another feature, called “omnishards”. Omni, in Latin, means “all”. These tables are present on all shards and have identical data. I quickly looked through the schema and the query cache log, found some tables that looked like they would fit the bill, and added them to pgdog.toml:
[[omnisharded_tables]]
database = "mastodon_development"
tables = [
"settings",
"site_uploads",
"ip_blocks",
"terms_of_services",
]
Judging from their names, I guessed (correctly) that they are accessed all the time. For example, ip_blocks is checked for each HTTP request to Mastodon, to make sure the client isn’t a known spammer.
Every time PgDog sees a query that only refers to an omnisharded table, it sends it to one of the shards, using the round robin load balancing algorithm. This makes sure the load is evenly distributed between shards.
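As a toy model of that routing decision (purely illustrative; PgDog’s real router is written in Rust and works on parsed SQL, not lists of table names):

require "zlib"

class ToyRouter
  OMNISHARDED = %w[settings site_uploads ip_blocks terms_of_services]

  def initialize(num_shards)
    @num_shards  = num_shards
    @round_robin = 0
  end

  # Queries touching only omnisharded tables rotate through the shards;
  # queries with a sharding key go to that key's shard; the rest are cross-shard.
  def route(tables, sharding_key: nil)
    if tables.all? { |t| OMNISHARDED.include?(t) }
      @round_robin = (@round_robin + 1) % @num_shards
    elsif sharding_key
      Zlib.crc32(sharding_key.to_s) % @num_shards # stand-in hash, as above
    else
      :all_shards
    end
  end
end

router = ToyRouter.new(3)
router.route(["ip_blocks"])                     # => 1, then 2, 0, 1, ...
router.route(["statuses"], sharding_key: 12345) # => a single shard
router.route(["statuses"])                      # => :all_shards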
After reloading the config and restarting my cURL script, the numbers looked much better:

Without changing a single line of code, 52% of DB queries were sharded. While not the magic number (95%) we’re looking for, it’s an amazing start. Cherry on top: our query cache hit rate was 99.99%. We ended up almost never using pg_query to parse SQL, so there was virtually no performance overhead for this feature.
Handling writes
Cross-shard queries in a read-only context aren’t the end of the world. Eventually, we’ll find and fix them. Until then, they will hit all shards, but the client will receive the right data; PgDog makes sure of that. However, if we send a write query and it gets routed to the wrong place, things could get messy.
It was time to test the real thing: creating posts.
Using the web UI this time, I went to the home page, submitted a post (a picture of my dog, of course) and looked at the query log. This is what it looked like on the backend:
BEGIN;
INSERT INTO "conversations" (
"uri", "created_at", "updated_at"
) VALUES ($1, $2, $3) RETURNING "id";
INSERT INTO "statuses" (
"account_id", "conversation_id", [...]
) VALUES ($1, $2, [...]) RETURNING "id";
COMMIT;
It happens inside a transaction (good!), but the first query gave me pause. It’s inserting data into the conversations table, which does have a transitive relationship to accounts through statuses. However, the query itself has no sharding key. The second query did, but PgDog would see it too late. It needed to make a routing decision for the first query before executing the second.
At first glance, there was no way to handle this gracefully. We couldn’t just pause the first query until we received the second: ActiveRecord expects a response to a query before sending the next one. Then I remembered: we built manual routing for this exact use case.
The client knows where the query should go, so it can easily provide it as context to our query router. All we needed was a way to pass that context through.
Manual routing
Up until this point, I thought we could do manual routing using comments. It looks a bit like this:
/* pgdog_sharding_key: 1234 */ INSERT INTO my_table VALUES ($1);
In theory, that works, but in practice it’s pretty hard to add a comment to a query from an application. Even ActiveRecord, which has the #annotate method, doesn’t always work. It can add comments to SELECT queries, but all others can’t be annotated (not quite sure why, honestly).
Thankfully, we are in control of the Postgres protocol so we can do pretty much anything, as long as both the client and server (PgDog) are on the same page.
To pass the sharding key in this transaction, we decided to use the SET command. It’s valid SQL and it adds a temporary session variable with the sharding key value:
BEGIN;
SET "pgdog.sharding_key" TO '114384243310209232';
-- Regular queries.
COMMIT;
All the client has to do is send that query first, inside a transaction, and PgDog would know which shard to use.
Using Rails logs, I found the code that creates the post and wrapped it with the sharding context. That worked, but the code looked kind of ugly and was easy to misuse: if executed outside a transaction, it wouldn’t work.
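Hand-rolled, the pattern looks roughly like this (an illustrative reconstruction, not the exact code from the patch):

ApplicationRecord.transaction do
  # Must run inside the transaction, before any other query, so PgDog
  # can pick a shard for everything that follows.
  ApplicationRecord.connection.execute(
    "SET \"pgdog.sharding_key\" TO '#{@account.id}'"
  )
  @status.save!
end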
To make this easier, I wrote a Ruby gem. Ruby, secretly, is one of my favorite languages. Things like these are really easy to implement and they just work. The gem is called pgdog and it’s available on rubygems.org. It’s a bit light on the documentation at the moment (I’m still a solo act), but here’s how it works:
PgDog.with_sharding_key(@account.id) do
# Read/write directly from/to the matching shard
end
Instead of using ApplicationRecord#transaction, use PgDog#with_sharding_key and the following will happen:
- A transaction will be started automatically
- The gem will send the SET statement before all other queries
The diff ended up being pretty short and easy to review:
- ApplicationRecord.transaction do
+ PgDog.with_sharding_key(@account.id) do
@status.save!
end
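Under the hood, the gem doesn’t need to do much more than the hand-rolled version above. A minimal sketch of the idea (not the actual gem source):

module PgDog
  def self.with_sharding_key(key, &block)
    ApplicationRecord.transaction do
      quoted = ApplicationRecord.connection.quote(key.to_s)
      ApplicationRecord.connection.execute(
        %(SET "pgdog.sharding_key" TO #{quoted})
      )
      block.call
    end
  end
end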
With this issue handled, both read and write queries were routed, correctly, to one shard. We were running in dry run mode, hadn’t broken production, and our goal (95% of queries hitting just one shard) was getting closer.
Closing thoughts
That’s all I had time for this week. While I love to code, to get this off the ground I have to do a lot of other stuff. If you’re an engineer, you probably don’t want to know what’s really involved in running a startup.
There are a few (partially) unanswered questions, in no particular order:
- How do we generate primary keys? Spoiler alert: we have a workaround, directly in Postgres, that we’ll cover in a subsequent post.
- How many code changes do we need to make sure the sharding key is available most places? I suspect, in a well maintained code base, that number will be quite low (due to code reuse). For messy apps, see next point.
- Can we automate some code changes? I’m guessing in the age of AI, the answer is probably. As long as a human checks the work, of course. We just need to “join” the Rails logs with the queries in PgDog’s query cache.
- How can we maintain data integrity across omnisharded tables? At Instacart, we used logical replication (and COPY). PgDog supports cross-shard transactions, but we’ll need to use two-phase commit before using them in production.
- How can we protect shards from accidental cross-shard writes? We have a trigger that validates INSERTs (and UPDATEs). More on this in the next post.
Next steps
If this is interesting, you should get in touch. PgDog is looking for early adopters! For everyone else, we’re getting closer to a production release and should have v1.0 released in June (2025, if you’re reading this from the future). A GitHub star is always appreciated.