You can make Postgres scale

Mar 13th, 2025
Lev Kokotov

Postgres scales. No other two words that I’ve ever heard of, produced more controversy. At least in the circles I hang out in, in the company basement where infrastructure elves make the Rails app go brrr. A lot of people believe, against all odds, and marketing campaigns by Big NoSQL, that technology you know is better than the devil you just heard pitched at the Engineering Leadership meeting.

To be completely fair, I get where they are coming from. Making Postgres write more data can be challenging. You need more hardware. Most of the time, you can just get it by pushing the “Upgrade” button. Once you’ve reached the r5.24xlarge instance, with 5 replicas, just as substantial, and your vacuums are still behind schedule, things are getting pretty scary.

This is where the test of a true engineer begins. At the edge. To be clear, I’m not talking about WebAssembly. I’m talking about that engineering spirit that looks at a problem, while under executive fire, and instead of running into the hands of your nearest sales team with big promises (but little facts about your use case), solves it using first principles.

And a first principle tells us what we need. Postgres ran out of write capacity. Either because of lock contention on the WAL, or something got the vacuum stuck. It’s probably that idle transaction that’s been open for 45 seconds while the app is making a Stripe call, but that’s not our concern. We’re the infrastructure team, and our job is to make the DB work.

So we decided, collectively as a community of duct tape enthusiasts, that we will give Postgres what it wants: more machines. We are going to split that write workload, evenly between 3 (no, let’s go with 6, just to be safe) databases. If the theory of “more is better” is correct, then we should get 6 times more writes.

The road taken

It’s funny to write this. The Internet contains at least 1 (or maybe 2) meaty blog posts about how this is done. Funny, because they describe a lot of work that was done by a team of about 5. That team went through every single ActiveRecord and psycopg2 execute call to make sure it contains a sharding key.

We took a failing database, while it was serving about 100,000 users per second, ran dozens of scary looking (and mostly undocumented) commands to get it to keep that replication slot open while we snapshot it, restore it 6 different ways, and truncate most of the data to get it in sync with the new sharding scheme.

We synchronized it with logical replication, which took a day or two. These databases are precious and one accidental write can break data integrity. So the team added triggers that validate rows that are going into every table. We picked a hashing function Postgres came with since version 10, which we found in the bowels of the mailing lists. It’s a function we can call to validate rows manually or reshard data.

Data is sharded, but we didn’t tell the application how to find it. It was in 1 database, now it’s in 6. Luckily, Postgres is open source. We found the code for that hashing function, rewrote it in Ruby, and plugged it into the ApplicationRecord class. Every time someone executes a query, that code will hash the sharding key and send the query to one of the instances configured in database.yml. If it’s a SELECT, it’ll get special treatment and will be load-balanced against 5 replicas.

Python got the short end of the stick and used a special coordinator we created using partitioned foreign tables. That will also serve 1% of cross-shard queries we couldn’t untangle in time. Postgres is scary and versatile sometimes.

The team created 5 replicas for each shard. We were not kidding around this time. We never want to do this manual process again. This should keep them going for another…years. We don’t know how many because we have never seen growth like this before, but we can always add more replicas. Adding more shards though is a problem for another time. We know we can, because we got this far, and our engineering spirit is strong, but we just want to get this done.

If you’re counting with me, we’re at 36 databases. Plenty of room to grow 6x again.

Time came to write that Pgbouncer config. Too bad it can’t load balance traffic for us, but that’s okay. We have an ActiveRecord plugin that will figure it out, and the Python apps can just use replica-5-prod. It’s 10:45pm, 2 hours before the maintenance window we negotiated with the business folks, who still don’t understand why we need to take the whole app offline for a few hours every other week.

The cutover plan is simple. Shut down the database and switch the app to the new ones. Easy enough.

The time has come. The familiar concerto of Datadog, Rollbar and CloudWatch alarms starts playing. PagerDuty is ringing. Nobody is getting woken up though, we’re all here already. We forgot to set that maintenance window, but there is no time now. The business is offline and we have about 5 minutes to switch configs, manually test that everything is working, and turn the traffic back on.

Somebody wrote a Bash script to reload all the bouncers at once. Good for that person, that’s good thinking. Infrastructure engineers use Bash for everything. First principles.

The app is ready, we ran a few queries and sanity checks to make sure everything is good to go. We’re about to flip the switch back on when one of the engineers quietly asks (it’s 2 in the morning, after all):

“Hey guys, did we bump up the sequences?”

Crap. Logical replication doesn’t handle that little detail, but that’s ok, we still have that PL/pgSQL script from the last time we did this. Quick copy/paste job into the superuser psql session and we’re back in business, with plenty of room to grow the company, and a hell of a story.

The case for Postgres

Okay, that was perhaps not the cheeriest sales pitch. For the skeptic, this really could go either way, and maybe DynamoDB is the better choice for their next greenfield project. If you’re still reading this though, you just saw a roadmap for how Postgres can be sharded. It’s not only possible, it’s actually quite doable, and it’s been done.

What we really need now is a piece of code that does it for us. Preferably automatically, in the background, while we’re busy writing more ActiveRecord queries to sell stuff. Out of this belief, and because scaling Postgres should be done after lunch with a push of a button, we wrote PgDog.

For you, and because we can. It is, we believe, the answer to “Does Postgres scale?”. Yes it does. It did and it will, for as long as we, the engineers, make it so.

Get in touch

If any of this rings familiar, or you’re just curious about the project, get in touch. PgDog is an open source project for sharding Postgres.