Multi-tenant Postgres can be easy

May 14th, 2025
Lev Kokotov

If you’re a B2B company and using Postgres, you need a way to separate the data created by your customers. Typically, that means you use two identifiers for all records:

`tenant_id`	The ID of the customer who's paying the bills
`user_id`	The ID of the user who's actually using the product

The challenge then is to make sure both fields are present in all tables. Either one or both also have to be present in all queries your app is running. If you get this right, you can iterate on your app rather quickly. I find that fear slows down product development the most, and nothing is scarier than leaking one customer’s data to another.

Any other approaches, like one schema per customer or one table per customer, are a pain. Can you imagine adding a column to support a new feature? With 10 customers, maybe, but what about a 100? Or a 1,000.

So here’s my pitch: what if we enforced multi-tenant rules in a Postgres proxy? If the query has the tenant_id in the WHERE clause, let it through, and if not, throw an error. Seems simple enough, but let’s dig into the details.

Parsing queries

To know if the column is there or not, we need to understand SQL. Humans (and AI, too) sometimes have trouble with this, but Postgres does not. So we took its parser and put it inside our proxy. 100% of all queries going through are read and understood by PgDog. If you’re curious how this is possible, look up the pg_query project by PgAnalyze; it’s awesome.

This does however cost a few CPU cycles, so we built a simple cache. First time we see a query, we parse it and insert its Abstract Syntax Tree into the cache. Next time we see it, we get the AST back from the cache and validate multi-tenant rules without the overhead.

Everything is happening inside the process, so this is super quick: a mutex lock and a hash lookup. You can store as many queries as you have system memory, but most apps use some form of prepared statements, so that won’t be more than a few thousand entries.

With the parsing problem solved, let’s see what the multi-tenant rules could look like.

Multi-tenant rules

What does it mean to “have a tenant_id” in a query? Let’s start with an example that won’t be allowed:

SELECT * FROM personal_notes
WHERE tenant_id IN (SELECT tenant_id FROM customers);

Since this query is actually selecting all tenants, the rules of tenancy are not obeyed. Therefore, it’s not enough to just check that the column is there; we need to make sure it’s used correctly. Here is another example where the tenant is correctly filtered in the WHERE clause:

SELECT * FROM personal_notes
WHERE tenant_id = 5

That’s easy enough and works great, but let’s try something more complex. What about a multi-level join:

SELECT * FROM personal_notes
INNER JOIN emails ON
    emails.tenant_id = personal_notes.tenant_id
INNER JOIN customers ON
    customers.id = emails.tenant_id
WHERE customers.id = ?

It’s not immediately obvious if this query is looking at data from a single tenant or more, but if we “walk” the syntax tree, we’ll find that there is a transitive relationship between tenant_id and customers.id. That means, whatever the value of customers.id is provided, that’s our tenant ID.

As long as we can traverse the query and find equality between tenant_id and some other specific parameter, we know the rules of tenancy are followed. Reading the tree is quick so we can efficiently validate complex queries at runtime.

Let’s not forget about CTEs:

WITH admins AS (
    SELECT * FROM users
    WHERE admin = true
    AND tenant_id = ?
)
SELECT * FROM admins

That example isn’t actually that hard. A CTE is just another query, so we can walk it recursively and apply the same rules.

Updates and deletes

SELECTs is where most of the work would be, but we should still discuss the others. UPDATE queries have to have a WHERE clause and it needs to have the tenant_id in the predicate. This is a great opportunity, by the way, to guarantee some basic data integrity. One of my biggest fears is to type a query like UPDATE users SET banned = true into psql, and then accidently hit enter. That’s a SEV 0 incident, for sure, that we can avoid.

Same applies to DELETE queries. Enforcing the presence of a WHERE clause should be a no-brainer and ensuring that it has the tenant_id as well. Better yet, use “soft” deletes and block DELETE queries entirely, e.g.:

UPDATE users SET deleted_at = NOW()
WHERE id = ?

It’s kind of surprising that Postgres doesn’t have many rules around the types of queries you can run. Maybe we can help with that, at a different layer of the stack.

It might take a few tries to catch all the edge cases, but as long as we run this in dry run mode until error rate hits 0%, this can be built in a weekend.

Multi-DB

Finally, if this doesn’t alleviate the concern about data leaks between customers, we can always place tenants on separate DBs. This level of separation is actually not so hard to manage, as long as you have a decent proxy in front of all of them to route queries. No code changes required.

Since PgDog is built for sharding Postgres, multi-DB multi-tenant applications are a pretty natural application of existing code. The sharding key is tenant_id and the corresponding database cluster (replicas and all) can be set in the config.

What’s next?

Since PgDog reads SQL for a living, adding the basic building blocks for this wasn’t difficult. I wrote a PoC last night. If multi-tenancy is a pain for you, reach out, maybe we can build something for you as well.