Healthchecks
Databases proxied by pgDog are regularly checked with healthchecks. A healthcheck is a simple query, e.g. SELECT 1, which ensures the database is reachable and able to answer requests.
If a database fails a healthcheck, it's placed in a list of banned hosts. Banned databases are removed from the load balancer and will not serve transactions. This allows pgDog to reduce errors clients see when a database fails, for example due to hardware issues.
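As a rough illustration of how a load balancer can skip banned hosts, here is a minimal Python sketch. It assumes a simple round-robin strategy; the function `pick_host` and the host names are hypothetical and do not reflect pgDog's actual implementation.

```python
import itertools

def pick_host(hosts, banned, counter):
    """Pick the next host in round-robin order, skipping banned hosts.

    Returns None when every host is banned.
    """
    candidates = [h for h in hosts if h not in banned]
    if not candidates:
        return None
    return candidates[next(counter) % len(candidates)]

hosts = ["replica-1", "replica-2", "replica-3"]  # hypothetical host names
banned = {"replica-2"}                           # failed its last healthcheck
counter = itertools.count()

# Four consecutive transactions are spread over the two healthy replicas.
picks = [pick_host(hosts, banned, counter) for _ in range(4)]
```

In this sketch the banned host receives no transactions until it is removed from the `banned` set.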
Configuration
Healthchecks are enabled by default and are used for all databases. The healthcheck interval is configurable at the global and database levels.
The default healthcheck interval is 30 seconds.
[global]
healthcheck_interval = 30_000 # ms
[[databases]]
name = "prod"
healthcheck_interval = 60_000 # ms
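One plausible reading of the configuration above is that a database-level value takes precedence over the global one. The Python sketch below illustrates that lookup under this assumption; the function `effective_interval_ms`, the dictionary layout, and the "staging" database are hypothetical and are not part of pgDog.

```python
DEFAULT_INTERVAL_MS = 30_000  # default interval stated in this document

def effective_interval_ms(global_cfg, database_cfg):
    """Return the healthcheck interval for one database, in milliseconds.

    Assumption: a database-level setting overrides the global setting,
    and the global setting overrides the built-in default.
    """
    if "healthcheck_interval" in database_cfg:
        return database_cfg["healthcheck_interval"]
    return global_cfg.get("healthcheck_interval", DEFAULT_INTERVAL_MS)

global_cfg = {"healthcheck_interval": 30_000}
prod = {"name": "prod", "healthcheck_interval": 60_000}
staging = {"name": "staging"}  # hypothetical database with no override
```

With these inputs, "prod" would be checked every 60 seconds and "staging" would fall back to the global 30 seconds.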
Timeouts
By default, pgDog gives the database 5 seconds to answer a healthcheck. If it doesn't receive a reply within that time, the database is banned from serving traffic for a configurable amount of time. Both the healthcheck timeout and the ban time are configurable.
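The timeout behavior can be sketched in Python with `asyncio.wait_for`. This is an illustration only: the helper names, the `bans` dictionary, and the 300-second ban time are assumptions made for the example, and the `check` coroutine stands in for sending a real query to the database.

```python
import asyncio
import time

HEALTHCHECK_TIMEOUT_S = 5.0  # default timeout stated above
BAN_TIME_S = 300.0           # hypothetical value; no default is given here

async def is_healthy(check, timeout_s=HEALTHCHECK_TIMEOUT_S):
    """Run one healthcheck and report whether it answered in time."""
    try:
        await asyncio.wait_for(check(), timeout_s)
        return True
    except (asyncio.TimeoutError, OSError):
        # No reply within the timeout, or the connection failed.
        return False

async def check_and_ban(host, check, bans,
                        timeout_s=HEALTHCHECK_TIMEOUT_S,
                        ban_time_s=BAN_TIME_S):
    """Record a ban expiry time for `host` if its healthcheck fails."""
    healthy = await is_healthy(check, timeout_s)
    if not healthy:
        bans[host] = time.monotonic() + ban_time_s
    return healthy

async def fast_check():
    return 1  # stands in for a prompt answer to SELECT 1

async def slow_check():
    await asyncio.sleep(1)  # stands in for a database that answers too late
```

Calling `asyncio.run(check_and_ban("replica-2", slow_check, bans, timeout_s=0.05))` with an empty `bans` dictionary would add an expiry time for "replica-2", while the same call with `fast_check` would leave the dictionary unchanged.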
Ban expiration
By default, a ban has an expiration. Once the ban expires, the replica is unbanned and placed back into rotation. This is done to maintain a healthy level of traffic across all databases and to allow for intermittent issues, like network connectivity, to resolve themselves without manual intervention.
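Ban expiration can be modeled as a timestamp stored per host. The following Python sketch is a hypothetical illustration, not pgDog's code; the `BanList` class and its injectable clock exist only to make the behavior easy to follow and to test.

```python
import time

class BanList:
    """Track banned hosts and lift each ban once it expires."""

    def __init__(self, ban_time_s, clock=time.monotonic):
        self.ban_time_s = ban_time_s
        self.clock = clock   # injectable so tests can control time
        self._expires = {}   # host -> time at which the ban ends

    def ban(self, host):
        self._expires[host] = self.clock() + self.ban_time_s

    def is_banned(self, host):
        expires = self._expires.get(host)
        if expires is None:
            return False
        if self.clock() >= expires:
            # The ban has expired: put the host back into rotation.
            del self._expires[host]
            return False
        return True

# A fake clock lets the example advance time without sleeping.
now = [0.0]
ban_list = BanList(ban_time_s=300.0, clock=lambda: now[0])
ban_list.ban("replica-1")
banned_at_start = ban_list.is_banned("replica-1")
now[0] = 300.0
banned_after_expiry = ban_list.is_banned("replica-1")
```

Here the host is banned immediately after the failure and is served traffic again once the ban time has elapsed, with no manual step.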
Failsafe
If all databases in a cluster are banned due to healthcheck failures, pgDog assumes that the healthchecks are returning incorrect information and unbans all databases in the cluster. This protects against false positives and ensures the cluster continues to serve traffic.
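The failsafe rule can be expressed in a few lines. The Python sketch below is a hypothetical illustration under the assumption that bans are tracked as a set of host names; `apply_failsafe` is not a pgDog function.

```python
def apply_failsafe(cluster_hosts, banned):
    """Unban every host in the cluster if all of them are banned.

    Returns True when the failsafe was triggered.
    """
    if cluster_hosts and all(h in banned for h in cluster_hosts):
        # Every database is banned, so treat the healthchecks as unreliable
        # and clear the bans for this cluster only.
        banned.difference_update(cluster_hosts)
        return True
    return False

cluster = ["primary", "replica-1", "replica-2"]  # hypothetical host names

partly_banned = {"replica-1"}
triggered_partial = apply_failsafe(cluster, partly_banned)

fully_banned = {"primary", "replica-1", "replica-2", "other-cluster-db"}
triggered_full = apply_failsafe(cluster, fully_banned)
```

In this sketch a single banned replica stays banned, while a fully banned cluster is returned to rotation and bans belonging to other clusters are left in place.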