Logical Replication in Postgres
a solution for creating writable Postgres replicas
Use Case
I have a Postgres instance that serves as a web application's backend. I want to create a replica for running heavy analytic queries without adding significant load to the primary database that supports the web app.
At first, I thought this was a textbook use case for a read replica. Still, I quickly learned that I needed the ability to modify and optimize the analytic database independently of the primary one.
Example: Tables in the analytics instance should have indexes optimized for analytic queries, while tables in the primary database should have indexes optimized for transactional queries.
It is impossible to create indexes, views, stored procedures, etc., on the read replica because it is read-only. Instead, indexes must be added to the primary instance, increasing its burden.
Solution
Postgres supports "logical replication." It replicates tables from one Postgres instance to another AND keeps them synchronized without constraining the replica to read-only operations. Logical replication is built-in and doesn't require plugins or third-party tools, just a little setup and a few configuration tweaks.
How It Works
After the configuration and setup, logical replication works in two steps. First, it uses Postgres's COPY command to replicate the existing data from the primary database to the replica. Then, after the initial backfill is complete, it streams INSERT, UPDATE, and DELETE commands executed on the primary instance to the replica using the write-ahead log (WAL). The replica receives these streamed changes and commits them against its copy of the data, synchronizing it with the primary database.
Unsolicited Advice
Check the Database Logs
If replication isn't working, look at the logs on the primary instance and the replica. The odds are good that you'll find the issue there.
Create a Process for Introducing Schema Changes
Logical replication cannot handle schema changes automatically. Schema changes in either the primary database or the replica can halt replication. If you must make changes, it's best to pause synchronization, make the changes in both places and then resume.
Delete Data and Drop Constraints from Replica Tables before Syncing
Replication will stop if a transaction fails due to an error, like violating a constraint. Foreign key constraints and primary key constraints are common sources of failure.
Example: You cannot sync data to a table with a foreign key constraint if the table referenced by a foreign key doesn't exist or doesn't contain the required keys.
If a primary key constraint blocks replication, delete data from the replica table before resuming. If a foreign key constraint blocks replication, remove the constraint from the replica.
Monitor Replication Lag
Measure replication lag by comparing the replication slot's Log Sequence Number (LSN) with the current WAL LSN. Igor Neyman provided a query on the Postgres message board to do this:
SELECT
confirmed_flush_lsn,
pg_current_wal_lsn(),
(pg_current_wal_lsn() - confirmed_flush_lsn) AS lsn_distance
FROM
pg_catalog.pg_replication_slots
WHERE
slot_name = '{your replication slot name}';
The lsn_distance
represents the difference in bytes between the two WAL locations. This number will grow if something goes wrong with the replication process. It should be zero when replication is fully up-to-date.