Logical Replication in Postgres

Use Case

I have a Postgres instance that serves as a web application's backend. I want to create a replica for running heavy analytic queries without adding significant load to the primary database that supports the web app.

At first, I thought this was a textbook use case for a read replica. Still, I quickly learned that I needed the ability to modify and optimize the analytic database independently of the primary one.

Example: Tables in the analytics instance should have indexes optimized for analytic queries, while tables in the primary database should have indexes optimized for transactional queries.

It is impossible to create indexes, views, stored procedures, etc., on the read replica because it is read-only. Instead, indexes must be added to the primary instance, increasing its burden.

Solution

Postgres supports "logical replication." It replicates tables from one Postgres instance to another AND keeps them synchronized without constraining the replica to read-only operations. Logical replication is built-in and doesn't require plugins or third-party tools, just a little setup and a few configuration tweaks.

How It Works

After the configuration and setup, logical replication works in two steps. First, it uses Postgres's COPY command to replicate the existing data from the primary database to the replica. Then, after the initial backfill is complete, it streams INSERT, UPDATE, and DELETE commands executed on the primary instance to the replica using the write-ahead log (WAL). The replica receives these streamed changes and commits them against its copy of the data, synchronizing it with the primary database.

Unsolicited Advice

Check the Database Logs

If replication isn't working, look at the logs on the primary instance and the replica. The odds are good that you'll find the issue there.

Create a Process for Introducing Schema Changes

Logical replication cannot handle schema changes automatically. Schema changes in either the primary database or the replica can halt replication. If you must make changes, it's best to pause synchronization, make the changes in both places and then resume.

Delete Data and Drop Constraints from Replica Tables before Syncing

Replication will stop if a transaction fails due to an error, like violating a constraint. Foreign key constraints and primary key constraints are common sources of failure.

Example: You cannot sync data to a table with a foreign key constraint if the table referenced by a foreign key doesn't exist or doesn't contain the required keys.

If a primary key constraint blocks replication, delete data from the replica table before resuming. If a foreign key constraint blocks replication, remove the constraint from the replica.

Monitor Replication Lag

Measure replication lag by comparing the replication slot's Log Sequence Number (LSN) with the current WAL LSN. Igor Neyman provided a query on the Postgres message board to do this:

SELECT
    confirmed_flush_lsn,
    pg_current_wal_lsn(),
    (pg_current_wal_lsn() - confirmed_flush_lsn) AS lsn_distance
FROM
    pg_catalog.pg_replication_slots
WHERE
    slot_name = '{your replication slot name}';

The lsn_distance represents the difference in bytes between the two WAL locations. This number will grow if something goes wrong with the replication process. It should be zero when replication is fully up-to-date.