18 KiB
PeerDB
Fast, simple, and cost effective Postgres replication.
TL;DR
Glossary
| Term | Summary |
|---|---|
| Peer | Connection to a database that PeerDB can query |
| Mirror | Stream of changes, feed in real-time, from a source peer to a target peer |
| Alert | Notifications about issues in flows |
Setup
git clone 'https://github.com/PeerDB-io/peerdb.git' \
&& docker compose -f 'peerdb/docker-compose.yml' up -d
Usage
# Connect in SQL mode.
psql 'host=localhost port=9900 password=peerdb'
psql 'postgresql://peerdb.example.org:9900/?password=peerdb'
# Use the REST APIs.
curl -fsS --url 'http://localhost:3000/api/v1/peers/list' --request 'GET' \
--header "Authorization: Basic $(printf '%s' ':' 'your password here' | base64)"
curl -fsS --url 'http://localhost:3000/api/v1/peers/create' --request 'POST' \
--header "Authorization: Basic $(printf '%s' ':' 'your password here' | base64)" \
--header 'Content-Type: application/json' \
--data '{ … }'
Real world use cases
# List peers.
psql "host=localhost port=9900 password=$(gopass show -o 'peerdb/instance')" -c "SELECT id, name, type FROM peers;"
curl -fsS --url 'http://localhost:3000/api/v1/peers/list' \
-H "Authorization: Basic $(gopass show -o 'peerdb/instance' | xargs printf '%s' ':' | base64)"
Peers
Peers are connection settings to databases that PeerDB can operate upon.
Source PostgreSQL peers require logical replication to be enabled.
-- Check settings
sourceDb=> SELECT name,setting FROM pg_settings WHERE name IN ('wal_level','rds.logical_replication');
name | setting
-------------------------+---------
rds.logical_replication | on
wal_level | logical
(2 rows)
-- Configure sources
ALTER SYSTEM SET wal_level = logical;
ALTER SYSTEM SET max_wal_senders = 10;
ALTER SYSTEM SET max_replication_slots = 10;
Operations:
List
SELECT id, name, type FROM peers;
GET /api/v1/peers/list
Create or update
CREATE PEER IF NOT EXISTS some_postgresql_peer
FROM POSTGRES
WITH (
host='pg.example.org',
port='5432',
database='postgres',
user='postgres',
password='password'
);
| Peer type | peer.type attribute |
Configuration attribute |
|---|---|---|
| ClickHouse | 8 |
clickhouse_config |
| Kafka | 9 |
kafka_config |
| PostgreSQL | 3 or 'POSTGRES' |
postgres_config |
The optional
"allow_update": trueattribute in the API seems to do absolutely nothing as of the time of writing.
POST /api/v1/peers/create
{
"allow_update": true,
"peer": {
"name": "some_postgresql_peer",
"type": "POSTGRES",
"postgres_config": {
"host": "pg.example.org",
"port": "5432",
"database": "postgres",
"user": "postgres",
"password": "password"
}
}
}
Delete
DELETE FROM peers WHERE name == 'some_postgresql_peer';
Mirrors
Mirrors can be in the following states:
| State | Returned string | Description |
|---|---|---|
| Setup | STATUS_SETUP |
The mirror is creating the target tables and metadata tables |
| Snapshot | STATUS_SNAPSHOT |
The mirror is currently performing the initial snapshot of the tables defined in the mapping |
| Running | STATUS_RUNNING |
The mirror has completed the initial snapshot, and is in its CDC phase |
| Pausing | STATUS_PAUSING |
The mirror is in its CDC phase, and is in the process of pausing |
| Paused | STATUS_PAUSED |
The mirror is in its CDC phase, and is paused |
| Terminated | STATUS_TERMINATED |
The mirror has been deleted/terminated |
| Unknown | STATUS_UNKNOWN |
The mirror is not found in PeerDB's catalog, or its status cannot be obtained due to some other issue |
Mirrors using PostgreSQL peers as sources create replication slots in the source DB to get changes from.
Operations:
List
GET /api/v1/mirrors/list
Create
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
flow_job_name |
string | yes | name of the mirror | |
source_name |
string | yes | name of the source peer | |
destination_name |
string | yes | name of the destination peer | |
table_mappings |
array | yes | ||
table_mappings.source_table_identifier |
string | yes | source schema and table | |
table_mappings.destination_table_identifier |
string | yes | destination schema and table | |
table_mappings.exclude |
list of strings | no | [] | columns excluded from the sync |
table_mappings.columns |
list of objects | no | [] | ordering setting; for ClickHouse only |
table_mappings.columns.name |
string | yes | name of the column | |
table_mappings.columns.ordering |
number | yes | rank of the column | |
idle_timeout_seconds |
number | no | 60 | |
publication_name |
string | no | will be created if not provided | |
max_batch_size |
number | no | 1000000 | |
do_initial_snapshot |
boolean | yes | ||
snapshot_num_rows_per_partition |
number | no | 1000000 | only used for the initial snapshot |
snapshot_max_parallel_workers |
number | no | 4 | only used for the initial snapshot |
snapshot_num_tables_in_parallel |
number | no | 1 | only used for the initial snapshot |
resync |
boolean | no | false | the mirror must be dropped before re-syncing |
initial_snapshot_only |
boolean | no | false | |
soft_delete_col_name |
string | no | _PEERDB_IS_DELETED |
|
synced_at_col_name |
string | no | _PEERDB_SYNCED_AT |
CREATE MIRROR IF NOT EXISTS some_cdc_mirror
FROM main_pg TO snowflake_prod -- FROM source_peer TO target_peer
WITH TABLE MAPPING
(
public.regions:main_pg.regions, -- source_schema.table:target_schema.table
{
from: public.countries, -- source_schema.table
to: main_pg.countries, -- target_schema.table
exclude: [ local_name, size, … ] -- column_1, …, column_N
},
…
)
WITH ( do_initial_copy = true );
POST /api/v1/flows/cdc/create
{
"connection_configs": {
"flow_job_name": "some_cdc_mirror",
"source_name": "main_pg",
"destination_name": "snowflake_prod",
"do_initial_snapshot": true,
"table_mappings": [
{
"source_table_identifier": "public.regions",
"destination_table_identifier": "main_pg.regions"
},
{
"source_table_identifier": "public.countries",
"destination_table_identifier": "main_pg.countries",
"exclude": [
"local_name",
"size",
…
]
},
…
]
}
}'
Get status
POST /api/v1/mirrors/status
{
"flowJobName": "some_cdc_mirror"
}
Show configuration
POST /api/v1/mirrors/status
{
"flowJobName": "some_cdc_mirror",
"includeFlowInfo": true
}
Alerts
Operations:
Create
POST /api/v1/alerts/config
{
"config": {
"id": -1,
"service_type": "slack",
"service_config": "{\"slot_lag_mb_alert_threshold\":15000,\"open_connections_alert_threshold\":20,\"auth_token\":\"xoxb-012345678901-0123456789012-1234ABcdEFGhijKLMnopQRST\",\"channel_ids\":[\"C01K23X4567\"]}",
"alert_for_mirrors": [
"some_cdc_mirror",
"some_other_mirror"
]
}
}
Show configuration
GET /api/v1/alerts/config
Gotchas
-
The documentation is sorely lacking.
-
The product appears to have not been designed with configuration automation via IaC (nor APIs in general) in mind.
-
The API proven unreliable, non-idempotent, or plain did not work as described in the API Reference as of 2025-03-19.
E.g., theallow_update: truedata parameter in a request to thepeers/createendpoint should make the APIs update a peer's settings when one with the given name already exists, but the peer just does not get updated. -
API responses hide error messages behind a
200 OKHTTP status code as of 2025-03-19.Response example
Output of a
ansible.builtin.uriAnsible task executed against the PeerDB server:{ "json": { "message": "POSTGRES peer some_pg_peer was invalidated: failed to create connection: failed to connect to `user=me database=testDb`:\n\t172.31.40.46:6005 (dblab.example.org): tls error: server refused TLS connection\n\t172.31.40.46:6005 (dblab.example.org): failed SASL auth: FATAL: password authentication failed for user \"me\" (SQLSTATE 28P01)", "status": "FAILED" }, "msg": "OK (426 bytes)", "status": 200, "url": "http://localhost:3000/api/v1/peers/create", } -
PeerDB seems unable to connect to peers which
hostparameter islocalhostor127.0.0.1, but can connect to the IP address of the system running the service (e.g.,192.168.1.10).
This is most likely a Docker-related issue.$ docker run --rm --name 'postgres' -d -p '10000:5432' -e POSTGRES_PASSWORD='password' 'postgres:15.5' 1cb9d450f1c1112601022dec4315a4dac7f564ee67760788850e4f61a8b5d8fb $ psql 'host=localhost port=10000 user=postgres password=password' -c '\conninfo' You are connected to database "postgres" as user "postgres" on host "localhost" (address "127.0.0.1") at port "10000". $ psql 'host=192.168.1.10 port=10000 user=postgres password=password' -c '\conninfo' You are connected to database "postgres" as user "postgres" on host "192.168.1.10" at port "10000". $ psql 'host=localhost port=9900 user=me password=peerdb' psql (15.8, server 14) Type "help" for help. me=> CREATE PEER IF NOT EXISTS some_pg_peer FROM POSTGRES WITH (host='localhost', port='10000', user='postgres', password='password', database='postgres'); ERROR: User provided error: ErrorInfo: ERROR, internal_error, failed to create peer: POSTGRES peer some_pg_peer was invalidated: failed to create connection: failed to connect to `user=postgres database=postgres`: 127.0.0.1:10000 (localhost): dial error: dial tcp 127.0.0.1:10000: connect: connection refused [::1]:10000 (localhost): dial error: dial tcp [::1]:10000: connect: cannot assign requested address 127.0.0.1:10000 (localhost): dial error: dial tcp 127.0.0.1:10000: connect: connection refused [::1]:10000 (localhost): dial error: dial tcp [::1]:10000: connect: cannot assign requested address me=> CREATE PEER IF NOT EXISTS some_pg_peer FROM POSTGRES WITH (host='192.168.1.10', port='10000', user='postgres', password='password', database='postgres'); OK -
PostgreSQL peers do not accept connection options as of 2025-03-19.
This makes it impossible to specify any or override defaults.The connection string is composed in code.
The data structure specifying its parameters does not accept options, nor explicit connection strings.// https://github.com/PeerDB-io/peerdb/blob/6a591128908cbd76df8f7e4094ec838fac08dcda/protos/peers.proto#L73 message PostgresConfig { string host = 1; uint32 port = 2; string user = 3; string password = 4 [(peerdb_redacted) = true]; string database = 5; // defaults to _peerdb_internal optional string metadata_schema = 7; optional SSHConfig ssh_config = 8; } -
PostgreSQL peers have issues connecting to DBLab clones as of 2025-03-19.
Peers seemingly require SSL to connect to them for some reason, or fail the password authentication when given the correct credentials.$ nc -vz dblab.example.org 6005 Ncat: Version 7.93 ( https://nmap.org/ncat ) Ncat: Connected to 172.31.40.46:6005. Ncat: 0 bytes sent, 0 bytes received in 0.04 seconds. $ psql 'postgresql://dblab.example.org:6005/testDb?user=me&password=1q2w3e4r' -c '\conninfo' You are connected to database "testDb" as user "me" on host "dblab.example.org" (address "172.31.40.46") at port "6005". $ psql 'host=localhost port=9900 password=peerdb' psql (15.8, server 14) Type "help" for help. me=> CREATE PEER IF NOT EXISTS some_pg_peer FROM POSTGRES WITH (host='dblab.example.org', port='6005', user='me', password='1q2w3e4r', database='testDb'); ERROR: User provided error: ErrorInfo: ERROR, internal_error, failed to create peer: POSTGRES peer some_pg_peer was invalidated: failed to create connection: failed to connect to `user=me database=testDb`: 172.31.40.46:6005 (dblab.example.org): tls error: server refused TLS connection 172.31.40.46:6005 (dblab.example.org): failed SASL auth: FATAL: password authentication failed for user "me" (SQLSTATE 28P01) -
SQL mode is provided by a translation service, which intercepts the
CREATE PEER(or other resource) command and uses it to create the correct resources in the PostgreSQL backend.
The translator does not expose all the resources (e.g., I could find no alert configuration), nor allows for easy updates (e.g. the peers and mirrors data is encoded).
The data for peers and mirrors is encoded in ways that are not disclosed in the documentation. -
Newly created mirrors will start replication right away.
Unless explicitly specified in their definition, this usually means taking an initial snapshot of the mapped tables from the source peer. -
When in the
snapshotstate, mirrors cannot be paused.
If stopped (like stopping, restarting, or killing the container), it will break and will need to be restarted. -
Paused mirrors using PostgreSQL peers as source will not consume the logical replication's transaction log, which will blow up in size (depending on the number of changes made to the source DB).
-
When creating alerts through the APIs, the alert's ID in the request's data must be
-1.
This will create duplicates.