Files
oam/knowledge base/peerdb.md
2025-04-10 21:16:36 +02:00

18 KiB

PeerDB

Fast, simple, and cost effective Postgres replication.

  1. TL;DR
  2. Peers
  3. Mirrors
  4. Alerts
  5. Gotchas
  6. Further readings
    1. Sources

TL;DR

Glossary
Term Summary
Peer Connection to a database that PeerDB can query
Mirror Stream of changes, feed in real-time, from a source peer to a target peer
Alert Notifications about issues in flows
Setup
git clone 'https://github.com/PeerDB-io/peerdb.git' \
&& docker compose -f 'peerdb/docker-compose.yml' up -d
Usage
# Connect in SQL mode.
psql 'host=localhost port=9900 password=peerdb'
psql 'postgresql://peerdb.example.org:9900/?password=peerdb'

# Use the REST APIs.
curl -fsS --url 'http://localhost:3000/api/v1/peers/list' --request 'GET' \
  --header "Authorization: Basic $(printf '%s' ':' 'your password here' | base64)"
curl -fsS --url 'http://localhost:3000/api/v1/peers/create' --request 'POST' \
  --header "Authorization: Basic $(printf '%s' ':' 'your password here' | base64)" \
  --header 'Content-Type: application/json' \
  --data '{ … }'
Real world use cases
# List peers.
psql "host=localhost port=9900 password=$(gopass show -o 'peerdb/instance')" -c "SELECT id, name, type FROM peers;"
curl -fsS --url 'http://localhost:3000/api/v1/peers/list' \
  -H "Authorization: Basic $(gopass show -o 'peerdb/instance' | xargs printf '%s' ':' | base64)"

Peers

Peers are connection settings to databases that PeerDB can operate upon.

Source PostgreSQL peers require logical replication to be enabled.

-- Check settings
sourceDb=> SELECT name,setting FROM pg_settings WHERE name IN ('wal_level','rds.logical_replication');
          name           | setting
-------------------------+---------
 rds.logical_replication | on
 wal_level               | logical
(2 rows)
-- Configure sources
ALTER SYSTEM SET wal_level = logical;
ALTER SYSTEM SET max_wal_senders = 10;
ALTER SYSTEM SET max_replication_slots = 10;

Operations:

List
SELECT id, name, type FROM peers;
GET /api/v1/peers/list
Create or update
CREATE PEER IF NOT EXISTS some_postgresql_peer
FROM POSTGRES
WITH (
  host='pg.example.org',
  port='5432',
  database='postgres',
  user='postgres',
  password='password'
);
Peer type peer.type attribute Configuration attribute
ClickHouse 8 clickhouse_config
Kafka 9 kafka_config
PostgreSQL 3 or 'POSTGRES' postgres_config

The optional "allow_update": true attribute in the API seems to do absolutely nothing as of the time of writing.

POST /api/v1/peers/create
{
  "allow_update": true,
  "peer": {
    "name": "some_postgresql_peer",
    "type": "POSTGRES",
    "postgres_config": {
      "host": "pg.example.org",
      "port": "5432",
      "database": "postgres",
      "user": "postgres",
      "password": "password"
    }
  }
}
Delete
DELETE FROM peers WHERE name == 'some_postgresql_peer';

Mirrors

Mirrors can be in the following states:

State Returned string Description
Setup STATUS_SETUP The mirror is creating the target tables and metadata tables
Snapshot STATUS_SNAPSHOT The mirror is currently performing the initial snapshot of the tables defined in the mapping
Running STATUS_RUNNING The mirror has completed the initial snapshot, and is in its CDC phase
Pausing STATUS_PAUSING The mirror is in its CDC phase, and is in the process of pausing
Paused STATUS_PAUSED The mirror is in its CDC phase, and is paused
Terminated STATUS_TERMINATED The mirror has been deleted/terminated
Unknown STATUS_UNKNOWN The mirror is not found in PeerDB's catalog, or its status cannot be obtained due to some other issue

Mirrors using PostgreSQL peers as sources create replication slots in the source DB to get changes from.

Operations:

List
GET /api/v1/mirrors/list
Create
Field Type Required Default Notes
flow_job_name string yes name of the mirror
source_name string yes name of the source peer
destination_name string yes name of the destination peer
table_mappings array yes
table_mappings.source_table_identifier string yes source schema and table
table_mappings.destination_table_identifier string yes destination schema and table
table_mappings.exclude list of strings no [] columns excluded from the sync
table_mappings.columns list of objects no [] ordering setting; for ClickHouse only
table_mappings.columns.name string yes name of the column
table_mappings.columns.ordering number yes rank of the column
idle_timeout_seconds number no 60
publication_name string no will be created if not provided
max_batch_size number no 1000000
do_initial_snapshot boolean yes
snapshot_num_rows_per_partition number no 1000000 only used for the initial snapshot
snapshot_max_parallel_workers number no 4 only used for the initial snapshot
snapshot_num_tables_in_parallel number no 1 only used for the initial snapshot
resync boolean no false the mirror must be dropped before re-syncing
initial_snapshot_only boolean no false
soft_delete_col_name string no _PEERDB_IS_DELETED
synced_at_col_name string no _PEERDB_SYNCED_AT
CREATE MIRROR IF NOT EXISTS some_cdc_mirror
FROM main_pg TO snowflake_prod  -- FROM source_peer TO target_peer
WITH TABLE MAPPING
(
  public.regions:main_pg.regions,  -- source_schema.table:target_schema.table
  {
    from: public.countries,  -- source_schema.table
    to: main_pg.countries,   -- target_schema.table
    exclude: [ local_name, size,  ]  -- column_1, …, column_N
  },
  
)
WITH ( do_initial_copy = true );
POST /api/v1/flows/cdc/create
{
  "connection_configs": {
    "flow_job_name": "some_cdc_mirror",
    "source_name": "main_pg",
    "destination_name": "snowflake_prod",
    "do_initial_snapshot": true,
    "table_mappings": [
      {
        "source_table_identifier": "public.regions",
        "destination_table_identifier": "main_pg.regions"
      },
      {
        "source_table_identifier": "public.countries",
        "destination_table_identifier": "main_pg.countries",
        "exclude": [
          "local_name",
          "size",
          …
        ]
      },
      …
    ]
  }
}'
Get status
POST /api/v1/mirrors/status
{
  "flowJobName": "some_cdc_mirror"
}
Show configuration
POST /api/v1/mirrors/status
{
  "flowJobName": "some_cdc_mirror",
  "includeFlowInfo": true
}

Alerts

Operations:

Create
POST /api/v1/alerts/config
{
  "config": {
    "id": -1,
    "service_type": "slack",
    "service_config": "{\"slot_lag_mb_alert_threshold\":15000,\"open_connections_alert_threshold\":20,\"auth_token\":\"xoxb-012345678901-0123456789012-1234ABcdEFGhijKLMnopQRST\",\"channel_ids\":[\"C01K23X4567\"]}",
    "alert_for_mirrors": [
      "some_cdc_mirror",
      "some_other_mirror"
    ]
  }
}
Show configuration
GET /api/v1/alerts/config

Gotchas

  • The documentation is sorely lacking.

  • The product appears to have not been designed with configuration automation via IaC (nor APIs in general) in mind.

  • The API proven unreliable, non-idempotent, or plain did not work as described in the API Reference as of 2025-03-19.
    E.g., the allow_update: true data parameter in a request to the peers/create endpoint should make the APIs update a peer's settings when one with the given name already exists, but the peer just does not get updated.

  • API responses hide error messages behind a 200 OK HTTP status code as of 2025-03-19.

    Response example

    Output of a ansible.builtin.uri Ansible task executed against the PeerDB server:

    {
      "json": {
        "message": "POSTGRES peer some_pg_peer was invalidated: failed to create connection: failed to connect to `user=me database=testDb`:\n\t172.31.40.46:6005 (dblab.example.org): tls error: server refused TLS connection\n\t172.31.40.46:6005 (dblab.example.org): failed SASL auth: FATAL: password authentication failed for user \"me\" (SQLSTATE 28P01)",
        "status": "FAILED"
      },
      "msg": "OK (426 bytes)",
      "status": 200,
      "url": "http://localhost:3000/api/v1/peers/create",
    }
    
  • PeerDB seems unable to connect to peers which host parameter is localhost or 127.0.0.1, but can connect to the IP address of the system running the service (e.g., 192.168.1.10).
    This is most likely a Docker-related issue.

    $ docker run --rm --name 'postgres' -d -p '10000:5432' -e POSTGRES_PASSWORD='password' 'postgres:15.5'
    1cb9d450f1c1112601022dec4315a4dac7f564ee67760788850e4f61a8b5d8fb
    
    $ psql 'host=localhost port=10000 user=postgres password=password' -c '\conninfo'
    You are connected to database "postgres" as user "postgres" on host "localhost" (address "127.0.0.1") at port "10000".
    
    $ psql 'host=192.168.1.10 port=10000 user=postgres password=password' -c '\conninfo'
    You are connected to database "postgres" as user "postgres" on host "192.168.1.10" at port "10000".
    
    $ psql 'host=localhost port=9900 user=me password=peerdb'
    psql (15.8, server 14)
    Type "help" for help.
    me=> CREATE PEER IF NOT EXISTS some_pg_peer FROM POSTGRES WITH (host='localhost', port='10000', user='postgres', password='password', database='postgres');
    ERROR:  User provided error: ErrorInfo: ERROR, internal_error, failed to create peer: POSTGRES peer some_pg_peer was invalidated: failed to create connection: failed to connect to `user=postgres database=postgres`:
            127.0.0.1:10000 (localhost): dial error: dial tcp 127.0.0.1:10000: connect: connection refused
            [::1]:10000 (localhost): dial error: dial tcp [::1]:10000: connect: cannot assign requested address
            127.0.0.1:10000 (localhost): dial error: dial tcp 127.0.0.1:10000: connect: connection refused
            [::1]:10000 (localhost): dial error: dial tcp [::1]:10000: connect: cannot assign requested address
    me=> CREATE PEER IF NOT EXISTS some_pg_peer FROM POSTGRES WITH (host='192.168.1.10', port='10000', user='postgres', password='password', database='postgres');
    OK
    
  • PostgreSQL peers do not accept connection options as of 2025-03-19.
    This makes it impossible to specify any or override defaults.

    The connection string is composed in code.
    The data structure specifying its parameters does not accept options, nor explicit connection strings.

    // https://github.com/PeerDB-io/peerdb/blob/6a591128908cbd76df8f7e4094ec838fac08dcda/protos/peers.proto#L73
    message PostgresConfig {
      string host = 1;
      uint32 port = 2;
      string user = 3;
      string password = 4 [(peerdb_redacted) = true];
      string database = 5;
      // defaults to _peerdb_internal
      optional string metadata_schema = 7;
      optional SSHConfig ssh_config = 8;
    }
    
  • PostgreSQL peers have issues connecting to DBLab clones as of 2025-03-19.
    Peers seemingly require SSL to connect to them for some reason, or fail the password authentication when given the correct credentials.

    $ nc -vz dblab.example.org 6005
    Ncat: Version 7.93 ( https://nmap.org/ncat )
    Ncat: Connected to 172.31.40.46:6005.
    Ncat: 0 bytes sent, 0 bytes received in 0.04 seconds.
    
    $ psql 'postgresql://dblab.example.org:6005/testDb?user=me&password=1q2w3e4r' -c '\conninfo'
    You are connected to database "testDb" as user "me" on host "dblab.example.org" (address "172.31.40.46") at port "6005".
    
    $ psql 'host=localhost port=9900 password=peerdb'
    psql (15.8, server 14)
    Type "help" for help.
    
    me=> CREATE PEER IF NOT EXISTS some_pg_peer FROM POSTGRES WITH (host='dblab.example.org', port='6005', user='me', password='1q2w3e4r', database='testDb');
    ERROR:  User provided error: ErrorInfo: ERROR, internal_error, failed to create peer: POSTGRES peer some_pg_peer was invalidated: failed to create connection: failed to connect to `user=me database=testDb`:
            172.31.40.46:6005 (dblab.example.org): tls error: server refused TLS connection
            172.31.40.46:6005 (dblab.example.org): failed SASL auth: FATAL: password authentication failed for user "me" (SQLSTATE 28P01)
    
  • SQL mode is provided by a translation service, which intercepts the CREATE PEER (or other resource) command and uses it to create the correct resources in the PostgreSQL backend.
    The translator does not expose all the resources (e.g., I could find no alert configuration), nor allows for easy updates (e.g. the peers and mirrors data is encoded).
    The data for peers and mirrors is encoded in ways that are not disclosed in the documentation.

  • Newly created mirrors will start replication right away.
    Unless explicitly specified in their definition, this usually means taking an initial snapshot of the mapped tables from the source peer.

  • When in the snapshot state, mirrors cannot be paused.
    If stopped (like stopping, restarting, or killing the container), it will break and will need to be restarted.

  • Paused mirrors using PostgreSQL peers as source will not consume the logical replication's transaction log, which will blow up in size (depending on the number of changes made to the source DB).

  • When creating alerts through the APIs, the alert's ID in the request's data must be -1.
    This will create duplicates.

Further readings

Sources