From 234ea42eb303bbc01af60a8afa3d7c97e094b257 Mon Sep 17 00:00:00 2001 From: Michele Cereda Date: Sat, 7 Feb 2026 13:36:57 +0100 Subject: [PATCH] docs(peerdb): expand mirrors --- .markdownlint.yaml | 2 ++ knowledge base/peerdb.md | 76 +++++++++++++++++++++++++++------------- 2 files changed, 54 insertions(+), 24 deletions(-) diff --git a/.markdownlint.yaml b/.markdownlint.yaml index a846295..fc4a8d8 100644 --- a/.markdownlint.yaml +++ b/.markdownlint.yaml @@ -8,6 +8,8 @@ MD013: # line-length tables: false code_blocks: false severity: warning +MD028: # no-blanks-blockquote + false MD033: # no-inline-html allowed_elements: - b diff --git a/knowledge base/peerdb.md b/knowledge base/peerdb.md index 220360a..e5484c4 100644 --- a/knowledge base/peerdb.md +++ b/knowledge base/peerdb.md @@ -108,8 +108,35 @@ Mirrors can be in the following states: | Terminated | `STATUS_TERMINATED` | The mirror has been deleted/terminated | | Unknown | `STATUS_UNKNOWN` | The mirror is not found in PeerDB's catalog, or its status cannot be obtained due to some other issue | +Existing mirrors can be edited, but **must** be paused beforehand.
+There are checks in place that make any operation editing a mirror fail if the mirror is not in the
+`STATUS_PAUSED` state.
+
+> [!warning]
+> Once a mirror is created, some of its settings, like whether to take an initial snapshot or not, **cannot** be
+> changed.
+>
+> Some other parameters, like the number of tables or max workers for initial snapshots, are **only** configurable
+> via the API (and **not** via the UI).
+
 Mirrors using _PostgreSQL_ peers as sources create [replication slots] in the source DB to get changes from.
 
+During a mirror's initial snapshot, PeerDB creates at least one worker per table, up to
+`snapshot_max_parallel_workers` times `snapshot_num_tables_in_parallel` workers in total.
+
+> [!caution]
+> Newly created mirrors **will start replication right away**.\
+> This _usually_ means taking a snapshot of the tables from the source. While in this state, a mirror **cannot be
+> paused**.
+
+> [!note]
+> Partitions appear to be allocated to workers at the start of the process, so a slow worker can lag behind while
+> the rest of the workers for that table have already finished.
+
+> [!tip]
+> When dealing with lots of data, start by adding tables one at a time (biggest tables first), then add the
+> remaining ones in progressively bigger batches.
+
 Operations:
@@ -124,30 +151,6 @@ GET /api/v1/mirrors/list
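The `GET /api/v1/mirrors/list` endpoint is handy for checking mirrors' states before acting on them, e.g. verifying a
mirror is `STATUS_PAUSED` before trying to edit it. A minimal Python sketch of filtering such a response follows; the
`mirrors`, `name` and `currentFlowState` keys are **assumptions** about the response shape, not guaranteed by this
document, so verify them against one's own deployment:

```python
# Return the names of mirrors in the given state from a
# GET /api/v1/mirrors/list response body.
# NOTE: the "mirrors", "name" and "currentFlowState" keys are assumed
# field names, not confirmed here -- check your PeerDB version.
def mirrors_in_state(response: dict, state: str) -> list[str]:
    return [
        mirror["name"]
        for mirror in response.get("mirrors", [])
        if mirror.get("currentFlowState") == state
    ]

# Hand-made sample response: only the paused mirror is returned.
sample = {
    "mirrors": [
        {"name": "some_cdc_mirror", "currentFlowState": "STATUS_PAUSED"},
        {"name": "other_mirror", "currentFlowState": "STATUS_RUNNING"},
    ]
}
print(mirrors_in_state(sample, "STATUS_PAUSED"))  # ['some_cdc_mirror']
```

The same filter works for any of the states in the table above (`STATUS_RUNNING`, `STATUS_TERMINATED`, and so on).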
Create -| Field | Type | Required | Default | Notes | -| --------------------------------------------- | --------------- | -------- | -------------------- | ------------------------------------------------ | -| `flow_job_name` | string | yes | | name of the mirror | -| `source_name` | string | yes | | name of the source peer | -| `destination_name` | string | yes | | name of the destination peer | -| `table_mappings` | array | yes | | | -| `table_mappings.source_table_identifier` | string | yes | | source schema and table | -| `table_mappings.destination_table_identifier` | string | yes | | destination schema and table | -| `table_mappings.exclude` | list of strings | no | [] | columns excluded from the sync | -| `table_mappings.columns` | list of objects | no | [] | ordering setting; for ClickHouse only | -| `table_mappings.columns.name` | string | yes | | name of the column | -| `table_mappings.columns.ordering` | number | yes | | rank of the column | -| `idle_timeout_seconds` | number | no | 60 | | -| `publication_name` | string | no | | will be created if not provided | -| `max_batch_size` | number | no | 1000000 | | -| `do_initial_snapshot` | boolean | yes | | | -| `snapshot_num_rows_per_partition` | number | no | 1000000 | only used for the initial snapshot | -| `snapshot_max_parallel_workers` | number | no | 4 | only used for the initial snapshot | -| `snapshot_num_tables_in_parallel` | number | no | 1 | only used for the initial snapshot | -| `resync` | boolean | no | false | the mirror **must be dropped** before re-syncing | -| `initial_snapshot_only` | boolean | no | false | | -| `soft_delete_col_name` | string | no | `_PEERDB_IS_DELETED` | | -| `synced_at_col_name` | string | no | `_PEERDB_SYNCED_AT` | | - ```sql CREATE MIRROR IF NOT EXISTS some_cdc_mirror FROM main_pg TO snowflake_prod -- FROM source_peer TO target_peer @@ -192,6 +195,30 @@ POST /api/v1/flows/cdc/create }' ``` +| Field | Type | Required | Default | Notes | +| 
--------------------------------------------- | --------------- | -------- | -------------------- | ------------------------------------------------ | +| `flow_job_name` | string | yes | | name of the mirror | +| `source_name` | string | yes | | name of the source peer | +| `destination_name` | string | yes | | name of the destination peer | +| `table_mappings` | array | yes | | | +| `table_mappings.source_table_identifier` | string | yes | | source schema and table | +| `table_mappings.destination_table_identifier` | string | yes | | destination schema and table | +| `table_mappings.exclude` | list of strings | no | [] | columns excluded from the sync | +| `table_mappings.columns` | list of objects | no | [] | ordering setting; for ClickHouse only | +| `table_mappings.columns.name` | string | yes | | name of the column | +| `table_mappings.columns.ordering` | number | yes | | rank of the column | +| `idle_timeout_seconds` | number | no | 60 | | +| `publication_name` | string | no | | will be created if not provided | +| `max_batch_size` | number | no | 1000000 | | +| `do_initial_snapshot` | boolean | yes | | | +| `snapshot_num_rows_per_partition` | number | no | 1000000 | only used for the initial snapshot | +| `snapshot_max_parallel_workers` | number | no | 4 | only used for the initial snapshot | +| `snapshot_num_tables_in_parallel` | number | no | 1 | only used for the initial snapshot | +| `resync` | boolean | no | false | the mirror **must be dropped** before re-syncing | +| `initial_snapshot_only` | boolean | no | false | | +| `soft_delete_col_name` | string | no | `_PEERDB_IS_DELETED` | | +| `synced_at_col_name` | string | no | `_PEERDB_SYNCED_AT` | | +
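The fields in the table above map directly onto the JSON body for `POST /api/v1/flows/cdc/create`. A minimal Python
sketch of assembling such a body follows; the peer, schema and table names are placeholders, and only a handful of the
available fields are shown:

```python
import json

# Placeholder names -- substitute your own peers and tables.
payload = {
    "flow_job_name": "some_cdc_mirror",
    "source_name": "main_pg",
    "destination_name": "snowflake_prod",
    "table_mappings": [
        {
            "source_table_identifier": "public.orders",
            "destination_table_identifier": "public.orders",
            "exclude": [],
        },
    ],
    "do_initial_snapshot": True,           # required; cannot be changed later
    "snapshot_max_parallel_workers": 4,    # default; initial snapshot only
    "snapshot_num_tables_in_parallel": 1,  # default; initial snapshot only
}

# Workers used during the initial snapshot:
# snapshot_max_parallel_workers * snapshot_num_tables_in_parallel.
total_workers = (
    payload["snapshot_max_parallel_workers"]
    * payload["snapshot_num_tables_in_parallel"]
)
print(total_workers)  # 4

# Serialized body, ready to send to POST /api/v1/flows/cdc/create.
print(json.dumps(payload, indent=2))
```

Fields left out fall back to the defaults in the table, e.g. `idle_timeout_seconds` to `60` and `max_batch_size` to
`1000000`.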
@@ -347,6 +374,7 @@ WITH ( | Kafka | `9` | `kafka_config` | | PostgreSQL | `3` or `'POSTGRES'` | `postgres_config` | +> [!note] > The optional `"allow_update": true` attribute in the API seems to do **absolutely nothing** as of the time of writing. ```plaintext