docs(peerdb): expand mirrors

2026-02-08 21:34:25 +00:00 · 2026-02-07 13:36:57 +01:00
parent c96e15dfab
commit 234ea42eb3
2 changed files with 54 additions and 24 deletions
--- a/.markdownlint.yaml
+++ b/.markdownlint.yaml
@@ -8,6 +8,8 @@ MD013:  # line-length
  tables: false
  code_blocks: false
  severity: warning
 MD028:  # no-blanks-blockquote
  false
 MD033:  # no-inline-html
  allowed_elements:
    - b
--- a/base/peerdb.md
+++ b/base/peerdb.md
@@ -108,8 +108,35 @@ Mirrors can be in the following states:
 | Terminated | `STATUS_TERMINATED` | The mirror has been deleted/terminated                                                                |
 | Unknown    | `STATUS_UNKNOWN`    | The mirror is not found in PeerDB's catalog, or its status cannot be obtained due to some other issue |
 Existing mirrors can be edited, but **must** be paused beforehand.<br/>
 There are checks in place that will make operations that would edit a mirror fail, if the mirror is not in the
 `STATUS_PAUSED` state.
 > [!warning]
 > Once a mirror is created, one **will not** be able to change some of the mirror's settings, like whether to do an
 > initial snapshot or not.
 >
 > Some other parameters, like the number of tables or max workers for initial snapshots, will **only** be configurable
 > via the API (and **not** via the UI).
 Mirrors using _PostgreSQL_ peers as sources create [replication slots] in the source DB to get changes from.
 During mirrors' initial snapshots, PeerDB creates at least one worker per table (`snapshot_max_parallel_workers` times
 `snapshot_num_tables_in_parallel`).
 > [!caution]
 > Newly created mirrors **will start replication right away**.\
 > This _usually_ means taking a snapshot of the tables from the source. While in this state, a mirror **cannot be
 > paused**.
 > [!note]
 > It looks like partitions are allocated to workers at the start of the process, which results in a slow worker lagging
 > behind while the rest of the workers for that table already finished.
 > [!tip]
 > When dealing with lots of data, prefer starting by adding tables one at a time (with the bigger tables first), then
 > add them in bigger and bigger batches.
 Operations:
 <details style="padding: 0 0 0 1rem">
@@ -124,30 +151,6 @@ GET /api/v1/mirrors/list
 <details style="padding: 0 0 0 1rem">
  <summary>Create</summary>
 | Field                                         | Type            | Required | Default              | Notes                                            |
 | --------------------------------------------- | --------------- | -------- | -------------------- | ------------------------------------------------ |
 | `flow_job_name`                               | string          | yes      |                      | name of the mirror                               |
 | `source_name`                                 | string          | yes      |                      | name of the source peer                          |
 | `destination_name`                            | string          | yes      |                      | name of the destination peer                     |
 | `table_mappings`                              | array           | yes      |                      |                                                  |
 | `table_mappings.source_table_identifier`      | string          | yes      |                      | source schema and table                          |
 | `table_mappings.destination_table_identifier` | string          | yes      |                      | destination schema and table                     |
 | `table_mappings.exclude`                      | list of strings | no       | []                   | columns excluded from the sync                   |
 | `table_mappings.columns`                      | list of objects | no       | []                   | ordering setting; for ClickHouse only            |
 | `table_mappings.columns.name`                 | string          | yes      |                      | name of the column                               |
 | `table_mappings.columns.ordering`             | number          | yes      |                      | rank of the column                               |
 | `idle_timeout_seconds`                        | number          | no       | 60                   |                                                  |
 | `publication_name`                            | string          | no       |                      | will be created if not provided                  |
 | `max_batch_size`                              | number          | no       | 1000000              |                                                  |
 | `do_initial_snapshot`                         | boolean         | yes      |                      |                                                  |
 | `snapshot_num_rows_per_partition`             | number          | no       | 1000000              | only used for the initial snapshot               |
 | `snapshot_max_parallel_workers`               | number          | no       | 4                    | only used for the initial snapshot               |
 | `snapshot_num_tables_in_parallel`             | number          | no       | 1                    | only used for the initial snapshot               |
 | `resync`                                      | boolean         | no       | false                | the mirror **must be dropped** before re-syncing |
 | `initial_snapshot_only`                       | boolean         | no       | false                |                                                  |
 | `soft_delete_col_name`                        | string          | no       | `_PEERDB_IS_DELETED` |                                                  |
 | `synced_at_col_name`                          | string          | no       | `_PEERDB_SYNCED_AT`  |                                                  |
 ```sql
 CREATE MIRROR IF NOT EXISTS some_cdc_mirror
 FROM main_pg TO snowflake_prod  -- FROM source_peer TO target_peer
@@ -192,6 +195,30 @@ POST /api/v1/flows/cdc/create
 }'
 ```
 | Field                                         | Type            | Required | Default              | Notes                                            |
 | --------------------------------------------- | --------------- | -------- | -------------------- | ------------------------------------------------ |
 | `flow_job_name`                               | string          | yes      |                      | name of the mirror                               |
 | `source_name`                                 | string          | yes      |                      | name of the source peer                          |
 | `destination_name`                            | string          | yes      |                      | name of the destination peer                     |
 | `table_mappings`                              | array           | yes      |                      |                                                  |
 | `table_mappings.source_table_identifier`      | string          | yes      |                      | source schema and table                          |
 | `table_mappings.destination_table_identifier` | string          | yes      |                      | destination schema and table                     |
 | `table_mappings.exclude`                      | list of strings | no       | []                   | columns excluded from the sync                   |
 | `table_mappings.columns`                      | list of objects | no       | []                   | ordering setting; for ClickHouse only            |
 | `table_mappings.columns.name`                 | string          | yes      |                      | name of the column                               |
 | `table_mappings.columns.ordering`             | number          | yes      |                      | rank of the column                               |
 | `idle_timeout_seconds`                        | number          | no       | 60                   |                                                  |
 | `publication_name`                            | string          | no       |                      | will be created if not provided                  |
 | `max_batch_size`                              | number          | no       | 1000000              |                                                  |
 | `do_initial_snapshot`                         | boolean         | yes      |                      |                                                  |
 | `snapshot_num_rows_per_partition`             | number          | no       | 1000000              | only used for the initial snapshot               |
 | `snapshot_max_parallel_workers`               | number          | no       | 4                    | only used for the initial snapshot               |
 | `snapshot_num_tables_in_parallel`             | number          | no       | 1                    | only used for the initial snapshot               |
 | `resync`                                      | boolean         | no       | false                | the mirror **must be dropped** before re-syncing |
 | `initial_snapshot_only`                       | boolean         | no       | false                |                                                  |
 | `soft_delete_col_name`                        | string          | no       | `_PEERDB_IS_DELETED` |                                                  |
 | `synced_at_col_name`                          | string          | no       | `_PEERDB_SYNCED_AT`  |                                                  |
 </details>
 <details style="padding: 0 0 0 1rem">
@@ -347,6 +374,7 @@ WITH (
 | Kafka      | `9`                   | `kafka_config`          |
 | PostgreSQL | `3` or `'POSTGRES'`   | `postgres_config`       |
 > [!note]
 > The optional `"allow_update": true` attribute in the API seems to do **absolutely nothing** as of the time of writing.
 ```plaintext