refactor(kb/postgres): group related articles into a dedicated folder

Michele Cereda, 2025-08-29 09:51:52 +02:00
parent 57a04c57fc · commit f24d6e5d8a
11 changed files with 247 additions and 15 deletions

@@ -0,0 +1,427 @@
# PostgreSQL
1. [TL;DR](#tldr)
1. [Functions](#functions)
1. [Backup](#backup)
1. [Restore](#restore)
1. [Extensions of interest](#extensions-of-interest)
1. [PostGIS](#postgis)
1. [`postgresql_anonymizer`](#postgresql_anonymizer)
1. [Make it distributed](#make-it-distributed)
1. [Further readings](#further-readings)
1. [Sources](#sources)
## TL;DR
One can store one's credentials in the `~/.pgpass` file.
<details style='padding: 0 0 1rem 1rem'>
```plaintext
# line format => hostname:port:database:username:password
# can use wildcards
postgres.lan:5643:postgres:postgres:BananaORama
*:*:sales:elaine:modestPassword
```
> [!important]
> The credentials file's permissions must be `0600`, or it will be ignored.
</details>
Database roles represent **both** users and groups.<br/>
Roles are **distinct** from the OS' users and groups, and are global across the whole installation (there are **no**
DB-specific roles).
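For example, a role can serve as a group by being granted to other roles (a minimal sketch, names made up):
```sql
CREATE ROLE readers NOLOGIN;                         -- acts as a group
CREATE ROLE elaine LOGIN PASSWORD 'modestPassword';  -- acts as a user
GRANT readers TO elaine;                             -- 'elaine' inherits what 'readers' can do
GRANT SELECT ON ALL TABLES IN SCHEMA public TO readers;
```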
Extensions in PostgreSQL are managed **per database**.
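E.g., installing an extension only affects the current database (a sketch; the extension's files must already be
available on the instance, and `pg_stat_statements` also needs loading via `shared_preload_libraries` to collect data):
```sql
\connect sales
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
\dx  -- lists the extensions installed in the *current* database only
```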
Prefer using [pg_dumpall] to create **logical** backups.<br/>
Consider using [pgBackRest] to create **physical** backups.
Consider using the [Percona toolkit] to ease management.
<details>
<summary>Setup</summary>
```sh
# Installation.
brew install 'postgresql@16'
sudo dnf install 'postgresql' 'postgresql-server'
sudo zypper install 'postgresql15' 'postgresql15-server'
# Set the password in environment variables.
export PGPASSWORD='securePassword'
# Set up the credentials file.
cat <<EOF > ~/'.pgpass'
postgres.lan:5643:postgres:postgres:BananaORama
*:*:sales:elaine:modestPassword
EOF
chmod '600' ~/'.pgpass'
# Set up the per-user services file.
# Do *not* use spaces around the '=' sign.
cat <<EOF > ~/'.pg_service.conf'
[prod]
host=prod.0123456789ab.eu-west-1.rds.amazonaws.com
port=5433
user=master
EOF
```
</details>
<details>
<summary>Usage</summary>
```sh
# Connect to servers via CLI client.
# If not given:
# - the hostname defaults to 'localhost';
# - the port defaults to '5432';
# - the username defaults to the current user;
# - the 'sslmode' parameter defaults to 'prefer'.
psql 'my-db'
psql 'my-db' 'user'
psql 'postgres://host'
psql 'postgresql://host:5433/my-db?sslmode=require'
psql -U 'username' -d 'my-db' -h 'hostname' -p 'port' -W
psql --host 'host.fqdn' --port '5432' --username 'postgres' --dbname 'postgres' --password
psql "service=prod sslmode=disable"
# List available databases.
psql … --list
# Execute commands.
psql 'my-db' … -c 'select * from tableName;' -o 'out.file'
psql 'my-db' … -c 'select * from tableName;' -H
psql 'my-db' … -f 'commands.sql'
# Initialize a test DB.
pgbench -i 'test-db'
pgbench -i 'test-db' -h 'hostname' -p '5555' -U 'user'
# Create full backups of databases.
pg_dump -U 'postgres' -d 'sales' -F 'custom' -f 'sales.bak'
pg_dump --host 'host.fqdn' --port '5432' --username 'postgres' --dbname 'postgres' --password --schema-only
pg_dump … -T 'customers' -T 'orders' -t 'salespeople' -t 'performances'
pg_dump … -s --format 'custom'
# Dump users and groups to file
pg_dumpall -h 'host.fqdn' -p '5432' -U 'postgres' -l 'postgres' -W --roles-only --file 'roles.sql'
pg_dumpall -h 'host.fqdn' -p '5432' -U 'postgres' -l 'postgres' -Wrf 'roles.sql' --no-role-passwords
# Restore backups.
pg_restore -U 'postgres' -d 'sales' 'sales.bak'
# Execute commands from file
# E.g., restore from dump
psql -h 'host.fqdn' -U 'postgres' -d 'postgres' -W -f 'dump.sql' -e
# Generate scram-sha-256 hashes using only tools from PostgreSQL.
# Requires actually creating and then deleting a user.
createuser 'dummyuser' -e --pwprompt && dropuser 'dummyuser'
# Generate scram-sha-256 hashes.
# Leverage https://github.com/supercaracal/scram-sha-256
scram-sha-256 'mySecretPassword'
```
```sql
-- Load extensions from the underlying operating system
-- They must be already installed on the instance
ALTER SYSTEM SET shared_preload_libraries = 'anon';
ALTER DATABASE postgres SET session_preload_libraries = 'anon';
```
</details>
Also see [yugabyte/yugabyte-db] for a distributed, PostgreSQL-like DBMS.
## Functions
Refer [CREATE FUNCTION].
```sql
CREATE OR REPLACE FUNCTION just_return_1() RETURNS integer
LANGUAGE SQL
RETURN 1;
SELECT just_return_1();
```
```sql
CREATE OR REPLACE FUNCTION increment(i integer) RETURNS integer
AS $$
BEGIN
RETURN i + 1;
END;
$$
LANGUAGE plpgsql;
```
```sql
CREATE OR REPLACE FUNCTION entries_in_column(
table_name TEXT,
column_name TEXT
) RETURNS INTEGER
LANGUAGE plpgsql
AS $func$
DECLARE result INTEGER;
BEGIN
EXECUTE format('SELECT count(%I) FROM %I', column_name, table_name) INTO result;  -- %I safely quotes identifiers
RETURN result;
END;
$func$;
SELECT * FROM entries_in_column('vendors','vendor_id');
```
## Backup
PostgreSQL offers the `pg_dump` and `pg_dumpall` native client utilities to dump databases to files.<br/>
They produce sets of SQL statements that can be executed to reproduce the original databases' object definitions and
table data.
These utilities are suitable when:
- The databases' size is less than 100 GB.<br/>
They tend to run into issues with bigger databases.
- One plans to migrate the databases' metadata as well as the table data.
- There is a relatively large number of tables to migrate.
> [!important]
> These utilities work better when the database is taken offline (but do **not** require it).
Objects like roles, groups, tablespaces, and others are **not** dumped by `pg_dump`. It also only dumps a **single**
database per execution.<br/>
Use `pg_dumpall` to back up entire clusters and/or global objects like roles and tablespaces.
Dumps can be output as script or archive file formats.<br/>
Script dumps are plain-text files containing the SQL commands needed to reconstruct the dumped database to the state
it was in at the time it was saved.
The _custom_ format (`--format='custom'`) and the _directory_ format (`--format='directory'`) are the most flexible output file formats.<br/>
They allow for selection and reordering of archived items, support parallel restoration, and are compressed by default.
The directory format is the only format that supports parallel dumps.
```sh
# Dump single DBs
pg_dump --host 'host.fqdn' --port '5432' --username 'postgres' --dbname 'postgres' --password
pg_dump -h 'host.fqdn' -p '5432' -U 'admin' -d 'postgres' -W
pg_dump -U 'postgres' -d 'sales' -F 'custom' -f 'sales.bak' --schema-only
pg_dump … -T 'customers' -T 'orders' -t 'salespeople' -t 'performances'
pg_dump … -s --format 'custom'
pg_dump … -bF'd' --jobs '3'
# Dump DBs' schema only
pg_dump … --schema-only
# Dump only users and groups to file
pg_dumpall … --roles-only --file 'roles.sql'
pg_dumpall … -rf 'roles.sql' --no-role-passwords
# Dump roles and tablespace
pg_dumpall … --globals-only
pg_dumpall … -g --no-role-passwords
```
> [!important]
> Prefer separating command line options from their values via the `=` character rather than using a space.<br/>
> This prevents confusion and errors.
>
> <details style='padding: 0 0 0 1rem'>
>
> ```diff
> pg_dumpall --no-publications \
> - --format d --jobs 4 --exclude-schema archived --exclude-schema bi
> + --format='d' --jobs=4 --exclude-schema='archived' --exclude-schema='bi'
> ```
>
> </details>
A list of common backup tools can be found on the [Backup][wiki backup] page of the [PostgreSQL Wiki][wiki].<br/>
Based on the _limited_™ experience accrued so far, the TL;DR is:
- Prefer [pg_dumpall] (or [pg_dump] for single databases) for **logical** backups.
- Should one have **physical** access to the DB data directory (`$PGDATA`), consider using [pgBackRest] instead.
## Restore
PostgreSQL offers the `pg_restore` native client utility for restoration of databases from **logical** dumps.
Feed script dumps to `psql` to execute the commands in them and restore the data.
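A minimal sketch of that, wrapping the whole restore in one transaction so a failure leaves no partial state behind
(host, file, and DB names made up):
```sh
psql -h 'host.fqdn' -U 'postgres' -d 'sales' --single-transaction -f 'sales.sql'
```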
Archives created by `pg_dump` in one of the non-plain-text formats can be given as input to `pg_restore`
(`pg_dumpall` only produces plain-text scripts, which go through `psql` instead). `pg_restore` issues the commands
necessary to reconstruct the database to the state it was in at the time it was saved.
The archive files allow `pg_restore` to be _somewhat_ selective about what it restores, and even to reorder items
before restoring them.
The archive files are designed to be portable across architectures.
> [!important]
> Executing a restore on an online database will probably introduce conflicts of some kind.
> It is strongly suggested to take the target offline before restoring.
```sh
# Restore dumps
pg_restore … --dbname 'sales' 'sales.dump'
pg_restore … -d 'sales' -Oxj '8' 'sales.dump'
pg_restore … -d 'sales' --clean --if-exists 'sales.dump'
# Skip materialized views during a restore
pg_dump 'database' -Fc -f 'backup.dump'
pg_restore --list 'backup.dump' | sed -E '/[[:digit:]]+ VIEW/,+1d' > 'no-views.lst'
pg_restore -d 'database' --use-list 'no-views.lst' 'backup.dump'
# Only then, if needed, restore the views from the dump
pg_restore --list 'backup.dump' | grep -E --after-context=1 '[[:digit:]]+ VIEW' | sed '/--/d' > 'only-views.lst'
pg_restore -d 'database' --use-list 'only-views.lst' 'backup.dump'
```
Based on the _limited_™ experience accrued so far, the TL;DR is:
- Prefer [pg_restore] (or [psql] for plain-text script dumps) for restoring **logical** dumps.
- Use the restore feature of the external tool used for the backup.
## Extensions of interest
### PostGIS
TODO
### `postgresql_anonymizer`
Extension to mask or replace personally identifiable information or other sensitive data in a DB.
Refer [`postgresql_anonymizer`][postgresql_anonymizer] and [An In-Depth Guide to Postgres Data Masking with Anonymizer].
Admins declare masking rules using the PostgreSQL Data Definition Language (DDL), specifying the anonymization strategy
inside each table's definition.
<details>
<summary>Example</summary>
```sh
docker run --rm -d -e 'POSTGRES_PASSWORD=postgres' -p '6543:5432' 'registry.gitlab.com/dalibo/postgresql_anonymizer'
psql -h 'localhost' -p '6543' -U 'postgres' -d 'postgres' -W
```
```sql
=# SELECT * FROM people LIMIT 1;
id | firstname | lastname | phone
----+-----------+----------+------------
T1 | Sarah | Conor | 0609110911
-- 1. Activate the dynamic masking engine
=# CREATE EXTENSION IF NOT EXISTS anon CASCADE;
=# SELECT anon.start_dynamic_masking();
-- 2. Declare a masked user
=# CREATE ROLE skynet LOGIN PASSWORD 'skynet';
=# SECURITY LABEL FOR anon ON ROLE skynet IS 'MASKED';
-- 3. Declare masking rules
=# SECURITY LABEL FOR anon ON COLUMN people.lastname IS 'MASKED WITH FUNCTION anon.fake_last_name()';
=# SECURITY LABEL FOR anon ON COLUMN people.phone IS 'MASKED WITH FUNCTION anon.partial(phone,2,$$******$$,2)';
-- 4. Connect with the masked user and test masking
=# \connect - skynet
=# SELECT * FROM people LIMIT 1;
id | firstname | lastname | phone
----+-----------+----------+------------
T1 | Sarah | Morris | 06******11
```
</details>
## Make it distributed
Refer [How to Scale a Single-Server Database: A Guide to Distributed PostgreSQL].<br/>
See also [yugabyte/yugabyte-db].
## Further readings
- [SQL]
- [Docker image]
- [Bidirectional replication in PostgreSQL using pglogical]
- [What is the pg_dump command for backing up a PostgreSQL database?]
- [How to SCRAM in Postgres with pgBouncer]
- [`postgresql_anonymizer`][postgresql_anonymizer]
- [pgxn-manager]
- [dverite/postgresql-functions]
- [MySQL]
- [pg_flo]
- [pgAdmin]
- [How to Scale a Single-Server Database: A Guide to Distributed PostgreSQL]
- [yugabyte/yugabyte-db]
- [Logical Decoding Concepts]
### Sources
- [psql]
- [pg_settings]
- [Connect to a PostgreSQL database]
- [Database connection control functions]
- [The password file]
- [How to Generate SCRAM-SHA-256 to Create Postgres 13 User]
- [PostgreSQL: Get member roles and permissions]
- [An In-Depth Guide to Postgres Data Masking with Anonymizer]
- [Get count of records affected by INSERT or UPDATE in PostgreSQL]
- [How to write update function (stored procedure) in Postgresql?]
- [How to search a specific value in all tables (PostgreSQL)?]
- [PostgreSQL: Show all the privileges for a concrete user]
- [PostgreSQL - disabling constraints]
- [Hashing a String to a Numeric Value in PostgreSQL]
- [I replaced my entire tech stack with Postgres...]
- [What does GRANT USAGE ON SCHEMA do exactly?]
<!--
Reference
═╬═Time══
-->
<!-- Knowledge base -->
[mysql]: ../mysql.md
[Percona toolkit]: ../percona%20toolkit.md
[pg_dump]: pg_dump.md
[pg_dumpall]: pg_dumpall.md
[pg_flo]: pg_flo.md
[pg_restore]: pg_restore.md
[pgadmin]: pgadmin.md
[pgBackRest]: pgbackrest.md
[sql]: ../sql.md
<!-- Upstream -->
[create function]: https://www.postgresql.org/docs/current/sql-createfunction.html
[database connection control functions]: https://www.postgresql.org/docs/current/libpq-connect.html
[docker image]: https://github.com/docker-library/docs/blob/master/postgres/README.md
[logical decoding concepts]: https://www.postgresql.org/docs/current/logicaldecoding-explanation.html
[pg_settings]: https://www.postgresql.org/docs/current/view-pg-settings.html
[psql]: https://www.postgresql.org/docs/current/app-psql.html
[the password file]: https://www.postgresql.org/docs/current/libpq-pgpass.html
[wiki]: https://wiki.postgresql.org/wiki/
[wiki backup]: https://wiki.postgresql.org/wiki/Ecosystem:Backup
<!-- Others -->
[an in-depth guide to postgres data masking with anonymizer]: https://thelinuxcode.com/postgresql-anonymizer-data-masking/
[bidirectional replication in postgresql using pglogical]: https://www.jamesarmes.com/2023/03/bidirectional-replication-postgresql-pglogical.html
[connect to a postgresql database]: https://www.postgresqltutorial.com/connect-to-postgresql-database/
[dverite/postgresql-functions]: https://github.com/dverite/postgresql-functions
[get count of records affected by insert or update in postgresql]: https://stackoverflow.com/questions/4038616/get-count-of-records-affected-by-insert-or-update-in-postgresql#78459743
[hashing a string to a numeric value in postgresql]: https://stackoverflow.com/questions/9809381/hashing-a-string-to-a-numeric-value-in-postgresql#69650940
[how to generate scram-sha-256 to create postgres 13 user]: https://stackoverflow.com/questions/68400120/how-to-generate-scram-sha-256-to-create-postgres-13-user
[how to scale a single-server database: a guide to distributed postgresql]: https://www.yugabyte.com/postgresql/distributed-postgresql/
[how to scram in postgres with pgbouncer]: https://www.crunchydata.com/blog/pgbouncer-scram-authentication-postgresql
[how to search a specific value in all tables (postgresql)?]: https://stackoverflow.com/questions/5350088/how-to-search-a-specific-value-in-all-tables-postgresql/23036421#23036421
[how to write update function (stored procedure) in postgresql?]: https://stackoverflow.com/questions/21087710/how-to-write-update-function-stored-procedure-in-postgresql
[i replaced my entire tech stack with postgres...]: https://www.youtube.com/watch?v=3JW732GrMdg
[pgxn-manager]: https://github.com/pgxn/pgxn-manager
[postgresql - disabling constraints]: https://stackoverflow.com/questions/2679854/postgresql-disabling-constraints#2681413
[postgresql_anonymizer]: https://postgresql-anonymizer.readthedocs.io/en/stable/
[postgresql: get member roles and permissions]: https://www.cybertec-postgresql.com/en/postgresql-get-member-roles-and-permissions/
[postgresql: show all the privileges for a concrete user]: https://stackoverflow.com/questions/40759177/postgresql-show-all-the-privileges-for-a-concrete-user
[what does grant usage on schema do exactly?]: https://stackoverflow.com/questions/17338621/what-does-grant-usage-on-schema-do-exactly
[what is the pg_dump command for backing up a postgresql database?]: https://www.linkedin.com/advice/3/what-pgdump-command-backing-up-postgresql-ke2ef
[yugabyte/yugabyte-db]: https://github.com/yugabyte/yugabyte-db


@@ -0,0 +1,74 @@
# pg_dump
> [!caution]
> TODO
Intro
<!-- Remove this line to uncomment if used
## Table of contents <!-- omit in toc -->
1. [TL;DR](#tldr)
1. [Further readings](#further-readings)
1. [Sources](#sources)
## TL;DR
<!-- Uncomment if used
<details>
<summary>Setup</summary>
```sh
```
</details>
-->
<!-- Uncomment if used
<details>
<summary>Usage</summary>
```sh
```
</details>
-->
<!-- Uncomment if used
<details>
<summary>Real world use cases</summary>
```sh
```
</details>
-->
## Further readings
- [PostgreSQL]
- [pg_dumpall]
- [pg_restore]
### Sources
- [Documentation]
- [A Complete Guide to pg_dump With Examples, Tips, and Tricks]
<!--
Reference
═╬═Time══
-->
<!-- In-article sections -->
<!-- Knowledge base -->
[pg_dumpall]: pg_dumpall.md
[pg_restore]: pg_restore.md
[PostgreSQL]: README.md
<!-- Files -->
<!-- Upstream -->
[Documentation]: https://www.postgresql.org/docs/current/app-pgdump.html
<!-- Others -->
[A Complete Guide to pg_dump With Examples, Tips, and Tricks]: https://www.dbvis.com/thetable/a-complete-guide-to-pg-dump-with-examples-tips-and-tricks/


@@ -0,0 +1,72 @@
# pg_dumpall
> [!caution]
> TODO
Intro
<!-- Remove this line to uncomment if used
## Table of contents <!-- omit in toc -->
1. [TL;DR](#tldr)
1. [Further readings](#further-readings)
1. [Sources](#sources)
## TL;DR
<!-- Uncomment if used
<details>
<summary>Setup</summary>
```sh
```
</details>
-->
<!-- Uncomment if used
<details>
<summary>Usage</summary>
```sh
```
</details>
-->
<!-- Uncomment if used
<details>
<summary>Real world use cases</summary>
```sh
```
</details>
-->
## Further readings
- [PostgreSQL]
- [pg_dump]
- [pg_restore]
### Sources
- [Documentation]
<!--
Reference
═╬═Time══
-->
<!-- In-article sections -->
<!-- Knowledge base -->
[pg_dump]: pg_dump.md
[pg_restore]: pg_restore.md
[PostgreSQL]: README.md
<!-- Files -->
<!-- Upstream -->
[Documentation]: https://www.postgresql.org/docs/current/app-pg-dumpall.html
<!-- Others -->


@@ -0,0 +1,222 @@
# `pg_flo`
Move and transform data between PostgreSQL databases using Logical Replication.
1. [TL;DR](#tldr)
1. [How this works](#how-this-works)
1. [State Management](#state-management)
1. [Further readings](#further-readings)
1. [Sources](#sources)
## TL;DR
Leverages PostgreSQL's logical replication system to capture changes, and applies transformations and filters to the
data before streaming it to the destination.
Decouples the _replicator_ and _worker_ processes using [NATS] as message broker.<br/>
The NATS server **must have JetStream** enabled (`nats-server -js`).
The _replicator_ component captures PostgreSQL changes via logical replication.<br/>
The _worker_ component processes and routes changes through NATS.
<details>
<summary>Setup</summary>
<details style="padding: 0 0 0 1em">
<summary>Check requirements</summary>
```sql
sourceDb=> SELECT name,setting FROM pg_settings WHERE name IN ('wal_level','rds.logical_replication');
name | setting
-------------------------+---------
rds.logical_replication | on
wal_level | logical
(2 rows)
```
</details>
```sh
docker pull 'nats' && docker pull 'shayonj/pg_flo'
```
<details style="padding: 0 0 1em 1em">
<summary>Configuration file</summary>
[Reference][configuration file reference]
```yaml
# Replicator settings
host: "localhost"
port: 5432
dbname: "myapp"
user: "replicator"
password: "secret"
group: "users"
tables:
- "users"
# Worker settings (postgres sink)
target-host: "dest-db"
target-dbname: "myapp"
target-user: "writer"
target-password: "secret"
# Common settings
nats-url: "nats://localhost:4222"
```
</details>
</details>
<details>
<summary>Usage</summary>
```sh
# Open a shell
# For debugging purposes, mostly
docker run --rm --name 'pg_flo' --network 'host' --entrypoint 'sh' -ti 'shayonj/pg_flo'
# Start the replicator
# Using the config file failed for some reason at the time of writing
docker run --rm --name 'replicator' --network 'host' 'shayonj/pg_flo' \
replicator \
--host 'source-db.fqdn' --dbname 'sales' --user 'pgflo' --password '1q2w3e4r' \
--group 'whatevah' --nats-url 'nats://localhost:4222'
# Start the worker
docker run --rm --name 'pg_flo_worker' --network 'host' 'shayonj/pg_flo' \
worker stdout --group 'whatevah' --nats-url 'nats://localhost:4222'
docker run … \
worker postgres --group 'whatevah' --nats-url 'nats://localhost:4222' \
--target-host 'dest-db.fqdn' --target-dbname 'sales' --target-user 'pgflo' --target-password '1q2w3e4r'
```
</details>
<details>
<summary>Real world use cases</summary>
```sh
# Start a basic replication to stdout as example.
docker run --rm -d --name 'pg_flo_nats' -p '4222:4222' 'nats' -js \
&& docker run -d --name 'pg_flo_replicator' --network 'host' 'shayonj/pg_flo' \
replicator \
--host 'source-db.fqdn' --port '6001' --dbname 'sales' --user 'pgflo' --password '1q2w3e4r' \
--copy-and-stream --group 'whatevah' --nats-url 'nats://localhost:4222' \
&& docker run -d --name 'pg_flo_worker' --network 'host' 'shayonj/pg_flo' \
worker stdout --group 'whatevah' --nats-url 'nats://localhost:4222'
```
</details>
## How this works
Refer [How it Works].
1. The _replicator_ creates a PostgreSQL **publication** in the source DB for the replicated tables.
1. The _replicator_ creates a **replication slot** in the source DB.<br/>
This ensures no data is lost between streaming sessions. Both the publication and the slot can be inspected with the catalog queries shown after this list.
1. The _replicator_ starts streaming changes from the source DB and publishes them to NATS:
- **After** performing an initial bulk copy, if in _Copy-and-Stream_ mode.
<details style="margin-top: -1em; padding: 0 0 1em 0">
If no valid LSN is found in NATS, `pg_flo` performs an initial bulk copy of existing data.
The process is parallelized for fast data sync:
1. A snapshot is taken to ensure consistency.
1. Each table is divided into page ranges.
1. Multiple workers copy different ranges concurrently.
</details>
- **Immediately**, from the last known position, if in _Stream-Only_ mode.
It also stores the last processed LSN in NATS, allowing the _worker_ to resume operations from where it left off in
case of interruptions.
1. The _worker_ processes messages from NATS.
<details style="margin-top: -1em; padding: 0 0 1em 0">
| Message type | Summary |
| ------------------------------------------------ | ------------------------------------ |
| Relation | Allow understanding table structures |
| `Insert`, `Update`, `Delete` | Contain actual data changes |
| `Begin`, `Commit` | Enable transaction boundaries |
| DDL changes (e.g. `ALTER TABLE`, `CREATE INDEX`) | Contain actual structural changes |
</details>
1. The _worker_ converts received data into a structured format with type-aware conversions for different PostgreSQL
data types.
1. \[If any rule is configured] The _worker_ applies transformation and filtering rules to the data.
<details style="margin-top: -1em; padding: 0 0 1em 0">
Transform Rules:
- Regex: apply regular expression transformations to string values.
- Mask: hide sensitive data, keeping the first and last characters visible.
Filter Rules:
- Comparison: filter based on equality, inequality, greater than, less than, etc.
- Contains: filter string values based on whether they contain a specific substring.
Rules _can_ be applied selectively to `insert`, `update`, or `delete` operations.
</details>
1. The _worker_ buffers processed data.
1. The _worker_ flushes data periodically from the buffer to the configured _sinks_.<br/>
Currently, _sinks_ can be `stdout`, files, PostgreSQL DBs or webhooks.<br/>
Flushed data is written to DB sinks in batches to optimize write operations.
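As referenced above, the publication and replication slot from steps 1 and 2 can be inspected on the source DB with
standard catalog views (not pg_flo-specific; object names depend on the configured group):
```sql
SELECT pubname FROM pg_publication;
SELECT slot_name, plugin, confirmed_flush_lsn FROM pg_replication_slots;
```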
### State Management
The _replicator_ keeps track of its progress by updating the _Last LSN_ in NATS.
The _worker_ maintains its progress to ensure data consistency.<br/>
This allows for resumable operations across multiple runs.
Periodic status updates are sent to the source DB to keep the replication connection alive.
## Further readings
- [PostgreSQL]
- [Website]
- [Main repository]
- [Transformation rules]
- [NATS]
### Sources
- [How to set the wal_level in AWS RDS Postgresql?]
- [Configuration file reference]
- [How it Works]
<!--
Reference
═╬═Time══
-->
<!-- In-article sections -->
<!-- Knowledge base -->
[NATS]: ../nats.md
[PostgreSQL]: README.md
<!-- Files -->
<!-- Upstream -->
[configuration file reference]: https://github.com/shayonj/pg_flo/blob/main/internal/pg-flo.yaml
[how it works]: https://github.com/shayonj/pg_flo/blob/main/internal/how-it-works.md
[main repository]: https://github.com/shayonj/pg_flo
[transformation rules]: https://github.com/shayonj/pg_flo/blob/main/pkg/rules/README.md
[website]: https://www.pgflo.io/
<!-- Others -->
[How to set the wal_level in AWS RDS Postgresql?]: https://dba.stackexchange.com/questions/238686/how-to-set-the-wal-level-in-aws-rds-postgresql#243576


@@ -0,0 +1,72 @@
# pg_restore
> [!caution]
> TODO
Intro
<!-- Remove this line to uncomment if used
## Table of contents <!-- omit in toc -->
1. [TL;DR](#tldr)
1. [Further readings](#further-readings)
1. [Sources](#sources)
## TL;DR
<!-- Uncomment if used
<details>
<summary>Setup</summary>
```sh
```
</details>
-->
<!-- Uncomment if used
<details>
<summary>Usage</summary>
```sh
```
</details>
-->
<!-- Uncomment if used
<details>
<summary>Real world use cases</summary>
```sh
```
</details>
-->
## Further readings
- [PostgreSQL]
- [pg_dump]
- [pg_dumpall]
### Sources
- [Documentation]
<!--
Reference
═╬═Time══
-->
<!-- In-article sections -->
<!-- Knowledge base -->
[pg_dump]: pg_dump.md
[pg_dumpall]: pg_dumpall.md
[PostgreSQL]: README.md
<!-- Files -->
<!-- Upstream -->
[Documentation]: https://www.postgresql.org/docs/current/app-pgrestore.html
<!-- Others -->


@@ -0,0 +1,66 @@
# pgAdmin
Browser-based management tool for [PostgreSQL].
1. [TL;DR](#tldr)
1. [Further readings](#further-readings)
1. [Sources](#sources)
## TL;DR
<details>
<summary>Setup</summary>
```sh
brew install --cask 'pgadmin4'
docker pull 'dpage/pgadmin4'
```
</details>
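When run as a container, the initial login is configured via environment variables (a sketch; email and password are
made up):
```sh
docker run -d --name 'pgadmin' -p '8080:80' \
  -e 'PGADMIN_DEFAULT_EMAIL=admin@example.org' \
  -e 'PGADMIN_DEFAULT_PASSWORD=changeMe' \
  'dpage/pgadmin4'
```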
<!-- Uncomment if used
<details>
<summary>Usage</summary>
```sh
```
</details>
-->
<!-- Uncomment if used
<details>
<summary>Real world use cases</summary>
```sh
```
</details>
-->
## Further readings
- [PostgreSQL]
- [Website]
- [Codebase]
### Sources
- [Documentation]
<!--
Reference
═╬═Time══
-->
<!-- In-article sections -->
<!-- Knowledge base -->
[PostgreSQL]: README.md
<!-- Files -->
<!-- Upstream -->
[codebase]: https://github.com/pgadmin-org/pgadmin4
[documentation]: https://www.pgadmin.org/docs/
[website]: https://www.pgadmin.org/
<!-- Others -->


@@ -0,0 +1,47 @@
# pganalyze Collector
Periodically queries the configured databases, and sends metrics and metadata (as _snapshots_) to the pganalyze app.
1. [TL;DR](#tldr)
1. [Further readings](#further-readings)
1. [Sources](#sources)
## TL;DR
<details>
<summary>Setup</summary>
```sh
# Generic installation via magic script.
curl 'https://packages.pganalyze.com/collector-install.sh' | bash
# Change the configuration file and reload the collector's settings.
vim '/etc/pganalyze-collector.conf' && pganalyze-collector --test --reload \
&& systemctl status 'pganalyze-collector' && journalctl -xefu 'pganalyze-collector'
```
</details>
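The configuration file follows an INI-like convention; a minimal sketch assuming a single local server (keys as per the
collector's documentation, values made up):
```ini
[pganalyze]
api_key = your-organization-api-key

[server1]
db_host = localhost
db_port = 5432
db_name = postgres
db_username = pganalyze
db_password = secret
```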
## Further readings
- [PostgreSQL]
- [Codebase]
- [Documentation]
### Sources
<!--
Reference
═╬═Time══
-->
<!-- In-article sections -->
<!-- Knowledge base -->
[PostgreSQL]: README.md
<!-- Files -->
<!-- Upstream -->
[Codebase]: https://github.com/pganalyze/collector
[Documentation]: https://pganalyze.com/docs/collector/
<!-- Others -->


@@ -0,0 +1,179 @@
# pgBackRest
Reliable backup and restore solution for PostgreSQL.
1. [TL;DR](#tldr)
1. [Further readings](#further-readings)
1. [Sources](#sources)
## TL;DR
> [!caution]
> pgBackRest performs **physical** backups, and it requires **file-level** access to the data directory (`$PGDATA`) of
> the target PostgreSQL server.<br/>
> Use `pg_dump` or `pg_dumpall` to create **logical** backups instead.
Prefer installing pgBackRest from a package, instead of building from source.
Configuration files follow a Windows INI-like convention.<br/>
pgBackRest tries to load the configuration file from `/etc/pgbackrest/pgbackrest.conf` first. If no file exists in that
location, it checks `/etc/pgbackrest.conf`.
One can load multiple configuration files by using the `--config` option multiple times, or by specifying the
`--config-include-path` option to include a directory with multiple `.conf` files.<br/>
Each given file must exist, and be valid individually.<br/>
Multiple loaded files are **concatenated** as if they were one big file.
_Stanzas_ define the backup configuration for specific PostgreSQL database clusters.<br/>
They configure where the clusters are located, how they will be backed up, archiving options, and so on.
Each stanza must define the cluster's path, and its host and user if the cluster is remote.<br/>
Stanza-specific settings override any global configuration.
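A minimal sketch of such a configuration (paths, retention, and the stanza name are made up):
```ini
[global]
repo1-path=/var/lib/pgbackrest
repo1-retention-full=2

[prod-app]
pg1-path=/var/lib/pgsql/data
```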
> [!tip]
> Prefer using names that describe the databases contained in the cluster.<br/>
> Stanza names will be used for the clusters' primary and all replicas, so it would be more appropriate to choose a name
> that somehow describes the actual _function_ of the cluster, rather than the local cluster name.
pgBackRest needs to know where the base data directory for any configured PostgreSQL cluster is located.<br/>
Make sure that `pg-path` is exactly equal to `data_directory` as reported by PostgreSQL.
pgBackRest stores backups and archives' WAL segments in _repositories_.<br/>
Repositories support SFTP and object stores like S3, Azure Blob Storage, and Google Cloud Storage.
Backing up a _running_ PostgreSQL cluster requires WAL archiving to be enabled.
> [!note]
> At least one WAL segment will be created during the backup process even if no explicit writes are made to the cluster.
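Archiving is typically pointed at pgBackRest in `postgresql.conf` (a sketch; the stanza name is made up):
```ini
archive_mode = on
archive_command = 'pgbackrest --stanza=prod-app archive-push %p'
```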
pgBackRest _can_ take _most_ of the backup data from a standby instead of the primary, but both the primary and standby
databases are required to perform the backup.<br/>
Standby backups are identical to backups performed on the primary. This is achieved by starting and stopping the backup
on the primary, copying only files that are replicated from the standby, and finally copying the remaining few files
from the primary.<br/>
In this type of backup, logs and statistics from the primary database **will** be included in the backup.
When performing backups, pgBackRest copies files depending on the backup mode.<br/>
By default, it will attempt to perform an _incremental_ backup. Should no _full_ backup exist yet, pgBackRest will
create a full backup instead.
_Full_ backups save the **entire** contents of the database cluster. They do **not** depend on any other files for
consistency.<br/>
The first backup of the database cluster is always a full backup. Force full backups by running the backup command with
the `--type=full` option.<br/>
pgBackRest can always restore full backups directly.
_Differential_ backups save only those database cluster's files that have changed since the last **full** backup.<br/>
Differential backups require less disk space than a full backup, but require the full backup they depend on to be both
available and valid when restoring.<br/>
pgBackRest restores differential backups by copying the files in the chosen differential backup, plus the appropriate
**unchanged** files from the previous full backup.
_Incremental_ backups save only those database cluster's files that have changed since the last backup.<br/>
That last backup can be another incremental backup, a differential backup, or a full backup.<br/>
Incremental backups are generally much smaller than both full and differential backups, but require **all** the backups
they depend on, **and** these dependencies' own dependencies, to be both available and valid when restoring.
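Given those definitions, the backup type is selected explicitly with the `--type` option (stanza name made up):
```sh
pgbackrest --stanza='prod-app' backup --type='full'
pgbackrest --stanza='prod-app' backup --type='diff'
pgbackrest --stanza='prod-app' backup --type='incr'
```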
pgBackRest expires backups based on retention options. It will also retain archived WALs by default for backups that
have not expired yet.
Backups can be encrypted.<br/>
Encryption is always performed **client-side** even if the repository type supports encryption.
When multiple repositories are configured, pgBackRest will back up to the **highest**-priority repository, unless
otherwise specified by the `--repo` option.
During online backups, pgBackRest waits for those WAL segments that are required for backup consistency to be
archived.<br/>
This wait time is governed by the `archive-timeout` option, which defaults to 60 seconds.
By default, pgBackRest will wait for the next regularly scheduled checkpoint before starting a backup.<br/>
Depending on the `checkpoint_timeout` and `checkpoint_segments` settings in PostgreSQL, it may be quite some time before
a checkpoint completes and the backup can begin. Generally, it is best to set `start-fast=y` to start the backup
immediately.
> [!note]
> Setting `start-fast=y` forces a checkpoint.<br/>
> An additional checkpoint should not have a noticeable impact on performance, but on busy production clusters it might
> still be best to enable the option only when needed.
pgBackRest does not come with a built-in scheduler. Run it from `cron` or some other scheduling mechanism.
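A sketch of a crontab running a weekly full backup plus daily differentials (times and stanza name made up):
```sh
# m  h  dom mon dow  command
30 06 * * 0    pgbackrest --type=full --stanza=prod-app backup
30 06 * * 1-6  pgbackrest --type=diff --stanza=prod-app backup
```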
The `restore` command selects by default the **latest** backup available in the **first** repository that contains any
backup.
Replication slots are **not** restored, as per PostgreSQL's recommendation.
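A sketch of restore invocations leveraging that behavior (stanza, backup label, and repository number are made up):
```sh
# Restore the latest backup from the first repository containing one.
pgbackrest --stanza='prod-app' restore
# Restore a specific backup set from a specific repository.
pgbackrest --stanza='prod-app' restore --set='20250829-095152F' --repo=2
```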
<details>
<summary>Setup</summary>
```sh
# Install
brew install 'pgbackrest'
# Validate the configuration
# Give configuration files as options using their *absolute* path, or use '--config-path'
pgbackrest check
pgbackrest check --config-path "$PWD"
pgbackrest --config-include-path '/opt/homebrew/etc/pgbackrest' check
pgbackrest --config "$PWD/pgBackRest.conf" --log-level-console 'debug' check
```
</details>
<details>
<summary>Usage</summary>
```sh
# Get help
pgbackrest help
pgbackrest --help
# Show logs on the CLI
pgbackrest … --log-level-console='info'
pgbackrest … --log-level-console 'debug'
# Create stanzas
pgbackrest … --stanza 'prod-app' stanza-create
```
</details>
<!-- Uncomment if used
<details>
<summary>Real world use cases</summary>
```sh
```
</details>
-->
## Further readings
- [PostgreSQL]
- [Website]
- [Codebase]
### Sources
- [User guide][user guide rhel]
- [Configuration]
<!--
Reference
═╬═Time══
-->
<!-- In-article sections -->
<!-- Knowledge base -->
[PostgreSQL]: README.md
<!-- Files -->
<!-- Upstream -->
[Codebase]: https://github.com/pgbackrest/pgbackrest
[Configuration]: https://pgbackrest.org/configuration.html
[User guide rhel]: https://pgbackrest.org/user-guide-rhel.html
[Website]: https://pgbackrest.org/
<!-- Others -->