chore(dblab-engine): expand notes

This commit is contained in:
Michele Cereda
2025-08-18 23:31:39 +02:00
parent 9089c3c981
commit f72db3ce28
9 changed files with 243 additions and 103 deletions

View File

@@ -30,8 +30,10 @@
1. [Resource tagging](#resource-tagging) 1. [Resource tagging](#resource-tagging)
1. [API](#api) 1. [API](#api)
1. [Python](#python) 1. [Python](#python)
1. [Container images](#container-images)
1. [Amazon Linux](#amazon-linux)
1. [Further readings](#further-readings) 1. [Further readings](#further-readings)
1. [Sources](#sources) 1. [Sources](#sources)
## TL;DR ## TL;DR
@@ -868,6 +870,21 @@ machine if not.
</details> </details>
## Container images
### Amazon Linux
Refer to [Pulling the Amazon Linux container image].
Amazon Linux container images are **infamous** for having issues when connecting to their package repositories from
**outside** of AWS' network.<br/>
While containers may connect to them _sometimes™_ when running locally, one gets far more consistent results by just
running them from **inside** AWS.
When running the container locally, disconnect from the VPN, start the container, and reconnect to the VPN before
installing packages.<br/>
If one can, prefer just building the image from an EC2 instance.
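As a sanity check, one can try pulling the image and installing a package in it. A minimal sketch, assuming Docker is
available locally; the ECR Public path below is the upstream one (the image is also on Docker Hub as `amazonlinux`):

```sh
# Pull the Amazon Linux 2023 container image.
docker pull 'public.ecr.aws/amazonlinux/amazonlinux:2023'

# Try installing a package.
# Expect this step to hang or fail when running from outside AWS' network.
docker run --rm 'public.ecr.aws/amazonlinux/amazonlinux:2023' dnf install -y 'jq'
```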
## Further readings ## Further readings
- [Learn AWS] - [Learn AWS]
@@ -1001,11 +1018,13 @@ machine if not.
[what is amazon vpc?]: https://docs.aws.amazon.com/vpc/latest/userguide/what-is-amazon-vpc.html [what is amazon vpc?]: https://docs.aws.amazon.com/vpc/latest/userguide/what-is-amazon-vpc.html
[what is aws config?]: https://docs.aws.amazon.com/config/latest/developerguide/WhatIsConfig.html [what is aws config?]: https://docs.aws.amazon.com/config/latest/developerguide/WhatIsConfig.html
[what is aws global accelerator?]: https://docs.aws.amazon.com/global-accelerator/latest/dg/what-is-global-accelerator.html [what is aws global accelerator?]: https://docs.aws.amazon.com/global-accelerator/latest/dg/what-is-global-accelerator.html
[Pulling the Amazon Linux container image]: https://docs.aws.amazon.com/AmazonECR/latest/userguide/amazon_linux_container_image.html
<!-- Others --> <!-- Others -->
[a guide to tagging resources in aws]: https://medium.com/@staxmarketing/a-guide-to-tagging-resources-in-aws-8f4311afeb46 [a guide to tagging resources in aws]: https://medium.com/@staxmarketing/a-guide-to-tagging-resources-in-aws-8f4311afeb46
[automating dns-challenge based letsencrypt certificates with aws route 53]: https://johnrix.medium.com/automating-dns-challenge-based-letsencrypt-certificates-with-aws-route-53-8ba799dd207b [automating dns-challenge based letsencrypt certificates with aws route 53]: https://johnrix.medium.com/automating-dns-challenge-based-letsencrypt-certificates-with-aws-route-53-8ba799dd207b
[aws config tutorial by stephane maarek]: https://www.youtube.com/watch?v=qHdFoYSrUvk [aws config tutorial by stephane maarek]: https://www.youtube.com/watch?v=qHdFoYSrUvk
[AWS Fundamentals Blog]: https://awsfundamentals.com/blog
[aws savings plans vs. reserved instances: when to use each]: https://www.cloudzero.com/blog/savings-plans-vs-reserved-instances/ [aws savings plans vs. reserved instances: when to use each]: https://www.cloudzero.com/blog/savings-plans-vs-reserved-instances/
[date & time policy conditions at aws - 1-minute iam lesson]: https://www.youtube.com/watch?v=4wpKP1HLEXg [date & time policy conditions at aws - 1-minute iam lesson]: https://www.youtube.com/watch?v=4wpKP1HLEXg
[difference in boto3 between resource, client, and session?]: https://stackoverflow.com/questions/42809096/difference-in-boto3-between-resource-client-and-session [difference in boto3 between resource, client, and session?]: https://stackoverflow.com/questions/42809096/difference-in-boto3-between-resource-client-and-session
@@ -1017,4 +1036,3 @@ machine if not.
[using aws kms via the cli with a symmetric key]: https://nsmith.net/aws-kms-cli [using aws kms via the cli with a symmetric key]: https://nsmith.net/aws-kms-cli
[VPC Endpoints: Secure and Direct Access to AWS Services]: https://awsfundamentals.com/blog/vpc-endpoints [VPC Endpoints: Secure and Direct Access to AWS Services]: https://awsfundamentals.com/blog/vpc-endpoints
[What Is OIDC and Why Do We Need It?]: https://awsfundamentals.com/blog/oidc-introduction [What Is OIDC and Why Do We Need It?]: https://awsfundamentals.com/blog/oidc-introduction
[AWS Fundamentals Blog]: https://awsfundamentals.com/blog

View File

@@ -1,93 +0,0 @@
# Database Lab
Database Lab Engine is an open-source platform developed by Postgres.ai to create instant, full-size clones of
production databases.<br/>
Use cases of the clones are to test database migrations, optimize SQL, or deploy full-size staging apps.
The website <https://Postgres.ai/> hosts the SaaS version of the Database Lab Engine.
Configuration file examples are available at <https://gitlab.com/postgres-ai/database-lab/-/tree/v3.0.0/configs>.
1. [Engine](#engine)
1. [Clones](#clones)
1. [Further readings](#further-readings)
1. [Sources](#sources)
## Engine
Config file in YAML format, at `~/.dblab/engine/configs/server.yml` by default.
Metadata files at `~/.dblab/engine/meta` by default. The metadata folder **must be writable**.
```sh
# Reload the configuration without downtime.
docker exec -it 'dblab_server' kill -SIGHUP 1
# Follow logs.
docker logs --since '1m' -f 'dblab_server'
docker logs --since '2024-05-01' -f 'dblab_server'
docker logs --since '2024-08-01T23:11:35' -f 'dblab_server'
```
Images for the _Standard_ and _Enterprise_ editions are available at
<https://gitlab.com/postgres-ai/se-images/container_registry/>.<br/>
Images for the _Community_ edition are available at <https://gitlab.com/postgres-ai/custom-images>.
## Clones
Database clones come in two flavours:
- _Thick_ cloning: the regular way to copy data.<br/>
It is also how data is copied to Database Lab the first time a source is added.
Thick clones can be:
- _Logical_: do a regular dump and restore using `pg_dump` and `pg_restore`.
- _Physical_: done using `pg_basebackup` or restoring data from physical archives created by backup tools such as
WAL-E/WAL-G, Barman, pgBackRest, or pg_probackup.
> Managed PostgreSQL databases in cloud environments (e.g.: AWS RDS) support only the logical clone type.
The Engine supports continuous synchronization with the source databases.<br/>
Achieved by repeating the thick cloning method one initially used for the source.
- _Thin_ cloning: local containerized database clones based on CoW (Copy-on-Write) that spin up in a few seconds.<br/>
They share most of the data blocks, but logically they look fully independent.<br/>
The speed of thin cloning does **not** depend on the database size.
As of 2024-06, Database Lab Engine supports ZFS and LVM for thin cloning.<br/>
With ZFS, the Engine periodically creates a new snapshot of the data directory and maintains a set of snapshots. When
requesting a new clone, users choose which snapshot to use as base.
Clone DBs configuration starting point is at `~/.dblab/postgres_conf/postgresql.conf`.
## Further readings
- [Website]
- [Main repository]
- [Documentation]
- [`dblab`][dblab]
- [Installation guide for DBLab Community Edition][how to install dblab manually]
### Sources
- [Database Lab Engine configuration reference]
<!--
Reference
═╬═Time══
-->
<!-- In-article sections -->
<!-- Knowledge base -->
[dblab]: dblab.md
<!-- Files -->
<!-- Upstream -->
[database lab engine configuration reference]: https://postgres.ai/docs/reference-guides/database-lab-engine-configuration-reference
[documentation]: https://postgres.ai/docs/
[how to install dblab manually]: https://postgres.ai/docs/how-to-guides/administration/install-dle-manually
[main repository]: https://gitlab.com/postgres-ai/database-lab
[website]: https://postgres.ai/
<!-- Others -->

View File

@@ -0,0 +1,123 @@
# DBLab engine
Creates **instant**, **full-size** clones of PostgreSQL databases.<br/>
Mainly used to test database migrations, optimize SQL, or deploy full-size staging apps.
Can be self-hosted.<br/>
The [website] hosts the SaaS version.
1. [TL;DR](#tldr)
1. [Further readings](#further-readings)
1. [Sources](#sources)
## TL;DR
It leverages thin clones to provide full-sized database environments in seconds, regardless of the source database's
size.<br/>
It relies on copy-on-write (CoW) filesystem technologies (currently ZFS or LVM) to provide efficient storage and
provisioning for database clones.
Relies on Docker containers to isolate and run PostgreSQL instances for each clone.<br/>
Each clone gets its own network port.
The _Retrieval Service_ acquires data from source PostgreSQL databases and prepares it for cloning.<br/>
It supports:
- **Physical** retrieval, by using physical backup methods like `pg_basebackup`, WAL-G, or `pgBackRest` to copy the
entire `PGDATA` directory.
- **Logical** retrieval, by using logical dump and restore tools like `pg_dump` and `pg_restore` to copy database
objects and data.
> [!important]
> Managed PostgreSQL databases in cloud environments (e.g.: AWS RDS) support only logical synchronization.
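For such managed sources, logical synchronization boils down to a plain dump and restore. A minimal sketch, with
hostnames, ports, and database names below being placeholder assumptions:

```sh
# Dump the source database in directory format, parallelizing across 4 jobs.
pg_dump --host 'source.rds.example.org' --username 'postgres' --dbname 'app' \
  --format 'directory' --jobs '4' --file '/tmp/app.dump'

# Restore the dump into the Engine's PostgreSQL instance.
pg_restore --host 'localhost' --port '6000' --username 'postgres' --dbname 'app' \
  --jobs '4' '/tmp/app.dump'
```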
The _Pool Manager_ manages storage pools and filesystem operations.<br/>
It abstracts the underlying filesystem (ZFS or LVM) and provides a consistent interface for snapshot and clone
operations.<br/>
It supports different pools, each with its own **independent** configuration and filesystem manager.
The _Provisioner_ manages the resources needed to run database clones and handles their lifecycle.<br/>
It creates and manages PostgreSQL instances by allocating network ports to them from a pool, creating and managing the
containers they run on, mounting filesystem clones for them to use, and configuring them.
The _Cloning Service_ orchestrates the overall process of creating and managing database clones by coordinating the
Provisioner and Pool Manager to fulfill cloning requests from clients.
The _API Server_ exposes HTTP endpoints for interactions by providing RESTful APIs that allow creating and managing
clones, viewing snapshots, and monitoring the system's status.
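E.g., one could query the endpoints directly with `curl`. The port and token below are placeholder assumptions:

```sh
# Check the engine's status.
curl 'http://localhost:2345/status' -H 'Verification-Token: someToken'

# List the available snapshots.
curl 'http://localhost:2345/snapshots' -H 'Verification-Token: someToken'
```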
Database Lab Engine uses a YAML-based configuration file, which is loaded at startup and **can be reloaded at
runtime**.<br/>
It is located at `~/.dblab/engine/configs/server.yml` by default.
Metadata files are located at `~/.dblab/engine/meta` by default.<br/>
The metadata folder **must be writable**.
```sh
# Reload the configuration without downtime.
docker exec -it 'dblab_server' kill -SIGHUP 1
# Follow logs.
docker logs --since '1m' -f 'dblab_server'
docker logs --since '2024-05-01' -f 'dblab_server'
docker logs --since '2024-08-01T23:11:35' -f 'dblab_server'
```
Before DLE can create thin clones, it must first obtain a **full** copy of the source database.<br/>
The initial data retrieval process is also referred to as _thick cloning_, and is typically a one-time or a scheduled
operation.
Each clone runs in its own PostgreSQL container, and its configuration can be customized.<br/>
Clone DBs configuration starting point is at `~/.dblab/postgres_conf/postgresql.conf`.
Database clones come as _thick_ or _thin_ clones.
Thick clones work as a normal replica would, **continuously** synchronizing with their source database.
Thin clones:
1. Prompt the creation of a dedicated filesystem snapshot.
1. Spin up a local database container that mounts that snapshot as volume.
The creation speed of thin clones does **not** depend on the database's size.
When thin clones are involved, DLE **periodically** creates a new snapshot from the source database, and maintains a
set of them.<br/>
When requesting a new clone, users choose which snapshot to use as its base.
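A sketch of that flow using the `dblab` CLI; the snapshot ID, clone ID, and credentials below are placeholders:

```sh
# List the available snapshots.
dblab snapshot list

# Create a clone based on a specific snapshot.
dblab clone create --username 'dblab_user' --password 'secret' \
  --id 'some-clone' --snapshot-id 'snapshot_20240801'
```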
Container images for the _Community_ edition are available at <https://gitlab.com/postgres-ai/custom-images>.<br/>
Specialized images for only the _Standard_ and _Enterprise_ editions are available at
<https://gitlab.com/postgres-ai/se-images/container_registry/>.
## Further readings
- [Website]
- [Codebase]
- [Documentation]
- [`dblab`][dblab]
### Sources
- [DeepWiki][deepwiki postgres-ai/database-lab-engine]
- [Database Lab Engine configuration reference]
- [Installation guide for DBLab Community Edition][how to install dblab manually]
<!--
Reference
═╬═Time══
-->
<!-- In-article sections -->
<!-- Knowledge base -->
[dblab]: dblab.md
<!-- Files -->
<!-- Upstream -->
[database lab engine configuration reference]: https://postgres.ai/docs/reference-guides/database-lab-engine-configuration-reference
[Documentation]: https://postgres.ai/docs/
[how to install dblab manually]: https://postgres.ai/docs/how-to-guides/administration/install-dle-manually
[Codebase]: https://gitlab.com/postgres-ai/database-lab
[Website]: https://postgres.ai/
<!-- Others -->
[DeepWiki postgres-ai/database-lab-engine]: https://deepwiki.com/postgres-ai/database-lab-engine

View File

@@ -1,6 +1,6 @@
# `dblab` # `dblab`
Database Lab Engine client CLI. DBLab Engine's CLI client.
1. [TL;DR](#tldr) 1. [TL;DR](#tldr)
1. [Further readings](#further-readings) 1. [Further readings](#further-readings)
@@ -91,7 +91,7 @@ curl -X 'DELETE' 'https://dblab.company.com:1234/api/clone/smth' \
## Further readings ## Further readings
- [Database Lab] - [DBLab engine]
- [Database Lab Client CLI reference (dblab)] - [Database Lab Client CLI reference (dblab)]
- [API reference] - [API reference]
@@ -107,11 +107,11 @@ curl -X 'DELETE' 'https://dblab.company.com:1234/api/clone/smth' \
<!-- In-article sections --> <!-- In-article sections -->
<!-- Knowledge base --> <!-- Knowledge base -->
[database lab]: database%20lab.md [DBLab engine]: dblab%20engine.md
<!-- Files --> <!-- Files -->
<!-- Upstream --> <!-- Upstream -->
[api reference]: https://dblab.readme.io/reference/ [API reference]: https://dblab.readme.io/reference/
[database lab client cli reference (dblab)]: https://postgres.ai/docs/reference-guides/dblab-client-cli-reference [database lab client cli reference (dblab)]: https://postgres.ai/docs/reference-guides/dblab-client-cli-reference
[how to install and initialize database lab cli]: https://postgres.ai/docs/how-to-guides/cli/cli-install-init [how to install and initialize database lab cli]: https://postgres.ai/docs/how-to-guides/cli/cli-install-init
[How to refresh data when working in the "logical" mode]: https://postgres.ai/docs/how-to-guides/administration/logical-full-refresh [How to refresh data when working in the "logical" mode]: https://postgres.ai/docs/how-to-guides/administration/logical-full-refresh

View File

@@ -805,7 +805,7 @@
| reject('match', '^CREATE ROLE ' + master_username) | reject('match', '^CREATE ROLE ' + master_username)
| reject('match', '.*rdsadmin.*') | reject('match', '.*rdsadmin.*')
| reject('match', '^(CREATE|ALTER) ROLE rds_') | reject('match', '^(CREATE|ALTER) ROLE rds_')
| map('regex_replace', '(NO)(SUPERUSER|REPLICATION)\s?', '') | map('regex_replace', '(\s+(NO)?(SUPERUSER|REPLICATION))?', '')
}} }}
- name: Wait for pending changes to be applied - name: Wait for pending changes to be applied
amazon.aws.rds_instance_info: amazon.aws.rds_instance_info:

View File

@@ -256,7 +256,7 @@
| reject('match', '^CREATE ROLE ' + master_username) | reject('match', '^CREATE ROLE ' + master_username)
| reject('match', '.*rdsadmin.*') | reject('match', '.*rdsadmin.*')
| reject('match', '^(CREATE|ALTER) ROLE rds_') | reject('match', '^(CREATE|ALTER) ROLE rds_')
| map('regex_replace', '(NO)(SUPERUSER|REPLICATION)\s?', '') | map('regex_replace', '(\s+(NO)?(SUPERUSER|REPLICATION))?', '')
}} }}
- name: Manipulate numbers - name: Manipulate numbers

View File

@@ -4,9 +4,11 @@
dblab --url 'http://dblab.example.org:1234/' --token "$(gopass show -o 'dblab')" dblab --url 'http://dblab.example.org:1234/' --token "$(gopass show -o 'dblab')"
# Check logs # Check logs
# Only available from the server hosting the engine
docker logs --since '5m' -f 'dblab_server' docker logs --since '5m' -f 'dblab_server'
# Reload the configuration # Reload the configuration
# Only available from the server hosting the engine
docker exec -it 'dblab_server' kill -SIGHUP '1' docker exec -it 'dblab_server' kill -SIGHUP '1'
# Check the running container's version # Check the running container's version
@@ -83,7 +85,7 @@ curl 'https://dblab.example.org:1234/clone/some-clone' -H "Verification-Token: $
curl 'https://dblab.example.org:1234/api/clone/some-clone' -H "Verification-Token: $(gopass show -o 'dblab')" curl 'https://dblab.example.org:1234/api/clone/some-clone' -H "Verification-Token: $(gopass show -o 'dblab')"
# Restart clones # Restart clones
# Only doable from the instance # Only available from the server hosting the engine
docker restart 'dblab_clone_6000' docker restart 'dblab_clone_6000'
# Reset clones # Reset clones
@@ -111,7 +113,14 @@ curl -X 'PATCH' 'https://dblab.example.org:1234/api/clone/some-clone' \
# Delete clones # Delete clones
dblab clone destroy 'some-clone' dblab clone destroy 'some-clone'
curl -X 'DELETE' 'https://dblab.example.org:1234/api/clone/some-clone' -H "Verification-Token: $(gopass show -o 'dblab')" curl -X 'DELETE' 'https://dblab.example.org:1234/api/clone/some-clone' \
-H "Verification-Token: $(gopass show -o 'dblab')"
# Get admin config in YAML format # Get admin config in YAML format
curl 'https://dblab.example.org:1234/api/admin/config.yaml' -H "Verification-Token: $(gopass show -o 'dblab')" curl 'https://dblab.example.org:1234/api/admin/config.yaml' -H "Verification-Token: $(gopass show -o 'dblab')"
# Display the engine's status
dblab instance status
# Display the engine's version
dblab instance version

View File

@@ -56,6 +56,10 @@ ALTER DATABASE reviser SET pgaudit.log TO none;
\c sales \c sales
\connect vendor \connect vendor
-- Get databases' size
SELECT pg_database_size('postgres');
SELECT pg_size_pretty(pg_database_size('postgres'));
-- List schemas -- List schemas
\dn \dn
@@ -91,6 +95,10 @@ CREATE TABLE people (
\d+ clients \d+ clients
SELECT column_name, data_type, character_maximum_length FROM information_schema.columns WHERE table_name = 'vendors'; SELECT column_name, data_type, character_maximum_length FROM information_schema.columns WHERE table_name = 'vendors';
-- Get tables' size
SELECT pg_relation_size('vendors');
SELECT pg_size_pretty(pg_relation_size('vendors'));
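-- Get tables' total size, indexes and TOAST data included
SELECT pg_size_pretty(pg_total_relation_size('vendors'));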
-- Insert data -- Insert data
INSERT INTO people(id, first_name, last_name, phone) INSERT INTO people(id, first_name, last_name, phone)

View File

@@ -1,5 +1,63 @@
#!/usr/bin/env fish #!/usr/bin/env fish
###
# Pools
# --------------------------------------
###
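# Create pools
# The device paths below are placeholders
zpool create 'tank' mirror '/dev/sda' '/dev/sdb'
zpool create 'vault' raidz '/dev/sdc' '/dev/sdd' '/dev/sde'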
# List available pools
zpool list
# Show pools' I/O statistics
zpool iostat
# Show pools' configuration and status
zpool status
# List all pools available for import
zpool import
# Import pools
zpool import -a
zpool import -d '/dev/disk/by-id'
zpool import 'vault'
zpool import 'tank' -N
zpool import 'encrypted_pool_name' -l
# Get pools' properties
zpool get all 'vault'
# Set pools' properties
zpool set 'compression=lz4' 'tank'
# Get info about pools' features
man zpool-features
# Show the history of all pool's operations
zpool history 'tank'
# Check pools for errors
# Very CPU *and* disk intensive
zpool scrub 'tank'
# Export pools
# Unmounts *all* filesystems in any given pool
zpool export 'vault'
zpool export -f 'vault'
# Destroy pools
zpool destroy 'tank'
# Restore destroyed pools
# Pools can only be reimported right after the destroy command has been issued
zpool import -D
# Check pool configuration
zdb -C 'vault'
# Display the predicted effect of enabling deduplication
zdb -S 'rpool'
### ###
# File Systems # File Systems
# -------------------------------------- # --------------------------------------
@@ -9,6 +67,23 @@
# List available datasets # List available datasets
zfs list zfs list
# Automatically mount filesystems
# Find a dataset's mountpoint's root path via `zfs get mountpoint 'pool_name'`
zfs mount -alv
# Unmount datasets
zfs unmount 'tank/media'
# Create filesystems and volumes
zfs create 'tank/docs'
zfs create -V '1gb' 'vault/good_memories'
# List snapshots # List snapshots
zfs list -t 'all' zfs list -t 'all'
zfs list -t 'snapshot,volume,bookmark' zfs list -t 'snapshot,volume,bookmark'
# Create snapshots
zfs snapshot 'vault/good_memories@2024-12-31'
# Check that key properties are set as expected
zfs get -r checksum,compression,readonly,canmount 'tank'
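
# Roll datasets back to a snapshot
# Discards *all* changes made after the snapshot was taken
zfs rollback 'vault/good_memories@2024-12-31'

# Create writable clones from snapshots
# This is the CoW mechanism thin database clones build upon
zfs clone 'vault/good_memories@2024-12-31' 'vault/memories_copy'

# Destroy snapshots
zfs destroy 'vault/good_memories@2024-12-31'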