DBLab engine
Creates instant, full-size clones of PostgreSQL databases.
Mainly used to test database migrations, optimize SQL, or deploy full-size staging apps.
Can be self-hosted.
The website hosts the SaaS version.
TL;DR
It leverages thin clones to provide full-sized database environments in seconds, regardless of the source database's
size.
It relies on copy-on-write (CoW) filesystem technologies (currently ZFS or LVM) to provide efficient storage and
provisioning for database clones.
Relies on Docker containers to isolate and run PostgreSQL instances for each clone.
Each clone gets its own network port.
The Retrieval Service acquires data from source PostgreSQL databases and prepares it for cloning.
It supports:
- Physical retrieval, using physical backup methods like `pg_basebackup`, WAL-G, or `pgBackRest` to copy the entire `PGDATA` directory.
- Logical retrieval, using logical dump and restore tools like `pg_dump` and `pg_restore` to copy database objects and data.
Important
Managed PostgreSQL databases in cloud environments (e.g., AWS RDS) support only logical synchronization.
The Pool Manager manages storage pools and filesystem operations.
It abstracts the underlying filesystem (ZFS or LVM) and provides a consistent interface for snapshot and clone
operations.
It supports different pools, each with its own independent configuration and filesystem manager.
The Provisioner manages the resources needed to run database clones and handles their lifecycle.
It creates and manages PostgreSQL instances by allocating network ports to them from a pool, creating and managing the
containers they run in, mounting filesystem clones for them to use, and configuring them.
The Cloning Service orchestrates the overall process of creating and managing database clones by coordinating the Provisioner and Pool Manager to fulfill cloning requests from clients.
The API Server exposes HTTP endpoints for interactions by providing RESTful APIs that allow creating and managing clones, viewing snapshots, and monitoring the system's status.
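For orientation, a minimal sketch of talking to the API with `curl`, assuming the Engine listens on the default port 2345; the `Verification-Token` header and the `/status` and `/snapshots` endpoints follow the DBLab API reference, though paths may vary between versions and proxy setups:

```sh
# DBLAB_VERIFICATION_TOKEN is assumed to hold the configured verificationToken.
curl --header "Verification-Token: ${DBLAB_VERIFICATION_TOKEN}" \
  --url 'http://127.0.0.1:2345/status'
curl --header "Verification-Token: ${DBLAB_VERIFICATION_TOKEN}" \
  --url 'http://127.0.0.1:2345/snapshots'
```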
Database Lab Engine uses a YAML-based configuration file, which is loaded at startup and can be reloaded at
runtime.
It is located at ~/.dblab/engine/configs/server.yml by default.
Metadata files are located at ~/.dblab/engine/meta by default.
The metadata folder must be writable.
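For orientation, a minimal sketch of the configuration file's shape, limited to options referenced on this page; the values and the image name are made-up placeholders, and the `config.example.*.yml` files in the Database Lab repository remain the authoritative reference:

```sh
# Print a made-up sketch of server.yml; keys are the ones referenced on this
# page, values (including the image name) are placeholders.
cat <<'EOF'
server:
  verificationToken: 'change-me'  # authorizes API requests
databaseContainer:
  dockerImage: 'postgresai/extended-postgres:17'  # match the source's major version
EOF
```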
```sh
# Reload the configuration without downtime.
docker exec -it 'dblab_server' kill -SIGHUP 1

# Follow logs.
docker logs --since '1m' -f 'dblab_server'
docker logs --since '2024-05-01' -f 'dblab_server'
docker logs --since '2024-08-01T23:11:35' -f 'dblab_server'
```
Before DLE can create thin clones, it must first obtain a full copy of the source database.
The initial data retrieval process is also referred to as thick cloning, and is typically a one-time or scheduled
operation.
Each clone runs in its own PostgreSQL container, and its configuration can be customized.
The starting point for the clones' PostgreSQL configuration is ~/.dblab/postgres_conf/postgresql.conf.
Database clones come as thick or thin clones.
Thick clones work as a normal replica would, continuously synchronizing with their source database.
Thin clones:
- Prompt the creation of a dedicated filesystem snapshot.
- Spin up a local database container that mounts that snapshot as a volume.
The creation speed of thin clones does not depend on the database's size.
When thin clones are involved, DLE periodically creates new snapshots from the source database and maintains a set of
them.
When requesting a new clone, users choose which snapshot to use as its base.
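As an illustration, creating a clone based on a chosen snapshot with the `dblab` CLI (linked in the further readings) could look like the following sketch; the exact flags may differ between CLI versions, and the snapshot ID and credentials are made-up examples:

```sh
# List the available snapshots, then base a new clone on one of them.
dblab snapshot list
dblab clone create \
  --username 'dblab_user_1' --password 'secret' \
  --id 'my_clone' \
  --snapshot-id 'snapshot_20250924055533'
```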
Container images for the Community edition are available at https://gitlab.com/postgres-ai/custom-images.
Specialized images for the Standard and Enterprise editions only are available at
https://gitlab.com/postgres-ai/se-images/container_registry/.
Setup
Refer to How to install DBLab manually.
Tip
Prefer using PostgresAI Console or AWS Marketplace when installing DBLab in Standard or Enterprise Edition.
Requirements:

- Docker Engine must be installed, and usable by the user running DBLab.
- One or more extra disks or partitions to dedicate to DBLab Engine's data.
Tip
Prefer dedicating extra disks to the data for better performance.
The Engine can use multiple ZFS pools (or LVM volumes) to automatically refresh data in full without downtime.

```sh
$ sudo lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
...
nvme0n1     259:0    0    8G  0 disk
└─nvme0n1p1 259:1    0    8G  0 part /
nvme1n1     259:2    0  777G  0 disk

$ export DBLAB_DISK='/dev/nvme1n1'
```
Procedure:
- Configure the storage to enable thin cloning.
- Prepare the database data directory.
- Launch DBLab server.
Configure the storage to enable thin cloning
Tip
ZFS is the recommended way to enable thin cloning in Database Lab.
DBLab also supports LVM volumes, but this method:
- Has much less flexible disk space consumption.
- Risks destroying clones when massive maintenance operations are executed on the volume.
- Does not work with multiple snapshots, forcing clones to always use the most recent version of the data.
ZFS pool
- Install ZFS:

  ```sh
  sudo apt-get install 'zfsutils-linux'
  ```

- Create the pool:

  ```sh
  sudo zpool create \
    -O 'compression=on' \
    -O 'atime=off' \
    -O 'recordsize=128k' \
    -O 'logbias=throughput' \
    -m '/var/lib/dblab/dblab_pool' \
    'dblab_pool' \
    "${DBLAB_DISK}"
  ```

  Tip
  When planning to set `physicalRestore.sync.enabled: true` in DBLab's configuration, consider lowering the value of the `recordsize` option.
  Using `recordsize=128k` might provide a better compression ratio and better performance for massive IO-bound operations, like the creation of an index, but worse performance of WAL replay, causing the lag to be higher.
  Vice versa, using `recordsize=8k` improves the performance of WAL replay, but lowers the compression ratio and makes index creation take longer.
  A sketch for inspecting and changing the value follows this procedure.

- Check the creation results:

  ```sh
  $ sudo zfs list
  NAME         USED  AVAIL  REFER  MOUNTPOINT
  dblab_pool   106K   777G    24K  /var/lib/dblab/dblab_pool

  $ sudo lsblk
  NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
  ...
  nvme0n1     259:0    0    8G  0 disk
  └─nvme0n1p1 259:1    0    8G  0 part /
  nvme1n1     259:0    0  777G  0 disk
  ├─nvme1n1p1 259:3    0  777G  0 part
  └─nvme1n1p9 259:4    0    8M  0 part
  ```
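A minimal sketch for the tip above, assuming the pool was created as `dblab_pool`; note that changing `recordsize` only affects data written from that point on:

```sh
# Inspect the current record size of the pool's root dataset.
sudo zfs get 'recordsize' 'dblab_pool'

# Favor WAL replay performance over compression ratio and index creation speed.
sudo zfs set 'recordsize=8k' 'dblab_pool'
```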
LVM volume
- Install LVM2:

  ```sh
  sudo apt-get install -y 'lvm2'
  ```

- Create an LVM volume:

  ```sh
  # Create Physical Volume and Volume Group.
  sudo pvcreate "${DBLAB_DISK}"
  sudo vgcreate 'dblab_vg' "${DBLAB_DISK}"

  # Create Logical Volume and filesystem.
  sudo lvcreate -l '10%FREE' -n 'pool_lv' 'dblab_vg'
  sudo mkfs.ext4 '/dev/dblab_vg/pool_lv'

  # Mount the Database Lab pool.
  sudo mkdir -p '/var/lib/dblab/dblab_vg-pool_lv'
  sudo mount '/dev/dblab_vg/pool_lv' '/var/lib/dblab/dblab_vg-pool_lv'

  # Bootstrap LVM snapshots so they can be used inside Docker containers.
  sudo lvcreate --snapshot --extents '10%FREE' --yes --name 'dblab_bootstrap' 'dblab_vg/pool_lv'
  sudo lvremove --yes 'dblab_vg/dblab_bootstrap'
  ```
Important
The logical volume size must be defined at volume creation time.
By default, it is suggested to allocate 10% of the available disk space.
If a snapshot volume's usage exceeds its allocated size, the volume will be destroyed, potentially leading to data loss.
To prevent volumes from being destroyed, consider enabling the LVM auto-extend feature.
Enable it by updating the LVM configuration with the following options:

- `snapshot_autoextend_threshold`: auto-extend snapshot volumes when their usage exceeds the specified percentage.
- `snapshot_autoextend_percent`: auto-extend snapshot volumes by the specified percentage of the available space once their usage exceeds the threshold.
```sh
sudo sed -i 's/snapshot_autoextend_threshold.*/snapshot_autoextend_threshold = 70/g' '/etc/lvm/lvm.conf'
sudo sed -i 's/snapshot_autoextend_percent.*/snapshot_autoextend_percent = 20/g' '/etc/lvm/lvm.conf'
```
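To verify the settings took effect and to keep an eye on snapshot usage afterwards, one can use the following; the Data% column of `lvs` shows how full each snapshot volume is:

```sh
# Show the effective auto-extend settings.
sudo lvmconfig 'activation/snapshot_autoextend_threshold' 'activation/snapshot_autoextend_percent'

# Monitor snapshot volumes' fill level in the Data% column.
sudo lvs 'dblab_vg'
```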
Prepare the database data directory
The DBLab Engine server needs data to use as source.
There are 3 options:

- Generate a synthetic database for testing purposes.
- Create a physical copy of an existing database using physical methods such as `pg_basebackup`.
  See also PostgreSQL backup.
- Perform a logical copy of an existing database using logical methods, like dumping it and restoring the dump into the data directory.
Generated database
Preferred when one doesn't have an existing database for testing.
- Generate a synthetic database in the `PGDATA` directory (located at `/var/lib/dblab/dblab_pool/data` by default).
  A simple way of doing this is to use `pgbench`.
  With scale factor `-s 100`, the database will occupy ~1.4 GiB.

  ```sh
  sudo docker run --detach \
    --name 'dblab_pg_initdb' --label 'dblab_sync' \
    --env 'PGDATA=/var/lib/postgresql/pgdata' --env 'POSTGRES_HOST_AUTH_METHOD=trust' \
    --volume '/var/lib/dblab/dblab_pool/data:/var/lib/postgresql/pgdata' \
    'postgres:15-alpine'
  sudo docker exec -it 'dblab_pg_initdb' psql -U 'postgres' -c 'create database test'
  sudo docker exec -it 'dblab_pg_initdb' pgbench -U 'postgres' -i -s '100' 'test'
  sudo docker stop 'dblab_pg_initdb'
  sudo docker rm 'dblab_pg_initdb'
  ```

- Copy the contents of the configuration file example `config.example.logical_generic.yml` from the Database Lab repository to `~/.dblab/engine/configs/server.yml`:

  ```sh
  mkdir -p "$HOME/.dblab/engine/configs"
  curl -fsSL \
    --url 'https://gitlab.com/postgres-ai/database-lab/-/raw/v4.0.0/engine/configs/config.example.logical_generic.yml' \
    --output "$HOME/.dblab/engine/configs/server.yml"
  ```

- Edit the following options in the configuration file:

  - Set `server:verificationToken`.
    It will be used to authorize API requests to the DBLab Engine.
  - Remove the `logicalDump` section completely.
  - Remove the `logicalRestore` section completely.
  - Leave `logicalSnapshot` as is.
  - If the PostgreSQL major version is not 17, set the proper image tag version in `databaseContainer:dockerImage`.
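A quick sanity check that the generated data landed in the expected place, using the default paths from above:

```sh
# The generated PGDATA should now live on the pool's dataset.
sudo du -sh '/var/lib/dblab/dblab_pool/data'
sudo ls '/var/lib/dblab/dblab_pool/data'
```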
Physical copy
TODO
Logical copy
Copy the existing database's data to the /var/lib/dblab/dblab_pool/data directory on the DBLab server.
This step is also known as thick cloning, and it only needs to be completed once.
- Copy the contents of the configuration file example `config.example.logical_generic.yml` from the Database Lab repository to `~/.dblab/engine/configs/server.yml`:

  ```sh
  mkdir -p "$HOME/.dblab/engine/configs"
  curl -fsSL \
    --url 'https://gitlab.com/postgres-ai/database-lab/-/raw/v4.0.0/engine/configs/config.example.logical_generic.yml' \
    --output "$HOME/.dblab/engine/configs/server.yml"
  ```

- Edit the following options in the configuration file (a sketch of the resulting fragment follows this list):

  - Set `server:verificationToken`.
    It will be used to authorize API requests to the DBLab Engine.
  - Set the connection options in `retrieval:spec:logicalDump:options:source:connection`:
    - `host`: database server host.
    - `port`: database server port.
    - `dbname`: database name to connect to.
    - `username`: database user name.
    - `password`: database master password.
      This can also be set as the `PGPASSWORD` environment variable, and passed to the container using the `--env` option of `docker run`.
  - If the PostgreSQL major version is not 17, set the proper image tag version in `databaseContainer:dockerImage`.
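A sketch of the edited fragment with made-up connection values; the key names are the ones listed above:

```sh
# Printed for illustration only; values are placeholders.
cat <<'EOF'
retrieval:
  spec:
    logicalDump:
      options:
        source:
          connection:
            host: 'source-db.example.com'
            port: 5432
            dbname: 'app_production'
            username: 'dblab_reader'
            password: 'change-me'  # or pass PGPASSWORD via docker run --env
EOF
```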
Launch DBLab server
```sh
sudo docker run --privileged --detach --restart on-failure \
  --name 'dblab_server' --label 'dblab_control' \
  --publish '127.0.0.1:2345:2345' \
  --volume '/var/run/docker.sock:/var/run/docker.sock' \
  --volume '/var/lib/dblab:/var/lib/dblab/:rshared' \
  --volume "$HOME/.dblab/engine/configs:/home/dblab/configs" \
  --volume "$HOME/.dblab/engine/meta:/home/dblab/meta" \
  --volume "$HOME/.dblab/engine/logs:/home/dblab/logs" \
  --volume '/sys/kernel/debug:/sys/kernel/debug:rw' \
  --volume '/lib/modules:/lib/modules:ro' \
  --volume '/proc:/host_proc:ro' \
  --env 'DOCKER_API_VERSION=1.39' \
  'postgresai/dblab-server:4.0.0'
```
Important
With `--publish 127.0.0.1:2345:2345`, only local connections will be allowed.
To allow external connections, put a proxy like NGINX or Envoy (preferred) in front of it, or change the parameter to `--publish 2345:2345` to listen on all available network interfaces.
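After launching, one can verify the container is up and follow the initial data retrieval in its logs:

```sh
# Verify the container is running, then follow the Engine's logs.
sudo docker ps --filter 'name=dblab_server'
sudo docker logs --follow 'dblab_server'
```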
Clean up
```sh
# Stop and remove all Docker containers.
sudo docker ps -aq | xargs --no-run-if-empty sudo docker rm -f

# Remove all Docker images.
sudo docker images -q | xargs --no-run-if-empty sudo docker rmi

# Clean up the data directory.
sudo rm -rf '/var/lib/dblab/dblab_pool/data'/*

# Remove the dump directory.
sudo umount '/var/lib/dblab/dblab_pool/dump'
sudo rm -rf '/var/lib/dblab/dblab_pool/dump'

# Start from the beginning by destroying the ZFS storage pool.
sudo zpool destroy 'dblab_pool'
```
Automatically refresh data in full without downtime
Refer to Automatic full refresh data from a source.
DBLab Engine can use two or more ZFS pools or LVM logical volumes to perform an automatic full refresh on schedule and without downtime.
Tip
Prefer dedicating an entire disk to each pool or logical volume.
This avoids overloading a single disk when syncing, and prevents losing all the data should a single disk fail.
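A sketch of dedicating one disk per pool, following the `dblab_pool_N` naming that appears in the troubleshooting section below; the disk names are examples, and the `-O` options from the setup section above would apply here too:

```sh
# One dedicated disk per ZFS pool.
sudo zpool create -m '/var/lib/dblab/dblab_pool_0' 'dblab_pool_0' '/dev/nvme1n1'
sudo zpool create -m '/var/lib/dblab/dblab_pool_1' 'dblab_pool_1' '/dev/nvme2n1'
```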
Troubleshooting
Cannot destroy automatic snapshot in the pool
Error message example:
2025/10/07 09:49:32 cannot destroy automatic snapshot in the pool
Root cause: still unknown.
Short term solution: manually delete the ZFS snapshots and restart the Engine.
- Decide what snapshots need to be deleted:

  ```sh
  $ zfs list -t 'snapshot'
  NAME                                   USED  AVAIL  REFER  MOUNTPOINT
  dblab_pool_0@snapshot_20250924055533  92.3G      -   266G  -
  dblab_pool_1@snapshot_20250923130042   132G      -   144G  -
  dblab_pool_1@snapshot_20250915224319   142G      -   145G  -
  dblab_pool_1@snapshot_20251002175419  87.5K      -   145G  -
  ```

- Ensure no clone is using those snapshots.
  Reset those that do if necessary.
- Destroy the chosen ZFS snapshots:

  ```sh
  sudo zfs destroy 'dblab_pool_1@snapshot_20250923130042'
  ```

- Restart the DBLab Engine's container to make it recognize the snapshots are gone:

  ```sh
  sudo docker container restart 'dblab_server'
  ```
The automatic full refresh fails claiming it cannot find available pools
Root cause: in version 4.0.0, the DBLab Engine could consider a pool to be in use by clones even after those clones were
destroyed.
This appears to have been solved in version 4.0.1.
Solution: remove all ZFS snapshots in the pool that should be used for the refresh and restart the Engine.
- Ensure no clone is using snapshots on the pool that should be used for the refresh.
  Reset those that do if necessary.
- Destroy all ZFS snapshots in the pool that should be used for the refresh:

  ```sh
  sudo zfs list
  sudo zfs destroy -rv 'dblab_pool_0/branch/main'
  ```

- Restart the DBLab Engine's container to make it recognize the pool as available:

  ```sh
  sudo docker container restart 'dblab_server'
  ```
Further readings
- Website
- Codebase
- PostgreSQL
- Documentation
- `dblab`
- Extended Docker Images with PostgreSQL for Database Lab
- SE Docker Images with PostgreSQL