Files
oam/knowledge base/cloud computing/aws/rds.md

787 lines
37 KiB
Markdown

# Amazon Relational Database Service
1. [TL;DR](#tldr)
1. [Engine](#engine)
1. [PostgreSQL](#postgresql)
1. [Storage](#storage)
1. [Parameter Groups](#parameter-groups)
1. [Option Groups](#option-groups)
1. [Backup](#backup)
1. [Automatic backups](#automatic-backups)
1. [Manual backups](#manual-backups)
1. [Export snapshots to S3](#export-snapshots-to-s3)
1. [Restore](#restore)
1. [Encryption](#encryption)
1. [Operations](#operations)
1. [PostgreSQL: reduce allocated storage by migrating using transportable databases](#postgresql-reduce-allocated-storage-by-migrating-using-transportable-databases)
1. [Troubleshooting](#troubleshooting)
1. [ERROR: extension must be loaded via shared\_preload\_libraries](#error-extension-must-be-loaded-via-shared_preload_libraries)
1. [ERROR: must be superuser to alter _X_ roles or change _X_ attribute](#error-must-be-superuser-to-alter-x-roles-or-change-x-attribute)
1. [Transport fails asking for the remote user must have superuser, but it already does](#transport-fails-asking-for-the-remote-user-must-have-superuser-but-it-already-does)
1. [Further readings](#further-readings)
1. [Sources](#sources)
## TL;DR
<details>
<summary>CLI usage</summary>
```sh
# Show RDS instances.
aws rds describe-db-instances
aws rds describe-db-instances --output 'json' --query "DBInstances[?(DBInstanceIdentifier=='master-prod')]"
# Show Parameter Groups.
aws rds describe-db-parameters --db-parameter-group-name 'default.postgres15'
# Create parameter Groups.
aws rds create-db-parameter-group --db-parameter-group-name 'pg15-source-transport-group' \
--db-parameter-group-family 'postgres15' --description 'Parameter group with transport parameters enabled'
# Modify Parameter Groups.
aws rds modify-db-parameter-group --db-parameter-group-name 'pg15-source-transport-group' \
--parameters \
'ParameterName=pg_transport.num_workers,ParameterValue=4,ApplyMethod=pending-reboot' \
'ParameterName=pg_transport.timing,ParameterValue=1,ApplyMethod=pending-reboot' \
'ParameterName=pg_transport.work_mem,ParameterValue=131072,ApplyMethod=pending-reboot' \
'ParameterName=shared_preload_libraries,ParameterValue="pg_stat_statements,pg_transport",ApplyMethod=pending-reboot' \
'ParameterName=max_worker_processes,ParameterValue=24,ApplyMethod=pending-reboot'
# Restore instances from snapshots.
aws rds restore-db-instance-from-db-snapshot \
--db-instance-identifier 'myNewDbInstance' --db-snapshot-identifier 'myDbSnapshot'
# Restore instances to point in time.
aws rds restore-db-instance-to-point-in-time \
--target-db-instance-identifier 'myNewDbInstance' --source-db-instance-identifier 'oldDbInstance' \
--use-latest-restorable-time
# Start export tasks.
aws rds start-export-task \
--export-task-identifier 'db-finalSnapshot-2024' \
--source-arn 'arn:aws:rds:eu-west-1:012345678901:snapshot:db-prod-final-2024' \
--s3-bucket-name 'backups' \
--iam-role-arn 'arn:aws:iam::012345678901:role/CustomRdsS3Exporter' \
--kms-key-id 'arn:aws:kms:eu-west-1:012345678901:key/abcdef01-2345-6789-abcd-ef0123456789'
# Get export tasks' status.
aws rds describe-export-tasks
aws rds describe-export-tasks --export-task-identifier 'my-snapshot-export'
# Cancel tasks.
aws rds cancel-export-task --export-task-identifier 'my_export'
```
</details>
<br/>
RDS _Instances_ are managed database environments.<br/>
Instances _can_ be part of a _cluster_, or _standalone_ deployments.<br/>
RDS _Clusters_ are collections of RDS Instances built on the Aurora engine.<br/>
Cluster-specific resources (snapshots, etc) are prefixed by _Cluster_ in the APIs, e.g. `create-db-cluster-snapshot`,
`DBClusterIdentifier` and `DBClusterSnapshotIdentifier`.
Instances [**can** be renamed][renaming a db instance].<br/>
Renaming them has some effects and requirements. Check the reference.
> Try and keep the DBs identifiers under 22 characters when using PostgreSQL.<br/>
> The `pg_transport` extension will try and truncate any `host` argument to 63 characters.
RDS creates FQDNs for the Instances by suffixing the instance identifier with
`.{{12-char-internal-id}}.{{region}}.rds.amazonaws.com`.<br/>
That internal ID is generated by RDS and is based on the combination of the AWS Region and Account the instance is in.
Read replicas **can** be promoted to standalone DB instances.<br/>
See [Working with DB instance read replicas].
Disk free metrics are available in CloudWatch.
One can choose any of the following retention periods for instances' Performance Insights data:
- 7 days (default, free tier).
- _n_ months, where n is a number from 1 to 24.<br/>
In CLI and IaC, this number must be _n*31_.
Each and every DB instance has a 30-minutes weekly maintenance window defining when modifications and software patching
occur. Should it not be defined during creation, one will be assigned automatically at random from the default time
block for the region.<br/>
If any maintenance event is scheduled before the window, it's **initiated** in that time frame. Most maintenance events
complete during the 30-minute maintenance window, while larger events may take more.<br/>
Maintenance windows are paused when their DB instances are stopped.
## Engine
### PostgreSQL
Refer [Understanding PostgreSQL roles and permissions].
## Storage
Refer [Amazon RDS DB instance storage].
When selecting General Purpose SSD or Provisioned IOPS SSD, RDS automatically stripes storage across multiple volumes to
enhance performance depending on the engine selected and the amount of storage requested:
| DB engine | Storage size | Number of volumes provisioned |
| -------------------------------- | ----------------- | ----------------------------- |
| MariaDB<br/>MySQL<br/>PostgreSQL | Less than 400 GiB | 1 |
| MariaDB<br/>MySQL<br/>PostgreSQL | 400 to 65,536 GiB | 4 |
| Db2 | Less than 400 GiB | 1 |
| Db2 | 400 to 65,536 GiB | 4 |
| Oracle | Less than 200 GiB | 1 |
| Oracle | 200 to 65,536 GiB | 4 |
| SQL Server | Any | 1 |
When modifying a General Purpose SSD or Provisioned IOPS SSD volume, it goes through a sequence of states.<br/>
While the volume is in the `optimizing` state, volume performance is between the source and target configuration
specifications.<br/>
Transitional volume performance will be no less than the **lower** of the two specifications.
When increasing allocated storage, increases must be by at least of 10%. Trying to increase the value by less than 10%
will result in an error.<br/>
The allocated storage **cannot** be increased when restoring RDS for SQL Server DB instances.
> The allocated storage size of any DB instance **cannot be lowered** after creation.
Decrease the storage size of DB instances by creating a new instance with lower provisioned storage size, then migrate
the data into the new instance.<br/>
Use one of the following methods:
- Use the database engine's native dump and restore method.
This **will** require **long** downtime.
- Consider using [transportable DBs][migrating databases using rds postgresql transportable databases] when dealing with
PostgreSQL DBs should the requirements match.<br/>
This **will** require **_some_** downtime.
- [Perform an homogeneous data migration][migrating databases to their amazon rds equivalents with aws dms] using AWS's
[DMS][what is aws database migration service?]<br/>
This **should** require **minimal** downtime.
## Parameter Groups
Refer [Working with parameter groups].
Used to specify how a DB is configured.
- _Static_ parameters **require** instances to be rebooted after a change for the new value to take effect.
- _Dynamic_ parameters are applied at runtime and **do not** require instances to reboot after changing.
RDS instances using custom DB parameter groups allow for changes to values of _dynamic_ parameters while running.<br/>
Make changes by using the AWS Management Console, the AWS CLI, or the Amazon RDS API.
If one has enough privileges to do so, one can also change parameter values by using the `ALTER DATABASE`,
`ALTER ROLE`, and `SET` commands.
Learn about available parameters by describing the existing default ones:
```sh
aws rds describe-db-parameters --db-parameter-group-name 'default.postgres15'
aws rds describe-db-parameters --db-parameter-group-name 'default.postgres15' \
--query "Parameters[?ParameterName=='shared_preload_libraries']" --output 'table'
aws rds describe-db-parameters --db-parameter-group-name 'default.postgres15' \
--query "Parameters[?ParameterName=='shared_preload_libraries'].ApplyType" --output 'text'
aws rds describe-db-parameters --db-parameter-group-name 'default.postgres15' --output 'table' \
--query "Parameters[?ApplyType!='dynamic']"
```
## Option Groups
Used to enable and configure additional features and functionalities in a DB.
## Backup
RDS backup storage for each Region is calculated from both the automated backups and manual DB snapshots for that
Region.<br/>
Moving snapshots to other Regions increases the backup storage in the destination Regions.
Backups are stored in [S3].
Should one choose to retain automated backups when deleting DB instances, those backups are saved for the full retention
period; otherwise, all automated backups are deleted with the instance.<br/>
After automated backups are deleted, they **cannot** be recovered.
Should one choose to have RDS create a final DB snapshot before deleting a DB instance, one can use that or previously
created manual snapshots to recover it.
Taking backups can be unbearably slow depending on the amount of data needing to be copied.<br/>
For comparison, the first snapshot of a DB instance with standard 100 GiB `gp3` storage took about 3h to complete.
### Automatic backups
Automatic backups are storage volume snapshots of **entire** DB instances.
Automatic backups are **enabled** by default.<br/>
Setting the backup retention period to 0 disables them, setting it to a nonzero value (re)enables them.
> Enabling automatic backups takes the affected instances offline to have a backup created **immediately**.<br/>
> While the backup is created, the instance is kept in the _Modifying_ state. This **will** block actions on the
> instance and _could_ cause outages.
Automatic backups occur **daily** during the instances' backup window, configured in 30 minute periods. Should backups
require more time than allotted to the backup window, they will continue after the window ends and until they finish.
Backups are retained for up to 35 days (_backup retention period_).<br/>
One can recover DB instances to **any** point in time that sits inside the backup retention period.
The backup window **must not overlap** with the weekly maintenance window for DB instance or Multi-AZ DB cluster.<br/>
During automatic backup windows storage I/O might be suspended briefly while the backup process initializes.
Initialization typically takes up to a few seconds. One might also experience elevated latencies for a few minutes
during backups for Multi-AZ deployments.<br/>
For MariaDB, MySQL, Oracle and PostgreSQL Multi-AZ deployments, I/O activity isn't suspended on the primary instance as
the backup is taken from the standby.<br/>
Automated backups might occasionally be skipped if instances or clusters are running heavy workloads at the time backups
are supposed to start.
DB instances must be in the `available` state for automated backups to occur.<br/>
Automated backups don't occur while DB instances are in other states (i.e., `storage_full`).
Automated backups are **not** created while a DB instance or cluster is stopped.<br/>
RDS does **not** include time spent in the stopped state when the backup retention window is calculated. This means that
backups can be retained longer than the backup retention period if a DB instance has been stopped.
Automated backups will **not** occur while a DB snapshot copy is running in the same AWS Region for the same database.
### Manual backups
Back up DB instances manually by creating DB snapshots.<br/>
The first snapshot contains the data for the full database. Subsequent snapshots of the same database are incremental.
One can copy both automatic and manual DB snapshots, but only share manual DB snapshots.
Manual snapshots **never** expire and are retained indefinitely.
One can store up to 100 manual snapshots per Region.
### Export snapshots to S3
One can export DB snapshot data to [S3] buckets.<br/>
RDS spins up an instance from the snapshot, extracts data from it and stores the data in Apache Parquet format.<br/>
By default **all** data in the snapshots is exported, but one can specify specific sets of databases, schemas, or tables
to export.
- The export process runs in the background and does **not** affect the performance of active DB instances.
- Multiple export tasks for the same DB snapshot cannot run simultaneously. This applies to both full and partial
exports.
- Exporting snapshots from DB instances that use magnetic storage isn't supported.
- The following characters aren't supported in table column names:
```plaintext
, ; { } ( ) \n \t = (space) /
```
Tables containing those characters in column names are skipped during export.
- PostgreSQL _temporary_ and _unlogged_ tables are skipped during export.
- Large objects in the data, like BLOBs or CLOBs, close to or greater than 500 MB will make the export fail.
- Large rows close to or greater than 2 GB will make their table being skipped during export.
- Data exported from snapshots to S3 cannot be restored to new DB instances.
- The snapshot export tasks require a role with write-access permission to the destination S3 bucket:
```json
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Action": "sts:AssumeRole",
"Principal": {
"Service": "export.rds.amazonaws.com"
}
}]
}
```
```json
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Action": [
"s3:PutObject*",
"s3:ListBucket",
"s3:GetObject*",
"s3:DeleteObject*",
"s3:GetBucketLocation"
],
"Resource": [
"arn:aws:s3:::bucket",
"arn:aws:s3:::bucket/*"
]
}]
}
```
After the export, one can analyze the data directly through
[Athena](https://docs.aws.amazon.com/athena/latest/ug/parquet-serde.html) or
[Redshift Spectrum](https://docs.aws.amazon.com/redshift/latest/dg/copy-usage_notes-copy-from-columnar.html).
<details>
<summary>In the Console</summary>
The _Export to Amazon S3_ console option appears only for snapshots that can be exported to Amazon S3.<br/>
Snapshots might not be available for export because of the following reasons:
- The DB engine isn't supported for S3 export.
- The DB instance version isn't supported for S3 export.
- S3 export isn't supported in the AWS Region where the snapshot was created.
</details>
<details>
<summary>Using the CLI</summary>
```sh
# Start new tasks.
$ aws rds start-export-task \
--export-task-identifier 'db-finalSnapshot-2024' \
--source-arn 'arn:aws:rds:eu-west-1:012345678901:snapshot:db-prod-final-2024' \
--s3-bucket-name 'backups' --s3-prefix 'rds' \
--iam-role-arn 'arn:aws:iam::012345678901:role/CustomRdsS3Exporter' \
--kms-key-id 'arn:aws:kms:eu-west-1:012345678901:key/abcdef01-2345-6789-abcd-ef0123456789'
{
"ExportTaskIdentifier": "db-finalSnapshot-2024",
"IamRoleArn": "arn:aws:iam::012345678901:role/CustomRdsS3Exporter",
"KmsKeyId": "arn:aws:kms:eu-west-1:012345678901:key/abcdef01-2345-6789-abcd-ef0123456789",
"PercentProgress": 0,
"S3Bucket": "backups",
"S3Prefix": "rds",
"SnapshotTime": "2024-06-17T09:04:41.387000+00:00",
"SourceArn": "arn:aws:rds:eu-west-1:012345678901:snapshot:db-prod-final-2024",
"Status": "STARTING",
"TotalExtractedDataInGB": 0
}
# Get tasks' status.
$ aws rds describe-export-tasks
$ aws rds describe-export-tasks --export-task-identifier 'db-finalSnapshot-2024'
$ aws rds describe-export-tasks --query 'ExportTasks[].WarningMessage' --output 'yaml'
# Cancel tasks.
$ aws rds cancel-export-task --export-task-identifier 'my_export'
{
"Status": "CANCELING",
"S3Prefix": "",
"ExportTime": "2019-08-12T01:23:53.109Z",
"S3Bucket": "DOC-EXAMPLE-BUCKET",
"PercentProgress": 0,
"KmsKeyId": "arn:aws:kms:AWS_Region:123456789012:key/K7MDENG/bPxRfiCYEXAMPLEKEY",
"ExportTaskIdentifier": "my_export",
"IamRoleArn": "arn:aws:iam::123456789012:role/export-to-s3",
"TotalExtractedDataInGB": 0,
"TaskStartTime": "2019-11-13T19:46:00.173Z",
"SourceArn": "arn:aws:rds:AWS_Region:123456789012:snapshot:export-example-1"
}
```
</details>
## Restore
DB instances **can** be restored from DB snapshots.<br/>
Restoring instances from snapshots requires the new instances to have **equal or more** allocated storage than what the
original instance had allocated at the time the snapshot was taken.
```sh
aws rds restore-db-instance-from-db-snapshot \
--db-instance-identifier 'myNewDbInstance' --db-snapshot-identifier 'myDbSnapshot'
aws rds restore-db-instance-to-point-in-time \
--target-db-instance-identifier 'myNewDbInstance' --source-db-instance-identifier 'oldDbInstance' \
--use-latest-restorable-time
```
Should source snapshots be from an instance with automatic backups enabled, the restored DB instance will backup itself
right after creation.<br/>
Refer the [Backup] section for what this means.
> There is currently **no** way to prevent the backup being generated at instance creation time.<br/>
> That process is triggered automatically and the feature can only be toggled on and off for existing instances.
>
> The `BackupRetentionPeriod` flag is part of both instances and snapshot definitions, but can only be set on
> instances.<br/>
> To create instances with this flag set to `0` and thus have **no** backup automatically taken, the source snapshot
> _must_ have this flag already set to `0`. This can only happen if the source instance had the flag set to `0` when the
> snapshot was taken in the first place.
## Encryption
RDS automatically integrates with AWS KMS for key management.
By default, RDS uses the _RDS AWS managed key_ (`aws/rds`) from KMS for encryption.<br/>
This key can't be managed, rotated, nor deleted by users.
RDS will automatically put databases into a terminal state when access to the KMS key is required but the key has been
disabled or deleted, or its permissions have been somehow revoked.<br/>
This change could be immediate or deferred depending on the use case that required access to the KMS key.<br/>
In this terminal state, DB instances are no longer available and their databases' current state can't be recovered. To
restore DB instances, one must first re-enable access to the KMS key for RDS, and then restore the instances from their
latest available backup.
## Operations
### PostgreSQL: reduce allocated storage by migrating using transportable databases
Refer [Migrating databases using RDS PostgreSQL Transportable Databases],
[Transporting PostgreSQL databases between DB instances] and
[Transport PostgreSQL databases between two Amazon RDS DB instances using pg_transport].
The `pg_transport` enables streaming the database files with minimal processing by making a target DB instance import
a database from a source DB instance.
> When the transport begins, all current sessions on the **source** database are ended and the DB is put in ReadOnly
> mode.<br/>
> Only the specific source database that is being transported is affected. Others are **not** affected.
Primary instances with replicas **can** be used as source instances.<br/>
TODO: test using a RO replica **as** the source instance. I expect this will **not** work due to the transport extension
putting the source DB in RO mode.
> The in-transit database will be **inaccessible** on the **target** DB instance for the duration of the transport.<br/>
> During transport, the target DB instance **cannot be restored** to a point in time, as the transport is **not**
> transactional and does **not** use the PostgreSQL write-ahead log to record changes.
<details>
<summary>Limitations</summary>
- The access privileges (including the _default_ ones) and ownership from the source database are **not** transferred to
the target instance.<br/>
Dump them from the source, or (preferred) keep sql files with their definitions close to recreate them in other ways.
- Databases **cannot** be transported onto read replicas or parent instances of read replicas.<br/>
They _can_ be read _from_ instances with replicas, though.
- `reg` data types **cannot** be used in any source database's table that are about to be transported.
- There can be **up to 32** total transports (including both imports and exports) active at the same time on any DB
instance.
- All the DB's data is migrated **as is**.
- Triggers and functions are apparently not transported either.<br/>
Noticed after a production DB migration.
- All extensions must be dropped from the source database.<br/>
> This means that, for some extensions, the data they manage is also dropped.
</details>
<details>
<summary>Requirements</summary>
- A **source DB** to copy data from.
- A **target instance** to copy the DB to.
> Since the transport will create the DB on the target, the target instance must **not** contain the database that
> needs to be transported.<br/>
> Should the target contain the DB already, it **will** need to be dropped beforehand.
- Both DB instances **must** run the same **major** version of PostgreSQL.<br/>
Differences in **minor** versions seem to be fine.
- Should the source DB have the `pgaudit` extension _loaded_, that extension will **need** to be _installed_ on the
target instance.
- The target instance **must** be able to connect to the source instance.
- All source database objects **must** reside in the default `pg_default` tablespace.
- The source DB (but not _other_ DBs on the same source instance) will need to:
- Be put in **Read Only** mode (automatic, done during transport).
- Have all installed extensions **removed**.
To avoid locking the operator's machine for the time needed by the transport, it is suggested the use of an EC2 instance
as the middleman to operate on both DBs.
> Keep the DBs identifiers under 22 characters.<br/>
> The `pg_transport` extension will try and truncate any `host` argument to 63 characters, and RDS FQDNs are something
> like `{{instance-id}}.{{12-char-internal-id}}.{{region}}.rds.amazonaws.com`.
</details>
<details>
<summary>Procedure</summary>
1. Enable the required configuration parameters and `pg_transport` extension on the source and target RDS instances.<br/>
Create a new RDS Parameter Group or modify the existing one used by the source.
Required parameters:
- `shared_preload_libraries` **must** include `pg_transport`.<br/>
Static parameter, requires reboot.
- `pg_transport.num_workers` **must** be tuned.<br/>
Its value determines the number of `transport.send_file` workers that will be created in the source. Defaults to 3.
- `max_worker_processes` **must** be at least (3 * `pg_transport.num_workers`) + 9.<br/>
Required on the destination to handle various background worker processes involved in the transport.<br/>
Static parameter, requires reboot.
- `pg_transport.work_mem` _can_ be tuned.<br/>
Specifies the maximum memory to allocate to each worker. Defaults to 131072 (128 MB) or 262144 (256 MB) depending
on the PostgreSQL version.
- `pg_transport.timing` _can_ be set to `1`.<br/>
Specifies whether to report timing information during the transport. Defaults to 1 (true), meaning that timing
information is reported.
1. Assign the Parameter Group to the source instance and reboot it to apply static changes.
1. Create a new _target_ instance with the required allocated storage.<br/>
Check the requirements again.
1. Make sure the middleman can connect to both DBs.
1. Make sure the _target_ DB instance can connect to the _source_.
1. Make sure one has a way to reinstate existing roles and permissions onto the target.<br/>
Dump existing roles and permissions from the source if required on the target.
RDS does **not** grant _full_ SuperUser permissions even to instances' master users. This makes impossible to use
`pg_dumpall -r` to _fully_ dump rules and permissions from the source.<br/>
One **_can_** export them by **excluding the passwords** from the dump:
```sh
pg_dumpall -h 'source-instance.5f7mp3pt3n6e.eu-west-1.rds.amazonaws.com' -U 'admin' -l 'postgres' -W \
-rf 'roles.sql' --no-role-passwords
```
but statements involving protected roles (like `rdsadmin` and any other matching `rds_*`) and change in 'superuser'
or 'replication' attributes will fail on restore.<br/>
Clean them up from the dump:
```sh
# Ignore *everything* involving the 'rdsadmin' user.
# Ignore the creation or alteration of AWS-managed RDS roles.
# Ignore changes involving protected attributes.
sed -Ei'.backup' \
-e '/rdsadmin/d' \
-e '/(CREATE|ALTER) ROLE rds_/d' \
-e 's/(NO)(SUPERUSER|REPLICATION)\s?//g' \
'roles.sql'
```
1. Prepare the **source** DB for transport:
1. Connect to the DB:
```sh
psql -h 'source-instance.5f7mp3pt3n6e.eu-west-1.rds.amazonaws.com' -p '5432' -d 'source_db' -U 'admin' --password
```
1. Only the `pg_transport` extension is allowed in the source DB during the actual transport operation.<br/>
**Remove** all extensions but `pg_transport` from the public schema of the DB instance:
```sql
SELECT "extname" FROM "pg_extension";
DROP EXTENSION IF EXISTS "btree_gist", "pgcrypto", …, "postgis" CASCADE;
```
1. Load the `pg_transport` extension if missing:
```sql
CREATE EXTENSION IF NOT EXISTS "pg_transport";
```
1. Prepare the **target** DB for transport:
1. The instance must **not** contain a DB with the same name of the source, as the transport will create it on the
target.<br/>
Connect to a _different_ DB than the source's:
```sh
psql -h 'target-instance.5f7mp3pt3n6e.eu-west-1.rds.amazonaws.com' -p '5432' -d 'postgres' -U 'admin' --password
```
1. Make sure no DB exists with the same name of the source DB:
```sql
DROP DATABASE IF EXISTS "source_db";
```
1. Load the `pg_transport` extension if missing:
```sql
CREATE EXTENSION IF NOT EXISTS "pg_transport";
```
1. \[optional] Test the transport by running the `transport.import_from_server` function on the **target** DB instance:
```sql
-- Keep arguments in *single* quotes here
SELECT transport.import_from_server(
'source-instance.5f7mp3pt3n6e.eu-west-1.rds.amazonaws.com', 5432,
'admin', 'source-user-password', 'source_db',
'target-user-password',
true
);
```
1. Run the transport by running the `transport.import_from_server` function on the **target** DB instance:
```sql
SELECT transport.import_from_server( …, …, …, …, …, …, false );
```
1. Validate the data in the target.
1. Restore uninstalled extensions in the public schema of **both** DB instances.<br/>
`pg_transport` _can_ be now dropped if not necessary anymore.
1. Restore all the needed roles and permissions onto the target:
```sh
psql -h 'target-instance.5f7mp3pt3n6e.eu-west-1.rds.amazonaws.com' -p '5432' -U 'admin' -d 'postgres' --password \
-f 'roles.sql'
```
> Restoring roles from raw dumps **will** throw a lot of errors about altering superuser attributes or protected
> roles. Check the list item about dumping data above.
1. Revert the value of the `max_worker_processes` parameter if necessary.<br/>
This will require a restart of the instance.
</details>
<br/>
If the target DB instance has automatic backups enabled, a backup is automatically taken after transport completes.<br/>
Point-in-time restores will be available for times after the backup finishes.
Should the transport fail, the `pg_transport` extension will attempt to undo all changes to the source **and** target DB
instances. This includes removing the destination's partially transported database.<br/>
Depending on the type of failure, the source database might continue to reject write-enabled queries. Should this
happen, allow write-enabled queries manually:
```sql
ALTER DATABASE db-name SET default_transaction_read_only = false;
```
<details>
<summary>Performance tests</summary>
<details style="margin: 0 0 0 1em">
<summary><code>db.t4g.medium</code> to <code>db.t4g.medium</code>, gp3 storage, ~ 350 GB database</summary>
Interruptions are due to the exhaustion of I/O burst credits, which tainted the benchmark.
| | 1st run | 2nd run | 3rd and 6th run | 4 | 5 |
| -------------------------- | ------------------- | ------------------- | ----------------- | ------------------- | ------------------- |
| `pg_transport.num_workers` | 2 | 4 | 8 | 8 | 12 |
| `max_worker_processes` | 15 | 21 | 33 | 33 | 45 |
| `pg_transport.work_mem` | 131072 (128 MB) | 131072 (128 MB) | 131072 (128 MB) | 262144 (256 MB) | 131072 (128 MB) |
| Minimum transfer rate | ~ 19 MB/s | ~ 19 MB/s | ~ 50 MB/s | ~ 4 MB/s | ~ 25 MB/s |
| Maximum transfer rate | ~ 58 MB/s | ~ 95 MB/s | ~ 255 MB/s | ~ 255 MB/s | ~ 165 MB/s |
| Average transfer rate | ~ 31 MB/s | ~ 66 MB/s | ~ 138 MB/s | ~ 101 MB/s | ~ 85 MB/s |
| Time estimated after 10m | ~ 3h 13m | ~ 1h 36m | ~ 52m | ~ 1h | ~ 1h 11m |
| Time taken | N/A (interrupted) | N/A (interrupted) | N/A (interrupted) | N/A (interrupted) | N/A (interrupted) |
| Source CPU usage | ~ 10% | ~ 15% | ~ 40% | ~ 39% | ~ 37% |
| Source RAM usage delta | N/A (did not check) | N/A (did not check) | + ~ 1.5 GB | N/A (did not check) | N/A (did not check) |
| Target CPU usage | ~ 12% | ~ 18% | ~ 34% | ~ 28% | ~ 25% |
| Target RAM usage delta | N/A (did not check) | N/A (did not check) | + ~ 1.5 GB | N/A (did not check) | N/A (did not check) |
</details>
<details style="margin: 0 0 0 1em">
<summary><code>db.m6i.xlarge</code> to <code>db.m6i.xlarge</code>, gp3 storage, ~ 390 GB database</summary>
| | 1st run | 2nd to 5th run |
| -------------------------- | --------------- | --------------- |
| `pg_transport.num_workers` | 8 | 16 |
| `max_worker_processes` | 33 | 57 |
| `pg_transport.work_mem` | 131072 (128 MB) | 131072 (128 MB) |
| Minimum transfer rate | ~ 97 MB/s | ~ 248 MB/s |
| Maximum transfer rate | ~ 155 MB/s | ~ 545 MB/s |
| Average transfer rate | ~ 135 MB/s | ~ 490 MB/s |
| Time estimated after 10m | ~ 46m | ~ 14m |
| Time taken | ~ 48m | ~ 14m |
| Source CPU usage | ~ 12% | ~ 42% |
| Source RAM usage delta | + ~ 940 MB | + ~ 1.5 GB |
| Target CPU usage | ~ 17% | ~ 65% |
| Target RAM usage delta | + ~ 1.3 GB | + ~ 3.3 GB |
</details>
</details>
## Troubleshooting
### ERROR: extension must be loaded via shared_preload_libraries
Refer [How can I resolve the "ERROR: <module/extension> must be loaded via shared_preload_libraries" error?]
1. Include the module or extension in the `shared_preload_libraries` parameter in the Parameter Group.
1. Reboot the instance to apply the change.
1. Try reloading it again.
### ERROR: must be superuser to alter _X_ roles or change _X_ attribute
Error message examples:
> ERROR: must be superuser to alter superuser roles or change superuser attribute<br/>
> ERROR: must be superuser to alter replication roles or change replication attribute
RDS does **not** grant _full_ SuperUser permissions even to instances' master users.<br/>
Actions involving altering protected roles or changing protected attributes are practically blocked on RDS.
### Transport fails asking for the remote user must have superuser, but it already does
Error message example:
> ```plaintext
> Cannot execute SQL 'SELECT transport.import_from_server(
> 'source.ab0123456789.eu-west-1.rds.amazonaws.com',
> 5432,
> 'mastarr',
> '********',
> 'sales',
> '********',
> true
> );' None: remote user must have superuser (or rds_superuser if on RDS)
> ```
_Speculative_ root cause: RDS did not finish to properly apply the settings.
Solution: reboot the source and target instance and retry.
## Further readings
- [Working with DB instance read replicas]
- [Working with parameter groups]
- [How can I resolve the "ERROR: <module/extension> must be loaded via shared_preload_libraries" error?]
- [Understanding PostgreSQL roles and permissions]
### Sources
- [Pricing and data retention for Performance Insights]
- [Introduction to backups]
- [Restoring from a DB snapshot]
- [AWS KMS key management]
- [Amazon RDS DB instance storage]
- [How can I decrease the total provisioned storage size of my Amazon RDS DB instance?]
- [What is AWS Database Migration Service?]
- [Migrating databases to their Amazon RDS equivalents with AWS DMS]
- [Transporting PostgreSQL databases between DB instances]
- [Migrating databases using RDS PostgreSQL Transportable Databases]
- [Importing data into PostgreSQL on Amazon RDS]
- [Working with parameters on your RDS for PostgreSQL DB instance]
- [Backing up login roles aka users and group roles]
- [Renaming a DB instance]
- [Amazon RDS DB instances]
- [Maintaining a DB instance]
- [Disabling AWS RDS backups when creating/updating instances?]
<!--
Reference
═╬═Time══
-->
<!-- In-article sections -->
[backup]: #backup
<!-- Knowledge base -->
[s3]: s3.md
<!-- Files -->
<!-- Upstream -->
[amazon rds db instance storage]: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Storage.html
[amazon rds db instances]: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Overview.DBInstance.html
[aws kms key management]: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Overview.Encryption.Keys.html
[how can i decrease the total provisioned storage size of my amazon rds db instance?]: https://repost.aws/knowledge-center/rds-db-storage-size
[how can i resolve the "error: <module/extension> must be loaded via shared_preload_libraries" error?]: https://repost.aws/knowledge-center/rds-postgresql-resolve-preload-error
[importing data into postgresql on amazon rds]: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/PostgreSQL.Procedural.Importing.html
[introduction to backups]: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_WorkingWithAutomatedBackups.html
[maintaining a db instance]: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_UpgradeDBInstance.Maintenance.html
[migrating databases to their amazon rds equivalents with aws dms]: https://docs.aws.amazon.com/dms/latest/userguide/data-migrations.html
[migrating databases using rds postgresql transportable databases]: https://aws.amazon.com/blogs/database/migrating-databases-using-rds-postgresql-transportable-databases/
[pricing and data retention for performance insights]: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PerfInsights.Overview.cost.html
[renaming a db instance]: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_RenameInstance.html
[restoring from a db snapshot]: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_RestoreFromSnapshot.html
[transport postgresql databases between two amazon rds db instances using pg_transport]: https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/transport-postgresql-databases-between-two-amazon-rds-db-instances-using-pg_transport.html
[transporting postgresql databases between db instances]: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/PostgreSQL.TransportableDB.html
[understanding postgresql roles and permissions]: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.PostgreSQL.CommonDBATasks.Roles.html
[what is aws database migration service?]: https://docs.aws.amazon.com/dms/latest/userguide/Welcome.html
[working with db instance read replicas]: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ReadRepl.html
[working with parameter groups]: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_WorkingWithParamGroups.html
[working with parameters on your rds for postgresql db instance]: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.PostgreSQL.CommonDBATasks.Parameters.html
<!-- Others -->
[backing up login roles aka users and group roles]: https://www.postgresonline.com/article_pfriendly/81.html
[disabling aws rds backups when creating/updating instances?]: https://stackoverflow.com/questions/35709153/disabling-aws-rds-backups-when-creating-updating-instances