
Amazon Relational Database Service

  1. TL;DR
  2. Engine
    1. PostgreSQL
  3. Burstable instances
  4. Storage
    1. Storage optimization
    2. Storage encryption
  5. Parameter Groups
  6. Option Groups
  7. Backup
    1. Automatic backups
    2. Manual backups
    3. Export snapshots to S3
  8. Restore
  9. Multi-AZ instances
    1. Converting instances between Multi-AZ and Single-AZ
  10. Operations
    1. PostgreSQL: reduce allocated storage by migrating using transportable databases
    2. Stop instances
    3. Cancel pending modifications
  11. Troubleshooting
    1. ERROR: extension must be loaded via shared_preload_libraries
    2. ERROR: must be superuser to alter X roles or change X attribute
    3. Transport fails asking for the remote user must have superuser, but it already does
    4. The instance is unbearably slow
  12. Further readings
    1. Sources

TL;DR

RDS Instances are managed database environments.
Instances can be part of a cluster, or standalone deployments.

RDS Clusters are collections of RDS Instances built on the Aurora engine.
Cluster-specific resources (snapshots, etc) are prefixed by Cluster in the APIs, e.g. create-db-cluster-snapshot, DBClusterIdentifier and DBClusterSnapshotIdentifier.

T instances are burstable for CPU, disk, and network.
RDS always configures them to burst in Unlimited mode.

Instances can be renamed.
Renaming them has some effects and requirements. Check the reference.

Try to keep DB identifiers under 22 characters when using PostgreSQL.
The pg_transport extension truncates any host argument to 63 characters.

RDS creates FQDNs for the Instances by suffixing the instance identifier with .{{12-char-internal-id}}.{{region}}.rds.amazonaws.com.
That internal ID is generated by RDS and is based on the combination of the AWS Region and Account the instance is in.
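
The full endpoint (and so the internal ID) can be read from the instance's description. A minimal sketch, reusing the 'master-prod' identifier from the examples below:

# Show an instance's endpoint.
aws rds describe-db-instances --db-instance-identifier 'master-prod' \
  --query 'DBInstances[].Endpoint.Address' --output 'text'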

Read replicas can be promoted to standalone DB instances.
See Working with DB instance read replicas.

Disk free metrics are available in CloudWatch.

Turning Performance Insights on and off does not cause downtime, a reboot, or a failover.
One can choose any of the following retention periods for instances' Performance Insights data:

  • 7 days (default, free tier).
  • n months, where n is a number from 1 to 24.
    This must be n*31 for API calls (including the CLI).
  • 731 days.

Each DB instance has a 30-minute weekly maintenance window defining when modifications and software patching occur. Should it not be defined during creation, one is assigned automatically at random from the default time block for the region.
If a maintenance event is scheduled for a given week, it is initiated during that window. Most maintenance events complete within the 30-minute window, while larger ones may take longer.
Maintenance windows are paused while their DB instances are stopped.
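
The window can be set at creation time or changed later by modifying the instance. A minimal sketch, assuming a hypothetical 'some-db' instance and a Sunday night window:

# Set the weekly maintenance window (UTC, ddd:hh24:mi-ddd:hh24:mi format).
aws rds modify-db-instance --db-instance-identifier 'some-db' \
  --preferred-maintenance-window 'sun:02:00-sun:02:30'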

Watch out for the order and timing in which changes are applied.

Example: creating a DB instance from snapshot with defined Parameter Group
  1. The request of creation from snapshot is received by the AWS APIs.
    The Parameter Group's name is defined here.

  2. The DB instance is created with a default Parameter Group.

    The Parameter group is due for change, but this does NOT come up as a pending modified value.
    Checks for pending changes will miss it.

  3. The DB instance's state goes from creating to backing-up.
    This backup usually completes quickly, for reasons unknown.

  4. The change in Parameter Group is applied now, requiring the DB instance to be rebooted.
    The instance's state goes to modifying, then rebooting.

  5. NOW the instance is ready for use.
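
A hedged way to follow the sequence above is to poll the instance's state together with the Parameter Group's apply status, which should move from applying to in-sync once the reboot finishes. A sketch, assuming the example 'myNewDbInstance' identifier used further below:

# Follow the instance's state and the Parameter Group's apply status.
aws rds describe-db-instances --db-instance-identifier 'myNewDbInstance' \
  --query 'DBInstances[].[DBInstanceStatus,DBParameterGroups[].ParameterApplyStatus]' --output 'json'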

CLI commands
# Show details of RDS instances.
aws rds describe-db-instances
aws rds describe-db-instances --output 'json' --query "DBInstances[?(DBInstanceIdentifier=='master-prod')]"

# Enable Performance Insights.
aws rds modify-db-cluster --db-cluster-identifier 'staging-cluster' \
  --enable-performance-insights --performance-insights-retention-period '93' \
  --database-insights-mode 'standard'


# Show Parameter Groups.
aws rds describe-db-parameters --db-parameter-group-name 'default.postgres15'

# Create Parameter Groups.
aws rds create-db-parameter-group --db-parameter-group-name 'pg15-source-transport-group' \
  --db-parameter-group-family 'postgres15' --description 'Parameter group with transport parameters enabled'

# Modify Parameter Groups.
aws rds modify-db-parameter-group --db-parameter-group-name 'pg15-source-transport-group' \
  --parameters \
    'ParameterName=pg_transport.num_workers,ParameterValue=4,ApplyMethod=pending-reboot' \
    'ParameterName=pg_transport.timing,ParameterValue=1,ApplyMethod=pending-reboot' \
    'ParameterName=pg_transport.work_mem,ParameterValue=131072,ApplyMethod=pending-reboot' \
    'ParameterName=shared_preload_libraries,ParameterValue="pg_stat_statements,pg_transport",ApplyMethod=pending-reboot' \
    'ParameterName=max_worker_processes,ParameterValue=24,ApplyMethod=pending-reboot'


# Restore instances from snapshots.
aws rds restore-db-instance-from-db-snapshot \
  --db-instance-identifier 'myNewDbInstance' --db-snapshot-identifier 'myDbSnapshot'

# Restore instances to point in time.
aws rds restore-db-instance-to-point-in-time \
  --target-db-instance-identifier 'myNewDbInstance' --source-db-instance-identifier 'oldDbInstance' \
  --use-latest-restorable-time

# Start export tasks.
aws rds start-export-task \
  --export-task-identifier 'db-finalSnapshot-2024' \
  --source-arn 'arn:aws:rds:eu-west-1:012345678901:snapshot:db-prod-final-2024' \
  --s3-bucket-name 'backups' \
  --iam-role-arn 'arn:aws:iam::012345678901:role/CustomRdsS3Exporter' \
  --kms-key-id 'arn:aws:kms:eu-west-1:012345678901:key/abcdef01-2345-6789-abcd-ef0123456789'

# Get export tasks' status.
aws rds describe-export-tasks
aws rds describe-export-tasks --export-task-identifier 'my-snapshot-export'

# Cancel export tasks.
aws rds cancel-export-task --export-task-identifier 'my_export'


# Change the storage type.
aws rds modify-db-instance --db-instance-identifier 'instance-name' --storage-type 'gp3' --apply-immediately

Engine

PostgreSQL

Refer Understanding PostgreSQL roles and permissions.

PostgreSQL-flavoured RDS multi-AZ clusters do not ensure Snapshot Isolation. Instead, they may provide Parallel Snapshot Isolation, a slightly weaker model.
Refer Kyle Kingsbury's Amazon RDS for PostgreSQL 17.4 analysis.

Burstable instances

T instances are burstable.

Refer the related section in the EC2 article, with the difference that RDS instances are always configured for Unlimited mode.

Storage

Refer Amazon RDS DB instance storage and EBS.

When selecting General Purpose SSD or Provisioned IOPS SSD, RDS automatically stripes storage across multiple EBS volumes.
This enhances performance depending on the selected engine, and the amount of storage requested:

DB engine                  | Storage size      | Number of EBS volumes
MariaDB, MySQL, PostgreSQL | Less than 400 GiB | 1
MariaDB, MySQL, PostgreSQL | 400 to 65,536 GiB | 4
Db2                        | Less than 400 GiB | 1
Db2                        | 400 to 65,536 GiB | 4
Oracle                     | Less than 200 GiB | 1
Oracle                     | 200 to 65,536 GiB | 4
SQL Server                 | Any               | 1

When modifying a General Purpose SSD or Provisioned IOPS SSD volume, it goes through a sequence of states.
While the volume is in the optimizing state, volume performance is between the source and target configuration specifications.
Transitional volume performance will be no less than the lower of the two specifications.

When increasing allocated storage, the increase must be of at least 10%. Trying to increase the value by less than 10% results in an error.
The allocated storage cannot be increased when restoring RDS for SQL Server DB instances.
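
A minimal sketch of a valid increase, assuming a hypothetical 'some-db' instance currently at 100 GiB (so the new value must be at least 110 GiB):

# Increase the allocated storage by at least 10%.
aws rds modify-db-instance --db-instance-identifier 'some-db' \
  --allocated-storage '110' --apply-immediately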

Warning

The allocated storage size of any DB instance cannot be reduced after creation.

Decrease the storage size of DB instances by creating a new instance with lower provisioned storage size, then migrate the data into the new instance.
Use one of the following methods:

  • Use the database engine's native dump and restore method (see the sketch after this list).
    This will require long downtime.
  • Consider using transportable DBs when dealing with PostgreSQL DBs, should the requirements match.
    This will require some downtime.
  • Perform a homogeneous data migration using AWS DMS.
    This should require minimal downtime.
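
A minimal dump-and-restore sketch for the first option, assuming PostgreSQL, a hypothetical smaller 'target-instance' already created with the desired allocated storage, and that downtime for the whole duration is acceptable:

# Dump the DB from the old instance in custom format.
pg_dump -h 'source-instance.5f7mp3pt3n6e.eu-west-1.rds.amazonaws.com' -U 'admin' -d 'source_db' \
  -Fc -f 'source_db.dump'

# Restore it into the new, smaller instance.
# The target database must exist beforehand (e.g. created with 'createdb').
pg_restore -h 'target-instance.5f7mp3pt3n6e.eu-west-1.rds.amazonaws.com' -U 'admin' -d 'source_db' \
  --no-owner --clean --if-exists 'source_db.dump'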

RDS instances using GP2 storage can convert their volumes to GP3 just by modifying the DB instance.
This operation does not cause downtime, but performance will be impacted until the process ends, and the change triggers storage optimization for the instance.
Refer Changing RDS storage from gp2 to gp3.

Storage optimization

Refer Why is my Amazon RDS DB instance in the storage-optimization state for a long time?.

When modifying the storage size or type of a DB instance, that instance enters the storage-optimization state.
RDS automatically performs the storage optimization process and evenly distributes the data to the EBS volumes after storage modification.

Warning

One cannot make further storage modifications for either 6 hours, or until storage optimization completes on the instance (whichever is longer).
One can still perform any other instance modifications, such as scaling the instance size or performing a reboot.

In most cases, scaled storage does not cause outages or performance degradation. However, the storage optimization process typically takes several hours, and can take more than a day.
The impacted instance is operational and still available during the whole process, unless reboots are required for specific cases, such as a change to the storage type between SSD and magnetic disks.

Important

One cannot speed up storage optimization, and must wait for the process to complete.
The process takes longer for larger storage size increases and higher storage usage. Because it's automated, there is no real way to determine how long it will take to complete.

Storage encryption

RDS automatically integrates with AWS KMS for key management.

By default, RDS uses the RDS AWS managed key (aws/rds) from KMS for encryption.
This key can't be managed, rotated, nor deleted by users.

RDS will automatically put databases into a terminal state when access to the KMS key is required but the key has been disabled or deleted, or its permissions have been somehow revoked.
This change could be immediate or deferred depending on the use case that required access to the KMS key.
In this terminal state, DB instances are no longer available and their databases' current state can't be recovered. To restore DB instances, one must first re-enable access to the KMS key for RDS, and then restore the instances from their latest available backup.
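
Encryption can only be enabled when an instance is created (or restored). A minimal sketch for using a customer-managed key instead of aws/rds, assuming hypothetical identifiers and the key ARN used in the examples above:

# Create an encrypted instance using a customer-managed KMS key.
aws rds create-db-instance --db-instance-identifier 'some-db' \
  --engine 'postgres' --db-instance-class 'db.t4g.medium' \
  --master-username 'admin' --manage-master-user-password \
  --allocated-storage '100' \
  --storage-encrypted --kms-key-id 'arn:aws:kms:eu-west-1:012345678901:key/abcdef01-2345-6789-abcd-ef0123456789'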

Parameter Groups

Refer Working with parameter groups.

Used to specify how a DB is configured.

  • Static parameters require instances to be rebooted after a change for the new value to take effect.

  • Dynamic parameters are applied at runtime and do not require instances to reboot after changing.

    RDS instances using custom DB parameter groups allow for changes to values of dynamic parameters while running.
    Make changes by using the AWS Management Console, the AWS CLI, or the Amazon RDS API.

    If one has enough privileges to do so, one can also change parameter values by using the ALTER DATABASE, ALTER ROLE, and SET commands.

Learn about available parameters by describing the existing default ones:

aws rds describe-db-parameters --db-parameter-group-name 'default.postgres15'
aws rds describe-db-parameters --db-parameter-group-name 'default.postgres15' \
  --query "Parameters[?ParameterName=='shared_preload_libraries']" --output 'table'

aws rds describe-db-parameters --db-parameter-group-name 'default.postgres15' \
  --query "Parameters[?ParameterName=='shared_preload_libraries'].ApplyType" --output 'text'

aws rds describe-db-parameters --db-parameter-group-name 'default.postgres15' --output 'table' \
  --query "Parameters[?ApplyType!='dynamic']"

Option Groups

Used to enable and configure additional features and functionalities in a DB.
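
Option Groups are engine-specific; RDS for PostgreSQL mostly relies on Parameter Groups and extensions instead. A minimal sketch for listing and creating them, assuming a hypothetical Oracle 19 group:

# Show Option Groups.
aws rds describe-option-groups

# Create Option Groups.
aws rds create-option-group --option-group-name 'oracle19-options' \
  --engine-name 'oracle-ee' --major-engine-version '19' \
  --option-group-description 'Option group for Oracle EE 19'

# Add options to Option Groups.
aws rds add-option-to-option-group --option-group-name 'oracle19-options' \
  --options 'OptionName=STATSPACK' --apply-immediately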

Backup

RDS snapshot storage is calculated per Region.
Both the automated backups and manual DB snapshots for that Region contribute to the total value.
Moving snapshots to other Regions increases the backup storage value for the destination Regions.

Snapshots are stored in S3.

Should one choose to retain automated backups when deleting DB instances, those backups are saved for the full retention period; otherwise, all automated backups are deleted with the instance.
After automated backups are deleted, they cannot be recovered.

Should one choose to have RDS create a final DB snapshot before deleting a DB instance, one can use that or previously created manual snapshots to recover it.

Taking backups can be unbearably slow depending on the amount of data needing to be copied.
For reference, the first snapshot of a DB instance with standard 100 GiB gp3 storage took about 3h to complete.

Automatic backups

Automatic backups are storage volume snapshots of entire DB instances.

Automatic backups are enabled by default.
Setting the backup retention period to 0 disables them, setting it to a nonzero value (re)enables them.

Enabling automatic backups takes the affected instances offline to have a backup created immediately.
While the backup is created, the instance is kept in the Modifying state. This will block actions on the instance and could cause outages.

Automatic backups occur daily during the instances' backup window, configured in 30-minute periods. Should backups require more time than allotted to the backup window, they will continue after the window ends until they finish.

Backups are retained for up to 35 days (backup retention period).
One can recover DB instances to any point in time that sits inside the backup retention period.

The backup window must not overlap with the weekly maintenance window for the DB instance or Multi-AZ DB cluster.
During automatic backup windows, storage I/O might be suspended briefly while the backup process initializes. Initialization typically takes up to a few seconds. One might also experience elevated latencies for a few minutes during backups for Multi-AZ deployments.
For MariaDB, MySQL, Oracle and PostgreSQL Multi-AZ deployments, I/O activity isn't suspended on the primary instance as the backup is taken from the standby.
Automated backups might occasionally be skipped if instances or clusters are running heavy workloads at the time backups are supposed to start.

DB instances must be in the available state for automated backups to occur.
Automated backups don't occur while DB instances are in other states (e.g., storage_full).

Automated backups are not created while a DB instance or cluster is stopped.
RDS does not include time spent in the stopped state when the backup retention window is calculated. This means that backups can be retained longer than the backup retention period if a DB instance has been stopped.

Automated backups will not occur while a DB snapshot copy is running in the same AWS Region for the same database.
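
Retention period and backup window are instance-level settings. A minimal sketch, assuming a hypothetical 'some-db' instance:

# Set the retention period and backup window (UTC, hh24:mi-hh24:mi format).
aws rds modify-db-instance --db-instance-identifier 'some-db' \
  --backup-retention-period '7' --preferred-backup-window '03:00-03:30' --apply-immediately

# Disable automatic backups.
aws rds modify-db-instance --db-instance-identifier 'some-db' \
  --backup-retention-period '0' --apply-immediately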

Manual backups

Back up DB instances manually by creating DB snapshots.
The first snapshot contains the data for the full database. Subsequent snapshots of the same database are incremental.

One can copy both automatic and manual DB snapshots, but only share manual DB snapshots.

Manual snapshots never expire and are retained indefinitely.

One can store up to 100 manual snapshots per Region.
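
A minimal sketch for creating, copying, and sharing manual snapshots, assuming hypothetical identifiers:

# Create manual snapshots.
aws rds create-db-snapshot --db-instance-identifier 'some-db' --db-snapshot-identifier 'some-db-before-upgrade'

# Copy snapshots, e.g. to another Region (run against the destination Region).
aws rds copy-db-snapshot --region 'eu-central-1' \
  --source-db-snapshot-identifier 'arn:aws:rds:eu-west-1:012345678901:snapshot:some-db-before-upgrade' \
  --target-db-snapshot-identifier 'some-db-before-upgrade'

# Share manual snapshots with other accounts.
aws rds modify-db-snapshot-attribute --db-snapshot-identifier 'some-db-before-upgrade' \
  --attribute-name 'restore' --values-to-add '123456789012'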

Export snapshots to S3

One can export DB snapshot data to S3 buckets.
RDS spins up an instance from the snapshot, extracts data from it and stores the data in Apache Parquet format.
By default, all data in the snapshot is exported, but one can instead select specific sets of databases, schemas, or tables to export.

  • The export process runs in the background and does not affect the performance of active DB instances.

  • Multiple export tasks for the same DB snapshot cannot run simultaneously. This applies to both full and partial exports.

  • Exporting snapshots from DB instances that use magnetic storage isn't supported.

  • The following characters aren't supported in table column names:

    , ; { } ( ) \n \t = (space) /
    

    Tables containing those characters in column names are skipped during export.

  • PostgreSQL temporary and unlogged tables are skipped during export.

  • Large objects in the data, like BLOBs or CLOBs, close to or greater than 500 MB will make the export fail.

  • Large rows close to or greater than 2 GB will cause their table to be skipped during export.

  • Data exported from snapshots to S3 cannot be restored to new DB instances.

  • The snapshot export tasks require a role with write-access permission to the destination S3 bucket:

    {
      "Version": "2012-10-17",
      "Statement": [{
          "Effect": "Allow",
          "Action": "sts:AssumeRole",
          "Principal": {
            "Service": "export.rds.amazonaws.com"
          }
      }]
    }
    
    {
      "Version": "2012-10-17",
      "Statement": [{
        "Effect": "Allow",
        "Action": [
            "s3:PutObject*",
            "s3:ListBucket",
            "s3:GetObject*",
            "s3:DeleteObject*",
            "s3:GetBucketLocation"
        ],
        "Resource": [
            "arn:aws:s3:::bucket",
            "arn:aws:s3:::bucket/*"
        ]
      }]
    }
    

After the export, one can analyze the data directly through Athena or Redshift Spectrum.

In the Console

The Export to Amazon S3 console option appears only for snapshots that can be exported to Amazon S3.
Snapshots might not be available for export because of the following reasons:

  • The DB engine isn't supported for S3 export.
  • The DB instance version isn't supported for S3 export.
  • S3 export isn't supported in the AWS Region where the snapshot was created.
Using the CLI
# Start new tasks.
$ aws rds start-export-task \
  --export-task-identifier 'db-finalSnapshot-2024' \
  --source-arn 'arn:aws:rds:eu-west-1:012345678901:snapshot:db-prod-final-2024' \
  --s3-bucket-name 'backups' --s3-prefix 'rds' \
  --iam-role-arn 'arn:aws:iam::012345678901:role/CustomRdsS3Exporter' \
  --kms-key-id 'arn:aws:kms:eu-west-1:012345678901:key/abcdef01-2345-6789-abcd-ef0123456789'
{
  "ExportTaskIdentifier": "db-finalSnapshot-2024",
  "IamRoleArn": "arn:aws:iam::012345678901:role/CustomRdsS3Exporter",
  "KmsKeyId": "arn:aws:kms:eu-west-1:012345678901:key/abcdef01-2345-6789-abcd-ef0123456789",
  "PercentProgress": 0,
  "S3Bucket": "backups",
  "S3Prefix": "rds",
  "SnapshotTime": "2024-06-17T09:04:41.387000+00:00",
  "SourceArn": "arn:aws:rds:eu-west-1:012345678901:snapshot:db-prod-final-2024",
  "Status": "STARTING",
  "TotalExtractedDataInGB": 0
}

# Get tasks' status.
$ aws rds describe-export-tasks
$ aws rds describe-export-tasks --export-task-identifier 'db-finalSnapshot-2024'
$ aws rds describe-export-tasks --query 'ExportTasks[].WarningMessage' --output 'yaml'

# Cancel tasks.
$ aws rds cancel-export-task --export-task-identifier 'my_export'
{
    "Status": "CANCELING",
    "S3Prefix": "",
    "ExportTime": "2019-08-12T01:23:53.109Z",
    "S3Bucket": "DOC-EXAMPLE-BUCKET",
    "PercentProgress": 0,
    "KmsKeyId": "arn:aws:kms:AWS_Region:123456789012:key/K7MDENG/bPxRfiCYEXAMPLEKEY",
    "ExportTaskIdentifier": "my_export",
    "IamRoleArn": "arn:aws:iam::123456789012:role/export-to-s3",
    "TotalExtractedDataInGB": 0,
    "TaskStartTime": "2019-11-13T19:46:00.173Z",
    "SourceArn": "arn:aws:rds:AWS_Region:123456789012:snapshot:export-example-1"
}

Restore

Since RDS does not allow physical access to its managed instances, one cannot restore physical backups.
It does allow restoring logical backups, though.

Warning

RDS does not restore data in the strictest sense of the word, e.g. by rolling it back or replacing it in the same RDS DB instance.
Instead, the service forces users to create a new RDS DB instance from the desired backup point.

Should one want to replace the data in an existing RDS DB instance, they will need to do one of the following:

  • Restore a logical backup via other means (e.g., pg_restore).
  • Replace the RDS DB instance with a new one from the desired backup.

If an RDS DB instance has automated backups enabled, one can use it as source to create a new RDS DB instance that has the same attributes and data up to a specific point in time.
This does not modify the source DB instance.

Refer Restoring a DB instance to a specified time for Amazon RDS.

One can restore to any point in time within the source RDS DB instance's automatic backup retention period.

Restored DB instances are automatically associated with the default DB parameter and option groups, unless one specifies a custom parameter group and/or option group during the restore process.

If the source DB instance has resource tags, RDS adds them by default to the restored DB instance.

DB instances can be restored from DB snapshots.
This requires the new instances to have equal or more allocated storage than what the original instance had allocated at the time the snapshot was taken.

aws rds restore-db-instance-from-db-snapshot \
  --db-instance-identifier 'myNewDbInstance' --db-snapshot-identifier 'myDbSnapshot'

aws rds restore-db-instance-to-point-in-time \
  --target-db-instance-identifier 'myNewDbInstance' --source-db-instance-identifier 'oldDbInstance' \
  --use-latest-restorable-time

Should the snapshot used as source come from an instance that had automatic backups enabled, the restored DB instance will have automatic backups enabled too, and will back itself up right after creation.
Refer the Backup section for what this means.

There is currently no way to prevent the backup being generated at instance creation time.
That process is triggered automatically and the feature can only be toggled on and off for existing instances.
Refer Disabling AWS RDS backups when creating/updating instances?.

The BackupRetentionPeriod flag is part of both instances and snapshot definitions, but can only be configured for instances.
To create instances with this flag set to 0 from snapshots, and thus have no backup automatically taken, the source snapshot must have this flag already set to 0. This can only happen if the original instance was configured that way when the snapshot was taken in the first place.
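
A quick way to check whether an instance will back itself up is to read its configured retention period. A minimal sketch, assuming a hypothetical 'some-db' identifier:

# Check the backup retention period configured on an instance.
aws rds describe-db-instances --db-instance-identifier 'some-db' \
  --query 'DBInstances[].BackupRetentionPeriod' --output 'text'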

Multi-AZ instances

Refer Multi-AZ DB instance deployments for Amazon RDS.

DB instances can be configured for high availability and failover support by using Multi-AZ deployments.

RDS provisions and maintains a synchronous standby replica in a different AZ, which continuously syncs with the primary DB instance.
This provides data redundancy, minimizes latency spikes during system backups, enhances availability during planned system maintenance, and helps protect the database against DB instance failure and AZ disruption.

Important

One cannot use the standby replica to serve read traffic.
To serve read-only traffic, use a Multi-AZ DB cluster or a read replica instance instead.

Multi-AZ DB instance deployments have increased costs, and write and commit latency compared to Single-AZ deployments due to the synchronous data replication to the standby replica.

A Single-AZ configuration deploys a DB instance and its EBS storage volumes in one AZ.
A Multi-AZ configuration deploys a DB instance and its EBS storage volumes across two or more AZs.

RDS uses several different technologies to provide failover support. MariaDB, MySQL, Oracle, PostgreSQL, and RDS Custom for SQL Server DB instances use Amazon's failover technology, while Microsoft SQL Server DB instances use SQL Server Database Mirroring or Always On Availability Groups.

If an infrastructure defect results in any outage of a Multi-AZ DB instance, RDS automatically switches to the standby replica.
The failover typically takes 60 to 120 seconds, but it depends on the database activity and other conditions at the time the primary DB instance became unavailable. Large transactions or a lengthy recovery process can increase failover time.
When the failover is complete, it can take additional time for the RDS console to reflect the new AZ.

For a Multi-AZ deployment, RDS creates DB snapshots and automated backups from the secondary instance during the automatic backup window. This prevents the backup process from suspending I/O activity on the primary instance on all engines but SQL Server.
In a Single-AZ deployment, the backup process does result in a brief I/O suspension that can last from a few seconds to a few minutes. The amount of time depends on the size and class of the DB instance.

A Single-AZ instance is unavailable during a scaling operation.
For Multi-AZ deployments, RDS achieves minimal downtime during certain OS patches or scaling operations by:

  1. Applying OS maintenance and scaling operations to the secondary instance first.
  2. Promoting the secondary instance to primary, and demoting the old primary instance to secondary.
  3. Performing maintenance or modifications on the old primary, now secondary instance.

Converting instances between Multi-AZ and Single-AZ

Refer What happens when I change my RDS DB instance from a Single-AZ to a Multi-AZ deployment or a Multi-AZ to a Single-AZ deployment? and When modifying a Multi-AZ RDS to Single Instance the AZ was changed!!.

One can convert existing Single-AZ DB instances to Multi-AZ deployments just by modifying the DB instance.
This process involves minimal to no downtime, but requires planning around storage and performance impacts if done on active instances.

During a Single-AZ to Multi-AZ conversion, RDS:

  1. Takes a snapshot of the primary DB instance's EBS volumes.
  2. Creates new volumes for the standby replica from that snapshot in another AZ.
  3. Turns on synchronous block-level replication between the volumes of the primary and standby replicas.
  4. Creates the new standby replica instance in the AZ where the volumes were created, and attaches them to it.

One can convert existing Multi-AZ DB instances to Single-AZ deployments just by modifying the DB instance.
This process involves minimal to no downtime, but requires planning around storage and performance impacts if done on active instances.

During a Multi-AZ to Single-AZ conversion, RDS typically keeps the instance in the AZ where the primary was located and deletes only the secondary instance and volumes. The change does not typically affect the primary instance.
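
Both directions are plain instance modifications. A minimal sketch, assuming a hypothetical 'some-db' instance:

# Convert a Single-AZ instance to Multi-AZ.
aws rds modify-db-instance --db-instance-identifier 'some-db' --multi-az --apply-immediately

# Convert a Multi-AZ instance back to Single-AZ.
aws rds modify-db-instance --db-instance-identifier 'some-db' --no-multi-az --apply-immediately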

Operations

PostgreSQL: reduce allocated storage by migrating using transportable databases

Refer Migrating databases using RDS PostgreSQL Transportable Databases, Transporting PostgreSQL databases between DB instances and Transport PostgreSQL databases between two Amazon RDS DB instances using pg_transport.

The pg_transport extension enables streaming the database files with minimal processing by making a target DB instance import a database from a source DB instance.

When the transport begins, all current sessions on the source database are ended and the DB is put in read-only mode.
Only the specific source database being transported is affected; other databases on the instance are not.

Primary instances with replicas can be used as source instances.
TODO: test using a RO replica as the source instance. I expect this will not work due to the transport extension putting the source DB in RO mode.

The in-transit database will be inaccessible on the target DB instance for the duration of the transport.
During transport, the target DB instance cannot be restored to a point in time, as the transport is not transactional and does not use the PostgreSQL write-ahead log to record changes.

Limitations
  • The access privileges (including the default ones) and ownership from the source database are not transferred to the target instance.
    Dump them from the source, or (preferred) keep SQL files with their definitions at hand so they can be recreated in other ways.

  • Databases cannot be transported onto read replicas or parent instances of read replicas.
    They can be read from instances with replicas, though.

  • reg data types cannot be used in any table of a source database that is about to be transported.

  • There can be up to 32 total transports (including both imports and exports) active at the same time on any DB instance.

  • All the DB's data is migrated as is.

  • Triggers and functions are apparently not transported either.
    Noticed after a production DB migration.

  • All extensions must be dropped from the source database.

    This means that, for some extensions, the data they manage is also dropped.

Requirements
  • A source DB to copy data from.

  • A target instance to copy the DB to.

    Since the transport will create the DB on the target, the target instance must not contain the database that needs to be transported.
    Should the target contain the DB already, it will need to be dropped beforehand.

  • Both DB instances must run the same major version of PostgreSQL.
    Differences in minor versions seem to be fine.

  • Should the source DB have the pgaudit extension loaded, that extension will need to be installed on the target instance.

  • The target instance must be able to connect to the source instance.

  • All source database objects must reside in the default pg_default tablespace.

  • The source DB (but not other DBs on the same source instance) will need to:

    • Be put in Read Only mode (automatic, done during transport).
    • Have all installed extensions removed.

To avoid locking the operator's machine for the time needed by the transport, it is suggested to use an EC2 instance as a middleman to operate on both DBs.

Keep the DB identifiers under 22 characters.
The pg_transport extension truncates any host argument to 63 characters, and RDS FQDNs are something like {{instance-id}}.{{12-char-internal-id}}.{{region}}.rds.amazonaws.com.

Procedure
  1. Enable the required configuration parameters and pg_transport extension on the source and target RDS instances.
    Create a new RDS Parameter Group or modify the existing one used by the source.

    Required parameters:

    • shared_preload_libraries must include pg_transport.
      Static parameter, requires reboot.
    • pg_transport.num_workers must be tuned.
      Its value determines the number of transport.send_file workers that will be created in the source. Defaults to 3.
    • max_worker_processes must be at least (3 * pg_transport.num_workers) + 9.
      Required on the destination to handle various background worker processes involved in the transport.
      Static parameter, requires reboot.
    • pg_transport.work_mem can be tuned.
      Specifies the maximum memory to allocate to each worker. Defaults to 131072 (128 MB) or 262144 (256 MB) depending on the PostgreSQL version.
    • pg_transport.timing can be set to 1.
      Specifies whether to report timing information during the transport. Defaults to 1 (true), meaning that timing information is reported.
  2. Assign the Parameter Group to the source instance and reboot it to apply static changes.
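
    A minimal sketch, assuming a hypothetical 'source-instance' identifier and the 'pg15-source-transport-group' group from the examples above:

    aws rds modify-db-instance --db-instance-identifier 'source-instance' \
      --db-parameter-group-name 'pg15-source-transport-group' --apply-immediately
    aws rds reboot-db-instance --db-instance-identifier 'source-instance'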

  3. Create a new target instance with the required allocated storage.
    Check the requirements again.

  4. Make sure the middleman can connect to both DBs.

  5. Make sure the target DB instance can connect to the source.

  6. Make sure one has a way to reinstate existing roles and permissions onto the target.
    Dump existing roles and permissions from the source if required on the target.

    RDS does not grant full SuperUser permissions even to instances' master users. This makes it impossible to use pg_dumpall -r to fully dump roles and permissions from the source.
    One can export them by excluding the passwords from the dump:

    pg_dumpall -h 'source-instance.5f7mp3pt3n6e.eu-west-1.rds.amazonaws.com' -U 'admin' -l 'postgres' -W \
      -rf 'roles.sql' --no-role-passwords
    

    but statements involving protected roles (like rdsadmin and any other matching rds_*) and changes to 'superuser' or 'replication' attributes will fail on restore.
    Clean them up from the dump:

    # Ignore *everything* involving the 'rdsadmin' user.
    # Ignore the creation or alteration of AWS-managed RDS roles.
    # Ignore changes involving protected attributes.
    sed -Ei'.backup' \
      -e '/rdsadmin/d' \
      -e '/(CREATE|ALTER) ROLE rds_/d' \
      -e 's/(NO)?(SUPERUSER|REPLICATION)\s?//g' \
      'roles.sql'
    
  7. Prepare the source DB for transport:

    1. Connect to the DB:

      psql -h 'source-instance.5f7mp3pt3n6e.eu-west-1.rds.amazonaws.com' -p '5432' -d 'source_db' -U 'admin' --password
      
    2. Only the pg_transport extension is allowed in the source DB during the actual transport operation.
      Remove all extensions but pg_transport from the public schema of the DB instance:

      SELECT "extname" FROM "pg_extension";
      DROP EXTENSION IF EXISTS "btree_gist", "pgcrypto", "postgis" CASCADE;
      
    3. Load the pg_transport extension if missing:

      CREATE EXTENSION IF NOT EXISTS "pg_transport";
      
  8. Prepare the target DB for transport:

    1. The instance must not contain a DB with the same name as the source's, as the transport will create it on the target.
      Connect to a different DB than the source's:

      psql -h 'target-instance.5f7mp3pt3n6e.eu-west-1.rds.amazonaws.com' -p '5432' -d 'postgres' -U 'admin' --password
      
    2. Make sure no DB exists with the same name as the source DB:

      DROP DATABASE IF EXISTS "source_db";
      
    3. Load the pg_transport extension if missing:

      CREATE EXTENSION IF NOT EXISTS "pg_transport";
      
  9. [optional] Test the transport by running the transport.import_from_server function on the target DB instance:

    -- Keep arguments in *single* quotes here
    SELECT transport.import_from_server(
      'source-instance.5f7mp3pt3n6e.eu-west-1.rds.amazonaws.com', 5432,
      'admin', 'source-user-password', 'source_db',
      'target-user-password',
      true
    );
    
  10. Run the actual transport by running the transport.import_from_server function on the target DB instance, using the same arguments as the test above but with the last (dry run) argument set to false:

    SELECT transport.import_from_server( , , , , , , false );
    
  11. Validate the data in the target.

  12. Restore uninstalled extensions in the public schema of both DB instances.
    pg_transport can be now dropped if not necessary anymore.

  13. Restore all the needed roles and permissions onto the target:

    psql -h 'target-instance.5f7mp3pt3n6e.eu-west-1.rds.amazonaws.com' -p '5432' -U 'admin' -d 'postgres' --password \
      -f 'roles.sql'
    

    Restoring roles from raw dumps will throw a lot of errors about altering superuser attributes or protected roles. Check the list item about dumping data above.

  14. Revert the value of the max_worker_processes parameter if necessary.
    This will require a restart of the instance.


If the target DB instance has automatic backups enabled, a backup is automatically taken after transport completes.
Point-in-time restores will be available for times after the backup finishes.

Should the transport fail, the pg_transport extension will attempt to undo all changes to the source and target DB instances. This includes removing the destination's partially transported database.
Depending on the type of failure, the source database might continue to reject write-enabled queries. Should this happen, allow write-enabled queries manually:

ALTER DATABASE db-name SET default_transaction_read_only = false;
Performance tests
db.t4g.medium to db.t4g.medium, gp3 storage, ~ 350 GB database

Interruptions are due to the exhaustion of I/O burst credits, which tainted the benchmark.

                         | 1st run             | 2nd run             | 3rd and 6th runs    | 4th run             | 5th run
pg_transport.num_workers | 2                   | 4                   | 8                   | 8                   | 12
max_worker_processes     | 15                  | 21                  | 33                  | 33                  | 45
pg_transport.work_mem    | 131072 (128 MB)     | 131072 (128 MB)     | 131072 (128 MB)     | 262144 (256 MB)     | 131072 (128 MB)
Minimum transfer rate    | ~ 19 MB/s           | ~ 19 MB/s           | ~ 50 MB/s           | ~ 4 MB/s            | ~ 25 MB/s
Maximum transfer rate    | ~ 58 MB/s           | ~ 95 MB/s           | ~ 255 MB/s          | ~ 255 MB/s          | ~ 165 MB/s
Average transfer rate    | ~ 31 MB/s           | ~ 66 MB/s           | ~ 138 MB/s          | ~ 101 MB/s          | ~ 85 MB/s
Time estimated after 10m | ~ 3h 13m            | ~ 1h 36m            | ~ 52m               | ~ 1h                | ~ 1h 11m
Time taken               | N/A (interrupted)   | N/A (interrupted)   | N/A (interrupted)   | N/A (interrupted)   | N/A (interrupted)
Source CPU usage         | ~ 10%               | ~ 15%               | ~ 40%               | ~ 39%               | ~ 37%
Source RAM usage delta   | N/A (did not check) | N/A (did not check) | + ~ 1.5 GB          | N/A (did not check) | N/A (did not check)
Target CPU usage         | ~ 12%               | ~ 18%               | ~ 34%               | ~ 28%               | ~ 25%
Target RAM usage delta   | N/A (did not check) | N/A (did not check) | + ~ 1.5 GB          | N/A (did not check) | N/A (did not check)
db.m6i.xlarge to db.m6i.xlarge, gp3 storage, ~ 390 GB database
                         | 1st run         | 2nd to 5th run
pg_transport.num_workers | 8               | 16
max_worker_processes     | 33              | 57
pg_transport.work_mem    | 131072 (128 MB) | 131072 (128 MB)
Minimum transfer rate    | ~ 97 MB/s       | ~ 248 MB/s
Maximum transfer rate    | ~ 155 MB/s      | ~ 545 MB/s
Average transfer rate    | ~ 135 MB/s      | ~ 490 MB/s
Time estimated after 10m | ~ 46m           | ~ 14m
Time taken               | ~ 48m           | ~ 14m
Source CPU usage         | ~ 12%           | ~ 42%
Source RAM usage delta   | + ~ 940 MB      | + ~ 1.5 GB
Target CPU usage         | ~ 17%           | ~ 65%
Target RAM usage delta   | + ~ 1.3 GB      | + ~ 3.3 GB

Stop instances

Refer Stopping an Amazon RDS DB instance temporarily.

RDS instances can be stopped for up to 7 days only.
The service automatically starts DB instances that have been stopped for 7 consecutive days, so that they do not fall behind required maintenance updates.

One can still stop and start DB instances on a schedule via Step Functions.
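
A minimal sketch, assuming a hypothetical 'some-db' instance:

# Stop instances, optionally taking a snapshot first.
aws rds stop-db-instance --db-instance-identifier 'some-db' --db-snapshot-identifier 'some-db-before-stop'

# Start them again.
aws rds start-db-instance --db-instance-identifier 'some-db'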

Cancel pending modifications

Refer How do I cancel pending maintenance in Amazon RDS for PostgreSQL?.

Cancel maintenance actions

Explicitly issue a new pending maintenance action with opt-in-type set to undo-opt-in.

# FIXME: check
$ aws rds describe-pending-maintenance-actions --resource-identifier 'some-db' \
  --query 'PendingMaintenanceActions[]' --output 'yaml'
- ResourceIdentifier: arn:aws:rds:ap-southeast-2:123456789:db:testsnapshot,
  PendingMaintenanceActionDetails:
  - Action: system-update
    OptInStatus: next-maintenance
    CurrentApplyDate: 2024-07-10T12:51:00+00:00
    Description: New Operating System update is available
$ aws rds apply-pending-maintenance-action --resource-identifier 'some-db' \
  --apply-action 'system-update' --opt-in-type 'undo-opt-in' \
  --query 'PendingMaintenanceActions[]' --output 'yaml'
- {}
Cancel instance class change

Explicitly issue a new immediate modification with the current instance settings.

$ aws rds describe-db-instances --db-instance-identifier 'some-db' \
  --query 'DBInstances[*].PendingModifiedValues' --output 'yaml'
- DBInstanceClass: db.t3.medium
$ aws rds modify-db-instance --db-instance-identifier 'some-db' \
  --db-instance-class 'db.t3.small' --apply-immediately \
  --query 'DBInstances[*].PendingModifiedValues' --output 'yaml'
- {}

Troubleshooting

ERROR: extension must be loaded via shared_preload_libraries

Refer How can I resolve the "ERROR: <module/extension> must be loaded via shared_preload_libraries" error?

  1. Include the module or extension in the shared_preload_libraries parameter in the Parameter Group.
  2. Reboot the instance to apply the change.
  3. Try reloading it again.

ERROR: must be superuser to alter X roles or change X attribute

Error message examples:

ERROR: must be superuser to alter superuser roles or change superuser attribute
ERROR: must be superuser to alter replication roles or change replication attribute

RDS does not grant full SuperUser permissions even to instances' master users.
Actions involving altering protected roles or changing protected attributes are practically blocked on RDS.

Transport fails stating the remote user must have superuser, but it already does

Error message example
Cannot execute SQL 'SELECT transport.import_from_server(
  'source.ab0123456789.eu-west-1.rds.amazonaws.com',
  5432,
  'mastarr',
  '********',
  'sales',
  '********',
  true
);' None: remote user must have superuser (or rds_superuser if on RDS)
Speculative Root cause

RDS did not finish applying the settings properly.

Solution

Reboot the source and target instance and retry.

The instance is unbearably slow

Root cause

The instance might be out of burst credits.

If the available burst credits are depleted or zero, the workload's CPU, storage, or network throughput (e.g., heavy reads or writes) has been exceeding the instance type's baseline quotas.

Solution
  • Lower the throughput utilization, and/or
  • Scale up to an instance type that has a higher baseline and maximum throughput.
    Refer Amazon EBS-optimized instance types to choose which one.
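
A hedged way to confirm the suspicion is to check the instance's burst balances in CloudWatch. A sketch, assuming a hypothetical 'some-db' instance (GNU date syntax for the time range):

# EBS burst balance (gp2 volumes) over the last hour.
aws cloudwatch get-metric-statistics --namespace 'AWS/RDS' --metric-name 'BurstBalance' \
  --dimensions 'Name=DBInstanceIdentifier,Value=some-db' \
  --start-time "$(date -u -d '1 hour ago' '+%Y-%m-%dT%H:%M:%SZ')" --end-time "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" \
  --period '300' --statistics 'Average'

# CPU credit balance for T instances.
aws cloudwatch get-metric-statistics --namespace 'AWS/RDS' --metric-name 'CPUCreditBalance' \
  --dimensions 'Name=DBInstanceIdentifier,Value=some-db' \
  --start-time "$(date -u -d '1 hour ago' '+%Y-%m-%dT%H:%M:%SZ')" --end-time "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" \
  --period '300' --statistics 'Average'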

Further readings

Sources