Amazon Relational Database Service

  1. TL;DR
  2. Storage
  3. Parameter Groups
  4. Option Groups
  5. Backup
    1. Automatic backups
    2. Manual backups
    3. Export snapshots to S3
  6. Restore
  7. Encryption
  8. Operations
    1. PostgreSQL
      1. Reduce allocated storage by migrating using transportable databases
  9. Troubleshooting
    1. ERROR: extension must be loaded via shared_preload_libraries
    2. ERROR: must be superuser to alter superuser roles or change superuser attribute
  10. Further readings
    1. Sources

TL;DR

CLI usage
# Show RDS instances.
aws rds describe-db-instances
aws rds describe-db-instances --output 'json' --query "DBInstances[?(DBInstanceIdentifier=='master-prod')]"

# Show Parameter Groups' parameters.
aws rds describe-db-parameters --db-parameter-group-name 'default.postgres15'

# Create Parameter Groups.
aws rds create-db-parameter-group --db-parameter-group-name 'pg15-source-transport-group' \
  --db-parameter-group-family 'postgres15' --description 'Parameter group with transport parameters enabled'

# Modify Parameter Groups.
aws rds modify-db-parameter-group --db-parameter-group-name 'pg15-source-transport-group' \
  --parameters \
    'ParameterName=pg_transport.num_workers,ParameterValue=4,ApplyMethod=pending-reboot' \
    'ParameterName=pg_transport.timing,ParameterValue=1,ApplyMethod=pending-reboot' \
    'ParameterName=pg_transport.work_mem,ParameterValue=131072,ApplyMethod=pending-reboot' \
    'ParameterName=shared_preload_libraries,ParameterValue="pg_stat_statements,pg_transport",ApplyMethod=pending-reboot' \
    'ParameterName=max_worker_processes,ParameterValue=24,ApplyMethod=pending-reboot'

# Restore instances from snapshots.
aws rds restore-db-instance-from-db-snapshot \
  --db-instance-identifier 'myNewDbInstance' \
  --db-snapshot-identifier 'myOldDbSnapshot'

# Start export tasks.
aws rds start-export-task \
  --export-task-identifier 'db-finalSnapshot-2024' \
  --source-arn 'arn:aws:rds:eu-west-1:012345678901:snapshot:db-prod-final-2024' \
  --s3-bucket-name 'backups' \
  --iam-role-arn 'arn:aws:iam::012345678901:role/CustomRdsS3Exporter' \
  --kms-key-id 'arn:aws:kms:eu-west-1:012345678901:key/abcdef01-2345-6789-abcd-ef0123456789'

# Get export tasks' status.
aws rds describe-export-tasks
aws rds describe-export-tasks --export-task-identifier 'my-snapshot-export'

# Cancel tasks.
aws rds cancel-export-task --export-task-identifier 'my_export'

Read replicas can be promoted to standalone DB instances.
See Working with DB instance read replicas.

Disk free metrics are available in CloudWatch.

One can choose any of the following retention periods for instances' Performance Insights data:

  • 7 days (default, free tier).
  • n months, where n is a number from 1 to 24.
    In the CLI and IaC, the retention period is expressed in days and must equal n*31.
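
For example, a hedged CLI call setting a 7-month retention (7*31 = 217 days); the instance identifier is a placeholder:

aws rds modify-db-instance --db-instance-identifier 'master-prod' \
  --enable-performance-insights --performance-insights-retention-period 217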

Storage

Refer Amazon RDS DB instance storage.

When selecting General Purpose SSD or Provisioned IOPS SSD, RDS automatically stripes storage across multiple volumes to enhance performance depending on the engine selected and the amount of storage requested:

DB engine                  | Storage size      | Number of volumes provisioned
Db2                        | Less than 400 GiB | 1
Db2                        | 400 to 65,536 GiB | 4
MariaDB, MySQL, PostgreSQL | Less than 400 GiB | 1
MariaDB, MySQL, PostgreSQL | 400 to 65,536 GiB | 4
Oracle                     | Less than 200 GiB | 1
Oracle                     | 200 to 65,536 GiB | 4
SQL Server                 | Any               | 1

When modifying a General Purpose SSD or Provisioned IOPS SSD volume, it goes through a sequence of states.
While the volume is in the optimizing state, volume performance is between the source and target configuration specifications.
Transitional volume performance will be no less than the lower of the two specifications.

When increasing allocated storage, the increase must be of at least 10% of the current value. Trying to increase it by less than 10% will result in an error.
The allocated storage cannot be increased when restoring RDS for SQL Server DB instances.

The allocated storage size of any DB instance cannot be lowered after creation.

Decrease the storage size of DB instances by creating a new instance with a lower provisioned storage size, then migrating the data into it.
Use one of the following methods:

  • Use the database engine's native dump and restore method (a sketch follows this list).
    This will require a long downtime.
  • Consider using transportable DBs when dealing with PostgreSQL DBs, should the requirements match.
    This will require some downtime.
  • Perform a homogeneous data migration using AWS DMS.
    This should require minimal downtime.
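
A minimal sketch of the first method for PostgreSQL (endpoints and names are placeholders; the target database must already exist on the new instance):

# Dump from the old, larger instance.
pg_dump -h 'old-instance.example.eu-west-1.rds.amazonaws.com' -U 'admin' -d 'app-db' -Fc -f 'app-db.dump'

# Restore into the new instance provisioned with less storage.
pg_restore -h 'new-instance.example.eu-west-1.rds.amazonaws.com' -U 'admin' -d 'app-db' --no-owner 'app-db.dump'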

Parameter Groups

Refer Working with parameter groups.

Used to specify how a DB is configured.

  • Static parameters require instances to be rebooted after a change for the new value to take effect.

  • Dynamic parameters are applied at runtime and do not require instances to reboot after changing.

    RDS instances using custom DB parameter groups allow for changes to values of dynamic parameters while running.
    Make changes by using the AWS Management Console, the AWS CLI, or the Amazon RDS API.

    If one has enough privileges to do so, one can also change parameter values by using the ALTER DATABASE, ALTER ROLE, and SET commands.

Learn about available parameters by describing the existing default ones:

aws rds describe-db-parameters --db-parameter-group-name 'default.postgres15'
aws rds describe-db-parameters --db-parameter-group-name 'default.postgres15' \
  --query "Parameters[?ParameterName=='shared_preload_libraries']" --output 'table'

aws rds describe-db-parameters --db-parameter-group-name 'default.postgres15' \
  --query "Parameters[?ParameterName=='shared_preload_libraries'].ApplyType" --output 'text'

aws rds describe-db-parameters --db-parameter-group-name 'default.postgres15' --output 'table' \
  --query "Parameters[?ApplyType!='dynamic']"
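
A hedged example of attaching a custom Parameter Group to an instance and rebooting it so static parameters take effect (identifiers are placeholders):

aws rds modify-db-instance --db-instance-identifier 'master-prod' \
  --db-parameter-group-name 'pg15-source-transport-group' --apply-immediately
aws rds reboot-db-instance --db-instance-identifier 'master-prod'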

Option Groups

Used to enable and configure additional features and functionalities in a DB.
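
A hedged example of creating an Option Group, enabling an option in it, and attaching it to an instance (names, engine, and option are examples):

# Create the Option Group.
aws rds create-option-group --option-group-name 'mysql80-audit' \
  --engine-name 'mysql' --major-engine-version '8.0' \
  --option-group-description 'MySQL 8.0 with the audit plugin enabled'

# Enable an option in it.
aws rds add-option-to-option-group --option-group-name 'mysql80-audit' \
  --options 'OptionName=MARIADB_AUDIT_PLUGIN' --apply-immediately

# Attach it to an instance.
aws rds modify-db-instance --db-instance-identifier 'master-prod' --option-group-name 'mysql80-audit'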

Backup

RDS backup storage for each Region is calculated from both the automated backups and manual DB snapshots for that Region.
Moving snapshots to other Regions increases the backup storage in the destination Regions.

Backups are stored in S3.

Should one choose to retain automated backups when deleting DB instances, those backups are saved for the full retention period; otherwise, all automated backups are deleted with the instance.
After automated backups are deleted, they cannot be recovered.

Should one choose to have RDS create a final DB snapshot before deleting a DB instance, one can use that or previously created manual snapshots to recover it.

Automatic backups

Automatic backups are storage volume snapshots of entire DB instances.

Automatic backups are enabled by default.
Setting the backup retention period to 0 disables them, setting it to a nonzero value (re)enables them.

Enabling automatic backups takes the affected instances offline to create a backup immediately.
This will cause an outage.
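
For example, a hedged CLI call (re)enabling automatic backups with a 7-day retention (the instance identifier is a placeholder):

aws rds modify-db-instance --db-instance-identifier 'master-prod' \
  --backup-retention-period 7 --apply-immediately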

Automatic backups occur daily during the instances' backup window, configured in 30 minute periods. Should backups require more time than allotted to the backup window, they will continue after the window ends and until they finish.

Backups are retained for up to 35 days (backup retention period).
One can recover DB instances to any point in time from the backup retention period.
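
A hedged point-in-time restore via the CLI (identifiers and timestamp are placeholders):

aws rds restore-db-instance-to-point-in-time \
  --source-db-instance-identifier 'master-prod' \
  --target-db-instance-identifier 'master-prod-pitr' \
  --restore-time '2024-06-17T09:00:00Z'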

The backup window can't overlap with the weekly maintenance window for the DB instance or Multi-AZ DB cluster.
During automatic backup windows, storage I/O might be suspended briefly while the backup process initializes. Initialization typically takes up to a few seconds. One might also experience elevated latencies for a few minutes during backups for Multi-AZ deployments.
For MariaDB, MySQL, Oracle and PostgreSQL Multi-AZ deployments, I/O activity isn't suspended on the primary instance as the backup is taken from the standby.
Automated backups might occasionally be skipped if instances or clusters are running heavy workloads at the time backups are supposed to start.

DB instances must be in the available state for automated backups to occur.
Automated backups don't occur while DB instances are in other states (e.g., storage_full).

Automated backups aren't created while a DB instance or cluster is stopped.
RDS doesn't include time spent in the stopped state when the backup retention window is calculated. This means backups can be retained longer than the backup retention period if a DB instance has been stopped.

Automated backups don't occur while a DB snapshot copy is running in the same AWS Region for the same database.

Manual backups

Back up DB instances manually by creating DB snapshots.
The first snapshot contains the data for the full database. Subsequent snapshots of the same database are incremental.

One can copy both automatic and manual DB snapshots, but only share manual DB snapshots.

Manual snapshots never expire and are retained indefinitely.

One can store up to 100 manual snapshots per Region.
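
Hedged examples of creating and sharing a manual snapshot via the CLI (identifiers and account ID are placeholders):

# Create a manual snapshot.
aws rds create-db-snapshot --db-instance-identifier 'master-prod' \
  --db-snapshot-identifier 'master-prod-manual-2024'

# Share the manual snapshot with another AWS account.
aws rds modify-db-snapshot-attribute --db-snapshot-identifier 'master-prod-manual-2024' \
  --attribute-name 'restore' --values-to-add '123456789012'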

Export snapshots to S3

One can export DB snapshot data to S3 buckets.
RDS spins up an instance from the snapshot, extracts data from it, and stores the data in Apache Parquet format.
By default, all data in the snapshot is exported, but one can limit the export to specific databases, schemas, or tables.
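
E.g., a hedged partial export limited to a single schema, reusing this document's placeholder ARNs; the --export-only value is an example:

aws rds start-export-task \
  --export-task-identifier 'db-partial-export-2024' \
  --source-arn 'arn:aws:rds:eu-west-1:012345678901:snapshot:db-prod-final-2024' \
  --s3-bucket-name 'backups' --s3-prefix 'rds' \
  --iam-role-arn 'arn:aws:iam::012345678901:role/CustomRdsS3Exporter' \
  --kms-key-id 'arn:aws:kms:eu-west-1:012345678901:key/abcdef01-2345-6789-abcd-ef0123456789' \
  --export-only 'source-db.public'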

  • The export process runs in the background and does not affect the performance of active DB instances.

  • Multiple export tasks for the same DB snapshot cannot run simultaneously. This applies to both full and partial exports.

  • Exporting snapshots from DB instances that use magnetic storage isn't supported.

  • The following characters aren't supported in table column names:

    , ; { } ( ) \n \t = (space) /
    

    Tables containing those characters in column names are skipped during export.

  • PostgreSQL temporary and unlogged tables are skipped during export.

  • Large objects in the data, like BLOBs or CLOBs, close to or greater than 500 MB will cause the export to fail.

  • Rows close to or greater than 2 GB in size will cause their table to be skipped during export.

  • Data exported from snapshots to S3 cannot be restored to new DB instances.

  • The snapshot export tasks require an IAM role that export.rds.amazonaws.com can assume (trust policy) and that grants write access to the destination S3 bucket (permissions policy); see the CLI sketch after the two documents below:

    {
      "Version": "2012-10-17",
      "Statement": [{
          "Effect": "Allow",
          "Action": "sts:AssumeRole",
          "Principal": {
            "Service": "export.rds.amazonaws.com"
          }
      }]
    }
    
    {
      "Version": "2012-10-17",
      "Statement": [{
        "Effect": "Allow",
        "Action": [
            "s3:PutObject*",
            "s3:ListBucket",
            "s3:GetObject*",
            "s3:DeleteObject*",
            "s3:GetBucketLocation"
        ],
        "Resource": [
            "arn:aws:s3:::bucket",
            "arn:aws:s3:::bucket/*"
        ]
      }]
    }
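
    A hedged sketch of creating such a role with the AWS CLI, assuming the two documents above are saved as trust-policy.json and permissions-policy.json (role, policy, and file names are examples):

    aws iam create-role --role-name 'CustomRdsS3Exporter' \
      --assume-role-policy-document 'file://trust-policy.json'
    aws iam put-role-policy --role-name 'CustomRdsS3Exporter' \
      --policy-name 'rds-snapshot-export-to-s3' --policy-document 'file://permissions-policy.json'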
    

After the export, one can analyze the data directly through Athena or Redshift Spectrum.

In the Console

The Export to Amazon S3 console option appears only for snapshots that can be exported to Amazon S3.
Snapshots might not be available for export because of the following reasons:

  • The DB engine isn't supported for S3 export.
  • The DB instance version isn't supported for S3 export.
  • S3 export isn't supported in the AWS Region where the snapshot was created.
Using the CLI
# Start new tasks.
$ aws rds start-export-task \
  --export-task-identifier 'db-finalSnapshot-2024' \
  --source-arn 'arn:aws:rds:eu-west-1:012345678901:snapshot:db-prod-final-2024' \
  --s3-bucket-name 'backups' --s3-prefix 'rds' \
  --iam-role-arn 'arn:aws:iam::012345678901:role/CustomRdsS3Exporter' \
  --kms-key-id 'arn:aws:kms:eu-west-1:012345678901:key/abcdef01-2345-6789-abcd-ef0123456789'
{
  "ExportTaskIdentifier": "db-finalSnapshot-2024",
  "IamRoleArn": "arn:aws:iam::012345678901:role/CustomRdsS3Exporter",
  "KmsKeyId": "arn:aws:kms:eu-west-1:012345678901:key/abcdef01-2345-6789-abcd-ef0123456789",
  "PercentProgress": 0,
  "S3Bucket": "backups",
  "S3Prefix": "rds",
  "SnapshotTime": "2024-06-17T09:04:41.387000+00:00",
  "SourceArn": "arn:aws:rds:eu-west-1:012345678901:snapshot:db-prod-final-2024",
  "Status": "STARTING",
  "TotalExtractedDataInGB": 0
}

# Get tasks' status.
$ aws rds describe-export-tasks
$ aws rds describe-export-tasks --export-task-identifier 'db-finalSnapshot-2024'
$ aws rds describe-export-tasks --query 'ExportTasks[].WarningMessage' --output 'yaml'

# Cancel tasks.
$ aws rds cancel-export-task --export-task-identifier 'my_export'
{
    "Status": "CANCELING",
    "S3Prefix": "",
    "ExportTime": "2019-08-12T01:23:53.109Z",
    "S3Bucket": "DOC-EXAMPLE-BUCKET",
    "PercentProgress": 0,
    "KmsKeyId": "arn:aws:kms:AWS_Region:123456789012:key/K7MDENG/bPxRfiCYEXAMPLEKEY",
    "ExportTaskIdentifier": "my_export",
    "IamRoleArn": "arn:aws:iam::123456789012:role/export-to-s3",
    "TotalExtractedDataInGB": 0,
    "TaskStartTime": "2019-11-13T19:46:00.173Z",
    "SourceArn": "arn:aws:rds:AWS_Region:123456789012:snapshot:export-example-1"
}

Restore

DB instances can be restored from DB snapshots.
Restoring instances from snapshots requires the new instance to have at least as much allocated storage as the original instance had when the snapshot was taken.

aws rds restore-db-instance-from-db-snapshot \
  --db-instance-identifier 'myNewDbInstance' \
  --db-snapshot-identifier 'myDbSnapshot'

Encryption

RDS automatically integrates with AWS KMS for key management.

By default, RDS uses the RDS AWS managed key (aws/rds) for encryption.
This key can't be managed, rotated, nor deleted by users.

RDS will automatically put databases into a terminal state when access to the KMS key is required but the key has been disabled or deleted, or its permissions have been somehow revoked.
This change could be immediate or deferred depending on the use case that required access to the KMS key.
In this terminal state, DB instances are no longer available and their databases' current state can't be recovered. To restore DB instances, one must first re-enable access to the KMS key for RDS, and then restore the instances from their latest available backup.
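
A hedged sketch of creating an instance encrypted with a customer managed key (identifiers, class, and key ARN are placeholders):

aws rds create-db-instance \
  --db-instance-identifier 'encrypted-db' --engine 'postgres' \
  --db-instance-class 'db.t4g.medium' --allocated-storage '100' \
  --master-username 'admin' --manage-master-user-password \
  --storage-encrypted \
  --kms-key-id 'arn:aws:kms:eu-west-1:012345678901:key/abcdef01-2345-6789-abcd-ef0123456789'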

Operations

PostgreSQL

Reduce allocated storage by migrating using transportable databases

Refer Migrating databases using RDS PostgreSQL Transportable Databases, Transporting PostgreSQL databases between DB instances and Transport PostgreSQL databases between two Amazon RDS DB instances using pg_transport.

The pg_transport extension streams database files with minimal processing, letting a target DB instance import a database directly from a source DB instance.

When the transport begins, all current sessions on the source database are terminated and the database is put in read-only mode.
Only the specific source database being transported is affected; other databases on the source instance are not.

Primary instances with replicas can be used as source instances.
TODO: test using a RO replica as the source instance. I expect this will not work due to the transport extension putting the source DB in RO mode.

The in-transit database will be inaccessible on the target DB instance for the duration of the transport.
During transport, the target DB instance cannot be restored to a point in time, as the transport is not transactional and does not use the PostgreSQL write-ahead log to record changes.

Requirements
  • A source DB to copy data from.

  • A target instance to copy the DB to.

    Since the transport will create the DB on the target, the target instance must not contain the database that needs to be transported.
    Should the target contain the DB already, it will need to be dropped beforehand.

  • Both DB instances must run the same major version of PostgreSQL.
    Differences in minor versions seem to be fine.

  • Should the source DB have the pgaudit extension loaded, that extension will need to be installed on the target instance so that it can be ported.

  • The target instance must be able to connect to the source instance.

  • All source database objects must reside in the default pg_default tablespace.

  • The source DB (but not other DBs on the same source instance) will need to:

    • Be put in Read Only mode (automatic, done during transport).
    • Have all installed extensions removed.

To avoid locking the operator's machine for the duration of the transport, it is suggested to use an EC2 instance as a middleman to operate on both DBs.

Try to keep DB instance identifiers under 22 characters.
PostgreSQL truncates identifiers at 63 characters, and AWS appends something like .{{12-char-id}}.{{region}}.rds.amazonaws.com to them.

Limitations
  • The access privileges and ownership from the source database are not transferred to the target database.
  • Databases cannot be transported onto read replicas or parent instances of read replicas.
  • reg data types cannot be used in any table of databases that are about to be transported with this method.
  • There can be up to 32 total transports (including both imports and exports) active at the same time on any DB instance.
  • All the DB data is migrated as is.
Procedure
  1. Enable the required configuration parameters and pg_transport extension on the source and target RDS instances.
    Create a new RDS Parameter Group or modify the existing one used by the source.

    Required parameters:

    • shared_preload_libraries must include pg_transport.
      Static parameter, requires reboot.
    • pg_transport.num_workers must be tuned.
      Its value determines the number of transport.send_file workers that will be created in the source. Defaults to 3.
    • max_worker_processes must be at least (3 * pg_transport.num_workers) + 9.
      Required on the destination to handle various background worker processes involved in the transport.
      Static parameter, requires reboot.
    • pg_transport.work_mem can be tuned.
      Specifies the maximum memory to allocate to each worker. Defaults to 131072 (128 MB) or 262144 (256 MB) depending on the PostgreSQL version.
    • pg_transport.timing can be set to 1.
      Specifies whether to report timing information during the transport. Defaults to 1 (true), meaning that timing information is reported.
  2. Reboot the instances equipped with the Parameter Group to apply changes.

  3. Create a new target instance with the required allocated storage.

  4. Make sure the middleman can connect to both DBs.
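
    E.g., check from the middleman EC2 instance that both endpoints accept connections (the endpoints are this document's placeholders):

    pg_isready -h 'source-instance.5f7mp3pt3n6e.eu-west-1.rds.amazonaws.com' -p '5432'
    pg_isready -h 'target-instance.5f7mp3pt3n6e.eu-west-1.rds.amazonaws.com' -p '5432'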

  5. Make sure the target DB instance can connect to the source.

  6. RDS does not grant full SuperUser permissions even to instances' master users. This makes it impossible to use pg_dumpall -r to fully dump roles and permissions from the source.
    One can export them by excluding the passwords from the dump:

    pg_dumpall -h 'source-instance.5f7mp3pt3n6e.eu-west-1.rds.amazonaws.com' -U 'admin' -l 'postgres' -W \
      -rf 'roles.sql' --no-role-passwords
    

    but statements involving protected roles (like rdsadmin and any other role matching rds_*) and changes to the SUPERUSER or REPLICATION attributes will fail on restore.
    Clean them up from the dump:

    # Ignore *everything* that has to do with 'rdsadmin'
    # Ignore the creation or alteration of AWS-managed RDS roles
    # Ignore changes involving protected attributes
    sed -Ei'.backup' \
      -e '/rdsadmin/d' \
      -e '/(CREATE|ALTER) ROLE rds_/d' \
      -e 's/(NO)?(SUPERUSER|REPLICATION)[[:space:]]?//g' \
      'roles.sql'
    

    Just make sure one has a way to reinstate existing roles and permissions onto the target.

  7. Prepare the source DB for transport:

    1. Connect to the DB:

      psql -h 'source-instance.5f7mp3pt3n6e.eu-west-1.rds.amazonaws.com' -p '5432' -d 'source-db' -U 'admin' --password
      
    2. Only the pg_transport extension is allowed in the source DB during the actual transport operation.
      Remove all extensions but pg_transport from the public schema of the DB instance:

      SELECT "extname" FROM "pg_extension";
      DROP EXTENSION IF EXISTS "btree_gist", "pgcrypto", "postgis" CASCADE;
      
    3. Load the pg_transport extension if missing:

      CREATE EXTENSION IF NOT EXISTS "pg_transport";
      
  8. Prepare the target DB for transport:

    1. The instance must not contain a DB with the same name as the source one, as the transport will create it on the target.
      Connect to a different DB than the source:

      psql -h 'target-instance.5f7mp3pt3n6e.eu-west-1.rds.amazonaws.com' -p '5432' -d 'postgres' -U 'admin' --password
      
    2. Make sure no DB exists with the same name as the source DB:

      DROP DATABASE IF EXISTS "source-db";
      
    3. Load the pg_transport extension if missing:

      CREATE EXTENSION IF NOT EXISTS "pg_transport";
      
  9. [optional] Test the transport by running the transport.import_from_server function on the target DB instance:

    -- Keep arguments in *single* quotes here
    SELECT transport.import_from_server(
      'source-instance.5f7mp3pt3n6e.eu-west-1.rds.amazonaws.com', 5432,
      'admin', 'source-user-password', 'source-db',
      'target-user-password',
      true
    );
    
  10. Run the actual transport by calling the transport.import_from_server function on the target DB instance, with the same arguments as the test above but with the dry-run flag set to false:

    SELECT transport.import_from_server(
      'source-instance.5f7mp3pt3n6e.eu-west-1.rds.amazonaws.com', 5432,
      'admin', 'source-user-password', 'source-db',
      'target-user-password',
      false
    );
    
  11. Validate the data in the target.

  12. Restore uninstalled extensions in the public schema of both DB instances.
    pg_transport can be dropped now.

  13. Restore all the needed roles and permissions onto the target:

    psql -h 'target-instance.5f7mp3pt3n6e.eu-west-1.rds.amazonaws.com' -p '5432' -U 'admin' -d 'postgres' --password \
      -f 'roles.sql'
    

    Restoring roles from raw dumps will throw a lot of errors about altering superuser attributes or protected roles. Check the list item about dumping data above.

  14. Revert the value of the max_worker_processes parameter if necessary.
    This will require a restart of the instance.


If the target DB instance has automatic backups enabled, a backup is automatically taken after transport completes.
Point-in-time restores will be available for times after the backup finishes.

Should the transport fail, the pg_transport extension will attempt to undo all changes to the source and target DB instances. This includes removing the destination's partially transported database.
Depending on the type of failure, the source database might continue to reject write-enabled queries. Should this happen, allow write-enabled queries manually:

ALTER DATABASE "db-name" SET default_transaction_read_only = false;
Performance tests
db.t4g.medium to db.t4g.medium, gp3 storage, ~ 350 GB database

Interruptions are due to the exhaustion of I/O burst credits, which tainted the benchmark.

                         | 1st run             | 2nd run             | 3rd and 6th run     | 4th run             | 5th run
pg_transport.num_workers | 2                   | 4                   | 8                   | 8                   | 12
max_worker_processes     | 15                  | 21                  | 33                  | 33                  | 45
pg_transport.work_mem    | 131072 (128 MB)     | 131072 (128 MB)     | 131072 (128 MB)     | 262144 (256 MB)     | 131072 (128 MB)
Minimum transfer rate    | ~ 19 MB/s           | ~ 19 MB/s           | ~ 50 MB/s           | ~ 4 MB/s            | ~ 25 MB/s
Maximum transfer rate    | ~ 58 MB/s           | ~ 95 MB/s           | ~ 255 MB/s          | ~ 255 MB/s          | ~ 165 MB/s
Average transfer rate    | ~ 31 MB/s           | ~ 66 MB/s           | ~ 138 MB/s          | ~ 101 MB/s          | ~ 85 MB/s
Time estimated after 10m | ~ 3h 13m            | ~ 1h 36m            | ~ 52m               | ~ 1h                | ~ 1h 11m
Time taken               | N/A (interrupted)   | N/A (interrupted)   | N/A (interrupted)   | N/A (interrupted)   | N/A (interrupted)
Source CPU usage         | ~ 10%               | ~ 15%               | ~ 40%               | ~ 39%               | ~ 37%
Source RAM usage delta   | N/A (did not check) | N/A (did not check) | + ~ 1.5 GB          | N/A (did not check) | N/A (did not check)
Target CPU usage         | ~ 12%               | ~ 18%               | ~ 34%               | ~ 28%               | ~ 25%
Target RAM usage delta   | N/A (did not check) | N/A (did not check) | + ~ 1.5 GB          | N/A (did not check) | N/A (did not check)

db.m6i.xlarge to db.m6i.xlarge, gp3 storage, ~ 390 GB database
                         | 1st run         | 2nd to 5th run
pg_transport.num_workers | 8               | 16
max_worker_processes     | 33              | 57
pg_transport.work_mem    | 131072 (128 MB) | 131072 (128 MB)
Minimum transfer rate    | ~ 97 MB/s       | ~ 248 MB/s
Maximum transfer rate    | ~ 155 MB/s      | ~ 545 MB/s
Average transfer rate    | ~ 135 MB/s      | ~ 490 MB/s
Time estimated after 10m | ~ 46m           | ~ 14m
Time taken               | ~ 48m           | ~ 14m
Source CPU usage         | ~ 12%           | ~ 42%
Source RAM usage delta   | + ~ 940 MB      | + ~ 1.5 GB
Target CPU usage         | ~ 17%           | ~ 65%
Target RAM usage delta   | + ~ 1.3 GB      | + ~ 3.3 GB

Troubleshooting

ERROR: extension must be loaded via shared_preload_libraries

Refer How can I resolve the "ERROR: <module/extension> must be loaded via shared_preload_libraries" error?

  1. Include the module or extension in the shared_preload_libraries parameter in the Parameter Group.
  2. Reboot the instance to apply the change.
  3. Try reloading it again.
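
E.g., a hedged fix assuming a custom Parameter Group named 'custom-pg15', the pg_cron extension, and a placeholder instance identifier:

aws rds modify-db-parameter-group --db-parameter-group-name 'custom-pg15' \
  --parameters 'ParameterName=shared_preload_libraries,ParameterValue="pg_stat_statements,pg_cron",ApplyMethod=pending-reboot'
aws rds reboot-db-instance --db-instance-identifier 'master-prod'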

ERROR: must be superuser to alter superuser roles or change superuser attribute

Error message examples:

ERROR: must be superuser to alter superuser roles or change superuser attribute
ERROR: must be superuser to alter replication roles or change replication attribute

RDS does not grant full SuperUser permissions even to instances' master users.
Actions involving altering protected roles or changing protected attributes are practically blocked on RDS.

Further readings

Sources