Simple Storage Service

TL;DR
Storage classes
Lifecycle configuration
Cost-saving measures
Further readings
1. Sources

TL;DR

Usage

# List all buckets.
aws s3 ls
aws s3api list-buckets --output 'json' --query 'Buckets[].Name'
aws s3api list-buckets --output 'yaml-stream' | yq -r '.[].Buckets[].Name' -

# List prefixes and objects in buckets.
# Adding the trailing '/' or '--recursive' lists the content of prefixes.
aws s3 ls 's3://my-bucket'
aws s3 ls --recursive 's3://my-bucket/prefix/'
aws s3 ls 's3://arn:aws:s3:us-west-2:123456789012:accesspoint/myaccesspoint/'

# Find the size of buckets or objects.
# It will list all the contents *and* give a total size at the end.
aws s3 ls --human-readable --recursive --summarize 's3://my-bucket'
aws s3 ls … 's3://my-bucket/prefix/'

# Create buckets.
aws s3 mb 's3://my-bucket'

# Copy files to or from buckets.
aws s3 cp 'test.txt' 's3://my-bucket/test4.txt'
aws s3 cp 'test.txt' 's3://my-bucket/test2.txt' --expires '2024-10-01T20:30:00Z'
aws s3 cp 's3://my-bucket/test.txt' 'test2.txt'
aws s3 cp 's3://my-bucket/test.txt' 's3://my-bucket/test5.txt'
aws s3 cp 's3://my-bucket/test.txt' 's3://my-other-bucket/'
aws s3 cp 's3://my-bucket' '.' --recursive
aws s3 cp 'myDir' 's3://my-bucket/' --recursive --exclude "*.jpg"
aws s3 cp 's3://my-bucket/logs/' 's3://my-bucket2/logs/' --recursive \
  --exclude "*" --include "*.log"
aws s3 cp 's3://my-bucket/test.txt' 's3://my-bucket/test2.txt' \
    --acl 'public-read-write'
aws s3 cp 'file.txt' 's3://my-bucket/' \
  --grants read=uri='http://acs.amazonaws.com/groups/global/AllUsers' \
    'full=id=79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be'
aws s3 cp 'mydoc.txt' 's3://arn:aws:s3:us-west-2:123456789012:accesspoint/myaccesspoint/mykey'

# Handle file streams.
# Useful for piping:
# - setting the source to '-' sends data from stdin
# - setting the destination to '-' sends data to stdout
aws s3 cp - 's3://my-bucket/stream.txt'
aws s3 cp - 's3://my-bucket/stream.txt' --expected-size '54760833024'
aws s3 cp 's3://my-bucket/stream.txt' -

# Directly print the contents of files to stdout.
aws s3 cp --quiet 's3://my-bucket/file.txt' '-'
aws s3 cp --quiet 's3://my-bucket/file.txt' '/dev/stdout'

# Remove objects.
aws s3 rm 's3://my-bucket/prefix-name' --recursive --dryrun

# Sync buckets.
aws s3 sync '.' 's3://my-bucket'
aws s3 sync 's3://my-bucket' '.' --delete
aws s3 sync 's3://my-bucket' 's3://my-other-bucket' --exclude "*.jpg"
aws s3 sync 's3://my-us-west-2-bucket' 's3://my-us-east-1-bucket' \
  --source-region 'us-west-2' --region 'us-east-1'
aws s3 sync '.' 's3://arn:aws:s3:us-west-2:123456789012:accesspoint/myaccesspoint/'

# Delete buckets.
aws s3 rb 's3://my-bucket'
aws s3 rb 's3://my-bucket' --force

# Check permissions.
aws s3api get-bucket-acl --bucket 'my-bucket'

Lifecycle configurations

# Manage lifecycle configurations.
# Operations on lifecycle rules take a while.
aws s3api get-bucket-lifecycle-configuration --bucket 'bucketName'
aws s3api put-bucket-lifecycle-configuration --bucket 'bucketName' \
  --lifecycle-configuration 'file://lifecycle.definition.json'
aws s3api delete-bucket-lifecycle-configuration --bucket 'bucketName'

Real life use cases

# Get objects with their storage class.
aws s3api list-objects --bucket 'my-bucket' \
  --query 'Contents[].{Key: Key, StorageClass: StorageClass}'

# Show tags on objects.
aws s3api list-objects-v2 \
  --bucket 'my-bucket' --prefix 'someObjectsInHereAreTagged' \
  --query 'Contents[*].Key' --output text \
| xargs -n 1 \
    aws s3api get-object-tagging --bucket 'my-bucket' --query 'TagSet[*]' --key

Storage classes

| Class name | API name | Fees | Latency | Minimum storage duration | Minimum billed object size | # of AZs |
| --- | --- | --- | --- | --- | --- | --- |
| Standard | `STANDARD` | ✗ | milliseconds | ✗ | | 3+ |
| Express One Zone | `EXPRESS_ONEZONE` | ✗ | single-digit ms | 1 hour | | 1 |
| Intelligent Tiering | `INTELLIGENT_TIERING` | per monitored object | milliseconds | ✗ | | 3+ |
| Standard Infrequent Access | `STANDARD_IA` | per GB retrieved | milliseconds | 30 days | 128 KB | 3+ |
| One Zone Infrequent Access | `ONEZONE_IA` | per GB retrieved | milliseconds | 30 days | 128 KB | 1 |
| Glacier Instant Retrieval | `GLACIER_IR` | per GB retrieved | milliseconds | 90 days | 128 KB | 3+ |
| Glacier Flexible Retrieval | `GLACIER` | per GB retrieved | minutes to hours | 90 days | | 3+ |
| Glacier Deep Archive | `DEEP_ARCHIVE` | per GB retrieved | hours | 180 days | | 3+ |

Standard is the storage class used by default if none is specified when uploading objects.

Express One Zone is purpose-built for consistency and low latency. It offers the highest performance and lower request costs than Standard, but stores data within a single Availability Zone only.

Intelligent Tiering optimizes storage costs by automatically moving data between access tiers depending on its usage, without performance impact or operational overhead.
Ideal for data that has unknown or changing access patterns.

Objects that have not been accessed for some time are moved to lower-cost access tiers that still offer low latency and high throughput.

Objects in Intelligent Tiering are stored automatically in the following tiers:

Frequent Access: contains objects that are uploaded, or transitioned, to the storage class.
Infrequent Access: contains objects that have not been accessed for 30 consecutive days.
Archive Instant Access: contains objects that have not been accessed for 90 consecutive days.

Important

Objects smaller than 128 KB are not eligible for auto-tiering. These objects are kept in the Frequent Access tier at all times.
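The tiering rules above can be sketched as a small shell function. This is only an illustration of the thresholds listed in this section, not something S3 exposes: the function name `intelligent_tiering_tier` is hypothetical, 128 KB is assumed to mean 131072 bytes, and the optional archive tiers are ignored.

```shell
# Hypothetical helper: guess which automatic Intelligent Tiering access tier
# an object sits in, given its size in bytes and the number of consecutive
# days it has not been accessed.
# Assumption: 128 KB = 131072 bytes. Optional archive tiers are not modelled.
intelligent_tiering_tier() {
  size_bytes="$1"
  idle_days="$2"
  if [ "$size_bytes" -lt 131072 ]; then
    # Too small for auto-tiering: always kept in Frequent Access.
    echo 'Frequent Access'
  elif [ "$idle_days" -ge 90 ]; then
    echo 'Archive Instant Access'
  elif [ "$idle_days" -ge 30 ]; then
    echo 'Infrequent Access'
  else
    echo 'Frequent Access'
  fi
}

# A 1 MiB object not accessed for 45 days.
intelligent_tiering_tier 1048576 45
```

The example call prints `Infrequent Access`.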

One can also enable automatic archiving capabilities within Intelligent Tiering for data that can be accessed asynchronously. In this case, it will eventually move objects to access tiers with even lower costs, but that require explicit retrieval processes.

The optional archive access tiers are the following:

Archive Access: archives objects that have not been accessed for at least 90 consecutive days.
Deep Archive Access: archives objects that have not been accessed for at least 180 consecutive days.

Objects in the Archive Access or Deep Archive Access tiers must first be restored to a higher tier with the RestoreObject action before they can be read.
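A minimal sketch of such a restore request through the CLI follows. The wrapper name `restore_archived_object` is hypothetical, and the empty `--restore-request '{}'` body is assumed to be what Intelligent Tiering archive tiers expect; check the `restore-object` reference before relying on it.

```shell
# Hypothetical wrapper: ask S3 to restore an object from the Archive Access
# or Deep Archive Access tier. The restore is asynchronous.
# Assumption: an empty restore request is sufficient for Intelligent Tiering.
restore_archived_object() {
  bucket="$1"
  key="$2"
  aws s3api restore-object --bucket "$bucket" --key "$key" --restore-request '{}'
}
```

It would be called as `restore_archived_object 'my-bucket' 'path/to/object'`.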

Standard Infrequent Access and One Zone Infrequent Access are designed for data that is both long-lived and infrequently accessed, but still requires millisecond access.
Suitable for objects larger than 128 KB that are needed for at least 30 days.

Important

S3 charges for objects smaller than 128 KB as if they were 128 KB in size.
Objects deleted, overwritten, or transitioned to a different storage class before the end of the 30-day minimum storage duration still incur charges for the full 30 days.
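To make the two minimums concrete, here is a rough sketch of how the billable size and duration could be worked out. The function names are hypothetical, 128 KB is assumed to be 131072 bytes, and actual prices are deliberately left out.

```shell
# Hypothetical helpers illustrating the minimum billed size and duration.
# Assumption: 128 KB = 131072 bytes.
billable_bytes() {
  # Objects smaller than the minimum are billed as if they were 128 KB.
  if [ "$1" -lt 131072 ]; then echo 131072; else echo "$1"; fi
}

billable_days() {
  stored_days="$1"
  min_days="${2:-30}"   # 30 days for the Infrequent Access classes
  # Objects removed early are still billed for the full minimum duration.
  if [ "$stored_days" -lt "$min_days" ]; then echo "$min_days"; else echo "$stored_days"; fi
}

# A 10 KiB object deleted after 12 days.
echo "$(billable_bytes 10240) bytes for $(billable_days 12) days"
```

The example prints `131072 bytes for 30 days`.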

Glacier Instant Retrieval, Glacier Flexible Retrieval, and Glacier Deep Archive are designed for low-cost, long-term data storage and data archiving.
All these storage classes require minimum storage durations and charge retrieval fees.

Glacier Instant Retrieval is the only one in the Glacier set that offers millisecond retrieval and real-time access.
Glacier Flexible Retrieval and Glacier Deep Archive archive the data they receive, making it unavailable for real-time access.

Lifecycle configuration

S3 supports specific lifecycle transitions between storage classes using Lifecycle configurations:

Objects can be transitioned down the storage classes, but not up.
Objects that need to move to a higher storage class must be recreated in that storage class. This means that they get new metadata.

Other constraints apply, e.g., objects smaller than 128 KiB are usually not transitioned to another storage class.
See General considerations for transitions.
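Since Lifecycle rules cannot move objects up, one way to recreate an object in a higher storage class is to copy it onto itself with a different `--storage-class`. The wrapper below is a sketch with a hypothetical name; it assumes the object is immediately readable, so objects archived in `GLACIER` or `DEEP_ARCHIVE` would need to be restored first.

```shell
# Hypothetical wrapper: rewrite an object in place in the STANDARD class.
# The copy creates a new object, so metadata such as the last-modified
# date changes.
# Assumption: the source object is readable without a prior restore.
move_object_to_standard() {
  bucket="$1"
  key="$2"
  aws s3 cp "s3://${bucket}/${key}" "s3://${bucket}/${key}" --storage-class 'STANDARD'
}
```

It would be called as `move_object_to_standard 'my-bucket' 'path/to/object'`.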

When multiple rules are applied through Lifecycle configurations, objects can become eligible for multiple Lifecycle actions. In such cases:

Permanent deletion takes precedence over transitions.
Transitions take precedence over creation of delete markers.
When objects are eligible for transition to both S3 Glacier Flexible Retrieval and S3 Standard-IA (or One Zone-IA), precedence is given to S3 Glacier Flexible Retrieval transition.
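The precedence rules above can be read as an ordered list, as in the sketch below. The function and the action labels (`expire`, `transition:…`, `delete-marker`) are made up for illustration and are not S3 identifiers. The relative order of the two Infrequent Access transitions is arbitrary here, since the rules above do not rank them.

```shell
# Hypothetical helper: given the Lifecycle actions an object is eligible for,
# print the one that wins according to the precedence rules.
# The action labels are invented for this sketch.
winning_lifecycle_action() {
  # Candidates are checked from highest to lowest precedence.
  for candidate in 'expire' 'transition:GLACIER' \
      'transition:STANDARD_IA' 'transition:ONEZONE_IA' 'delete-marker'; do
    for action in "$@"; do
      if [ "$action" = "$candidate" ]; then
        echo "$action"
        return 0
      fi
    done
  done
  return 1   # no known action was given
}

winning_lifecycle_action 'delete-marker' 'transition:STANDARD_IA' 'transition:GLACIER'
```

The example call prints `transition:GLACIER`.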

Important

When adding Lifecycle configurations to buckets, there is usually some lag before a new or updated Lifecycle configuration is fully propagated to all of S3's systems.
Expect a delay of a few minutes before any change in configuration starts taking effect. This includes configuration deletions.

Examples: 1, 2
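As a further illustration, the snippet below writes a small Lifecycle definition file of the kind passed to `put-bucket-lifecycle-configuration` in the TL;DR. The rule ID, the `logs/` prefix and the day counts are made-up values, not a recommendation.

```shell
# Write a sample Lifecycle definition.
# Assumptions: the rule ID, prefix and day counts are illustrative only.
cat > 'lifecycle.definition.json' <<'EOF'
{
  "Rules": [
    {
      "ID": "archive-then-expire-logs",
      "Status": "Enabled",
      "Filter": { "Prefix": "logs/" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 365 }
    }
  ]
}
EOF
```

The rule moves objects under `logs/` down to `STANDARD_IA` after 30 days, to `GLACIER` after 90 days, and deletes them after a year.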

Cost-saving measures

Prefer using lower storage classes for data that is not frequently accessed.
Lower storage classes have a minimum storage fee period and retrieval fees.
Consider using Lifecycle configurations to move data that is not frequently accessed down to a lower storage class after some time.
Prefer using S3 Intelligent-Tiering when it is not known how frequently data is accessed.
Consider expiring old data after some time, if its retention is not needed.
Consider compressing data before uploading it.
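For the last point, compression can be combined with the stdin streaming form of `aws s3 cp` shown in the TL;DR. The wrapper name `upload_compressed` is hypothetical, and gzip is just one possible compressor.

```shell
# Hypothetical wrapper: gzip a local file on the fly and stream it to a bucket
# without writing the compressed copy to disk.
# Assumption: the object key is the file's base name plus '.gz'.
upload_compressed() {
  file="$1"
  bucket="$2"
  gzip -c "$file" | aws s3 cp - "s3://${bucket}/$(basename "$file").gz"
}
```

It would be called as `upload_compressed 'app.log' 'my-bucket'`.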
