# Simple Storage Service
1. [TL;DR](#tldr)
1. [Storage classes](#storage-classes)
1. [Lifecycle configuration](#lifecycle-configuration)
1. [Further readings](#further-readings)
1. [Sources](#sources)
## TL;DR
<details>
<summary>Usage</summary>
```sh
# List all buckets.
aws s3 ls
aws s3api list-buckets --output 'json' --query 'Buckets[].Name'
aws s3api list-buckets --output 'yaml-stream' | yq -r '.[].Buckets[].Name' -
# List prefixes and objects in buckets.
# Adding a trailing '/' or '--recursive' lists the contents of prefixes.
aws s3 ls 's3://my-bucket'
aws s3 ls --recursive 's3://my-bucket/prefix/'
aws s3 ls 's3://arn:aws:s3:us-west-2:123456789012:accesspoint/myaccesspoint/'
# Find the size of buckets or objects.
# This lists all the contents *and* prints a total size at the end.
aws s3 ls --human-readable --recursive --summarize 's3://my-bucket'
aws s3 ls … 's3://my-bucket/prefix/'
# Create buckets.
aws s3 mb 's3://my-bucket'
# Copy files to or from buckets.
aws s3 cp 'test.txt' 's3://my-bucket/test4.txt'
aws s3 cp 'test.txt' 's3://my-bucket/test2.txt' --expires '2024-10-01T20:30:00Z'
aws s3 cp 's3://my-bucket/test.txt' 'test2.txt'
aws s3 cp 's3://my-bucket/test.txt' 's3://my-bucket/test5.txt'
aws s3 cp 's3://my-bucket/test.txt' 's3://my-other-bucket/'
aws s3 cp 's3://my-bucket' '.' --recursive
aws s3 cp 'myDir' 's3://my-bucket/' --recursive --exclude "*.jpg"
aws s3 cp 's3://my-bucket/logs/' 's3://my-bucket2/logs/' --recursive \
--exclude "*" --include "*.log"
aws s3 cp 's3://my-bucket/test.txt' 's3://my-bucket/test2.txt' \
--acl 'public-read-write'
aws s3 cp 'file.txt' 's3://my-bucket/' \
--grants read=uri='http://acs.amazonaws.com/groups/global/AllUsers' \
'full=id=79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be'
aws s3 cp 'mydoc.txt' 's3://arn:aws:s3:us-west-2:123456789012:accesspoint/myaccesspoint/mykey'
# Handle file streams.
# Useful for piping:
# - setting the source to '-' sends data from stdin
# - setting the destination to '-' sends data to stdout
aws s3 cp - 's3://my-bucket/stream.txt'
aws s3 cp - 's3://my-bucket/stream.txt' --expected-size '54760833024'
aws s3 cp 's3://my-bucket/stream.txt' -
# Directly print the contents of files to stdout.
aws s3 cp --quiet 's3://my-bucket/file.txt' '-'
aws s3 cp --quiet 's3://my-bucket/file.txt' '/dev/stdout'
# Remove objects.
aws s3 rm 's3://my-bucket/prefix-name' --recursive --dryrun
# Sync buckets.
aws s3 sync '.' 's3://my-bucket'
aws s3 sync 's3://my-bucket' '.' --delete
aws s3 sync 's3://my-bucket' 's3://my-other-bucket' --exclude "*.jpg"
aws s3 sync 's3://my-us-west-2-bucket' 's3://my-us-east-1-bucket' \
--source-region 'us-west-2' --region 'us-east-1'
aws s3 sync '.' 's3://arn:aws:s3:us-west-2:123456789012:accesspoint/myaccesspoint/'
# Delete buckets.
aws s3 rb 's3://my-bucket'
aws s3 rb 's3://my-bucket' --force
# Check permissions.
aws s3api get-bucket-acl --bucket 'my-bucket'
```
</details>
<details>
<summary>Lifecycle configurations</summary>
```sh
# Manage lifecycle configurations.
# Operations on lifecycle rules take a while.
aws s3api get-bucket-lifecycle-configuration --bucket 'bucketName'
aws s3api put-bucket-lifecycle-configuration --bucket 'bucketName' \
--lifecycle-configuration 'file://lifecycle.definition.json'
aws s3api delete-bucket-lifecycle --bucket 'bucketName'
```
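
The definition file referenced above is plain JSON. A minimal sketch of one, with a made-up `logs/` prefix and
arbitrary transition and expiration periods:

```sh
# Write a minimal lifecycle definition, then apply it.
# The 'logs/' prefix, the transition days and the expiration period are placeholder values.
cat > 'lifecycle.definition.json' <<'EOF'
{
  "Rules": [
    {
      "ID": "archive-then-expire-logs",
      "Status": "Enabled",
      "Filter": { "Prefix": "logs/" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 365 }
    }
  ]
}
EOF
aws s3api put-bucket-lifecycle-configuration --bucket 'bucketName' \
--lifecycle-configuration 'file://lifecycle.definition.json'
```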
</details>
<details>
<summary>Real life use cases</summary>
```sh
# Get objects with their storage class.
aws s3api list-objects --bucket 'my-bucket' \
--query 'Contents[].{Key: Key, StorageClass: StorageClass}'
# Show tags on objects.
aws s3api list-objects-v2 \
--bucket 'my-bucket' --prefix 'someObjectsInHereAreTagged' \
--query 'Contents[*].Key' --output text \
| xargs -n 1 \
aws s3api get-object-tagging --bucket 'my-bucket' --query 'TagSet[*]' --key
```
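
The tag query above assumes the objects already carry tags. A minimal sketch of setting them, with a made-up key and
`team=platform` tag:

```sh
# Tag a single object. This replaces any tag set already on it.
# The key and the tag key/value pair are placeholder values.
aws s3api put-object-tagging --bucket 'my-bucket' --key 'someObject.txt' \
--tagging '{"TagSet": [{"Key": "team", "Value": "platform"}]}'
```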
</details>
## Storage classes
| Class name                 | Console name          | Retrieval or monitoring fees | Latency          | Minimum storage duration | Minimum billed object size | # of AZs |
| -------------------------- | --------------------- | ---------------------------- | ---------------- | ------------------------ | -------------------------- | -------- |
| Standard                   | `STANDARD`            | ✗                            | milliseconds     | ✗                        |                            | 3+       |
| Express One Zone           | `EXPRESS_ONEZONE`     | ✗                            | single-digit ms  | 1 hour                   |                            | 1        |
| Intelligent Tiering        | `INTELLIGENT_TIERING` | per monitored object         | milliseconds     | ✗                        |                            | 3+       |
| Standard Infrequent Access | `STANDARD_IA`         | per GB retrieved             | milliseconds     | 30 days                  | 128 KB                     | 3+       |
| One Zone Infrequent Access | `ONEZONE_IA`          | per GB retrieved             | milliseconds     | 30 days                  | 128 KB                     | 1        |
| Glacier Instant Retrieval  | `GLACIER_IR`          | per GB retrieved             | milliseconds     | 90 days                  | 128 KB                     | 3+       |
| Glacier Flexible Retrieval | `GLACIER`             | per GB retrieved             | minutes to hours | 90 days                  |                            | 3+       |
| Glacier Deep Archive       | `DEEP_ARCHIVE`        | per GB retrieved             | hours            | 180 days                 |                            | 3+       |
_Standard_ is the storage class used by default if none is specified when uploading objects.
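
The storage class can be set explicitly when uploading or copying objects. A quick sketch, with placeholder file and
key names:

```sh
# Upload straight into Standard Infrequent Access instead of the default Standard class.
aws s3 cp 'report.csv' 's3://my-bucket/reports/report.csv' --storage-class 'STANDARD_IA'
# The 's3api' equivalent of the command above.
aws s3api put-object --bucket 'my-bucket' --key 'reports/report.csv' \
--body 'report.csv' --storage-class 'STANDARD_IA'
```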
_Express One Zone_ is purpose-built for consistency and low latency. It offers the highest performance and lower
request costs than Standard, but stores data in a single Availability Zone.
_Intelligent Tiering_ optimizes storage costs by automatically moving data between access tiers depending on its usage,
without performance impact or operational overhead.<br/>
Ideal for data that has unknown or changing access patterns.
Intelligent Tiering automatically moves objects that have not been accessed in some time to lower-cost access tiers
that still offer low latency and high throughput.
<details style='padding: 0 0 1rem 1rem'>
Objects in Intelligent Tiering are stored automatically in the following tiers:
- _Frequent Access_: contains objects that are uploaded, or transitioned, to the storage class.
- _Infrequent Access_: contains objects that have not been accessed for **30 consecutive days**.
- _Archive Instant Access_: contains objects that have not been accessed for **90 consecutive days**.
> [!important]
> Objects smaller than 128 KB are **not** eligible for auto-tiering. These objects are kept in the Frequent Access
> tier at all times.
</details>
One can also enable automatic archiving within Intelligent Tiering for data that can be accessed **asynchronously**.
In this case, objects are eventually moved to even lower-cost access tiers that require an explicit retrieval process
(see the sketch after the block below).
<details style='padding: 0 0 1rem 1rem'>
The optional archive access tiers are the following:
- _Archive Access_: archives objects that have not been accessed for **at least 90 consecutive days**.
- _Deep Archive Access_: archives objects that have not been accessed for **at least 180 consecutive days**.
Objects in the Archive Access or Deep Archive Access tiers **must first be restored** to higher tiers by using the
`RestoreObject` action.
</details>
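
A minimal sketch of enabling both optional archive tiers on a bucket; the configuration id and the day thresholds are
placeholder values:

```sh
# Enable the Archive Access and Deep Archive Access tiers on a bucket.
# 'archive-tiers' and the day thresholds are placeholder values.
aws s3api put-bucket-intelligent-tiering-configuration --bucket 'my-bucket' \
--id 'archive-tiers' \
--intelligent-tiering-configuration '{
  "Id": "archive-tiers",
  "Status": "Enabled",
  "Tierings": [
    { "Days": 90,  "AccessTier": "ARCHIVE_ACCESS" },
    { "Days": 180, "AccessTier": "DEEP_ARCHIVE_ACCESS" }
  ]
}'
```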
_Standard Infrequent Access_ and _One Zone Infrequent Access_ are designed for data that is both **long-lived** and
**infrequently accessed**, but still requires millisecond access.<br/>
Suitable for objects larger than 128 KB that are needed for at least 30 days.
> [!important]
> S3 charges for objects smaller than 128 KB as if they were 128 KB in size.<br/>
> Objects deleted, overwritten, or transitioned to a different storage class before the end of the 30-day minimum
> storage duration period still incur charges for the full 30 days.
_Glacier Instant Retrieval_, _Glacier Flexible Retrieval_, and _Glacier Deep Archive_ are designed for low-cost,
long-term data storage and data archiving.<br/>
All these storage classes require minimum storage durations and charge retrieval fees.
Glacier Instant Retrieval is the only one in the Glacier set that offers millisecond retrieval and real-time
access.<br/>
Glacier Flexible Retrieval and Glacier Deep Archive archive the data they receive, making it **not** available for
real-time access.
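
Reading objects from Glacier Flexible Retrieval or Glacier Deep Archive hence requires restoring them first.
A minimal sketch, with placeholder key, retention days, and retrieval tier:

```sh
# Ask S3 to restore an archived object for 7 days, using the Standard retrieval tier.
# The key, the number of days, and the tier are placeholder values.
aws s3api restore-object --bucket 'my-bucket' --key 'archive/2020.tar.gz' \
--restore-request '{"Days": 7, "GlacierJobParameters": {"Tier": "Standard"}}'
# Check whether the restore has completed (see the 'Restore' field in the output).
aws s3api head-object --bucket 'my-bucket' --key 'archive/2020.tar.gz'
```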
## Lifecycle configuration
S3 supports specific lifecycle transitions between storage classes using Lifecycle configurations:
![supported storage classes transitions](s3%20supported%20storage%20classes%20transitions.png)
Objects can be transitioned **down** the storage classes, but **not** up.<br/>
Objects that need to move to a higher storage class must be **_recreated_** in that storage class, which means they
will take on new metadata.
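
Recreating an object in a higher storage class usually boils down to copying it onto itself with the desired class.
A small sketch, with a placeholder key:

```sh
# 'Move' an object back up by copying it onto itself with the target storage class.
# This creates a new object, with fresh metadata and timestamps.
# Objects in Glacier Flexible Retrieval or Deep Archive must be restored before they can be copied.
aws s3 cp 's3://my-bucket/data.bin' 's3://my-bucket/data.bin' --storage-class 'STANDARD'
```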
Other constraints apply, e.g., objects smaller than 128 KB are usually not transitioned.<br/>
See [General considerations for transitions][lifecycle general considerations for transitions].
When multiple rules are applied through Lifecycle configurations, objects can become eligible for multiple Lifecycle
actions. In such cases:
1. Permanent deletion takes precedence over transitions.
1. Transitions take precedence over creation of delete markers.
1. When objects are eligible for transition to both S3 Glacier Flexible Retrieval and S3 Standard-IA (or One Zone-IA),
precedence is given to S3 Glacier Flexible Retrieval transition.
> [!important]
> When adding Lifecycle configurations to buckets, there is usually some lag before a new, or updated, Lifecycle
> configuration is fully propagated to all S3's systems.<br/>
> Expect a delay of a few minutes before any change in configuration starts taking effect. This includes configuration
> deletions.
Examples: [1][lifecycle configuration examples], [2][s3 lifecycle rules examples]
## Further readings
- [Amazon Web Services]
- [Configure notification for lifecycle rules][lifecycle configure notification]
- AWS' [CLI]
- [Expiring Amazon S3 objects based on last accessed date to decrease costs]
- [Understanding and managing Amazon S3 storage classes]
- [Using S3 Intelligent-Tiering]
- [Amazon S3 cost optimization for predictable and dynamic access patterns]
- [Gateway Endpoints vs Internet Routing for S3]
- [Understanding your AWS billing and usage reports for Amazon S3]
- [Downloading objects]
- [Download and upload objects with presigned URLs]
## Sources
- [Amazon S3 Storage Classes]
- [General considerations for transitions][lifecycle general considerations for transitions]
- [Lifecycle configuration examples][lifecycle configuration examples]
- [CLI subcommand reference]
- [Find out the size of your Amazon S3 buckets]
- [How S3 Intelligent-Tiering works]
- [Amazon S3 Intelligent Tiering]
<!-- Reference -->
<!-- In-article sections -->
<!-- Knowledge base -->
[amazon web services]: README.md
[cli]: cli.md
<!-- Files -->
[s3 lifecycle rules examples]: ../../../examples/aws/s3.lifecycle-rules
<!-- Upstream -->
[Amazon S3 cost optimization for predictable and dynamic access patterns]: https://aws.amazon.com/blogs/storage/amazon-s3-cost-optimization-for-predictable-and-dynamic-access-patterns/
[amazon s3 storage classes]: https://aws.amazon.com/s3/storage-classes/
[cli subcommand reference]: https://docs.aws.amazon.com/cli/latest/reference/s3/
[Download and upload objects with presigned URLs]: https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-presigned-url.html
[Downloading objects]: https://docs.aws.amazon.com/AmazonS3/latest/userguide/download-objects.html
[expiring amazon s3 objects based on last accessed date to decrease costs]: https://aws.amazon.com/blogs/architecture/expiring-amazon-s3-objects-based-on-last-accessed-date-to-decrease-costs/
[find out the size of your amazon s3 buckets]: https://aws.amazon.com/blogs/storage/find-out-the-size-of-your-amazon-s3-buckets/
[how s3 intelligent-tiering works]: https://docs.aws.amazon.com/AmazonS3/latest/userguide/intelligent-tiering-overview.html
[lifecycle configuration examples]: https://docs.aws.amazon.com/AmazonS3/latest/userguide/lifecycle-configuration-examples.html
[lifecycle configure notification]: https://docs.aws.amazon.com/AmazonS3/latest/userguide/lifecycle-configure-notification.html
[lifecycle general considerations for transitions]: https://docs.aws.amazon.com/AmazonS3/latest/userguide/lifecycle-transition-general-considerations.html
[Understanding and managing Amazon S3 storage classes]: https://docs.aws.amazon.com/AmazonS3/latest/userguide/storage-class-intro.html
[Understanding your AWS billing and usage reports for Amazon S3]: https://docs.aws.amazon.com/AmazonS3/latest/userguide/aws-usage-report-understand.html
[Using S3 Intelligent-Tiering]: https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-intelligent-tiering.html
<!-- Others -->
[Amazon S3 Intelligent Tiering]: https://awsfundamentals.com/blog/amazon-s3-intelligent-tiering
[Gateway Endpoints vs Internet Routing for S3]: https://awsfundamentals.com/blog/gateway-endpoints-vs-internet-routing-s3