feat: cost-saving measures

Michele Cereda
2026-01-13 17:58:40 +01:00
parent baaf27513a
commit 264353654a
10 changed files with 199 additions and 5 deletions

@@ -5,6 +5,7 @@ Observability service. with functions for logging, monitoring and alerting.
1. [TL;DR](#tldr)
1. [Queries of interest](#queries-of-interest)
1. [Stream logs](#stream-logs)
1. [Cost-saving measures](#cost-saving-measures)
1. [Further readings](#further-readings)
1. [Sources](#sources)
@@ -31,6 +32,19 @@ The [CloudWatch console] offers some default good queries.
Logs in Log Groups can be [streamed][stream logs] elsewhere.
CloudWatch retains metrics' data as follows:
- Data points with a period of less than 60 seconds are available for 3 hours.<br/>
These are high-resolution custom metrics.
- Data points with a period of 60 seconds (1 minute) are available for 15 days.
- Data points with a period of 300 seconds (5 minutes) are available for 63 days.
- Data points with a period of 3600 seconds (1 hour) are available for 455 days (15 months).
Data points are aggregated together for long-term storage after the initial period.<br/>
E.g., data using a period of 1 minute remains available for 15 days with 1-minute resolution, then it is aggregated and
made available with a resolution of 5 minutes; after 63 days, it is further aggregated and made available with a
resolution of 1 hour for 15 months.
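The retention schedule above can be sketched as a small shell function that returns the finest period still available for metric data of a given age. This is a minimal sketch that assumes the metric was published at the finest resolution the schedule allows. Sub-minute periods only exist for high-resolution custom metrics. The function name is hypothetical and the boundaries are approximate.

```sh
# Sketch of the retention schedule above: finest period (in seconds) still available for metric data
# of a given age (in hours). Assumes the data was published at the finest resolution allowed.
finest_period_seconds() {
  age_hours="$1"
  if [ "$age_hours" -lt 3 ]; then
    echo 1      # stands for any sub-minute period (high-resolution custom metrics only)
  elif [ "$age_hours" -lt $((15 * 24)) ]; then
    echo 60     # 1-minute data points, kept for 15 days
  elif [ "$age_hours" -lt $((63 * 24)) ]; then
    echo 300    # 5-minute data points, kept for 63 days
  elif [ "$age_hours" -lt $((455 * 24)) ]; then
    echo 3600   # 1-hour data points, kept for 455 days
  else
    echo 0      # no data points left
  fi
}
```

E.g., `finest_period_seconds 480` (20 days) prints `300`.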
<details>
<summary>CLI commands</summary>
@@ -101,6 +115,12 @@ Also refer [Streaming CloudWatch Logs data to Amazon OpenSearch Service] to stre
Logs in CloudWatch Log Groups can be streamed to [Kinesis], [Firehose] or [Lambda] by leveraging Logs subscriptions.
## Cost-saving measures
- Configure an _appropriate_ log retention period for every log group.<br/>
Log groups containing development logs should not usually need more than 1w worth of logs.
- When in doubt, still configure a long default retention period for all log groups (e.g., 10y, the longest available).<br/>
Log groups without a retention period keep their logs forever.
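`aws logs put-retention-policy` only accepts a fixed set of values for `--retention-in-days`. A desired period like "about 10 days" therefore has to be rounded up to the next accepted value. The following is a minimal sketch. The list of values is taken from the PutRetentionPolicy API reference at the time of writing and should be double-checked before relying on it. The function name and the log group name are hypothetical.

```sh
# Values accepted by `aws logs put-retention-policy --retention-in-days` at the time of writing.
RETENTION_DAYS='1 3 5 7 14 30 60 90 120 150 180 365 400 545 731 1096 1827 2192 2557 2922 3288 3653'

# Print the smallest accepted retention value that keeps logs for at least the requested days.
# Falls back to the longest value (3653 days, about 10 years) for larger requests.
retention_at_least() {
  for days in $RETENTION_DAYS; do
    if [ "$days" -ge "$1" ]; then
      echo "$days"
      return 0
    fi
  done
  echo 3653
}

# Example usage (requires credentials; 'some-log-group' is a placeholder):
# aws logs put-retention-policy --log-group-name 'some-log-group' \
#   --retention-in-days "$(retention_at_least 7)"
# List log groups without a retention period (their logs never expire):
# aws logs describe-log-groups --query 'logGroups[?!retentionInDays].logGroupName' --output 'text'
```

E.g., `retention_at_least 10` prints `14`.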
## Further readings
- [Website]
@@ -113,6 +133,7 @@ Logs in CloudWatch Log Groups can be streamed [Kinesis], [Firehose] or [Lambda]
- [Real-time processing of log data with subscriptions]
- [Streaming CloudWatch Logs data to Amazon OpenSearch Service]
- [Which log group is causing a sudden increase in my CloudWatch Logs bill?]
- [Metrics concepts]
<!--
Reference
@@ -130,6 +151,7 @@ Logs in CloudWatch Log Groups can be streamed [Kinesis], [Firehose] or [Lambda]
[firehose]: https://docs.aws.amazon.com/firehose/latest/dev/what-is-this-service.html
[kinesis]: https://docs.aws.amazon.com/kinesis/
[lambda]: https://docs.aws.amazon.com/lambda/
[Metrics concepts]: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/cloudwatch_concepts.html
[real-time processing of log data with subscriptions]: https://docs.aws.amazon.com/cloudwatch/latest/logs/Subscriptions.html
[services that publish cloudwatch metrics]: https://docs.aws.amazon.com/cloudwatch/latest/monitoring/aws-services-cloudwatch-metrics.html
[streaming cloudwatch logs data to amazon opensearch service]: https://docs.aws.amazon.com/cloudwatch/latest/logs/CWL_OpenSearch_Stream.html

@@ -6,9 +6,11 @@ Persistent [block storage][what is block storage?] for [EC2 Instances][ec2].
1. [Volume types](#volume-types)
1. [Snapshots](#snapshots)
1. [Encryption](#encryption)
1. [Archiving](#archiving)
1. [Operations](#operations)
1. [Increase disks' size](#increase-disks-size)
1. [Migrate `gp2` volumes to `gp3`](#migrate-gp2-volumes-to-gp3)
1. [Cost-saving measures](#cost-saving-measures)
1. [Further readings](#further-readings)
1. [Sources](#sources)
@@ -79,14 +81,16 @@ details about EBS balances.
## Volume types
Refer [Amazon EBS volume types] and [Amazon EBS General Purpose SSD volumes].
|                     | `gp3`          | `gp2`                    | `io2`          | `io1`          | `st1`            | `sc1`            |
| ------------------- | -------------- | ------------------------ | -------------- | -------------- | ---------------- | ---------------- |
| Class               | SSD            | SSD                      | SSD            | SSD            | HDD              | HDD              |
| Annual failure rate | 0.1% - 0.2%    | 0.1% - 0.2%              | 0.001%         | 0.1% - 0.2%    | 0.1% - 0.2%      | 0.1% - 0.2%      |
| Size                | 1 GiB - 16 TiB | 1 GiB - 16 TiB           | 4 GiB - 64 TiB | 4 GiB - 16 TiB | 125 GiB - 16 TiB | 125 GiB - 16 TiB |
| Baseline IOPS       | 3,000          | 3 per GiB (100 - 16,000) | 4,000          | 100            | N/A              | N/A              |
| Max IOPS            | 16,000         | 16,000                   | 256,000        | 64,000         | 500              | 250              |
| Baseline throughput | 125 MiB/s      | 128 MiB/s                | 1,000 MiB/s    | 1 MiB/s        | 5 MiB/s          | 1.5 MiB/s        |
| Max throughput      | 1,000 MiB/s    | 250 MiB/s                | 4,000 MiB/s    | 1,000 MiB/s    | 500 MiB/s        | 250 MiB/s        |
| Multi-attach        | No             | No                       | Yes            | Yes            | No               | No               |
| NVMe reservations   | No             | No                       | Yes            | No             | No               | No               |
@@ -117,6 +121,33 @@ Total: $1.71 + $0.00 + $1.66 = $3.37
</details>
`gp3` volumes are normally much better and more cost-effective than `gp2` ones.<br/>
There are still specific situations where `gp2` volumes _might_ be _slightly_ better than `gp3` ones, namely:
- Burst-based, spiky workloads with low sustained demand, like cron jobs, backups, compactions, and batch analytics.
<details>
`gp2` volumes accumulate burst credits when they are underutilized.<br/>
Large volumes accumulate credits quickly, and can then burst for a long time.
Higher-than-baseline performance must be provisioned on `gp3` volumes, and it is billed for as long as it stays
provisioned, even if it is only needed briefly.
</details>
- Very large volumes (> 1 TiB) that do not require fine-tuned throughput.
<details>
`gp2` volumes' performance scales with their size, and larger volumes get more **baseline** performance for
free.<br/>
E.g., a 4 TiB `gp2` volume has a baseline of 12,288 IOPS (3 IOPS per GiB), while a `gp3` volume still has a baseline
of 3,000 IOPS regardless of its size.
The `gp2` volume's maximum throughput is still lower than that of `gp3` volumes (250 MiB/s against up to 1,000 MiB/s),
but it can be a better deal as long as that is enough.
</details>
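The size-based baseline of `gp2` volumes can be compared with the fixed baseline of `gp3` volumes in a few lines of shell. This sketch assumes the documented `gp2` formula of 3 IOPS per GiB, with a minimum of 100 and a maximum of 16,000. It ignores burst credits, throughput, and prices. The function names are hypothetical.

```sh
# Baseline IOPS of a `gp2` volume of the given size in GiB: 3 IOPS per GiB, at least 100 and at most 16,000.
gp2_baseline_iops() {
  iops=$(( $1 * 3 ))
  if [ "$iops" -lt 100 ]; then iops=100; fi
  if [ "$iops" -gt 16000 ]; then iops=16000; fi
  echo "$iops"
}

# `gp3` volumes get 3,000 baseline IOPS regardless of their size.
GP3_BASELINE_IOPS=3000

# Print the volume type offering the higher baseline IOPS, at no extra cost, for the given size in GiB.
higher_free_baseline() {
  if [ "$(gp2_baseline_iops "$1")" -gt "$GP3_BASELINE_IOPS" ]; then
    echo gp2
  else
    echo gp3
  fi
}
```

Under these assumptions, `gp2` volumes only get a higher baseline than `gp3` ones above 1,000 GiB. This is roughly the "> 1 TiB" rule of thumb above.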
## Snapshots
A volume's first snapshot is a **complete** snapshot of it, with _all the volume's blocks_ being copied over.<br/>
@@ -218,6 +249,12 @@ Attaching EBS volumes which data keys are encrypted with unusable KMS keys to EC
not be able to use the KMS keys to decrypt the data key used for the volume.<br/>
Make the KMS key usable again to be able to attach such EBS volumes.
## Archiving
Refer [Archive Amazon EBS snapshots].
Archived snapshots are billed for a minimum of 90d, **and** incur retrieval fees when restored.
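The trade-off can be estimated with some arithmetic. The sketch below compares keeping a snapshot in the standard tier with archiving it for a given number of days. It applies the 90-day minimum and assumes one full retrieval. All prices are hypothetical inputs that should be taken from the [Amazon EBS pricing] page. Archiving converts incremental snapshots into full ones, so the archived size can be larger than the standard one. The function name is hypothetical.

```sh
# Estimate the cost of keeping a snapshot in the standard tier vs. archiving it for DAYS days.
# Usage: snapshot_tier_costs DAYS STANDARD_GB STANDARD_PRICE ARCHIVE_GB ARCHIVE_PRICE RETRIEVAL_PRICE
#   Storage prices are per GB-month, RETRIEVAL_PRICE is per GB; months are approximated as 30 days.
#   ARCHIVE_GB should be the full size of the snapshot's data, since archived snapshots are full snapshots.
snapshot_tier_costs() {
  awk -v days="$1" -v std_gb="$2" -v std_price="$3" \
      -v arc_gb="$4" -v arc_price="$5" -v retr_price="$6" 'BEGIN {
    d = days + 0
    billed = (d < 90) ? 90 : d                          # archived snapshots are billed for at least 90 days
    standard = std_gb * std_price * d / 30
    archive = arc_gb * arc_price * billed / 30 + arc_gb * retr_price   # assumes a single full restore
    printf "standard=%.2f archive=%.2f\n", standard, archive
  }'
}
```

For example, assume hypothetical prices of 0.05 per GB-month for the standard tier, 0.0125 per GB-month for the archive tier, and 0.03 per GB for retrieval. Then `snapshot_tier_costs 30 100 0.05 100 0.0125 0.03` prints `standard=5.00 archive=6.75`. Running the same command for 365 days prints `standard=60.83 archive=18.21`.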
## Operations
### Increase disks' size
@@ -279,6 +316,16 @@ If changing the volume type from `gp2` to `gp3` **without** specifying IOPS or t
automatically provisions either equivalent performance to that of the source `gp2` volume, or the baseline `gp3`
performance, whichever is higher.
## Cost-saving measures
- Prefer using `gp3` volumes unless an application requires more IOPS or throughput than they can provide.
- Still prefer `gp3` volumes to `gp2` ones.<br/>
`gp3` volumes cost less, and have better performance per GB (except for some specific corner cases).<br/>
The performance of `gp3` volumes can also be tuned to some extent, while that of `gp2` volumes only increases with size.
- Consider using `gp2` volumes _only_ when encountering those corner cases, usually volumes larger than 1 TiB that need
higher-than-baseline performance only in bursts.
- Consider [archiving] snapshots that are not expected to be accessed for 90d or more.
## Further readings
- [Amazon Web Services]
@@ -309,16 +356,20 @@ performance, whichever is higher.
═╬═Time══
-->
<!-- In-article sections -->
[archiving]: #archiving
<!-- Knowledge base -->
[amazon web services]: README.md
[cli]: cli.md
[ec2]: ec2.md
<!-- Upstream -->
[Amazon EBS General Purpose SSD volumes]: https://docs.aws.amazon.com/ebs/latest/userguide/general-purpose.html
[amazon ebs pricing]: https://aws.amazon.com/ebs/pricing/
[amazon ebs volume types]: https://docs.aws.amazon.com/ebs/latest/userguide/ebs-volume-types.html
[amazon ebs-optimized instance types]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-optimized.html
[Archive Amazon EBS snapshots]: https://docs.aws.amazon.com/ebs/latest/userguide/snapshot-archive.html
[automate snapshot lifecycles]: https://docs.aws.amazon.com/ebs/latest/userguide/snapshot-ami-policy.html
[choose the best amazon ebs volume type for your self-managed database deployment]: https://aws.amazon.com/blogs/storage/how-to-choose-the-best-amazon-ebs-volume-type-for-your-self-managed-database-deployment/
[delete-volume]: https://docs.aws.amazon.com/cli/latest/reference/ec2/delete-volume.html

@@ -9,6 +9,7 @@
1. [Lifecycle hooks](#lifecycle-hooks)
1. [Image customization](#image-customization)
1. [Automatic recovery](#automatic-recovery)
1. [Cost-saving measures](#cost-saving-measures)
1. [Further readings](#further-readings)
1. [Sources](#sources)
@@ -19,13 +20,15 @@ The API for EC2 are [**eventually** consistent][Eventual consistency in the Amaz
EC2 instances are billed by the second, with a minimum of 60s,
[since 2017-10-02][announcing amazon ec2 per second billing].
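Per-second billing with a 60-second minimum rounds very short runs up. This adds up when instances are started and stopped often. The following is a minimal sketch. It assumes the instance is eligible for per-second billing and that the minimum applies to each run. The function names are hypothetical.

```sh
# Seconds billed for a single run of the given length, assuming per-second billing with a 60-second minimum.
billed_seconds() {
  if [ "$1" -lt 60 ]; then
    echo 60
  else
    echo "$1"
  fi
}

# Total seconds billed for several runs (e.g., an instance started and stopped repeatedly),
# assuming the 60-second minimum applies to each run separately.
total_billed_seconds() {
  total=0
  for run in "$@"; do
    total=$(( total + $(billed_seconds "$run") ))
  done
  echo "$total"
}
```

E.g., `total_billed_seconds 10 20 120` prints `240`, since the two short runs are billed as 60 seconds each.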
Use an IAM Instance Profile to allow an EC2 instance to use an IAM role.
`T3`, `T3a`, and `T4g` instances launch as `unlimited` by default (`T2` instances launch as `standard`). Launch them in
`standard` mode to avoid paying for surplus credits.
The instance type [_can_ be changed][change the instance type]. The procedure depends on the root volume, and **does**
require downtime.
When using spot instances, prefer instrumenting the application to be aware of [termination notifications].
Clone EC2 instances by:
1. Creating an AMI from the original instance.
@@ -222,6 +225,25 @@ Refer [Image Builder].
Also see [Automatic instance recovery].
## Cost-saving measures
- Prefer using the most adequate instance type for the job.<br/>
E.g., prefer `r*` instances over `m*` ones where a lot of RAM is needed, but almost no CPU power is.
- Prefer using ARM-based (Graviton, with a `g` in the family name like `m7g`) instances, unless a different
architecture is required.
- Prefer _shared_ instances over _dedicated_ ones unless necessary.<br/>
Refer [Understanding AWS Tenancy Options].
- Prefer dedicated _instances_ over dedicated _hosts_ unless necessary.<br/>
Refer [Understanding AWS Tenancy Options].
- Prefer using [burstable (`t`) instances][burstable instances] unless steady performance is required, especially for
workloads with spiky usage.
- When employing **underused** burstable instances, prefer re-launching them in `standard` mode to avoid paying for
surplus credits.
- Prefer using [spot instances] instead of on-demand ones where possible.
- Consider **stopping** or (even better) terminating non-production hosts after working hours.
- Consider purchasing EC2 Instance and/or Compute Savings Plans.
- Consider [archiving snapshots] that are not expected to be accessed for 90d or more.<br/>
Archiving has a 90d minimum storage fee, **and** archived resources have retrieval fees.
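Stopped instances are not charged for compute, but their EBS volumes still are. Terminating the instances avoids both charges. The rough arithmetic below estimates how much of a 168-hour week an instance runs if it is only kept up during working hours. The function names and the schedule are hypothetical. Savings Plans and reservations are not taken into account.

```sh
# Percentage of a 168-hour week an instance runs when it is only up HOURS_PER_DAY hours on DAYS_PER_WEEK days.
# Usage: weekly_uptime_percent HOURS_PER_DAY DAYS_PER_WEEK
weekly_uptime_percent() {
  awk -v h="$1" -v d="$2" 'BEGIN { printf "%.1f\n", h * d / 168 * 100 }'
}

# Percentage of compute time saved compared to running the instance around the clock.
weekly_saving_percent() {
  awk -v h="$1" -v d="$2" 'BEGIN { printf "%.1f\n", 100 - h * d / 168 * 100 }'
}
```

E.g., `weekly_uptime_percent 12 5` prints `35.7`, and `weekly_saving_percent 12 5` prints `64.3`.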
## Further readings
- [Amazon Web Services]
@@ -269,8 +291,12 @@ Also see [Automatic instance recovery].
═╬═Time══
-->
<!-- In-article sections -->
[burstable instances]: #burstable-instances
<!-- Knowledge base -->
[amazon web services]: README.md
[archiving snapshots]: ebs.md#archiving
[cli]: cli.md
[ebs]: ebs.md
[image builder]: image%20builder.md
@@ -302,7 +328,9 @@ Also see [Automatic instance recovery].
[Manually create or edit the CloudWatch agent configuration file]: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-Configuration-File-Details.html
[recommended alarms]: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Best_Practice_Recommended_Alarms_AWS_Services.html#EC2
[retrieve instance metadata]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
[Spot Instances]: https://aws.amazon.com/ec2/spot/
[standard mode for burstable performance instances]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/burstable-performance-instances-standard-mode.html
[termination notifications]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-instance-termination-notices.html
[unlimited mode for burstable performance instances]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/burstable-performance-instances-unlimited-mode.html
[using al2023 based amazon ecs amis to host containerized workloads]: https://docs.aws.amazon.com/linux/al2023/ug/ecs.html
[using instance profiles]: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2_instance-profiles.html

@@ -36,6 +36,7 @@
1. [Best practices](#best-practices)
1. [Troubleshooting](#troubleshooting)
1. [Invalid 'cpu' setting for task](#invalid-cpu-setting-for-task)
1. [Cost-saving measures](#cost-saving-measures)
1. [Further readings](#further-readings)
1. [Sources](#sources)
@@ -133,6 +134,12 @@ curl -fs "http://$( \
)" --query "tasks[].attachments[].details[?(name=='privateDnsName')].value" --output 'text' \
):8080"
# Get the image of specific containers.
aws ecs list-tasks --cluster 'someCluster' --service-name 'someService' --query 'taskArns[0]' --output 'text' \
| xargs -oI '%%' \
aws ecs describe-tasks --cluster 'someCluster' --task '%%' \
--query 'tasks[].containers[?name==`someContainer`].image' --output 'text'
# Delete services.
aws ecs delete-service --cluster 'testCluster' --service 'testService' --force
@@ -148,7 +155,8 @@ while [[ $(aws ecs list-tasks --query 'taskArns' --output 'text' --cluster 'test
# Restart tasks.
# No dedicated restart command: stop the tasks, and new ones will eventually be started in their place.
# To mimic a blue-green deployment, scale the service up by doubling its tasks, then down again to the normal amount.
# Alternatively, force a new deployment: tasks are replaced following the service's deployment configuration.
aws ecs update-service --cluster 'someCluster' --service 'someService' --force-new-deployment
# Scaling down to 0 and back up may stop all tasks before starting new ones, causing downtime.
aws ecs update-service --cluster 'someCluster' --service 'someService' --desired-count '0' \
&& aws ecs update-service --cluster 'someCluster' --service 'someService' --desired-count '1'
```
</details>
@@ -1725,6 +1733,23 @@ Cost-saving measures:
capacity provider.
<details style='padding: 0 0 1rem 1rem'>
<summary> Percentage-like </summary>
```json
[
  {
    "capacityProvider": "FARGATE",
    "weight": 5
  },
  {
    "capacityProvider": "FARGATE_SPOT",
    "weight": 95
  }
]
```
</details>
<details style='padding: 0 0 1rem 1rem'>
<summary> Ratio-like </summary>
```json
{
@@ -1775,6 +1800,13 @@ Specify a supported value for the task CPU and memory in your task definition.
</details>
## Cost-saving measures
- Prefer using [spot capacity][effectively using spot instances in aws ecs for production workloads] for non-critical
services and tasks.
- Consider purchasing EC2 Instance and/or Compute Savings Plans if using EC2 capacity.<br/>
Consider purchasing Compute Savings Plans if using Fargate capacity.
## Further readings
- [Amazon Web Services]
@@ -1820,6 +1852,8 @@ Specify a supported value for the task CPU and memory in your task definition.
- [Amazon ECS Service Discovery]
- [AWS Fargate Pricing Explained]
- [The Ultimate Beginner's Guide to AWS ECS]
- [Amazon ECS launch types and capacity providers]
- [Effectively Using Spot Instances in AWS ECS for Production Workloads]
<!--
Reference
@@ -1847,6 +1881,7 @@ Specify a supported value for the task CPU and memory in your task definition.
[efs]: efs.md
<!-- Upstream -->
[Amazon ECS launch types and capacity providers]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/capacity-launch-type-comparison.html
[Amazon ECS capacity providers for the EC2 launch type]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/asg-capacity-providers.html
[Amazon ECS clusters for Fargate]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/fargate-capacity-providers.html
[Amazon ECS environment variables]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-environment-variables.html
@@ -1865,6 +1900,7 @@ Specify a supported value for the task CPU and memory in your task definition.
[AWS Fargate Spot Now Generally Available]: https://aws.amazon.com/blogs/aws/aws-fargate-spot-now-generally-available/
[Centralized Container Logging with Fluent Bit]: https://aws.amazon.com/blogs/opensource/centralized-container-logging-fluent-bit/
[ecs execute-command proposal]: https://github.com/aws/containers-roadmap/issues/1050
[Effectively Using Spot Instances in AWS ECS for Production Workloads]: https://medium.com/@ankur.ecb/effectively-using-spot-instances-in-aws-ecs-for-production-workloads-d46985d0ae2d
[EventBridge Scheduler]: https://docs.aws.amazon.com/scheduler/latest/UserGuide/what-is-scheduler.html
[Example Amazon ECS task definition: Route logs to FireLens]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/firelens-taskdef.html
[fargate tasks sizes]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/fargate-tasks-services.html#fargate-tasks-size

@@ -26,6 +26,7 @@
1. [Identify common issues](#identify-common-issues)
1. [The worker nodes fail to join the cluster](#the-worker-nodes-fail-to-join-the-cluster)
1. [AWS ELB controller fails to get the region from the host's metadata](#aws-elb-controller-fails-to-get-the-region-from-the-hosts-metadata)
1. [Cost-saving measures](#cost-saving-measures)
1. [Further readings](#further-readings)
1. [Sources](#sources)
@@ -1413,6 +1414,13 @@ helm upgrade -i --repo 'https://aws.github.io/eks-charts' \
--set 'vpcId'='vpc-01234567'
```
## Cost-saving measures
- Consider [using spot instances][building for cost optimization and resilience for eks with spot instances] for
non-critical workloads.
- Consider purchasing EC2 Instance and/or Compute Savings Plans if using EC2 worker nodes.<br/>
Consider purchasing Compute Savings Plans if using Fargate profiles.
## Further readings
- [Amazon Web Services]
@@ -1512,6 +1520,7 @@ helm upgrade -i --repo 'https://aws.github.io/eks-charts' \
[aws eks create-nodegroup]: https://docs.aws.amazon.com/cli/latest/reference/eks/create-nodegroup.html
[AWS Node Termination Handler]: https://github.com/aws/aws-node-termination-handler
[awssupport-troubleshooteksworkernode runbook]: https://docs.aws.amazon.com/systems-manager-automation-runbooks/latest/userguide/automation-awssupport-troubleshooteksworkernode.html
[Building for Cost optimization and Resilience for EKS with Spot Instances]: https://aws.amazon.com/blogs/compute/cost-optimization-and-resilience-eks-with-spot-instances/
[choosing an amazon ec2 instance type]: https://docs.aws.amazon.com/eks/latest/userguide/choosing-instance-type.html
[configure instance permissions required for systems manager]: https://docs.aws.amazon.com/systems-manager/latest/userguide/setup-instance-profile.html#instance-profile-policies-overview
[create an amazon ebs csi driver iam role]: https://docs.aws.amazon.com/eks/latest/userguide/csi-iam-role.html

@@ -380,7 +380,7 @@ can manage.
## Cost-saving measures
- Choose _appropriate_ [instance types and sizes][supported instance types in amazon opensearch service].<br/>
Leverage the ability to select them to tailor the service offering to one's needs.
> [OR1 instances][or1 storage for amazon opensearch service] **cannot** (currently?) be selected as master nodes.<br/>

@@ -28,6 +28,7 @@
1. [ERROR: must be superuser to alter _X_ roles or change _X_ attribute](#error-must-be-superuser-to-alter-x-roles-or-change-x-attribute)
1. [Transport fails asking for the remote user must have superuser, but it already does](#transport-fails-asking-for-the-remote-user-must-have-superuser-but-it-already-does)
1. [The instance is unbearably slow](#the-instance-is-unbearably-slow)
1. [Cost-saving measures](#cost-saving-measures)
1. [Further readings](#further-readings)
1. [Sources](#sources)
@@ -103,6 +104,10 @@ Maintenance windows are paused when their DB instances are stopped.
# Show details of RDS instances.
aws rds describe-db-instances
aws rds describe-db-instances --output 'json' --query "DBInstances[?(DBInstanceIdentifier=='master-prod')]"
# Get the creation time of specific instances.
aws rds describe-db-instances --db-instance-identifier 'some-db-instance' \
--query 'DBInstances[0].InstanceCreateTime' --output 'text'
# Build the connection URL of specific instances, falling back to the 'postgres' DB if none is set.
aws rds describe-db-instances --db-instance-identifier 'some-db-instance' --output 'text' \
--query 'DBInstances[0]|join(``,[`postgresql://`,MasterUsername,`@`,Endpoint.Address,`:`,to_string(Endpoint.Port),`/`,DBName||`postgres`])'
# Enable Performance Insights.
aws rds modify-db-cluster --db-cluster-identifier 'staging-cluster' \
@@ -1073,6 +1078,16 @@ or write workloads and exceeds the instance type quotas.
</details>
## Cost-saving measures
- Choose _appropriate_ instance types and sizes.
- Prefer using [reserved instances][rds reserved instances] when one can stay on a single instance type for the whole
duration of the reservation.<br/>
If the DB instance type is **not** expected to change over time, prefer _Standard RIs_. Otherwise, prefer
_Convertible RIs_ for flexibility.
RDS does **not** support Savings Plans at the time of writing.
## Further readings
- [Working with DB instance read replicas]
@@ -1136,6 +1151,7 @@ or write workloads and exceeds the instance type quotas.
[migrating databases using rds postgresql transportable databases]: https://aws.amazon.com/blogs/database/migrating-databases-using-rds-postgresql-transportable-databases/
[Multi-AZ DB instance deployments for Amazon RDS]: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.MultiAZSingleStandby.html
[pricing and data retention for performance insights]: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PerfInsights.Overview.cost.html
[RDS reserved instances]: https://aws.amazon.com/rds/reserved-instances/
[Recommended alarms for RDS]: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Best_Practice_Recommended_Alarms_AWS_Services.html#RDS
[renaming a db instance]: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_RenameInstance.html
[Restoring a DB instance to a specified time for Amazon RDS]: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PIT.html


@@ -3,6 +3,7 @@
1. [TL;DR](#tldr)
1. [Storage classes](#storage-classes)
1. [Lifecycle configuration](#lifecycle-configuration)
1. [Cost-saving measures](#cost-saving-measures)
1. [Further readings](#further-readings)
1. [Sources](#sources)
@@ -215,6 +216,17 @@ actions. In such cases:
Examples: [1][lifecycle configuration examples], [2][s3 lifecycle rules examples]
## Cost-saving measures
- Prefer cheaper storage classes for data that is accessed infrequently.<br/>
  These classes charge for a minimum storage duration **and** for data retrieval, so they only pay off for data that is
  kept long enough and read rarely.
- Consider using [Lifecycle configuration] to transition data to cheaper storage classes once it is no longer accessed
  frequently.
- Prefer [S3 Intelligent-Tiering][how s3 intelligent-tiering works] when the data's access patterns are unknown or
  change over time.
- Consider expiring data after some time, if it does not need to be retained.
- Consider compressing data before uploading it, to reduce the amount of data stored and transferred.
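A sketch of how the minimum storage duration inflates the bill of short-lived objects in cheaper storage classes. The
minimums mirror the ones AWS documents for each class (worth re-checking on the S3 pricing page); the per-GB price is a
hypothetical placeholder, months are approximated as 30 days, and retrieval fees are not modeled.

```python
# Minimum billable storage duration in days, keyed by the S3 API storage class identifiers.
MIN_STORAGE_DAYS = {
    "STANDARD": 0,
    "STANDARD_IA": 30,
    "ONEZONE_IA": 30,
    "GLACIER_IR": 90,     # Glacier Instant Retrieval
    "GLACIER": 90,        # Glacier Flexible Retrieval
    "DEEP_ARCHIVE": 180,  # Glacier Deep Archive
}


def billed_days(storage_class: str, days_stored: int) -> int:
    """Days billed for an object deleted, overwritten, or transitioned after `days_stored` days."""
    return max(days_stored, MIN_STORAGE_DAYS[storage_class])


def storage_cost_usd(size_gb: float, usd_per_gb_month: float, storage_class: str, days_stored: int) -> float:
    """Approximate storage cost, pro-rating the monthly price over 30-day months."""
    return size_gb * usd_per_gb_month * billed_days(storage_class, days_stored) / 30


# 100 GB deleted after 10 days: Standard-IA still bills the full 30-day minimum.
print(billed_days("STANDARD_IA", 10))  # 30
print(storage_cost_usd(100, 0.01, "STANDARD_IA", 10))
```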
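Tiering down and expiring data can be combined in a single lifecycle rule. A minimal sketch that builds a lifecycle
configuration as a Python dictionary in the shape the S3 API expects (`Rules`, `Filter`, `Transitions`, `Expiration`);
the `logs/` prefix, the day counts, and the `lifecycle_rule` helper are hypothetical, and nothing is sent to AWS.

```python
import json

# S3 lifecycle transitions to these classes require objects to be at least 30 days old.
MIN_TRANSITION_DAYS = {"STANDARD_IA": 30, "ONEZONE_IA": 30}


def lifecycle_rule(rule_id, prefix, transitions, expiration_days=None):
    """Build one S3 lifecycle rule.

    `transitions` is a list of (days_after_creation, storage_class) tuples.
    """
    if not transitions and expiration_days is None:
        raise ValueError("a rule needs at least one transition or an expiration")
    days = [d for d, _ in transitions]
    if days != sorted(days):
        raise ValueError("transitions must be sorted by days")
    for d, storage_class in transitions:
        minimum = MIN_TRANSITION_DAYS.get(storage_class, 0)
        if d < minimum:
            raise ValueError(f"objects must be at least {minimum} days old to move to {storage_class}")
    # Sanity check of this sketch: expiring before the last transition makes that transition pointless.
    if expiration_days is not None and days and expiration_days <= days[-1]:
        raise ValueError("expiration should come after the last transition")
    rule = {"ID": rule_id, "Filter": {"Prefix": prefix}, "Status": "Enabled"}
    if transitions:
        rule["Transitions"] = [{"Days": d, "StorageClass": sc} for d, sc in transitions]
    if expiration_days is not None:
        rule["Expiration"] = {"Days": expiration_days}
    return rule


config = {
    "Rules": [
        lifecycle_rule(
            "logs-tiering",
            "logs/",
            transitions=[(30, "STANDARD_IA"), (90, "GLACIER")],
            expiration_days=365,
        )
    ]
}
print(json.dumps(config, indent=2))
```

The printed JSON could be saved to a file and applied with
`aws s3api put-bucket-lifecycle-configuration --bucket <bucket> --lifecycle-configuration file://lifecycle.json`, or
passed as `LifecycleConfiguration` to boto3's `put_bucket_lifecycle_configuration`. Applying a configuration replaces
the bucket's existing one.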
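A minimal sketch of compressing a payload before upload, using the standard library's `gzip` module. The sample log
lines are hypothetical and deliberately repetitive, which is the kind of data that compresses well.

```python
import gzip


def compress_for_upload(data: bytes, level: int = 6) -> bytes:
    """Gzip-compress a payload before storing it in S3."""
    return gzip.compress(data, compresslevel=level)


# Hypothetical, highly repetitive log lines.
sample = b"".join(
    f"2026-01-13T17:58:40Z INFO request_id={i} status=200 path=/api/health\n".encode()
    for i in range(10_000)
)
compressed = compress_for_upload(sample)
print(f"{len(sample)} B -> {len(compressed)} B ({len(compressed) / len(sample):.1%} of the original)")
```

The compressed bytes could then be uploaded with boto3's
`put_object(Bucket=..., Key=..., Body=compressed, ContentEncoding="gzip")`. Already-compressed formats (images, videos,
most archives) gain little from this.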
## Further readings
- [Amazon Web Services]
@@ -245,6 +257,8 @@ Examples: [1][lifecycle configuration examples], [2][s3 lifecycle rules example
-->
<!-- In-article sections -->
[Lifecycle configuration]: #lifecycle-configuration
<!-- Knowledge base -->
[amazon web services]: README.md
[cli]: cli.md


@@ -1,6 +1,7 @@
# SageMaker
1. [TL;DR](#tldr)
1. [Cost-saving measures](#cost-saving-measures)
1. [Further readings](#further-readings)
1. [Sources](#sources)
@@ -11,6 +12,11 @@
- Serverless Endpoints' backends use **a snapshot** of the Endpoint Configuration at the time each host is created.<br/>
  To make a serverless Endpoint use a new Configuration or Model, its hosts need to be replaced.
## Cost-saving measures
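- Serve multiple models from a single endpoint (e.g. a multi-model endpoint) where it makes sense, instead of running one
  endpoint per model.
- Delete endpoints that are no longer used.<br/>
  Real-time endpoints are billed for as long as their instances run, whether they serve traffic or not.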
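A minimal sketch for spotting endpoints that look unused, assuming one already has daily invocation counts per endpoint
(e.g. from the `Invocations` metric SageMaker publishes to CloudWatch under the `AWS/SageMaker` namespace). The endpoint
names, the counts, and the 7-day window are hypothetical.

```python
from __future__ import annotations


def idle_endpoints(daily_invocations: dict[str, list[int]], idle_days: int = 7) -> list[str]:
    """Return endpoints that received no invocations in each of the last `idle_days` days.

    `daily_invocations` maps endpoint names to daily invocation counts, oldest first.
    Endpoints with less than `idle_days` days of data are not flagged.
    """
    idle = []
    for name, counts in daily_invocations.items():
        recent = counts[-idle_days:]
        if len(recent) == idle_days and sum(recent) == 0:
            idle.append(name)
    return sorted(idle)


# Hypothetical data for three endpoints.
usage = {
    "fraud-detector": [120, 98, 143, 110, 0, 87, 130],
    "old-experiment": [0, 0, 0, 0, 0, 0, 0, 0, 0],
    "new-endpoint": [0, 0],
}
print(idle_endpoints(usage))  # ['old-experiment']
```

Flagged endpoints could then be removed with `aws sagemaker delete-endpoint --endpoint-name <name>`. Deleting an
endpoint leaves its endpoint configuration and model in place, so it can be recreated later if needed.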
## Further readings
- [Amazon Web Services]


@@ -52,6 +52,7 @@ Hosted by the [Cloud Native Computing Foundation][cncf].
1. [Run a command just before a Pod stops](#run-a-command-just-before-a-pod-stops)
1. [Examples](#examples)
1. [Create an admission webhook](#create-an-admission-webhook)
1. [Cost-saving measures](#cost-saving-measures)
1. [Further readings](#further-readings)
1. [Sources](#sources)
@@ -1256,6 +1257,16 @@ you need:
See the example's [README][create an admission webhook].
## Cost-saving measures
- Reconsider whether a Kubernetes cluster is really needed.<br/>
  Clusters introduce multiple layers of redundancy and are highly complex.<br/>
  Account for the resources and the maintenance effort that will inevitably go into running one.
- Consider leveraging autoscaling.<br/>
  See [Horizontal Pod Autoscaling] and [KEDA] to scale Pods depending on metrics.<br/>
  See [Node Autoscaling][node scaling] to scale Nodes depending on the number of Pods, node features, or resource
  consumption.
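The Horizontal Pod Autoscaler computes the replica count as
`desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, and skips scaling when that ratio is
within a tolerance of 1.0 (0.1 by default). A simplified sketch of that rule for a single metric; it ignores what the
real controller also considers (not-yet-ready Pods, missing metrics, stabilization windows), and the min/max replica
defaults are hypothetical.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Simplified HPA scaling decision for a single metric."""
    ratio = current_metric / target_metric
    # Skip scaling when the ratio is close enough to 1.0.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Keep the result within the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(3, current_metric=200, target_metric=100))  # 6: usage is double the target
print(desired_replicas(3, current_metric=50, target_metric=100))   # 2: ceil(3 * 0.5)
print(desired_replicas(3, current_metric=105, target_metric=100))  # 3: within tolerance
```

Scaling in when usage drops is where the cost savings come from; the lower bound keeps a minimum of capacity around.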
## Further readings
Usage:
@@ -1350,6 +1361,7 @@ Others:
<!-- In-article sections -->
[horizontal pod autoscaler]: #horizontal-pod-autoscaler
[node scaling]: #node-scaling
[vertical pod autoscaler]: #vertical-pod-autoscaler
[pods]: #pods
[privileged container vs privilege escalation]: #privileged-container-vs-privilege-escalation