mirror of
https://gitea.com/mcereda/oam.git
synced 2026-02-09 05:44:23 +00:00
feat: cost-saving measures
@@ -5,6 +5,7 @@ Observability service. with functions for logging, monitoring and alerting.
|
|||||||
1. [TL;DR](#tldr)
|
1. [TL;DR](#tldr)
|
||||||
1. [Queries of interest](#queries-of-interest)
|
1. [Queries of interest](#queries-of-interest)
|
||||||
1. [Stream logs](#stream-logs)
|
1. [Stream logs](#stream-logs)
|
||||||
|
1. [Cost-saving measures](#cost-saving-measures)
|
||||||
1. [Further readings](#further-readings)
|
1. [Further readings](#further-readings)
|
||||||
1. [Sources](#sources)
|
1. [Sources](#sources)
|
||||||
|
|
||||||
@@ -31,6 +32,19 @@ The [CloudWatch console] offers some default good queries.
|
|||||||
|
|
||||||
Logs in Log Groups can be [streamed][stream logs] elsewhere.
|
Logs in Log Groups can be [streamed][stream logs] elsewhere.
|
||||||
|
|
||||||
|
CloudWatch retains metrics' data as follows:
|
||||||
|
|
||||||
|
- Data points with a period of less than 60 seconds are available for 3 hours.<br/>
|
||||||
|
These are high-resolution custom metrics.
|
||||||
|
- Data points with a period of 60 seconds (1 minute) are available for 15 days.
|
||||||
|
- Data points with a period of 300 seconds (5 minutes) are available for 63 days.
|
||||||
|
- Data points with a period of 3600 seconds (1 hour) are available for 455 days (15 months).
|
||||||
|
|
||||||
|
Data points are aggregated together for long-term storage after the initial period.<br/>
|
||||||
|
E.g., data using a period of 1 minute remains available for 15 days with 1-minute resolution, then it is aggregated and
|
||||||
|
made available with a resolution of 5 minutes; after 63 days, it is further aggregated and made available with a
|
||||||
|
resolution of 1 hour for 15 months.
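The roll-up schedule above can be sketched as a small lookup. This is only an illustration: `resolution_for_age_days` is a hypothetical helper and not part of any AWS tool, and it assumes data points originally published with a 1-minute period.

```sh
# Hypothetical helper, not an AWS command.
# Print the finest resolution (in seconds) still available for a data point
# that was published with a 1-minute period, given its age in days.
resolution_for_age_days() {
  age_days="$1"
  if [ "$age_days" -le 15 ]; then
    echo '60'       # still at 1-minute resolution
  elif [ "$age_days" -le 63 ]; then
    echo '300'      # aggregated to 5 minutes
  elif [ "$age_days" -le 455 ]; then
    echo '3600'     # aggregated to 1 hour
  else
    echo 'expired'  # past the 15 months of retention
  fi
}
```

Calling `resolution_for_age_days 30` prints `300`, meaning a 30-day-old data point can only be queried with periods of 5 minutes or longer.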
|
||||||
|
|
||||||
<details>
|
<details>
|
||||||
<summary>CLI commands</summary>
|
<summary>CLI commands</summary>
|
||||||
|
|
||||||
@@ -101,6 +115,12 @@ Also refer [Streaming CloudWatch Logs data to Amazon OpenSearch Service] to stre
|
|||||||
|
|
||||||
Logs in CloudWatch Log Groups can be streamed to [Kinesis], [Firehose] or [Lambda] by leveraging Logs subscriptions.
|
Logs in CloudWatch Log Groups can be streamed to [Kinesis], [Firehose] or [Lambda] by leveraging Logs subscriptions.
|
||||||
|
|
||||||
|
## Cost-saving measures
|
||||||
|
|
||||||
|
- Configure an _appropriate_ log retention period for any log groups.<br/>
|
||||||
|
Log groups containing development logs should not usually need more than 1 week's worth.
|
||||||
|
- When in doubt, still configure a default, long log retention period for all log groups (10y?).
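A minimal sketch of how the retention measures above might be applied with the AWS CLI. The function names and log group names are placeholders made up for this example. The values `7` and `3653` assume that 1 week and 10 years are among the retention periods CloudWatch Logs accepts.

```sh
# Hypothetical wrappers around real AWS CLI commands; they require valid credentials.

# Set the retention period (in days) of a log group.
set_log_retention() {
  aws logs put-retention-policy --log-group-name "$1" --retention-in-days "$2"
}

# List the log groups that have no retention period configured (they never expire).
list_log_groups_without_retention() {
  aws logs describe-log-groups \
    --query 'logGroups[?!retentionInDays].logGroupName' --output 'text'
}
```

E.g., `set_log_retention '/some/dev/log-group' 7` for development logs, and `set_log_retention '/some/other/log-group' 3653` as the long default.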
|
||||||
|
|
||||||
## Further readings
|
## Further readings
|
||||||
|
|
||||||
- [Website]
|
- [Website]
|
||||||
@@ -113,6 +133,7 @@ Logs in CloudWatch Log Groups can be streamed [Kinesis], [Firehose] or [Lambda]
|
|||||||
- [Real-time processing of log data with subscriptions]
|
- [Real-time processing of log data with subscriptions]
|
||||||
- [Streaming CloudWatch Logs data to Amazon OpenSearch Service]
|
- [Streaming CloudWatch Logs data to Amazon OpenSearch Service]
|
||||||
- [Which log group is causing a sudden increase in my CloudWatch Logs bill?]
|
- [Which log group is causing a sudden increase in my CloudWatch Logs bill?]
|
||||||
|
- [Metrics concepts]
|
||||||
|
|
||||||
<!--
|
<!--
|
||||||
Reference
|
Reference
|
||||||
@@ -130,6 +151,7 @@ Logs in CloudWatch Log Groups can be streamed [Kinesis], [Firehose] or [Lambda]
|
|||||||
[firehose]: https://docs.aws.amazon.com/firehose/latest/dev/what-is-this-service.html
|
[firehose]: https://docs.aws.amazon.com/firehose/latest/dev/what-is-this-service.html
|
||||||
[kinesis]: https://docs.aws.amazon.com/kinesis/
|
[kinesis]: https://docs.aws.amazon.com/kinesis/
|
||||||
[lambda]: https://docs.aws.amazon.com/lambda/
|
[lambda]: https://docs.aws.amazon.com/lambda/
|
||||||
|
[Metrics concepts]: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/cloudwatch_concepts.html
|
||||||
[real-time processing of log data with subscriptions]: https://docs.aws.amazon.com/cloudwatch/latest/logs/Subscriptions.html
|
[real-time processing of log data with subscriptions]: https://docs.aws.amazon.com/cloudwatch/latest/logs/Subscriptions.html
|
||||||
[services that publish cloudwatch metrics]: https://docs.aws.amazon.com/cloudwatch/latest/monitoring/aws-services-cloudwatch-metrics.html
|
[services that publish cloudwatch metrics]: https://docs.aws.amazon.com/cloudwatch/latest/monitoring/aws-services-cloudwatch-metrics.html
|
||||||
[streaming cloudwatch logs data to amazon opensearch service]: https://docs.aws.amazon.com/cloudwatch/latest/logs/CWL_OpenSearch_Stream.html
|
[streaming cloudwatch logs data to amazon opensearch service]: https://docs.aws.amazon.com/cloudwatch/latest/logs/CWL_OpenSearch_Stream.html
|
||||||
|
|||||||
@@ -6,9 +6,11 @@ Persistent [block storage][what is block storage?] for [EC2 Instances][ec2].
|
|||||||
1. [Volume types](#volume-types)
|
1. [Volume types](#volume-types)
|
||||||
1. [Snapshots](#snapshots)
|
1. [Snapshots](#snapshots)
|
||||||
1. [Encryption](#encryption)
|
1. [Encryption](#encryption)
|
||||||
|
1. [Archiving](#archiving)
|
||||||
1. [Operations](#operations)
|
1. [Operations](#operations)
|
||||||
1. [Increase disks' size](#increase-disks-size)
|
1. [Increase disks' size](#increase-disks-size)
|
||||||
1. [Migrate `gp2` volumes to `gp3`](#migrate-gp2-volumes-to-gp3)
|
1. [Migrate `gp2` volumes to `gp3`](#migrate-gp2-volumes-to-gp3)
|
||||||
|
1. [Cost-saving measures](#cost-saving-measures)
|
||||||
1. [Further readings](#further-readings)
|
1. [Further readings](#further-readings)
|
||||||
1. [Sources](#sources)
|
1. [Sources](#sources)
|
||||||
|
|
||||||
@@ -79,14 +81,16 @@ details about EBS balances.
|
|||||||
|
|
||||||
## Volume types
|
## Volume types
|
||||||
|
|
||||||
Refer [Amazon EBS volume types].
|
Refer [Amazon EBS volume types] and [Amazon EBS General Purpose SSD volumes].
|
||||||
|
|
||||||
| | `gp3` | `gp2` | `io2` | `io1` | `st1` | `sc1` |
|
| | `gp3` | `gp2` | `io2` | `io1` | `st1` | `sc1` |
|
||||||
| ------------------- | ------------------------------------------------ | -------------- | ----------------- | ----------------- | ---------------- | ---------------- |
|
| ------------------- | ------------------------------------------------ | -------------- | ----------------- | ----------------- | ---------------- | ---------------- |
|
||||||
| Class | SSD | SSD | SSD | SSD | HDD | HDD |
|
| Class | SSD | SSD | SSD | SSD | HDD | HDD |
|
||||||
| Annual failure rate | 0.1% - 0.2% | 0.1% - 0.2% | 0.001% | 0.1% - 0.2% | 0.1% - 0.2% | 0.1% - 0.2% |
|
| Annual failure rate | 0.1% - 0.2% | 0.1% - 0.2% | 0.001% | 0.1% - 0.2% | 0.1% - 0.2% | 0.1% - 0.2% |
|
||||||
| Size | 1 GiB - 16 TiB | 1 GiB - 16 TiB | 4 GiB - 64 TiB | 4 GiB - 16 TiB | 125 GiB - 16 TiB | 125 GiB - 16 TiB |
|
| Size | 1 GiB - 16 TiB | 1 GiB - 16 TiB | 4 GiB - 64 TiB | 4 GiB - 16 TiB | 125 GiB - 16 TiB | 125 GiB - 16 TiB |
|
||||||
|
| Baseline IOPS | 3,000 | 100 | 4,000 | 100 | N/A | N/A |
|
||||||
| Max IOPS | 16,000 | 16,000 | 256,000 | 64,000 | 500 | 250 |
|
| Max IOPS | 16,000 | 16,000 | 256,000 | 64,000 | 500 | 250 |
|
||||||
|
| Baseline throughput | 125 MiB/s | 128 MiB/s | 1,000 MiB/s | 1 MiB/s | 5 MiB/s | 1.5 MiB/s |
|
||||||
| Max throughput | 1,000 MiB/s | 250 MiB/s | 4,000 MiB/s | 1,000 MiB/s | 500 MiB/s | 250 MiB/s |
|
| Max throughput | 1,000 MiB/s | 250 MiB/s | 4,000 MiB/s | 1,000 MiB/s | 500 MiB/s | 250 MiB/s |
|
||||||
| Multi-attach | No | No | Yes | Yes | No | No |
|
| Multi-attach | No | No | Yes | Yes | No | No |
|
||||||
| NVMe reservations | No | No | Yes | No | No | No |
|
| NVMe reservations | No | No | Yes | No | No | No |
|
||||||
@@ -117,6 +121,33 @@ Total: $1.71 + $0.00 + $1.66 = $3.37
|
|||||||
|
|
||||||
</details>
|
</details>
|
||||||
|
|
||||||
|
`gp3` volumes are normally much better and more cost-effective than `gp2` ones.<br/>
|
||||||
|
There are still specific situations where `gp2` volumes _might_ be _slightly_ better than `gp3` ones, namely:
|
||||||
|
|
||||||
|
- Burst-based, spiky workloads with low sustained demand, like cron jobs, backups, compactions, and batch analytics.
|
||||||
|
|
||||||
|
<details>
|
||||||
|
|
||||||
|
`gp2` volumes accumulate burst credits when they are underutilized.<br/>
|
||||||
|
Large volumes accumulate credits quickly, and can then burst for a long time.
|
||||||
|
|
||||||
|
Sustained performance for `gp3` volumes costs a pretty penny, even if it's only needed briefly.
|
||||||
|
|
||||||
|
</details>
|
||||||
|
|
||||||
|
- Very large volumes (> 1 TiB) that do not require guaranteed throughput tuning.
|
||||||
|
|
||||||
|
<details>
|
||||||
|
|
||||||
|
`gp2` volumes' performance is based on their size, and larger volumes can provide more **baseline** performance for
|
||||||
|
free.<br/>
|
||||||
|
E.g., at 4 TB a `gp2` volume has a baseline of 12000 IOPS, while a `gp3` volume still has a baseline of 3000 IOPS.
|
||||||
|
|
||||||
|
The maximum throughput will still be lower than that of `gp3` volumes, but as long as up to 250 MiB/s in bursts is fine, it
|
||||||
|
can be a better deal.
|
||||||
|
|
||||||
|
</details>
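The size-based baseline of `gp2` volumes can be estimated as follows. This is a sketch under the assumption that `gp2` provides 3 IOPS per GiB, with a floor of 100 IOPS and a ceiling of 16,000 IOPS. `gp2_baseline_iops` is a hypothetical helper name.

```sh
# Hypothetical helper, not an AWS command.
# Estimate the baseline IOPS of a gp2 volume from its size in GiB.
# Assumes 3 IOPS per GiB, no less than 100 and no more than 16000.
gp2_baseline_iops() {
  size_gib="$1"
  iops=$(( size_gib * 3 ))
  if [ "$iops" -lt 100 ]; then
    iops=100
  elif [ "$iops" -gt 16000 ]; then
    iops=16000
  fi
  echo "$iops"
}
```

`gp2_baseline_iops 4000` prints `12000`. Under these assumptions, a `gp2` volume only exceeds the 3000 IOPS baseline of `gp3` when it is larger than 1000 GiB.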
|
||||||
|
|
||||||
## Snapshots
|
## Snapshots
|
||||||
|
|
||||||
A volume's first snapshot is a **complete** snapshot of it, with _all the volume's blocks_ being copied over.<br/>
|
A volume's first snapshot is a **complete** snapshot of it, with _all the volume's blocks_ being copied over.<br/>
|
||||||
@@ -218,6 +249,12 @@ Attaching EBS volumes which data keys are encrypted with unusable KMS keys to EC
|
|||||||
not be able to use the KMS keys to decrypt the data key used for the volume.<br/>
|
not be able to use the KMS keys to decrypt the data key used for the volume.<br/>
|
||||||
Make the KMS key usable again to be able to attach such EBS volumes.
|
Make the KMS key usable again to be able to attach such EBS volumes.
|
||||||
|
|
||||||
|
## Archiving
|
||||||
|
|
||||||
|
Refer [Archive Amazon EBS snapshots].
|
||||||
|
|
||||||
|
Archiving has a 90d minimum storage fee, **and** archived resources have retrieval fees.
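A minimal sketch of archiving and restoring snapshots with the AWS CLI. The function names and the snapshot ID in the usage note are placeholders. The commands require valid credentials and incur the fees described above.

```sh
# Hypothetical wrappers around real AWS CLI commands.

# Move a snapshot to the archive storage tier.
archive_snapshot() {
  aws ec2 modify-snapshot-tier --snapshot-id "$1" --storage-tier 'archive'
}

# Temporarily restore an archived snapshot for the given number of days.
restore_snapshot_temporarily() {
  aws ec2 restore-snapshot-tier --snapshot-id "$1" --temporary-restore-days "$2"
}
```

E.g., `archive_snapshot 'snap-0123456789abcdef0'`, then `restore_snapshot_temporarily 'snap-0123456789abcdef0' 3` should its data be needed again for a few days.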
|
||||||
|
|
||||||
## Operations
|
## Operations
|
||||||
|
|
||||||
### Increase disks' size
|
### Increase disks' size
|
||||||
@@ -279,6 +316,16 @@ If changing the volume type from `gp2` to `gp3` **without** specifying IOPS or t
|
|||||||
automatically provisions either equivalent performance to that of the source `gp2` volume, or the baseline `gp3`
|
automatically provisions either equivalent performance to that of the source `gp2` volume, or the baseline `gp3`
|
||||||
performance, whichever is higher.
|
performance, whichever is higher.
|
||||||
|
|
||||||
|
## Cost-saving measures
|
||||||
|
|
||||||
|
- Prefer using `gp3` volumes unless an application requires specific IOPS or throughput.
|
||||||
|
- Still prefer `gp3` volumes to `gp2`.<br/>
|
||||||
|
`gp3` volumes cost less, and have better performance per GB (except some specific corner cases).<br/>
|
||||||
|
Performance of `gp3` volumes can also be somewhat tuned, while `gp2`'s only increase with size.
|
||||||
|
- Consider using `gp2` volumes _only_ when encountering those corner cases, usually where size > 1 TiB and comparable
|
||||||
|
higher-than-baseline bandwidth is needed only in bursts.
|
||||||
|
- Consider [archiving] when snapshots should not be accessed for 90d or more.
|
||||||
|
|
||||||
## Further readings
|
## Further readings
|
||||||
|
|
||||||
- [Amazon Web Services]
|
- [Amazon Web Services]
|
||||||
@@ -309,16 +356,20 @@ performance, whichever is higher.
|
|||||||
═╬═Time══
|
═╬═Time══
|
||||||
-->
|
-->
|
||||||
|
|
||||||
|
<!-- In-article sections -->
|
||||||
|
[archiving]: #archiving
|
||||||
|
|
||||||
<!-- Knowledge base -->
|
<!-- Knowledge base -->
|
||||||
[amazon web services]: README.md
|
[amazon web services]: README.md
|
||||||
[cli]: cli.md
|
[cli]: cli.md
|
||||||
[ec2]: ec2.md
|
[ec2]: ec2.md
|
||||||
|
|
||||||
<!-- Upstream -->
|
<!-- Upstream -->
|
||||||
|
[Amazon EBS General Purpose SSD volumes]: https://docs.aws.amazon.com/ebs/latest/userguide/general-purpose.html
|
||||||
[amazon ebs pricing]: https://aws.amazon.com/ebs/pricing/
|
[amazon ebs pricing]: https://aws.amazon.com/ebs/pricing/
|
||||||
[amazon ebs volume types]: https://docs.aws.amazon.com/ebs/latest/userguide/ebs-volume-types.html
|
[amazon ebs volume types]: https://docs.aws.amazon.com/ebs/latest/userguide/ebs-volume-types.html
|
||||||
[amazon ebs-optimized instance types]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-optimized.html
|
[amazon ebs-optimized instance types]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-optimized.html
|
||||||
[archive amazon ebs snapshots]: https://docs.aws.amazon.com/ebs/latest/userguide/snapshot-archive.html
|
[Archive Amazon EBS snapshots]: https://docs.aws.amazon.com/ebs/latest/userguide/snapshot-archive.html
|
||||||
[automate snapshot lifecycles]: https://docs.aws.amazon.com/ebs/latest/userguide/snapshot-ami-policy.html
|
[automate snapshot lifecycles]: https://docs.aws.amazon.com/ebs/latest/userguide/snapshot-ami-policy.html
|
||||||
[choose the best amazon ebs volume type for your self-managed database deployment]: https://aws.amazon.com/blogs/storage/how-to-choose-the-best-amazon-ebs-volume-type-for-your-self-managed-database-deployment/
|
[choose the best amazon ebs volume type for your self-managed database deployment]: https://aws.amazon.com/blogs/storage/how-to-choose-the-best-amazon-ebs-volume-type-for-your-self-managed-database-deployment/
|
||||||
[delete-volume]: https://docs.aws.amazon.com/cli/latest/reference/ec2/delete-volume.html
|
[delete-volume]: https://docs.aws.amazon.com/cli/latest/reference/ec2/delete-volume.html
|
||||||
|
|||||||
@@ -9,6 +9,7 @@
|
|||||||
1. [Lifecycle hooks](#lifecycle-hooks)
|
1. [Lifecycle hooks](#lifecycle-hooks)
|
||||||
1. [Image customization](#image-customization)
|
1. [Image customization](#image-customization)
|
||||||
1. [Automatic recovery](#automatic-recovery)
|
1. [Automatic recovery](#automatic-recovery)
|
||||||
|
1. [Cost-saving measures](#cost-saving-measures)
|
||||||
1. [Further readings](#further-readings)
|
1. [Further readings](#further-readings)
|
||||||
1. [Sources](#sources)
|
1. [Sources](#sources)
|
||||||
|
|
||||||
@@ -19,13 +20,15 @@ The API for EC2 are [**eventually** consistent][Eventual consistency in the Amaz
|
|||||||
EC2 instances are billed by the second, with a minimum of 60s,
|
EC2 instances are billed by the second, with a minimum of 60s,
|
||||||
[since 2017-10-02][announcing amazon ec2 per second billing].
|
[since 2017-10-02][announcing amazon ec2 per second billing].
|
||||||
|
|
||||||
Use an instance profile to allow an EC2 instance to use an IAM role.
|
Use an IAM Instance Profile to allow an EC2 instance to use an IAM role.
|
||||||
|
|
||||||
`T` instances launch as `unlimited` by default. Launch them in `standard` mode to avoid paying for surplus credits.
|
`T` instances launch as `unlimited` by default. Launch them in `standard` mode to avoid paying for surplus credits.
|
||||||
|
|
||||||
The instance type [_can_ be changed][change the instance type]. The procedure depends on the root volume, and **does**
|
The instance type [_can_ be changed][change the instance type]. The procedure depends on the root volume, and **does**
|
||||||
require downtime.
|
require downtime.
|
||||||
|
|
||||||
|
When using spot instances, prefer instrumenting the application to be aware of [termination notifications].
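A minimal sketch of how an application's wrapper script might check for a termination notification from within a spot instance. It assumes IMDSv2 is reachable at the usual link-local address and that the `spot/instance-action` metadata path only exists once an interruption is scheduled. `spot_interruption_pending` is a hypothetical name.

```sh
# Hypothetical helper. Run it from inside the spot instance.
# Returns 0 if a termination notification is pending, non-zero otherwise
# (2 if the metadata service's token could not be retrieved).
spot_interruption_pending() {
  token="$(curl -fs -X 'PUT' 'http://169.254.169.254/latest/api/token' \
    -H 'X-aws-ec2-metadata-token-ttl-seconds: 60')" || return 2
  curl -fs -H "X-aws-ec2-metadata-token: ${token}" \
    'http://169.254.169.254/latest/meta-data/spot/instance-action' > '/dev/null'
}
```

One could poll it every few seconds, e.g. `until spot_interruption_pending; do sleep 5; done`, then start draining connections or checkpointing work.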
|
||||||
|
|
||||||
Clone EC2 instances by:
|
Clone EC2 instances by:
|
||||||
|
|
||||||
1. Creating an AMI from the original instance.
|
1. Creating an AMI from the original instance.
|
||||||
@@ -222,6 +225,25 @@ Refer [Image Builder].
|
|||||||
|
|
||||||
Also see [Automatic instance recovery].
|
Also see [Automatic instance recovery].
|
||||||
|
|
||||||
|
## Cost-saving measures
|
||||||
|
|
||||||
|
- Prefer using the most adequate instance type for the job.<br/>
|
||||||
|
E.g., prefer `r*` instances instead of `m*` ones where a lot of RAM is needed, but almost no CPU power is.
|
||||||
|
- Prefer using ARM-based (`g`) instances, unless a different architecture is required.
|
||||||
|
- Prefer _shared_ instances over _dedicated_ ones unless necessary.<br/>
|
||||||
|
Refer [Understanding AWS Tenancy Options].
|
||||||
|
- Prefer dedicated _instances_ over dedicated _hosts_ unless necessary.<br/>
|
||||||
|
Refer [Understanding AWS Tenancy Options].
|
||||||
|
- Prefer using [burstable (`t`) instances][burstable instances] unless steady performance is required, especially
|
||||||
|
for burstable workloads.
|
||||||
|
- When employing **underused** burstable instances, prefer re-launching them in `standard` mode to avoid paying for
|
||||||
|
surplus credits.
|
||||||
|
- Prefer using [spot instances] instead of on-demand ones where possible.
|
||||||
|
- Consider **stopping** or (even better) deleting non-production hosts after working hours.
|
||||||
|
- Consider applying for EC2 Instance and/or Compute Savings Plans.
|
||||||
|
- Consider [archiving snapshots] should they not be accessed for 90d or more.<br/>
|
||||||
|
Archiving has a 90d minimum storage fee, **and** archived resources have retrieval fees.
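Some of the measures above can be sketched with the AWS CLI. The function names and the instance IDs in the usage note are placeholders. The first function assumes the credit specification of a burstable instance can be changed in place, without re-launching it.

```sh
# Hypothetical wrappers around real AWS CLI commands; they require valid credentials.

# Switch a burstable instance to 'standard' mode to stop paying for surplus credits.
use_standard_credits() {
  aws ec2 modify-instance-credit-specification \
    --instance-credit-specifications "InstanceId=$1,CpuCredits=standard"
}

# Stop one or more instances, e.g. non-production hosts after working hours.
stop_instances() {
  aws ec2 stop-instances --instance-ids "$@"
}
```

E.g., `use_standard_credits 'i-0123456789abcdef0'`, or `stop_instances 'i-0123456789abcdef0' 'i-0fedcba9876543210'` from a scheduled job.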
|
||||||
|
|
||||||
## Further readings
|
## Further readings
|
||||||
|
|
||||||
- [Amazon Web Services]
|
- [Amazon Web Services]
|
||||||
@@ -269,8 +291,12 @@ Also see [Automatic instance recovery].
|
|||||||
═╬═Time══
|
═╬═Time══
|
||||||
-->
|
-->
|
||||||
|
|
||||||
|
<!-- In-article sections -->
|
||||||
|
[burstable instances]: #burstable-instances
|
||||||
|
|
||||||
<!-- Knowledge base -->
|
<!-- Knowledge base -->
|
||||||
[amazon web services]: README.md
|
[amazon web services]: README.md
|
||||||
|
[archiving snapshots]: ebs.md#archiving
|
||||||
[cli]: cli.md
|
[cli]: cli.md
|
||||||
[ebs]: ebs.md
|
[ebs]: ebs.md
|
||||||
[image builder]: image%20builder.md
|
[image builder]: image%20builder.md
|
||||||
@@ -302,7 +328,9 @@ Also see [Automatic instance recovery].
|
|||||||
[Manually create or edit the CloudWatch agent configuration file]: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-Configuration-File-Details.html
|
[Manually create or edit the CloudWatch agent configuration file]: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-Configuration-File-Details.html
|
||||||
[recommended alarms]: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Best_Practice_Recommended_Alarms_AWS_Services.html#EC2
|
[recommended alarms]: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Best_Practice_Recommended_Alarms_AWS_Services.html#EC2
|
||||||
[retrieve instance metadata]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
|
[retrieve instance metadata]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
|
||||||
|
[Spot Instances]: https://aws.amazon.com/ec2/spot/
|
||||||
[standard mode for burstable performance instances]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/burstable-performance-instances-standard-mode.html
|
[standard mode for burstable performance instances]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/burstable-performance-instances-standard-mode.html
|
||||||
|
[termination notifications]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-instance-termination-notices.html
|
||||||
[unlimited mode for burstable performance instances]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/burstable-performance-instances-unlimited-mode.html
|
[unlimited mode for burstable performance instances]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/burstable-performance-instances-unlimited-mode.html
|
||||||
[using al2023 based amazon ecs amis to host containerized workloads]: https://docs.aws.amazon.com/linux/al2023/ug/ecs.html
|
[using al2023 based amazon ecs amis to host containerized workloads]: https://docs.aws.amazon.com/linux/al2023/ug/ecs.html
|
||||||
[using instance profiles]: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2_instance-profiles.html
|
[using instance profiles]: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2_instance-profiles.html
|
||||||
|
|||||||
@@ -36,6 +36,7 @@
|
|||||||
1. [Best practices](#best-practices)
|
1. [Best practices](#best-practices)
|
||||||
1. [Troubleshooting](#troubleshooting)
|
1. [Troubleshooting](#troubleshooting)
|
||||||
1. [Invalid 'cpu' setting for task](#invalid-cpu-setting-for-task)
|
1. [Invalid 'cpu' setting for task](#invalid-cpu-setting-for-task)
|
||||||
|
1. [Cost-saving measures](#cost-saving-measures)
|
||||||
1. [Further readings](#further-readings)
|
1. [Further readings](#further-readings)
|
||||||
1. [Sources](#sources)
|
1. [Sources](#sources)
|
||||||
|
|
||||||
@@ -133,6 +134,12 @@ curl -fs "http://$( \
|
|||||||
)" --query "tasks[].attachments[].details[?(name=='privateDnsName')].value" --output 'text' \
|
)" --query "tasks[].attachments[].details[?(name=='privateDnsName')].value" --output 'text' \
|
||||||
):8080"
|
):8080"
|
||||||
|
|
||||||
|
# Get the image of specific containers.
|
||||||
|
aws ecs list-tasks --cluster 'someCluster' --service-name 'someService' --query 'taskArns[0]' --output 'text' \
|
||||||
|
| xargs -oI '%%' \
|
||||||
|
aws ecs describe-tasks --cluster 'someCluster' --task '%%' \
|
||||||
|
--query 'tasks[].containers[?name==`someContainer`].image' --output 'text'
|
||||||
|
|
||||||
# Delete services.
|
# Delete services.
|
||||||
aws ecs delete-service --cluster 'testCluster' --service 'testService' --force
|
aws ecs delete-service --cluster 'testCluster' --service 'testService' --force
|
||||||
|
|
||||||
@@ -148,7 +155,8 @@ while [[ $(aws ecs list-tasks --query 'taskArns' --output 'text' --cluster 'test
|
|||||||
# Restart tasks.
|
# Restart tasks.
|
||||||
# No real way to do that, just stop the tasks and new ones will be eventually started in their place.
|
# No real way to do that, just stop the tasks and new ones will be eventually started in their place.
|
||||||
# To mimic a blue-green deployment, scale the service up by doubling its tasks, then down again to the normal amount.
|
# To mimic a blue-green deployment, scale the service up by doubling its tasks, then down again to the normal amount.
|
||||||
|
aws ecs update-service --cluster 'someCluster' --service 'someService' --desired-count '0' \
|
||||||
|
&& aws ecs update-service --cluster 'someCluster' --service 'someService' --desired-count '1'
|
||||||
```
|
```
|
||||||
|
|
||||||
</details>
|
</details>
|
||||||
@@ -1725,6 +1733,23 @@ Cost-saving measures:
|
|||||||
capacity provider.
|
capacity provider.
|
||||||
|
|
||||||
<details style='padding: 0 0 1rem 1rem'>
|
<details style='padding: 0 0 1rem 1rem'>
|
||||||
|
<summary> Percentage-like </summary>
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"capacityProvider": "FARGATE",
|
||||||
|
"weight": 5
|
||||||
|
}
|
||||||
|
{
|
||||||
|
"capacityProvider": "FARGATE_SPOT",
|
||||||
|
"weight": 95
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
</details>
|
||||||
|
|
||||||
|
<details style='padding: 0 0 1rem 1rem'>
|
||||||
|
<summary> Ratio-like </summary>
|
||||||
|
|
||||||
```json
|
```json
|
||||||
{
|
{
|
||||||
@@ -1775,6 +1800,13 @@ Specify a supported value for the task CPU and memory in your task definition.
|
|||||||
|
|
||||||
</details>
|
</details>
|
||||||
|
|
||||||
|
## Cost-saving measures
|
||||||
|
|
||||||
|
- Prefer using [spot capacity][effectively using spot instances in aws ecs for production workloads] for non-critical
|
||||||
|
services and tasks.
|
||||||
|
- Consider applying for EC2 Instance and/or Compute Savings Plans if using EC2 capacity.<br/>
|
||||||
|
Consider applying for Compute Savings Plans if using Fargate capacity.
|
||||||
|
|
||||||
## Further readings
|
## Further readings
|
||||||
|
|
||||||
- [Amazon Web Services]
|
- [Amazon Web Services]
|
||||||
@@ -1820,6 +1852,8 @@ Specify a supported value for the task CPU and memory in your task definition.
|
|||||||
- [Amazon ECS Service Discovery]
|
- [Amazon ECS Service Discovery]
|
||||||
- [AWS Fargate Pricing Explained]
|
- [AWS Fargate Pricing Explained]
|
||||||
- [The Ultimate Beginner's Guide to AWS ECS]
|
- [The Ultimate Beginner's Guide to AWS ECS]
|
||||||
|
- [Amazon Amazon ECS launch types and capacity providers]
|
||||||
|
- [Effectively Using Spot Instances in AWS ECS for Production Workloads]
|
||||||
|
|
||||||
<!--
|
<!--
|
||||||
Reference
|
Reference
|
||||||
@@ -1847,6 +1881,7 @@ Specify a supported value for the task CPU and memory in your task definition.
|
|||||||
[efs]: efs.md
|
[efs]: efs.md
|
||||||
|
|
||||||
<!-- Upstream -->
|
<!-- Upstream -->
|
||||||
|
[Amazon Amazon ECS launch types and capacity providers]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/capacity-launch-type-comparison.html
|
||||||
[Amazon ECS capacity providers for the EC2 launch type]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/asg-capacity-providers.html
|
[Amazon ECS capacity providers for the EC2 launch type]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/asg-capacity-providers.html
|
||||||
[Amazon ECS clusters for Fargate]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/fargate-capacity-providers.html
|
[Amazon ECS clusters for Fargate]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/fargate-capacity-providers.html
|
||||||
[Amazon ECS environment variables]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-environment-variables.html
|
[Amazon ECS environment variables]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-environment-variables.html
|
||||||
@@ -1865,6 +1900,7 @@ Specify a supported value for the task CPU and memory in your task definition.
|
|||||||
[AWS Fargate Spot Now Generally Available]: https://aws.amazon.com/blogs/aws/aws-fargate-spot-now-generally-available/
|
[AWS Fargate Spot Now Generally Available]: https://aws.amazon.com/blogs/aws/aws-fargate-spot-now-generally-available/
|
||||||
[Centralized Container Logging with Fluent Bit]: https://aws.amazon.com/blogs/opensource/centralized-container-logging-fluent-bit/
|
[Centralized Container Logging with Fluent Bit]: https://aws.amazon.com/blogs/opensource/centralized-container-logging-fluent-bit/
|
||||||
[ecs execute-command proposal]: https://github.com/aws/containers-roadmap/issues/1050
|
[ecs execute-command proposal]: https://github.com/aws/containers-roadmap/issues/1050
|
||||||
|
[Effectively Using Spot Instances in AWS ECS for Production Workloads]: https://medium.com/@ankur.ecb/effectively-using-spot-instances-in-aws-ecs-for-production-workloads-d46985d0ae2d
|
||||||
[EventBridge Scheduler]: https://docs.aws.amazon.com/scheduler/latest/UserGuide/what-is-scheduler.html
|
[EventBridge Scheduler]: https://docs.aws.amazon.com/scheduler/latest/UserGuide/what-is-scheduler.html
|
||||||
[Example Amazon ECS task definition: Route logs to FireLens]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/firelens-taskdef.html
|
[Example Amazon ECS task definition: Route logs to FireLens]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/firelens-taskdef.html
|
||||||
[fargate tasks sizes]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/fargate-tasks-services.html#fargate-tasks-size
|
[fargate tasks sizes]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/fargate-tasks-services.html#fargate-tasks-size
|
||||||
|
|||||||
@@ -26,6 +26,7 @@
|
|||||||
1. [Identify common issues](#identify-common-issues)
|
1. [Identify common issues](#identify-common-issues)
|
||||||
1. [The worker nodes fail to join the cluster](#the-worker-nodes-fail-to-join-the-cluster)
|
1. [The worker nodes fail to join the cluster](#the-worker-nodes-fail-to-join-the-cluster)
|
||||||
1. [AWS ELB controller fails to get the region from the host's metadata](#aws-elb-controller-fails-to-get-the-region-from-the-hosts-metadata)
|
1. [AWS ELB controller fails to get the region from the host's metadata](#aws-elb-controller-fails-to-get-the-region-from-the-hosts-metadata)
|
||||||
|
1. [Cost-saving measures](#cost-saving-measures)
|
||||||
1. [Further readings](#further-readings)
|
1. [Further readings](#further-readings)
|
||||||
1. [Sources](#sources)
|
1. [Sources](#sources)
|
||||||
|
|
||||||
@@ -1413,6 +1414,13 @@ helm upgrade -i --repo 'https://aws.github.io/eks-charts' \
|
|||||||
--set 'vpcId'='vpc-01234567'
|
--set 'vpcId'='vpc-01234567'
|
||||||
```
|
```
|
||||||
|
|
||||||
|
## Cost-saving measures
|
||||||
|
|
||||||
|
- Consider [using spot instances][building for cost optimization and resilience for eks with spot instances] for
|
||||||
|
non-critical workloads.
|
||||||
|
- Consider applying for EC2 Instance and/or Compute Savings Plans if using EC2 worker nodes.<br/>
|
||||||
|
Consider applying for Compute Savings Plans if using Fargate profiles.
|
||||||
|
|
||||||
## Further readings
|
## Further readings
|
||||||
|
|
||||||
- [Amazon Web Services]
|
- [Amazon Web Services]
|
||||||
@@ -1512,6 +1520,7 @@ helm upgrade -i --repo 'https://aws.github.io/eks-charts' \
|
|||||||
[aws eks create-nodegroup]: https://docs.aws.amazon.com/cli/latest/reference/eks/create-nodegroup.html
|
[aws eks create-nodegroup]: https://docs.aws.amazon.com/cli/latest/reference/eks/create-nodegroup.html
|
||||||
[AWS Node Termination Handler]: https://github.com/aws/aws-node-termination-handler
|
[AWS Node Termination Handler]: https://github.com/aws/aws-node-termination-handler
|
||||||
[awssupport-troubleshooteksworkernode runbook]: https://docs.aws.amazon.com/systems-manager-automation-runbooks/latest/userguide/automation-awssupport-troubleshooteksworkernode.html
|
[awssupport-troubleshooteksworkernode runbook]: https://docs.aws.amazon.com/systems-manager-automation-runbooks/latest/userguide/automation-awssupport-troubleshooteksworkernode.html
|
||||||
|
[Building for Cost optimization and Resilience for EKS with Spot Instances]: https://aws.amazon.com/blogs/compute/cost-optimization-and-resilience-eks-with-spot-instances/
|
||||||
[choosing an amazon ec2 instance type]: https://docs.aws.amazon.com/eks/latest/userguide/choosing-instance-type.html
|
[choosing an amazon ec2 instance type]: https://docs.aws.amazon.com/eks/latest/userguide/choosing-instance-type.html
|
||||||
[configure instance permissions required for systems manager]: https://docs.aws.amazon.com/systems-manager/latest/userguide/setup-instance-profile.html#instance-profile-policies-overview
|
[configure instance permissions required for systems manager]: https://docs.aws.amazon.com/systems-manager/latest/userguide/setup-instance-profile.html#instance-profile-policies-overview
|
||||||
[create an amazon ebs csi driver iam role]: https://docs.aws.amazon.com/eks/latest/userguide/csi-iam-role.html
|
[create an amazon ebs csi driver iam role]: https://docs.aws.amazon.com/eks/latest/userguide/csi-iam-role.html
|
||||||
|
|||||||
@@ -380,7 +380,7 @@ can manage.
|
|||||||
|
|
||||||
## Cost-saving measures
|
## Cost-saving measures
|
||||||
|
|
||||||
- Choose appropriate [instance types and sizes][supported instance types in amazon opensearch service].<br/>
|
- Choose _appropriate_ [instance types and sizes][supported instance types in amazon opensearch service].<br/>
|
||||||
Leverage the ability to select them to tailor the service offering to one's needs.
|
Leverage the ability to select them to tailor the service offering to one's needs.
|
||||||
|
|
||||||
> [OR1 instances][or1 storage for amazon opensearch service] **cannot** (currently?) be selected as master nodes.<br/>
|
> [OR1 instances][or1 storage for amazon opensearch service] **cannot** (currently?) be selected as master nodes.<br/>
|
||||||
|
|||||||
@@ -28,6 +28,7 @@
|
|||||||
1. [ERROR: must be superuser to alter _X_ roles or change _X_ attribute](#error-must-be-superuser-to-alter-x-roles-or-change-x-attribute)
|
1. [ERROR: must be superuser to alter _X_ roles or change _X_ attribute](#error-must-be-superuser-to-alter-x-roles-or-change-x-attribute)
|
||||||
1. [Transport fails asking for the remote user must have superuser, but it already does](#transport-fails-asking-for-the-remote-user-must-have-superuser-but-it-already-does)
|
1. [Transport fails asking for the remote user must have superuser, but it already does](#transport-fails-asking-for-the-remote-user-must-have-superuser-but-it-already-does)
|
||||||
1. [The instance is unbearably slow](#the-instance-is-unbearably-slow)
|
1. [The instance is unbearably slow](#the-instance-is-unbearably-slow)
|
||||||
|
1. [Cost-saving measures](#cost-saving-measures)
|
||||||
1. [Further readings](#further-readings)
|
1. [Further readings](#further-readings)
|
||||||
1. [Sources](#sources)
|
1. [Sources](#sources)
|
||||||
|
|
||||||
@@ -103,6 +104,10 @@ Maintenance windows are paused when their DB instances are stopped.
|
|||||||
# Show details of RDS instances.
|
# Show details of RDS instances.
|
||||||
aws rds describe-db-instances
|
aws rds describe-db-instances
|
||||||
aws rds describe-db-instances --output 'json' --query "DBInstances[?(DBInstanceIdentifier=='master-prod')]"
|
aws rds describe-db-instances --output 'json' --query "DBInstances[?(DBInstanceIdentifier=='master-prod')]"
|
||||||
|
aws rds describe-db-instances --db-instance-identifier 'some-db-instance' \
|
||||||
|
--query 'DBInstances[0].InstanceCreateTime' --output 'text'
|
||||||
|
aws rds describe-db-instances --db-instance-identifier 'some-db-instance' --output 'text' \
|
||||||
|
--query 'DBInstances[0]|join(``,[`postgresql://`,MasterUsername,`@`,Endpoint.Address,`:`,to_string(Endpoint.Port),`/`,DBName||`postgres`])'
|
||||||
|
|
||||||
# Enable Performance Insights.
|
# Enable Performance Insights.
|
||||||
aws rds modify-db-cluster --db-cluster-identifier 'staging-cluster' \
|
aws rds modify-db-cluster --db-cluster-identifier 'staging-cluster' \
|
||||||
@@ -1073,6 +1078,16 @@ or write workloads and exceeds the instance type quotas.
|
|||||||
|
|
||||||
</details>
|
</details>
|
||||||
|
|
||||||
|
## Cost-saving measures
|
||||||
|
|
||||||
|
- Choose _appropriate_ instance types and sizes.
|
||||||
|
- Prefer using [reserved instances][rds reserved instances] when one can stay on a single instance type for the whole
|
||||||
|
duration of the reservation.<br/>
|
||||||
|
Mind that, unlike EC2 ones, RDS RIs do **not** come in _Standard_ and _Convertible_ classes. For some DB engines they
|
||||||
|
do offer size flexibility within the same instance family, though.
|
||||||
|
|
||||||
|
RDS does **not** support Savings Plans at the time of writing.
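
A rough way to check whether a reservation pays off is to compare yearly costs. The following is only a sketch: both hourly prices are made-up placeholders, and the instance class, engine and offering type in the commented command are assumptions to be replaced with one's own.

```sh
# List the actual offerings first (requires credentials; all values are placeholders):
# aws rds describe-reserved-db-instances-offerings --db-instance-class 'db.t4g.medium' \
#   --product-description 'postgresql' --duration '31536000' --offering-type 'No Upfront'

# Hypothetical hourly prices in USD, NOT real AWS prices.
on_demand_hourly='0.100'
reserved_effective_hourly='0.065'
hours_per_year=8760

# Compare the yearly costs and compute the relative saving.
awk -v od="$on_demand_hourly" -v ri="$reserved_effective_hourly" -v h="$hours_per_year" 'BEGIN {
  printf "on-demand: %.2f USD/year\n", od * h
  printf "reserved:  %.2f USD/year\n", ri * h
  printf "saving:    %.1f%%\n", (od - ri) / od * 100
}' > ri-comparison.txt
cat ri-comparison.txt
```

The saving only materializes if the instance keeps running on the reserved instance type for the whole term.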
|
||||||
|
|
||||||
## Further readings
|
## Further readings
|
||||||
|
|
||||||
- [Working with DB instance read replicas]
|
- [Working with DB instance read replicas]
|
||||||
@@ -1136,6 +1151,7 @@ or write workloads and exceeds the instance type quotas.
|
|||||||
[migrating databases using rds postgresql transportable databases]: https://aws.amazon.com/blogs/database/migrating-databases-using-rds-postgresql-transportable-databases/
|
[migrating databases using rds postgresql transportable databases]: https://aws.amazon.com/blogs/database/migrating-databases-using-rds-postgresql-transportable-databases/
|
||||||
[Multi-AZ DB instance deployments for Amazon RDS]: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.MultiAZSingleStandby.html
|
[Multi-AZ DB instance deployments for Amazon RDS]: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.MultiAZSingleStandby.html
|
||||||
[pricing and data retention for performance insights]: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PerfInsights.Overview.cost.html
|
[pricing and data retention for performance insights]: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PerfInsights.Overview.cost.html
|
||||||
|
[RDS reserved instances]: https://aws.amazon.com/rds/reserved-instances/
|
||||||
[Recommended alarms for RDS]: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Best_Practice_Recommended_Alarms_AWS_Services.html#RDS
|
[Recommended alarms for RDS]: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Best_Practice_Recommended_Alarms_AWS_Services.html#RDS
|
||||||
[renaming a db instance]: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_RenameInstance.html
|
[renaming a db instance]: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_RenameInstance.html
|
||||||
[Restoring a DB instance to a specified time for Amazon RDS]: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PIT.html
|
[Restoring a DB instance to a specified time for Amazon RDS]: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PIT.html
|
||||||
|
|||||||
@@ -3,6 +3,7 @@
|
|||||||
1. [TL;DR](#tldr)
|
1. [TL;DR](#tldr)
|
||||||
1. [Storage classes](#storage-classes)
|
1. [Storage classes](#storage-classes)
|
||||||
1. [Lifecycle configuration](#lifecycle-configuration)
|
1. [Lifecycle configuration](#lifecycle-configuration)
|
||||||
|
1. [Cost-saving measures](#cost-saving-measures)
|
||||||
1. [Further readings](#further-readings)
|
1. [Further readings](#further-readings)
|
||||||
1. [Sources](#sources)
|
1. [Sources](#sources)
|
||||||
|
|
||||||
@@ -215,6 +216,17 @@ actions. In such cases:
|
|||||||
|
|
||||||
Examples: [1][lifecycle configuration examples], [2][s3 lifecycle rules examples]
|
Examples: [1][lifecycle configuration examples], [2][s3 lifecycle rules examples]
|
||||||
|
|
||||||
|
## Cost-saving measures
|
||||||
|
|
||||||
|
- Prefer using lower storage classes for data that is not frequently accessed.<br/>
|
||||||
|
Mind that lower storage classes charge for a minimum storage duration **and** for retrievals.
|
||||||
|
- Consider using [Lifecycle configuration] to move data that is not frequently accessed down to a lower storage tier
|
||||||
|
after some time.
|
||||||
|
- Prefer using [S3 Intelligent-Tiering][how s3 intelligent-tiering works] when not knowing how frequently data is
|
||||||
|
accessed.
|
||||||
|
- Consider expiring old data after some time, if its retention is not needed.
|
||||||
|
- Consider compressing data before uploading it.
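
The tiering, expiration and compression suggestions above could look like the following sketch. The bucket name, the `logs/` prefix, the rule ID and the day thresholds are all hypothetical placeholders. The `aws` commands are left commented out since they require credentials and an existing bucket.

```sh
# Lifecycle rule: move objects under 'logs/' to cheaper tiers over time, then expire them.
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "tier-down-then-expire-logs",
      "Status": "Enabled",
      "Filter": { "Prefix": "logs/" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 365 }
    }
  ]
}
EOF
# aws s3api put-bucket-lifecycle-configuration --bucket 'my-bucket' \
#   --lifecycle-configuration 'file://lifecycle.json'

# Compress data before uploading it. The log file here is generated just for illustration.
for i in $(seq 1 1000); do echo 'some repetitive log line'; done > app.log
gzip -c app.log > app.log.gz
# aws s3 cp 'app.log.gz' 's3://my-bucket/logs/app.log.gz'
```

Transition thresholds should account for the minimum storage duration of the target class, or the move might end up costing more than it saves.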
|
||||||
|
|
||||||
## Further readings
|
## Further readings
|
||||||
|
|
||||||
- [Amazon Web Services]
|
- [Amazon Web Services]
|
||||||
@@ -245,6 +257,8 @@ Examples: [1][lifecycle configuration examples], [2][s3 lifecycle rules example
|
|||||||
-->
|
-->
|
||||||
|
|
||||||
<!-- In-article sections -->
|
<!-- In-article sections -->
|
||||||
|
[Lifecycle configuration]: #lifecycle-configuration
|
||||||
|
|
||||||
<!-- Knowledge base -->
|
<!-- Knowledge base -->
|
||||||
[amazon web services]: README.md
|
[amazon web services]: README.md
|
||||||
[cli]: cli.md
|
[cli]: cli.md
|
||||||
|
|||||||
@@ -1,6 +1,7 @@
|
|||||||
# Sagemaker
|
# Sagemaker
|
||||||
|
|
||||||
1. [TL;DR](#tldr)
|
1. [TL;DR](#tldr)
|
||||||
|
1. [Cost-saving measures](#cost-saving-measures)
|
||||||
1. [Further readings](#further-readings)
|
1. [Further readings](#further-readings)
|
||||||
1. [Sources](#sources)
|
1. [Sources](#sources)
|
||||||
|
|
||||||
@@ -11,6 +12,11 @@
|
|||||||
- Serverless Endpoints' backends use **a snapshot** of the Endpoint Configuration at the time each host is created.<br/>
|
- Serverless Endpoints' backends use **a snapshot** of the Endpoint Configuration at the time each host is created.<br/>
|
||||||
To make a serverless Endpoint use a new Configuration or Model, its hosts need to be replaced.
|
To make a serverless Endpoint use a new Configuration or Model, its hosts need to be replaced.
|
||||||
|
|
||||||
|
## Cost-saving measures
|
||||||
|
|
||||||
|
- Use a single endpoint for multiple models where it makes sense.
|
||||||
|
- Delete endpoints when they are not used anymore.
|
||||||
|
|
||||||
## Further readings
|
## Further readings
|
||||||
|
|
||||||
- [Amazon Web Services]
|
- [Amazon Web Services]
|
||||||
|
|||||||
@@ -52,6 +52,7 @@ Hosted by the [Cloud Native Computing Foundation][cncf].
|
|||||||
1. [Run a command just before a Pod stops](#run-a-command-just-before-a-pod-stops)
|
1. [Run a command just before a Pod stops](#run-a-command-just-before-a-pod-stops)
|
||||||
1. [Examples](#examples)
|
1. [Examples](#examples)
|
||||||
1. [Create an admission webhook](#create-an-admission-webhook)
|
1. [Create an admission webhook](#create-an-admission-webhook)
|
||||||
|
1. [Cost-saving measures](#cost-saving-measures)
|
||||||
1. [Further readings](#further-readings)
|
1. [Further readings](#further-readings)
|
||||||
1. [Sources](#sources)
|
1. [Sources](#sources)
|
||||||
|
|
||||||
@@ -1256,6 +1257,16 @@ you need:
|
|||||||
|
|
||||||
See the example's [README][create an admission webhook].
|
See the example's [README][create an admission webhook].
|
||||||
|
|
||||||
|
## Cost-saving measures
|
||||||
|
|
||||||
|
- Reconsider one's choices.<br/>
|
||||||
|
Does one really need a Kubernetes cluster? Clusters introduce multiple levels of redundancy and are highly complex.<br/>
|
||||||
|
Consider the resources and maintenance efforts that will inevitably go into that.
|
||||||
|
- Consider leveraging autoscaling.<br/>
|
||||||
|
See [Horizontal Pod Autoscaling][horizontal pod autoscaler] and [KEDA] to scale Pods depending on metrics.<br/>
|
||||||
|
See [Node Autoscaling][node scaling] to scale Nodes depending on the number of Pods, node features, or resource
|
||||||
|
consumption.
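
As a minimal sketch of Pod autoscaling on metrics, the following writes a HorizontalPodAutoscaler manifest that keeps a Deployment between 2 and 10 replicas based on CPU utilization. The `my-app` name, the replica bounds and the 70% target are hypothetical and need tuning for the actual workload. The `kubectl` command is commented out since it requires access to a cluster.

```sh
# HPA targeting a hypothetical 'my-app' Deployment.
cat > hpa.yaml <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
EOF
# kubectl apply -f 'hpa.yaml'
```

Utilization targets are computed against the containers' CPU requests, so the target Pods need those set for this to work.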
|
||||||
|
|
||||||
## Further readings
|
## Further readings
|
||||||
|
|
||||||
Usage:
|
Usage:
|
||||||
@@ -1350,6 +1361,7 @@ Others:
|
|||||||
|
|
||||||
<!-- In-article sections -->
|
<!-- In-article sections -->
|
||||||
[horizontal pod autoscaler]: #horizontal-pod-autoscaler
|
[horizontal pod autoscaler]: #horizontal-pod-autoscaler
|
||||||
|
[node scaling]: #node-scaling
|
||||||
[vertical pod autoscaler]: #vertical-pod-autoscaler
|
[vertical pod autoscaler]: #vertical-pod-autoscaler
|
||||||
[pods]: #pods
|
[pods]: #pods
|
||||||
[privileged container vs privilege escalation]: #privileged-container-vs-privilege-escalation
|
[privileged container vs privilege escalation]: #privileged-container-vs-privilege-escalation
|
||||||