diff --git a/knowledge base/cloud computing/aws/cloudwatch.md b/knowledge base/cloud computing/aws/cloudwatch.md
index 30fcf6d..c14a188 100644
--- a/knowledge base/cloud computing/aws/cloudwatch.md
+++ b/knowledge base/cloud computing/aws/cloudwatch.md
@@ -5,6 +5,7 @@ Observability service. with functions for logging, monitoring and alerting.
1. [TL;DR](#tldr)
1. [Queries of interest](#queries-of-interest)
1. [Stream logs](#stream-logs)
+1. [Cost-saving measures](#cost-saving-measures)
1. [Further readings](#further-readings)
1. [Sources](#sources)
@@ -31,6 +32,19 @@ The [CloudWatch console] offers some default good queries.
Logs in Log Groups can be [streamed][stream logs] elsewhere.
+CloudWatch retains metrics' data as follows:
+
+- Data points with a period of less than 60 seconds are available for 3 hours.
+ These are high-resolution custom metrics.
+- Data points with a period of 60 seconds (1 minute) are available for 15 days.
+- Data points with a period of 300 seconds (5 minutes) are available for 63 days.
+- Data points with a period of 3600 seconds (1 hour) are available for 455 days (15 months).
+
+Data points are aggregated together for long-term storage after the initial period.
+E.g., data using a period of 1 minute remains available for 15 days with 1-minute resolution, then it is aggregated and
+made available with a resolution of 5 minutes; after 63 days, it is further aggregated and made available with a
+resolution of 1 hour for 15 months.
+
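The retention tiers above can be read as a simple lookup. The following is a minimal shell sketch of it, not an AWS API: the function name `cw_retention` is hypothetical, and it assumes that periods falling between the documented ones behave like the closest documented period below them.

```sh
# Hypothetical helper: print how long CloudWatch keeps data points of a given period.
# Assumption: periods between the documented tiers are treated like the tier below them.
cw_retention() {
  period_seconds="$1"
  if [ "$period_seconds" -lt 60 ]; then
    echo '3h'    # high-resolution custom metrics
  elif [ "$period_seconds" -lt 300 ]; then
    echo '15d'
  elif [ "$period_seconds" -lt 3600 ]; then
    echo '63d'
  else
    echo '455d'  # about 15 months
  fi
}
```

For example, `cw_retention 60` covers the 1-minute case described above.
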
CLI commands
@@ -101,6 +115,12 @@ Also refer [Streaming CloudWatch Logs data to Amazon OpenSearch Service] to stre
Logs in CloudWatch Log Groups can be streamed [Kinesis], [Firehose] or [Lambda] by leveraging Logs subscriptions.
+## Cost-saving measures
+
+- Configure an _appropriate_ log retention period for any log groups.
+ Log groups containing development logs should not usually need more than 1w worth.
+- When in doubt, still configure a default, long log retention period for all log groups (10y?).
+
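As a rough sketch of the measures above, the retention period can be set through the AWS CLI. The function names and the log group name used in the example are hypothetical. CloudWatch Logs only accepts specific retention values, e.g. `7` (1 week) or `3653` (10 years).

```sh
# Hypothetical helper: set the retention (in days) for one log group.
set_log_retention() {
  aws logs put-retention-policy --log-group-name "$1" --retention-in-days "$2"
}

# Hypothetical helper: list the log groups that have no retention period set,
# meaning their logs never expire.
list_log_groups_without_retention() {
  aws logs describe-log-groups \
    --query 'logGroups[?!retentionInDays].logGroupName' --output 'text'
}
```

E.g., `set_log_retention '/ecs/dev-service' 7` would keep one week of logs for that (made-up) development log group.
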
## Further readings
- [Website]
@@ -113,6 +133,7 @@ Logs in CloudWatch Log Groups can be streamed [Kinesis], [Firehose] or [Lambda]
- [Real-time processing of log data with subscriptions]
- [Streaming CloudWatch Logs data to Amazon OpenSearch Service]
- [Which log group is causing a sudden increase in my CloudWatch Logs bill?]
+- [Metrics concepts]
+
+[archiving]: #archiving
+
[amazon web services]: README.md
[cli]: cli.md
[ec2]: ec2.md
+[Amazon EBS General Purpose SSD volumes]: https://docs.aws.amazon.com/ebs/latest/userguide/general-purpose.html
[amazon ebs pricing]: https://aws.amazon.com/ebs/pricing/
[amazon ebs volume types]: https://docs.aws.amazon.com/ebs/latest/userguide/ebs-volume-types.html
[amazon ebs-optimized instance types]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-optimized.html
-[archive amazon ebs snapshots]: https://docs.aws.amazon.com/ebs/latest/userguide/snapshot-archive.html
+[Archive Amazon EBS snapshots]: https://docs.aws.amazon.com/ebs/latest/userguide/snapshot-archive.html
[automate snapshot lifecycles]: https://docs.aws.amazon.com/ebs/latest/userguide/snapshot-ami-policy.html
[choose the best amazon ebs volume type for your self-managed database deployment]: https://aws.amazon.com/blogs/storage/how-to-choose-the-best-amazon-ebs-volume-type-for-your-self-managed-database-deployment/
[delete-volume]: https://docs.aws.amazon.com/cli/latest/reference/ec2/delete-volume.html
diff --git a/knowledge base/cloud computing/aws/ec2.md b/knowledge base/cloud computing/aws/ec2.md
index 06d6e54..14f8f4b 100644
--- a/knowledge base/cloud computing/aws/ec2.md
+++ b/knowledge base/cloud computing/aws/ec2.md
@@ -9,6 +9,7 @@
1. [Lifecycle hooks](#lifecycle-hooks)
1. [Image customization](#image-customization)
1. [Automatic recovery](#automatic-recovery)
+1. [Cost-saving measures](#cost-saving-measures)
1. [Further readings](#further-readings)
1. [Sources](#sources)
@@ -19,13 +20,15 @@ The API for EC2 are [**eventually** consistent][Eventual consistency in the Amaz
EC2 instances are billed by the second, with a minimum of 60s,
[since 2017-10-02][announcing amazon ec2 per second billing].
-Use an instance profile to allow an EC2 instance to use an IAM role.
+Use an IAM Instance Profile to allow an EC2 instance to use an IAM role.
`T` instances launch as `unlimited` by default. Launch them in `standard` mode to avoid paying for surplus credits.
The instance type [_can_ be changed][change the instance type]. The procedure depends on the root volume, and **does**
require downtime.
+When using spot instances, prefer instrumenting the application to be aware of [termination notifications].
+
Clone EC2 instances by:
1. Creating an AMI from the original instance.
@@ -222,6 +225,25 @@ Refer [Image Builder].
Also see [Automatic instance recovery].
+## Cost-saving measures
+
+- Prefer using the most adequate instance type for the job.
+ E.g., prefer `r*` instances instead of `m*` ones where a lot of RAM is needed, but almost no CPU power is.
+- Prefer using ARM-based (Graviton, e.g. `m7g`) instances, unless a different architecture is required.
+- Prefer _shared_ instances over _dedicated_ ones unless necessary.
+ Refer [Understanding AWS Tenancy Options].
+- Prefer dedicated _instances_ over dedicated _hosts_ unless necessary.
+ Refer [Understanding AWS Tenancy Options].
+- Prefer using [burstable (`t`) instances][burstable instances], especially for bursty workloads, unless steady
+ performance is required.
+- When employing **underused** burstable instances, prefer re-launching them in `standard` mode to avoid paying for
+ surplus credits.
+- Prefer using [spot instances] instead of on-demand ones where possible.
+- Consider **stopping** or (even better) terminating non-production instances outside working hours.
+- Consider purchasing EC2 Instance and/or Compute Savings Plans.
+- Consider [archiving snapshots] should they not be accessed for 90d or more.
+ Archiving has a 90d minimum storage fee, **and** archived resources have retrieval fees.
+
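Stopping non-production hosts after working hours could be sketched as follows. This is only an illustration: the function name is hypothetical, and the `Environment` tag key is an assumption about one's tagging scheme.

```sh
# Hypothetical helper: stop all running instances carrying a given 'Environment' tag value.
# The tag key 'Environment' is an assumption; adapt it to one's own tagging scheme.
stop_instances_by_environment() {
  instance_ids="$(aws ec2 describe-instances \
    --filters "Name=tag:Environment,Values=$1" 'Name=instance-state-name,Values=running' \
    --query 'Reservations[].Instances[].InstanceId' --output 'text')"
  # Do nothing when no instance matches.
  [ -n "$instance_ids" ] || return 0
  # Word splitting is intended here: the IDs come back whitespace-separated.
  # shellcheck disable=SC2086
  aws ec2 stop-instances --instance-ids $instance_ids
}
```

A scheduler of choice (e.g. cron) could then run `stop_instances_by_environment 'dev'` in the evening.
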
## Further readings
- [Amazon Web Services]
@@ -269,8 +291,12 @@ Also see [Automatic instance recovery].
═╬═Time══
-->
+
+[burstable instances]: #burstable-instances
+
[amazon web services]: README.md
+[archiving snapshots]: ebs.md#archiving
[cli]: cli.md
[ebs]: ebs.md
[image builder]: image%20builder.md
@@ -302,7 +328,9 @@ Also see [Automatic instance recovery].
[Manually create or edit the CloudWatch agent configuration file]: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-Configuration-File-Details.html
[recommended alarms]: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Best_Practice_Recommended_Alarms_AWS_Services.html#EC2
[retrieve instance metadata]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
+[Spot Instances]: https://aws.amazon.com/ec2/spot/
[standard mode for burstable performance instances]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/burstable-performance-instances-standard-mode.html
+[termination notifications]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-instance-termination-notices.html
[unlimited mode for burstable performance instances]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/burstable-performance-instances-unlimited-mode.html
[using al2023 based amazon ecs amis to host containerized workloads]: https://docs.aws.amazon.com/linux/al2023/ug/ecs.html
[using instance profiles]: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2_instance-profiles.html
diff --git a/knowledge base/cloud computing/aws/ecs.md b/knowledge base/cloud computing/aws/ecs.md
index 638854b..c9b068e 100644
--- a/knowledge base/cloud computing/aws/ecs.md
+++ b/knowledge base/cloud computing/aws/ecs.md
@@ -36,6 +36,7 @@
1. [Best practices](#best-practices)
1. [Troubleshooting](#troubleshooting)
1. [Invalid 'cpu' setting for task](#invalid-cpu-setting-for-task)
+1. [Cost-saving measures](#cost-saving-measures)
1. [Further readings](#further-readings)
1. [Sources](#sources)
@@ -133,6 +134,12 @@ curl -fs "http://$( \
)" --query "tasks[].attachments[].details[?(name=='privateDnsName')].value" --output 'text' \
):8080"
+# Get the image of specific containers.
+aws ecs list-tasks --cluster 'someCluster' --service-name 'someService' --query 'taskArns[0]' --output 'text' \
+| xargs -oI '%%' \
+ aws ecs describe-tasks --cluster 'someCluster' --task '%%' \
+ --query 'tasks[].containers[?name==`someContainer`].image' --output 'text'
+
# Delete services.
aws ecs delete-service --cluster 'testCluster' --service 'testService' --force
@@ -148,7 +155,8 @@ while [[ $(aws ecs list-tasks --query 'taskArns' --output 'text' --cluster 'test
# Restart tasks.
# No real way to do that, just stop the tasks and new ones will be eventually started in their place.
# To mimic a blue-green deployment, scale the service up by doubling its tasks, then down again to the normal amount.
-
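+# The commands below assume a service that normally runs 1 task, and cause a short outage while it is scaled to 0.
+aws ecs update-service --cluster 'someCluster' --service 'someService' --desired-count '0' \
+&& aws ecs update-service --cluster 'someCluster' --service 'someService' --desired-count '1'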
```
@@ -1725,6 +1733,23 @@ Cost-saving measures:
capacity provider.
+ Percentage-like
+
+ ```json
+ [
+   {
+     "capacityProvider": "FARGATE",
+     "weight": 5
+   },
+   {
+     "capacityProvider": "FARGATE_SPOT",
+     "weight": 95
+   }
+ ]
+ ```
+
+
+ Ratio-like
```json
{
@@ -1775,6 +1800,13 @@ Specify a supported value for the task CPU and memory in your task definition.
+## Cost-saving measures
+
+- Prefer using [spot capacity][effectively using spot instances in aws ecs for production workloads] for non-critical
+ services and tasks.
+- Consider purchasing EC2 Instance and/or Compute Savings Plans if using EC2 capacity.
+ Consider purchasing Compute Savings Plans if using Fargate capacity.
+
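Moving an existing Fargate service onto mostly spot capacity might look like the sketch below. The function name is hypothetical, and it assumes the cluster already has both the `FARGATE` and `FARGATE_SPOT` capacity providers associated.

```sh
# Hypothetical helper: run a service mostly on Fargate Spot, keeping a share on regular Fargate.
# Arguments: cluster, service, on-demand weight, spot weight.
# Assumption: the cluster already has both capacity providers associated.
use_fargate_spot() {
  aws ecs update-service --cluster "$1" --service "$2" \
    --capacity-provider-strategy \
      "capacityProvider=FARGATE,weight=$3" \
      "capacityProvider=FARGATE_SPOT,weight=$4" \
    --force-new-deployment
}
```

E.g., `use_fargate_spot 'someCluster' 'someService' 5 95` mirrors a 5/95 split between on-demand and spot capacity.
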
## Further readings
- [Amazon Web Services]
@@ -1820,6 +1852,8 @@ Specify a supported value for the task CPU and memory in your task definition.
- [Amazon ECS Service Discovery]
- [AWS Fargate Pricing Explained]
- [The Ultimate Beginner's Guide to AWS ECS]
+- [Amazon ECS launch types and capacity providers]
+- [Effectively Using Spot Instances in AWS ECS for Production Workloads]
+[Amazon ECS launch types and capacity providers]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/capacity-launch-type-comparison.html
[Amazon ECS capacity providers for the EC2 launch type]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/asg-capacity-providers.html
[Amazon ECS clusters for Fargate]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/fargate-capacity-providers.html
[Amazon ECS environment variables]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-environment-variables.html
@@ -1865,6 +1900,7 @@ Specify a supported value for the task CPU and memory in your task definition.
[AWS Fargate Spot Now Generally Available]: https://aws.amazon.com/blogs/aws/aws-fargate-spot-now-generally-available/
[Centralized Container Logging with Fluent Bit]: https://aws.amazon.com/blogs/opensource/centralized-container-logging-fluent-bit/
[ecs execute-command proposal]: https://github.com/aws/containers-roadmap/issues/1050
+[Effectively Using Spot Instances in AWS ECS for Production Workloads]: https://medium.com/@ankur.ecb/effectively-using-spot-instances-in-aws-ecs-for-production-workloads-d46985d0ae2d
[EventBridge Scheduler]: https://docs.aws.amazon.com/scheduler/latest/UserGuide/what-is-scheduler.html
[Example Amazon ECS task definition: Route logs to FireLens]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/firelens-taskdef.html
[fargate tasks sizes]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/fargate-tasks-services.html#fargate-tasks-size
diff --git a/knowledge base/cloud computing/aws/eks.md b/knowledge base/cloud computing/aws/eks.md
index 3849b2b..3577555 100644
--- a/knowledge base/cloud computing/aws/eks.md
+++ b/knowledge base/cloud computing/aws/eks.md
@@ -26,6 +26,7 @@
1. [Identify common issues](#identify-common-issues)
1. [The worker nodes fail to join the cluster](#the-worker-nodes-fail-to-join-the-cluster)
1. [AWS ELB controller fails to get the region from the host's metadata](#aws-elb-controller-fails-to-get-the-region-from-the-hosts-metadata)
+1. [Cost-saving measures](#cost-saving-measures)
1. [Further readings](#further-readings)
1. [Sources](#sources)
@@ -1413,6 +1414,13 @@ helm upgrade -i --repo 'https://aws.github.io/eks-charts' \
--set 'vpcId'='vpc-01234567'
```
+## Cost-saving measures
+
+- Consider [using spot instances][building for cost optimization and resilience for eks with spot instances] for
+ non-critical workloads.
+- Consider purchasing EC2 Instance and/or Compute Savings Plans if using EC2 worker nodes.
+ Consider purchasing Compute Savings Plans if using Fargate profiles.
+
## Further readings
- [Amazon Web Services]
@@ -1512,6 +1520,7 @@ helm upgrade -i --repo 'https://aws.github.io/eks-charts' \
[aws eks create-nodegroup]: https://docs.aws.amazon.com/cli/latest/reference/eks/create-nodegroup.html
[AWS Node Termination Handler]: https://github.com/aws/aws-node-termination-handler
[awssupport-troubleshooteksworkernode runbook]: https://docs.aws.amazon.com/systems-manager-automation-runbooks/latest/userguide/automation-awssupport-troubleshooteksworkernode.html
+[Building for Cost optimization and Resilience for EKS with Spot Instances]: https://aws.amazon.com/blogs/compute/cost-optimization-and-resilience-eks-with-spot-instances/
[choosing an amazon ec2 instance type]: https://docs.aws.amazon.com/eks/latest/userguide/choosing-instance-type.html
[configure instance permissions required for systems manager]: https://docs.aws.amazon.com/systems-manager/latest/userguide/setup-instance-profile.html#instance-profile-policies-overview
[create an amazon ebs csi driver iam role]: https://docs.aws.amazon.com/eks/latest/userguide/csi-iam-role.html
diff --git a/knowledge base/cloud computing/aws/opensearch.md b/knowledge base/cloud computing/aws/opensearch.md
index d379b13..f79e726 100644
--- a/knowledge base/cloud computing/aws/opensearch.md
+++ b/knowledge base/cloud computing/aws/opensearch.md
@@ -380,7 +380,7 @@ can manage.
## Cost-saving measures
-- Choose appropriate [instance types and sizes][supported instance types in amazon opensearch service].
+- Choose _appropriate_ [instance types and sizes][supported instance types in amazon opensearch service].
Leverage the ability to select them to tailor the service offering to one's needs.
> [OR1 instances][or1 storage for amazon opensearch service] **cannot** (currently?) be selected as master nodes.
diff --git a/knowledge base/cloud computing/aws/rds.md b/knowledge base/cloud computing/aws/rds.md
index 49392bd..bf3c327 100644
--- a/knowledge base/cloud computing/aws/rds.md
+++ b/knowledge base/cloud computing/aws/rds.md
@@ -28,6 +28,7 @@
1. [ERROR: must be superuser to alter _X_ roles or change _X_ attribute](#error-must-be-superuser-to-alter-x-roles-or-change-x-attribute)
1. [Transport fails asking for the remote user must have superuser, but it already does](#transport-fails-asking-for-the-remote-user-must-have-superuser-but-it-already-does)
1. [The instance is unbearably slow](#the-instance-is-unbearably-slow)
+1. [Cost-saving measures](#cost-saving-measures)
1. [Further readings](#further-readings)
1. [Sources](#sources)
@@ -103,6 +104,10 @@ Maintenance windows are paused when their DB instances are stopped.
# Show details of RDS instances.
aws rds describe-db-instances
aws rds describe-db-instances --output 'json' --query "DBInstances[?(DBInstanceIdentifier=='master-prod')]"
+aws rds describe-db-instances --db-instance-identifier 'some-db-instance' \
+ --query 'DBInstances[0].InstanceCreateTime' --output 'text'
+aws rds describe-db-instances --db-instance-identifier 'some-db-instance' --output 'text' \
+ --query 'DBInstances[0]|join(``,[`postgresql://`,MasterUsername,`@`,Endpoint.Address,`:`,to_string(Endpoint.Port),`/`,DBName||`postgres`])'
# Enable Performance Insights.
aws rds modify-db-cluster --db-cluster-identifier 'staging-cluster' \
@@ -1073,6 +1078,16 @@ or write workloads and exceeds the instance type quotas.
+## Cost-saving measures
+
+- Choose _appropriate_ instance types and sizes.
+- Prefer using [reserved instances][rds reserved instances] when one can stay on a single instance type for the whole
+ duration of the reservation.
+ Should the DB type **not** change in time, prefer _Standard RIs_. Otherwise, prefer _Convertible RIs_ for
+ flexibility.
+
+ RDS does **not** support Savings Plans at the time of writing.
+
## Further readings
- [Working with DB instance read replicas]
@@ -1136,6 +1151,7 @@ or write workloads and exceeds the instance type quotas.
[migrating databases using rds postgresql transportable databases]: https://aws.amazon.com/blogs/database/migrating-databases-using-rds-postgresql-transportable-databases/
[Multi-AZ DB instance deployments for Amazon RDS]: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.MultiAZSingleStandby.html
[pricing and data retention for performance insights]: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PerfInsights.Overview.cost.html
+[RDS reserved instances]: https://aws.amazon.com/rds/reserved-instances/
[Recommended alarms for RDS]: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Best_Practice_Recommended_Alarms_AWS_Services.html#RDS
[renaming a db instance]: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_RenameInstance.html
[Restoring a DB instance to a specified time for Amazon RDS]: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PIT.html
diff --git a/knowledge base/cloud computing/aws/s3.md b/knowledge base/cloud computing/aws/s3.md
index e64bbf4..5e2b75a 100644
--- a/knowledge base/cloud computing/aws/s3.md
+++ b/knowledge base/cloud computing/aws/s3.md
@@ -3,6 +3,7 @@
1. [TL;DR](#tldr)
1. [Storage classes](#storage-classes)
1. [Lifecycle configuration](#lifecycle-configuration)
+1. [Cost-saving measures](#cost-saving-measures)
1. [Further readings](#further-readings)
1. [Sources](#sources)
@@ -215,6 +216,17 @@ actions. In such cases:
Examples: [1][lifecycle configuration examples], [2][s3 lifecycle rules examples]
+## Cost-saving measures
+
+- Prefer using lower storage classes for data that is not frequently accessed.
+ Lower storage classes have a minimum storage fee period **and** retrieval fees.
+- Consider using [Lifecycle configuration] to move data that is not frequently accessed down to a lower storage tier
+ after some time.
+- Prefer using [S3 Intelligent-Tiering][how s3 intelligent-tiering works] when not knowing how frequently data is
+ accessed.
+- Consider expiring old data after some time, if its retention is not needed.
+- Consider compressing data before uploading it.
+
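A lifecycle rule combining tiering and expiration could be sketched as below. The function name, the rule ID and the day thresholds (30, 90, 365) are illustrative assumptions, not recommendations. Mind that this call replaces any lifecycle configuration already set on the bucket.

```sh
# Hypothetical helper: move objects to cheaper classes over time, then expire them.
# The day thresholds (30, 90, 365) are illustrative assumptions, not recommendations.
# This replaces the bucket's existing lifecycle configuration, if any.
set_bucket_lifecycle() {
  aws s3api put-bucket-lifecycle-configuration --bucket "$1" \
    --lifecycle-configuration '{
      "Rules": [{
        "ID": "tier-down-then-expire",
        "Status": "Enabled",
        "Filter": { "Prefix": "" },
        "Transitions": [
          { "Days": 30, "StorageClass": "STANDARD_IA" },
          { "Days": 90, "StorageClass": "GLACIER" }
        ],
        "Expiration": { "Days": 365 }
      }]
    }'
}
```

The empty prefix makes the rule apply to all objects in the bucket; a narrower prefix limits it to part of the data.
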
## Further readings
- [Amazon Web Services]
@@ -245,6 +257,8 @@ Examples: [1][lifecycle configuration examples], [2][s3 lifecycle rules example
-->
+[Lifecycle configuration]: #lifecycle-configuration
+
[amazon web services]: README.md
[cli]: cli.md
diff --git a/knowledge base/cloud computing/aws/sagemaker.md b/knowledge base/cloud computing/aws/sagemaker.md
index b0c86f0..8e91e34 100644
--- a/knowledge base/cloud computing/aws/sagemaker.md
+++ b/knowledge base/cloud computing/aws/sagemaker.md
@@ -1,6 +1,7 @@
# Sagemaker
1. [TL;DR](#tldr)
+1. [Cost-saving measures](#cost-saving-measures)
1. [Further readings](#further-readings)
1. [Sources](#sources)
@@ -11,6 +12,11 @@
- Serverless Endpoints' backend use **a snapshot** of the Endpoint Configuration at the time each host is created.
To make a serverless Endpoint use a new Configuration or Model, its hosts need to be replaced.
+## Cost-saving measures
+
+- Use a single endpoint for multiple models where it makes sense.
+- Delete endpoints when they are not used anymore.
+
## Further readings
- [Amazon Web Services]
diff --git a/knowledge base/kubernetes/README.md b/knowledge base/kubernetes/README.md
index 3facc14..93f5377 100644
--- a/knowledge base/kubernetes/README.md
+++ b/knowledge base/kubernetes/README.md
@@ -52,6 +52,7 @@ Hosted by the [Cloud Native Computing Foundation][cncf].
1. [Run a command just before a Pod stops](#run-a-command-just-before-a-pod-stops)
1. [Examples](#examples)
1. [Create an admission webhook](#create-an-admission-webhook)
+1. [Cost-saving measures](#cost-saving-measures)
1. [Further readings](#further-readings)
1. [Sources](#sources)
@@ -1256,6 +1257,16 @@ you need:
See the example's [README][create an admission webhook].
+## Cost-saving measures
+
+- Reconsider one's choices.
+ Does one really need a Kubernetes cluster? Clusters introduce multiple layers of redundancy and are highly complex.
+ Consider the resources and maintenance efforts that will inevitably go into that.
+- Consider leveraging autoscaling.
+ See [Horizontal Pod Autoscaling][horizontal pod autoscaler] and [KEDA] to scale Pods depending on metrics.
+ See [Node Autoscaling][node scaling] to scale Nodes depending on number of Pods, node features, or resource
+ consumption.
+
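As a small illustration of Pod autoscaling, a HorizontalPodAutoscaler can be created imperatively with `kubectl autoscale`. The function name and the values used in the example are hypothetical, and the sketch assumes a metrics source (e.g. metrics-server) is available in the cluster.

```sh
# Hypothetical helper: create an HPA keeping a Deployment between 1 and a given number of replicas.
# Arguments: deployment name, maximum replicas, target average CPU utilization (percent).
# Assumption: the cluster exposes resource metrics (e.g. through metrics-server).
autoscale_deployment() {
  kubectl autoscale deployment "$1" --min=1 --max="$2" --cpu-percent="$3"
}
```

E.g., `autoscale_deployment 'some-app' 5 70` would let the (made-up) `some-app` Deployment scale up to 5 replicas.
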
## Further readings
Usage:
@@ -1350,6 +1361,7 @@ Others:
[horizontal pod autoscaler]: #horizontal-pod-autoscaler
+[node scaling]: #node-scaling
[vertical pod autoscaler]: #vertical-pod-autoscaler
[pods]: #pods
[privileged container vs privilege escalation]: #privileged-container-vs-privilege-escalation