diff --git a/knowledge base/cloud computing/aws/cloudwatch.md b/knowledge base/cloud computing/aws/cloudwatch.md index 30fcf6d..c14a188 100644 --- a/knowledge base/cloud computing/aws/cloudwatch.md +++ b/knowledge base/cloud computing/aws/cloudwatch.md @@ -5,6 +5,7 @@ Observability service. with functions for logging, monitoring and alerting. 1. [TL;DR](#tldr) 1. [Queries of interest](#queries-of-interest) 1. [Stream logs](#stream-logs) +1. [Cost-saving measures](#cost-saving-measures) 1. [Further readings](#further-readings) 1. [Sources](#sources) @@ -31,6 +32,19 @@ The [CloudWatch console] offers some default good queries. Logs in Log Groups can be [streamed][stream logs] elsewhere. +CloudWatch retains metrics' data as follows: + +- Data points with a period of less than 60 seconds are available for 3 hours.
+ These are high-resolution custom metrics. +- Data points with a period of 60 seconds (1 minute) are available for 15 days. +- Data points with a period of 300 seconds (5 minutes) are available for 63 days. +- Data points with a period of 3600 seconds (1 hour) are available for 455 days (15 months). + +Data points are aggregated together for long-term storage after the initial period.
+E.g., data using a period of 1 minute remains available for 15 days with 1-minute resolution, then it is aggregated and +made available with a resolution of 5 minutes; after 63 days, it is further aggregated and made available with a +resolution of 1 hour for 15 months. +
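The retention tiers above can be read as a simple lookup. The following is a minimal sketch, not an AWS-provided tool: the function name `finest_period_for_age_days` is hypothetical, it assumes a metric originally published with a 1-minute period, and it treats the 15, 63 and 455 day boundaries as exclusive upper bounds.

```sh
# Hypothetical helper: given the age of a data point in days, print the finest
# period (in seconds) at which CloudWatch should still serve it, per the tiers
# listed above. Assumes the metric was published with a 1-minute period.
finest_period_for_age_days() {
  age_days="$1"
  if [ "$age_days" -lt 15 ]; then
    echo '60'
  elif [ "$age_days" -lt 63 ]; then
    echo '300'
  elif [ "$age_days" -lt 455 ]; then
    echo '3600'
  else
    echo 'expired'
  fi
}

# Data from 20 days ago should only be available at 5-minute resolution.
finest_period_for_age_days 20
```

The printed value could then be used as the `--period` argument of `aws cloudwatch get-metric-statistics` when querying older data.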
CLI commands @@ -101,6 +115,12 @@ Also refer [Streaming CloudWatch Logs data to Amazon OpenSearch Service] to stre Logs in CloudWatch Log Groups can be streamed [Kinesis], [Firehose] or [Lambda] by leveraging Logs subscriptions. +## Cost-saving measures + +- Configure an _appropriate_ log retention period for any log groups.
+ Log groups containing development logs should not usually need more than 1w worth. +- When in doubt, still configure a default, long log retention period for all log groups (10y?). + ## Further readings - [Website] @@ -113,6 +133,7 @@ Logs in CloudWatch Log Groups can be streamed [Kinesis], [Firehose] or [Lambda] - [Real-time processing of log data with subscriptions] - [Streaming CloudWatch Logs data to Amazon OpenSearch Service] - [Which log group is causing a sudden increase in my CloudWatch Logs bill?] +- [Metrics concepts] + +[archiving]: #archiving + [amazon web services]: README.md [cli]: cli.md [ec2]: ec2.md +[Amazon EBS General Purpose SSD volumes]: https://docs.aws.amazon.com/ebs/latest/userguide/general-purpose.html [amazon ebs pricing]: https://aws.amazon.com/ebs/pricing/ [amazon ebs volume types]: https://docs.aws.amazon.com/ebs/latest/userguide/ebs-volume-types.html [amazon ebs-optimized instance types]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-optimized.html -[archive amazon ebs snapshots]: https://docs.aws.amazon.com/ebs/latest/userguide/snapshot-archive.html +[Archive Amazon EBS snapshots]: https://docs.aws.amazon.com/ebs/latest/userguide/snapshot-archive.html [automate snapshot lifecycles]: https://docs.aws.amazon.com/ebs/latest/userguide/snapshot-ami-policy.html [choose the best amazon ebs volume type for your self-managed database deployment]: https://aws.amazon.com/blogs/storage/how-to-choose-the-best-amazon-ebs-volume-type-for-your-self-managed-database-deployment/ [delete-volume]: https://docs.aws.amazon.com/cli/latest/reference/ec2/delete-volume.html diff --git a/knowledge base/cloud computing/aws/ec2.md b/knowledge base/cloud computing/aws/ec2.md index 06d6e54..14f8f4b 100644 --- a/knowledge base/cloud computing/aws/ec2.md +++ b/knowledge base/cloud computing/aws/ec2.md @@ -9,6 +9,7 @@ 1. [Lifecycle hooks](#lifecycle-hooks) 1. [Image customization](#image-customization) 1. [Automatic recovery](#automatic-recovery) +1. 
[Cost-saving measures](#cost-saving-measures) 1. [Further readings](#further-readings) 1. [Sources](#sources) @@ -19,13 +20,15 @@ The API for EC2 are [**eventually** consistent][Eventual consistency in the Amaz EC2 instances are billed by the second, with a minimum of 60s, [since 2017-10-02][announcing amazon ec2 per second billing]. -Use an instance profile to allow an EC2 instance to use an IAM role. +Use an IAM Instance Profile to allow an EC2 instance to use an IAM role. `T` instances launch as `unlimited` by default. Launch them in `standard` mode to avoid paying for surplus credits. The instance type [_can_ be changed][change the instance type]. The procedure depends on the root volume, and **does** require downtime. +When using spot instances, prefer instrumenting the application to be aware of [termination notifications]. + Clone EC2 instances by: 1. Creating an AMI from the original instance. @@ -222,6 +225,25 @@ Refer [Image Builder]. Also see [Automatic instance recovery]. +## Cost-saving measures + +- Prefer using the most adequate instance type for the job.
+ E.g., prefer `r*` instances instead of `m*` ones where a lot of RAM is needed, but almost no CPU power is. +- Prefer using ARM-based (`g`) instances, unless a different architecture is required. +- Prefer _shared_ instances over _dedicated_ ones unless necessary. + Refer [Understanding AWS Tenancy Options]. +- Prefer dedicated _instances_ over dedicated _hosts_ unless necessary. + Refer [Understanding AWS Tenancy Options]. +- Prefer using [burstable (`t`) instances][burstable instances], especially for bursty workloads, unless steady + performance is required. +- When employing **underused** burstable instances, prefer re-launching them in `standard` mode to avoid paying for + surplus credits. +- Prefer using [spot instances] instead of on-demand ones where possible. +- Consider **stopping** or (even better) deleting non-production hosts after working hours. +- Consider applying for EC2 Instance and/or Compute Savings Plans. +- Consider [archiving snapshots] should they not be accessed for 90d or more.
+ Archiving has a 90d minimum storage fee, **and** archived resources have retrieval fees. + ## Further readings - [Amazon Web Services] @@ -269,8 +291,12 @@ Also see [Automatic instance recovery]. ═╬═Time══ --> + +[burstable instances]: #burstable-instances + [amazon web services]: README.md +[archiving snapshots]: ebs.md#archiving [cli]: cli.md [ebs]: ebs.md [image builder]: image%20builder.md @@ -302,7 +328,9 @@ Also see [Automatic instance recovery]. [Manually create or edit the CloudWatch agent configuration file]: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-Configuration-File-Details.html [recommended alarms]: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Best_Practice_Recommended_Alarms_AWS_Services.html#EC2 [retrieve instance metadata]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html +[Spot Instances]: https://aws.amazon.com/ec2/spot/ [standard mode for burstable performance instances]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/burstable-performance-instances-standard-mode.html +[termination notifications]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-instance-termination-notices.html [unlimited mode for burstable performance instances]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/burstable-performance-instances-unlimited-mode.html [using al2023 based amazon ecs amis to host containerized workloads]: https://docs.aws.amazon.com/linux/al2023/ug/ecs.html [using instance profiles]: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2_instance-profiles.html diff --git a/knowledge base/cloud computing/aws/ecs.md b/knowledge base/cloud computing/aws/ecs.md index 638854b..c9b068e 100644 --- a/knowledge base/cloud computing/aws/ecs.md +++ b/knowledge base/cloud computing/aws/ecs.md @@ -36,6 +36,7 @@ 1. [Best practices](#best-practices) 1. [Troubleshooting](#troubleshooting) 1. 
[Invalid 'cpu' setting for task](#invalid-cpu-setting-for-task) +1. [Cost-saving measures](#cost-saving-measures) 1. [Further readings](#further-readings) 1. [Sources](#sources) @@ -133,6 +134,12 @@ curl -fs "http://$( \ )" --query "tasks[].attachments[].details[?(name=='privateDnsName')].value" --output 'text' \ ):8080" +# Get the image of specific containers. +aws ecs list-tasks --cluster 'someCluster' --service-name 'someService' --query 'taskArns[0]' --output 'text' \ +| xargs -oI '%%' \ + aws ecs describe-tasks --cluster 'someCluster' --task '%%' \ + --query 'tasks[].containers[?name==`someContainer`].image' --output 'text' + # Delete services. aws ecs delete-service --cluster 'testCluster' --service 'testService' --force @@ -148,7 +155,8 @@ while [[ $(aws ecs list-tasks --query 'taskArns' --output 'text' --cluster 'test # Restart tasks. # No real way to do that, just stop the tasks and new ones will be eventually started in their place. # To mimic a blue-green deployment, scale the service up by doubling its tasks, then down again to the normal amount. - +aws ecs update-service --cluster 'someCluster' --service 'someService' --desired-count '0' \ +&& aws ecs update-service --cluster 'someCluster' --service 'someService' --desired-count '1' ```
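As an alternative to scaling the service to zero and back, the AWS CLI also exposes the `--force-new-deployment` flag on `aws ecs update-service`. The sketch below assumes a service using the default rolling update deployment controller. The wrapper name `restart_ecs_service` and the `DRY_RUN` variable are hypothetical conveniences, not part of the AWS CLI.

```sh
# Hypothetical wrapper around 'aws ecs update-service --force-new-deployment'.
# Set DRY_RUN to any non-empty value to print the command instead of running it.
restart_ecs_service() {
  cluster="$1"
  service="$2"
  ${DRY_RUN:+echo} aws ecs update-service \
    --cluster "$cluster" --service "$service" --force-new-deployment
}

# Print the command that would replace the tasks of 'someService'.
DRY_RUN='1' restart_ecs_service 'someCluster' 'someService'
```

With a rolling deployment, new tasks are started according to the service's deployment configuration, so the service should not need to drop to zero tasks first.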
@@ -1725,6 +1733,23 @@ Cost-saving measures: capacity provider.
+ Percentage-like + + ```json + { + "capacityProvider": "FARGATE", + "weight": 5 + } + { + "capacityProvider": "FARGATE_SPOT", + "weight": 95 + } + ``` + +
+ +
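Capacity provider weights are relative: each provider receives roughly `weight / sum of weights` of the tasks. The sketch below only illustrates that arithmetic. The helper name `weight_share` is hypothetical, the 1-to-3 ratio is an illustrative value, and the calculation assumes no `base` is set on any provider.

```sh
# Hypothetical helper: percentage of tasks a capacity provider should receive,
# given its weight and the sum of all weights in the strategy.
# Uses integer division, so results are rounded down.
weight_share() {
  weight="$1"
  total="$2"
  echo "$(( weight * 100 / total ))"
}

# Percentage-like weights: 95 out of 5 + 95.
weight_share 95 100
# Ratio-like weights (illustrative values): 3 out of 1 + 3.
weight_share 3 4
```

Both styles express the same split mechanism. Percentage-like weights are just ratios whose sum happens to be 100.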
+ Ratio-like ```json { @@ -1775,6 +1800,13 @@ Specify a supported value for the task CPU and memory in your task definition.
+## Cost-saving measures + +- Prefer using [spot capacity][effectively using spot instances in aws ecs for production workloads] for non-critical + services and tasks. +- Consider applying for EC2 Instance and/or Compute Savings Plans if using EC2 capacity.
+ Consider applying for Compute Savings Plans if using Fargate capacity. + ## Further readings - [Amazon Web Services] @@ -1820,6 +1852,8 @@ Specify a supported value for the task CPU and memory in your task definition. - [Amazon ECS Service Discovery] - [AWS Fargate Pricing Explained] - [The Ultimate Beginner's Guide to AWS ECS] +- [Amazon ECS launch types and capacity providers] +- [Effectively Using Spot Instances in AWS ECS for Production Workloads] +[Amazon ECS launch types and capacity providers]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/capacity-launch-type-comparison.html [Amazon ECS capacity providers for the EC2 launch type]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/asg-capacity-providers.html [Amazon ECS clusters for Fargate]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/fargate-capacity-providers.html [Amazon ECS environment variables]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-environment-variables.html @@ -1865,6 +1900,7 @@ Specify a supported value for the task CPU and memory in your task definition. 
[AWS Fargate Spot Now Generally Available]: https://aws.amazon.com/blogs/aws/aws-fargate-spot-now-generally-available/ [Centralized Container Logging with Fluent Bit]: https://aws.amazon.com/blogs/opensource/centralized-container-logging-fluent-bit/ [ecs execute-command proposal]: https://github.com/aws/containers-roadmap/issues/1050 +[Effectively Using Spot Instances in AWS ECS for Production Workloads]: https://medium.com/@ankur.ecb/effectively-using-spot-instances-in-aws-ecs-for-production-workloads-d46985d0ae2d [EventBridge Scheduler]: https://docs.aws.amazon.com/scheduler/latest/UserGuide/what-is-scheduler.html [Example Amazon ECS task definition: Route logs to FireLens]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/firelens-taskdef.html [fargate tasks sizes]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/fargate-tasks-services.html#fargate-tasks-size diff --git a/knowledge base/cloud computing/aws/eks.md b/knowledge base/cloud computing/aws/eks.md index 3849b2b..3577555 100644 --- a/knowledge base/cloud computing/aws/eks.md +++ b/knowledge base/cloud computing/aws/eks.md @@ -26,6 +26,7 @@ 1. [Identify common issues](#identify-common-issues) 1. [The worker nodes fail to join the cluster](#the-worker-nodes-fail-to-join-the-cluster) 1. [AWS ELB controller fails to get the region from the host's metadata](#aws-elb-controller-fails-to-get-the-region-from-the-hosts-metadata) +1. [Cost-saving measures](#cost-saving-measures) 1. [Further readings](#further-readings) 1. [Sources](#sources) @@ -1413,6 +1414,13 @@ helm upgrade -i --repo 'https://aws.github.io/eks-charts' \ --set 'vpcId'='vpc-01234567' ``` +## Cost-saving measures + +- Consider [using spot instances][building for cost optimization and resilience for eks with spot instances] for + non-critical workloads. +- Consider applying for EC2 Instance and/or Compute Savings Plans if using EC2 worker nodes. + Consider applying for Compute Savings Plans if using Fargate profiles. 
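For managed node groups, spot capacity is requested through the `--capacity-type` option of `aws eks create-nodegroup`. The following is a minimal sketch under stated assumptions: the wrapper name `create_spot_nodegroup`, the `DRY_RUN` variable and every identifier in the example call (cluster, node group, role ARN, subnet) are hypothetical placeholders.

```sh
# Hypothetical wrapper: create a managed node group backed by spot instances.
# Arguments: cluster name, node group name, node IAM role ARN, then subnet IDs.
# Set DRY_RUN to any non-empty value to print the command instead of running it.
create_spot_nodegroup() {
  cluster="$1"
  nodegroup="$2"
  node_role_arn="$3"
  shift 3
  ${DRY_RUN:+echo} aws eks create-nodegroup \
    --cluster-name "$cluster" --nodegroup-name "$nodegroup" \
    --capacity-type 'SPOT' --node-role "$node_role_arn" --subnets "$@"
}

# Print the command for a placeholder cluster and subnet.
DRY_RUN='1' create_spot_nodegroup 'someCluster' 'spotWorkers' \
  'arn:aws:iam::012345678901:role/someNodeRole' 'subnet-01234567'
```

Workloads placed on such nodes should tolerate interruptions. The [AWS Node Termination Handler] referenced in this file is one way to react to them.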
+ ## Further readings - [Amazon Web Services] @@ -1512,6 +1520,7 @@ helm upgrade -i --repo 'https://aws.github.io/eks-charts' \ [aws eks create-nodegroup]: https://docs.aws.amazon.com/cli/latest/reference/eks/create-nodegroup.html [AWS Node Termination Handler]: https://github.com/aws/aws-node-termination-handler [awssupport-troubleshooteksworkernode runbook]: https://docs.aws.amazon.com/systems-manager-automation-runbooks/latest/userguide/automation-awssupport-troubleshooteksworkernode.html +[Building for Cost optimization and Resilience for EKS with Spot Instances]: https://aws.amazon.com/blogs/compute/cost-optimization-and-resilience-eks-with-spot-instances/ [choosing an amazon ec2 instance type]: https://docs.aws.amazon.com/eks/latest/userguide/choosing-instance-type.html [configure instance permissions required for systems manager]: https://docs.aws.amazon.com/systems-manager/latest/userguide/setup-instance-profile.html#instance-profile-policies-overview [create an amazon ebs csi driver iam role]: https://docs.aws.amazon.com/eks/latest/userguide/csi-iam-role.html diff --git a/knowledge base/cloud computing/aws/opensearch.md b/knowledge base/cloud computing/aws/opensearch.md index d379b13..f79e726 100644 --- a/knowledge base/cloud computing/aws/opensearch.md +++ b/knowledge base/cloud computing/aws/opensearch.md @@ -380,7 +380,7 @@ can manage. ## Cost-saving measures -- Choose appropriate [instance types and sizes][supported instance types in amazon opensearch service].
+- Choose _appropriate_ [instance types and sizes][supported instance types in amazon opensearch service].
Leverage the ability to select them to tailor the service offering to one's needs. > [OR1 instances][or1 storage for amazon opensearch service] **cannot** (currently?) be selected as master nodes.
diff --git a/knowledge base/cloud computing/aws/rds.md b/knowledge base/cloud computing/aws/rds.md index 49392bd..bf3c327 100644 --- a/knowledge base/cloud computing/aws/rds.md +++ b/knowledge base/cloud computing/aws/rds.md @@ -28,6 +28,7 @@ 1. [ERROR: must be superuser to alter _X_ roles or change _X_ attribute](#error-must-be-superuser-to-alter-x-roles-or-change-x-attribute) 1. [Transport fails asking for the remote user must have superuser, but it already does](#transport-fails-asking-for-the-remote-user-must-have-superuser-but-it-already-does) 1. [The instance is unbearably slow](#the-instance-is-unbearably-slow) +1. [Cost-saving measures](#cost-saving-measures) 1. [Further readings](#further-readings) 1. [Sources](#sources) @@ -103,6 +104,10 @@ Maintenance windows are paused when their DB instances are stopped. # Show details of RDS instances. aws rds describe-db-instances aws rds describe-db-instances --output 'json' --query "DBInstances[?(DBInstanceIdentifier=='master-prod')]" +aws rds describe-db-instances --db-instance-identifier 'some-db-instance' \ + --query 'DBInstances[0].InstanceCreateTime' --output 'text' +aws rds describe-db-instances --db-instance-identifier 'some-db-instance' --output 'text' \ + --query 'DBInstances[0]|join(``,[`postgresql://`,MasterUsername,`@`,Endpoint.Address,`:`,to_string(Endpoint.Port),`/`,DBName||`postgres`])' # Enable Performance Insights. aws rds modify-db-cluster --db-cluster-identifier 'staging-cluster' \ @@ -1073,6 +1078,16 @@ or write workloads and exceeds the instance type quotas. +## Cost-saving measures + +- Choose _appropriate_ instance types and sizes. +- Prefer using [reserved instances][rds reserved instances] when one can stay on a single instance type for the whole + duration of the reservation.
+ Should the DB type **not** change in time, prefer _Standard RIs_. Otherwise, prefer _Convertible RIs_ for + flexibility. + + RDS does **not** support Savings Plans at the time of writing. + ## Further readings - [Working with DB instance read replicas] @@ -1136,6 +1151,7 @@ or write workloads and exceeds the instance type quotas. [migrating databases using rds postgresql transportable databases]: https://aws.amazon.com/blogs/database/migrating-databases-using-rds-postgresql-transportable-databases/ [Multi-AZ DB instance deployments for Amazon RDS]: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.MultiAZSingleStandby.html [pricing and data retention for performance insights]: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PerfInsights.Overview.cost.html +[RDS reserved instances]: https://aws.amazon.com/rds/reserved-instances/ [Recommended alarms for RDS]: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Best_Practice_Recommended_Alarms_AWS_Services.html#RDS [renaming a db instance]: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_RenameInstance.html [Restoring a DB instance to a specified time for Amazon RDS]: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PIT.html diff --git a/knowledge base/cloud computing/aws/s3.md b/knowledge base/cloud computing/aws/s3.md index e64bbf4..5e2b75a 100644 --- a/knowledge base/cloud computing/aws/s3.md +++ b/knowledge base/cloud computing/aws/s3.md @@ -3,6 +3,7 @@ 1. [TL;DR](#tldr) 1. [Storage classes](#storage-classes) 1. [Lifecycle configuration](#lifecycle-configuration) +1. [Cost-saving measures](#cost-saving-measures) 1. [Further readings](#further-readings) 1. [Sources](#sources) @@ -215,6 +216,17 @@ actions. In such cases: Examples: [1][lifecycle configuration examples], [2][s3 lifecycle rules examples] +## Cost-saving measures + +- Prefer using lower storage classes for data that is not frequently accessed.
+ Lower storage classes have a minimum storage fee period **and** retrieval fees. +- Consider using [Lifecycle configuration] to move data that is not frequently accessed down to a lower storage tier + after some time. +- Prefer using [S3 Intelligent-Tiering][how s3 intelligent-tiering works] when unsure how frequently data is + accessed. +- Consider expiring old data after some time, if its retention is not needed. +- Consider compressing data before uploading it. + ## Further readings - [Amazon Web Services] @@ -245,6 +257,8 @@ Examples: [1][lifecycle configuration examples], [2][s3 lifecycle rules example --> +[Lifecycle configuration]: #lifecycle-configuration + [amazon web services]: README.md [cli]: cli.md diff --git a/knowledge base/cloud computing/aws/sagemaker.md b/knowledge base/cloud computing/aws/sagemaker.md index b0c86f0..8e91e34 100644 --- a/knowledge base/cloud computing/aws/sagemaker.md +++ b/knowledge base/cloud computing/aws/sagemaker.md @@ -1,6 +1,7 @@ # Sagemaker 1. [TL;DR](#tldr) +1. [Cost-saving measures](#cost-saving-measures) 1. [Further readings](#further-readings) 1. [Sources](#sources) @@ -11,6 +12,11 @@ - Serverless Endpoints' backend use **a snapshot** of the Endpoint Configuration at the time each host is created.
To make a serverless Endpoint use a new Configuration or Model, its hosts need to be replaced. +## Cost-saving measures + +- Use a single endpoint for multiple models where it makes sense. +- Delete endpoints when they are not used anymore. + ## Further readings - [Amazon Web Services] diff --git a/knowledge base/kubernetes/README.md b/knowledge base/kubernetes/README.md index 3facc14..93f5377 100644 --- a/knowledge base/kubernetes/README.md +++ b/knowledge base/kubernetes/README.md @@ -52,6 +52,7 @@ Hosted by the [Cloud Native Computing Foundation][cncf]. 1. [Run a command just before a Pod stops](#run-a-command-just-before-a-pod-stops) 1. [Examples](#examples) 1. [Create an admission webhook](#create-an-admission-webhook) +1. [Cost-saving measures](#cost-saving-measures) 1. [Further readings](#further-readings) 1. [Sources](#sources) @@ -1256,6 +1257,16 @@ you need: See the example's [README][create an admission webhook]. +## Cost-saving measures + +- Reconsider one's choices.
+ Does one really need a Kubernetes cluster? Clusters introduce several layers of redundancy and are highly complex.
+ Consider the resources and maintenance efforts that will inevitably go into that. +- Consider leveraging autoscaling.
+ See [Horizontal Pod Autoscaling] and [KEDA] to scale Pods depending on metrics.
+ See [Node Autoscaling][node scaling] to scale Nodes depending on number of Pods, node features, or resource + consumption. + ## Further readings Usage: @@ -1350,6 +1361,7 @@ Others: [horizontal pod autoscaler]: #horizontal-pod-autoscaler +[node scaling]: #node-scaling [vertical pod autoscaler]: #vertical-pod-autoscaler [pods]: #pods [privileged container vs privilege escalation]: #privileged-container-vs-privilege-escalation