chore(kb/aws/ecs): add part about capacity providers

2026-02-09 05:44:23 +00:00 · 2025-07-04 09:52:05 +02:00
parent a91aee3d67
commit 67bef584a6
1 changed files with 117 additions and 3 deletions
--- a/computing/aws/ecs.md
+++ b/computing/aws/ecs.md
@@ -6,6 +6,9 @@
   1. [Fargate launch type](#fargate-launch-type)
   1. [Standalone tasks](#standalone-tasks)
   1. [Services](#services)
+1. [Capacity providers](#capacity-providers)
+   1. [EC2 capacity providers](#ec2-capacity-providers)
+   1. [Fargate for ECS](#fargate-for-ecs)
 1. [Resource constraints](#resource-constraints)
 1. [Environment variables](#environment-variables)
 1. [Storage](#storage)
@@ -235,6 +238,113 @@ Available service scheduler strategies:

  Fargate does **not** support the `DAEMON` scheduling strategy.

+## Capacity providers
+
+Refer [Capacity providers].
+
+Clusters can contain a mix of tasks that are hosted on Fargate, Amazon EC2 instances, or external instances.<br/>
+Tasks can run on Fargate or EC2 infrastructure as a launch type or a capacity provider strategy.<br/>
+Capacity providers manage the scaling of infrastructure for tasks in one's clusters.
+
+Each cluster can have one or more _capacity providers_, and an optional _capacity provider strategy_.
+
+The capacity provider strategy determines how tasks are spread across a cluster's capacity providers.<br/>
+One can assign a **default** capacity provider strategy to a cluster. This strategy **only** applies when one does not
+specify a launch type nor a capacity provider strategy for a task or service. If either of these parameters is provided,
+the default strategy is ignored.<br/>
+When running a standalone task or creating a service, one can either use the cluster's default capacity provider
+strategy or provide one that overrides the default.
+
+One must associate a capacity provider with a cluster **before** associating it with a capacity provider strategy.<br/>
+Strategies allow to specify a maximum of 20 capacity providers.
+
+Strategies' weight value defaults to `1` when creating it from the Console, and to `0` if using the API or CLI.
+
+To run tasks on Fargate, one only needs to associate one or more of the pre-defined Fargate-specific capacity providers
+with the cluster. This takes away the need to create or manage that cluster's capacity.
+
+Clusters _can_ contain a mix of Fargate and Auto Scaling group capacity providers. However, a capacity provider strategy
+can only contain either Fargate or Auto Scaling group capacity providers, but **not both**.
+
+One **cannot** update a service that is using an Auto Scaling Group capacity provider to use a Fargate one, and
+vice versa.
+
+When multiple capacity providers are specified within a strategy, at least one of the providers **must** have a weight
+value greater than zero.<br/>
+Capacity providers with a weight of zero are **not** used to run tasks. Should all specified providers in a strategy
+have the same weight of zero, any RunTask or CreateService actions using that strategy will fail.
+
+In strategies, only **one** capacity provider can have a defined `base` value. If no base value is specified for a
+provider, it defaults to zero.
+
+A cluster can contain a mix of services and standalone tasks that use both capacity providers and launch types.<br/>
+Services _can_ be updated to use a capacity provider strategy instead of a launch type, but one will need to force a new
+deployment to do so.
+
+### EC2 capacity providers
+
+Refer [Amazon ECS capacity providers for the EC2 launch type].
+
+When using EC2 instances for capacity, one really uses Auto Scaling groups to manage the EC2 instances.<br/>
+Auto Scaling helps ensure that one has the correct number of EC2 instances available to handle the application's load.
+
+### Fargate for ECS
+
+Refer [AWS Fargate Spot Now Generally Available] and [Amazon ECS clusters for Fargate].
+
+ECS can run tasks on the `Fargate` and `Fargate Spot` capacity when they are associated with a cluster.
+
+Fargate Spot is intended for **interruption tolerant** tasks.<br/>
+runs tasks on spare compute capacity. This makes it cost less the normal Fargate price, but comes with the ability for
+AWS to interrupt tasks when it needs capacity back.
+
+During periods of extremely high demand, Fargate Spot capacity might be unavailable.<br/>
+When this happens, ECS services retry launching tasks until the required capacity becomes available.
+
+ECS sends **a two-minute warning** before Spot tasks are stopped due to a Spot interruption. This warning is sent as a
+task state change event to EventBridge and as a SIGTERM signal to the running task.
+
+<details style='padding: 0 0 1rem 1rem'>
+
+```json
+{
+  "version": "0",
+  "id": "9bcdac79-b31f-4d3d-9410-fbd727c29fab",
+  "detail-type": "ECS Task State Change",
+  "source": "aws.ecs",
+  "account": "111122223333",
+  "resources": [
+    "arn:aws:ecs:us-east-1:111122223333:task/b99d40b3-5176-4f71-9a52-9dbd6f1cebef"
+  ],
+  "detail": {
+    "clusterArn": "arn:aws:ecs:us-east-1:111122223333:cluster/default",
+    "createdAt": "2016-12-06T16:41:05.702Z",
+    "desiredStatus": "STOPPED",
+    "lastStatus": "RUNNING",
+    "stoppedReason": "Your Spot Task was interrupted.",
+    "stopCode": "SpotInterruption",
+    "taskArn": "arn:aws:ecs:us-east-1:111122223333:task/b99d40b3-5176-4f71-9a52-9dbd6fEXAMPLE",
+    ...
+  }
+}
+```
+
+</details>
+
+When Spot tasks are terminated, the service scheduler receives the interruption signal and attempts to launch additional
+tasks on Fargate Spot if such capacity is available, possibly from a different Availability Zone.
+
+Fargate will **not** replace Spot capacity with on-demand capacity.
+
+Ensure containers exit gracefully before the task stops by configuring the following:
+
+- Specify a `stopTimeout` value of 120 seconds or less in the container definition that the task is using.<br/>
+  The default value is 30 seconds. A higher value will provide more time between the moment that the task's state change
+  event is received and the point in time when the container is forcefully stopped.
+- Make sure the `SIGTERM` signal is caught from within the container and triggers any cleanup actions.<br/>
+  Not processing this signal results in the task receiving a `SIGKILL` signal after the configured `stopTimeout` value,
+  which may result in data loss or corruption.
+
 ## Resource constraints

 ECS uses the CPU period and the CPU quota to control the task's CPU **hard** limits **as a whole**.<br/>
@@ -1243,6 +1353,8 @@ Specify a supported value for the task CPU and memory in your task definition.
 [efs]: efs.md

 <!-- Upstream -->
+[Amazon ECS capacity providers for the EC2 launch type]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/asg-capacity-providers.html
+[Amazon ECS clusters for Fargate]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/fargate-capacity-providers.html
 [Amazon ECS environment variables]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-environment-variables.html
 [amazon ecs exec checker]: https://github.com/aws-containers/amazon-ecs-exec-checker
 [Amazon ECS FireLens Examples]: https://github.com/aws-samples/amazon-ecs-firelens-examples
@@ -1253,7 +1365,10 @@ Specify a supported value for the task CPU and memory in your task definition.
 [amazon ecs task lifecycle]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-lifecycle-explanation.html
 [amazon ecs task role]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html
 [Amazon VPC Lattice pricing]: https://aws.amazon.com/vpc/lattice/pricing/
+[Automatically scale your Amazon ECS service]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/service-auto-scaling.html
 [AWS Distro for OpenTelemetry]: https://aws-otel.github.io/
+[AWS Fargate Spot Now Generally Available]: https://aws.amazon.com/blogs/aws/aws-fargate-spot-now-generally-available/
+[Capacity providers]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/clusters.html#capacity-providers
 [Centralized Container Logging with Fluent Bit]: https://aws.amazon.com/blogs/opensource/centralized-container-logging-fluent-bit/
 [ecs execute-command proposal]: https://github.com/aws/containers-roadmap/issues/1050
 [Example Amazon ECS task definition: Route logs to FireLens]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/firelens-taskdef.html
@@ -1261,11 +1376,13 @@ Specify a supported value for the task CPU and memory in your task definition.
 [how amazon ecs manages cpu and memory resources]: https://aws.amazon.com/blogs/containers/how-amazon-ecs-manages-cpu-and-memory-resources/
 [how amazon elastic container service works with iam]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/security_iam_service-with-iam.html
 [How can I allow the tasks in my Amazon ECS services to communicate with each other?]: https://repost.aws/knowledge-center/ecs-tasks-services-communication
+[How target tracking scaling for Application Auto Scaling works]: https://docs.aws.amazon.com/autoscaling/application/userguide/target-tracking-scaling-policy-overview.html
 [identity and access management for amazon elastic container service]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/security-iam.html
 [install the session manager plugin for the aws cli]: https://docs.aws.amazon.com/systems-manager/latest/userguide/session-manager-working-with-install-plugin.html
 [Interconnect Amazon ECS services]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/interconnecting-services.html
 [Metrics collection from Amazon ECS using Amazon Managed Service for Prometheus]: https://aws.amazon.com/blogs/opensource/metrics-collection-from-amazon-ecs-using-amazon-managed-service-for-prometheus/
 [storage options for amazon ecs tasks]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/using_data_volumes.html
+[Target tracking scaling policies for Application Auto Scaling]: https://docs.aws.amazon.com/autoscaling/application/userguide/application-auto-scaling-target-tracking.html
 [troubleshoot amazon ecs deployment issues]: https://docs.aws.amazon.com/codedeploy/latest/userguide/troubleshooting-ecs.html
 [troubleshoot amazon ecs task definition invalid cpu or memory errors]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-cpu-memory-error.html
 [Under the hood: FireLens for Amazon ECS Tasks]: https://aws.amazon.com/blogs/containers/under-the-hood-firelens-for-amazon-ecs-tasks/
@@ -1278,9 +1395,6 @@ Specify a supported value for the task CPU and memory in your task definition.
 [using amazon ecs exec to access your containers on aws fargate and amazon ec2]: https://aws.amazon.com/blogs/containers/new-using-amazon-ecs-exec-access-your-containers-fargate-ec2/
 [What is Amazon VPC Lattice?]: https://docs.aws.amazon.com/vpc-lattice/latest/ug/what-is-vpc-lattice.html
 [What Is AWS Cloud Map?]: https://docs.aws.amazon.com/cloud-map/latest/dg/what-is-cloud-map.html
-[Automatically scale your Amazon ECS service]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/service-auto-scaling.html
-[Target tracking scaling policies for Application Auto Scaling]: https://docs.aws.amazon.com/autoscaling/application/userguide/application-auto-scaling-target-tracking.html
-[How target tracking scaling for Application Auto Scaling works]: https://docs.aws.amazon.com/autoscaling/application/userguide/target-tracking-scaling-policy-overview.html

 <!-- Others -->
 [`aws ecs execute-command` results in `TargetNotConnectedException` `The execute command failed due to an internal error`]: https://stackoverflow.com/questions/69261159/aws-ecs-execute-command-results-in-targetnotconnectedexception-the-execute