chore: make cluster-autoscaler work on eks
@@ -1,7 +1,5 @@
# AWS CLI

## Table of contents <!-- omit in toc -->

1. [TL;DR](#tldr)
1. [Profiles](#profiles)
1. [Configuration](#configuration)
@@ -17,11 +15,12 @@ Do *not* use `--max-items` together with `--query`: the items limit is applied b
to show no results.
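
A hypothetical illustration of this pitfall, using `aws ec2 describe-instances`; since the items limit truncates the
result set *before* `--query` filters it, the first command may print an empty list even when running instances exist:

```sh
# May return [] when none of the first 5 items are running instances.
aws ec2 describe-instances --max-items 5 \
  --query 'Reservations[].Instances[?State.Name==`running`][]'

# Safer: let the query filter the full result set, then slice client-side.
aws ec2 describe-instances \
  --query 'Reservations[].Instances[?State.Name==`running`][] | [:5]'
```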

<details>
<summary>Setup</summary>

```sh
# Install the CLI.
brew install 'awscli'
docker pull 'amazon/aws-cli'
pip install 'awscli'

# Configure profiles.
@@ -53,6 +52,9 @@ rm -r ~'/.aws/cli/cache'
<summary>Usage</summary>

```sh
# Use the docker version.
docker run --rm -ti -v "$HOME/.aws:/root/.aws:ro" 'amazon/aws-cli:2.17.16' autoscaling describe-auto-scaling-groups

# List applications in CodeDeploy.
aws deploy list-applications

@@ -11,11 +11,14 @@
1. [Storage](#storage)
    1. [Use EBS as volumes](#use-ebs-as-volumes)
        1. [EBS CSI driver IAM role](#ebs-csi-driver-iam-role)
1. [Pod identity](#pod-identity)
1. [Autoscaling](#autoscaling)
    1. [Cluster autoscaler](#cluster-autoscaler)
1. [Troubleshooting](#troubleshooting)
    1. [Identify common issues](#identify-common-issues)
        1. [The worker nodes fail to join the cluster](#the-worker-nodes-fail-to-join-the-cluster)
1. [Further readings](#further-readings)
    1. [Sources](#sources)

## TL;DR

@@ -34,9 +37,9 @@ both the control plane and nodes.<br/>
Such a security group cannot be avoided nor customized in the cluster's definition (e.g. using IaC tools like [Pulumi] or
[Terraform]):

> ```txt
> error: aws:eks/cluster:Cluster resource 'cluster' has a problem: Value for unconfigurable attribute. Can't configure a
> value for "vpc_config.0.cluster_security_group_id": its value will be decided automatically based on the result of
> applying this configuration.
> ```

For some reason, giving resources a tag like `aws:eks:cluster-name=value` succeeds, but has no effect (it is not really
applied).
@@ -83,6 +86,7 @@ aws eks associate-access-policy --cluster-name 'DeepThought' \

# Connect to clusters.
aws eks update-kubeconfig --name 'DeepThought' && kubectl cluster-info
aws eks --region 'eu-west-1' update-kubeconfig --name 'oneForAll' --profile 'dev-user' && kubectl cluster-info


# Create EC2 node groups.
@@ -100,6 +104,10 @@ aws eks create-fargate-profile \
  --pod-execution-role-arn 'arn:aws:iam::000011112222:role/DeepThinkerFargate' \
  --subnets 'subnet-11112222333344445' 'subnet-66667777888899990' \
  --selectors 'namespace=string'


# Get addon names.
aws eks describe-addon-versions --query 'addons[].addonName'
```

</details>
@@ -685,17 +693,137 @@ Requirements:
1. ClusterRole, ClusterRoleBinding, and other RBAC components.
1. Snapshot controller's Deployment.

## Pod identity

Refer to [Learn how EKS Pod Identity grants pods access to AWS services].

Gives pods the ability to obtain and manage AWS credentials in a similar way to how EC2 instance profiles provide
credentials to instances.

Limitations:

- Pod Identity Agents are DaemonSets.<br/>
  This means they **cannot** run on Fargate hosts and **will** require EC2 nodes.
- Does **not** work with **Amazon-provided EKS add-ons** that need IAM credentials.<br/>
  Such controllers, drivers and plugins only support EKS Pod Identities when installed as **self-managed** add-ons
  instead.

Procedure:

1. Set up the Pod Identity Agent on clusters.

<details>
<summary>Requirements</summary>

- The **nodes**' service role **must** have permissions for the agent to execute `AssumeRoleForPodIdentity` actions in
  the EKS Auth API.

  Use the AWS-managed `AmazonEKSWorkerNodePolicy` policy.<br/>
  Alternatively, add a custom policy with the following:

  ```json
  {
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Action": [ "eks-auth:AssumeRoleForPodIdentity" ],
      "Resource": "*"
    }]
  }
  ```

  Limit this action using tags to restrict which roles can be assumed by pods that use the agent (see the sketch after
  this list).

- Nodes to **be able** to reach and download images from ECRs.<br/>
  Required, since the add-on's container image is available there.
- Nodes to **be able** to reach the EKS Auth API.<br/>
  Private clusters **will** require the `eks-auth` endpoint in PrivateLink.
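
A hedged sketch of the tag-based restriction mentioned in the first requirement, attached to the nodes' role via the
CLI; the role name, policy name, and the `eks-pod-identity=allowed` tag convention are all illustrative:

```sh
# Allow assuming only roles carrying the illustrative 'eks-pod-identity=allowed' tag.
aws iam put-role-policy --role-name 'NodeRole' --policy-name 'PodIdentityScopedAssume' \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Action": [ "eks-auth:AssumeRoleForPodIdentity" ],
      "Resource": "*",
      "Condition": { "StringEquals": { "aws:ResourceTag/eks-pod-identity": "allowed" } }
    }]
  }'
```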

</details>
<details>
<summary>CLI</summary>

```sh
aws eks create-addon --cluster-name 'cluster' --addon-name 'eks-pod-identity-agent'
aws eks create-addon --cluster-name 'cluster' --addon-name 'eks-pod-identity-agent' --resolve-conflicts 'OVERWRITE'
```
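
The add-on's state can be checked after creation; a hedged example, with 'cluster' as a placeholder name:

```sh
aws eks describe-addon --cluster-name 'cluster' --addon-name 'eks-pod-identity-agent' --query 'addon.status'
```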

</details>
<details style="margin-bottom: 1em">
<summary>Pulumi</summary>

```ts
new aws.eks.Addon("pod-identity", {
  clusterName: cluster.name,
  addonName: "eks-pod-identity-agent",
  resolveConflictsOnCreate: "OVERWRITE",
  resolveConflictsOnUpdate: "OVERWRITE",
});
```

</details>

1. Associate IAM roles with Kubernetes service accounts:

<details>
<summary>CLI</summary>

```sh
aws eks create-pod-identity-association \
  --cluster-name 'cluster' --namespace 'default' \
  --service-account 'default' --role-arn 'arn:aws:iam::012345678901:role/CustomRole'
```
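
Existing associations can be listed for verification; a hedged example:

```sh
aws eks list-pod-identity-associations --cluster-name 'cluster'
```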

</details>
<details style="margin-bottom: 1em">
<summary>Pulumi</summary>

```ts
new aws.eks.PodIdentityAssociation("customRole-to-defaultServiceAccount", {
  clusterName: cluster.name,
  roleArn: customRole.arn,
  serviceAccount: "default",
  namespace: "default",
});
```

</details>

The service account does not need to exist before the association.<br/>
As soon as it is created in the defined namespace, it will be able to assume the role.

1. Configure pods to use those service accounts, as in the sketch below.
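
For a quick smoke test, run a throwaway pod under the associated service account and check which identity it picks up;
the namespace, service account, and image tag mirror the examples above and are illustrative:

```sh
kubectl --namespace 'default' run --rm -it 'awscli' \
  --overrides '{"spec":{"serviceAccountName":"default"}}' \
  --image 'amazon/aws-cli:2.17.16' \
  -- sts get-caller-identity
```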

## Autoscaling

Autoscaling of EKS clusters can happen:

- _Horizontally_ (as in **number** of nodes) through the use of [Cluster Autoscaler].
- _Vertically_ (as in **size** of nodes) through the use of [Karpenter].

The pods running the autoscaling components **will need** the necessary permissions to operate on the cluster's
resources.<br/>
This means giving those pods access keys, or enabling [Pod Identity] (e.g., with an association like the one sketched
below).
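
A hedged sketch of such an association, assuming a 'cluster-autoscaler-aws' service account in 'kube-system' and a
pre-existing role granting the needed autoscaling permissions; all names are placeholders:

```sh
aws eks create-pod-identity-association \
  --cluster-name 'defaultCluster' --namespace 'kube-system' \
  --service-account 'cluster-autoscaler-aws' --role-arn 'arn:aws:iam::012345678901:role/ClusterAutoscaler'
```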

### Cluster autoscaler

Nothing more than the [Kubernetes' cluster autoscaler component].

After any operation, the cluster autoscaler will wait for the ASG's cooldown time to end.<br/>
Only then will it start counting down its own timers.
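
The cooldown in question can be checked ahead of time; a hedged example, with a placeholder group name:

```sh
aws autoscaling describe-auto-scaling-groups \
  --auto-scaling-group-names 'eks-nodegroup-asg' \
  --query 'AutoScalingGroups[].DefaultCooldown'
```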

## Troubleshooting

See [Amazon EKS troubleshooting].

### Identify common issues

Use the [AWSSupport-TroubleshootEKSWorkerNode runbook].

> For the automation to work, worker nodes **must** have permission to access Systems Manager and have Systems Manager
> running.<br/>
> Grant this permission by attaching the `AmazonSSMManagedInstanceCore` policy to the node role.<br/>
> See [Configure instance permissions required for Systems Manager].
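
A hedged sketch of launching the runbook from the CLI, assuming its `ClusterName` and `WorkerID` parameters; the
cluster name and instance ID are placeholders:

```sh
aws ssm start-automation-execution \
  --document-name 'AWSSupport-TroubleshootEKSWorkerNode' \
  --parameters 'ClusterName=DeepThought,WorkerID=i-01234567890abcdef'
```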

Procedure:

@@ -754,6 +882,9 @@ Debug: see [Identify common issues].
- [How to Add IAM User and IAM Role to AWS EKS Cluster?]
- [Amazon Elastic Block Store (EBS) CSI driver]
- [Manage the Amazon EBS CSI driver as an Amazon EKS add-on]
- [How do you get kubectl to log in to an AWS EKS cluster?]
- [Learn how EKS Pod Identity grants pods access to AWS services]
- [Configure instance permissions required for Systems Manager]

<!--
  Reference
@@ -762,16 +893,20 @@ Debug: see [Identify common issues].

<!-- In-article sections -->
[access management]: #access-management
[cluster autoscaler]: #cluster-autoscaler
[create worker nodes]: #create-worker-nodes
[ebs csi driver iam role]: #ebs-csi-driver-iam-role
[identify common issues]: #identify-common-issues
[pod identity]: #pod-identity
[requirements]: #requirements
[secrets encryption through kms]: #secrets-encryption-through-kms

<!-- Knowledge base -->
[amazon web services]: README.md
[cli]: cli.md
[kubernetes' cluster autoscaler component]: ../../kubernetes/cluster%20autoscaler.md
[ebs]: ebs.md
[karpenter]: ../../kubernetes/karpenter.placeholder
[kubernetes]: ../../kubernetes/README.md
[pulumi]: ../../pulumi.md
[terraform]: ../../terraform.md
@@ -790,7 +925,9 @@ Debug: see [Identify common issues].
[aws eks create-cluster]: https://docs.aws.amazon.com/cli/latest/reference/eks/create-cluster.html
[aws eks create-fargate-profile]: https://docs.aws.amazon.com/cli/latest/reference/eks/create-fargate-profile.html
[aws eks create-nodegroup]: https://docs.aws.amazon.com/cli/latest/reference/eks/create-nodegroup.html
[AWSSupport-TroubleshootEKSWorkerNode runbook]: https://docs.aws.amazon.com/systems-manager-automation-runbooks/latest/userguide/automation-awssupport-troubleshooteksworkernode.html
[choosing an amazon ec2 instance type]: https://docs.aws.amazon.com/eks/latest/userguide/choosing-instance-type.html
[configure instance permissions required for systems manager]: https://docs.aws.amazon.com/systems-manager/latest/userguide/setup-instance-profile.html#instance-profile-policies-overview
[de-mystifying cluster networking for amazon eks worker nodes]: https://aws.amazon.com/blogs/containers/de-mystifying-cluster-networking-for-amazon-eks-worker-nodes/
[eks workshop]: https://www.eksworkshop.com/
[enabling iam principal access to your cluster]: https://docs.aws.amazon.com/eks/latest/userguide/add-user-role.html
@@ -802,6 +939,7 @@ Debug: see [Identify common issues].
[how do i resolve the error "you must be logged in to the server (unauthorized)" when i connect to the amazon eks api server?]: https://repost.aws/knowledge-center/eks-api-server-unauthorized-error
[how do i use persistent storage in amazon eks?]: https://repost.aws/knowledge-center/eks-persistent-storage
[identity and access management]: https://aws.github.io/aws-eks-best-practices/security/docs/iam/
[learn how eks pod identity grants pods access to aws services]: https://docs.aws.amazon.com/eks/latest/userguide/pod-identities.html
[manage the amazon ebs csi driver as an amazon eks add-on]: https://docs.aws.amazon.com/eks/latest/userguide/managing-ebs-csi.html
[managed node groups]: https://docs.aws.amazon.com/eks/latest/userguide/managed-node-groups.html
[private cluster requirements]: https://docs.aws.amazon.com/eks/latest/userguide/private-clusters.html
@@ -817,5 +955,6 @@ Debug: see [Identify common issues].
<!-- Others -->
[amazon elastic block store (ebs) csi driver]: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/README.md
[external-snapshotter]: https://github.com/kubernetes-csi/external-snapshotter
[how do you get kubectl to log in to an aws eks cluster?]: https://stackoverflow.com/questions/53266960/how-do-you-get-kubectl-to-log-in-to-an-aws-eks-cluster
[how to add iam user and iam role to aws eks cluster?]: https://antonputra.com/kubernetes/add-iam-user-and-iam-role-to-eks/
[visualizing aws eks kubernetes clusters with relationship graphs]: https://dev.to/aws-builders/visualizing-aws-eks-kubernetes-clusters-with-relationship-graphs-46a4

knowledge base/kubernetes/cluster autoscaler.md (new file, 72 lines)
@@ -0,0 +1,72 @@
# Cluster autoscaler

Automatically adjusts the number of nodes in Kubernetes clusters.

1. [TL;DR](#tldr)
1. [Further readings](#further-readings)
1. [Sources](#sources)

## TL;DR

Acts when one of the following conditions is true:

- Pods failed to run in the cluster due to insufficient resources.
- Nodes in the cluster have been underutilized for an extended period of time, and their pods can be placed on other
  existing nodes.

<details>
<summary>Setup</summary>

```sh
helm repo add 'autoscaler' 'https://kubernetes.github.io/autoscaler'
helm show values 'autoscaler/cluster-autoscaler'

helm install 'cluster-autoscaler' 'autoscaler/cluster-autoscaler' --set 'autoDiscovery.clusterName'=clusterName
helm --namespace 'kube-system' upgrade --install 'cluster-autoscaler' 'autoscaler/cluster-autoscaler' \
  --set 'autoDiscovery.clusterName'=clusterName

helm uninstall 'cluster-autoscaler'
helm --namespace 'kube-system' uninstall 'cluster-autoscaler'
```

</details>

<!-- Uncomment if used
<details>
<summary>Usage</summary>

```sh
```

</details>
-->

<details>
<summary>Real world use cases</summary>

```sh
helm --namespace 'kube-system' upgrade --install 'cluster-autoscaler' 'autoscaler/cluster-autoscaler' \
  --set 'cloudProvider'='aws' --set 'awsRegion'='eu-west-1' \
  --set 'autoDiscovery.clusterName'='defaultCluster' --set 'rbac.serviceAccount.name'='cluster-autoscaler-aws'
```

</details>
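
To confirm the autoscaler picked up the node groups, tail its logs. The deployment name derives from the release name
and cloud provider, so the one below is an assumption matching the release above:

```sh
kubectl --namespace 'kube-system' logs --follow 'deployment/cluster-autoscaler-aws-cluster-autoscaler'
```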

## Further readings

- [Main repository]

### Sources

<!--
  Reference
  ═╬═Time══
-->

<!-- In-article sections -->
<!-- Knowledge base -->
<!-- Files -->
<!-- Upstream -->
[main repository]: https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler

<!-- Others -->
@@ -405,6 +405,13 @@ kubectl top node 'my-node'
# Forward local connections to cluster resources.
kubectl port-forward 'my-pod' '5000:6000'
kubectl -n 'default' port-forward 'service/my-service' '8443:https'

# Start pods and attach to them.
kubectl run --rm -it --image 'alpine' 'alpine' --command -- sh
kubectl run --rm -it --image 'amazon/aws-cli:2.17.16' 'awscli' -- autoscaling describe-auto-scaling-groups

# Attach to running pods.
kubectl attach 'alpine' -c 'alpine' -it
```

</details>
@@ -422,6 +429,12 @@ kubectl -n 'awx' port-forward 'service/awx-service' '8080:http'
# Delete leftover CRDs from helm charts, by release name.
kubectl delete crds -l "helm.sh/chart=awx-operator"

# Run pods with specific specs.
kubectl -n 'kube-system' run --rm -it 'awscli' --overrides '{"spec":{"serviceAccountName":"cluster-autoscaler-aws"}}' \
  --image '012345678901.dkr.ecr.eu-west-1.amazonaws.com/cache/amazon/aws-cli:2.17.16' \
  -- \
  autoscaling describe-auto-scaling-groups

# Show containers' status, properties, and capabilities from the inside.
# Run the command from *inside* the container.
cat '/proc/1/status'