chore(kb/eks): improve notes about autoscaling
@@ -4,6 +4,8 @@ Automatically adjusts the number of nodes in Kubernetes clusters to meet their c

1. [TL;DR](#tldr)
1. [Best practices](#best-practices)
1. [Troubleshooting](#troubleshooting)
    1. [Unschedulable pods do not trigger scale-up](#unschedulable-pods-do-not-trigger-scale-up)
1. [Further readings](#further-readings)
1. [Sources](#sources)

@@ -62,11 +64,31 @@ aws eks --region 'eu-west-1' update-kubeconfig --name 'custom-eks-cluster' \
## Best practices

- Do **not** modify nodes belonging to autoscaled node groups directly.
  Changes will soon be lost, as the modified nodes might be deleted at any time.
- All nodes within the same autoscaled node group should have the same capacity, labels and system pods running on them.
- Specify resource requests for all the pods one can, so that nodes can be scaled more reliably.
- Should one need to prevent pods from being deleted too abruptly, consider using PodDisruptionBudgets (see the sketch
  after this list).
- Check one's cloud provider's VM quota is big enough **before** specifying min/max settings for clusters' node pools.
- Ensure **any** additional node group autoscalers (**especially** those from one's own cloud provider) are **not**
  competing for resources.<br/>
  **Avoid** running multiple node autoscalers where possible.

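A minimal sketch of such a budget, assuming a hypothetical workload labelled `app=my-app`; adjust names and thresholds
to one's own workloads:

```sh
# Keep at least 1 replica of 'my-app' available during voluntary disruptions
# (e.g. nodes being drained during scale-down).
kubectl create poddisruptionbudget 'my-app' --selector 'app=my-app' --min-available '1'
```
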
## Troubleshooting

### Unschedulable pods do not trigger scale-up

#### Context <!-- omit in toc -->

As of 2025-01-06, at least with EKS, it frequently happens that unschedulable pods that would normally trigger a
scale-up stay unschedulable and cause the _pod didn't trigger scale-up_ event instead.

This primarily happens when the cluster's node groups are updated for any reason.

#### Solution <!-- omit in toc -->

Restarting the Cluster Autoscaler's pods has worked most of the time.

It seems to be some sort of caching issue.
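
A minimal way to do that, assuming the autoscaler is installed as the `cluster-autoscaler` deployment in the
`kube-system` namespace (adjust names to one's own installation):

```sh
# Recreate the Cluster Autoscaler's pods, discarding their internal state.
kubectl --namespace 'kube-system' rollout restart deployment 'cluster-autoscaler'

# Wait for the new pods to become ready.
kubectl --namespace 'kube-system' rollout status deployment 'cluster-autoscaler'
```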
## Further readings
99 knowledge base/kubernetes/karpenter.md Normal file
@@ -0,0 +1,99 @@
# Karpenter

Open-source, just-in-time cloud node provisioner for Kubernetes.

1. [TL;DR](#tldr)
1. [Setup](#setup)
1. [Further readings](#further-readings)
1. [Sources](#sources)

## TL;DR

Karpenter works by:

1. Watching for unschedulable pods.
1. Evaluating unschedulable pods' scheduling constraints (resource requests, node selectors, affinities, tolerations,
   and topology spread constraints).
1. Provisioning **cloud-based** nodes meeting the requirements of unschedulable pods.
1. Deleting nodes when no longer needed.

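A quick way to watch this loop in action, assuming Karpenter is already installed and a NodePool able to satisfy the
request exists; the `inflate` deployment and the pause image are placeholders mirroring the examples in Karpenter's
documentation:

```sh
# Create pods whose resource requests do not fit on the existing nodes.
kubectl create deployment 'inflate' --image 'public.ecr.aws/eks-distro/kubernetes/pause:3.7' --replicas '0'
kubectl set resources deployment 'inflate' --requests 'cpu=1'
kubectl scale deployment 'inflate' --replicas '5'

# Watch Karpenter create NodeClaims (and, shortly after, nodes) for them.
kubectl get nodeclaims --watch
```
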
Karpenter runs as a workload on the cluster.

Should one manually delete a Karpenter-provisioned node, Karpenter will gracefully cordon, drain, and shut down the
corresponding instance.<br/>
Under the hood, Karpenter adds a finalizer to the node objects it provisions. This blocks deletion until all pods are
drained and the instance is terminated. This **only** works for nodes provisioned by Karpenter.
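
One can check this on any Karpenter-provisioned node; the node name below is a placeholder, and the termination
finalizer is usually named `karpenter.sh/termination`:

```sh
# List the finalizers set on the node object.
kubectl get node 'my-karpenter-node' --output 'jsonpath={.metadata.finalizers}'
```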
<details>
<summary>Setup</summary>

```sh
# Managed NodeGroups
helm --namespace 'kube-system' upgrade --create-namespace \
  --install 'karpenter' 'oci://public.ecr.aws/karpenter/karpenter' --version '1.1.1' \
  --set 'settings.clusterName=myCluster' \
  --set 'settings.interruptionQueue=myCluster' \
  --set 'controller.resources.requests.cpu=1' \
  --set 'controller.resources.requests.memory=1Gi' \
  --set 'controller.resources.limits.cpu=1' \
  --set 'controller.resources.limits.memory=1Gi' \
  --wait

# Fargate
# As per the managed NodeGroups, but with a serviceAccount annotation
helm … \
  --set 'serviceAccount.annotations."eks.amazonaws.com/role-arn"=arn:aws:iam::012345678901:role/myCluster-karpenter'
```

</details>
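
One can then check the controller came up fine. The selector below assumes the chart's default labels; adjust it if
customized:

```sh
kubectl --namespace 'kube-system' get pods --selector 'app.kubernetes.io/name=karpenter'
kubectl --namespace 'kube-system' logs --selector 'app.kubernetes.io/name=karpenter' --tail '50'
```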
<!-- Uncomment if used
<details>
<summary>Usage</summary>

```sh
```

</details>
-->

<!-- Uncomment if used
<details>
<summary>Real world use cases</summary>

```sh
```

</details>
-->

## Setup

Karpenter's controller and webhook deployment are designed to run as a workload on the cluster.

As of 2024-12-24, it only supports AWS and Azure nodes.<br/>
As part of the installation process, one **will** need credentials from the underlying cloud provider to allow
Karpenter-managed nodes to be started up and added to the cluster as needed.
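
On EKS, one way to provide those credentials is IAM Roles for Service Accounts. A hedged sketch follows: the cluster
name, account ID and policy are placeholders, and the policy must already grant the permissions listed in Karpenter's
documentation.

```sh
# Create an IAM role for Karpenter's service account via IRSA.
# 'KarpenterControllerPolicy-myCluster' is assumed to already exist.
eksctl create iamserviceaccount \
  --cluster 'myCluster' --namespace 'kube-system' --name 'karpenter' \
  --role-name 'myCluster-karpenter' \
  --attach-policy-arn 'arn:aws:iam::012345678901:policy/KarpenterControllerPolicy-myCluster' \
  --role-only --approve
```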
## Further readings

- [Website]
- [Codebase]
- [Documentation]

### Sources

<!--
Reference
═╬═Time══
-->

<!-- In-article sections -->
<!-- Knowledge base -->
<!-- Files -->
<!-- Upstream -->
[codebase]: https://github.com/aws/karpenter-provider-aws
[documentation]: https://karpenter.sh/docs/
[website]: https://karpenter.sh/

<!-- Others -->

@@ -1 +0,0 @@
https://karpenter.sh/