diff --git a/knowledge base/kubernetes/cluster autoscaler.md b/knowledge base/kubernetes/cluster autoscaler.md
index 70600df..47baaf1 100644
--- a/knowledge base/kubernetes/cluster autoscaler.md
+++ b/knowledge base/kubernetes/cluster autoscaler.md
@@ -4,6 +4,8 @@ Automatically adjusts the number of nodes in Kubernetes clusters to meet their c
 1. [TL;DR](#tldr)
 1. [Best practices](#best-practices)
+1. [Troubleshooting](#troubleshooting)
+   1. [Unschedulable pods do not trigger scale-up](#unschedulable-pods-do-not-trigger-scale-up)
 1. [Further readings](#further-readings)
 1. [Sources](#sources)
@@ -62,11 +64,31 @@ aws eks --region 'eu-west-1' update-kubeconfig --name 'custom-eks-cluster' \
 ## Best practices

 - Do **not** modify nodes belonging to autoscaled node groups directly.
+  Changes will soon be lost, as the modified nodes might be deleted at any time.
 - All nodes within the same autoscaled node group should have the same capacity, labels and system pods running on them.
-- Specify requests for all the pods one can.
+- Specify resource requests for all the pods one can, so that nodes can be scaled more reliably.
+  See the sketch after this list for an example.
 - Should one need to prevent pods from being deleted too abruptly, consider using PodDisruptionBudgets.
-- Check one's cloud provider's quota is big enough **before** specifying min/max settings for clusters' node pools.
-- Do **not** run **any** additional node group autoscaler (**especially** those from one's own cloud provider).
+- Check one's cloud provider's VM quota is big enough **before** specifying min/max settings for clusters' node pools.
+- Ensure **any** additional node group autoscalers (**especially** those from one's own cloud provider) are **not**
+  competing for resources.
+- **Avoid** running multiple node autoscalers whenever possible.
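+
+A minimal sketch of the resource requests and PodDisruptionBudget practices above; the namespace, deployment name,
+label selector, and values are illustrative assumptions:
+
+```sh
+# Give every container explicit resource requests, e.g. by patching an existing Deployment.
+kubectl --namespace 'my-namespace' set resources deployment 'my-app' \
+  --containers '*' --requests 'cpu=250m,memory=256Mi'
+
+# Avoid evicting too many replicas at once when nodes are drained during scale-down.
+kubectl --namespace 'my-namespace' create poddisruptionbudget 'my-app' \
+  --selector 'app=my-app' --min-available '1'
+```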
+
+## Troubleshooting
+
+### Unschedulable pods do not trigger scale-up
+
+#### Context
+
+As of 2025-01-06, at least with EKS, it easily happens that unschedulable pods that would normally trigger a scale-up
+stay unschedulable and cause the _pod didn't trigger scale-up_ event instead.
+
+This primarily happens when the cluster's node groups are updated for any reason.
+
+#### Solution
+
+Restarting the Cluster Autoscaler's pods (e.g. by deleting them and letting their Deployment recreate them) worked
+most of the time.
+
+It seems to be caused by some sort of stale cache in the Cluster Autoscaler.

 ## Further readings

diff --git a/knowledge base/kubernetes/karpenter.md b/knowledge base/kubernetes/karpenter.md
new file mode 100644
index 0000000..71f21f0
--- /dev/null
+++ b/knowledge base/kubernetes/karpenter.md
@@ -0,0 +1,99 @@
+# Karpenter
+
+Open-source, just-in-time cloud node provisioner for Kubernetes.
+
+1. [TL;DR](#tldr)
+1. [Setup](#setup)
+1. [Further readings](#further-readings)
+   1. [Sources](#sources)
+
+## TL;DR
+
+Karpenter works by:
+
+1. Watching for unschedulable pods.
+1. Evaluating unschedulable pods' scheduling constraints (resource requests, node selectors, affinities, tolerations,
+   and topology spread constraints).
+1. Provisioning **cloud-based** nodes meeting the requirements of unschedulable pods.
+1. Deleting nodes when no longer needed.
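+
+As an illustration of the constraints listed above, a pending pod like the following sketch would make Karpenter
+provision a matching node when none is available. Names, labels, values, and the taint key are assumptions made up
+for the example:
+
+```sh
+# Hypothetical pod whose requests, node selector, and toleration Karpenter would evaluate.
+kubectl apply -f - <<'EOF'
+apiVersion: v1
+kind: Pod
+metadata:
+  name: constraint-demo
+spec:
+  nodeSelector:
+    kubernetes.io/arch: arm64
+  tolerations:
+    # Made-up taint key, only here to show that tolerations are taken into account.
+    - key: example.com/dedicated
+      operator: Exists
+      effect: NoSchedule
+  containers:
+    - name: app
+      image: busybox:stable
+      command: ['sleep', '3600']
+      resources:
+        requests:
+          cpu: 500m
+          memory: 512Mi
+EOF
+```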
+
+Karpenter runs as a workload on the cluster.
+
+Should one manually delete a Karpenter-provisioned node, Karpenter will gracefully cordon, drain, and shut down the
+corresponding instance.
+
+Under the hood, Karpenter adds a finalizer to the node objects it provisions. This blocks deletion until all pods are
+drained and the instance is terminated. This **only** works for nodes provisioned by Karpenter.
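+
+For example (the node name below is an illustrative placeholder), the finalizer is visible on the node object, and
+deleting the node through the Kubernetes API lets Karpenter run its termination flow:
+
+```sh
+# Show the finalizers Karpenter set on one of its nodes.
+kubectl get node 'ip-10-0-1-23.eu-west-1.compute.internal' -o jsonpath='{.metadata.finalizers}'
+
+# Deleting the node object lets Karpenter cordon and drain it, then terminate the backing instance.
+kubectl delete node 'ip-10-0-1-23.eu-west-1.compute.internal'
+```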
+
+<details>
+  <summary>Setup</summary>
+
+```sh
+# Managed NodeGroups
+helm --namespace 'kube-system' upgrade --create-namespace \
+  --install 'karpenter' 'oci://public.ecr.aws/karpenter/karpenter' --version '1.1.1' \
+  --set 'settings.clusterName=myCluster' \
+  --set 'settings.interruptionQueue=myCluster' \
+  --set 'controller.resources.requests.cpu=1' \
+  --set 'controller.resources.requests.memory=1Gi' \
+  --set 'controller.resources.limits.cpu=1' \
+  --set 'controller.resources.limits.memory=1Gi' \
+  --wait
+
+# Fargate
+# As per the managed NodeGroups, but with a serviceAccount annotation
+helm … \
+  --set 'serviceAccount.annotations."eks.amazonaws.com/role-arn"=arn:aws:iam::012345678901:role/myCluster-karpenter'
+```
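+
+Assuming the chart applies the usual 'app.kubernetes.io/name=karpenter' label, one can then verify the controller is
+up and follow its logs:
+
+```sh
+kubectl --namespace 'kube-system' get pods -l 'app.kubernetes.io/name=karpenter'
+kubectl --namespace 'kube-system' logs -f -l 'app.kubernetes.io/name=karpenter'
+```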
+
+</details>
+
+## Setup
+
+Karpenter's controller and webhook deployments are designed to run as workloads on the cluster.
+
+As of 2024-12-24, it only supports AWS and Azure nodes.
+
+As part of the installation process, one **will** need credentials from the underlying cloud provider to allow
+Karpenter-managed nodes to be started up and added to the cluster as needed.
+
+## Further readings
+
+- [Website]
+- [Codebase]
+- [Documentation]
+
+### Sources
+
+[codebase]: https://github.com/aws/karpenter-provider-aws
+[documentation]: https://karpenter.sh/docs/
+[website]: https://karpenter.sh/
diff --git a/knowledge base/kubernetes/karpenter.placeholder b/knowledge base/kubernetes/karpenter.placeholder
deleted file mode 100644
index 7f0b2cf..0000000
--- a/knowledge base/kubernetes/karpenter.placeholder
+++ /dev/null
@@ -1 +0,0 @@
-https://karpenter.sh/