chore(kb/eks): improve notes about autoscaling
@@ -4,6 +4,8 @@ Automatically adjusts the number of nodes in Kubernetes clusters to meet their c

1. [TL;DR](#tldr)
1. [Best practices](#best-practices)
1. [Troubleshooting](#troubleshooting)
    1. [Unschedulable pods do not trigger scale-up](#unschedulable-pods-do-not-trigger-scale-up)
1. [Further readings](#further-readings)
1. [Sources](#sources)

@@ -62,11 +64,31 @@ aws eks --region 'eu-west-1' update-kubeconfig --name 'custom-eks-cluster' \
## Best practices

- Do **not** modify nodes belonging to autoscaled node groups directly.
  Changes will soon be lost, as the modified nodes might be deleted at any time.
- All nodes within the same autoscaled node group should have the same capacity, labels and system pods running on them.
- Specify resource requests for all the pods one can, so that nodes can be scaled more reliably.
- Should one need to prevent pods from being deleted too abruptly, consider using PodDisruptionBudgets (see the sketch
  after this list).
- Check one's cloud provider's VM quota is big enough **before** specifying min/max settings for clusters' node pools.
- Ensure **any** additional node group autoscalers (**especially** those from one's own cloud provider) are **not**
  competing for resources.<br/>
  **Avoid** running multiple node autoscalers where possible.

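A minimal sketch of such a budget, assuming a hypothetical workload labelled `app=my-app`; adjust names and thresholds
to one's own workloads:

```sh
# Keep at least 1 replica of 'my-app' available during voluntary disruptions
# (e.g. nodes being drained during scale-down).
kubectl create poddisruptionbudget 'my-app' --selector 'app=my-app' --min-available '1'
```
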
## Troubleshooting

### Unschedulable pods do not trigger scale-up

#### Context <!-- omit in toc -->

As of 2025-01-06, at least with EKS, it frequently happens that unschedulable pods that would normally trigger a
scale-up stay unschedulable and cause the _pod didn't trigger scale-up_ event instead.

This primarily happens when the cluster's node groups are updated for any reason.

#### Solution <!-- omit in toc -->

Restarting the Cluster Autoscaler's pods has worked most of the time.

It seems to be some sort of caching issue.
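
A minimal way to do that, assuming the autoscaler is installed as the `cluster-autoscaler` deployment in the
`kube-system` namespace (adjust names to one's own installation):

```sh
# Recreate the Cluster Autoscaler's pods, discarding their internal state.
kubectl --namespace 'kube-system' rollout restart deployment 'cluster-autoscaler'

# Wait for the new pods to become ready.
kubectl --namespace 'kube-system' rollout status deployment 'cluster-autoscaler'
```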
## Further readings
99 knowledge base/kubernetes/karpenter.md Normal file
@@ -0,0 +1,99 @@
# Karpenter

Open-source, just-in-time cloud node provisioner for Kubernetes.

1. [TL;DR](#tldr)
1. [Setup](#setup)
1. [Further readings](#further-readings)
1. [Sources](#sources)

## TL;DR

Karpenter works by:

1. Watching for unschedulable pods.
1. Evaluating unschedulable pods' scheduling constraints (resource requests, node selectors, affinities, tolerations,
   and topology spread constraints).
1. Provisioning **cloud-based** nodes meeting the requirements of unschedulable pods.
1. Deleting nodes when no longer needed.

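A quick way to watch this loop in action, assuming Karpenter is already installed and a NodePool able to satisfy the
request exists; the `inflate` deployment and the pause image are placeholders mirroring the examples in Karpenter's
documentation:

```sh
# Create pods whose resource requests do not fit on the existing nodes.
kubectl create deployment 'inflate' --image 'public.ecr.aws/eks-distro/kubernetes/pause:3.7' --replicas '0'
kubectl set resources deployment 'inflate' --requests 'cpu=1'
kubectl scale deployment 'inflate' --replicas '5'

# Watch Karpenter create NodeClaims (and, shortly after, nodes) for them.
kubectl get nodeclaims --watch
```
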
Karpenter runs as a workload on the cluster.

Should one manually delete a Karpenter-provisioned node, Karpenter will gracefully cordon, drain, and shut down the
corresponding instance.<br/>
Under the hood, Karpenter adds a finalizer to the node objects it provisions. This blocks deletion until all pods are
drained and the instance is terminated. This **only** works for nodes provisioned by Karpenter.
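
One can check this on any Karpenter-provisioned node; the node name below is a placeholder, and the termination
finalizer is usually named `karpenter.sh/termination`:

```sh
# List the finalizers set on the node object.
kubectl get node 'my-karpenter-node' --output 'jsonpath={.metadata.finalizers}'
```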
<details>
<summary>Setup</summary>

```sh
# Managed NodeGroups
helm --namespace 'kube-system' upgrade --create-namespace \
  --install 'karpenter' 'oci://public.ecr.aws/karpenter/karpenter' --version '1.1.1' \
  --set 'settings.clusterName=myCluster' \
  --set 'settings.interruptionQueue=myCluster' \
  --set 'controller.resources.requests.cpu=1' \
  --set 'controller.resources.requests.memory=1Gi' \
  --set 'controller.resources.limits.cpu=1' \
  --set 'controller.resources.limits.memory=1Gi' \
  --wait

# Fargate
# As per the managed NodeGroups, but with a serviceAccount annotation
helm … \
  --set 'serviceAccount.annotations."eks.amazonaws.com/role-arn"=arn:aws:iam::012345678901:role/myCluster-karpenter'
```

</details>
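
One can then check the controller came up fine. The selector below assumes the chart's default labels; adjust it if
customized:

```sh
kubectl --namespace 'kube-system' get pods --selector 'app.kubernetes.io/name=karpenter'
kubectl --namespace 'kube-system' logs --selector 'app.kubernetes.io/name=karpenter' --tail '50'
```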
<!-- Uncomment if used
<details>
<summary>Usage</summary>

```sh
```

</details>
-->

<!-- Uncomment if used
<details>
<summary>Real world use cases</summary>

```sh
```

</details>
-->

## Setup

Karpenter's controller and webhook deployment are designed to run as a workload on the cluster.

As of 2024-12-24, it only supports AWS and Azure nodes.<br/>
As part of the installation process, one **will** need credentials from the underlying cloud provider to allow
Karpenter-managed nodes to be started up and added to the cluster as needed.
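
On EKS, one way to provide those credentials is IAM Roles for Service Accounts. A hedged sketch follows: the cluster
name, account ID and policy are placeholders, and the policy must already grant the permissions listed in Karpenter's
documentation.

```sh
# Create an IAM role for Karpenter's service account via IRSA.
# 'KarpenterControllerPolicy-myCluster' is assumed to already exist.
eksctl create iamserviceaccount \
  --cluster 'myCluster' --namespace 'kube-system' --name 'karpenter' \
  --role-name 'myCluster-karpenter' \
  --attach-policy-arn 'arn:aws:iam::012345678901:policy/KarpenterControllerPolicy-myCluster' \
  --role-only --approve
```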
## Further readings

- [Website]
- [Codebase]
- [Documentation]

### Sources

<!--
Reference
═╬═Time══
-->

<!-- In-article sections -->
<!-- Knowledge base -->
<!-- Files -->
<!-- Upstream -->
[codebase]: https://github.com/aws/karpenter-provider-aws
[documentation]: https://karpenter.sh/docs/
[website]: https://karpenter.sh/

<!-- Others -->

@@ -1 +0,0 @@
https://karpenter.sh/