Michele Cereda 89c38e050e fix: links
2024-10-29 23:30:39 +01:00

Kubernetes

Open source container orchestration engine for containerized applications.
Hosted by the Cloud Native Computing Foundation.

  1. Concepts
    1. Control plane
      1. API server
      2. kube-scheduler
      3. kube-controller-manager
      4. cloud-controller-manager
    2. Worker nodes
      1. kubelet
      2. kube-proxy
      3. Container runtime
      4. Addons
    3. Workloads
      1. Pods
  2. Best practices
  3. Volumes
    1. hostPaths
    2. emptyDirs
    3. configMaps
    4. secrets
    5. nfs
    6. downwardAPI
    7. PersistentVolumes
      1. Resize PersistentVolumes
  4. Autoscaling
    1. Pod scaling
    2. Node scaling
  5. Quality of service
  6. Containers with high privileges
    1. Capabilities
    2. Privileged container vs privilege escalation
  7. Sysctl settings
  8. Backup and restore
  9. Managed Kubernetes Services
    1. Best practices in cloud environments
  10. Edge computing
  11. Troubleshooting
    1. Dedicate Nodes to specific workloads
    2. Recreate Pods upon ConfigMap's or Secret's content change
    3. Run a command in a Pod right after its initialization
    4. Run a command just before a Pod stops
  12. Examples
    1. Create an admission webhook
  13. Further readings
    1. Sources

Concepts

When using Kubernetes, one is using a cluster.

Kubernetes clusters consist of one or more hosts (nodes) executing containerized applications.
In cloud environments, nodes are also available in grouped sets (node pools) capable of automatic scaling.

Nodes host application workloads in the form of pods.

The control plane manages the nodes and the pods in the cluster. It is itself a set of pods which expose the APIs and interfaces used to define, deploy, and manage the lifecycle of the cluster's resources.
In production-grade environments, the control plane usually runs across multiple dedicated nodes to provide improved fault tolerance and high availability.

Cluster components

Control plane

Makes global decisions about the cluster (like scheduling).
Detects and responds to cluster events (like starting up a new pod when a deployment has fewer replicas than it requests).

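The control plane is composed of the API server, the kube-scheduler, the kube-controller-manager and the cloud-controller-manager (all described below), plus etcd, the key-value store backing the cluster's data.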

Control plane components run on one or more cluster nodes.
For ease of use, setup scripts typically start all control plane components on the same host and avoid running other workloads on it.

API server

Exposes the Kubernetes API. It is the front end for, and the core of, the Kubernetes control plane.
kube-apiserver is the main implementation of the Kubernetes API server, and is designed to scale horizontally (by deploying more instances) and balance traffic between its instances.

The API server exposes the HTTP API that lets end users, different parts of a cluster and external components communicate with one another, or query and manipulate the state of API objects in Kubernetes.
Can be accessed through command-line tools or directly using REST calls.
The serialized state of the objects is stored by writing them into etcd's store.

Consider using one of the available client libraries when writing an application that uses the Kubernetes API.
The complete API details are documented using OpenAPI.

Kubernetes supports multiple API versions, each at a different API path (e.g.: /api/v1, /apis/rbac.authorization.k8s.io/v1alpha1).
All the different versions are representations of the same persisted data.
The server handles the conversion between API versions transparently.

Versioning is done at the API level, rather than at the resource or field level, to ensure the API presents a clear and consistent view of system resources and behavior.
Also enables controlling access to end-of-life and/or experimental APIs.

API groups can be enabled or disabled.
API resources are distinguished by their API group, resource type, namespace (for namespaced resources), and name.
New API resources and new resource fields can be added frequently.
Elimination of resources or fields requires following the API deprecation policy.

The Kubernetes API can be extended:

  • using custom resources to declaratively define how the API server should provide your chosen resource API, or
  • extending the Kubernetes API by implementing an aggregation layer.
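
As a minimal sketch of the first option, a CustomResourceDefinition could look like the following. The group `example.com`, the kind `Widget` and the `size` field are hypothetical names chosen only for illustration.

```yaml
# Hypothetical custom resource 'Widget' in the made-up API group 'example.com'.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  # The name must be '<plural>.<group>'.
  name: widgets.example.com
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: widgets
    singular: widget
    kind: Widget
  versions:
    - name: v1alpha1
      served: true    # exposed by the API server
      storage: true   # exactly one version is marked as the storage version
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                size:
                  type: integer
```

Once such a definition is applied, the new resources would be served under the API path `/apis/example.com/v1alpha1`, following the same group and version scheme as the built-in APIs.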

kube-scheduler

Detects newly created pods with no assigned node, and selects one for them to run on.

Scheduling decisions take into account:

  • individual and collective resource requirements;
  • hardware/software/policy constraints;
  • affinity and anti-affinity specifications;
  • data locality;
  • inter-workload interference;
  • deadlines.
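
Two of the inputs listed above, resource requirements and anti-affinity, are declared directly in the Pod spec. The sketch below assumes a hypothetical Pod named `scheduling-demo` and uses the `registry.k8s.io/pause` image only as a placeholder workload.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: scheduling-demo          # hypothetical name
  labels:
    app: scheduling-demo
spec:
  affinity:
    podAntiAffinity:
      # Never place two Pods carrying this label on the same node.
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: scheduling-demo
          topologyKey: kubernetes.io/hostname
  containers:
    - name: app
      image: registry.k8s.io/pause:3.9   # placeholder image
      resources:
        # The scheduler only considers nodes with this much unreserved capacity.
        requests:
          cpu: 250m
          memory: 64Mi
```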

kube-controller-manager

Runs controller processes.
Each controller is a separate process logically speaking; they are all compiled into a single binary and run in a single process to reduce complexity.

Examples of these controllers are:

  • the node controller, which notices and responds when nodes go down;
  • the replication controller, which maintains the correct number of pods for every replication controller object in the system;
  • the job controller, which checks one-off tasks (job) objects and creates pods to run them to completion;
  • the EndpointSlice controller, which populates EndpointSlice objects providing a link between services and pods;
  • the ServiceAccount controller, which creates default ServiceAccounts for new namespaces.

cloud-controller-manager

Embeds cloud-specific control logic, linking clusters to one's cloud provider's API and separating the components that interact with that cloud platform from the components that only interact with clusters.

Clusters only run controllers that are specific to one's cloud provider.
If running Kubernetes on one's own premises, or in a learning environment inside one's own PC, the cluster will have no cloud controller managers.

As with the kube-controller-manager, cloud controller managers combine several logically independent control loops into single binaries run as single processes.
They can scale horizontally to improve performance or to help tolerate failures.

The following controllers can have cloud provider dependencies:

  • the node controller, which checks the cloud provider to determine if a node has been deleted in the cloud after it stops responding;
  • the route controller, which sets up routes in the underlying cloud infrastructure;
  • the service controller, which creates, updates and deletes cloud provider load balancers.

Worker nodes

Each and every node runs components providing a runtime environment for the cluster, and syncing with the control plane to maintain workloads running as requested.

kubelet

A kubelet runs as an agent on each and every node in the cluster, making sure that containers are run in a pod.

It takes a set of PodSpecs and ensures that the containers described in them are running and healthy.
It only manages containers created by Kubernetes.

kube-proxy

Network proxy running on each node and implementing part of the Kubernetes Service concept.

It maintains all the network rules on nodes which allow network communication to the Pods from network sessions inside or outside of one's cluster.

It uses the operating system's packet filtering layer, if there is one and it's available; if not, it just forwards the traffic itself.

Container runtime

The software responsible for running containers.

Kubernetes supports container runtimes like containerd, CRI-O, and any other implementation of the Kubernetes CRI (Container Runtime Interface).

Addons

Addons use Kubernetes resources (DaemonSet, Deployment, etc) to implement cluster features.
As such, namespaced resources for addons belong within the kube-system namespace.

See addons for an extended list of the available addons.

Workloads

Workloads consist of groups of containers (pods) and a specification for how to run them (manifest).
Configuration files are written in YAML (preferred) or JSON format and are composed of:

  • metadata,
  • resource specifications, with attributes specific to the kind of resource they are describing, and
  • status, automatically generated and edited by the control plane.

Pods

The smallest deployable unit of computing that one can create and manage in Kubernetes.
Pods contain one or more relatively tightly coupled application containers; they are always co-located (executed on the same host) and co-scheduled (executed together), and share context, storage and network resources, and a specification for how to run them.

Pods are (and should be) usually created through other workload resources (like Deployments, StatefulSets, or Jobs) and not directly.
Such parent resources leverage and manage ReplicaSets, which in turn manage copies of the same pod. When deleted, all the resources they manage are deleted with them.

Gotchas:

  • If a Container specifies a memory or CPU limit but does not specify a memory or CPU request, Kubernetes automatically assigns it a resource request spec equal to the given limit.
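
A minimal sketch of the gotcha above, assuming a hypothetical Pod named `limits-only-demo` with a placeholder image:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: limits-only-demo         # hypothetical name
spec:
  containers:
    - name: app
      image: registry.k8s.io/pause:3.9   # placeholder image
      resources:
        limits:
          cpu: 500m
          memory: 128Mi
        # No 'requests' are declared here, so Kubernetes sets them
        # equal to the limits above when the Pod is created.
```

Declare explicit requests when the container should reserve less than its limit.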

Best practices

Also see configuration best practices and the production best practices checklist.

  • Prefer an updated version of Kubernetes.
    The upstream project maintains release branches for the most recent three minor releases.
    Kubernetes 1.19 and newer receive approximately 1 year of patch support. Kubernetes 1.18 and older received approximately 9 months of patch support.
  • Prefer stable versions of Kubernetes and multiple nodes for production clusters.
  • Prefer consistent versions of Kubernetes components throughout all nodes.
    Components support version skew up to a point, with specific tools placing additional restrictions.
  • Consider keeping separation of ownership and control and/or grouping related resources.
    Leverage Namespaces.
  • Consider organizing cluster and workload resources.
    Leverage Labels; see recommended Labels.
  • Avoid sending traffic to pods which are not ready to manage it.
    Readiness probes signal services to not forward requests until the probe verifies its own pod is up.
    Liveness probes check a container's health periodically; if the check fails, the kubelet restarts that container.
  • Avoid workloads and nodes failing due to limited resources being available.
    Set resource requests and limits to reserve a minimum amount of resources for pods and limit their hogging abilities.
  • Prefer smaller container images.
  • Prioritize critical workloads.
    Leverage quality of service.
  • Instrument applications to detect and respond to the SIGTERM signal.
  • Avoid using bare pods.
    Prefer defining them as part of a replica-based resource, like Deployments, StatefulSets, ReplicaSets or DaemonSets.
  • Leverage autoscalers.
  • Try to avoid workload disruption.
    Leverage pod disruption budgets.
  • Try to use all available nodes.
    Leverage affinities, taints and tolerations.
  • Push for automation.
    GitOps.
  • Apply the principle of least privilege.
    Reduce container privileges where possible.
    Leverage Role-based access control (RBAC).
  • Restrict traffic between objects in the cluster.
    See network policies.
  • Audit events and logs regularly, also for control plane components.
  • Keep an eye on connection tables.
    Especially relevant when using connection tracking.
  • Protect the cluster's ingress points.
    Firewalls, web application firewalls, application gateways.
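
Several of the practices above (no bare pods, readiness and liveness probes, resource requests and limits, pod disruption budgets) can be combined in a couple of manifests. This is only a sketch: the `web` name, the labels, the `nginx:stable` image and all the numbers are assumptions to adapt to the actual workload.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                      # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app.kubernetes.io/name: web
  template:
    metadata:
      labels:
        app.kubernetes.io/name: web
    spec:
      containers:
        - name: web
          image: nginx:stable    # placeholder image
          ports:
            - containerPort: 80
          # Keep the Pod out of Service endpoints until it answers.
          readinessProbe:
            httpGet:
              path: /
              port: 80
            periodSeconds: 5
          # Restart the container if it stops answering.
          livenessProbe:
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 10
            periodSeconds: 10
          resources:
            requests:
              cpu: 100m
              memory: 64Mi
            limits:
              memory: 128Mi
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web
spec:
  # Voluntary disruptions (like node drains) must leave at least 2 Pods up.
  minAvailable: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: web
```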

Volumes

Refer to volumes.

Sources to mount directories from.

They go by the volumes key in Pods' spec.
E.g., in a Deployment they are declared in its spec.template.spec.volumes:

apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      volumes:
        - <volume source 1>
        - <volume source N>

Mount volumes in containers by using the volumeMounts key:

apiVersion: v1
kind: Pod
spec:
  containers:
    - name: some-container
      volumeMounts:
        - name: my-volume-source
          mountPath: /path/to/mount
          readOnly: false
          subPath: dir/in/volume

hostPaths

Mount files or directories from the host node's filesystem into Pods.

Not something most Pods will need, but powerful escape hatches for some applications.

Use cases:

  • Containers needing access to node-level system components
    E.g., containers transferring system logs to a central location and needing access to those logs using a read-only mount of /var/log.
  • Making configuration files stored on the host system available read-only to static Pods. This because static Pods cannot access ConfigMaps.

If mounted files or directories on the host are only accessible to root:

  • Either the process needs to run as root in a privileged container,
  • Or the files' permissions on the host need to be changed to allow the process to read from (or write to) the volume.

apiVersion: v1
kind: Pod
spec:
  volumes:
    - name: example-volume
      # Mount '/data/foo' only if that directory already exists
      hostPath:
        path: /data/foo  # location on host
        type: Directory  # optional

emptyDirs

Scratch disks for temporary Pod data.

Not shared between Pods.
All data is destroyed once the Pod is removed, but stays intact when the Pod's containers restart.

Use cases:

  • Provide directories to create pid/lock or other special files for 3rd-party software when it's inconvenient or impossible to disable them.
    E.g., Java Hazelcast creates lockfiles in the user's home directory and there's no way to disable this behaviour.
  • Store intermediate calculations which can be lost
    E.g., external sorting, buffering of big responses to save memory.
  • Improve startup time after application crashes if the application in question pre-computes something before or during startup.
    E.g., decompressing assets that are shipped compressed in the application's image into a temporary directory.

apiVersion: v1
kind: Pod
spec:
  volumes:
    - name: my-empty-dir
      emptyDir:
        # Omit the 'medium' field to use disk storage.
        # The 'Memory' medium will create tmpfs to store data.
        medium: Memory
        sizeLimit: 1Gi

configMaps

Inject configuration data into Pods.

When referencing a ConfigMap:

  • Provide the name of the ConfigMap in the volume.
  • Optionally customize the path to use for a specific entry in the ConfigMap.

apiVersion: v1
kind: Pod
spec:
  containers:
    - name: test
      volumeMounts:
        - name: config-vol
          mountPath: /etc/config
  volumes:
    - name: config-vol
      configMap:
        name: log-config
        items:
          - key: log_level
            path: log_level
    - name: my-configmap-volume
      configMap:
        name: my-configmap
        defaultMode: 0644  # posix access mode, set it to the most restricted value
        optional: true     # allow pods to start with this configmap missing, resulting in an empty directory

ConfigMaps must be created before they can be mounted.

One ConfigMap can be mounted into any number of Pods.

ConfigMaps are always mounted readOnly.

Containers using ConfigMaps as subPath volume mounts will not receive ConfigMap updates.

Text data is exposed as files using the UTF-8 character encoding.
Use binaryData for any other character encoding.
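
For reference, a ConfigMap matching the `log-config` volume used above could be sketched as follows. The name and the `log_level` key come from that example; the values and the `blob.bin` entry are made up for illustration.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: log-config
data:
  # UTF-8 text; exposed as the file 'log_level' when mounted.
  log_level: debug
binaryData:
  # Base64-encoded bytes (here the string 'hello'); hypothetical entry.
  blob.bin: aGVsbG8=
```

Keys must be unique across `data` and `binaryData`.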

secrets

Used to pass sensitive information to Pods.
E.g., passwords.

They behave like ConfigMaps but are backed by tmpfs, so they are never written to non-volatile storage.

Secrets must be created before they can be mounted.

Secrets are always mounted readOnly.

Containers using Secrets as subPath volume mounts will not receive Secret updates.

apiVersion: v1
kind: Pod
spec:
  volumes:
    - name: my-secret-volume
      secret:
        secretName: my-secret
        defaultMode: 0644
        optional: false
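
The snippet above only declares the volume. A fuller sketch, including the Secret itself and the container mount, might look like this. The Pod name, the `password` key with its value, the mount path and the placeholder image are all assumptions.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-secret
type: Opaque
stringData:
  # 'stringData' accepts plain text; it is stored base64-encoded under 'data'.
  password: not-a-real-password
---
apiVersion: v1
kind: Pod
metadata:
  name: secret-demo              # hypothetical name
spec:
  containers:
    - name: app
      image: registry.k8s.io/pause:3.9   # placeholder image
      volumeMounts:
        - name: my-secret-volume
          mountPath: /etc/secret # the key is exposed as /etc/secret/password
          readOnly: true
  volumes:
    - name: my-secret-volume
      secret:
        secretName: my-secret
        defaultMode: 0400        # readable by the owner only
```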

nfs

Mount existing NFS shares into Pods.

The contents of NFS volumes are preserved after Pods are removed and the volume is merely unmounted.
This means that NFS volumes can be pre-populated with data, and that data can be shared between Pods.

NFS can be mounted by multiple writers simultaneously.

One cannot specify NFS mount options in a Pod spec.
Either set mount options server-side or use /etc/nfsmount.conf.
Alternatively, mount NFS volumes via PersistentVolumes, which do allow setting mount options.

apiVersion: v1
kind: Pod
spec:
  containers:
    - image: registry.k8s.io/test-web-server
      name: test-container
      volumeMounts:
      - mountPath: /my-nfs-data
        name: test-volume
  volumes:
    - name: test-volume
      nfs:
        server: my-nfs-server.example.com
        path: /my-nfs-volume
        readOnly: true
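
When mount options are needed, the PersistentVolume route mentioned above could be sketched as follows. The server and path are taken from the example; the object names, the size and the specific mount options are assumptions.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-nfs-pv                # hypothetical name
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""           # no dynamic provisioning
  mountOptions:                  # example options, adjust to the NFS server
    - hard
    - nfsvers=4.1
  nfs:
    server: my-nfs-server.example.com
    path: /my-nfs-volume
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-nfs-pvc               # hypothetical name
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  volumeName: my-nfs-pv          # bind to the PersistentVolume above
  resources:
    requests:
      storage: 10Gi
```

Pods would then reference the claim through a `persistentVolumeClaim` volume with `claimName: my-nfs-pvc` instead of an `nfs` volume.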

downwardAPI

Downward APIs expose Pods' and containers' resource declaration or status field values.
Refer to Expose Pod information to Containers through files.

Downward API volumes make downward API data available to applications as read-only files in plain text format.

Containers using the downward API as subPath volume mounts will not receive updates when field values change.

apiVersion: v1
kind: Pod
metadata:
  labels:
    cluster: test-cluster1
    rack: rack-22
    zone: us-east-coast
spec:
  volumes:
    - name: my-downward-api-volume
      downwardAPI:
        defaultMode: 0644
        items:
        - path: labels
          fieldRef:
            fieldPath: metadata.labels

# Mounting this volume results in a file with contents similar to the following:
# ```plaintext
# cluster="test-cluster1"
# rack="rack-22"
# zone="us-east-coast"
# ```

PersistentVolumes

Resize PersistentVolumes

  1. Check the StorageClass is set with allowVolumeExpansion: true:

    kubectl get storageClass 'storage-class-name' -o jsonpath='{.allowVolumeExpansion}'
    
  2. Edit the PersistentVolumeClaim's spec.resources.requests.storage field.
    This will take care of the underlying PersistentVolume's size automagically.

    kubectl edit persistentVolumeClaim 'my-pvc'
    
  3. Verify the change by checking the PVC's status.capacity field:

    kubectl get pvc 'my-pvc' -o jsonpath='{.status}'
    

    Should one see the message

    Waiting for user to (re-)start a pod to finish file system resize of volume on node

    under the status.conditions field, just wait some time.
    It should not be necessary to restart the Pods, and the capacity should change soon to the requested one.
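
Expressed declaratively, the objects involved in the steps above might look like the following sketch. The provisioner name is hypothetical, and so are the access mode and the sizes; the object names are the ones used in the commands.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: storage-class-name
provisioner: example.com/hypothetical-provisioner   # replace with a real one
allowVolumeExpansion: true       # required for step 1 to succeed
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  storageClassName: storage-class-name
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi              # raised from a smaller value, e.g. 10Gi
```

Claims can only be expanded this way; shrinking them is not supported.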

Gotchas:

  • It's possible to recreate StatefulSets without killing the Pods they control.
    Reapply the STS' declaration with a new PersistentVolume size, then restart the Pods to resize the underlying filesystem.

    If deploying the STS via Helm
    1. Change the size of the PersistentVolumeClaims used by the STS:

      kubectl edit persistentVolumeClaims 'my-pvc'
      
    2. Delete the STS without killing its pods:

      kubectl delete statefulSets.apps 'my-sts' --cascade 'orphan'
      
    3. Redeploy the STS with the changed size. It will retake ownership of existing Pods.

    4. Delete the STS' pods one-by-one.
      During Pod restart, the Kubelet will resize the filesystem to match new block device size.

      kubectl delete pod 'my-sts-pod'
      
    If managing the STS manually
    1. Change the size of the PersistentVolumeClaims used by the STS:

      kubectl edit persistentVolumeClaims 'my-pvc'
      
    2. Note down the names of PVs for specific PVCs and their sizes:

      kubectl get persistentVolume 'my-pv'
      
    3. Dump the STS to disk:

      kubectl get sts 'my-sts' -o yaml > 'my-sts.yaml'
      
    4. Remove any extra field (like metadata.{selfLink,resourceVersion,creationTimestamp,generation,uid} and status) and set the template's PVC size to the value you want.

    5. Delete the STS without killing its pods:

      kubectl delete sts 'my-sts' --cascade 'orphan'
      
    6. Reapply the STS.
      It will retake ownership of existing Pods.

      kubectl apply -f 'my-sts.yaml'
      
    7. Delete the STS' pods one-by-one.
      During Pod restart, the Kubelet will resize the filesystem to match new block device size.

      kubectl delete pod 'my-sts-pod'
      

Autoscaling

Controllers are available to scale Pods or Nodes automatically, both in number and in size.

Automatic scaling of Pods is done in number by the Horizontal Pod Autoscaler, and in size by the Vertical Pod Autoscaler.
Automatic scaling of Nodes is done in number by the Cluster Autoscaler, and in size by add-ons like Karpenter.

Be wary of mixing and matching autoscalers for the same kind of resource.
One can easily undo the work done by the other and make that resource behave unexpectedly.

K8S only comes with the Horizontal Pod Autoscaler by default.
Managed K8S usually also comes with the Cluster Autoscaler if autoscaling is enabled on the cluster resource.

The Horizontal and Vertical Pod Autoscalers require access to metrics.
This requires the metrics server addon to be installed and accessible.

Pod scaling

Autoscaling of Pods by number requires the use of the Horizontal Pod Autoscaler.
Autoscaling of Pods by size requires the use of the Vertical Pod Autoscaler.
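
A minimal sketch of a Horizontal Pod Autoscaler, assuming a hypothetical Deployment named `my-deployment` and an arbitrary 70% CPU target:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-deployment            # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          # Average usage across Pods, as a percentage of their CPU requests.
          type: Utilization
          averageUtilization: 70
```

Utilization targets are computed against the containers' resource requests, so the target Pods need CPU requests set for this to work.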

Node scaling

Autoscaling of Nodes by number requires the Cluster Autoscaler.

  1. The Cluster Autoscaler routinely checks for pending Pods.
  2. Pods fill up the available Nodes.
  3. When Pods start to fail for lack of available resources, Nodes are added to the cluster.
  4. When Pods are not failing due to lack of available resources and one or more Nodes are underused, the Autoscaler tries to fit the existing Pods in fewer Nodes.
  5. If one or more Nodes would end up unused after the previous step (DaemonSets are usually not taken into consideration), the Autoscaler will terminate them.

Autoscaling of Nodes by size requires add-ons like Karpenter.

Quality of service

See Configure Quality of Service for Pods for more information.

QoS classes are used to make decisions about scheduling and evicting Pods.
When a Pod is created, it is also assigned one of the following QoS classes:

  • Guaranteed, when every Container in the Pod, including init containers, has:

    • a memory limit and a memory request, and they are the same
    • a CPU limit and a CPU request, and they are the same
    spec:
      containers:
        
        resources:
          limits:
            cpu: 700m
            memory: 200Mi
          requests:
            cpu: 700m
            memory: 200Mi
        
    status:
      qosClass: Guaranteed
    
  • Burstable, when

    • the Pod does not meet the criteria for the Guaranteed QoS class
    • at least one Container in the Pod has a memory or CPU request spec
    spec:
      containers:
      - name: qos-demo
        
        resources:
          limits:
            memory: 200Mi
          requests:
            memory: 100Mi
      
    status:
      qosClass: Burstable
    
  • BestEffort, when the Pod does not meet the criteria for the other QoS classes (its Containers have no memory or CPU limits nor requests)

    spec:
      containers:
        
        resources: {}
      
    status:
      qosClass: BestEffort
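
The fragments above omit the surrounding manifest. As a complete sketch, a Pod expected to land in the Guaranteed class could be declared as follows; the Pod name and the placeholder image are assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: qos-guaranteed-demo      # hypothetical name
spec:
  containers:
    - name: app
      image: registry.k8s.io/pause:3.9   # placeholder image
      resources:
        # Requests equal limits for both CPU and memory in every container.
        requests:
          cpu: 700m
          memory: 200Mi
        limits:
          cpu: 700m
          memory: 200Mi
```

The class assigned by the control plane can then be read from the Pod's status.qosClass field.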
    

Containers with high privileges

Kubernetes introduced the Security Context as a mitigation for workloads that need to change one or more Node settings for performance, stability, or other reasons (e.g. ElasticSearch).
This is usually achieved by executing the needed command from an InitContainer with higher privileges than normal. Such a container has access to the Node's resources and breaks the isolation Containers are usually known for. If compromised, it lets an attacker gain access to the underlying Node.
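
A sketch of that pattern, assuming the commonly cited ElasticSearch requirement of raising `vm.max_map_count`. The Pod name, images and value are illustrative.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: node-tuning-demo         # hypothetical name
spec:
  initContainers:
    - name: set-max-map-count
      image: busybox
      # 'vm.max_map_count' is not namespaced, so this changes the Node's value.
      command: ["sysctl", "-w", "vm.max_map_count=262144"]
      securityContext:
        privileged: true         # needed to write Node-level kernel settings
  containers:
    - name: app
      image: registry.k8s.io/pause:3.9   # placeholder for the real workload
```

Only the init container is privileged here; the main container keeps the default, restricted context.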

From the design proposal:

A security context is a set of constraints that are applied to a Container in order to achieve the following goals (from the Security design):

  • ensure a clear isolation between the Container and the underlying host it runs on;
  • limit the ability of the Container to negatively impact the infrastructure or other Containers.

[The main idea is that] Containers should only be granted the access they need to perform their work. The Security Context takes advantage of containerization features such as the ability to add or remove capabilities to give a process some privileges, but not all the privileges of the root user.

Capabilities

Adding capabilities to a Container is not making it privileged, nor allowing privilege escalation. It is just giving the Container the ability to write to specific files or devices depending on the given capability.

This means having a capability assigned does not automatically make the Container able to wreak havoc on a Node, and this practice can be a legitimate use of this feature instead.

From the feature's man page:

Linux divides the privileges traditionally associated with superuser into distinct units, known as capabilities, which can be independently enabled and disabled. Capabilities are a per-thread attribute.

This also means a Container will be limited to its contents, plus the capabilities it has been assigned.

Some capabilities are assigned to all Containers by default, while others (the ones which could cause more issues) must be explicitly added using the Containers' securityContext.capabilities.add property.
If a Container is privileged (see Privileged container vs privilege escalation), it will have access to all the capabilities, regardless of which ones are explicitly assigned to it.
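
As a sketch, the following Pod drops every default capability and adds back a single one. The Pod name, the image and the choice of `NET_BIND_SERVICE` are assumptions made for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: capabilities-demo        # hypothetical name
spec:
  containers:
    - name: app
      image: registry.k8s.io/pause:3.9   # placeholder image
      securityContext:
        capabilities:
          drop:
            - ALL                # start from no capabilities at all
          add:
            - NET_BIND_SERVICE   # allow binding to ports below 1024
```

Capability names are given without the `CAP_` prefix used by the Linux documentation.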

Privileged container vs privilege escalation

A privileged container is very different from a container leveraging privilege escalation.

A privileged container can do whatever a process running directly on the Node can.
It is automatically assigned all capabilities, and being root in this container is effectively being root on the Node it is running on.

For a Container to be privileged, its definition requires the securityContext.privileged property set to true.

Privilege escalation allows a process inside the Container to gain more privileges than its parent process.
The process will be able to assume root-like powers, but will have access only to the assigned capabilities and generally have limited to no access to the Node like any other Container.

For a Container to leverage privilege escalation, its definition requires the securityContext.allowPrivilegeEscalation property:

  • to either be set to true, or
  • to not be set at all if:
    • the Container is already privileged, or
    • the Container has SYS_ADMIN capabilities.

This property directly controls whether the no_new_privs flag gets set on the Container's process.
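
A minimal sketch of a Container that is neither privileged nor allowed to escalate privileges. The Pod name, the image and the user ID are assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: no-new-privs-demo        # hypothetical name
spec:
  containers:
    - name: app
      image: registry.k8s.io/pause:3.9   # placeholder image
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        privileged: false
        # Results in 'no_new_privs' being set on the container's process.
        allowPrivilegeEscalation: false
```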

From the design document for no_new_privs:

In Linux, the execve system call can grant more privileges to a newly-created process than its parent process. Considering security issues, since Linux kernel v3.5, there is a new flag named no_new_privs added to prevent those new privileges from being granted to the processes.

no_new_privs is inherited across fork, clone and execve and can not be unset. With no_new_privs set, execve promises not to grant the privilege to do anything that could not have been done without the execve call.

For more details about no_new_privs, please check the Linux kernel documentation.

[…]

To recap, below is a table defining the default behavior at the pod security policy level and what can be set as a default with a pod security policy:

allowPrivilegeEscalation setting | uid = 0 or unset     | uid != 0             | privileged/CAP_SYS_ADMIN
nil                              | no_new_privs=true    | no_new_privs=false   | no_new_privs=false
false                            | no_new_privs=true    | no_new_privs=true    | no_new_privs=false
true                             | no_new_privs=false   | no_new_privs=false   | no_new_privs=false

Sysctl settings

See Using sysctls in a Kubernetes Cluster.

Backup and restore

See velero.

Managed Kubernetes Services

Most cloud providers offer their managed versions of Kubernetes. Check their websites:

Best practices in cloud environments

All Kubernetes clusters should:

  • be created using IaC (terraform, pulumi);
  • have different node pools dedicated to different workloads;
  • have at least one node pool composed of non-preemptible nodes and dedicated to critical services like Admission Controller Webhooks.

Each node pool should:

  • have a meaningful name (like <prefix…>-<workload_type>-<random_id>) to make it easy to recognize the workloads running on it or the features of the nodes in it;
  • have a minimum set of meaningful labels, like:
    • cloud provider information;
    • node information and capabilities;
  • spread its nodes across multiple availability zones.

Edge computing

If planning to run Kubernetes on a Raspberry Pi, see k3s and the Build your very own self-hosting platform with Raspberry Pi and Kubernetes series of articles.

Troubleshooting

Dedicate Nodes to specific workloads

Leverage taints and node affinity:

  1. Taint the Nodes:

    $ kubectl taint nodes 'host1' 'dedicated=devs:NoSchedule'
    node "host1" tainted
    
  2. Add Labels to the nodes:

    $ kubectl label nodes 'host1' 'dedicated=devs'
    node "host1" labeled
    
  3. Add tolerations and node affinity to any Pod's spec:

    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: dedicated
                operator: In
                values:
                - devs
      tolerations:
      - key: "dedicated"
        operator: "Equal"
        value: "devs"
        effect: "NoSchedule"
    

Recreate Pods upon ConfigMap's or Secret's content change

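When deploying with Helm, add checksum annotations to the Pod template, so that any change in the rendered ConfigMap or Secret changes the template and triggers a new rollout: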

apiVersion: apps/v1
kind: Deployment
spec:
  template:
    metadata:
      annotations:
        checksum/configmap: {{ include (print $.Template.BasePath "/configmap.yaml") $ | sha256sum }}
        checksum/secret: {{ include (print $.Template.BasePath "/secret.yaml") $ | sha256sum }}
        {{- if .podAnnotations }}
          {{- toYaml .podAnnotations | trim | nindent 8 }}
        {{- end }}

Run a command in a Pod right after its initialization

Use a container's lifecycle.postStart.exec.command spec:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
spec:
  template:
    
    spec:
      containers:
        - name: my-container
          
          lifecycle:
            postStart:
              exec:
                command: ["/bin/sh", "-c", "echo 'heeeeeeey yaaaaaa!'"]

Run a command just before a Pod stops

Leverage the preStop hook instead of postStart.
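
A sketch mirroring the postStart example, with a hypothetical `sleep 5` command, labels and image, and an explicit grace period:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
spec:
  selector:
    matchLabels:
      app: my-app                # hypothetical label
  template:
    metadata:
      labels:
        app: my-app
    spec:
      # The preStop hook and the container shutdown share this time budget.
      terminationGracePeriodSeconds: 30
      containers:
        - name: my-container
          image: nginx:stable    # placeholder image
          lifecycle:
            preStop:
              exec:
                # Runs before the container receives the termination signal.
                command: ["/bin/sh", "-c", "sleep 5"]
```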

Hooks are not passed parameters, and this includes environment variables. Use a script if you need them. See container hooks and preStop hook doesn't work with env variables.

Since Kubernetes version 1.9, volumeMounts for secret, configMap, downwardAPI and projected volumes are read-only by default. As a workaround, create an emptyDir volume, copy the contents into it, and execute or write whatever you need from there:

  initContainers:
    - name: copy-ro-scripts
      image: busybox
      command: ['sh', '-c', 'cp /scripts/* /etc/pre-install/']
      volumeMounts:
        - name: scripts
          mountPath: /scripts
        - name: pre-install
          mountPath: /etc/pre-install
  volumes:
    - name: pre-install
      emptyDir: {}
    - name: scripts
      configMap:
        name: bla

Examples

Create an admission webhook

See the example's README.

Further readings

Sources