Elastic Kubernetes Service

  1. TL;DR
  2. Requirements
  3. Creation procedure
  4. Create worker nodes
    1. Create managed node groups
    2. Schedule pods on Fargate
  5. Access management
  6. Secrets encryption through KMS
  7. Storage
  8. Troubleshooting
    1. Identify common issues
    2. The worker nodes fail to join the cluster
  9. Further readings
    1. Sources

TL;DR

When one creates a cluster, one really only creates the cluster's control plane and the dedicated nodes underneath it.
Worker nodes can consist of any combination of self-managed nodes, managed node groups, and Fargate, all of which depend on the control plane.

EKS automatically installs some self-managed add-ons like the AWS VPC CNI plugin, kube-proxy, and CoreDNS.
Disable them in the cluster's definition if they are not wanted.
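
For example, in Pulumi (a sketch, assuming a provider version that exposes the bootstrapSelfManagedAddons attribute):

const cluster = new aws.eks.Cluster("cluster", {
    // Skip installing the default self-managed add-ons.
    bootstrapSelfManagedAddons: false,
    // … rest of the cluster's definition …
});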

Upon cluster creation, EKS automatically creates a security group and applies it to both the control plane and nodes.
This security group can be neither avoided nor customized in the cluster's definition (e.g. using IaC tools like Pulumi or Terraform):

error: aws:eks/cluster:Cluster resource 'cluster' has a problem: Value for unconfigurable attribute. Can't configure a value for "vpc_config.0.cluster_security_group_id": its value will be decided automatically based on the result of applying this configuration.
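
The group is still exposed as an output, though, so additional rules can be attached to it. A sketch in Pulumi (the rule's values are made-up examples):

// Allow extra inbound traffic on the EKS-managed security group.
const extraIngress = new aws.ec2.SecurityGroupRule("cluster-extra-ingress", {
    type: "ingress",
    securityGroupId: cluster.vpcConfig.clusterSecurityGroupId,
    protocol: "tcp",
    fromPort: 443,
    toPort: 443,
    cidrBlocks: [ "10.0.0.0/16" ],
});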

For some reason, giving resources a tag like aws:eks:cluster-name=value succeeds, but has no effect (it is not really applied).

By default, the IAM principal creating the cluster is the only one able to make calls to the cluster's API server.
To let other IAM principals have access to the cluster, one needs to add them to it. See access management.

Usage
# Create clusters.
aws eks create-cluster \
  --name 'DeepThought' \
  --role-arn 'arn:aws:iam::000011112222:role/aws-service-role/eks.amazonaws.com/AWSServiceRoleForAmazonEKS' \
  --resources-vpc-config 'subnetIds=subnet-11112222333344445,subnet-66667777888899990'
aws eks create-cluster … --access-config 'authenticationMode=API'

# Check cluster's authentication mode.
aws eks describe-cluster --name 'DeepThought' --query 'cluster.accessConfig.authenticationMode' --output 'text'

# Change encryption configuration.
aws eks associate-encryption-config \
  --cluster-name 'DeepThought' \
  --encryption-config '[{
    "provider": { "keyArn": "arn:aws:kms:eu-west-1:000011112222:key/33334444-5555-6666-7777-88889999aaaa" },
    "resources": [ "secrets" ]
  }]'


# Create access entries to use IAM for authentication.
aws eks create-access-entry --cluster-name 'DeepThought' \
  --principal-arn 'arn:aws:iam::000011112222:role/Admin'
aws eks create-access-entry … --principal-arn 'arn:aws:iam::000011112222:user/bob'

# List available access policies.
aws eks list-access-policies

# Associate policies to access entries.
aws eks associate-access-policy --cluster-name 'DeepThought' \
  --principal-arn 'arn:aws:iam::000011112222:role/Admin' \
  --policy-arn 'arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy' \
  --access-scope '{ "type": "cluster" }'

# Connect to clusters.
aws eks update-kubeconfig --name 'DeepThought' && kubectl cluster-info


# Create EC2 node groups.
aws eks create-nodegroup \
  --cluster-name 'DeepThought' \
  --nodegroup-name 'alpha' \
  --scaling-config 'minSize=1,maxSize=3,desiredSize=1' \
  --node-role-arn 'arn:aws:iam::000011112222:role/DeepThinkerNode' \
  --subnets 'subnet-11112222333344445' 'subnet-66667777888899990'

# Create Fargate profiles.
aws eks create-fargate-profile \
  --cluster-name 'DeepThought' \
  --fargate-profile-name 'alpha' \
  --pod-execution-role-arn 'arn:aws:iam::000011112222:role/DeepThinkerFargate' \
  --subnets 'subnet-11112222333344445' 'subnet-66667777888899990' \
  --selectors 'namespace=default'

Requirements

  • [suggestion] 1 (one) custom Cluster Service Role with the AmazonEKSClusterPolicy IAM policy attached or similar custom permissions.

    Kubernetes clusters managed by EKS make calls to other AWS services on the user's behalf to manage the resources that the cluster uses.
    For a cluster to be allowed to make those calls, it must have the aforementioned permissions.

    To create clusters which would not require access to any other AWS resource, one can assign the cluster the AWSServiceRoleForAmazonEKS service-linked role directly [1][2].

    Amazon EKS uses the service-linked role named AWSServiceRoleForAmazonEKS - The role allows Amazon EKS to manage clusters in your account. The attached policies allow the role to manage the following resources: network interfaces, security groups, logs, and VPCs.


    Prior to October 3, 2023, AmazonEKSClusterPolicy was required on the IAM role for each cluster.

    Prior to April 16, 2020, AmazonEKSServicePolicy was also required and the suggested name was eksServiceRole. With the AWSServiceRoleForAmazonEKS service-linked role, that policy is no longer required for clusters created on or after April 16, 2020.

    Pro tip

    Should one want to use more advanced features like encryption with managed keys, the role will need access to the referenced resources.
    In this case it would probably be better to create a custom role instead of assigning permissions to the built-in one.

  • [suggestion] 1+ (one or more) custom service role(s) for the pod executors, with the required policies attached or similar permissions.

    The reasons and required permissions vary depending on the type of executor.
    It would probably be better to create a custom role instead of assigning permissions to the built-in one.

    See the corresponding section under Create worker nodes.

  • 1+ (one or more) executor(s) for pods.
    See the Create worker nodes section.

  • [if using APIs for authentication] 1+ (one or more) access entries with an EKS access policy assigned.

  • Private clusters have additional special requirements of their own.

Creation procedure

The Internet is full of guides and abstractions which do not work, are confusing, or rely on other code.
Some create CloudFormation stacks in the process. Follow the Getting started guide to avoid issues.

This is what worked:

  1. Create a VPC, if a suitable one does not exist already, with public and private subnets that meet EKS' requirements.

    Example in CloudFormation
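
    Example in Pulumi (a minimal sketch, not from the getting started guide: one AZ only, with made-up names and CIDRs; production clusters need subnets in at least two AZs, plus Internet/NAT gateways and route tables)
    const vpc = new aws.ec2.Vpc("eks-vpc", {
        cidrBlock: "10.0.0.0/16",
        enableDnsHostnames: true,
        enableDnsSupport: true,
    });
    
    // Public subnet, for public load balancers.
    const publicSubnet = new aws.ec2.Subnet("eks-public-a", {
        vpcId: vpc.id,
        cidrBlock: "10.0.0.0/20",
        availabilityZone: "eu-west-1a",
        mapPublicIpOnLaunch: true,
        tags: { "kubernetes.io/role/elb": "1" },
    });
    
    // Private subnet, for worker nodes and internal load balancers.
    const privateSubnet = new aws.ec2.Subnet("eks-private-a", {
        vpcId: vpc.id,
        cidrBlock: "10.0.16.0/20",
        availabilityZone: "eu-west-1a",
        tags: { "kubernetes.io/role/internal-elb": "1" },
    });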

  2. Create a custom IAM role for the cluster if needed (see Requirements).

  3. Attach the required policies to the role used by the cluster.

    Example in CLI
    eks-cluster-role-trust-policy.json:
    {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Principal": {
                "Service": "eks.amazonaws.com"
            }
        }]
    }
    
    aws iam create-role \
      --role-name 'DeepThinker' \
      --assume-role-policy-document 'file://eks-cluster-role-trust-policy.json'
    aws iam attach-role-policy \
      --role-name 'DeepThinker' \
      --policy-arn 'arn:aws:iam::aws:policy/AmazonEKSClusterPolicy'
    
    Example in Pulumi
    const cluster_assumeRole_policy = JSON.stringify({
        Version: "2012-10-17",
        Statement: [{
            Effect: "Allow",
            Action: "sts:AssumeRole",
            Principal: {
                Service: "eks.amazonaws.com",
            },
        }],
    });
    
    const cluster_service_role = new aws.iam.Role("cluster-service-role", {
        assumeRolePolicy: cluster_assumeRole_policy,
        managedPolicyArns: [
            // alternatively, use RolePolicyAttachments
            "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy",
        ],
        name: "DeepThinker",
        
    });
    
  4. Create the cluster.
    Make sure you give it the correct cluster service role.

    Example in CLI
    aws eks create-cluster \
      --name 'DeepThought' \
      --role-arn 'arn:aws:iam::000011112222:role/DeepThinker' \
      --resources-vpc-config 'subnetIds=subnet-11112222333344445,subnet-66667777888899990'
    
    Example in Pulumi
    const cluster = new aws.eks.Cluster("cluster", {
        name: "DeepThought",
        roleArn: cluster_service_role.arn,
        vpcConfig: {
            subnetIds: [
                "subnet-11112222333344445",
                "subnet-66667777888899990",
            ],
        },
        
    });
    
  5. Give access to users.

  6. Connect to the cluster.

    $ aws eks update-kubeconfig --name 'DeepThought'
    Added new context arn:aws:eks:eu-east-1:000011112222:cluster/DeepThought to /home/itsAme/.kube/config
    
    $ kubectl cluster-info
    Kubernetes control plane is running at https://FB32A9C4A3D6BBC82695B1936BF4AAA3.gr7.eu-east-1.eks.amazonaws.com
    CoreDNS is running at https://FB32A9C4A3D6BBC82695B1936BF4AAA3.gr7.eu-east-1.eks.amazonaws.com/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
    
  7. Create some worker nodes.

  8. Profit!

Create worker nodes

See step 3 of the getting started guide.

Create managed node groups

See Choosing an Amazon EC2 instance type and Managed node groups for more information.

Additional requirements:

  • [suggestion] 1 (one) custom Node Service Role with the AmazonEKSWorkerNodePolicy, AmazonEC2ContainerRegistryReadOnly and AmazonEKS_CNI_Policy policies attached or similar permissions.

    The EKS nodes' kubelet makes calls to the AWS APIs on one's behalf.
    Nodes receive permissions for these API calls through an IAM instance profile and associated policies.

    For a node to be allowed to make those calls, it must have the aforementioned permissions.

  • When deploying a managed node group in private subnets, one must ensure that it can reach Amazon ECR to pull container images.
    Do this by connecting a NAT gateway to the subnet's route table, or by adding the following AWS PrivateLink VPC endpoints (see the sketch after this list):

    • Amazon ECR API endpoint interface: com.amazonaws.{region}.ecr.api.
    • Amazon ECR Docker registry API endpoint interface: com.amazonaws.{region}.ecr.dkr.
    • Amazon S3 gateway endpoint: com.amazonaws.{region}.s3.
  • If the nodes are to be created in private subnets, the cluster must provide its private API server endpoint.
    Set the cluster's vpc_config.0.endpoint_private_access attribute to true.
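
A sketch of the PrivateLink endpoints in Pulumi (resource names and the referenced security group and route table are assumptions; interface endpoints need a security group allowing HTTPS from the nodes):

const ecrApiEndpoint = new aws.ec2.VpcEndpoint("ecr-api", {
    vpcId: vpc.id,
    serviceName: "com.amazonaws.eu-west-1.ecr.api",
    vpcEndpointType: "Interface",
    subnetIds: privateSubnetIds,          // the node groups' private subnets
    securityGroupIds: [ endpointsSg.id ], // must allow TCP 443 from the nodes
    privateDnsEnabled: true,
});
const ecrDkrEndpoint = new aws.ec2.VpcEndpoint("ecr-dkr", {
    vpcId: vpc.id,
    serviceName: "com.amazonaws.eu-west-1.ecr.dkr",
    vpcEndpointType: "Interface",
    subnetIds: privateSubnetIds,
    securityGroupIds: [ endpointsSg.id ],
    privateDnsEnabled: true,
});
const s3Endpoint = new aws.ec2.VpcEndpoint("s3", {
    vpcId: vpc.id,
    serviceName: "com.amazonaws.eu-west-1.s3",
    vpcEndpointType: "Gateway",
    routeTableIds: [ privateRouteTable.id ], // the private subnets' route table
});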

Procedure:

  1. Create a custom IAM role for the nodes if needed (see Requirements).

  2. Attach the required policies to the role used by the nodes.

    Example in CLI
    eks-node-role-trust-policy.json:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Principal": {
                    "Service": "ec2.amazonaws.com"
                }
            }
        ]
    }
    
    aws iam create-role \
      --role-name 'DeepThinkerNode' \
      --assume-role-policy-document 'file://eks-node-role-trust-policy.json'
    aws iam attach-role-policy \
      --policy-arn 'arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy' \
      --role-name 'DeepThinkerNode'
    aws iam attach-role-policy \
      --policy-arn 'arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly' \
      --role-name 'DeepThinkerNode'
    aws iam attach-role-policy \
      --policy-arn 'arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy' \
      --role-name 'DeepThinkerNode'
    
    Example in Pulumi
    const nodes_assumeRole_policy = JSON.stringify({
        Version: "2012-10-17",
        Statement: [{
            Effect: "Allow",
            Action: "sts:AssumeRole",
            Principal: {
                Service: "ec2.amazonaws.com",
            },
        }],
    });
    
    const node_service_role = new aws.iam.Role("node-service-role", {
        assumeRolePolicy: nodes_assumeRole_policy,
        managedPolicyArns: [
            // alternatively, use RolePolicyAttachments
            "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly",
            "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy",
            "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy",
        ],
        name: "DeepThinkerNode",
        
    });
    
  3. Create the desired node groups.

    Example in CLI
    aws eks create-nodegroup \
      --cluster-name 'DeepThought' \
      --nodegroup-name 'alpha' \
      --scaling-config 'minSize=1,maxSize=3,desiredSize=1' \
      --node-role-arn 'arn:aws:iam::000011112222:role/DeepThinkerNode' \
      --subnets 'subnet-11112222333344445' 'subnet-66667777888899990'
    
    Example in Pulumi
    const nodeGroup_alpha = new aws.eks.NodeGroup("nodeGroup-alpha", {
        nodeGroupName: "nodeGroup-alpha",
        clusterName: cluster.name,
        nodeRoleArn: node_service_role.arn,
        scalingConfig: {
            minSize: 1,
            maxSize: 3,
            desiredSize: 1,
        },
        subnetIds: cluster.vpcConfig.subnetIds,
        
    });
    

Schedule pods on Fargate

Additional requirements:

  • [suggestion] 1 (one) custom Fargate Service Role with the AmazonEKSFargatePodExecutionRolePolicy policy attached or similar permissions.

    To create pods on Fargate, the components running on Fargate must make calls to the AWS APIs on one's behalf.
    This is so that they can take actions such as pulling container images from ECR or routing logs to other AWS services.

    For a cluster to be allowed to make those calls, it must have a Fargate profile assigned, and this profile must use a role with:

    • The AmazonEKSFargatePodExecutionRolePolicy policy attached to it, or
    • Comparable permissions.
  • 1+ (one or more) Fargate profile(s).

Procedure:

  1. Create a custom IAM role for the Fargate profile if needed (see Requirements).

  2. Attach the required policies to the role used by the profile.

    Example in CLI
    eks-fargate-role-trust-policy.json:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Principal": {
                    "Service": "eks-fargate-pods.amazonaws.com"
                },
                "Condition": {
                     "ArnLike": {
                          "aws:SourceArn": "arn:aws:eks:region-code:111122223333:fargateprofile/my-cluster/*"
                     }
                }
            }
        ]
    }
    
    aws iam create-role \
      --role-name 'DeepThinkerFargate' \
      --assume-role-policy-document 'file://eks-fargate-role-trust-policy.json'
    aws iam attach-role-policy \
      --role-name 'DeepThinkerFargate' \
      --policy-arn 'arn:aws:iam::aws:policy/AmazonEKSFargatePodExecutionRolePolicy'
    
    Example in Pulumi
    const fargate_assumeRole_policy = JSON.stringify({
        Version: "2012-10-17",
        Statement: [{
            Effect: "Allow",
            Action: "sts:AssumeRole",
            Principal: {
                Service: "eks-fargate-pods.amazonaws.com",
            },
            Condition: {
                ArnLike: {
                    "aws:SourceArn": `arn:aws:eks:${region}:${account}:fargateprofile/${cluster.name}/*`
                }
            },
        }],
    });
    
    const fargate_service_role = new aws.iam.Role("fargate-service-role", {
        assumeRolePolicy: fargate_assumeRole_policy,
        managedPolicyArns: [
            // alternatively, use RolePolicyAttachments
            "arn:aws:iam::aws:policy/AmazonEKSFargatePodExecutionRolePolicy",
        ],
        name: "DeepThinkerFargate",
        
    });
    
  3. Create the desired Fargate profiles.

    Example in CLI
    aws eks create-fargate-profile \
      --cluster-name 'DeepThought' \
      --fargate-profile-name 'alpha' \
      --pod-execution-role-arn 'arn:aws:iam::000011112222:role/DeepThinkerFargate' \
      --subnets 'subnet-11112222333344445' 'subnet-66667777888899990' \
      --selectors 'namespace=default'
    
    Example in Pulumi
    const fargateProfile_alpha = new aws.eks.FargateProfile("fargateProfile-alpha", {
        fargateProfileName: "fargateProfile-alpha",
        clusterName: cluster.name,
        podExecutionRoleArn: fargate_service_role.arn,
        selectors: [
            { namespace: "monitoring" },
            { namespace: "default" },
        ],
        subnetIds: cluster.vpcConfig.subnetIds,
        
    });
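
    Mind that only pods matching one of a profile's selectors are scheduled on Fargate; everything else still requires EC2 capacity.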
    

Access management

The current default authentication method is through the aws-auth configmap in the kube-system namespace.

By default, the IAM principal creating the cluster is the only one able to make calls to the cluster's API server.
To let other IAM principals have access to the cluster, one needs to add them to it.

When a cluster's authentication mode includes the APIs:

# Check the cluster's authentication mode.
$ aws eks describe-cluster --name 'DeepThought' --query 'cluster.accessConfig.authenticationMode' --output 'text'
API_AND_CONFIG_MAP
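
The mode can also be set at cluster creation. A sketch in Pulumi (assuming a provider version that exposes the accessConfig attribute):

const cluster = new aws.eks.Cluster("cluster", {
    accessConfig: {
        authenticationMode: "API_AND_CONFIG_MAP",
    },
    // … rest of the cluster's definition …
});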

One can use access entries to allow IAM users and roles to connect to it:

# Create access entries to use IAM for authentication.
aws eks create-access-entry --cluster-name 'DeepThought' \
  --principal-arn 'arn:aws:iam::000011112222:role/Admin'
aws eks create-access-entry … --principal-arn 'arn:aws:iam::000011112222:user/bob'

Should the configmap also be used, access entries defined through the APIs take precedence over it.

Mind that, to allow operations inside the cluster, every access entry must be assigned an EKS access policy:

# List available access policies.
aws eks list-access-policies

# Associate policies to access entries.
aws eks associate-access-policy --cluster-name 'DeepThought' \
  --principal-arn 'arn:aws:iam::000011112222:role/Admin' \
  --policy-arn 'arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy' \
  --access-scope '{ "type": "cluster" }'
aws eks associate-access-policy --cluster-name 'DeepThought' \
  --principal-arn 'arn:aws:iam::000011112222:user/bob' \
  --policy-arn 'arn:aws:eks::aws:cluster-access-policy/AmazonEKSViewPolicy' \
  --access-scope '{ "type": "namespace", "namespaces": [ "bob" ] }'
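
The same, sketched in Pulumi (assuming a provider version that exposes the access entry resources):

const adminEntry = new aws.eks.AccessEntry("admin", {
    clusterName: cluster.name,
    principalArn: "arn:aws:iam::000011112222:role/Admin",
});
const adminPolicy = new aws.eks.AccessPolicyAssociation("admin", {
    clusterName: cluster.name,
    principalArn: adminEntry.principalArn,
    policyArn: "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy",
    accessScope: { type: "cluster" },
});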

Secrets encryption through KMS

See Enabling secret encryption on an existing cluster.

TL;DR:

  1. Make sure the role used by the cluster has access to the key in use, with the kms:DescribeKey and kms:CreateGrant permissions (a sketch granting these follows this list).

  2. Configure the cluster:

    Example in CLI
    aws eks associate-encryption-config \
      --cluster-name 'DeepThought' \
      --encryption-config '[{
        "provider": { "keyArn": "arn:aws:kms:eu-west-1:000011112222:key/33334444-5555-6666-7777-88889999aaaa" },
        "resources": [ "secrets" ]
      }]'
    
    Example in Pulumi
    const cluster = new aws.eks.Cluster("cluster", {
        encryptionConfig: {
            provider: { keyArn: `arn:aws:kms:${region}:${account}:key/${key_id}` },
            resources: [ "secrets" ],
        },
        
    });
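
A sketch of step 1 in Pulumi, granting the cluster's service role the needed permissions on the key (the policy name is an assumption; region, account and key_id as above):

const clusterKmsPolicy = new aws.iam.RolePolicy("cluster-kms-access", {
    role: cluster_service_role.name,
    policy: JSON.stringify({
        Version: "2012-10-17",
        Statement: [{
            Effect: "Allow",
            Action: [ "kms:DescribeKey", "kms:CreateGrant" ],
            Resource: `arn:aws:kms:${region}:${account}:key/${key_id}`,
        }],
    }),
});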
    

Storage

Refer to How do I use persistent storage in Amazon EKS?, Fargate storage, and Running stateful workloads with Amazon EKS on AWS Fargate using Amazon EFS for this.
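
For EBS-backed persistent volumes on EC2 nodes, the EBS CSI driver can be installed as an EKS add-on. A sketch in Pulumi (the IRSA role for the driver's service account is assumed to exist and to have the AmazonEBSCSIDriverPolicy policy attached):

const ebsCsiAddon = new aws.eks.Addon("ebs-csi", {
    clusterName: cluster.name,
    addonName: "aws-ebs-csi-driver",
    serviceAccountRoleArn: ebsCsiRole.arn,
});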

Troubleshooting

See Amazon EKS troubleshooting.

Identify common issues

Use the AWSSupport-TroubleshootEKSWorkerNode runbook.

For the automation to work, worker nodes must have permission to access Systems Manager and must have the SSM Agent running.
Grant the permission by attaching the AmazonSSMManagedInstanceCore policy to the node role, as sketched below.
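
A sketch in Pulumi, attaching the policy to the node service role from the creation procedure:

const nodeSsmPolicy = new aws.iam.RolePolicyAttachment("node-ssm-access", {
    role: node_service_role.name,
    policyArn: "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore",
});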

Procedure:

  1. Open the runbook.
  2. Check that the AWS Region in the Management Console is set to the same Region as your cluster.
  3. In the Input parameters section, specify the name of the cluster and the EC2 instance ID.
  4. [optional] In the AutomationAssumeRole field, specify a role to allow Systems Manager to perform actions.
    If left empty, the permissions of your current IAM entity are used to perform the actions in the runbook.
  5. Choose Execute.
  6. Check the Outputs section.

The worker nodes fail to join the cluster

Error message example:

NodeCreationFailure: Instances failed to join the kubernetes cluster.

Debug: see Identify common issues.

Further readings

Sources