Gitlab runner

  1. TL;DR
  2. Pull images from private AWS ECR registries
  3. Runners on Kubernetes
  4. Autoscaling
    1. Docker Machine
  5. Further readings
    1. Sources

TL;DR

Installation
brew install 'gitlab-runner'
dnf install 'gitlab-runner'
docker pull 'gitlab/gitlab-runner'
helm --namespace 'gitlab' upgrade --install --create-namespace --version '0.64.1' --repo 'https://charts.gitlab.io' \
  'gitlab-runner' -f 'values.gitlab-runner.yml' 'gitlab-runner'
Usage
docker run --rm --name 'runner' 'gitlab/gitlab-runner:alpine-v13.6.0' --version

# `gitlab-runner exec` is deprecated and has been removed in 17.0. ┌П┐(ಠ_ಠ) Gitlab.
# See https://docs.gitlab.com/16.11/runner/commands/#gitlab-runner-exec-deprecated.
gitlab-runner exec docker 'job-name'
gitlab-runner exec docker \
  --env 'AWS_ACCESS_KEY_ID=AKIA…' --env 'AWS_SECRET_ACCESS_KEY=F…s' --env 'AWS_REGION=eu-west-1' \
  --env 'DOCKER_AUTH_CONFIG={ "credsStore": "ecr-login" }' \
  --docker-volumes "$HOME/.aws/credentials:/root/.aws/credentials:ro" \
  'job-requiring-ecr-access'

Each runner executor is assigned 1 task at a time.

Pull images from private AWS ECR registries

  1. Create an IAM Role in one's AWS account and attach the arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly IAM policy to it.

  2. Create an InstanceProfile using the above IAM Role.

  3. Create an EC2 Instance.
    Make it use the above InstanceProfile.

  4. Install the Docker Engine and the Gitlab runner on the EC2 Instance.

  5. Install the Amazon ECR Docker Credential Helper.

  6. Configure an AWS Region in /root/.aws/config:

    [default]
    region = eu-west-1
    
  7. Create the /root/.docker/config.json file and add the following line to it:

     {
       …
       "credsStore": "ecr-login"
     }
    
  8. Configure the runner to use the docker or docker+machine executor.

    [[runners]]
    executor = "docker"   # or "docker+machine"
    
  9. Configure the runner to use the ECR Credential Helper:

    [[runners]]
      environment = [ 'DOCKER_AUTH_CONFIG={"credsStore":"ecr-login"}' ]
    
  10. Configure jobs to use images saved in private AWS ECR registries:

    phpunit:
      stage: testing
      image:
        name: 123456789123.dkr.ecr.eu-west-1.amazonaws.com/php-gitlabrunner:latest
        entrypoint: [""]
      script:
        - php ./vendor/bin/phpunit --coverage-text --colors=never
    

The Gitlab runner should now automatically authenticate to one's private ECR registry.

Runners on Kubernetes

Store tokens in secrets instead of putting the token in the chart's values.

Requirements:

  • A running and configured Gitlab instance.
  • A Kubernetes cluster.

Procedure:

  1. [best practice] Create a dedicated namespace:

    kubectl create namespace 'gitlab'
    
  2. Create a runner in Gitlab:

    1. Go to one's Gitlab instance's /admin/runners page.
    2. Click on the New instance runner button.
    3. Keep Linux as runner type.
    4. Click on the Create runner button.
    5. Copy the runner's token.
  3. (Re-)Create the runner's Kubernetes secret with the runner's token from the previous step:

    kubectl delete --namespace 'gitlab' secret 'gitlab-runner-token' --ignore-not-found
    kubectl create --namespace 'gitlab' secret generic 'gitlab-runner-token' \
      --from-literal='runner-registration-token=""' --from-literal='runner-token=glrt-…'
    

    The secret's name must match the one referenced in the helm chart's values file.
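
    In the chart's values this corresponds to:

```yaml
runners:
  # Must match the name of the Kubernetes secret created above
  secret: gitlab-runner-token
```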

  4. Install the helm chart:

    helm --namespace 'gitlab' upgrade --install --repo 'https://charts.gitlab.io' \
      --values 'values.yaml' \
      'gitlab-runner' 'gitlab-runner'
    

    [best practice] Be sure to match the runner version with the Gitlab server's:

    helm search repo --versions 'gitlab/gitlab-runner'
    
Example helm chart values
gitlabUrl: https://gitlab.example.org/
unregisterRunners: true
concurrent: 20
checkInterval: 3
rbac:
  create: true
metrics:
  enabled: true
runners:
  config: |
    [[runners]]

      [runners.cache]
        Shared = true

      [runners.kubernetes]
        image = "alpine"
        pull_policy = [
          "if-not-present",
          "always"
        ]
        allowed_pull_policies = [
          "if-not-present",
          "always",
          "never"
        ]

        namespace = "{{.Release.Namespace}}"
  name: "runner-on-k8s"
  secret: gitlab-runner-token
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
            - key: eks.amazonaws.com/capacityType
              operator: In
              values:
                - ON_DEMAND
tolerations:
  - key: app
    operator: Equal
    value: gitlab
  - key: component
    operator: Equal
    value: runner
podLabels:
  team: engineering

Gotchas:

  • The build, helper and multiple service containers will all reside in a single pod.
    If the sum of the resources requested by all of them is too high, the pod will not be scheduled and the pipeline will hang and fail.
  • If any pod is killed due to OOM, the pipeline that spawned it will hang until it times out.
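
The first gotcha can be sketched in numbers (the figures below are illustrative, not from a real cluster):

```python
def pod_schedulable(node_allocatable_mib, container_requests_mib):
    """A job pod is only scheduled if the sum of the requests of *all*
    its containers fits into a single node's allocatable resources."""
    return sum(container_requests_mib) <= node_allocatable_mib

# build (2 GiB) + helper (128 MiB) + 2 service containers (1 GiB each)
# on a node with 4 GiB of allocatable memory: 4224 MiB > 4096 MiB
print(pod_schedulable(4096, [2048, 128, 1024, 1024]))  # False
```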

Improvements:

  • Keep the manager pod on stable nodes.

    affinity:
      nodeAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            preference:
              matchExpressions:
                - key: eks.amazonaws.com/capacityType
                  operator: In
                  values:
                    - ON_DEMAND
    
  • Dedicate specific nodes to runner executors.
    Taint dedicated nodes and add tolerations and affinities to the runner's configuration.

    [[runners]]
      [runners.kubernetes]
    
      [runners.kubernetes.node_selector]
        gitlab = "true"
        "kubernetes.io/arch" = "amd64"
    
        [runners.kubernetes.affinity]
          [runners.kubernetes.affinity.node_affinity]
            [runners.kubernetes.affinity.node_affinity.required_during_scheduling_ignored_during_execution]
              [[runners.kubernetes.affinity.node_affinity.required_during_scheduling_ignored_during_execution.node_selector_terms]]
                [[runners.kubernetes.affinity.node_affinity.required_during_scheduling_ignored_during_execution.node_selector_terms.match_expressions]]
                  key = "app"
                  operator = "In"
                  values = [ "gitlab-runner" ]
                [[runners.kubernetes.affinity.node_affinity.required_during_scheduling_ignored_during_execution.node_selector_terms.match_expressions]]
                  key = "customLabel"
                  operator = "In"
                  values = [ "customValue" ]
    
              [[runners.kubernetes.affinity.node_affinity.preferred_during_scheduling_ignored_during_execution]]
                weight = 1
    
                [runners.kubernetes.affinity.node_affinity.preferred_during_scheduling_ignored_during_execution.preference]
                  [[runners.kubernetes.affinity.node_affinity.preferred_during_scheduling_ignored_during_execution.preference.match_expressions]]
                    key = "eks.amazonaws.com/capacityType"
                    operator = "In"
                    values = [ "ON_DEMAND" ]
    
        [runners.kubernetes.node_tolerations]
          "app=gitlab-runner" = "NoSchedule"
          "node-role.kubernetes.io/master" = "NoSchedule"
          "custom.toleration=value" = "NoSchedule"
          "empty.value=" = "PreferNoSchedule"
          onlyKey = ""
    
  • Avoid massive resource consumption by defaulting to (very?) strict resource limits and 0 requests.

    [[runners]]
      [runners.kubernetes]
        cpu_request = "0"
        cpu_limit = "2"
        memory_request = "0"
        memory_limit = "2Gi"
        ephemeral_storage_request = "0"
        ephemeral_storage_limit = "512Mi"
    
        helper_cpu_request = "0"
        helper_cpu_limit = "0.5"
        helper_memory_request = "0"
        helper_memory_limit = "128Mi"
        helper_ephemeral_storage_request = "0"
        helper_ephemeral_storage_limit = "64Mi"
    
        service_cpu_request = "0"
        service_cpu_limit = "1"
        service_memory_request = "0"
        service_memory_limit = "0.5Gi"
    
  • Play nice and leave some space for the host's other workloads by allowing resource request and limit overrides only up to a point.

    [[runners]]
      [runners.kubernetes]
        cpu_limit_overwrite_max_allowed = "15"
        cpu_request_overwrite_max_allowed = "15"
        memory_limit_overwrite_max_allowed = "62Gi"
        memory_request_overwrite_max_allowed = "62Gi"
        ephemeral_storage_limit_overwrite_max_allowed = "49Gi"
        ephemeral_storage_request_overwrite_max_allowed = "49Gi"
    
        helper_cpu_limit_overwrite_max_allowed = "0.9"
        helper_cpu_request_overwrite_max_allowed = "0.9"
        helper_memory_limit_overwrite_max_allowed = "1Gi"
        helper_memory_request_overwrite_max_allowed = "1Gi"
        helper_ephemeral_storage_limit_overwrite_max_allowed = "1Gi"
        helper_ephemeral_storage_request_overwrite_max_allowed = "1Gi"
    
        service_cpu_limit_overwrite_max_allowed = "3.9"
        service_cpu_request_overwrite_max_allowed = "3.9"
        service_memory_limit_overwrite_max_allowed = "15.5Gi"
        service_memory_request_overwrite_max_allowed = "15.5Gi"
        service_ephemeral_storage_limit_overwrite_max_allowed = "15Gi"
        service_ephemeral_storage_request_overwrite_max_allowed = "15Gi"
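
    Jobs then request overwrites (up to the maximums above) through the Kubernetes executor's CI/CD variables; the job name and values below are illustrative:

```yaml
heavy-build:
  variables:
    KUBERNETES_CPU_REQUEST: "4"
    KUBERNETES_CPU_LIMIT: "8"
    KUBERNETES_MEMORY_REQUEST: "8Gi"
    KUBERNETES_MEMORY_LIMIT: "16Gi"
  script:
    - make build
```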
    

Autoscaling

Docker Machine

A runner like any other, just configured to use the docker+machine executor.

Only cloud providers with a Docker Machine driver are supported.

Using this executor opens up specific configuration settings.

Pitfalls:

Example configuration
# Number of jobs *in total* that can be run concurrently by *all* configured runners
# Does *not* affect the *total* upper limit of VMs created by *all* providers
concurrent = 40

[[runners]]
  name = "static-scaler"

  url = "https://gitlab.example.org"
  token = "abcdefghijklmnopqrst"

  executor = "docker+machine"
  environment = [ "AWS_REGION=eu-west-1" ]

  # Number of jobs that can be run concurrently by the VMs created by *this* runner
  # Defines the *upper limit* of how many VMs can be created by *this* runner, since it is 1 task per VM at a time
  limit = 10

  [runners.machine]
    # Static number of VMs that need to be idle at all times
    IdleCount = 0

    # Remove VMs after 5m in the idle state
    IdleTime = 300

    # Maximum number of VMs that can be added to this runner in parallel
    # Defaults to 0 (no limit)
    MaxGrowthRate = 1

    # Template for the VMs' names
    # Must contain '%s'
    MachineName = "static-ondemand-%s"

    MachineDriver = "amazonec2"
    MachineOptions = [
      # Refer to the correct driver at 'https://gitlab.com/gitlab-org/ci-cd/docker-machine/-/tree/main/docs/drivers'
      "amazonec2-region=eu-west-1",
      "amazonec2-vpc-id=vpc-1234abcd",
      "amazonec2-zone=a",                              # driver limitation, only 1 allowed
      "amazonec2-subnet-id=subnet-0123456789abcdef0",  # subnet-id in the specified az
      "amazonec2-use-private-address=true",
      "amazonec2-private-address-only=true",
      "amazonec2-security-group=GitlabRunners",

      "amazonec2-instance-type=m6i.large",
      "amazonec2-root-size=50",
      "amazonec2-iam-instance-profile=GitlabRunnerEc2",
      "amazonec2-tags=Team,Infrastructure,Application,Gitlab Runner,SpotInstance,False",
    ]

[[runners]]
  name = "dynamic-scaler"
  executor = "docker+machine"
  limit = 40  # will still respect the global concurrency value

  [runners.machine]
    # With 'IdleScaleFactor' defined, this becomes the upper limit of VMs that can be idle at all times
    IdleCount = 10

    # *Minimum* number of VMs that need to be idle at all times when 'IdleScaleFactor' is defined
    # Defaults to 1; will be set automatically to 1 if set lower than that
    IdleCountMin = 1

    # Number of VMs that need to be idle at all times, as a factor of the number of machines in use
    # In this case: idle VMs = 1.0 * machines in use, min 1, max 10
    # Must be a floating point number
    # Defaults to 0.0
    IdleScaleFactor = 1.0

    IdleTime = 600

    # Remove VMs after 250 jobs
    # Keeps them fresh
    MaxBuilds = 250

    MachineName = "dynamic-spot-%s"
    MachineDriver = "amazonec2"
    MachineOptions = [
      # Refer to the correct driver at 'https://gitlab.com/gitlab-org/ci-cd/docker-machine/-/tree/main/docs/drivers'
      "amazonec2-region=eu-west-1",
      "amazonec2-vpc-id=vpc-1234abcd",
      "amazonec2-zone=b",                              # driver limitation, only 1 allowed
      "amazonec2-subnet-id=subnet-abcdef0123456789a",  # subnet-id in the specified az
      "amazonec2-use-private-address=true",
      "amazonec2-private-address-only=true",
      "amazonec2-security-group=GitlabRunners",

      "amazonec2-instance-type=r7a.large",
      "amazonec2-root-size=25",
      "amazonec2-iam-instance-profile=GitlabRunnerEc2",
      "amazonec2-tags=Team,Infrastructure,Application,Gitlab Runner,SpotInstance,True",

      "amazonec2-request-spot-instance=true",
      "amazonec2-spot-price=0.3",
    ]

    # Pump up the volume of available VMs during working hours
    [[runners.machine.autoscaling]]
      Periods = ["* * 9-17 * * mon-fri *"] # Every work day between 9 and 18 Amsterdam time
      Timezone = "Europe/Amsterdam"

      IdleCount = 20
      IdleCountMin = 5
      IdleTime = 3600

      # In this case: idle VMs = 1.5 * machines in use, min 5, max 20
      IdleScaleFactor = 1.5

    # Reduce the number of available VMs even further during the weekends
    [[runners.machine.autoscaling]]
      Periods = ["* * * * * sat,sun *"]
      IdleCount = 0
      IdleTime = 120
      Timezone = "UTC"
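
The interplay between IdleCount, IdleCountMin and IdleScaleFactor above can be sketched as follows (a simplified model of the documented behaviour, not the runner's actual code):

```python
def idle_vms(in_use, idle_count, idle_count_min=1, idle_scale_factor=0.0):
    """Number of VMs the runner tries to keep idle.

    With IdleScaleFactor unset (0.0), IdleCount is a static target.
    Otherwise IdleCount becomes the *upper limit* of
    IdleScaleFactor * machines-in-use, floored at IdleCountMin.
    """
    if idle_scale_factor <= 0.0:
        return idle_count
    desired = int(idle_scale_factor * in_use)
    return max(idle_count_min, min(desired, idle_count))

# 'static-scaler': IdleCount = 0, no scale factor -> no idle VMs kept
print(idle_vms(5, 0))           # 0
# 'dynamic-scaler': IdleCount = 10, IdleCountMin = 1, IdleScaleFactor = 1.0
print(idle_vms(0, 10, 1, 1.0))  # 1  (floored at IdleCountMin)
print(idle_vms(4, 10, 1, 1.0))  # 4
print(idle_vms(25, 10, 1, 1.0)) # 10 (capped at IdleCount)
```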

Further readings

Sources