Gitlab runner
TL;DR
Installation
brew install 'gitlab-runner'
dnf install 'gitlab-runner'
docker pull 'gitlab/gitlab-runner'
helm --namespace 'gitlab' upgrade --install --create-namespace --version '0.64.1' --repo 'https://charts.gitlab.io' \
'gitlab-runner' -f 'values.gitlab-runner.yml' 'gitlab-runner'
Usage
docker run --rm --name 'runner' 'gitlab/gitlab-runner:alpine-v13.6.0' --version
# `gitlab-runner exec` is deprecated and has been removed in 17.0. ┌П┐(ಠ_ಠ) Gitlab.
# See https://docs.gitlab.com/16.11/runner/commands/#gitlab-runner-exec-deprecated.
gitlab-runner exec docker 'job-name'
gitlab-runner exec docker \
--env 'AWS_ACCESS_KEY_ID=AKIA…' --env 'AWS_SECRET_ACCESS_KEY=F…s' --env 'AWS_REGION=eu-west-1' \
--env 'DOCKER_AUTH_CONFIG={ "credsStore": "ecr-login" }' \
--docker-volumes "$HOME/.aws/credentials:/root/.aws/credentials:ro"
'job-requiring-ecr-access'
Each runner executor is assigned 1 task at a time by default.
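To let runners take on more jobs in parallel, raise the global concurrent value and the per-runner limit in the config.toml file. A minimal sketch, with illustrative values:

concurrent = 4  # max number of jobs running at once across all runners in this file
[[runners]]
  limit = 2  # max number of jobs this runner runs at once; 0 means no limit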
Runners seem to require the main instance to give the full certificate chain upon connection.
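One can check what chain the instance actually presents with openssl; missing intermediate certificates here usually explain runners failing TLS verification (gitlab.example.org is a placeholder):

# the whole chain is printed; the intermediates must be present, not just the leaf certificate
openssl s_client -connect 'gitlab.example.org:443' -servername 'gitlab.example.org' -showcerts < /dev/null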
The runners.autoscaler.policy.periods setting appears to take full-blown cron expressions, not just time frames.
Given the following policies:
[[runners]]
[runners.autoscaler]
[[runners.autoscaler.policy]]
periods = [ "* 7-19 * * mon-fri" ]
…
[[runners.autoscaler.policy]]
periods = [ "30 8-18 * * mon-fri" ]
…
This will not behave as "apply policy 1 between 07:00 and 19:00, but override it with policy 2 between 08:30 and
18:30".
Instead, the runner will:
- Apply policy 1 every minute of every hour between 07:00 and 19:00, and
- Override policy 1 by applying policy 2 only on the 30th minute of every hour between 08:00 and 18:00.
Meaning it will reapply policy 1 at the 31st minute of every hour in the period defined by policy 2.
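To actually keep policy 2 active for the whole 08:30-18:30 window, the minutes must be spelled out across several cron expressions. A sketch (untested, with the same policy fields elided as above):

[[runners.autoscaler.policy]]
  periods = [
    "30-59 8 * * mon-fri",  # 08:30-08:59
    "* 9-17 * * mon-fri",   # 09:00-17:59
    "0-30 18 * * mon-fri",  # 18:00-18:30
  ]
  …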
Pull images from private AWS ECR registries
- Create an IAM Role in one's AWS account and attach it the arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly IAM policy.
- Create an InstanceProfile using the above IAM Role.
- Create an EC2 Instance.
  Make it use the above InstanceProfile.
- Install the Docker Engine and the Gitlab runner on the EC2 Instance.
- Install the Amazon ECR Docker Credential Helper.
- Configure an AWS Region in /root/.aws/config:

  [default]
  region = eu-west-1

- Create the /root/.docker/config.json file and add the following line to it:

  {
    …
    "credsStore": "ecr-login"
  }

- Configure the runner to use the docker or docker+machine executor:

  [[runners]]
    executor = "docker"  # or "docker+machine"

- Configure the runner to use the ECR Credential Helper:

  [[runners]]
    [runners.docker]
      environment = [ 'DOCKER_AUTH_CONFIG={"credsStore":"ecr-login"}' ]

- Configure jobs to use images saved in private AWS ECR registries:

  phpunit:
    stage: testing
    image:
      name: 123456789123.dkr.ecr.eu-west-1.amazonaws.com/php-gitlabrunner:latest
      entrypoint: [""]
    script:
      - php ./vendor/bin/phpunit --coverage-text --colors=never
The runner should now automatically authenticate to one's private ECR registries.
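To verify the setup, pull an image manually on the instance: the credential helper should authenticate transparently, without any docker login (the registry below is the example one from above):

docker pull '123456789123.dkr.ecr.eu-west-1.amazonaws.com/php-gitlabrunner:latest'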
Executors
Docker Autoscaler executor
Refer Docker Autoscaler executor.
Autoscale-enabled wrapper around the docker executor. Supports all of the docker executor's options and features.
Creates instances on-demand to accommodate jobs processed by the runner leveraging it, which acts as manager.
The runner itself will not execute jobs, just delegate them.
Leverages fleeting plugins to scale automatically.
Fleeting is an abstraction for a group of autoscaled instances, and uses plugins supporting cloud providers.
Both the manager and the instances executing jobs require the Docker Engine to be installed.
The manager will connect to the instances via SSH and execute Docker commands there. The user it connects as must be
able to execute those commands (most likely by being part of the docker group on the instances).
Container images are pulled by the manager and sent to the instances it creates.
The instances do not require container registry access themselves this way.
Add the following settings in the config.toml file:
[[runners]]
executor = "docker-autoscaler"
[runners.docker]
image = "busybox:latest" # or whatever
[runners.autoscaler]
plugin = "aws:latest" # or 'googlecloud' or 'azure' or whatever
[runners.autoscaler.plugin_config]
name = "…" # see plugin docs
[[runners.autoscaler.policy]]
idle_count = 5
idle_time = "20m0s"
Example: AWS, 1 instance per job, 5 idle instances for 20min.
Give each job a dedicated instance.
As soon as the job completes, the instance is immediately deleted.
Try to keep 5 whole instances available for future demand.
Idle instances stay available for at least 20 minutes.
Requirements:

- An EC2 instance with Docker Engine to act as manager.
- A Launch Template referencing an AMI equipped with Docker Engine for the runners to use.
  Alternatively, any AMI that can run Docker Engine can be used, as long as an appropriate cloud-init configuration is provided in the template's userData:

  packages: [ "docker" ]
  runcmd:
    - systemctl daemon-reload
    - systemctl enable --now docker.service
    - grep docker /etc/group -q && usermod -a -G docker ec2-user

  In this case, and especially if the cloud-init process takes long, instances might be considered ready by the ASG, but jobs might fail if the Docker Engine is not installed and configured properly before they are assigned to the instances.
  Consider creating a new AMI with everything ready for the LT to use, or set up a lifecycle hook in the ASG to give instances time to finish preparations before being considered ready by the ASG.
- An AutoScaling Group with the following settings (see the sketch after this list):
  - Minimum capacity = 0.
  - Desired capacity = 0.
  The runner will take care of scaling up and down.
- An IAM Policy granting the manager instance the permissions needed to scale the ASG.
  Refer the Recommended IAM Policy.

  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Sid": "AllowAsgDiscovering",
        "Effect": "Allow",
        "Action": [
          "autoscaling:DescribeAutoScalingGroups",
          "ec2:DescribeInstances"
        ],
        "Resource": "*"
      },
      {
        "Sid": "AllowAsgScaling",
        "Effect": "Allow",
        "Action": [
          "autoscaling:SetDesiredCapacity",
          "autoscaling:TerminateInstanceInAutoScalingGroup"
        ],
        "Resource": "arn:aws:autoscaling:eu-west-1:012345678901:autoScalingGroup:01234567-abcd-0123-abcd-0123456789ab:autoScalingGroupName/runners-autoscalingGroup"
      },
      {
        "Sid": "AllowManagingAccessToAsgInstances",
        "Effect": "Allow",
        "Action": "ec2-instance-connect:SendSSHPublicKey",
        "Resource": "arn:aws:ec2:eu-west-1:012345678901:instance/*",
        "Condition": {
          "StringEquals": {
            "ec2:ResourceTag/aws:autoscaling:groupName": "runners-autoscalingGroup"
          }
        }
      }
    ]
  }

- [if needed] The Amazon ECR Docker Credential Helper installed on the manager instance.
- [if needed] An IAM Policy granting the manager instance the permissions needed to pull images from ECR registries:

  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Sid": "AllowAuthenticatingWithEcr",
        "Effect": "Allow",
        "Action": "ecr:GetAuthorizationToken",
        "Resource": "*"
      },
      {
        "Sid": "AllowPullingImagesFromEcr",
        "Effect": "Allow",
        "Action": [
          "ecr:BatchGetImage",
          "ecr:GetDownloadUrlForLayer"
        ],
        "Resource": "arn:aws:ecr:eu-west-1:012345678901:repository/some-repo/busybox"
      }
    ]
  }
Procedure:
- Configure the default AWS Region for the AWS SDK to use in /root/.aws/config:

  [default]
  region = eu-west-1

  This could probably just be configured in the executor's settings, but I still need to confirm it:

  [[runners]]
    executor = "docker-autoscaler"
    environment = [ "AWS_REGION=eu-west-1" ]

- Install the gitlab runner on the manager instance.
  Configure it to use the docker-autoscaler executor:

  concurrent = 10
  [[runners]]
    name = "docker autoscaler"
    url = "https://gitlab.example.org"
    token = "<token>"
    executor = "docker-autoscaler"
    [runners.docker]
      image = "012345678901.dkr.ecr.eu-west-1.amazonaws.com/some-repo/busybox:latest"
    [runners.autoscaler]
      plugin = "aws"
      max_instances = 10
      [runners.autoscaler.plugin_config]
        name = "my-docker-asg"  # the required ASG name
      [[runners.autoscaler.policy]]
        idle_count = 5
        idle_time = "20m0s"

- Install the fleeting plugin:

  gitlab-runner fleeting install
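Once configured, the runner can be sanity-checked from the manager instance with standard gitlab-runner subcommands:

gitlab-runner list    # shows the runners configured in config.toml
gitlab-runner verify  # checks they can authenticate against the Gitlab instance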
Docker Machine executor
Deprecated in GitLab 17.5.
If using this executor with EC2 instances, Azure Compute, or GCE, migrate to the GitLab Runner Autoscaler.
Using this executor opens up specific configuration settings.
Pitfalls:
- On AWS, the driver supports only one subnet (and hence 1 AZ) per runner.
See AWS driver does not support multiple non default subnets and Docker Machine's AWS driver's options.
Example configuration
# Number of jobs *in total* that can be run concurrently by *all* configured runners
# Does *not* affect the *total* upper limit of VMs created by *all* providers
concurrent = 40
[[runners]]
name = "static-scaler"
url = "https://gitlab.example.org"
token = "abcdefghijklmnopqrst"
executor = "docker+machine"
environment = [ "AWS_REGION=eu-west-1" ]
# Number of jobs that can be run concurrently by the VMs created by *this* runner
# Defines the *upper limit* of how many VMs can be created by *this* runner, since it is 1 task per VM at a time
limit = 10
[runners.machine]
# Static number of VMs that need to be idle at all times
IdleCount = 0
# Remove VMs after 5m in the idle state
IdleTime = 300
# Maximum number of VMs that can be added to this runner in parallel
# Defaults to 0 (no limit)
MaxGrowthRate = 1
# Template for the VMs' names
# Must contain '%s'
MachineName = "static-ondemand-%s"
MachineDriver = "amazonec2"
MachineOptions = [
# Refer the correct driver at 'https://gitlab.com/gitlab-org/ci-cd/docker-machine/-/tree/main/docs/drivers'
"amazonec2-region=eu-west-1",
"amazonec2-vpc-id=vpc-1234abcd",
"amazonec2-zone=a", # driver limitation, only 1 allowed
"amazonec2-subnet-id=subnet-0123456789abcdef0", # subnet-id in the specified az
"amazonec2-use-private-address=true",
"amazonec2-private-address-only=true",
"amazonec2-security-group=GitlabRunners",
"amazonec2-instance-type=m6i.large",
"amazonec2-root-size=50",
"amazonec2-iam-instance-profile=GitlabRunnerEc2",
"amazonec2-tags=Team,Infrastructure,Application,Gitlab Runner,SpotInstance,False",
]
[[runners]]
name = "dynamic-scaler"
executor = "docker+machine"
limit = 40 # will still respect the global concurrency value
[runners.machine]
# With 'IdleScaleFactor' defined, this becomes the upper limit of VMs that can be idle at all times
IdleCount = 10
# *Minimum* number of VMs that need to be idle at all times when 'IdleScaleFactor' is defined
# Defaults to 1; will be set automatically to 1 if set lower than that
IdleCountMin = 1
# Number of VMs that need to be idle at all times, as a factor of the number of machines in use
# In this case: idle VMs = 1.0 * machines in use, min 1, max 10
# Must be a floating point number
# Defaults to 0.0
IdleScaleFactor = 1.0
IdleTime = 600
# Remove VMs after 250 jobs
# Keeps them fresh
MaxBuilds = 250
MachineName = "dynamic-spot-%s"
MachineDriver = "amazonec2"
MachineOptions = [
# Refer the correct driver at 'https://gitlab.com/gitlab-org/ci-cd/docker-machine/-/tree/main/docs/drivers'
"amazonec2-region=eu-west-1",
"amazonec2-vpc-id=vpc-1234abcd",
"amazonec2-zone=b", # driver limitation, only 1 allowed
"amazonec2-subnet-id=subnet-abcdef0123456789a", # subnet-id in the specified az
"amazonec2-use-private-address=true",
"amazonec2-private-address-only=true",
"amazonec2-security-group=GitlabRunners",
"amazonec2-instance-type=r7a.large",
"amazonec2-root-size=25",
"amazonec2-iam-instance-profile=GitlabRunnerEc2",
"amazonec2-tags=Team,Infrastructure,Application,Gitlab Runner,SpotInstance,True",
"amazonec2-request-spot-instance=true",
"amazonec2-spot-price=0.3",
]
# Pump up the volume of available VMs during working hours
[[runners.machine.autoscaling]]
Periods = ["* * 9-17 * * mon-fri *"] # Every work day between 9 and 18 Amsterdam time
Timezone = "Europe/Amsterdam"
IdleCount = 20
IdleCountMin = 5
IdleTime = 3600
# In this case: idle VMs = 1.5 * machines in use, min 5, max 20
IdleScaleFactor = 1.5
# Reduce the number of available VMs even further during weekends
[[runners.machine.autoscaling]]
Periods = ["* * * * * sat,sun *"]
Timezone = "UTC"
IdleCount = 0
IdleTime = 120
Instance executor
Refer Instance executor.
Autoscale-enabled executor that creates instances on-demand to accommodate the expected volume of jobs processed by the runner manager.
Useful when jobs need full access to the host instance, operating system, and attached devices.
Can be configured to accommodate single and multi-tenant jobs with various levels of isolation and security.
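A minimal configuration sketch, mirroring the docker-autoscaler example above; the executor value is the documented one, while the ASG name and policy values are assumptions:

concurrent = 10
[[runners]]
  name = "instance autoscaler"
  url = "https://gitlab.example.org"
  token = "<token>"
  executor = "instance"
  [runners.autoscaler]
    plugin = "aws"
    max_instances = 10
    [runners.autoscaler.plugin_config]
      name = "my-instances-asg"  # assumed ASG name
    [[runners.autoscaler.policy]]
      idle_count = 1
      idle_time = "20m0s"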
Autoscaling
Refer GitLab Runner Autoscaling.
GitLab Runner can automatically scale using public cloud instances when configured to use an autoscaler.
Autoscaling options are available for public cloud instances and the following orchestration solutions:
- OpenShift.
- Kubernetes.
- Amazon ECS clusters using Fargate.
Docker Machine
Refer Autoscaling GitLab Runner on AWS EC2.
One or more runners must act as managers, and be configured to use the Docker Machine executor.
Managers interact with the cloud infrastructure to create multiple runner instances to execute jobs.
Cloud instances acting as managers shall not be spot instances.
GitLab Runner Autoscaler
Refer GitLab Runner Autoscaler.
Successor to the Docker Machine executor.
Composed of:
- Taskscaler: manages the autoscaling logic, bookkeeping, and fleet creation.
- Fleeting: abstraction for cloud-provided virtual machines.
- Cloud provider plugin: handles the API calls to the target cloud platform.
One or more runners must act as managers.
Managers interact with the cloud infrastructure to create multiple runner instances to execute jobs.
Cloud instances acting as managers shall not be spot instances.
Managers must be configured to use one of the autoscaling-specific executors: the Instance executor or the Docker Autoscaler executor.
Kubernetes
Store tokens in secrets instead of putting the token in the chart's values.
Requirements:
- A running and configured Gitlab instance.
- A running Kubernetes cluster.
Installation procedure
- [best practice] Create a dedicated namespace:

  kubectl create namespace 'gitlab'

- Create a runner in Gitlab.
  Via the Web UI:

  - Go to one's Gitlab instance's /admin/runners page.
  - Click on the New instance runner button.
  - Keep Linux as runner type.
  - Click on the Create runner button.
  - Copy the runner's token.

  Via the API:

  curl -X 'POST' 'https://gitlab.example.org/api/v4/user/runners' -H 'PRIVATE-TOKEN: glpat-m-…' \
    -d 'runner_type=instance_type' -d 'tag_list=small,instance' -d 'run_untagged=false' \
    -d 'description=a runner'

- (Re-)Create the runners' Kubernetes secret with the runner's token from the previous step:

  kubectl --namespace 'gitlab' delete secret 'gitlab-runner-token' --ignore-not-found
  kubectl --namespace 'gitlab' create secret generic 'gitlab-runner-token' \
    --from-literal='runner-registration-token=""' --from-literal='runner-token=glrt-…'

- [best practice] Be sure to match the runner version with the Gitlab server's:

  helm search repo --versions 'gitlab/gitlab-runner'

- Install the helm chart.
  The secret's name must be matched in the helm chart's values file:

  helm --namespace 'gitlab' upgrade --install 'gitlab-runner-manager' \
    --repo 'https://charts.gitlab.io' 'gitlab-runner' --version '0.69.0' \
    --values 'values.yaml' --set 'runners.secret=gitlab-runner-token'
Example helm chart values
gitlabUrl: https://gitlab.example.org/
unregisterRunners: true
concurrent: 20
checkInterval: 3
rbac:
create: true
metrics:
enabled: true
runners:
name: "runner-on-k8s"
secret: gitlab-runner-token
config: |
[[runners]]
[runners.cache]
Shared = true
[runners.kubernetes]
namespace = "{{.Release.Namespace}}"
image = "alpine"
pull_policy = [
"if-not-present",
"always"
]
allowed_pull_policies = [
"if-not-present",
"always",
"never"
]
[runners.kubernetes.affinity]
[runners.kubernetes.affinity.node_affinity]
[runners.kubernetes.affinity.node_affinity.required_during_scheduling_ignored_during_execution]
[[runners.kubernetes.affinity.node_affinity.required_during_scheduling_ignored_during_execution.node_selector_terms]]
[[runners.kubernetes.affinity.node_affinity.required_during_scheduling_ignored_during_execution.node_selector_terms.match_expressions]]
key = "org.example.reservation/app"
operator = "In"
values = [ "gitlab" ]
[[runners.kubernetes.affinity.node_affinity.required_during_scheduling_ignored_during_execution.node_selector_terms.match_expressions]]
key = "org.example.reservation/component"
operator = "In"
values = [ "runner" ]
[[runners.kubernetes.affinity.node_affinity.preferred_during_scheduling_ignored_during_execution]]
weight = 1
[runners.kubernetes.affinity.node_affinity.preferred_during_scheduling_ignored_during_execution.preference]
[[runners.kubernetes.affinity.node_affinity.preferred_during_scheduling_ignored_during_execution.preference.match_expressions]]
key = "eks.amazonaws.com/capacityType"
operator = "In"
values = [ "ON_DEMAND" ]
[runners.kubernetes.node_tolerations]
"reservation/app=gitlab" = "NoSchedule"
"reservation/component=runner" = "NoSchedule"
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
preference:
matchExpressions:
- key: eks.amazonaws.com/capacityType
operator: In
values:
- ON_DEMAND
tolerations:
- key: app
operator: Equal
value: gitlab
- key: component
operator: Equal
value: runner
podLabels:
team: engineering
Gotchas:
- The build, helper and any service containers will all reside in a single pod.
  If the sum of the resources requested by all of them is too high, the pod will not be scheduled, and the pipeline will hang and fail.
- If any pod is killed due to OOM, the pipeline that spawned it will hang until it times out.
Improvements:
- Keep the manager pod on stable nodes:

  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1
          preference:
            matchExpressions:
              - key: eks.amazonaws.com/capacityType
                operator: In
                values:
                  - ON_DEMAND

- Dedicate specific nodes to runner executors.
  Taint dedicated nodes, and add tolerations and affinities to the runner's configuration:

  [[runners]]
    [runners.kubernetes]
      [runners.kubernetes.node_selector]
        gitlab = "true"
        "kubernetes.io/arch" = "amd64"
      [runners.kubernetes.affinity]
        [runners.kubernetes.affinity.node_affinity]
          [runners.kubernetes.affinity.node_affinity.required_during_scheduling_ignored_during_execution]
            [[runners.kubernetes.affinity.node_affinity.required_during_scheduling_ignored_during_execution.node_selector_terms]]
              [[runners.kubernetes.affinity.node_affinity.required_during_scheduling_ignored_during_execution.node_selector_terms.match_expressions]]
                key = "app"
                operator = "In"
                values = [ "gitlab-runner" ]
              [[runners.kubernetes.affinity.node_affinity.required_during_scheduling_ignored_during_execution.node_selector_terms.match_expressions]]
                key = "customLabel"
                operator = "In"
                values = [ "customValue" ]
          [[runners.kubernetes.affinity.node_affinity.preferred_during_scheduling_ignored_during_execution]]
            weight = 1
            [runners.kubernetes.affinity.node_affinity.preferred_during_scheduling_ignored_during_execution.preference]
              [[runners.kubernetes.affinity.node_affinity.preferred_during_scheduling_ignored_during_execution.preference.match_expressions]]
                key = "eks.amazonaws.com/capacityType"
                operator = "In"
                values = [ "ON_DEMAND" ]
      [runners.kubernetes.node_tolerations]
        "app=gitlab-runner" = "NoSchedule"
        "node-role.kubernetes.io/master" = "NoSchedule"
        "custom.toleration=value" = "NoSchedule"
        "empty.value=" = "PreferNoSchedule"
        onlyKey = ""

- Avoid massive resource consumption by defaulting to (very?) strict resource limits and 0 requests:

  [[runners]]
    [runners.kubernetes]
      cpu_request = "0"
      cpu_limit = "2"
      memory_request = "0"
      memory_limit = "2Gi"
      ephemeral_storage_request = "0"
      ephemeral_storage_limit = "512Mi"
      helper_cpu_request = "0"
      helper_cpu_limit = "0.5"
      helper_memory_request = "0"
      helper_memory_limit = "128Mi"
      helper_ephemeral_storage_request = "0"
      helper_ephemeral_storage_limit = "64Mi"
      service_cpu_request = "0"
      service_cpu_limit = "1"
      service_memory_request = "0"
      service_memory_limit = "0.5Gi"

- Play nice, and leave some space for the host's other workloads, by allowing resource request and limit overrides only up to a point:

  [[runners]]
    [runners.kubernetes]
      cpu_limit_overwrite_max_allowed = "15"
      cpu_request_overwrite_max_allowed = "15"
      memory_limit_overwrite_max_allowed = "62Gi"
      memory_request_overwrite_max_allowed = "62Gi"
      ephemeral_storage_limit_overwrite_max_allowed = "49Gi"
      ephemeral_storage_request_overwrite_max_allowed = "49Gi"
      helper_cpu_limit_overwrite_max_allowed = "0.9"
      helper_cpu_request_overwrite_max_allowed = "0.9"
      helper_memory_limit_overwrite_max_allowed = "1Gi"
      helper_memory_request_overwrite_max_allowed = "1Gi"
      helper_ephemeral_storage_limit_overwrite_max_allowed = "1Gi"
      helper_ephemeral_storage_request_overwrite_max_allowed = "1Gi"
      service_cpu_limit_overwrite_max_allowed = "3.9"
      service_cpu_request_overwrite_max_allowed = "3.9"
      service_memory_limit_overwrite_max_allowed = "15.5Gi"
      service_memory_request_overwrite_max_allowed = "15.5Gi"
      service_ephemeral_storage_limit_overwrite_max_allowed = "15Gi"
      service_ephemeral_storage_request_overwrite_max_allowed = "15Gi"
Further readings
- Gitlab
- Amazon ECR Docker Credential Helper
- Gitlab's docker machine fork
- Gitlab's gitlab-runner-operator for OpenShift and Kubernetes
- Docker Machine Executor autoscale configuration
- Fleeting
Sources
- Install Gitlab runner
- Docker executor
- Authenticating your GitLab CI runner to an AWS ECR registry using Amazon ECR Docker Credential Helper
- Install and register GitLab Runner for autoscaling with Docker Machine
- AWS driver does not support multiple non default subnets
- GitLab Runner Helm Chart
- GitLab Runner Autoscaling
- Autoscaling GitLab Runner on AWS EC2
- Instance executor
- Docker Autoscaler executor