mirror of
https://gitea.com/mcereda/oam.git
synced 2026-02-09 05:44:23 +00:00
929 lines
32 KiB
Markdown
929 lines
32 KiB
Markdown
# Gitlab runner
|
|
|
|
1. [TL;DR](#tldr)
|
|
1. [Pull images from private AWS ECR registries](#pull-images-from-private-aws-ecr-registries)
|
|
1. [Executors](#executors)
|
|
1. [Docker Autoscaler executor](#docker-autoscaler-executor)
|
|
1. [Docker Machine executor](#docker-machine-executor)
|
|
1. [Instance executor](#instance-executor)
|
|
1. [Autoscaling](#autoscaling)
|
|
1. [Docker Machine](#docker-machine)
|
|
1. [GitLab Runner Autoscaler](#gitlab-runner-autoscaler)
|
|
1. [Kubernetes](#kubernetes)
|
|
1. [Further readings](#further-readings)
|
|
1. [Sources](#sources)
|
|
|
|
## TL;DR
|
|
|
|
<details>
|
|
<summary>Installation</summary>
|
|
|
|
```sh
|
|
brew install 'gitlab-runner'
|
|
dnf install 'gitlab-runner'
|
|
docker pull 'gitlab/gitlab-runner'
|
|
helm --namespace 'gitlab' upgrade --install --create-namespace --version '0.64.1' --repo 'https://charts.gitlab.io' \
|
|
'gitlab-runner' -f 'values.gitlab-runner.yml' 'gitlab-runner'
|
|
```
|
|
|
|
</details>
|
|
|
|
<details style="padding-bottom: 1em;">
|
|
<summary>Usage</summary>
|
|
|
|
```sh
|
|
docker run --rm --name 'runner' 'gitlab/gitlab-runner:alpine-v13.6.0' --version
|
|
|
|
# `gitlab-runner exec` is deprecated and has been removed in 17.0. ┌П┐(ಠ_ಠ) Gitlab.
|
|
# See https://docs.gitlab.com/16.11/runner/commands/#gitlab-runner-exec-deprecated.
|
|
gitlab-runner exec docker 'job-name'
|
|
gitlab-runner exec docker \
|
|
--env 'AWS_ACCESS_KEY_ID=AKIA…' --env 'AWS_SECRET_ACCESS_KEY=F…s' --env 'AWS_REGION=eu-east-1' \
|
|
--env 'DOCKER_AUTH_CONFIG={ "credsStore": "ecr-login" }' \
|
|
--docker-volumes "$HOME/.aws/credentials:/root/.aws/credentials:ro"
|
|
'job-requiring-ecr-access'
|
|
|
|
# Force a configuration file reload.
|
|
sudo kill -HUP $(pidof 'gitlab-runner')
|
|
sudo pkill -HUP 'gitlab-runner'
|
|
```
|
|
|
|
</details>
|
|
|
|
Each _Worker_ is assigned a single task at a time by default.
|
|
|
|
Runners require the Server to provide the full certificate chain upon connection.
|
|
|
|
The `runners.autoscaler.policy.periods` setting appears to be a full blown cron job, not just a time frame.
|
|
|
|
<details style="margin-top: -1em; padding: 0 0 1em 1em;">
|
|
|
|
Given the following policies:
|
|
|
|
```toml
|
|
[[runners]]
|
|
[runners.autoscaler]
|
|
[[runners.autoscaler.policy]] # policy 1
|
|
periods = [ "* 7-19 * * mon-fri" ]
|
|
…
|
|
[[runners.autoscaler.policy]] # policy 2
|
|
periods = [ "30 8-18 * * mon-fri" ]
|
|
…
|
|
```
|
|
|
|
It will **not** work as _apply policy 1 between 07:00 and 19:00 but override it with policy 2 between 08:30 and
|
|
18:30_.<br/>
|
|
Instead, the Runner will:
|
|
|
|
- Apply policy 1 every minute of every hour between 07:00 and 19:00, **and**
|
|
- Override policy 1 by applying policy 2 only on the 30th minute of every hour between 08:00 and 18:00.
|
|
|
|
Meaning it will reapply policy 1 at the 31st minute of every hour in the period defined by policy 2.
|
|
|
|
</details>
|
|
|
|
One can use system signals to interact with Runners.
|
|
|
|
<details style="margin-top: -1em; padding: 0 0 1em 1em;">
|
|
|
|
| Signal | Command it operates on | Effect | Example |
|
|
| ---------------------- | ---------------------- | ------------------------------------------------------------------------------------------------------------- | -------------------------------------- |
|
|
| `SIGINT` | `register` | Cancel ongoing Runner registrations.<br/>Delete Runners if already registered. | |
|
|
| `SIGINT`<br/>`SIGTERM` | `run`<br/>`run-single` | Abort all running builds and exit as soon as possible.<br/>Use twice to exit immediately (forceful shutdown). | |
|
|
| `SIGQUIT` | `run`<br/>`run-single` | Stop accepting new builds and exit as soon as the currently running builds finish (graceful shutdown). | `sudo kill -SIGQUIT <main_runner_pid>` |
|
|
| `SIGHUP` | run | Force reloading the configuration file. | `sudo kill -SIGHUP <main_runner_pid>` |
|
|
|
|
> Do **not** use `killall` or `pkill` for graceful shutdowns if one is using the `shell` or `docker` executors.<br/>
|
|
> This causes improper handling of the signals due to subprocesses being killed as well. Only use it on the main process
|
|
> handling the jobs.
|
|
|
|
</details>
|
|
|
|
## Pull images from private AWS ECR registries
|
|
|
|
FIXME
|
|
|
|
1. Create an IAM Role in one's AWS account and attach it the
|
|
`arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly` IAM policy.
|
|
1. Create and InstanceProfile using the above IAM Role.
|
|
1. Create an EC2 Instance.<br/>
|
|
Make it use the above InstanceProfile.
|
|
1. Install the Docker Engine and the [Gitlab runner][install gitlab runner] on the EC2 Instance.
|
|
1. Install the [Amazon ECR Docker Credential Helper].
|
|
1. Configure an AWS Region in `/root/.aws/config`:
|
|
|
|
```ini
|
|
[default]
|
|
region = eu-west-1
|
|
```
|
|
|
|
1. Create the `/root/.docker/config.json` file and add the following line to it:
|
|
|
|
```diff
|
|
{
|
|
…
|
|
+ "credsStore": "ecr-login"
|
|
}
|
|
```
|
|
|
|
1. Configure the runner to use the [`docker`][docker executor] or [`docker+machine`][docker machine] executor.
|
|
|
|
```toml
|
|
[[runners]]
|
|
executor = "docker" # or "docker+machine"
|
|
```
|
|
|
|
1. Configure the runner to use the ECR Credential Helper:
|
|
|
|
```toml
|
|
[[runners]]
|
|
[runners.docker]
|
|
environment = [ 'DOCKER_AUTH_CONFIG={"credsStore":"ecr-login"}' ]
|
|
```
|
|
|
|
1. Configure jobs to use images saved in private AWS ECR registries:
|
|
|
|
```yaml
|
|
phpunit:
|
|
stage: testing
|
|
image:
|
|
name: 123456789123.dkr.ecr.eu-west-1.amazonaws.com/php-gitlabrunner:latest
|
|
entrypoint: [""]
|
|
script:
|
|
- php ./vendor/bin/phpunit --coverage-text --colors=never
|
|
```
|
|
|
|
Now your GitLab runner should automatically authenticate to one's private ECR registry.
|
|
|
|
## Executors
|
|
|
|
### Docker Autoscaler executor
|
|
|
|
Refer [Docker Autoscaler executor].
|
|
|
|
Autoscale-enabled wrap for the `docker` executor. Supports all `docker` executor's options and features.<br/>
|
|
Creates instances on-demand to accommodate jobs processed by the runner leveraging it, which acts as manager.<br/>
|
|
The runner itself will **not** execute jobs, just delegate them.
|
|
|
|
A Runner registers to the Server and acts as Runner Autoscaler.<br/>
|
|
The Autoscaler shall be configured to use `docker-autoscaler` as the executor, and shall be granted permissions to
|
|
discover and scale the resources it uses to scale automatically.
|
|
|
|
Leverages [fleeting] plugins to scale automatically.<br/>
|
|
This is an abstraction for a group of automatically scaled instances, and uses plugins supporting cloud providers.
|
|
|
|
The Autoscaler connects to the Workers through SSH (if Linux-based) and executes Docker commands.<br/>
|
|
The user it connects with **must** be able to execute Docker commands with**out** a password (most likely by being part
|
|
of the `docker` group on the Worker).
|
|
|
|
Jobs are executed in containers.<br/>
|
|
The container images are pulled by the Autoscaler, which then feeds them to the Workers. As such:
|
|
|
|
- Workers do **not** need direct image access.
|
|
- **Both** the Autoscaler **and** the Workers require the Docker Engine to be installed.
|
|
|
|
Multiple Autoscaler do **not** play well managing the same resources.<br/>
|
|
They end up competing for resources, and will try to delete each other's Workers.
|
|
|
|
<details>
|
|
<summary>Setup</summary>
|
|
|
|
1. Configure the default region for the SDK to use in the runner executor user's related file.<br/>
|
|
Adding it as environment variable in the runner's configuration file does **not** work.
|
|
|
|
<details style="padding-bottom: 1em;">
|
|
<summary>AWS</summary>
|
|
|
|
`$HOME/.aws/config`
|
|
|
|
```ini
|
|
[default]
|
|
region = eu-west-1
|
|
```
|
|
|
|
</details>
|
|
|
|
1. Add at least the following in the runner's `config.toml` file:
|
|
|
|
```toml
|
|
[[runners]]
|
|
executor = "docker-autoscaler"
|
|
|
|
[runners.docker]
|
|
image = "busybox:latest" # or whatever
|
|
|
|
[runners.autoscaler]
|
|
plugin = "aws:latest" # or 'googlecloud' or 'azure' or whatever
|
|
|
|
[runners.autoscaler.plugin_config]
|
|
name = "…" # see plugin docs
|
|
```
|
|
|
|
</details>
|
|
|
|
<details>
|
|
<summary>Example: AWS, 1 instance per job, 5 idle instances for 20min.</summary>
|
|
|
|
Give each job a dedicated instance.<br/>
|
|
As soon as the job completes, the instance is immediately deleted.
|
|
|
|
Try to keep 5 whole instances available for future demand.<br/>
|
|
Idle instances stay available for at least 20 minutes.
|
|
|
|
Requirements:
|
|
|
|
- An EC2 instance with Docker Engine to act as manager.
|
|
- A Launch Template referencing an AMI equipped with Docker Engine for the runners to use.<br/>
|
|
Suggested to use [AL2023 based Amazon ECS AMIs][using al2023 based amazon ecs amis to host containerized workloads] or
|
|
custom ones.
|
|
|
|
Any AMI that can run Docker Engine can be used as-is, as long as an _appropriate_ cloud-init configuration is provided
|
|
in the LT's `userData` field.
|
|
|
|
<details style="margin-top: -1em; padding-bottom: 1em;">
|
|
|
|
```yaml
|
|
packages: [ "docker" ]
|
|
runcmd:
|
|
- systemctl daemon-reload
|
|
- systemctl enable --now docker.service
|
|
- grep docker /etc/group -q && usermod -a -G docker ec2-user
|
|
```
|
|
|
|
</details>
|
|
|
|
In this last case, and specially if the cloud-init process takes long, instances might be considered ready by the ASG,
|
|
but jobs might fail if the Docker Engine is not installed and configured properly before they are assigned to the
|
|
instances.<br/>
|
|
Consider creating a new AMI with everything ready for the LT to use, or set up a lifecycle hook in the ASG to give
|
|
instances time to finish preparations before being considered ready.
|
|
- An AutoScaling Group with the following setting:
|
|
|
|
- Minimum capacity = 0.
|
|
- Desired capacity = 0.
|
|
|
|
The runner will take care of scaling up and down.
|
|
- An IAM Policy granting the **Autoscaler** instance the permissions needed to scale the ASG.<br/>
|
|
Refer the [Recommended IAM Policy](https://gitlab.com/gitlab-org/fleeting/plugins/aws#recommended-iam-policy).
|
|
|
|
<details style="margin-top: -1em; padding-bottom: 1em;">
|
|
|
|
```json
|
|
{
|
|
"Version": "2012-10-17",
|
|
"Statement": [
|
|
{
|
|
"Sid": "AllowAsgDiscovering",
|
|
"Effect": "Allow",
|
|
"Action": [
|
|
"autoscaling:DescribeAutoScalingGroups",
|
|
"ec2:DescribeInstances"
|
|
],
|
|
"Resource": "*"
|
|
},
|
|
{
|
|
"Sid": "AllowAsgScaling",
|
|
"Effect": "Allow",
|
|
"Action": [
|
|
"autoscaling:SetDesiredCapacity",
|
|
"autoscaling:TerminateInstanceInAutoScalingGroup"
|
|
],
|
|
"Resource": "arn:aws:autoscaling:eu-west-1:012345678901:autoScalingGroup:01234567-abcd-0123-abcd-0123456789ab:autoScalingGroupName/runners-autoscalingGroup"
|
|
},
|
|
{
|
|
"Sid": "AllowManagingAccessToAsgInstances",
|
|
"Effect": "Allow",
|
|
"Action": "ec2-instance-connect:SendSSHPublicKey",
|
|
"Resource": "arn:aws:ec2:eu-west-1:012345678901:instance/*",
|
|
"Condition": {
|
|
"StringEquals": {
|
|
"ec2:ResourceTag/aws:autoscaling:groupName": "runners-autoscalingGroup"
|
|
}
|
|
}
|
|
}
|
|
]
|
|
}
|
|
```
|
|
|
|
</details>
|
|
|
|
- \[if needed] The [amazon ecr docker credential helper] installed on the **Autoscaler** instance.
|
|
- \[if needed] An IAM Policy granting the **Autoscaler** instance the permissions needed to pull images from ECRs.
|
|
|
|
<details style="margin-top: -1em; padding-bottom: 1em;">
|
|
|
|
```json
|
|
{
|
|
"Version": "2012-10-17",
|
|
"Statement": [
|
|
{
|
|
Sid: "AllowAuthenticatingWithEcr",
|
|
Effect: "Allow",
|
|
Action: "ecr:GetAuthorizationToken",
|
|
Resource: "*",
|
|
},
|
|
{
|
|
Sid: "AllowPullingImagesFromEcr",
|
|
Effect: "Allow",
|
|
Action: [
|
|
"ecr:BatchGetImage",
|
|
"ecr:GetDownloadUrlForLayer",
|
|
],
|
|
Resource: "012345678901.dkr.ecr.eu-west-1.amazonaws.com/some-repo/busybox",
|
|
}
|
|
]
|
|
}
|
|
```
|
|
|
|
</details>
|
|
|
|
Procedure:
|
|
|
|
1. Configure the default AWS Region for the AWS SDK to use.
|
|
|
|
<details style="margin-top: -1em; padding-bottom: 1em;">
|
|
|
|
```ini
|
|
[default]
|
|
region = eu-west-1
|
|
```
|
|
|
|
</details>
|
|
|
|
1. Install GitLab Runner on the **Autoscaling** instance.
|
|
1. Configure the Runner to use the `docker-autoscaler` executor.
|
|
|
|
<details style="margin-top: -1em; padding-bottom: 1em;">
|
|
|
|
```toml
|
|
concurrent = 10
|
|
|
|
[[runners]]
|
|
name = "runner autoscaler using docker autoscaler"
|
|
url = "https://gitlab.example.org"
|
|
token = "<token>"
|
|
executor = "docker-autoscaler"
|
|
|
|
[runners.docker]
|
|
image = "012345678901.dkr.ecr.eu-west-1.amazonaws.com/some-repo/busybox:latest"
|
|
|
|
[runners.autoscaler]
|
|
plugin = "aws"
|
|
max_instances = 10
|
|
|
|
[runners.autoscaler.plugin_config]
|
|
name = "my-docker-asg" # the required ASG name
|
|
|
|
[[runners.autoscaler.policy]]
|
|
idle_count = 5
|
|
idle_time = "20m0s"
|
|
```
|
|
|
|
</details>
|
|
|
|
1. Install the [fleeting] plugin.
|
|
|
|
<details style="margin-top: -1em; padding-bottom: 1em;">
|
|
|
|
```sh
|
|
gitlab-runner fleeting install
|
|
```
|
|
|
|
</details>
|
|
|
|
</details>
|
|
|
|
### Docker Machine executor
|
|
|
|
> **Deprecated** in GitLab 17.5.<br/>
|
|
> If using this executor with EC2 instances, Azure Compute, or GCE, migrate to the
|
|
> [GitLab Runner Autoscaler](#gitlab-runner-autoscaler).
|
|
|
|
<details style="padding: 0 0 0 1em">
|
|
|
|
[Supported cloud providers][docker machine's supported cloud providers].
|
|
|
|
Using this executor opens up specific [configuration settings][docker machine executor autoscale configuration].
|
|
|
|
Pitfalls:
|
|
|
|
- On AWS, the driver supports only one subnet (and hence 1 AZ) per runner.<br/>
|
|
See [AWS driver does not support multiple non default subnets] and [Docker Machine's AWS driver's options].
|
|
|
|
<details>
|
|
<summary>Example configuration</summary>
|
|
|
|
```toml
|
|
# Number of jobs *in total* that can be run concurrently by *all* configured runners
|
|
# Does *not* affect the *total* upper limit of VMs created by *all* providers
|
|
concurrent = 40
|
|
|
|
[[runners]]
|
|
name = "static-scaler"
|
|
|
|
url = "https://gitlab.example.org"
|
|
token = "abcdefghijklmnopqrst"
|
|
|
|
executor = "docker+machine"
|
|
environment = [ "AWS_REGION=eu-west-1" ]
|
|
|
|
# Number of jobs that can be run concurrently by the VMs created by *this* runner
|
|
# Defines the *upper limit* of how many VMs can be created by *this* runner, since it is 1 task per VM at a time
|
|
limit = 10
|
|
|
|
[runners.machine]
|
|
# Static number of VMs that need to be idle at all times
|
|
IdleCount = 0
|
|
|
|
# Remove VMs after 5m in the idle state
|
|
IdleTime = 300
|
|
|
|
# Maximum number of VMs that can be added to this runner in parallel
|
|
# Defaults to 0 (no limit)
|
|
MaxGrowthRate = 1
|
|
|
|
# Template for the VMs' names
|
|
# Must contain '%s'
|
|
MachineName = "static-ondemand-%s"
|
|
|
|
MachineDriver = "amazonec2"
|
|
MachineOptions = [
|
|
# Refer the correct driver at 'https://gitlab.com/gitlab-org/ci-cd/docker-machine/-/tree/main/docs/drivers'
|
|
"amazonec2-region=eu-west-1",
|
|
"amazonec2-vpc-id=vpc-1234abcd",
|
|
"amazonec2-zone=a", # driver limitation, only 1 allowed
|
|
"amazonec2-subnet-id=subnet-0123456789abcdef0", # subnet-id in the specified az
|
|
"amazonec2-use-private-address=true",
|
|
"amazonec2-private-address-only=true",
|
|
"amazonec2-security-group=GitlabRunners",
|
|
|
|
"amazonec2-instance-type=m6i.large",
|
|
"amazonec2-root-size=50",
|
|
"amazonec2-iam-instance-profile=GitlabRunnerEc2",
|
|
"amazonec2-tags=Team,Infrastructure,Application,Gitlab Runner,SpotInstance,False",
|
|
]
|
|
|
|
[[runners]]
|
|
name = "dynamic-scaler"
|
|
executor = "docker+machine"
|
|
limit = 40 # will still respect the global concurrency value
|
|
|
|
[runners.machine]
|
|
# With 'IdleScaleFactor' defined, this becomes the upper limit of VMs that can be idle at all times
|
|
IdleCount = 10
|
|
|
|
# *Minimum* number of VMs that need to be idle at all times when 'IdleScaleFactor' is defined
|
|
# Defaults to 1; will be set automatically to 1 if set lower than that
|
|
IdleCountMin = 1
|
|
|
|
# Number of VMs that need to be idle at all times, as a factor of the number of machines in use
|
|
# In this case: idle VMs = 1.0 * machines in use, min 1, max 10
|
|
# Must be a floating point number
|
|
# Defaults to 0.0
|
|
IdleScaleFactor = 1.0
|
|
|
|
IdleTime = 600
|
|
|
|
# Remove VMs after 250 jobs
|
|
# Keeps them fresh
|
|
MaxBuilds = 250
|
|
|
|
MachineName = "dynamic-spot-%s"
|
|
MachineDriver = "amazonec2"
|
|
MachineOptions = [
|
|
# Refer the correct driver at 'https://gitlab.com/gitlab-org/ci-cd/docker-machine/-/tree/main/docs/drivers'
|
|
"amazonec2-region=eu-west-1",
|
|
"amazonec2-vpc-id=vpc-1234abcd",
|
|
"amazonec2-zone=b", # driver limitation, only 1 allowed
|
|
"amazonec2-subnet-id=subnet-abcdef0123456789a", # subnet-id in the specified az
|
|
"amazonec2-use-private-address=true",
|
|
"amazonec2-private-address-only=true",
|
|
"amazonec2-security-group=GitlabRunners",
|
|
|
|
"amazonec2-instance-type=r7a.large",
|
|
"amazonec2-root-size=25",
|
|
"amazonec2-iam-instance-profile=GitlabRunnerEc2",
|
|
"amazonec2-tags=Team,Infrastructure,Application,Gitlab Runner,SpotInstance,True",
|
|
|
|
"amazonec2-request-spot-instance=true",
|
|
"amazonec2-spot-price=0.3",
|
|
]
|
|
|
|
# Pump up the volume of available VMs during working hours
|
|
[[runners.machine.autoscaling]]
|
|
Periods = ["* * 9-17 * * mon-fri *"] # Every work day between 9 and 18 Amsterdam time
|
|
Timezone = "Europe/Amsterdam"
|
|
|
|
IdleCount = 20
|
|
IdleCountMin = 5
|
|
IdleTime = 3600
|
|
|
|
# In this case: idle VMs = 1.5 * machines in use, min 5, max 20
|
|
IdleScaleFactor = 1.5
|
|
|
|
# Reduce even more the number of available VMs during the weekends
|
|
[[runners.machine.autoscaling]]
|
|
Periods = ["* * * * * sat,sun *"]
|
|
Timezone = "UTC"
|
|
|
|
IdleCount = 0
|
|
IdleTime = 120
|
|
```
|
|
|
|
</details>
|
|
|
|
</details>
|
|
|
|
### Instance executor
|
|
|
|
Refer [Instance executor](#instance-executor).
|
|
|
|
Autoscale-enabled executor that creates instances on-demand to accommodate the expected volume of jobs processed by the
|
|
runner manager.
|
|
|
|
Useful when jobs need full access to the host instance, operating system, and attached devices.<br/>
|
|
Can be configured to accommodate single and multi-tenant jobs with various levels of isolation and security.
|
|
|
|
## Autoscaling
|
|
|
|
Refer [GitLab Runner Autoscaling].
|
|
|
|
GitLab Runner can automatically scale using public cloud instances when configured to use an autoscaler.
|
|
|
|
Autoscaling options are available for public cloud instances and the following orchestration solutions:
|
|
|
|
- OpenShift.
|
|
- Kubernetes.
|
|
- Amazon ECS clusters using Fargate.
|
|
|
|
### Docker Machine
|
|
|
|
Refer [Autoscaling GitLab Runner on AWS EC2].
|
|
|
|
One or more runners must act as managers, and be configured to use the
|
|
[Docker Machine executor](#docker-machine-executor).<br/>
|
|
Managers interact with the cloud infrastructure to create multiple runner instances to execute jobs.<br/>
|
|
Cloud instances acting as managers shall **not** be spot instances.
|
|
|
|
### GitLab Runner Autoscaler
|
|
|
|
Refer [GitLab Runner Autoscaler].
|
|
|
|
Successor to the [Docker Machine](#docker-machine).
|
|
|
|
Composed of:
|
|
|
|
- **Taskscaler**: manages autoscaling logic, bookkeeping, and fleets creations.
|
|
- **Fleeting**: abstraction for cloud-provided virtual machines.
|
|
- **Cloud provider plugin**: handles the API calls to the target cloud platform.
|
|
|
|
One or more runners must act as managers.<br/>
|
|
Managers interact with the cloud infrastructure to create multiple runner instances to execute jobs.<br/>
|
|
Cloud instances acting as managers shall **not** be spot instances.
|
|
|
|
Managers must be configured to use one or more of the specific executors for autoscaling:
|
|
|
|
- [Instance executor](#instance-executor).
|
|
- [Docker Autoscaling executor](#docker-autoscaler-executor).
|
|
|
|
### Kubernetes
|
|
|
|
[Store tokens in secrets][store registration tokens or runner tokens in secrets] instead of putting the token in the
|
|
chart's values.
|
|
|
|
Requirements:
|
|
|
|
- A running and configured Gitlab instance.
|
|
- A running Kubernetes cluster.
|
|
|
|
<details>
|
|
<summary>Installation procedure</summary>
|
|
|
|
1. \[best practice] Create a dedicated namespace:
|
|
|
|
```sh
|
|
kubectl create namespace 'gitlab'
|
|
```
|
|
|
|
1. Create a runner in gitlab:
|
|
|
|
<details>
|
|
<summary>Web UI</summary>
|
|
|
|
1. Go to one's Gitlab instance's `/admin/runners` page.
|
|
1. Click on the _New instance runner_ button.
|
|
1. Keep _Linux_ as runner type.
|
|
1. Click on the _Create runner_ button.
|
|
1. Copy the runner's token.
|
|
|
|
</details>
|
|
|
|
<details style="padding-bottom: 1em;">
|
|
<summary>API</summary>
|
|
|
|
```sh
|
|
curl -X 'POST' 'https://gitlab.example.org/api/v4/user/runners' -H 'PRIVATE-TOKEN: glpat-m-…' \
|
|
-d 'runner_type=instance_type' -d 'tag_list=small,instance' -d 'run_untagged=false' -d 'a runner'
|
|
```
|
|
|
|
</details>
|
|
|
|
1. (Re-)Create the runners' Kubernetes secret with the runners' token from the previous step:
|
|
|
|
```sh
|
|
kubectl --namespace 'gitlab' delete secret 'gitlab-runner-token' --ignore-not-found
|
|
kubectl --namespace 'gitlab' create secret generic 'gitlab-runner-token' \
|
|
--from-literal='runner-registration-token=""' --from-literal='runner-token=glrt-…'
|
|
```
|
|
|
|
1. \[best practice] Be sure to match the runner version with the Gitlab server's:
|
|
|
|
```sh
|
|
helm search repo --versions 'gitlab/gitlab-runner'
|
|
```
|
|
|
|
1. Install the helm chart.
|
|
|
|
> The secret's name **must** be matched in the helm chart's values file.
|
|
|
|
```sh
|
|
helm --namespace 'gitlab' upgrade --install 'gitlab-runner-manager' \
|
|
--repo 'https://charts.gitlab.io' 'gitlab-runner' --version '0.69.0' \
|
|
--values 'values.yaml' --set 'runners.secret=gitlab-runner-token'
|
|
```
|
|
|
|
</details>
|
|
|
|
<details style="padding-bottom: 1em;">
|
|
<summary>Example helm chart values</summary>
|
|
|
|
```yaml
|
|
gitlabUrl: https://gitlab.example.org/
|
|
unregisterRunners: true
|
|
concurrent: 20
|
|
checkInterval: 3
|
|
rbac:
|
|
create: true
|
|
metrics:
|
|
enabled: true
|
|
runners:
|
|
name: "runner-on-k8s"
|
|
secret: gitlab-runner-token
|
|
config: |
|
|
[[runners]]
|
|
|
|
[runners.cache]
|
|
Shared = true
|
|
|
|
[runners.kubernetes]
|
|
namespace = "{{.Release.Namespace}}"
|
|
image = "alpine"
|
|
pull_policy = [
|
|
"if-not-present",
|
|
"always"
|
|
]
|
|
allowed_pull_policies = [
|
|
"if-not-present",
|
|
"always",
|
|
"never"
|
|
]
|
|
|
|
[runners.kubernetes.affinity]
|
|
[runners.kubernetes.affinity.node_affinity]
|
|
[runners.kubernetes.affinity.node_affinity.required_during_scheduling_ignored_during_execution]
|
|
[[runners.kubernetes.affinity.node_affinity.required_during_scheduling_ignored_during_execution.node_selector_terms]]
|
|
[[runners.kubernetes.affinity.node_affinity.required_during_scheduling_ignored_during_execution.node_selector_terms.match_expressions]]
|
|
key = "org.example.reservation/app"
|
|
operator = "In"
|
|
values = [ "gitlab" ]
|
|
[[runners.kubernetes.affinity.node_affinity.required_during_scheduling_ignored_during_execution.node_selector_terms.match_expressions]]
|
|
key = "org.example.reservation/component"
|
|
operator = "In"
|
|
values = [ "runner" ]
|
|
[[runners.kubernetes.affinity.node_affinity.preferred_during_scheduling_ignored_during_execution]]
|
|
weight = 1
|
|
[runners.kubernetes.affinity.node_affinity.preferred_during_scheduling_ignored_during_execution.preference]
|
|
[[runners.kubernetes.affinity.node_affinity.preferred_during_scheduling_ignored_during_execution.preference.match_expressions]]
|
|
key = "eks.amazonaws.com/capacityType"
|
|
operator = "In"
|
|
values = [ "ON_DEMAND" ]
|
|
[runners.kubernetes.node_tolerations]
|
|
"reservation/app=gitlab" = "NoSchedule"
|
|
"reservation/component=runner" = "NoSchedule"
|
|
|
|
affinity:
|
|
nodeAffinity:
|
|
preferredDuringSchedulingIgnoredDuringExecution:
|
|
- weight: 1
|
|
preference:
|
|
matchExpressions:
|
|
- key: eks.amazonaws.com/capacityType
|
|
operator: In
|
|
values:
|
|
- ON_DEMAND
|
|
tolerations:
|
|
- key: app
|
|
operator: Equal
|
|
value: gitlab
|
|
- key: component
|
|
operator: Equal
|
|
value: runner
|
|
podLabels:
|
|
team: engineering
|
|
```
|
|
|
|
</details>
|
|
|
|
Gotchas:
|
|
|
|
- The _build_, _helper_ and multiple _service_ containers will all reside in a single pod.<br/>
|
|
If **the sum** of the resources request by **all** of them is too high, it will **not** be scheduled and the pipeline
|
|
will hang and fail.
|
|
- If any pod is killed due to OOM, the pipeline that spawned it will hang until it times out.
|
|
|
|
Improvements:
|
|
|
|
- Keep the manager pod on stable nodes.
|
|
|
|
<details style="padding-bottom: 1em;">
|
|
|
|
```yaml
|
|
affinity:
|
|
nodeAffinity:
|
|
preferredDuringSchedulingIgnoredDuringExecution:
|
|
- weight: 1
|
|
preference:
|
|
matchExpressions:
|
|
- key: eks.amazonaws.com/capacityType
|
|
operator: In
|
|
values:
|
|
- ON_DEMAND
|
|
```
|
|
|
|
</details>
|
|
|
|
- Dedicate specific nodes to runner executors.<br/>
|
|
Taint dedicated nodes and add tolerations and affinities to the runner's configuration.
|
|
|
|
<details style="padding-bottom: 1em;">
|
|
|
|
```toml
|
|
[[runners]]
|
|
[runners.kubernetes]
|
|
|
|
[runners.kubernetes.node_selector]
|
|
gitlab = "true"
|
|
"kubernetes.io/arch" = "amd64"
|
|
|
|
[runners.kubernetes.affinity]
|
|
[runners.kubernetes.affinity.node_affinity]
|
|
[runners.kubernetes.affinity.node_affinity.required_during_scheduling_ignored_during_execution]
|
|
[[runners.kubernetes.affinity.node_affinity.required_during_scheduling_ignored_during_execution.node_selector_terms]]
|
|
[[runners.kubernetes.affinity.node_affinity.required_during_scheduling_ignored_during_execution.node_selector_terms.match_expressions]]
|
|
key = "app"
|
|
operator = "In"
|
|
values = [ "gitlab-runner" ]
|
|
[[runners.kubernetes.affinity.node_affinity.required_during_scheduling_ignored_during_execution.node_selector_terms.match_expressions]]
|
|
key = "customLabel"
|
|
operator = "In"
|
|
values = [ "customValue" ]
|
|
|
|
[[runners.kubernetes.affinity.node_affinity.preferred_during_scheduling_ignored_during_execution]]
|
|
weight = 1
|
|
|
|
[runners.kubernetes.affinity.node_affinity.preferred_during_scheduling_ignored_during_execution.preference]
|
|
[[runners.kubernetes.affinity.node_affinity.preferred_during_scheduling_ignored_during_execution.preference.match_expressions]]
|
|
key = "eks.amazonaws.com/capacityType"
|
|
operator = "In"
|
|
values = [ "ON_DEMAND" ]
|
|
|
|
[runners.kubernetes.node_tolerations]
|
|
"app=gitlab-runner" = "NoSchedule"
|
|
"node-role.kubernetes.io/master" = "NoSchedule"
|
|
"custom.toleration=value" = "NoSchedule"
|
|
"empty.value=" = "PreferNoSchedule"
|
|
onlyKey = ""
|
|
```
|
|
|
|
</details>
|
|
|
|
- Avoid massive resource consumption by defaulting to (very?) strict resource limits and `0` request.
|
|
|
|
<details style="padding-bottom: 1em;">
|
|
|
|
```toml
|
|
[[runners]]
|
|
[runners.kubernetes]
|
|
cpu_request = "0"
|
|
cpu_limit = "2"
|
|
memory_request = "0"
|
|
memory_limit = "2Gi"
|
|
ephemeral_storage_request = "0"
|
|
ephemeral_storage_limit = "512Mi"
|
|
|
|
helper_cpu_request = "0"
|
|
helper_cpu_limit = "0.5"
|
|
helper_memory_request = "0"
|
|
helper_memory_limit = "128Mi"
|
|
helper_ephemeral_storage_request = "0"
|
|
helper_ephemeral_storage_limit = "64Mi"
|
|
|
|
service_cpu_request = "0"
|
|
service_cpu_limit = "1"
|
|
service_memory_request = "0"
|
|
service_memory_limit = "0.5Gi"
|
|
```
|
|
|
|
</details>
|
|
|
|
- Play nice and make sure to leave some space for the host's other workloads by allowing for resource request and limit
|
|
override only up to a point.
|
|
|
|
<details style="padding-bottom: 1em;">
|
|
|
|
```toml
|
|
[[runners]]
|
|
[runners.kubernetes]
|
|
cpu_limit_overwrite_max_allowed = "15"
|
|
cpu_request_overwrite_max_allowed = "15"
|
|
memory_limit_overwrite_max_allowed = "62Gi"
|
|
memory_request_overwrite_max_allowed = "62Gi"
|
|
ephemeral_storage_limit_overwrite_max_allowed = "49Gi"
|
|
ephemeral_storage_request_overwrite_max_allowed = "49Gi"
|
|
|
|
helper_cpu_limit_overwrite_max_allowed = "0.9"
|
|
helper_cpu_request_overwrite_max_allowed = "0.9"
|
|
helper_memory_limit_overwrite_max_allowed = "1Gi"
|
|
helper_memory_request_overwrite_max_allowed = "1Gi"
|
|
helper_ephemeral_storage_limit_overwrite_max_allowed = "1Gi"
|
|
helper_ephemeral_storage_request_overwrite_max_allowed = "1Gi"
|
|
|
|
service_cpu_limit_overwrite_max_allowed = "3.9"
|
|
service_cpu_request_overwrite_max_allowed = "3.9"
|
|
service_memory_limit_overwrite_max_allowed = "15.5Gi"
|
|
service_memory_request_overwrite_max_allowed = "15.5Gi"
|
|
service_ephemeral_storage_limit_overwrite_max_allowed = "15Gi"
|
|
service_ephemeral_storage_request_overwrite_max_allowed = "15Gi"
|
|
```
|
|
|
|
</details>
|
|
|
|
## Further readings
|
|
|
|
- [Gitlab]
|
|
- [Amazon ECR Docker Credential Helper]
|
|
- Gitlab's [docker machine] fork
|
|
- Gitlab's [gitlab-runner-operator] for OpenShift and Kubernetes
|
|
- [Docker Machine Executor autoscale configuration]
|
|
- [Fleeting]
|
|
- [Using AL2023 based Amazon ECS AMIs to host containerized workloads]
|
|
|
|
### Sources
|
|
|
|
- [Install Gitlab runner]
|
|
- [Docker executor]
|
|
- [Authenticating your GitLab CI runner to an AWS ECR registry using Amazon ECR Docker Credential Helper]
|
|
- [Install and register GitLab Runner for autoscaling with Docker Machine]
|
|
- [AWS driver does not support multiple non default subnets]
|
|
- [GitLab Runner Helm Chart]
|
|
- [GitLab Runner Autoscaling]
|
|
- [Autoscaling GitLab Runner on AWS EC2]
|
|
- [Instance executor]
|
|
- [Docker Autoscaler executor]
|
|
- [Signals]
|
|
|
|
<!--
|
|
Reference
|
|
═╬═Time══
|
|
-->
|
|
|
|
<!-- In-article sections -->
|
|
<!-- Knowledge base -->
|
|
[gitlab]: README.md
|
|
|
|
<!-- Files -->
|
|
<!-- Upstream -->
|
|
[autoscaling gitlab runner on aws ec2]: https://docs.gitlab.com/runner/configuration/runner_autoscale_aws/
|
|
[docker autoscaler executor]: https://docs.gitlab.com/runner/executors/docker_autoscaler.html
|
|
[docker executor]: https://docs.gitlab.com/runner/executors/docker.html
|
|
[docker machine executor autoscale configuration]: https://docs.gitlab.com/runner/configuration/autoscale.html
|
|
[docker machine's aws driver's options]: https://gitlab.com/gitlab-org/ci-cd/docker-machine/-/blob/main/docs/drivers/aws.md#options
|
|
[docker machine's supported cloud providers]: https://docs.gitlab.com/runner/configuration/autoscale.html#supported-cloud-providers
|
|
[docker machine]: https://gitlab.com/gitlab-org/ci-cd/docker-machine
|
|
[fleeting]: https://gitlab.com/gitlab-org/fleeting/fleeting
|
|
[gitlab runner autoscaler]: https://docs.gitlab.com/runner/runner_autoscale/index.html#gitlab-runner-autoscaler
|
|
[gitlab runner autoscaling]: https://docs.gitlab.com/runner/runner_autoscale/
|
|
[gitlab runner helm chart]: https://docs.gitlab.com/runner/install/kubernetes.html
|
|
[gitlab-runner-operator]: https://gitlab.com/gitlab-org/gl-openshift/gitlab-runner-operator
|
|
[install and register gitlab runner for autoscaling with docker machine]: https://docs.gitlab.com/runner/executors/docker_machine.html
|
|
[install gitlab runner]: https://docs.gitlab.com/runner/install/
|
|
[instance executor]: https://docs.gitlab.com/runner/executors/instance.html
|
|
[signals]: https://docs.gitlab.com/runner/commands/#signals
|
|
[store registration tokens or runner tokens in secrets]: https://docs.gitlab.com/runner/install/kubernetes.html#store-registration-tokens-or-runner-tokens-in-secrets
|
|
|
|
<!-- Others -->
|
|
[authenticating your gitlab ci runner to an aws ecr registry using amazon ecr docker credential helper]: https://faun.pub/authenticating-your-gitlab-ci-runner-to-an-aws-ecr-registry-using-amazon-ecr-docker-credential-b4604a9391eb
|
|
[aws driver does not support multiple non default subnets]: https://github.com/docker/machine/issues/4700
|
|
[amazon ecr docker credential helper]: https://github.com/awslabs/amazon-ecr-credential-helper
|
|
[using al2023 based amazon ecs amis to host containerized workloads]: https://docs.aws.amazon.com/linux/al2023/ug/ecs.html
|