mirror of
https://gitea.com/mcereda/oam.git
synced 2026-02-08 21:34:25 +00:00
chore(gitlab): dump runner autoscaler installation and commands
@@ -3,6 +3,7 @@
1. [TL;DR](#tldr)
1. [Gotchas](#gotchas)
1. [Daemon configuration](#daemon-configuration)
   1. [Credentials](#credentials)
1. [Images configuration](#images-configuration)
1. [Containers configuration](#containers-configuration)
1. [Health checks](#health-checks)
@@ -287,10 +288,34 @@ The docker daemon is configured using the `/etc/docker/daemon.json` file:
```json
{
  "default-runtime": "runc",
  "dns": ["8.8.8.8", "1.1.1.1"]
}
```

### Credentials

Configured in the `${HOME}/.docker/config.json` file of the user executing docker commands:

```json
{
  "credsStore": "ecr-login",
  "auths": {
    "https://index.docker.io/v1/": {
      "auth": "abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ101234"
    }
  }
}
```

The `ecr-login` credentials store requires the [`amazon-ecr-credential-helper`][amazon-ecr-credential-helper] to be
present on the system.

```sh
brew install 'docker-credential-helper-ecr'
dnf install 'amazon-ecr-credential-helper'
```

## Images configuration

One should follow the [OpenContainers Image Spec].
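
As a hedged example of following the spec, the pre-defined `org.opencontainers.image.*` annotation keys can be applied
as image labels at build time (repository URL, version and tag are placeholders):

```sh
docker build \
  --label 'org.opencontainers.image.title=example-app' \
  --label 'org.opencontainers.image.source=https://gitea.example.org/example/app' \
  --label 'org.opencontainers.image.version=1.2.3' \
  --label 'org.opencontainers.image.licenses=MIT' \
  --tag 'example-app:1.2.3' '.'
```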
@@ -395,6 +420,7 @@ docker load …
- [Testcontainers]
- [Containerd]
- [Kaniko]
- [`amazon-ecr-credential-helper`][amazon-ecr-credential-helper]

### Sources
@@ -430,6 +456,7 @@ docker load …
[github]: https://github.com/docker

<!-- Others -->
[amazon-ecr-credential-helper]: https://github.com/awslabs/amazon-ecr-credential-helper
[arch linux wiki]: https://wiki.archlinux.org/index.php/Docker
[cheatsheet]: https://collabnix.com/docker-cheatsheet/
[configuring dns]: https://dockerlabs.collabnix.com/intermediate/networking/Configuring_DNS.html
@@ -2,9 +2,14 @@
1. [TL;DR](#tldr)
1. [Pull images from private AWS ECR registries](#pull-images-from-private-aws-ecr-registries)
1. [Executors](#executors)
   1. [Docker Autoscaler executor](#docker-autoscaler-executor)
   1. [Docker Machine executor](#docker-machine-executor)
   1. [Instance executor](#instance-executor)
1. [Autoscaling](#autoscaling)
   1. [Docker Machine](#docker-machine)
   1. [GitLab Runner Autoscaler](#gitlab-runner-autoscaler)
   1. [Kubernetes](#kubernetes)
1. [Further readings](#further-readings)
1. [Sources](#sources)
@@ -99,252 +104,202 @@ Runners seem to require the main instance to give the full certificate chain upo
Now your GitLab runner should automatically authenticate to your private ECR registry.
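
A quick, hedged way to verify is a throwaway job using an image from the private registry (the job name is
illustrative; the image reuses the example registry used elsewhere in this document):

```yaml
test-ecr-pull:
  image: 012345678901.dkr.ecr.eu-west-1.amazonaws.com/some-repo/busybox:latest
  script:
    - echo 'image pulled from the private ECR registry'
```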

## Executors

### Docker Autoscaler executor

Refer [Docker Autoscaler executor].

Autoscale-enabled wrapper around the Docker executor that creates instances on-demand to accommodate the jobs processed
by the runner manager.

Leverages [fleeting] plugins to scale automatically.<br/>
Fleeting is an abstraction for a group of autoscaled instances, and uses plugins supporting cloud providers.

Add the following settings in the `config.toml` file:

```toml
[[runners]]
  executor = "docker-autoscaler"

  [runners.docker]
    image = "busybox:latest"  # or whatever

  [runners.autoscaler]
    plugin = "aws:latest"  # or 'googlecloud', or 'azure', or whatever

    [runners.autoscaler.plugin_config]
      name = "…"  # see plugin docs

    [[runners.autoscaler.policy]]
      idle_count = 5
      idle_time = "20m0s"
```

<details>
<summary>Example: AWS, 1 instance per job, 5 idle instances for 20min.</summary>

Give each job a dedicated instance.<br/>
As soon as the job completes, the instance is immediately deleted.

Try to keep 5 whole instances available for future demand.<br/>
Idle instances stay available for at least 20 minutes.

Requirements:

- A running and configured Gitlab instance.
- A Kubernetes cluster.
- An EC2 instance with Docker Engine to act as manager.
- A Launch Template referencing an AMI equipped with Docker Engine for the runners to use.

  Alternatively, any AMI that can run Docker Engine can be used as long as an appropriate cloud-init configuration is
  provided in the template's `userData`.<br/>
  Specifically, the user executing Docker (by default, the instance's default user) must be part of the `docker` group
  in order to be able to access Docker's socket.

  <details style="padding-bottom: 1em;">

  ```yaml
  packages: [ "docker" ]
  runcmd:
    - systemctl daemon-reload
    - systemctl enable --now docker.service
    - grep docker /etc/group -q && usermod -a -G docker ec2-user
  ```

  </details>

- An AutoScaling Group with the following settings (a CLI sketch follows this list):

  - Minimum capacity = 0.
  - Desired capacity = 0.

  The runner will take care of scaling up and down.

- An IAM Policy granting the **manager** instance the permissions needed to scale the ASG.<br/>
  Refer the [Recommended IAM Policy](https://gitlab.com/gitlab-org/fleeting/plugins/aws#recommended-iam-policy).

  <details style="padding-bottom: 1em;">

  ```json
  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Sid": "AllowAsgDiscovering",
        "Effect": "Allow",
        "Action": [
          "autoscaling:DescribeAutoScalingGroups",
          "ec2:DescribeInstances"
        ],
        "Resource": "*"
      },
      {
        "Sid": "AllowAsgScaling",
        "Effect": "Allow",
        "Action": [
          "autoscaling:SetDesiredCapacity",
          "autoscaling:TerminateInstanceInAutoScalingGroup"
        ],
        "Resource": "arn:aws:autoscaling:eu-west-1:012345678901:autoScalingGroup:01234567-abcd-0123-abcd-0123456789ab:autoScalingGroupName/runners-autoscalingGroup"
      },
      {
        "Sid": "AllowManagingAccessToAsgInstances",
        "Effect": "Allow",
        "Action": "ec2-instance-connect:SendSSHPublicKey",
        "Resource": "arn:aws:ec2:eu-west-1:012345678901:instance/*",
        "Condition": {
          "StringEquals": {
            "ec2:ResourceTag/aws:autoscaling:groupName": "runners-autoscalingGroup"
          }
        }
      }
    ]
  }
  ```

  </details>

- \[if needed] The [amazon ecr docker credential helper] installed on the **manager** instance.
- \[if needed] An IAM Policy granting the **manager** instance the permissions needed to pull images from ECRs.

  <details style="padding-bottom: 1em;">

  ```json
  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Sid": "AllowAuthenticatingWithEcr",
        "Effect": "Allow",
        "Action": "ecr:GetAuthorizationToken",
        "Resource": "*"
      },
      {
        "Sid": "AllowPullingImagesFromEcr",
        "Effect": "Allow",
        "Action": [
          "ecr:BatchGetImage",
          "ecr:GetDownloadUrlForLayer"
        ],
        "Resource": "arn:aws:ecr:eu-west-1:012345678901:repository/some-repo/busybox"
      }
    ]
  }
  ```

  </details>
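
A hedged AWS CLI sketch for creating such a group (the launch template name and subnet are placeholders; the group name
matches the IAM policy example above):

```sh
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name 'runners-autoscalingGroup' \
  --launch-template 'LaunchTemplateName=runners-launchTemplate,Version=$Latest' \
  --min-size '0' --max-size '10' --desired-capacity '0' \
  --vpc-zone-identifier 'subnet-0123456789abcdef0'
```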

Procedure:

1. Configure the default AWS Region for the AWS SDK to use.

   ```ini
   [default]
   region = eu-west-1
   ```

1. Install the gitlab runner on the **manager** instance.<br/>
   Configure it to use the `docker-autoscaler` executor.

   <details style="padding-bottom: 1em;">

   ```toml
   concurrent = 10

   [[runners]]
     name = "docker autoscaler"
     url = "https://gitlab.example.org"
     token = "<token>"
     executor = "docker-autoscaler"

     [runners.docker]
       image = "012345678901.dkr.ecr.eu-west-1.amazonaws.com/some-repo/busybox:latest"

     [runners.autoscaler]
       plugin = "aws"
       max_instances = 10

       [runners.autoscaler.plugin_config]
         name = "my-docker-asg"  # the required ASG name

       [[runners.autoscaler.policy]]
         idle_count = 5
         idle_time = "20m0s"
   ```

   </details>

1. Install the [fleeting] plugin.

   ```sh
   gitlab-runner fleeting install
   ```

</details>

### Docker Machine executor

A runner like any other, just configured to use the `docker+machine` executor.

> **Deprecated** in GitLab 17.5.<br/>
> If using this executor with EC2 instances, Azure Compute, or GCE, migrate to the
> [GitLab Runner Autoscaler](#gitlab-runner-autoscaler).

[Supported cloud providers][docker machine's supported cloud providers].
@@ -476,6 +431,340 @@ concurrent = 40

</details>

### Instance executor

Refer [Instance executor].

Autoscale-enabled executor that creates instances on-demand to accommodate the expected volume of jobs processed by the
runner manager.

Useful when jobs need full access to the host instance, operating system, and attached devices.<br/>
Can be configured to accommodate single and multi-tenant jobs with various levels of isolation and security.
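
A minimal, hedged `config.toml` sketch for a runner using this executor (the fleeting plugin, ASG name, and capacity
values are placeholders, mirroring the Docker Autoscaler example above):

```toml
concurrent = 10

[[runners]]
  name = "instance autoscaler"
  url = "https://gitlab.example.org"
  token = "<token>"
  executor = "instance"

  [runners.autoscaler]
    plugin = "aws"
    capacity_per_instance = 1
    max_use_count = 1
    max_instances = 10

    [runners.autoscaler.plugin_config]
      name = "my-instances-asg"

    [[runners.autoscaler.policy]]
      idle_count = 2
      idle_time = "20m0s"
```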

## Autoscaling

Refer [GitLab Runner Autoscaling].

GitLab Runner can automatically scale using public cloud instances when configured to use an autoscaler.

Autoscaling options are available for public cloud instances and the following orchestration solutions:

- OpenShift.
- Kubernetes.
- Amazon ECS clusters using Fargate.

### Docker Machine

Refer [Autoscaling GitLab Runner on AWS EC2].

One or more runners must act as managers, and be configured to use the
[Docker Machine executor](#docker-machine-executor).<br/>
Managers interact with the cloud infrastructure to create multiple runner instances to execute jobs.<br/>
Cloud instances acting as managers shall **not** be spot instances.
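
A minimal, hedged `config.toml` sketch for such a manager (driver options, instance type and sizes are placeholders;
see [docker machine's aws driver's options] for the full list):

```toml
concurrent = 40

[[runners]]
  name = "docker machine autoscaler"
  url = "https://gitlab.example.org"
  token = "<token>"
  executor = "docker+machine"

  [runners.docker]
    image = "busybox:latest"

  [runners.machine]
    IdleCount = 5
    IdleTime = 1200
    MaxBuilds = 100
    MachineDriver = "amazonec2"
    MachineName = "runner-%s"
    MachineOptions = [
      "amazonec2-region=eu-west-1",
      "amazonec2-instance-type=m5.large",
      "amazonec2-request-spot-instance=true",
    ]
```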

### GitLab Runner Autoscaler

Refer [GitLab Runner Autoscaler].

Successor to the [Docker Machine](#docker-machine) autoscaler.

Composed of:

- **Taskscaler**: manages autoscaling logic, bookkeeping, and fleet creation.
- **Fleeting**: abstraction for cloud-provided virtual machines.
- **Cloud provider plugin**: handles the API calls to the target cloud platform.

One or more runners must act as managers.<br/>
Managers interact with the cloud infrastructure to create multiple runner instances to execute jobs.<br/>
Cloud instances acting as managers shall **not** be spot instances.

Managers must be configured to use one or more of the executors specific to autoscaling:

- [Instance executor](#instance-executor).
- [Docker Autoscaler executor](#docker-autoscaler-executor).

### Kubernetes

[Store tokens in secrets][store registration tokens or runner tokens in secrets] instead of putting the token in the
chart's values.

Requirements:

- A running and configured Gitlab instance.
- A running Kubernetes cluster.

<details>
<summary>Installation procedure</summary>

1. \[best practice] Create a dedicated namespace:

   ```sh
   kubectl create namespace 'gitlab'
   ```

1. Create a runner in gitlab:

   <details>
   <summary>Web UI</summary>

   1. Go to one's Gitlab instance's `/admin/runners` page.
   1. Click on the _New instance runner_ button.
   1. Keep _Linux_ as runner type.
   1. Click on the _Create runner_ button.
   1. Copy the runner's token.

   </details>

   <details style="padding-bottom: 1em;">
   <summary>API</summary>

   ```sh
   curl -X 'POST' 'https://gitlab.example.org/api/v4/user/runners' -H 'PRIVATE-TOKEN: glpat-m-…' \
     -d 'runner_type=instance_type' -d 'tag_list=small,instance' -d 'run_untagged=false' -d 'description=a runner'
   ```

   </details>

1. (Re-)Create the runners' Kubernetes secret with the runners' token from the previous step:

   ```sh
   kubectl --namespace 'gitlab' delete secret 'gitlab-runner-token' --ignore-not-found
   kubectl --namespace 'gitlab' create secret generic 'gitlab-runner-token' \
     --from-literal='runner-registration-token=""' --from-literal='runner-token=glrt-…'
   ```

1. \[best practice] Be sure to match the runner version with the Gitlab server's:

   ```sh
   helm search repo --versions 'gitlab/gitlab-runner'
   ```

1. Install the helm chart.

   > The secret's name **must** be matched in the helm chart's values file.

   ```sh
   helm --namespace 'gitlab' upgrade --install 'gitlab-runner-manager' \
     --repo 'https://charts.gitlab.io' 'gitlab-runner' --version '0.69.0' \
     --values 'values.yaml' --set 'runners.secret=gitlab-runner-token'
   ```

</details>

<details style="padding-bottom: 1em;">
<summary>Example helm chart values</summary>

```yaml
gitlabUrl: https://gitlab.example.org/
unregisterRunners: true
concurrent: 20
checkInterval: 3
rbac:
  create: true
metrics:
  enabled: true
runners:
  name: "runner-on-k8s"
  secret: gitlab-runner-token
  config: |
    [[runners]]

      [runners.cache]
        Shared = true

      [runners.kubernetes]
        namespace = "{{.Release.Namespace}}"
        image = "alpine"
        pull_policy = [
          "if-not-present",
          "always"
        ]
        allowed_pull_policies = [
          "if-not-present",
          "always",
          "never"
        ]

        [runners.kubernetes.affinity]
          [runners.kubernetes.affinity.node_affinity]
            [runners.kubernetes.affinity.node_affinity.required_during_scheduling_ignored_during_execution]
              [[runners.kubernetes.affinity.node_affinity.required_during_scheduling_ignored_during_execution.node_selector_terms]]
                [[runners.kubernetes.affinity.node_affinity.required_during_scheduling_ignored_during_execution.node_selector_terms.match_expressions]]
                  key = "org.example.reservation/app"
                  operator = "In"
                  values = [ "gitlab" ]
                [[runners.kubernetes.affinity.node_affinity.required_during_scheduling_ignored_during_execution.node_selector_terms.match_expressions]]
                  key = "org.example.reservation/component"
                  operator = "In"
                  values = [ "runner" ]
            [[runners.kubernetes.affinity.node_affinity.preferred_during_scheduling_ignored_during_execution]]
              weight = 1
              [runners.kubernetes.affinity.node_affinity.preferred_during_scheduling_ignored_during_execution.preference]
                [[runners.kubernetes.affinity.node_affinity.preferred_during_scheduling_ignored_during_execution.preference.match_expressions]]
                  key = "eks.amazonaws.com/capacityType"
                  operator = "In"
                  values = [ "ON_DEMAND" ]

        [runners.kubernetes.node_tolerations]
          "reservation/app=gitlab" = "NoSchedule"
          "reservation/component=runner" = "NoSchedule"

affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
            - key: eks.amazonaws.com/capacityType
              operator: In
              values:
                - ON_DEMAND
tolerations:
  - key: app
    operator: Equal
    value: gitlab
  - key: component
    operator: Equal
    value: runner
podLabels:
  team: engineering
```

</details>

Gotchas:

- The _build_, _helper_ and multiple _service_ containers will all reside in a single pod.<br/>
  If **the sum** of the resources requested by **all** of them is too high, it will **not** be scheduled and the
  pipeline will hang and fail.
- If any pod is killed due to OOM, the pipeline that spawned it will hang until it times out.

Improvements:

- Keep the manager pod on stable nodes.

  <details style="padding-bottom: 1em;">

  ```yaml
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1
          preference:
            matchExpressions:
              - key: eks.amazonaws.com/capacityType
                operator: In
                values:
                  - ON_DEMAND
  ```

  </details>

- Dedicate specific nodes to runner executors.<br/>
  Taint dedicated nodes and add tolerations and affinities to the runner's configuration.

  <details style="padding-bottom: 1em;">

  ```toml
  [[runners]]
    [runners.kubernetes]

      [runners.kubernetes.node_selector]
        gitlab = "true"
        "kubernetes.io/arch" = "amd64"

      [runners.kubernetes.affinity]
        [runners.kubernetes.affinity.node_affinity]
          [runners.kubernetes.affinity.node_affinity.required_during_scheduling_ignored_during_execution]
            [[runners.kubernetes.affinity.node_affinity.required_during_scheduling_ignored_during_execution.node_selector_terms]]
              [[runners.kubernetes.affinity.node_affinity.required_during_scheduling_ignored_during_execution.node_selector_terms.match_expressions]]
                key = "app"
                operator = "In"
                values = [ "gitlab-runner" ]
              [[runners.kubernetes.affinity.node_affinity.required_during_scheduling_ignored_during_execution.node_selector_terms.match_expressions]]
                key = "customLabel"
                operator = "In"
                values = [ "customValue" ]

          [[runners.kubernetes.affinity.node_affinity.preferred_during_scheduling_ignored_during_execution]]
            weight = 1

            [runners.kubernetes.affinity.node_affinity.preferred_during_scheduling_ignored_during_execution.preference]
              [[runners.kubernetes.affinity.node_affinity.preferred_during_scheduling_ignored_during_execution.preference.match_expressions]]
                key = "eks.amazonaws.com/capacityType"
                operator = "In"
                values = [ "ON_DEMAND" ]

      [runners.kubernetes.node_tolerations]
        "app=gitlab-runner" = "NoSchedule"
        "node-role.kubernetes.io/master" = "NoSchedule"
        "custom.toleration=value" = "NoSchedule"
        "empty.value=" = "PreferNoSchedule"
        onlyKey = ""
  ```

  </details>

- Avoid massive resource consumption by defaulting to (very?) strict resource limits and `0` request.

  <details style="padding-bottom: 1em;">

  ```toml
  [[runners]]
    [runners.kubernetes]
      cpu_request = "0"
      cpu_limit = "2"
      memory_request = "0"
      memory_limit = "2Gi"
      ephemeral_storage_request = "0"
      ephemeral_storage_limit = "512Mi"

      helper_cpu_request = "0"
      helper_cpu_limit = "0.5"
      helper_memory_request = "0"
      helper_memory_limit = "128Mi"
      helper_ephemeral_storage_request = "0"
      helper_ephemeral_storage_limit = "64Mi"

      service_cpu_request = "0"
      service_cpu_limit = "1"
      service_memory_request = "0"
      service_memory_limit = "0.5Gi"
  ```

  </details>

- Play nice and make sure to leave some space for the host's other workloads by allowing for resource request and limit
  override only up to a point.

  <details style="padding-bottom: 1em;">

  ```toml
  [[runners]]
    [runners.kubernetes]
      cpu_limit_overwrite_max_allowed = "15"
      cpu_request_overwrite_max_allowed = "15"
      memory_limit_overwrite_max_allowed = "62Gi"
      memory_request_overwrite_max_allowed = "62Gi"
      ephemeral_storage_limit_overwrite_max_allowed = "49Gi"
      ephemeral_storage_request_overwrite_max_allowed = "49Gi"

      helper_cpu_limit_overwrite_max_allowed = "0.9"
      helper_cpu_request_overwrite_max_allowed = "0.9"
      helper_memory_limit_overwrite_max_allowed = "1Gi"
      helper_memory_request_overwrite_max_allowed = "1Gi"
      helper_ephemeral_storage_limit_overwrite_max_allowed = "1Gi"
      helper_ephemeral_storage_request_overwrite_max_allowed = "1Gi"

      service_cpu_limit_overwrite_max_allowed = "3.9"
      service_cpu_request_overwrite_max_allowed = "3.9"
      service_memory_limit_overwrite_max_allowed = "15.5Gi"
      service_memory_request_overwrite_max_allowed = "15.5Gi"
      service_ephemeral_storage_limit_overwrite_max_allowed = "15Gi"
      service_ephemeral_storage_request_overwrite_max_allowed = "15Gi"
  ```

  </details>
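
  With those maxima in place, individual jobs can still claim more than the defaults by setting the Kubernetes
  executor's overwrite variables; a hedged `.gitlab-ci.yml` sketch (job name and values are illustrative and must stay
  within the configured maxima):

  ```yaml
  build:
    variables:
      KUBERNETES_CPU_REQUEST: "2"
      KUBERNETES_CPU_LIMIT: "4"
      KUBERNETES_MEMORY_REQUEST: "4Gi"
      KUBERNETES_MEMORY_LIMIT: "8Gi"
      KUBERNETES_EPHEMERAL_STORAGE_REQUEST: "8Gi"
      KUBERNETES_EPHEMERAL_STORAGE_LIMIT: "16Gi"
    script:
      - make build
  ```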

## Further readings

- [Gitlab]
@@ -483,6 +772,7 @@ concurrent = 40
- Gitlab's [docker machine] fork
- Gitlab's [gitlab-runner-operator] for OpenShift and Kubernetes
- [Docker Machine Executor autoscale configuration]
- [Fleeting]

### Sources
@@ -492,6 +782,10 @@ concurrent = 40
- [Install and register GitLab Runner for autoscaling with Docker Machine]
- [AWS driver does not support multiple non default subnets]
- [GitLab Runner Helm Chart]
- [GitLab Runner Autoscaling]
- [Autoscaling GitLab Runner on AWS EC2]
- [Instance executor]
- [Docker Autoscaler executor]

<!--
  Reference
@@ -504,15 +798,21 @@ concurrent = 40

<!-- Files -->
<!-- Upstream -->
[autoscaling gitlab runner on aws ec2]: https://docs.gitlab.com/runner/configuration/runner_autoscale_aws/
[docker autoscaler executor]: https://docs.gitlab.com/runner/executors/docker_autoscaler.html
[docker executor]: https://docs.gitlab.com/runner/executors/docker.html
[docker machine executor autoscale configuration]: https://docs.gitlab.com/runner/configuration/autoscale.html
[docker machine's aws driver's options]: https://gitlab.com/gitlab-org/ci-cd/docker-machine/-/blob/main/docs/drivers/aws.md#options
[docker machine's supported cloud providers]: https://docs.gitlab.com/runner/configuration/autoscale.html#supported-cloud-providers
[docker machine]: https://gitlab.com/gitlab-org/ci-cd/docker-machine
[fleeting]: https://gitlab.com/gitlab-org/fleeting/fleeting
[gitlab runner autoscaler]: https://docs.gitlab.com/runner/runner_autoscale/index.html#gitlab-runner-autoscaler
[gitlab runner autoscaling]: https://docs.gitlab.com/runner/runner_autoscale/
[gitlab runner helm chart]: https://docs.gitlab.com/runner/install/kubernetes.html
[gitlab-runner-operator]: https://gitlab.com/gitlab-org/gl-openshift/gitlab-runner-operator
[install and register gitlab runner for autoscaling with docker machine]: https://docs.gitlab.com/runner/executors/docker_machine.html
[install gitlab runner]: https://docs.gitlab.com/runner/install/
[instance executor]: https://docs.gitlab.com/runner/executors/instance.html
[store registration tokens or runner tokens in secrets]: https://docs.gitlab.com/runner/install/kubernetes.html#store-registration-tokens-or-runner-tokens-in-secrets

<!-- Others -->
@@ -1,22 +1,28 @@
#!/usr/bin/env sh

cloud-init status

# Check logs
cat '/var/log/cloud-init-output.log'
tail -f '/var/log/cloud-init.log' '/var/log/cloud-init-output.log'

##
# Re-run everything
##

# 1. Clean the existing configuration
sudo cloud-init clean --logs

# 2. Detect local data sources
sudo cloud-init init --local

# 3. Detect any data source requiring the network and run the 'initialization' modules
sudo cloud-init init

# 4. Run the 'configuration' modules
sudo cloud-init modules --mode='config'

# 5. Run the 'final' modules
sudo cloud-init modules -m 'final'

# All together now!
@@ -37,3 +37,12 @@ gitlab-runner verify -c '/etc/gitlab-runner/config.toml'
gitlab-runner verify … --delete

diff -y <(helm show values 'gitlab/gitlab-runner' --version '0.64.2') <(helm show values 'gitlab/gitlab-runner' --version '0.68.1')

# Install plugins from the OCI registry distribution
gitlab-runner fleeting install

# List plugins with version
gitlab-runner fleeting list

# Sign in to private registries
gitlab-runner fleeting login
snippets/pulumi/aws/userData.ts (new file, 40 lines)
@@ -0,0 +1,40 @@
import * as cloudinit from "@pulumi/cloudinit";
import * as pulumi from "@pulumi/pulumi";
import * as yaml from 'yaml';

const userData = new cloudinit.Config(
  "userData",
  {
    gzip: false,
    base64Encode: false,
    parts: [
      {
        // docker on AmazonLinux 2023
        filename: "cloud-config.docker-engine.yml",
        mergeType: "dict(allow_delete,no_replace)+list(append)",
        contentType: "text/cloud-config",
        content: yaml.stringify({
          package_upgrade: false,
          packages: [
            "docker",
            "amazon-ecr-credential-helper",
          ],
          write_files: [
            {
              path: "/root/.docker/config.json",
              permissions: "0644",
              content: `{ "credsStore": "ecr-login" }`,
            },
          ],
          runcmd: [
            "systemctl daemon-reload",
            "systemctl enable --now docker.service",
            "grep docker /etc/group -q && usermod -a -G docker ec2-user",
          ],
        }),
      },
    ],
  },
);

export const rendered = userData.rendered;
snippets/pulumi/userData.ts (new file, 100 lines)
@@ -0,0 +1,100 @@
import * as cloudinit from "@pulumi/cloudinit";
import * as pulumi from "@pulumi/pulumi";
import * as fs from 'fs';
import * as yaml from 'yaml';

const gitlabUrl = "https://gitlab.example.org";
const runnerToken = "glrt-…";

const userData = new cloudinit.Config(
  "userData",
  {
    gzip: false,
    base64Encode: false,
    parts: [
      {
        filename: "cloud-config.security-updates.yml",
        contentType: "text/cloud-config",
        content: yaml.stringify({
          write_files: [{
            path: "/etc/cron.daily/security-updates",
            permissions: "0755",
            content: [
              "#!/bin/bash",
              "dnf -y upgrade --security --nobest",
            ].join("\n"),
            defer: true,
          }],
        }),
        mergeType: "dict(recurse_array,no_replace)+list(append)",
      },
      {
        filename: "cloud-config.docker.yml",
        contentType: "text/cloud-config",
        content: fs.readFileSync("./docker.yum.yaml", "utf8"),
        mergeType: "dict(recurse_array,no_replace)+list(append)",
      },
      {
        filename: "cloud-config.gitlab-runner.yml",
        mergeType: "dict(allow_delete,no_replace)+list(append)",
        contentType: "text/cloud-config",
        content: pulumi.all([ gitlabUrl, runnerToken ]).apply(
          ([ gitlabUrl, runnerToken ]) => yaml.stringify({
            yum_repos: {
              "gitlab-runner": {
                name: "Gitlab Runner",
                baseurl: "https://packages.gitlab.com/runner/gitlab-runner/amazon/2023/$basearch",
                gpgcheck: true,
                gpgkey: [
                  "https://packages.gitlab.com/runner/gitlab-runner/gpgkey",
                  "https://packages.gitlab.com/runner/gitlab-runner/gpgkey/runner-gitlab-runner-4C80FB51394521E9.pub.gpg",
                  "https://packages.gitlab.com/runner/gitlab-runner/gpgkey/runner-gitlab-runner-49F16C5CC3A0F81F.pub.gpg",
                ].join("\n"),
                sslverify: true,
                sslcacert: "/etc/pki/tls/certs/ca-bundle.crt",
                metadata_expire: 300,
              },
            },
            write_files: [{
              path: "/etc/gitlab-runner/config.toml",
              permissions: "0600",
              content: [
                `concurrent = 1`,
                `check_interval = 0`,
                `shutdown_timeout = 0`,
                ``,
                `[session_server]`,
                ` session_timeout = 1800`,
                `[[runners]]`,
                ` name = "runner autoscaler"`,
                ` url = "${gitlabUrl}"`,
                ` token = "${runnerToken}"`,
                ` executor = "shell"`,
              ].join("\n"),
            }],
            packages: [ "gitlab-runner-17.4.0" ],
            runcmd: [
              "systemctl daemon-reload",
              "systemctl enable --now 'gitlab-runner'",
            ],
          })
        ),
      },
      {
        contentType: "text/cloud-config",
        content: yaml.stringify({
          package_upgrade: false,
          packages: [ "postgresql" ],
          runcmd: [
            "systemctl daemon-reload",
            "systemctl enable --now 'postgres'",
          ],
        }),
        filename: "cloud-config.postgres.yml",
        mergeType: "dict(allow_delete,no_replace)+list(append)",
      },
    ],
  },
);

export const rendered = userData.rendered;