Gitlab CI Autoscaling Setup
I have set up an autoscaling Gitlab runner, like vg uses, to run multiple tests in parallel.
I am basically following the tutorial at https://docs.gitlab.com/runner/configuration/runner_autoscale_aws/
The tutorial has you create a "bastion" instance, on which you install the Gitlab Runner, using the "docker+machine" runner type. Then the bastion instance uses Docker Machine to create and destroy other instances to do the actual testing, as needed, but from the Gitlab side it looks like a single "runner" executing multiple tests.
I created a t2.micro instance named gitlab-ci-bastion, in the gitlab-ci-runner security group, with the gitlab-ci-runner IAM role, using the Ubuntu 18.04 image. I gave it a 20 GB root volume. I protected it from termination. It got IP address 54.218.250.217.
ssh ubuntu@54.218.250.217
I made sure to authorize the "ci" SSH key to access it, in ~/.ssh/authorized_keys.
Then I installed Gitlab Runner and Docker. I had to run each command separately; copy-pasting the whole block did not work.
curl -L https://packages.gitlab.com/install/repositories/runner/gitlab-runner/script.deb.sh | sudo bash
sudo apt-get -y -q install gitlab-runner
sudo apt-get -y -q install docker.io
sudo usermod -a -G docker gitlab-runner
sudo usermod -a -G docker ubuntu
Then I installed Docker Machine. Version 0.16.1 was current:
curl -L https://github.com/docker/machine/releases/download/v0.16.1/docker-machine-`uname -s`-`uname -m` >/tmp/docker-machine &&
chmod +x /tmp/docker-machine &&
sudo mv /tmp/docker-machine /usr/local/bin/docker-machine
Then I disconnected and ssh-d back in, so that the new docker group membership would take effect. At that point I could successfully run docker ps.
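A quick sanity check at this point can save debugging later. The following is a minimal sketch, not part of the original setup: check_tool is a hypothetical helper name, and the script only reports whether the three tools installed above are on the PATH and whether the docker group is active in the current session.

```shell
# Hypothetical helper: report whether a command is available on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: MISSING"
    fi
}

# The three tools installed on the bastion in the steps above.
for tool in docker docker-machine gitlab-runner; do
    check_tool "$tool"
done

# Group changes from usermod only apply to new login sessions.
if id -nG | tr ' ' '\n' | grep -qx docker; then
    echo "docker group: ok"
else
    echo "docker group: not active in this session (log out and back in)"
fi
```

If any tool is reported as MISSING, rerun the corresponding install command on its own before continuing.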
Then I went and got the Gitlab registration token from the Gitlab web UI. I decided to register the runner to the DataBiosphere group, instead of just the Toil project.
Then I registered the Gitlab Runner with the main Gitlab server, using the real token in place of ##CENSORED##.
sudo gitlab-ci-multi-runner register -n \
--url https://ucsc-ci.com/ \
--registration-token ##CENSORED## \
--executor docker+machine \
--description "docker-machine-runner" \
--docker-image "quay.io/vgteam/dind" \
--docker-privileged
As soon as the runner registered with the Gitlab server, I found it in the web UI and paused it, so it wouldn't start trying to run jobs until I had it configured properly.
I also at some point updated the packages on the bastion machine:
sudo apt update && sudo apt upgrade -y
I edited the /etc/gitlab-runner/config.toml file to actually configure the runner. After a bit of debugging, I got it looking like this.
# Let the runner run 10 jobs in parallel
concurrent = 10
check_interval = 0
[session_server]
session_timeout = 1800
[[runners]]
name = "docker-machine-runner"
url = "https://ucsc-ci.com/"
# Leave the pre-filled value here from your config.toml. It is the
# runner token issued at registration, not the registration token.
token = "##CENSORED##"
executor = "docker+machine"
# Run no more than 10 machines at a time.
limit = 10
[runners.docker]
tls_verify = false
# We reuse this image because it is Ubuntu with Docker
# available and virtualenv installed.
image = "quay.io/vgteam/vg_ci_prebake"
# t2.xlarge has 16 GB
memory = "15g"
privileged = true
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = false
volumes = ["/cache"]
shm_size = 0
[runners.cache]
[runners.cache.s3]
[runners.cache.gcs]
[runners.machine]
IdleCount = 0
IdleTime = 60
# Max builds per machine before recreating
MaxBuilds = 10
MachineDriver = "amazonec2"
MachineName = "gitlab-ci-machine-%s"
MachineOptions = [
"amazonec2-iam-instance-profile=gitlab-ci-runner",
"amazonec2-region=us-west-2",
"amazonec2-zone=a",
"amazonec2-use-private-address=true",
# Make sure to fill in your own owner details here!
"amazonec2-tags=Owner,anovak@soe.ucsc.edu,Name,gitlab-ci-runner-machine",
"amazonec2-security-group=gitlab-ci-runner",
"amazonec2-instance-type=t2.xlarge",
"amazonec2-root-size=80"
]
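Before unpausing the runner, it may be worth checking that a copy-pasted config does not still contain the placeholder token and has the expected executor and driver. The following is a rough sketch under that assumption; check_runner_config is a hypothetical helper, and it only greps for the specific values used in the config above rather than parsing TOML.

```shell
# Hypothetical helper: flag common copy-paste mistakes in a runner config.
check_runner_config() {
    config="$1"
    status=0
    if grep -q '##CENSORED##' "$config"; then
        echo "placeholder token still present"
        status=1
    fi
    if ! grep -q '^[[:space:]]*executor = "docker+machine"' "$config"; then
        echo "executor is not docker+machine"
        status=1
    fi
    if ! grep -q '^[[:space:]]*MachineDriver = "amazonec2"' "$config"; then
        echo "MachineDriver is not amazonec2"
        status=1
    fi
    [ "$status" -eq 0 ] && echo "config looks plausible"
    return "$status"
}

# Only runs when the file exists and is readable by the current user.
if [ -r /etc/gitlab-runner/config.toml ]; then
    check_runner_config /etc/gitlab-runner/config.toml
fi
```

The config file may only be readable by root, in which case the check needs to be run from a root shell.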
To enable this to work, I had to add some IAM policies to the gitlab-ci-runner role. It already had the AWS built-in AmazonS3ReadOnlyAccess, to let the tests read test data from S3. I gave it the AWS built-in AmazonEC2FullAccess to allow the bastion to create the machines. I also gave it gitlab-ci-runner-passrole, which I had to talk cluster-admin into creating for me, which allows the bastion to pass on the gitlab-ci-runner role to the machines it creates. That policy had the following contents:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::719818754276:role/gitlab-ci-runner"
        }
    ]
}
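The policies above can also be attached from the AWS CLI instead of the console, given sufficient IAM permissions. The following is a sketch only: the run wrapper and the DRY_RUN variable are hypothetical conveniences, and by default the script just prints the commands rather than executing them. The ARN of gitlab-ci-runner-passrole is not given here, so that policy is left out; it would be attached the same way with its own ARN.

```shell
ROLE="gitlab-ci-runner"

# Hypothetical wrapper: print each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Attach the AWS managed policy that lets the bastion create machines.
run aws iam attach-role-policy \
    --role-name "$ROLE" \
    --policy-arn "arn:aws:iam::aws:policy/AmazonEC2FullAccess"

# List what is attached to the role, to confirm the result.
run aws iam list-attached-role-policies --role-name "$ROLE"
```

Running with DRY_RUN=0 in the environment executes the commands for real, so review the printed commands first.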
After getting all the policies attached to the role, I rebooted the bastion machine to get it to actually start up the Gitlab Runner daemon:
sudo shutdown -r now
Then when it came back up I unpaused it in the Gitlab web interface, and it started running jobs. A few jobs failed, and to debug them I set the docker image to the vg_ci_prebake that vg uses (to provide packages like python-virtualenv) and added python3-dev to the packages that that image carries.
To make more changes to the image, commit to https://github.com/vgteam/vg_ci_prebake and Quay will automatically rebuild it. If you don't have rights to do that and don't want to wait around for a PR, clone the repo, edit it, and make a new Quay project to build your own version.
One change I have not yet made is to set a high output_limit, as described in https://stackoverflow.com/a/53541010, in case the CI logs get too long.
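A minimal sketch of what that change might look like in /etc/gitlab-runner/config.toml, assuming the setting goes in the existing [[runners]] section alongside the other keys. The value is in kilobytes, and 16384 is an arbitrary illustrative number rather than a tested recommendation; to my understanding the default is 4096.

```toml
[[runners]]
name = "docker-machine-runner"
# Maximum job log size in kilobytes. 16384 (16 MB) is an assumed example value.
output_limit = 16384
```

Only the output_limit line is new; the rest of the [[runners]] section would stay as shown earlier.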
I also have not yet destroyed the old shell runner. I want to leave it in place until we are confident in the new system.