Gitlab CI Autoscaling Setup

I have set up an autoscaling Gitlab runner, like vg uses, to run multiple tests in parallel.


I am basically following the tutorial at

The tutorial has you create a "bastion" instance, on which you install the Gitlab Runner, using the "docker+machine" runner type. Then the bastion instance uses Docker Machine to create and destroy other instances to do the actual testing, as needed, but from the Gitlab side it looks like a single "runner" executing multiple tests.

I created a t2.micro instance named gitlab-ci-bastion, in the gitlab-ci-runner security group, with the gitlab-ci-runner IAM role, using the Ubuntu 18.04 image. I gave it a 20 GB root volume. I protected it from termination. It got IP address

I made sure to authorize the "ci" SSH key to access it, in ~/.ssh/authorized_keys.

Then I installed Gitlab Runner and Docker. I had to run each command separately; copy-pasting the whole block did not work.

curl -L | sudo bash

sudo apt-get -y -q install gitlab-runner

sudo apt-get -y -q install

sudo usermod -a -G docker gitlab-runner 

sudo usermod -a -G docker ubuntu 

Then I installed Docker Machine. Version 0.16.1 was current:

curl -L`uname -s`-`uname -m` >/tmp/docker-machine &&
chmod +x /tmp/docker-machine &&
sudo mv /tmp/docker-machine /usr/local/bin/docker-machine

Then I disconnected and SSHed back in, so that the new docker group membership would take effect. At that point I could successfully run docker ps.
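
A quick sanity check at this point can confirm that all three tools ended up on the PATH. This is only a minimal sketch: missing_tools is a hypothetical helper name of my own, not something shipped with Docker, Docker Machine, or Gitlab Runner.

```shell
# Hypothetical helper (not part of any of the installed packages):
# print the name of each given command that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Prints nothing if everything from the steps above is installed.
missing_tools docker docker-machine gitlab-runner
```

If this prints a name, the corresponding install step is probably worth re-running before registering the runner.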

Then I went and got the Gitlab registration token from the Gitlab web UI. I decided to register the runner to the DataBiosphere group, instead of just the Toil project.

Then I registered the Gitlab Runner with the main Gitlab server, using the token.

sudo gitlab-ci-multi-runner register -n \
  --url \
  --executor docker+machine \
  --description "docker-machine-runner" \
  --docker-image "" \

As soon as the runner registered with the Gitlab server, I found it in the web UI and paused it, so it wouldn't start trying to run jobs until I had it configured properly.

At some point I also updated the packages on the bastion machine:

sudo apt update && sudo apt upgrade -y

I edited the /etc/gitlab-runner/config.toml file to actually configure the runner. After a bit of debugging, it looked like this:

# Let the runner run 10 jobs in parallel
concurrent = 10
check_interval = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "docker-machine-runner"
  url = ""
  # Leave the pre-filled value here from your config.toml, or replace
  # with the registration token you are using if copy-pasting this one.
  executor = "docker+machine"
  # Run no more than 10 machines at a time.
  limit = 10
  [runners.docker]
    tls_verify = false
    # We reuse this image because it is Ubuntu with Docker
    # available and virtualenv installed.
    image = ""
    # t2.xlarge has 16 GB
    memory = "15g"
    privileged = true
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 0
  [runners.machine]
    IdleCount = 0
    IdleTime = 60
    # Max builds per machine before recreating
    MaxBuilds = 10
    MachineDriver = "amazonec2"
    MachineName = "gitlab-ci-machine-%s"
    MachineOptions = [
      # Make sure to fill in your own owner details here!

To make this work, I had to add some IAM policies to the gitlab-ci-runner role. It already had the AWS built-in AmazonS3ReadOnlyAccess policy, to let the tests read test data from S3. I gave it the AWS built-in AmazonEC2FullAccess policy to allow the bastion to create the machines. I also gave it gitlab-ci-runner-passrole, which I had to talk cluster-admin into creating for me, and which allows the bastion to pass the gitlab-ci-runner role on to the machines it creates. That policy had the following contents:

{
   "Version": "2012-10-17",
   "Statement": [
       {
           "Sid": "VisualEditor0",
           "Effect": "Allow",
           "Action": "iam:PassRole",
           "Resource": "arn:aws:iam::719818754276:role/gitlab-ci-runner"
       }
   ]
}

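The policy above looks like it came from the AWS console's visual editor. For anyone who would rather script it, a PassRole policy document of the same shape can be generated from the shell. This is a sketch under assumptions: passrole_policy is a hypothetical helper name, the Sid is omitted because it is optional, and the aws commands in the comments are untested here and would need IAM admin credentials.

```shell
# Hypothetical helper: print an IAM policy document that allows passing
# one role. Arguments: AWS account ID, role name.
passrole_policy() {
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "arn:aws:iam::${1}:role/${2}"
    }
  ]
}
EOF
}

# Print the document for the role used on this page.
passrole_policy 719818754276 gitlab-ci-runner

# With the output saved to passrole.json, an admin could then create and
# attach the policy with something like (not run here):
#   aws iam create-policy --policy-name gitlab-ci-runner-passrole \
#     --policy-document file://passrole.json
#   aws iam attach-role-policy --role-name gitlab-ci-runner \
#     --policy-arn arn:aws:iam::719818754276:policy/gitlab-ci-runner-passrole
```
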
After getting all the policies attached to the role, I rebooted the bastion machine to get it to actually start up the Gitlab Runner daemon:

sudo shutdown -r now

When the bastion came back up, I unpaused the runner in the Gitlab web interface, and it started running jobs. A few jobs failed, and to debug them I set the Docker image to the vg_ci_prebake image that vg uses (to provide packages like python-virtualenv) and added python3-dev to the packages that image carries.
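
While jobs are running, it can be handy to see how many worker machines the bastion currently has up, for comparison against the limit of 10. A minimal sketch: count_ci_machines is a hypothetical helper name, and the usage line assumes the runner service runs as root, so that its machines are only visible to docker-machine under sudo.

```shell
# Hypothetical helper: read machine names on stdin, one per line, and
# print how many contain the MachineName stem from config.toml.
count_ci_machines() {
  # grep -c exits nonzero when the count is 0, so mask the exit status.
  grep -c 'gitlab-ci-machine-' || true
}

# Intended usage on the bastion (not run here):
#   sudo docker-machine ls -q | count_ci_machines
```

A count that stays above zero long after the queue has drained may mean machines are not being cleaned up after IdleTime expires.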

Docker Maintenance

To make more changes to the image, commit to and Quay will automatically rebuild it. If you don't have rights to do that and don't want to wait around for a PR, clone the repo, edit it, and make a new Quay project to build your own version.

Future Work

One change I have not yet made might be to set a high output_limit as described in in case the CI logs get too long.

I also have not yet destroyed the old shell runner. I want to leave it in place until we are confident in the new system.