Add infrastructure for on-demand ARM64 runners on AWS (#1569)
* Add infrastructure for on-demand ARM64 runners on AWS

With this change, ARM64 release artifacts are built automatically by
a GitHub workflow. Since GitHub doesn't offer hosted ARM64 runners,
we spin up an EC2 spot instance on demand and run the jobs that build
ARM64 artifacts there.

As a fun side note, the Terraform infrastructure code is written
entirely in Nickel.

* Remove unused `update-github` script

* Address comments from code review

* Address comments from code review
vkleen authored Sep 4, 2023
1 parent 305d1a4 commit 0dd1e10
Showing 14 changed files with 877 additions and 12 deletions.
119 changes: 108 additions & 11 deletions .github/workflows/release-artifacts.yaml
@@ -8,13 +8,85 @@ on:
description: "The release tag to target"

permissions:
id-token: write
contents: write
packages: write

jobs:
start-runner:
name: Start EC2 runner
runs-on: ubuntu-latest
outputs:
instance_id: ${{ steps.invoke-start.outputs.INSTANCE_ID }}
steps:
- uses: aws-actions/configure-aws-credentials@v3
with:
role-to-assume: ${{ secrets.EC2_ROLE }}
aws-region: ${{ vars.EC2_REGION }}
- name: Start EC2 instance
id: invoke-start
env:
GH_TOKEN: ${{ secrets.GH_TOKEN_FOR_UPDATES }}
EC2_START: ${{ secrets.EC2_START }}
run: |
RUNNER_TOKEN=$(gh api -X POST -q '.token' /repos/${{ github.repository }}/actions/runners/registration-token)
aws lambda invoke \
--cli-binary-format raw-in-base64-out \
--function-name "$EC2_START" \
--payload '{"ref_name":"${{ github.ref_name }}","runner_token":"'"${RUNNER_TOKEN}"'"}' \
response.json
INSTANCE_ID=$(jq -r '.body.instance_id' < response.json)
echo "INSTANCE_ID=${INSTANCE_ID}" >>"$GITHUB_OUTPUT"
echo "Got EC2 instance ${INSTANCE_ID}"
echo 'Waiting for GitHub runner to start'
while [[ -z "$(gh api /repos/${{ github.repository }}/actions/runners | jq '.runners[] | select(.name == "ec2-spot")')" ]]; do
sleep 60
done
echo 'Done 🎉'
stop-runner:
name: Stop EC2 runner
runs-on: ubuntu-latest
# Ensure that `stop-runner` will always stop the EC2 instance, even if other jobs failed or were canceled
if: ${{ always() }}
needs:
- start-runner
- docker-multiplatform-image
- static-binary
steps:
- uses: aws-actions/configure-aws-credentials@v3
with:
role-to-assume: ${{ secrets.EC2_ROLE }}
aws-region: ${{ vars.EC2_REGION }}
- name: Delete GitHub Runner
env:
GH_TOKEN: ${{ secrets.GH_TOKEN_FOR_UPDATES }}
run: |
RUNNER_ID=$(gh api /repos/${{ github.repository }}/actions/runners | jq '.runners[] | select(.name == "ec2-spot") | .id')
if [[ -n "${RUNNER_ID}" ]]; then
gh api -X DELETE /repos/${{ github.repository }}/actions/runners/${RUNNER_ID}
fi
- name: Lambda Invoke Stop
env:
EC2_STOP: ${{ secrets.EC2_STOP }}
run: |
aws lambda invoke \
--cli-binary-format raw-in-base64-out \
--function-name "$EC2_STOP" \
--payload '{"instance_id":"${{ needs.start-runner.outputs.instance_id }}"}' \
response.json
cat response.json
docker-image:
name: "Build docker image"
runs-on: "ubuntu-latest"
strategy:
matrix:
os:
- runs-on: ubuntu-latest
architecture: x86_64
- runs-on: [EC2, ARM64, Linux]
architecture: arm64
runs-on: ${{ matrix.os.runs-on }}
steps:
- uses: actions/checkout@v3
with:
@@ -30,15 +102,15 @@ jobs:
name: "Build docker image"
run: |
nix build --print-build-logs .#dockerImage
cp ./result nickel-docker-image.tar.gz
cp ./result nickel-${{ matrix.os.architecture }}-docker-image.tar.gz
echo "imageName=$(nix eval --raw .#dockerImage.imageName)" >> "$GITHUB_OUTPUT"
echo "imageTag=$(nix eval --raw .#dockerImage.imageTag)" >> "$GITHUB_OUTPUT"
- name: "Upload docker image as release asset"
env:
GH_TOKEN: ${{ github.token }}
RELEASE_TAG: ${{ github.event_name == 'release' && github.event.release.tag_name || github.event.inputs.release_tag }}
run: |
gh release upload --clobber $RELEASE_TAG nickel-docker-image.tar.gz
gh release upload --clobber $RELEASE_TAG nickel-${{ matrix.os.architecture }}-docker-image.tar.gz
- name: Log in to registry
# This is where you will update the personal access token to GITHUB_TOKEN
run: echo "${{ secrets.GITHUB_TOKEN }}" | docker login ghcr.io -u $ --password-stdin
@@ -47,13 +119,38 @@ jobs:
RELEASE_TAG: ${{ github.event_name == 'release' && github.event.release.tag_name || github.event.inputs.release_tag }}
TARBALL_TAG: ${{ steps.build-image.outputs.imageName }}:${{ steps.build-image.outputs.imageTag }}
run: |
docker load -i nickel-docker-image.tar.gz
docker tag "$TARBALL_TAG" ghcr.io/tweag/nickel:$RELEASE_TAG
docker push ghcr.io/tweag/nickel:$RELEASE_TAG
docker load -i nickel-${{ matrix.os.architecture }}-docker-image.tar.gz
docker tag "$TARBALL_TAG" ghcr.io/tweag/nickel:$RELEASE_TAG-${{ matrix.os.architecture }}
docker push ghcr.io/tweag/nickel:$RELEASE_TAG-${{ matrix.os.architecture }}
docker-multiplatform-image:
name: "Assemble multi-platform Docker image"
runs-on: ubuntu-latest
needs: docker-image
steps:
- name: Log in to registry
run: echo "${{ secrets.GITHUB_TOKEN }}" | docker login ghcr.io -u $ --password-stdin
- name: Assemble and push image
env:
RELEASE_TAG: ${{ github.event_name == 'release' && github.event.release.tag_name || github.event.inputs.release_tag }}
run: |
docker manifest create \
ghcr.io/tweag/nickel:$RELEASE_TAG \
--amend ghcr.io/tweag/nickel:$RELEASE_TAG-x86_64 \
--amend ghcr.io/tweag/nickel:$RELEASE_TAG-arm64
docker manifest push ghcr.io/tweag/nickel:$RELEASE_TAG
static-binary:
name: "Build Nickel release binary"
runs-on: "ubuntu-latest"
strategy:
matrix:
os:
- runs-on: ubuntu-latest
architecture: x86_64
- runs-on: [EC2, ARM64, Linux]
architecture: arm64
runs-on: ${{ matrix.os.runs-on }}
steps:
- uses: actions/checkout@v3
with:
@@ -65,13 +162,13 @@ jobs:
experimental-features = nix-command flakes
accept-flake-config = true
nix_path: "nixpkgs=channel:nixos-unstable"
- name: "Build x86_64 static binary"
- name: "Build static binary"
run: |
nix build --print-build-logs .#nickel-static
cp ./result/bin/nickel nickel-x86_64-linux
- name: "Upload x86_64 static binary as release asset"
cp ./result/bin/nickel nickel-${{ matrix.os.architecture }}-linux
- name: "Upload static binary as release asset"
env:
GH_TOKEN: ${{ github.token }}
RELEASE_TAG: ${{ github.event_name == 'release' && github.event.release.tag_name || github.event.inputs.release_tag }}
run: |
gh release upload --clobber $RELEASE_TAG nickel-x86_64-linux
gh release upload --clobber $RELEASE_TAG nickel-${{ matrix.os.architecture }}-linux
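
The matrix jobs above derive every artifact name from the `architecture` matrix value. As a rough summary of that naming scheme, here is a minimal bash sketch. The function names (`binary_asset`, `image_tarball`, `image_tag`) and the release tag `v0.0.0-example` are hypothetical and only serve as illustration; the workflow itself inlines these strings.

```shell
#!/usr/bin/env bash
# Sketch of the per-architecture naming used by the release workflow.
# All function names here are hypothetical helpers, not part of the repository.

# Name of the static binary release asset for a given architecture.
binary_asset() { printf 'nickel-%s-linux\n' "$1"; }

# Name of the Docker image tarball release asset for a given architecture.
image_tarball() { printf 'nickel-%s-docker-image.tar.gz\n' "$1"; }

# Per-architecture image reference; the multi-platform manifest is later
# assembled from these under the plain release tag.
image_tag() { printf 'ghcr.io/tweag/nickel:%s-%s\n' "$1" "$2"; }

# "v0.0.0-example" is a placeholder release tag.
for arch in x86_64 arm64; do
  echo "$(binary_asset "$arch") $(image_tarball "$arch") $(image_tag "v0.0.0-example" "$arch")"
done
```

Keeping the architecture in every name is what lets the two matrix legs upload to the same release without overwriting each other.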
36 changes: 35 additions & 1 deletion flake.nix
@@ -172,6 +172,8 @@
snapFilter = mkFilter ".*snap$";
scmFilter = mkFilter ".*scm$";
importsFilter = mkFilter ".*/core/tests/integration/imports/imported/.*$"; # include all files that are imported in tests

infraFilter = mkFilter ".*/infra/.*$";
in
pkgs.lib.cleanSourceWith {
src = pkgs.lib.cleanSource ./.;
@@ -187,7 +189,9 @@
scmFilter
filterCargoSources
importsFilter
];
] && !(builtins.any (filter: filter path type) [
infraFilter
]);
};

# Given a rust toolchain, provide Nickel's Rust dependencies, Nickel, as
@@ -466,6 +470,35 @@
'';
};

infraShell = nickel:
let
terraform = pkgs.terraform.withPlugins (p: with p; [
archive
aws
github
]);
ec2-region = "eu-north-1";
ec2-ami = (import "${nixpkgs}/nixos/modules/virtualisation/amazon-ec2-amis.nix").latest.${ec2-region}.aarch64-linux.hvm-ebs;
run-terraform = pkgs.writeShellScriptBin "run-terraform" ''
set -e
${nickel}/bin/nickel export > main.tf.json <<EOF
((import "main.ncl") & {
region = "${ec2-region}",
nixos-ami = "${ec2-ami}",
}).config
EOF
${terraform}/bin/terraform "$@"
'';

update-infra = pkgs.writeShellScriptBin "update-infra" ''
set -e
${run-terraform}/bin/run-terraform init
GITHUB_TOKEN="$(${pkgs.gh}/bin/gh auth token)" ${run-terraform}/bin/run-terraform apply
'';
in
pkgs.mkShell {
buildInputs = [ terraform run-terraform update-infra ];
};
in
rec {
packages = {
@@ -502,6 +535,7 @@
value = makeDevShell { rust = mkRust { inherit channel; rustProfile = "default"; targets = [ "wasm32-unknown-unknown" ]; }; };
})) // {
default = devShells.stable;
infra = infraShell packages.nickel-lang-cli;
};

checks = {
3 changes: 3 additions & 0 deletions infra/.gitignore
@@ -0,0 +1,3 @@
.terraform*
build
main.tf.json
56 changes: 56 additions & 0 deletions infra/README.md
@@ -0,0 +1,56 @@
# GitHub Runner Infrastructure

If you make any changes to the infrastructure code in this directory, you will
have to redeploy it. Do the following:

1. Make sure you're logged into AWS. You can check using `awscli2`:

```console
❯ nix run nixpkgs#awscli2 -- sts get-caller-identity
{
# CENSORED
}
```

If this fails, log in with AWS SSO credentials, following [their guide][aws-sso-guide].

2. Make sure you're logged into GitHub. You can check using `gh`:

```console
❯ nix run github:nixos/nixpkgs#gh -- auth status
github.com
# CENSORED
✓ Token scopes: gist, read:org, repo
```

If this fails, log in using `nix run nixpkgs#gh -- auth login` and follow
the instructions.

3. Update the infrastructure using

```console
nix develop ..#infra -c update-infra
```

## Architecture

The code in this subdirectory provisions AWS infrastructure for starting an
ARM64 GitHub Actions runner on demand. The workflow for producing ARM64 release
artifacts is as follows:

- the release workflow is triggered automatically when a release is created or
manually for testing
- the workflow requests a runner registration token `$TOKEN` from the GitHub
API. For this, it needs a personal access token with `repo` scope for the Nickel
repository.
- the workflow invokes the `$EC2_START` AWS Lambda and provides `$TOKEN` as
input
- the AWS Lambda stores `$TOKEN` as a parameter in AWS SSM and requests an
appropriate EC2 spot instance
- the spot instance boots up, retrieves `$TOKEN` from AWS SSM and starts a
GitHub Actions runner
- GitHub Actions schedules the ARM64 jobs on the spot instance
- when the jobs building the release artifact have finished, the workflow
invokes the `$EC2_STOP` AWS Lambda which terminates the EC2 instance
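
Between requesting the spot instance and scheduling the ARM64 jobs, the workflow has to wait until the runner has registered with GitHub. The following is a minimal bash sketch of a bounded version of that wait, so that a runner that never appears fails the job instead of polling indefinitely. The names `wait_for` and `runner_registered` are hypothetical, and the sketch assumes `gh`, `jq`, and the `GITHUB_REPOSITORY` environment variable are available, as they are on GitHub-hosted runners.

```shell
#!/usr/bin/env bash
# Sketch: poll a command until it succeeds, giving up after a fixed number
# of attempts. `wait_for` is a hypothetical helper, not part of the workflow.
#
# Usage: wait_for MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_for() {
  local max_attempts=$1 delay=$2
  shift 2
  local attempt
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if ((attempt < max_attempts)); then
      sleep "$delay"
    fi
  done
  echo "gave up after ${max_attempts} attempts" >&2
  return 1
}

# Hypothetical check mirroring the workflow's query: succeeds once a runner
# named "ec2-spot" is registered for the repository. `jq -e` exits non-zero
# when the filter produces no output.
runner_registered() {
  gh api "/repos/${GITHUB_REPOSITORY}/actions/runners" \
    | jq -e '.runners[] | select(.name == "ec2-spot")' >/dev/null
}
```

A call such as `wait_for 30 60 runner_registered` would poll once a minute for up to half an hour; the attempt count and delay are illustrative values, not ones taken from the workflow.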

[aws-sso-guide]: https://docs.aws.amazon.com/cli/latest/userguide/sso-configure-profile-token.html
49 changes: 49 additions & 0 deletions infra/github-oidc.ncl
@@ -0,0 +1,49 @@
{
naming_prefix | String,
github = {
owner | String,
repo | String,
ec2_role = "${resource.aws_iam_role.invoke_lambda_role.arn}",
},
lambda.invoke_policy | String,
config = {
resource.aws_iam_openid_connect_provider.github_oidc = {
url = "https://token.actions.githubusercontent.com",
client_id_list = [
"sts.amazonaws.com",
],
thumbprint_list
| doc m%"
Thumbprints are provided by GitHub, see
[https://github.blog/changelog/2023-06-27-github-actions-update-on-oidc-integration-with-aws/]
This should be kept sorted to prevent apparent Terraform drift
"%
= [
"1c58a3a8518e8759bf075b76b750d4f2df264fcd",
"6938fd4d98bab03faadb97b34396831e3780aea1",
],
},

resource.aws_iam_role.invoke_lambda_role = {
name = "%{naming_prefix}-invoke-lambda-role",
managed_policy_arns = [lambda.invoke_policy],
assume_role_policy =
std.serialize
'Json
{
Version = "2012-10-17",
Statement = [
{
Principal.Federated = "${resource.aws_iam_openid_connect_provider.github_oidc.id}",
Action = "sts:AssumeRoleWithWebIdentity",
Condition = {
StringLike."token.actions.githubusercontent.com:sub" = "repo:%{github.owner}/%{github.repo}:ref:refs/tags/*",
StringEquals."token.actions.githubusercontent.com:aud" = "sts.amazonaws.com",
},
Effect = "Allow",
}
],
},
},
}
}