Merge pull request #123 from DFE-Digital/2041-configure-bigquery-in-the-terraform-module

[2041] dfe_analytics module
saliceti authored Oct 22, 2024
2 parents 398fda7 + 0fcc184 commit 8e53a3d
Showing 10 changed files with 698 additions and 0 deletions.
64 changes: 64 additions & 0 deletions aks/dfe_analytics/.terraform.lock.hcl

123 changes: 123 additions & 0 deletions aks/dfe_analytics/README.md
@@ -0,0 +1,123 @@
# DfE Analytics
Creates resources in Google Cloud BigQuery and provides the required variables to applications so they can send events.

## Examples
### Reuse existing dataset and events table

```hcl
module "dfe_analytics" {
  source                = "./vendor/modules/dfe-terraform-modules//aks/dfe_analytics"
  azure_resource_prefix = var.azure_resource_prefix
  cluster               = var.cluster
  namespace             = var.namespace
  service_short         = var.service_short
  environment           = var.environment
  gcp_dataset           = "events_${var.config}"
  gcp_project_id        = "apply-for-qts-in-england"
  gcp_project_number    = 385922361840
}
```

### Create new dataset and events table
Use for a new environment. To get the values for `gcp_taxonomy_id` and `gcp_policy_tag_id` see [Taxonomy and policy tag](#taxonomy-and-policy-tag).
```hcl
module "dfe_analytics" {
  source                = "./vendor/modules/dfe-terraform-modules//aks/dfe_analytics"
  azure_resource_prefix = var.azure_resource_prefix
  cluster               = var.cluster
  namespace             = var.namespace
  service_short         = var.service_short
  environment           = var.environment
  gcp_keyring           = "afqts-key-ring"
  gcp_key               = "afqts-key"
  gcp_project_id        = "apply-for-qts-in-england"
  gcp_project_number    = 385922361840
  gcp_taxonomy_id       = 5456044749211275650
  gcp_policy_tag_id     = 2399328962407973209
}
```

### Configure application
#### Enable in Ruby
```ruby
DfE::Analytics.configure do |config|
  ...
  config.azure_federated_auth = ENV.include? "GOOGLE_CLOUD_CREDENTIALS"
end
```

#### Enable in .NET
```cs
builder.Services.AddDfeAnalytics()
    .UseFederatedAksBigQueryClientProvider();
```
Ensure the `ProjectNumber`, `WorkloadIdentityPoolName`, `WorkloadIdentityPoolProviderName` and `ServiceAccountEmail` configuration keys are populated within the `DfeAnalytics` configuration section.

#### Variables
Each variable is available as a separate output. For convenience, the `variables_map` output provides them all:
- BIGQUERY_PROJECT_ID
- BIGQUERY_TABLE_NAME
- BIGQUERY_DATASET
- GOOGLE_CLOUD_CREDENTIALS

```hcl
module "application_configuration" {
  source = "./vendor/modules/dfe-terraform-modules//aks/application_configuration"
  ...
  secret_variables = merge(
    module.dfe_analytics.variables_map,
    {
      ...
    }
  )
}
```
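
To confirm from inside a running container that the map reached the application, a small check along these lines may help. It is a minimal sketch, not part of the module: `missing_analytics_vars` is a hypothetical helper, and it assumes the four variables are exposed to the process under exactly the names listed above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the module): prints the name of every
# dfe_analytics variable that is unset or empty in the current environment.
missing_analytics_vars() {
  for name in BIGQUERY_PROJECT_ID BIGQUERY_TABLE_NAME BIGQUERY_DATASET GOOGLE_CLOUD_CREDENTIALS; do
    # Indirect lookup of the variable called $name. The names are fixed in the
    # list above, so using eval here does not evaluate untrusted input.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name"
    fi
  done
}

missing=$(missing_analytics_vars)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Missing dfe_analytics variables:" $missing >&2
fi
```

An empty result from `missing_analytics_vars` means all four variables are set.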

#### Enable on each app that requires it
```hcl
module "worker_application" {
  source = "./vendor/modules/dfe-terraform-modules//aks/application"
  ...
  enable_gcp_wif = true
}
```

## Authentication - Command line
The user should have the Owner role on the Google project.

- Run `gcloud auth application-default login`
- Run terraform

## Authentication - GitHub Actions
We set up workload identity federation on the Google side and configure the workflow. The user should have the Owner role on the Google project. This is done once per repository.

- Run the `authorise_workflow.sh` script located in *aks/dfe_analytics*:
```
./authorise_workflow.sh PROJECT_ID REPO
```
Example:
```
./authorise_workflow.sh apply-for-qts-in-england apply-for-qualified-teacher-status
```
- The script shows the *permissions* and *google-github-actions/auth step* to add to the workflow job
- Adding the permission removes the [default token permissions](https://docs.github.com/en/actions/security-for-github-actions/security-guides/automatic-token-authentication#permissions-for-the-github_token), which may be an issue for some actions that rely on them. For example, the [marocchino/sticky-pull-request-comment](https://github.com/marocchino/sticky-pull-request-comment) action requires `pull-requests: write`. It must then be added explicitly.
- Run the workflow
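
As an illustration of the point about explicit permissions, the sketch below prints a job `permissions` block that keeps `id-token: write` for Google authentication and re-adds `pull-requests: write` for an action such as sticky-pull-request-comment. The function name `print_job_permissions` is hypothetical, and the exact set of permissions a given workflow needs is an assumption to check against the actions it uses.

```shell
#!/bin/sh
# Hypothetical helper: prints an example permissions block for the workflow job,
# using a heredoc in the same way authorise_workflow.sh prints its snippet.
print_job_permissions() {
  cat <<'EOF'
permissions:
  id-token: write        # needed by google-github-actions/auth
  pull-requests: write   # re-added explicitly, e.g. for sticky-pull-request-comment
EOF
}

print_job_permissions
```

Any other permission that was previously granted by default and is still required would need to be listed in the same block.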

## Taxonomy and policy tag
The user should have the Owner role on the Google project.

- Authenticate: `gcloud auth application-default login`
- Get projects list: `gcloud projects list`
- Select project e.g.: `gcloud config set project apply-for-qts-in-england`
- Get taxonomies list:
```
gcloud data-catalog taxonomies list --location=europe-west2 --format="value(name)"
```
The path contains the taxonomy id as a number e.g. 5456044749211275650
- Get policy tags e.g.:
```
gcloud data-catalog taxonomies policy-tags list --taxonomy="projects/apply-for-qts-in-england/locations/europe-west2/taxonomies/5456044749211275650" --location="europe-west2" --filter="displayName:hidden" --format="value(name)"
```
The path contains the policy tag id as a number e.g. 2399328962407973209
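
Since both commands return a full resource path, the numeric id is simply its last segment. The following is a minimal sketch of extracting it with shell parameter expansion. `resource_id` is a hypothetical helper, and the policy tag path is an assumed example built from the ids used above in the usual `.../taxonomies/<id>/policyTags/<id>` form.

```shell
#!/bin/sh
# Hypothetical helper: prints the last "/"-separated segment of a resource name.
resource_id() {
  printf '%s\n' "${1##*/}"
}

# Example paths in the form printed by --format="value(name)" (assumed values).
taxonomy="projects/apply-for-qts-in-england/locations/europe-west2/taxonomies/5456044749211275650"
policy_tag="${taxonomy}/policyTags/2399328962407973209"

echo "gcp_taxonomy_id   = $(resource_id "$taxonomy")"
echo "gcp_policy_tag_id = $(resource_id "$policy_tag")"
```

The two printed values are the ones to pass to the module as `gcp_taxonomy_id` and `gcp_policy_tag_id`.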
89 changes: 89 additions & 0 deletions aks/dfe_analytics/authorise_workflow.sh
@@ -0,0 +1,89 @@
#!/usr/bin/env bash
# Set up Direct Workload Identity Federation
# See https://github.com/google-github-actions/auth?tab=readme-ov-file#preferred-direct-workload-identity-federation

PROJECT_ID=$1
REPO=$2

if [[ -z "$PROJECT_ID" || -z "$REPO" ]]; then
  cat <<EOF
Set up Direct Workload Identity Federation between GitHub Actions workflows from a repository and GCP for setting up BigQuery. The user must have the 'Owner' role on the project.
Usage: ./authorise_workflow.sh PROJECT_ID REPO - Example: ./authorise_workflow.sh apply-for-qts-in-england apply-for-qualified-teacher-status
EOF
  exit 1
fi

set -eu

GITHUB_ORG=DFE-Digital
ORG_REPO=${GITHUB_ORG}/${REPO}
# The pool ID can be at most 32 characters, so truncate the repository name
WORKLOAD_ID="${REPO:0:32}"

echo "Log in to Google Cloud. The user must have the Owner role on the project."
gcloud auth application-default login

echo "Create ${WORKLOAD_ID} workload identity pool"
gcloud iam workload-identity-pools create "${WORKLOAD_ID}" \
  --project="${PROJECT_ID}" \
  --location="global" \
  --display-name="${WORKLOAD_ID}"

WORKLOAD_IDENTITY_POOL_ID=$(gcloud iam workload-identity-pools describe "${WORKLOAD_ID}" \
  --project="${PROJECT_ID}" \
  --location="global" \
  --format="value(name)")

echo "WORKLOAD_IDENTITY_POOL_ID=${WORKLOAD_IDENTITY_POOL_ID}"

echo "Create ${WORKLOAD_ID} workload identity pool provider"
gcloud iam workload-identity-pools providers create-oidc "${WORKLOAD_ID}" \
  --project="${PROJECT_ID}" \
  --location="global" \
  --workload-identity-pool="${WORKLOAD_ID}" \
  --display-name="${WORKLOAD_ID}" \
  --attribute-mapping="google.subject=assertion.sub,attribute.actor=assertion.actor,attribute.repository=assertion.repository,attribute.repository_owner=assertion.repository_owner" \
  --attribute-condition="assertion.repository_owner == '${GITHUB_ORG}' && attribute.repository == '${ORG_REPO}' " \
  --issuer-uri="https://token.actions.githubusercontent.com"

echo Get workload identity pool provider id
WORKLOAD_IDENTITY_POOL_PROVIDER_ID=$(gcloud iam workload-identity-pools providers describe "${WORKLOAD_ID}" \
  --project="${PROJECT_ID}" \
  --location="global" \
  --workload-identity-pool="${WORKLOAD_ID}" \
  --format="value(name)")

# The message names the role that is actually bound below
echo Bind role roles/iam.serviceAccountAdmin
gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
  --role="roles/iam.serviceAccountAdmin" \
  --member="principalSet://iam.googleapis.com/${WORKLOAD_IDENTITY_POOL_ID}/attribute.repository/${ORG_REPO}"

echo Bind role roles/bigquery.admin
gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
  --role="roles/bigquery.admin" \
  --member="principalSet://iam.googleapis.com/${WORKLOAD_IDENTITY_POOL_ID}/attribute.repository/${ORG_REPO}"

echo Bind role roles/dataplex.taxonomyViewer
gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
  --role="roles/dataplex.taxonomyViewer" \
  --member="principalSet://iam.googleapis.com/${WORKLOAD_IDENTITY_POOL_ID}/attribute.repository/${ORG_REPO}"

echo Bind role roles/cloudkms.viewer
gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
  --role="roles/cloudkms.viewer" \
  --member="principalSet://iam.googleapis.com/${WORKLOAD_IDENTITY_POOL_ID}/attribute.repository/${ORG_REPO}"

echo
echo Now add this step to the workflow to authenticate to Google:
cat <<EOF
deploy_job:
  permissions:
    id-token: write
    ...
  ...
  steps:
  - uses: google-github-actions/auth@v2
    with:
      project_id: ${PROJECT_ID}
      workload_identity_provider: ${WORKLOAD_IDENTITY_POOL_PROVIDER_ID}
EOF
11 changes: 11 additions & 0 deletions aks/dfe_analytics/data.tf
@@ -0,0 +1,11 @@
module "cluster_data" {
  source = "../cluster_data"
  name   = var.cluster
}

data "azurerm_client_config" "current" {}

data "azurerm_user_assigned_identity" "gcp_wif" {
  name                = "${var.azure_resource_prefix}-gcp-wif-${var.cluster}-${var.namespace}-id"
  resource_group_name = module.cluster_data.configuration_map.resource_group_name
}