This repo demonstrates how to deliver a Modern Data Warehouse using Azure and Terraform.
- An Azure Subscription
- An Azure DevOps Organisation
- Visual Studio Community Edition with SQL Server Data Tools (SSDT)
- `git`, `az`, and `terraform` installed in your local development environment.
NB: This example focuses on orchestration of data movement and transformation with the Synapse Workspace. YAML templates are included to deliver similar functionality using Azure Data Factory (see `./.ado/templates/jobs/adf-*.yml`), but are not used in this example.
The above diagram shows a high-level architecture of the services that are defined in the Terraform scripts, and the CI/CD process that you can implement using the YAML scripts in the `.ado` folder. The development and test environments are deployed from the same `.tf` files, using Terraform workspaces. The environments can be deployed using the Azure DevOps pipelines included, or using `terraform` in your local environment.
The instructions for configuring and deploying this environment are broadly split into three parts: deployment of shared infrastructure (for terraform state), configuration of Azure DevOps, and then deployment of the infrastructure itself.
- Import this repository into a new Azure DevOps repo in your organisation. Instructions for doing this are available in the docs.
- Open your terminal and clone the repo from your Azure DevOps repo to your local development environment using `git clone`.
- Run `az login` to authenticate to Azure, and select the subscription you want to deploy these resources into with `az account set`.
- `cd` into the `./sharedinfra` directory.
- Open the `storage.tf` file and update the shared storage account name (line 2).
- Deploy the shared resources for the terraform state by running `terraform init` to initialize your terraform environment, `terraform plan` to see what will be deployed, and `terraform apply` to deploy the shared resources.
This will deploy:
- a resource group to hold these shared resources
- a storage account which will be used to store the terraform state files
- a Key Vault which will hold secrets to access the storage account
- a Service Principal to be used to deploy the infrastructure via the Azure DevOps pipeline
- an Active Directory Group which will be used to manage access into the Synapse Workspace
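The command-line steps above can be sketched as a small shell function. This is a sketch under assumptions, not a script from this repo: `deploy_shared_infra`, `run` and the `DRY_RUN` switch are hypothetical helpers, the subscription is passed in as an argument, and the function expects to be called from the root of your cloned repo after you have edited `storage.tf`. By default it only prints the commands so you can review them first.

```shell
# Hypothetical helper: print each command, and only execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Sketch of the shared infrastructure deployment.
# $1: name or id of the subscription to deploy into (assumption: supplied by you).
deploy_shared_infra() {
  subscription="$1"
  run az login || return 1
  run az account set --subscription "$subscription" || return 1
  run cd ./sharedinfra || return 1
  run terraform init || return 1    # initialize the terraform environment
  run terraform plan || return 1    # review what will be deployed
  run terraform apply || return 1   # deploy the shared resources
}
```

Call `deploy_shared_infra "<subscription name or id>"` to preview the commands, and set `DRY_RUN=0` once you are happy for them to run. `terraform apply` will still ask for confirmation before it creates anything.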
- We will use the Microsoft Terraform extension for Azure DevOps to install and run terraform tasks with Azure Pipelines. Navigate to the Azure DevOps Marketplace, find the `Terraform` extension from Microsoft DevLabs and click `Get it free`. Install the extension in the organisation you are working in.
- Navigate to your Azure DevOps project and click "Pipelines", then add a new pipeline by clicking "New Pipeline"
- Choose "Azure Repos Git YAML", then select the repo in your project. Choose "Existing Azure Pipelines YAML file", leave the branch as `main` and select `/.ado/terraform-status-check.yml`.
- Save the pipeline. Run through the above steps 4 more times to create the other 4 pipelines (`terraform-plan`, `terraform-apply`, `synapse-ci` and `synapse-cd`).
- Navigate to Project settings > Service connections.
- Create a new `Azure Resource Manager` service connection, using Service Principal (Manual). Scope it to the subscription you will be using for the deployment (insert the `id` and `name` values from the publish settings file you can access for your subscription).
- Input the service principal id and tenant from the terraform output from the `Deployment of shared infrastructure` step. You can view these again by running `terraform output` in the `./sharedinfra` directory if needed. Get the client secret from the Key Vault deployment (instructions for retrieving a secret value from Key Vault are available in the docs). Take a note of the service connection name: you will need it in the next step to update the pipelines to make use of this service connection.
- Navigate to Pipelines > Library > + Add New Variable Group.
- Name your variable group `mdw-shared-westeurope-01`.
- Select the `Link secrets from an Azure key vault as variables` toggle, and then select the Key Vault deployed as part of the shared infrastructure in the `Deployment of shared infrastructure` steps.
- Under Variables, click `Add +`. Select all the secrets in the vault with the checkboxes, and click OK.
- Click Save.
- Navigate to Repos > Branches. On the `main` branch, click More options (this is an ellipsis on the right-hand side when you hover over the branch) and select Branch policies.
- Click the + on the build validation section. Select your `terraform-status-check` pipeline as the build pipeline. Type `/iac/*` into the path filter (this means that the check will only be required for changes to our data platform infrastructure). Set the trigger to automatic, the policy requirement to required, and the build to expire immediately when `main` is updated.
- Click Save.
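To illustrate what the `/iac/*` path filter means in practice, here is a minimal shell sketch that checks whether a changed file would fall under it. `matches_iac_filter` is a hypothetical name, and shell pattern matching is only an approximation: Azure DevOps applies its own matching rules when it evaluates the policy.

```shell
# Succeeds (exit status 0) when the given repo path sits under /iac/.
# Approximation only: Azure DevOps evaluates path filters with its own rules.
matches_iac_filter() {
  case "$1" in
    /iac/*) return 0 ;;   # infrastructure change: status check required
    *)      return 1 ;;   # anything else: status check not required
  esac
}
```

For example, `matches_iac_filter /iac/backend.tf` succeeds, while `matches_iac_filter /.ado/terraform-status-check.yml` does not.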
- Return to your local version of the repo. Create a new working branch with `git checkout -b <branch-name>`. Open the repo in your text editor of choice and make the following changes:

  | File | Change |
  | --- | --- |
  | `./iac/backend.tf` | Update the `azurerm` backend `storage_account_name` on line 4 to the name of the shared infrastructure storage account you deployed in the previous step. |
  | `terraform-apply.yml` | Update the `project` and `pipeline` on lines 32 and 33. `project` should be the name of your Azure DevOps project. `pipeline` should be the pipeline id of your `terraform-plan` pipeline (you can get these by opening the pipeline in your web browser and looking at the URL, e.g. `https://dev.azure.com/<organisation>/<project>/_build?definitionId=<pipeline id>`). |
  | `synapse-ci.yml` | Update the repository name on line 10. The format is `projectname/reponame`. |
  | `synapse-cd.yml` | Update the `project` (project name) and `pipeline` (pipeline id for the `synapse-ci` pipeline) variables on lines 17 and 19. Note the empty variable for `devWorkspaceName` on line 21: we will come back and update this once the infrastructure has been deployed. Update the `serviceConnection` parameter with the service connection name that you created in the earlier steps. |

- Commit the changes to your branch and push them to your Azure DevOps repo. Open a pull request from your development branch to the `main` branch. This should trigger the `terraform-status-check` pipeline set up in earlier steps. Wait for the status check to pass, then approve the PR and merge to `main`.
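The pipeline id needed for `terraform-apply.yml` and `synapse-cd.yml` is the `definitionId` query parameter of the pipeline URL. If you prefer not to read it off by eye, a small shell function along these lines could extract it; `pipeline_id_from_url` is a hypothetical helper and not part of this repo.

```shell
# Print the definitionId query parameter of an Azure DevOps pipeline URL.
# Returns a non-zero status if the URL has no definitionId parameter.
pipeline_id_from_url() {
  url="$1"
  case "$url" in
    *definitionId=*) ;;
    *) echo "no definitionId found in URL" >&2; return 1 ;;
  esac
  id="${url##*definitionId=}"   # drop everything up to and including "definitionId="
  id="${id%%&*}"                # drop any query parameters that follow
  echo "$id"
}
```

Pass it the URL copied from your browser, for example `pipeline_id_from_url "https://dev.azure.com/<organisation>/<project>/_build?definitionId=<pipeline id>"`.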
You can now deploy the development infrastructure from the DevOps pipeline.
- Open the `terraform-plan` pipeline and manually trigger the pipeline, selecting `dev` as the parameter. This will deploy the infrastructure for your dev environment. NB: when you run these pipelines for the first time, you will be prompted for permission for the pipeline to access Azure DevOps resources (i.e. the Service Connection). View and enable these permissions to start the job.
- To deploy the test environment, manually trigger the `terraform-status-check` pipeline, selecting `test` as the parameter. This initializes the terraform state file in the remote storage. After this has run, open the `terraform-plan` pipeline and manually trigger the pipeline, selecting `test` as the parameter. This will deploy the test environment.
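The overview mentions that the environments can also be deployed with `terraform` in your local environment, using Terraform workspaces. A minimal sketch of what that might look like is below. It rests on several assumptions: that the workspace names match the `dev` and `test` pipeline parameters, that you are authenticated with `az login` and can reach the state storage account, and that the configuration needs no extra input variables. `deploy_environment_locally`, `run` and `DRY_RUN` are hypothetical helpers, and the function expects to be called from the root of the repo.

```shell
# Hypothetical helper: print each command, and only execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# $1: environment name, used as the Terraform workspace name (assumption: dev or test).
deploy_environment_locally() {
  environment="$1"
  # Connect to the remote state backend configured in ./iac/backend.tf.
  run terraform -chdir=./iac init || return 1
  # Switch to the workspace for this environment, creating it if it does not exist.
  run terraform -chdir=./iac workspace select "$environment" ||
    run terraform -chdir=./iac workspace new "$environment" || return 1
  run terraform -chdir=./iac plan || return 1
  run terraform -chdir=./iac apply || return 1
}
```

Running `deploy_environment_locally dev` prints the commands for the dev environment; set `DRY_RUN=0` to execute them. Because each workspace keeps its own state, the same `.tf` files can describe both environments.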
There are two additional steps you need to complete to finish the environment setup:
- Link the development Synapse workspace with your DevOps repo. As per Synapse DevOps best practice, you should only integrate your development workspace with your git repo. Follow these instructions in the docs to link your repo to the Synapse workspace.
- Update the CD pipeline for Synapse artefacts with the name of the dev workspace. As discussed earlier in this README, the Synapse CD template needs to reference the development workspace name in the artefacts as part of the deployment process.
  - Navigate to your development resource group and note the Synapse workspace name.
  - Create a new branch on your local version of the repo and add the dev workspace name to line 21 of the `synapse-cd.yml` file.
  - Commit, push and merge the changes with your main branch via PR.
You now have a working set of infrastructure and associated pipelines to manage changes to your environment.
TODO: author steps for demonstration of adding functionality into the Synapse environment.
Contains sample yaml pipelines for use in Azure DevOps for CI/CD of ADF and Synapse artefacts.
Contains Terraform for the infrastructure.
Contains SSDT project that manages and maintains Synapse data model.
- Variables for the CI/CD pipelines need to be manually updated in the pipeline instances on import.
- These templates do not implement best practice with respect to network security. This is beyond the scope of this example.
I would like to acknowledge the input and support on this repo from @jtracey93 and @ejones18.
Jack's blog on Terraform with Azure DevOps was instrumental for me in understanding and implementing the DevOps pipelines for infrastructure management.
Ethan's support with validating the artefacts and instructions in this repo has been fantastic. Together, we have ironed out a lot of issues that would have otherwise made it much harder for you to make use of these artefacts.