Setup AIFactory - Infra Automation

Configuring the infra automation for the first time takes about 4 hours.

After that, you can set up as many AIFactories as you want, with a configuration time of 2-15 minutes per AIFactory.

1) Azure Devops (or Github): Create an empty repo, aifactory-infra-001

This aifactory-infra-001 repo is your own repo, where your configuration overrides the AIFactory config-template files.

2) Add the Enterprise Scale AIFactory repo as a GIT submodule, to your repo aifactory-infra-001

FAQ: How-to clone repo with submodule to local computer?

  • Open a GIT command prompt and go to your local root folder for the code (a dir in the GIT CMD should show the folders azure-enterprise-scale-ml and notebook_demos), then run the commands below:

    git config --system core.longpaths true

    git submodule add https://github.com/jostrm/azure-enterprise-scale-ml

  • Note: If the submodule was already added by another team member in your project, the git submodule add command above will not work. Run the command below instead:

    git submodule update --init --recursive
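The choice between the two commands above can be sketched in Python. This is a minimal sketch and not part of the AIFactory tooling: the helper name `submodule_commands` is hypothetical, and it assumes the submodule is registered in `.gitmodules` under the path `azure-enterprise-scale-ml`.

```python
import re
from pathlib import Path

SUBMODULE_URL = "https://github.com/jostrm/azure-enterprise-scale-ml"
SUBMODULE_PATH = "azure-enterprise-scale-ml"  # assumed path of the submodule


def submodule_commands(gitmodules_text: str) -> list[str]:
    """Return the git command to run, given the contents of .gitmodules.

    Hypothetical helper: if the submodule path is already registered,
    initialize it; otherwise add it.
    """
    registered = re.findall(
        r"^\s*path\s*=\s*(\S+)\s*$", gitmodules_text, flags=re.MULTILINE
    )
    if SUBMODULE_PATH in registered:
        return ["git submodule update --init --recursive"]
    return [f"git submodule add {SUBMODULE_URL}"]


if __name__ == "__main__":
    gitmodules = Path(".gitmodules")
    text = gitmodules.read_text() if gitmodules.exists() else ""
    for command in submodule_commands(text):
        print(command)
```

Run from the root of aifactory-infra-001, the script only prints the command it would choose; it does not execute git.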

3) Copy template files (pipelines, workflows, parameter templates) locally

  • Open the notebook 01_init_templates_ALL.ipynb
  • Run all cells.
    • Note: When you run the first cell, VS Code will ask you to choose a kernel. Choose Python environment. The recommended Python version is 3.12.5, but any Python version above 3.7 should work.
  • After you have executed all cells in the notebook, you will have a new folder called aifactory with sub-folders that include the templates.
  • Verify that it looks like the screenshot below, with an aifactory folder at the top.
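The folder check can also be scripted. The sketch below is an illustration only: the helper name `has_template_folder` is hypothetical, and because the names of the generated sub-folders are not listed here, it only checks that at least one sub-folder exists.

```python
from pathlib import Path


def has_template_folder(repo_root: Path) -> bool:
    """True if repo_root contains an 'aifactory' folder with at least one sub-folder."""
    aifactory = repo_root / "aifactory"
    return aifactory.is_dir() and any(child.is_dir() for child in aifactory.iterdir())


if __name__ == "__main__":
    # Assumes the script is started from the root of your aifactory-infra-001 repo.
    print("aifactory templates found:", has_template_folder(Path.cwd()))
```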

4) Import pipelines/workflows to Azure Devops (or Github)

For Azure Devops classic pipelines:

Import all of them, but start by importing these two and executing them in the following order:

  1. esml-infra-common-bicep.json
  2. esml-infra-project-bicep-adv.json

These two are the Azure Devops Release pipelines we need for an AIFactory and its first project (and upcoming projects).

Start with the 1st file esml-infra-common-bicep.json

  1. Open the Azure Devops portal, browse to your org and project, and click Pipelines/Releases in the main menu to the left

  2. Click the New button, to find the Import release pipeline button

After import, it should look like this:


4) Configure the pipeline

  • Click on the red marking at Tasks. There are three task stages: esml-common-dev, esml-common-test, esml-common-prod. You need to configure all of them; start with the task stage esml-common-dev
  • Click on Agent job, where it says Some settings need attention

  • Select Agent pool
    • Option A) Choose Hosted/Azure Pipelines, with the Agent Specification windows-2022 (windows-latest usually also works)
    • Option B) You may also use your own self-hosted Windows server (Windows 2019 or Windows 2022)

Creating 3 ARM connections:

  1. Click the Azure CLI task called 11-Common RG and RBAC, and then click the Manage link to get to the Azure Resource Manager connection page, where you can create connections. A new browser tab will open.
  • Click the New service connection button, select the Azure Resource Manager radio button, and click NEXT.

  • Select Service principal (manual) in the dialog, and click NEXT

  • To configure it, use the service principal information for esml-common-bicep-sp, which you created in the seeding keyvault in the prerequisites setup.
  • That service principal should have the privileged role OWNER on the subscription, and be able to assign other privileged roles, such as CONTRIBUTOR and OWNER at resource group scope, as in the image:

  • Verify the ARM connection, and also check the box Grant access permission to all pipelines

  2. Create all 3: You need to create three Azure Resource Manager connections (ARM connections). Each ARM connection should be created with a service principal that has OWNER permissions on the subscription we want to work with in the AIFactory, as either the DEV, TEST, or PROD environment.

You may create all 3 ARM connections at once, either based on the same service principal from the seeding keyvault, esml-common-bicep-sp, which in that case is OWNER on all three subscriptions, or with three separate service principals.

  • ARM connection names: esml-aifactory-infra-dev, esml-aifactory-infra-test, esml-aifactory-infra-prod
  • Service principal info
    • Role: OWNER (able to assign privileged roles to other identities)
      • Scope: Subscription (DEV subscription if the task stage is esml-common-dev, TEST subscription if esml-common-test, PROD subscription if esml-common-prod)
    • If external vNet (BYO vNet):
      • CONTRIBUTOR on the resource group where the external vNet resides, for the Dev, Test, Prod subscriptions/spokes
        • Reason: To be able to create network security groups
      • Network Contributor on the vNet
        • Reason: To be able to create subnets, and to be able to assign network security groups to the subnets
      • Read more about: Permissions for the service principal

TODO: Support federated ARM connections https://learn.microsoft.com/en-us/azure/devops/pipelines/release/configure-workload-identity?view=azure-devops

Configure the tasks, with ARM connections:

  1. Go back to the other TAB, where you have the RELEASE pipeline open, at the TASK view with task esml-common-dev
  2. Click the Azure CLI task called 11-Common RG and RBAC to configure it, and select the ARM connection you created earlier, called esml-aifactory-infra-dev
    • Note: You may need to click the refresh icon, for the combobox to re-load the newly created ARM connections to be selectable.
  3. Repeat this process, 1 and 2, for all steps 12-Common Networking, 13-Deploy resources
  4. Repeat 1-3 for all task stages, also for esml-common-test and esml-common-prod, where you select the respective ARM connections
    • esml-common-test stage using the ARM connection: esml-aifactory-infra-test
    • esml-common-prod stage using the ARM connection: esml-aifactory-infra-prod
  5. SAVE the release pipeline.
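The stage-to-connection mapping in the steps above follows a simple naming pattern, which can be written down as a checklist generator. This is a sketch for cross-checking your configuration by eye; the function names are hypothetical and nothing here talks to Azure Devops.

```python
ENVIRONMENTS = ("dev", "test", "prod")
TASKS = ("11-Common RG and RBAC", "12-Common Networking", "13-Deploy resources")
STAGE_PREFIX = "esml-common-"


def arm_connection_for(stage: str) -> str:
    """Map a task stage such as 'esml-common-dev' to its ARM connection name."""
    env = stage[len(STAGE_PREFIX):]
    if not stage.startswith(STAGE_PREFIX) or env not in ENVIRONMENTS:
        raise ValueError(f"Unknown task stage: {stage!r}")
    return f"esml-aifactory-infra-{env}"


def expected_assignments() -> dict[tuple[str, str], str]:
    """Every (stage, task) pair and the ARM connection it should use."""
    return {
        (f"{STAGE_PREFIX}{env}", task): arm_connection_for(f"{STAGE_PREFIX}{env}")
        for env in ENVIRONMENTS
        for task in TASKS
    }


if __name__ == "__main__":
    for (stage, task), connection in expected_assignments().items():
        print(f"{stage:18} {task:24} -> {connection}")
```

The printed table has nine rows, one per task in each of the three stages, and can be compared with the release pipeline before you save it.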

5) Edit the Azure Devops Variables

More information about variables can be seen here

To get "my ip":

  • Option A) Go to any storage account in Azure, and click Networking. Your public IP is shown at the green marking in the image

  • Option B) Open a terminal and run: nslookup myip.opendns.com resolver1.opendns.com
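Option B can be wrapped in a short script. The sketch below assumes that nslookup prints the resolver's own address first and the answer last, which is the usual layout but can differ between platforms; the function names are hypothetical.

```python
import ipaddress
import subprocess


def parse_nslookup_ip(output: str) -> str:
    """Return the last 'Address' value in nslookup output as a validated IP string.

    The first Address line is assumed to be the resolver itself; the answer comes last.
    """
    addresses = []
    for line in output.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() in ("address", "addresses") and value.strip():
            # Some nslookup builds append '#53' (the port) to the resolver address.
            addresses.append(value.strip().split("#")[0])
    if len(addresses) < 2:
        raise ValueError("No answer address found in nslookup output")
    return str(ipaddress.ip_address(addresses[-1]))


def my_public_ip() -> str:
    """Run the same nslookup command as Option B and parse its output."""
    result = subprocess.run(
        ["nslookup", "myip.opendns.com", "resolver1.opendns.com"],
        capture_output=True, text=True, check=True,
    )
    return parse_nslookup_ip(result.stdout)
```

Calling `my_public_ip()` requires network access and nslookup on the PATH; `ipaddress.ip_address` raises `ValueError` if the parsed value is not a valid IP address.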

6) Edit the Base parameters

More information about variables can be seen here

NB! Azure Databricks Object ID (OID) may not exist, is global in your tenant

The AzureDatabricks application in your Microsoft Entra ID is global, and does not exist if nobody has created it before. It is a single application, with the same Object ID (OID) for all Azure Databricks instances.

This is about the parameter: databricksOID in the file 10-esml-globals-5-13_23.json

  • Problem: If you have a new tenant, where no subscription has created an Azure Databricks service yet, then you will not have any Object ID for the AzureDatabricks enterprise application
  • Solution: Create a dummy Azure Databricks service, for example in the seeding keyvault. Then the Object ID will be created.
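Whether the application exists can be checked from exported Azure CLI output. The sketch below is an assumption about your tooling, not part of the AIFactory scripts: it expects the JSON array printed by `az ad sp list --display-name AzureDatabricks`, and the helper name `databricks_object_id` is hypothetical.

```python
import json
from typing import Optional


def databricks_object_id(sp_list_json: str) -> Optional[str]:
    """Return the Object ID of the AzureDatabricks enterprise application, or None.

    None means the application does not exist in the tenant yet.
    """
    for sp in json.loads(sp_list_json):
        if sp.get("displayName") == "AzureDatabricks":
            # Newer Azure CLI versions expose 'id'; older ones used 'objectId'.
            return sp.get("id") or sp.get("objectId")
    return None
```

If the function returns a value, that is the candidate for the databricksOID parameter; if it returns `None`, the dummy-service workaround above applies.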

Before, if there is no AzureDatabricks application:

After, when Azure Databricks dummy is created, and application exists:

BYOVnet - Bring your own vNet: Externally injected vNet to spoke

If you cannot allow the AIFactory orchestration to create its own vNets, you can configure your pre-created vNet in the parameter file 10-esml-globals-override.json

Example of what you need to override:

If you want to bring your own vNets for Dev, Stage, Prod, you need to pre-create them, and also match some more parameters, such as

  • Your vNet: vnet-spoke-aifactory-sdc-dev-001
    • Your address space: 10.11.0.0/18
    • Parameter file that need to match the CIDR: 12-esml-cmn-parameters.json
      • Parameter that needs to be matching: 12-esml-cmn-parameters.json "10.XX.0.0/18"
      • Variable (Azure Devops, Github) that needs to be matching: cidr_range "11"
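The relationship between the cidr_range variable, the "10.XX.0.0/18" template and the vNet address space can be checked with Python's ipaddress module. This is a sketch based only on the Dev example above; the function names are hypothetical, and other environments may use a different template.

```python
import ipaddress


def expected_address_space(cidr_range: str, template: str = "10.XX.0.0/18"):
    """Fill the XX placeholder with the cidr_range variable and parse the result."""
    return ipaddress.ip_network(template.replace("XX", cidr_range))


def vnet_matches(vnet_address_space: str, cidr_range: str) -> bool:
    """True if the pre-created vNet address space equals the expected CIDR."""
    return ipaddress.ip_network(vnet_address_space) == expected_address_space(cidr_range)


if __name__ == "__main__":
    # Values from the example: vnet-spoke-aifactory-sdc-dev-001 and cidr_range "11".
    print(expected_address_space("11"))
    print(vnet_matches("10.11.0.0/18", "11"))
```

With the example values, the expected address space is 10.11.0.0/18, so a vNet created with a different second octet would be flagged as a mismatch.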

Seeding keyvault = inputKeyvault parameter

NB! seeding keyvault = inputKeyvault when speaking of variables and parameters in the AIFactory.

  • This is due to legacy reasons (ESML AIFactory was established in 2019), but the naming will be synced in the future to seeding keyvault

7) Check in your code, and add an artifact that points at your source code in the Azure Devops Release pipeline

  1. Check in your code
  2. Click EDIT button

  3. Remove the artifact with source alias name: _esml-aifactory

  • Click on the artifact box; a dialog opens
    • Copy the source alias name at the bottom. You will need to add a new artifact with the same source alias name
  • Click the DELETE button
  4. Add an artifact with the name _esml-aifactory

Click Add artifact

Configure as below, and keep everything else as default

  • Source Type
    • Azure repository (If classic ADO)
    • BUILD (if .yaml ADO)
  • Project: Select your Azure Devops project (e.g. where you have the parameter files and azure-enterprise-scale-ml submodule)
  • Repository: Select your repo (e.g. where you have the parameter files and azure-enterprise-scale-ml submodule)
  • Branch: main (e.g. where you have the parameter files and azure-enterprise-scale-ml submodule)
  • Default version: latest
  • Checkbox: "Checkout submodules" needs to be checked.
  • Source alias name: _esml-aifactory

Click the SAVE button.

CHECKLIST

This is a checkpoint to verify that all prerequisite setup has been done before you run the pipeline.

1a) Private DNS zones in the HUB, not locally

Q: Have you created all Private DNS zones in the hub, manually?

This applies if you want to have your Private DNS zones in your HUB, as recommended: you have the flag centralDnsZoneByPolicyInHub=true in the file 10-esml-globals-4-13_21_22.json and you have specified the parameters privDnsSubscription_param and privDnsResourceGroup_param

TODO:

  • Ensure you have all Private DNS zones pre-created in the HUB, manually (a util-script for this is on the TODO list)
  • Ensure you have created vNet Link to the Hub vNet, for all Private DNS Zones
  • Ensure you have the Azure Policy and Azure Initiative assigned How-To: Networking: peering-of-spookes-to-hub
  • Ensure you have peered the spoke vNets to the Hub How-To: Networking: peering-of-spookes-to-hub
  • Ensure you have all settings set in the parameter file 10-esml-globals-4-13_21_22.json
    • The parameters: privDnsSubscription_param, privDnsResourceGroup_param, centralDnsZoneByPolicyInHub
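The last checklist item can be verified with a small script. The sketch below assumes the parameter file follows the common ARM layout `{"parameters": {"name": {"value": ...}}}`; that layout is an assumption about 10-esml-globals-4-13_21_22.json, and the helper names are hypothetical.

```python
REQUIRED_WHEN_CENTRAL = ("privDnsSubscription_param", "privDnsResourceGroup_param")


def _value(parameters: dict, name: str):
    """Read a parameter, accepting either {'value': x} or a bare value."""
    entry = parameters.get(name)
    return entry.get("value") if isinstance(entry, dict) else entry


def missing_hub_dns_settings(parameter_file: dict) -> list[str]:
    """List hub DNS parameters that are unset while centralDnsZoneByPolicyInHub is true."""
    parameters = parameter_file.get("parameters", parameter_file)
    if _value(parameters, "centralDnsZoneByPolicyInHub") is not True:
        return []
    return [name for name in REQUIRED_WHEN_CENTRAL if not _value(parameters, name)]
```

Load the file with `json.load` and pass the resulting dict; an empty list means both hub parameters are set, or that the central DNS flag is not enabled.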

Private DNS zones, when created:

Azure Policies, when created:

1b) Private DNS zones locally, centralDnsZoneByPolicyInHub=false

This applies if you want to have your Private DNS zones locally in each AIFactory spoke, in the common resource group. This is only recommended if you do not want to peer the AIFactory to your hub, e.g. DEMO mode. You have the flag centralDnsZoneByPolicyInHub=false in the file 10-esml-globals-4-13_21_22.json

TODO: You do not need to do anything.

  • Note: But you cannot peer it in an efficient way either. Usually this is only done when testing the AIFactory in isolation, via Bastion-only access mode.

2) Have you enabled all resource providers?

If you don't know, please go back to the step 12-resourceproviders.md, where you have an automation script to do this.

3) Have you checked in your code?

Do the parameters you edited locally look the same in Azure Devops?

4) Are all permissions set for the service principal esml-common-bicep-sp?

  • The BICEP will have to create artifacts under one or many subscriptions.
  • Note: If you have an external vNet (BYO vNet) in another subscription than its AIFactory environment subscription, the service principal needs Contributor on the resource group to create NSGs, and Network Contributor on the vNet to be able to assign the NSGs. Read more about: Permissions for the service principal

5) Verify the Azure Devops inline script arguments, especially for the service principal esml-common-bicep-sp

Check specifically the secret name for the service principal in the seeding keyvault. If you have the default name, it should work. If not, you need to edit the inline Script Arguments in the Azure Devops Task setup. See image

6) Verify that the service principal, esml-common-bicep-sp, has ACL access to the seeding keyvault.

Even if the service principal esml-common-bicep-sp has OWNER on the Azure subscription, it also needs the Access Policy on secrets: GET, LIST, SET

Otherwise you will encounter an error message similar to the one below:

If so, you need to visit the seeding keyvault, Access policies, give the service principal Get, List, Set, and then rerun the pipeline release.
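The access-policy check can be expressed as a comparison against the three required secret permissions. The sketch below also builds an `az keyvault set-policy` call as an alternative to the portal; the vault name and service principal app ID are placeholders you must supply, and the command only applies to keyvaults that use the access policy permission model.

```python
REQUIRED_SECRET_PERMISSIONS = ("get", "list", "set")


def missing_secret_permissions(granted: list[str]) -> list[str]:
    """Return the required secret permissions that are not in `granted` (case-insensitive)."""
    have = {permission.lower() for permission in granted}
    return [p for p in REQUIRED_SECRET_PERMISSIONS if p not in have]


def set_policy_command(vault_name: str, sp_app_id: str) -> list[str]:
    """Build an Azure CLI call that grants Get, List, Set on secrets to the service principal."""
    return [
        "az", "keyvault", "set-policy",
        "--name", vault_name,        # placeholder: your seeding keyvault name
        "--spn", sp_app_id,          # placeholder: app ID of esml-common-bicep-sp
        "--secret-permissions", *REQUIRED_SECRET_PERMISSIONS,
    ]
```

The command list can be passed to `subprocess.run` or joined with spaces and pasted into a terminal where you are logged in with the Azure CLI.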

DONE! Ready to Run the pipeline

Now you can go ahead and run the pipeline in Azure Devops.

The process for this is described here in a process flow diagram - Add AIFactory project

TROUBLESHOOTING

For more troubleshooting, visit the FAQ