In this module, you will use Azure Monitor to monitor the operation and status of Databricks. You will also set up alerting rules in Azure Monitor to watch key ingestion metrics of the data ingestion pipeline. When the alerting criteria are met, the system administrator will receive a notification email.
We aim to provision the areas highlighted in light yellow in the following system architecture diagram.
Module Goal
- Configure Databricks to send operation metrics to Azure Monitor.
- Deploy dashboards to visualize Databricks data processing metrics.
- Set up alerting criteria for key components in Azure Monitor, and send an email notification when the criteria are met.
Module Preparation
- Azure Subscription
- Finish all steps in Module 6
- PowerShell Core (version 6.x or later) environment (PowerShell runs on Windows, macOS, and Linux)
- Azure CLI (available to install on Windows, macOS, and Linux)
- Scripts provided in this module:
- create-log-analytics.ps1
- configure-log-analytics-for-databricks-job.ps1
- create-databricks-job.ps1
- create-azure-dashboard.ps1
- create-azure-alert.ps1
- Databricks CLI. If you don't have it installed, run the following command:
pip install databricks-cli
- Azure CLI with the Databricks extension installed. If you don't have it, run the following command:
az extension add --name databricks
References
- Azure Monitor Logs overview
- Monitoring Azure Databricks in an Azure Log Analytics Workspace
- Databricks secret scope with Azure Key Vault back-end
- Overview of Log Analytics in Azure Monitor
- Quickstart: Create a dashboard in the Azure portal by using an ARM template
- Create, view, and manage log alerts using Azure Monitor
Make sure you have all the preparation items ready and let's start.
We will need to provision the Azure Log Analytics service so that Databricks can send its logs to it. Modify the parameters in the provision-config.json file. You should update the configuration values according to your needs.
{
    "LogAnalytics": {
        "WorkspaceName": "-dbs-log-analytics",
        "ServiceTier": "PerGB2018",
        "SecretScope": "logsecretscope",
        "SecretScopeKeyWorkspaceId": "databrickslogworkspaceid",
        "SecretScopeKeyWorkspaceKey": "databrickslogworkspacekey",
        "ARMTemplatePath": "../Azure/loganalytics/LogAnalytics.json"
    }
}
*Note: You have to provide a value for SecretScope
Then run create-log-analytics.ps1 to provision the Azure Log Analytics service.
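For reference, the provisioning this script performs is essentially an ARM template deployment. A minimal sketch with the Azure CLI follows, assuming the script deploys the template referenced by ARMTemplatePath; the resource group name and the template parameter names are placeholders and may differ from what the actual script uses:

# Minimal sketch: deploy the Log Analytics ARM template with the Azure CLI.
# The resource group and the parameter names (workspaceName, serviceTier) are assumptions.
$resourceGroup = "my-kusto-lab-rg"
az deployment group create `
    --resource-group $resourceGroup `
    --template-file ../Azure/loganalytics/LogAnalytics.json `
    --parameters workspaceName="mylab-dbs-log-analytics" serviceTier="PerGB2018"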
After the creation is done, you can verify the result in the Azure Portal.
In this step, we will create a Databricks secret scope with an Azure Key Vault back-end, and then add the Log Analytics workspace information to Azure Key Vault so that Databricks can access this information in a secure way.
Run create-db-secret-kv-backend.ps1 to create the Databricks secret scope with the Azure Key Vault back-end.
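Under the hood, creating a Key Vault-backed secret scope boils down to a single Databricks CLI call. A sketch follows; the Key Vault resource ID and DNS name are placeholders that must point at the Key Vault used in this lab:

# Sketch: create a Key Vault-backed secret scope with the Databricks CLI.
# Replace the resource ID and DNS name with your Key Vault's values.
databricks secrets create-scope --scope logsecretscope `
    --scope-backend-type AZURE_KEYVAULT `
    --resource-id "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.KeyVault/vaults/<key-vault-name>" `
    --dns-name "https://<key-vault-name>.vault.azure.net/"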
*Note: You have to log in using an Azure account that can create a service principal, so we can't use the service principal created in Module 0
Then run update-db-log-analytics-key-vault.ps1 to add the Log Analytics connection information to Key Vault so that the Databricks-Log Analytics connector can access it in a secure way.
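Conceptually, the update script stores the workspace ID and shared key as Key Vault secrets, using the secret names configured as SecretScopeKeyWorkspaceId and SecretScopeKeyWorkspaceKey. A sketch of the equivalent Azure CLI calls (the vault name and secret values are placeholders):

# Sketch: store the Log Analytics workspace ID and shared key in Key Vault.
az keyvault secret set --vault-name "<key-vault-name>" `
    --name databrickslogworkspaceid --value "<log-analytics-workspace-id>"
az keyvault secret set --vault-name "<key-vault-name>" `
    --name databrickslogworkspacekey --value "<log-analytics-workspace-key>"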
After the update is done, you can verify the result in the Azure Portal. First you need to add an access policy for your Azure account.
After that you can check the created Key Vault items.
In this step, we will send application logs and metrics from Azure Databricks to the Log Analytics workspace. This step uses the Azure Databricks Monitoring Library, which is available on GitHub.
Modify the following parameters in the provision-config.json file. You should update the configuration values according to your needs.
{
    "LogAnalytics": {
        "SparkMonitoringScript": "../Azure/databricks-monitoring/spark-monitoring.sh"
    }
}
Then run configure-log-analytics-for-databricks-job.ps1 to deploy the script that sends Databricks log data to Azure Log Analytics.
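Deploying the monitoring script essentially means copying it into DBFS so that cluster init scripts can reference it. A sketch of the equivalent CLI call follows; the DBFS destination path follows the monitoring library's documented convention and is an assumption about what the script actually uses:

# Sketch: copy the spark-monitoring init script to DBFS.
databricks fs cp ../Azure/databricks-monitoring/spark-monitoring.sh `
    dbfs:/databricks/spark-monitoring/spark-monitoring.sh --overwrite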
Finally run create-databricks-job.ps1 to create a new Databricks job. The new job will use the script we just deployed and send its execution status log to Azure Log Analytics.
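For orientation, a simplified job definition is sketched below. It shows how the new cluster references the init script and pulls the Log Analytics credentials from the secret scope as environment variables (LOG_ANALYTICS_WORKSPACE_ID and LOG_ANALYTICS_WORKSPACE_KEY are the names the monitoring library expects); the job name, notebook path, Spark version, and node type are placeholders and will differ from what create-databricks-job.ps1 actually submits:

# Sketch: create a job whose cluster loads the monitoring init script and reads
# the Log Analytics credentials from the Key Vault-backed secret scope.
$jobSpec = @'
{
  "name": "ingestion-job-with-monitoring",
  "new_cluster": {
    "spark_version": "7.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 2,
    "init_scripts": [
      { "dbfs": { "destination": "dbfs:/databricks/spark-monitoring/spark-monitoring.sh" } }
    ],
    "spark_env_vars": {
      "LOG_ANALYTICS_WORKSPACE_ID": "{{secrets/logsecretscope/databrickslogworkspaceid}}",
      "LOG_ANALYTICS_WORKSPACE_KEY": "{{secrets/logsecretscope/databrickslogworkspacekey}}"
    }
  },
  "notebook_task": { "notebook_path": "/Shared/ingestion-notebook" }
}
'@
Set-Content -Path job.json -Value $jobSpec
databricks jobs create --json-file job.json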
After the script is done, you can verify the creation of the new Databricks job in the Databricks workspace.
Click the "Edit" link in Cluster setting and check the Environment Variables, you should find the Log Analytics Workspace setting there.
Now you have finished the configuration for the Databricks Monitoring Library. You can start the new Databricks job and run ingest-telemetry-data.ps1 again to ingest data. We will monitor the ingestion status using an Azure dashboard in the next step.
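Before moving on, you can confirm that log data is arriving in the workspace. A sketch using the Azure CLI follows; the custom-log table name SparkMetric_CL comes from the monitoring library and is an assumption here, and the workspace GUID is a placeholder:

# Sketch: query the Log Analytics workspace for recent Spark metrics.
# Use the workspace (customer) ID GUID here, not the Azure resource ID.
az monitor log-analytics query `
    --workspace "<log-analytics-workspace-guid>" `
    --analytics-query "SparkMetric_CL | where TimeGenerated > ago(1h) | take 10"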
In this step, we will deploy predefined metric dashboards to monitor Databricks and key system services. Modify the following parameters in the provision-config.json file. You should update the configuration values according to your needs.
"AzureMonitor":{
"Dashboard":{
"MainDashboardName":"-ingestion-dashboard",
"MainDashboardTemplatePath": "../Azure/dashboard/main_dashboard.json",
"DBSDashboardName":"-dbs-dashboard",
"DBSDashboardTemplatePath": "../Azure/dashboard/dbs_dashboard.json"
}
}
Then run create-azure-dashboard.ps1 to provision the Azure dashboards.
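Dashboard provisioning is again an ARM template deployment under the hood, so a manual equivalent looks roughly like the sketch below; the dashboardName parameter name is an assumption, so check the templates for the actual parameters:

# Sketch: deploy the two dashboard ARM templates into your resource group.
$resourceGroup = "my-kusto-lab-rg"
az deployment group create --resource-group $resourceGroup `
    --template-file ../Azure/dashboard/dbs_dashboard.json `
    --parameters dashboardName="mylab-dbs-dashboard"
az deployment group create --resource-group $resourceGroup `
    --template-file ../Azure/dashboard/main_dashboard.json `
    --parameters dashboardName="mylab-ingestion-dashboard"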
When the script is finished, open the Azure Portal and select Dashboard.
Then choose "Browse all dashboards"
Select "Shared dashboards" and you will find there are two dashboards which have names that end with "dbs-dashboard" and "ingestion-dashboard"
The Dashboard with name end with "dbs-dashboard" will show performance metric for Databricks.
The Dashboard with name end with "ingestion-dashboard" will show performance metric for other Azure Services such as Azure Data Explore, Azure Functions, Azure Data Lake.
In this step, we will create some metric alerts to help us get notified when something goes wrong, so we can react quickly to incidents.
Modify the following parameters in the provision-config.json file. You should update the configuration values according to your needs.
{
    "AzureMonitor": {
        "ActionGroup": {
            "Name": "-kusto-lab-action-group",
            "ShortName": "kusto-lab",
            "EmailGroupName": "email_alert_team",
            "EmailRecipients": "abc@microsoft.com",
            "AzureOpsGenieAPIUrl": "https://api.opsgenie.com/v1/json/azure",
            "AzureOpsGenieAPIKey": "None",
            "ActionGroupTemplatePath": "../Azure/alert/ActionGroups.json"
        },
        "FunctionAlert": {
            "ErrorHandlingAlertTriggerThreshold": 1,
            "ErrorHandlingFuncAlertTemplatePath": "../Azure/alert/ErrorHandlingFuncAlert.json",
            "IngestionFuncNotTriggerThreshold": 1,
            "IngestionFuncAlertTemplatePath": "../Azure/alert/IngestionFuncAlert.json"
        },
        "ADXAlert": {
            "ADXClusterHighCPUThreshold": 70,
            "ADXClusterHighIngestionLatencyThreshold": 180,
            "ADXClusterHighIngestionUtilThreshold": 70,
            "ADXAlertTemplatePath": "../Azure/alert/ADXAlert.json"
        },
        "DatalakeAlert": {
            "DatalakeLowIngressThreshold": 1048576,
            "DatalakeAlertTemplatePath": "../Azure/alert/DatalakeAlert.json"
        },
        "EventGridAlert": {
            "EventGridLowPublishedThreshold": 0,
            "EventGridHighDroppedThreshold": 1,
            "EventGridAlertTemplatePath": "../Azure/alert/EventGridAlert.json"
        }
    }
}
Then run create-azure-alert.ps1 to create alerting rules based on metrics in Azure Data Explorer, Azure Data Lake, Azure Event Grid, the Ingestion Function, the Databricks-ErrorHandler Function, and the ADX-Ingestion-ErrorHandler Function. These alerting rules will monitor the performance metrics of these key services.
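To make the alerting setup concrete, the sketch below hand-rolls one of the rules: an email action group plus a metric alert that fires when the ADX cluster CPU exceeds the ADXClusterHighCPUThreshold value of 70%. The metric name, rule name, and resource IDs are assumptions; the script itself deploys the ARM templates listed in the configuration instead:

# Sketch: an email action group and a high-CPU metric alert for the ADX cluster.
$resourceGroup = "my-kusto-lab-rg"
az monitor action-group create --resource-group $resourceGroup `
    --name "mylab-kusto-lab-action-group" --short-name "kusto-lab" `
    --action email email_alert_team abc@microsoft.com
az monitor metrics alert create --resource-group $resourceGroup `
    --name "adx-high-cpu-alert" `
    --scopes "<adx-cluster-resource-id>" `
    --condition "avg CPU > 70" `
    --description "ADX cluster CPU above 70 percent" `
    --action "mylab-kusto-lab-action-group"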
When the script is finished, open Azure Monitor, select Alerts, and you will see the current alerting status.
Select Manage alert rules, and you will see all the alerting rules in the system. You can modify them based on your environment.