
Module 7 - System Observability: Monitor, Alert

In this module, you will use Azure Monitor to monitor the operation and status of Databricks. You will also set up alerting rules in Azure Monitor to watch key ingestion metrics of the data ingestion pipeline. When the alerting criteria are met, the system administrator will receive a notification e-mail.

We will provision the components highlighted in light yellow in the following system architecture diagram.

architecture-module7

Module Goal

  • Configure Databricks to send operational metrics to Azure Monitor.
  • Deploy a dashboard to visualize the metrics of the Databricks data processing status.
  • Set up alerting criteria for key components in Azure Monitor, and send an e-mail notification when the criteria are met.

Module Preparation

  • Azure Subscription
  • Finish all steps in Module 6
  • PowerShell Core (version 6.x or later) environment (PowerShell runs on Windows, macOS, and Linux platforms)
  • Azure CLI (Azure CLI is available to install in Windows, macOS and Linux environments)
  • Scripts provided in this module:
    • create-log-analytics.ps1
    • configure-log-analytics-for-databricks-job.ps1
    • create-databricks-job.ps1
    • create-azure-dashboard.ps1
    • create-azure-alert.ps1
  • Databricks CLI; if you don't have it, install it with the following command:
pip install databricks-cli
  • Azure CLI with the Databricks extension installed; if you don't have it, install it with the following command:
az extension add --name databricks
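The provided scripts drive the Databricks CLI; one way to let it authenticate non-interactively is via environment variables. The workspace URL and token below are placeholders, not values from this lab:

```shell
# Placeholder workspace URL and token -- replace with your own.
export DATABRICKS_HOST="https://adb-1234567890123456.7.azuredatabricks.net"
export DATABRICKS_TOKEN="dapi0000000000000000"

# The Databricks CLI reads DATABRICKS_HOST/DATABRICKS_TOKEN, so commands such as
# `databricks secrets list-scopes` run without an interactive `databricks configure --token`.
echo "Databricks CLI will target: ${DATABRICKS_HOST}"
```

Alternatively, run databricks configure --token once and the CLI will persist the settings in ~/.databrickscfg.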

References


Make sure you have all the preparation items ready and let's start.

Step 1: Provision Azure Log Analytics Workspace

We need to provision an Azure Log Analytics workspace so Databricks can send its logs to it. Modify the parameters in the provision-config.json file, updating the configuration values according to your needs.

{
    "LogAnalytics": {
        "WorkspaceName": "-dbs-log-analytics",
        "ServiceTier":"PerGB2018",
        "SecretScope": "logsecretscope",
        "SecretScopeKeyWorkspaceId": "databrickslogworkspaceid",
        "SecretScopeKeyWorkspaceKey": "databrickslogworkspacekey",
        "ARMTemplatePath": "../Azure/loganalytics/LogAnalytics.json"
    }
}

*Note: You have to provide a value for SecretScope

Then run create-log-analytics.ps1 to provision the Azure Log Analytics workspace.
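Under the hood, create-log-analytics.ps1 deploys the ARM template listed in the config. A rough Azure CLI equivalent is sketched below; the resource group and name prefix are placeholders, and the command is only echoed so nothing is actually created:

```shell
# Placeholder names -- substitute your own resource group and prefix.
RG="kusto-lab-rg"
PREFIX="mylab"
WORKSPACE="${PREFIX}-dbs-log-analytics"   # mirrors WorkspaceName in provision-config.json

# Rough CLI equivalent of the ARM deployment (echoed, not executed):
echo az monitor log-analytics workspace create \
  --resource-group "${RG}" \
  --workspace-name "${WORKSPACE}" \
  --sku PerGB2018
```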

After the creation is done, you can verify the creation result in Azure Portal.

loganalytics-server

Step 2: Configure Databricks secret scope with Azure Key Vault back-end

In this step, we will create Databricks secret scope with Azure Key Vault back-end and then add the Log Analytics workspace information into Azure Key Vault so Databricks can access this information in a secure way.

Run create-db-secret-kv-backend.ps1 to create Databricks secret-scope with Azure Key Vault back-end.

*Note: You have to log in using an Azure account that can create a service principal, so we can't use the Service Principal created in Module 0

Then run update-db-log-analytics-key-vault.ps1 to add the Log Analytics connection information to Key Vault so that the Databricks-Log Analytics connector can access it in a secure way.
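Conceptually, the script stores the two workspace values as Key Vault secrets under the key names from provision-config.json (SecretScopeKeyWorkspaceId and SecretScopeKeyWorkspaceKey). The vault name and secret values below are placeholders, and the commands are echoed rather than executed:

```shell
# Placeholder Key Vault name -- use the vault backing your secret scope.
KV_NAME="my-lab-keyvault"

# Store the Log Analytics workspace ID and shared key (echoed, not executed):
echo az keyvault secret set --vault-name "${KV_NAME}" \
  --name databrickslogworkspaceid --value "<workspace-id>"
echo az keyvault secret set --vault-name "${KV_NAME}" \
  --name databrickslogworkspacekey --value "<workspace-key>"
```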

After the update is done, you can verify the creation result in the Azure Portal. First you need to add an access policy for your Azure account.

db-keyvault-access-policy

After that you can check the created Key Vault items.

db-keyvault-items

Step 3: Deploy Databricks Monitoring Library

In this step, we will send application logs and metrics from Azure Databricks to a Log Analytics workspace, using the Azure Databricks Monitoring Library, which is available on GitHub.

Modify the following parameters in the provision-config.json file. You should update the configuration values according to your needs.

{
    "LogAnalytics": {
        "SparkMonitoringScript": "../Azure/databricks-monitoring/spark-monitoring.sh"
    }
}

Then run configure-log-analytics-for-databricks-job.ps1 to deploy the script that can send Databricks log data to Azure Log Analytics.

Finally, run create-databricks-job.ps1 to create a new Databricks job. The new job will use the script we just deployed and send its execution status logs to Azure Log Analytics.

After the script is done, you can verify the creation of the new Databricks Job in Databricks Workspace.

db-keyvault-items

Click the "Edit" link in the Cluster setting and check the Environment Variables; you should find the Log Analytics workspace settings there.

db-keyvault-items
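The environment variables come from the spark-monitoring.sh init script, which expects the Log Analytics connection information. Assuming the secret scope and key names from provision-config.json, the cluster settings look roughly like this:

```
LOG_ANALYTICS_WORKSPACE_ID={{secrets/logsecretscope/databrickslogworkspaceid}}
LOG_ANALYTICS_WORKSPACE_KEY={{secrets/logsecretscope/databrickslogworkspacekey}}
```

The {{secrets/&lt;scope&gt;/&lt;key&gt;}} reference is resolved by Databricks at cluster start, so the workspace key itself never appears in the cluster configuration.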

Now you have finished the configuration for the Databricks Monitoring Library. You can start the new Databricks job and run ingest-telemetry-data.ps1 again to ingest data. In the next step we will monitor the ingestion status using an Azure Dashboard.
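If you prefer the CLI to the workspace UI, the new job can also be started with the Databricks CLI. The job ID below is a placeholder (look yours up with databricks jobs list), and the command is echoed rather than executed:

```shell
# Placeholder job ID -- find the real one with `databricks jobs list`.
JOB_ID=42

# Start the monitored job (echoed, not executed):
echo databricks jobs run-now --job-id "${JOB_ID}"
```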

Step 4: Create Azure Dashboard

In this step we will deploy a predefined metric dashboard to monitor Databricks and key system services. Modify the following parameters in the provision-config.json file. You should update the configuration values according to your needs.

{
    "AzureMonitor":{
        "Dashboard":{
            "MainDashboardName":"-ingestion-dashboard",
            "MainDashboardTemplatePath": "../Azure/dashboard/main_dashboard.json",
            "DBSDashboardName":"-dbs-dashboard",
            "DBSDashboardTemplatePath": "../Azure/dashboard/dbs_dashboard.json"
        }
    }
}

Then run create-azure-dashboard.ps1 to provision Azure Dashboard.

When the script is finished, open the Azure Portal and select Dashboard. deploy-notebook

Then choose "Browse all dashboards". browse-all-dashboard

Select "Shared dashboards" and you will find two dashboards whose names end with "dbs-dashboard" and "ingestion-dashboard". shared-dashboard

The dashboard whose name ends with "dbs-dashboard" shows performance metrics for Databricks. dbs-dashboard

The dashboard whose name ends with "ingestion-dashboard" shows performance metrics for other Azure services such as Azure Data Explorer, Azure Functions, and Azure Data Lake. ingestion-dashboard

Step 5: Create Azure Alert

In this step, we will create some metric alerts to help us get notified when something goes wrong, so we can react quickly to incidents.
Modify the following parameters in the provision-config.json file. You should update the configuration values according to your needs.

{
    "AzureMonitor":{
        "ActionGroup":{
            "Name":"-kusto-lab-action-group",
            "ShortName":"kusto-lab",
            "EmailGroupName":"email_alert_team",
            "EmailRecipients":"abc@microsoft.com",
            "AzureOpsGenieAPIUrl":"https://api.opsgenie.com/v1/json/azure",
            "AzureOpsGenieAPIKey":"None",
            "ActionGroupTemplatePath": "../Azure/alert/ActionGroups.json"
        },
        "FunctionAlert":{
            "ErrorHandlingAlertTriggerThreshold":1, 
            "ErrorHandlingFuncAlertTemplatePath": "../Azure/alert/ErrorHandlingFuncAlert.json",
            "IngestionFuncNotTriggerThreshold": 1,     
            "IngestionFuncAlertTemplatePath": "../Azure/alert/IngestionFuncAlert.json"            
        },
        "ADXAlert":{
            "ADXClusterHighCPUThreshold":70,       
            "ADXClusterHighIngestionLatencyThreshold":180,             
            "ADXClusterHighIngestionUtilThreshold":70,
            "ADXAlertTemplatePath": "../Azure/alert/ADXAlert.json"
        },
        "DatalakeAlert":{
            "DatalakeLowIngressThreshold":1048576,
            "DatalakeAlertTemplatePath": "../Azure/alert/DatalakeAlert.json"
        },
        "EventGridAlert":{       
            "EventGridLowPublishedThreshold":0,
            "EventGridHighDroppedThreshold":1,
            "EventGridAlertTemplatePath": "../Azure/alert/EventGridAlert.json"
        }
    }
}

Then run create-azure-alert.ps1 to create alerting rules based on metrics in Azure Data Explorer, Azure Data Lake, Azure Event Grid, the Ingestion Function, the Databricks-ErrorHandler Function, and the ADX-Ingestion-ErrorHandler Function. These alerting rules will monitor the performance metrics of these key services.
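For reference, a single metric alert rule of the kind these templates deploy can be sketched with the Azure CLI. The resource IDs below are placeholders and the command is echoed rather than executed; the threshold of 70 mirrors ADXClusterHighCPUThreshold above:

```shell
# Placeholder IDs -- substitute your subscription, cluster, and action group.
RG="kusto-lab-rg"
ADX_ID="/subscriptions/<sub-id>/resourceGroups/${RG}/providers/Microsoft.Kusto/clusters/<cluster-name>"
AG_ID="/subscriptions/<sub-id>/resourceGroups/${RG}/providers/microsoft.insights/actionGroups/<action-group-name>"

# Alert when average ADX cluster CPU exceeds 70% (echoed, not executed):
echo az monitor metrics alert create \
  --name adx-high-cpu \
  --resource-group "${RG}" \
  --scopes "${ADX_ID}" \
  --condition "avg CPU > 70" \
  --action "${AG_ID}"
```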

When the script is finished, open Azure Monitor and select Alerts; you will see the current alert status.
alters.png

Select Manage alert rules, and you will see all the alerting rules in the system. You can modify them based on your environment. alters.png