This guide walks you through integrating Looker with BigQuery ML via Cloud Functions, using the Looker Action API. Users can examine data in a Looker Explore, then send the Looker query to a Cloud Function, specifying any BigQuery ML parameters on the form submission.
There are four Cloud Functions included in this demo. Three of the functions are used to communicate with Looker via the Action API:
- `action_list`: Lists the metadata for the action, including the form and execute endpoints
- `action_form`: The dynamic form template presented to users to send parameters to the execute endpoint. This can be easily customised to add more functionality to the `CREATE MODEL` statement
- `action_execute`: Creates a BigQuery ML model and a corresponding BigQuery view to be used for predictions from Looker. This function parses the SQL query that Looker generates, removes the row limit, and wraps it in a `CREATE MODEL` statement with the specified parameters (a hypothetical sketch of the resulting statement follows this list)

An additional function, `predict`, can be used to return predictions dynamically by submitting a model name and input data.
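For illustration, the statement that `action_execute` assembles might look something like the following. This is a hypothetical sketch only: the model name, `OPTIONS`, and inner query all depend on the form inputs and the Looker-generated SQL.

```sh
# Hypothetical sketch -- the real statement is built by action_execute from the
# form inputs (model type, name, target column) and the Looker query.
bq query --use_legacy_sql=false '
CREATE MODEL `my-project-id.bqml.customer_value_model`
OPTIONS (
  model_type = "BOOSTED_TREE_REGRESSOR",        -- model type from the form
  input_label_cols = ["order_items_net_sales"]  -- target column from the form
) AS
SELECT
  customers_id,
  customers_age,
  order_items_net_sales
FROM some_table  -- placeholder: the Looker-generated query, row limit removed
'
```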
Before following the steps below, make sure you have enabled the Secret Manager API, Cloud Build API, and Cloud Functions API. It can take a few minutes after enabling these APIs for the changes to propagate through the system.
Use Cloud Shell or the `gcloud` CLI for the following steps.

The only variable you must modify is `PROJECT`, the ID of the project you want to deploy the Cloud Functions to. Update `REGION` and `DATASET` if you wish to change them from the default values.
- Set the variables below:

  ```sh
  ACTION_LABEL="BigQuery ML Generator"
  ACTION_NAME="bigquery-ml-generator"
  REGION="us-east1"
  PROJECT="my-project-id"
  DATASET="bqml"
  ```
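  If your shell is not already authenticated against this project (Cloud Shell usually is), the standard `gcloud` setup commands can help:

  ```sh
  gcloud auth login                     # authenticate the gcloud CLI
  gcloud config set project ${PROJECT}  # use PROJECT as the default project
  ```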
- Clone this repo:

  ```sh
  git clone https://github.com/looker-open-source/bqml-actions.git
  cd bqml-actions/
  ```
- Create a `.env.yaml` with the variables:

  ```sh
  printf "ACTION_LABEL: ${ACTION_LABEL}\nACTION_NAME: ${ACTION_NAME}\nREGION: ${REGION}\nPROJECT: ${PROJECT}\nDATASET: ${DATASET}" > .env.yaml
  ```
- Generate the `LOOKER_AUTH_TOKEN` secret. The auth token secret can be any randomly generated string; you can generate such a string with the `openssl` command:

  ```sh
  LOOKER_AUTH_TOKEN="`openssl rand -hex 64`"
  ```
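  As a quick optional check, `openssl rand -hex 64` should yield 128 hex characters:

  ```sh
  printf '%s' "${LOOKER_AUTH_TOKEN}" | wc -c  # expect 128
  ```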
- Add the auth token as a Secret, then create a Service Account to run the Cloud Functions and give it access to the Secret:

  ```sh
  printf ${LOOKER_AUTH_TOKEN} | gcloud secrets create LOOKER_AUTH_TOKEN --data-file=- --replication-policy=user-managed --locations=${REGION} --project=${PROJECT}

  gcloud iam service-accounts create bqml-actions-cloud-functions --display-name="BQML Actions Cloud Functions"

  SERVICE_ACCOUNT_EMAIL=bqml-actions-cloud-functions@${PROJECT}.iam.gserviceaccount.com

  eval gcloud projects add-iam-policy-binding ${PROJECT} --member=serviceAccount:${SERVICE_ACCOUNT_EMAIL} --role='roles/cloudfunctions.invoker'

  eval gcloud projects add-iam-policy-binding ${PROJECT} --member=serviceAccount:${SERVICE_ACCOUNT_EMAIL} --role='roles/secretmanager.secretAccessor'

  eval gcloud secrets add-iam-policy-binding LOOKER_AUTH_TOKEN --member=serviceAccount:${SERVICE_ACCOUNT_EMAIL} --role='roles/secretmanager.secretAccessor' --project=${PROJECT}
  ```
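  Optionally, verify the secret and the service account's roles with standard `gcloud` commands:

  ```sh
  # Should print the token back
  gcloud secrets versions access latest --secret=LOOKER_AUTH_TOKEN --project=${PROJECT}

  # Should list roles/cloudfunctions.invoker and roles/secretmanager.secretAccessor
  gcloud projects get-iam-policy ${PROJECT} \
    --flatten='bindings[].members' \
    --filter="bindings.members:serviceAccount:${SERVICE_ACCOUNT_EMAIL}" \
    --format='value(bindings.role)'
  ```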
- Create the dataset and a `bqml_models` table to store metadata on the generated BigQuery ML models and their corresponding views:

  ```sh
  bq --location=US mk --dataset ${PROJECT}:${DATASET}
  bq mk --table ${DATASET}.bqml_models bqml_models.json
  ```
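  Here `bqml_models.json` is the schema file from the cloned repo. You can confirm the dataset and table were created with:

  ```sh
  bq ls ${PROJECT}:${DATASET}                         # should list bqml_models
  bq show --schema ${PROJECT}:${DATASET}.bqml_models  # prints the table schema
  ```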
- Deploy the four Cloud Functions for the action hub listing, action form, action execute, and predictions (this may take a few minutes):

  ```sh
  gcloud functions deploy bigquery-ml-generator-list --entry-point action_list --env-vars-file .env.yaml --trigger-http --runtime=python38 --allow-unauthenticated --timeout=540s --region=${REGION} --project=${PROJECT} --service-account ${SERVICE_ACCOUNT_EMAIL} --set-secrets 'LOOKER_AUTH_TOKEN=LOOKER_AUTH_TOKEN:latest'

  gcloud functions deploy bigquery-ml-generator-form --entry-point action_form --env-vars-file .env.yaml --trigger-http --runtime=python38 --allow-unauthenticated --timeout=540s --region=${REGION} --project=${PROJECT} --service-account ${SERVICE_ACCOUNT_EMAIL} --set-secrets 'LOOKER_AUTH_TOKEN=LOOKER_AUTH_TOKEN:latest'

  gcloud functions deploy bigquery-ml-generator-execute --entry-point action_execute --env-vars-file .env.yaml --trigger-http --runtime=python38 --allow-unauthenticated --timeout=540s --region=${REGION} --project=${PROJECT} --service-account ${SERVICE_ACCOUNT_EMAIL} --set-secrets 'LOOKER_AUTH_TOKEN=LOOKER_AUTH_TOKEN:latest'

  gcloud functions deploy bigquery-ml-generator-predict --entry-point predict --env-vars-file .env.yaml --trigger-http --runtime=python38 --allow-unauthenticated --timeout=540s --region=${REGION} --project=${PROJECT} --service-account ${SERVICE_ACCOUNT_EMAIL} --set-secrets 'LOOKER_AUTH_TOKEN=LOOKER_AUTH_TOKEN:latest'
  ```
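  Once the deploys complete, you can confirm all four functions are live:

  ```sh
  gcloud functions list --regions=${REGION} --project=${PROJECT} \
    --filter='name~bigquery-ml-generator'  # expect list, form, execute, predict
  ```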
- Copy the Action Hub URL (the `action_list` endpoint) and the `LOOKER_AUTH_TOKEN` to input into Looker:

  ```sh
  echo Action Hub URL: https://${REGION}-${PROJECT}.cloudfunctions.net/${ACTION_NAME}-list
  echo LOOKER_AUTH_TOKEN: $LOOKER_AUTH_TOKEN
  ```
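  As an optional smoke test before touching Looker, call the listing endpoint yourself. Looker POSTs to it with the token in an `Authorization` header, so a request along these lines should return the action metadata as JSON:

  ```sh
  curl -s -X POST "https://${REGION}-${PROJECT}.cloudfunctions.net/${ACTION_NAME}-list" \
    -H "Authorization: Token token=\"${LOOKER_AUTH_TOKEN}\""
  ```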
- In Looker, go to the Admin > Actions page and click Add Action Hub
  - Enter the Action Hub URL
  - Click Configure Authorization, enter the `LOOKER_AUTH_TOKEN` value for the Authorization Token, and click Enable
  - Toggle the Enabled button and click Save
- Copy the boilerplate LookML code in the LookML folder to a new project and update the `connection` name in the retail model file, as in the sketch below
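  A minimal sketch of that edit, assuming the model file is `retail.model.lkml` and your Looker connection is named `your_bigquery_connection` (both names hypothetical):

  ```sh
  grep -n 'connection:' retail.model.lkml  # locate the connection declaration
  # Point it at your own connection, so the line reads:
  #   connection: "your_bigquery_connection"
  sed -i 's/^connection: .*/connection: "your_bigquery_connection"/' retail.model.lkml
  ```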
- Explore the data and choose columns to send, e.g.:

  ```
  https://${YOUR_LOOKER_DOMAIN}.com/explore/retail/order_items?fields=customers.id,customers.city,customers.state,customers.country,customers.age,customers.days_time_as_a_customer,customers.traffic_source,customers.gender,order_items.last_order_date,order_items.average_basket_value,order_items.average_basket_size,order_items.return_count,order_items.has_returns,order_items.total_returns,order_items.number_of_orders,inventory_items.list_of_product_category,order_items.gross_sales,order_items.net_sales&f[predictions.model_name]=&sorts=order_items.average_basket_value+desc&limit=20
  ```
- Send to the BigQuery ML Generator destination. Specify the model type and model parameters, along with the model name, ID column, and target column. The columns should be the SQL column names.
- Predictions can be made in an Explore query or via the `predict` Cloud Function. The model may take anywhere from a few minutes to a few hours to build.
  - In Looker: select your new model name from `predictions.model_name` and run a query, e.g. new customers that have registered in the past day
  - Using the `predict` Cloud Function directly: you can dynamically send data to run predictions on the fly, as in the sketch below. Make sure to include the auth token in the header in the following format:

    ```
    Authorization: Token token="${LOOKER_AUTH_TOKEN}"
    ```
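    A minimal request sketch, assuming the function accepts a JSON body with the model name and input rows; the payload field names here are hypothetical, so check the `predict` function's source for the exact shape it expects:

    ```sh
    # The model_name/data payload below is illustrative only -- confirm the
    # expected request shape in the predict function's source.
    curl -s -X POST "https://${REGION}-${PROJECT}.cloudfunctions.net/${ACTION_NAME}-predict" \
      -H "Authorization: Token token=\"${LOOKER_AUTH_TOKEN}\"" \
      -H 'Content-Type: application/json' \
      -d '{"model_name": "customer_value_model", "data": [{"customers_age": 34}]}'
    ```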
If the model build fails, you will receive an email notification. Go to the Admin > Scheduler History page to view the error message returned from the Action, or use the `scheduled_plan` System Activity Explore:
- Explore query to see details on action executions:

  ```
  https://${YOUR_LOOKER_DOMAIN}.com/explore/system__activity/scheduled_plan?fields=scheduled_job.id,scheduled_job.created_time,scheduled_plan_destination.action_type,scheduled_plan_destination.format,scheduled_job.status,scheduled_plan.run_once,scheduled_plan_destination.parameters,scheduled_job.status_detail&f[scheduled_plan_destination.action_type]=bigquery-ml-generator&sorts=scheduled_job.created_time+desc&limit=500&vis=%7B%7D&filter_config=%7B%22scheduled_plan_destination.action_type%22%3A%5B%7B%22type%22%3A%22%3D%22%2C%22values%22%3A%5B%7B%22constant%22%3A%22bigquery-ml-generator%22%7D%2C%7B%7D%5D%2C%22id%22%3A1%2C%22error%22%3Afalse%7D%5D%7D&dynamic_fields=%5B%7B%22measure%22%3A%22sum_of_runtime_in_seconds%22%2C%22based_on%22%3A%22scheduled_job_stage.runtime%22%2C%22expression%22%3A%22%22%2C%22label%22%3A%22Sum+of+Runtime+in+Seconds%22%2C%22type%22%3A%22sum%22%2C%22_kind_hint%22%3A%22measure%22%2C%22_type_hint%22%3A%22number%22%7D%5D&origin=share-expanded
  ```