By its nature, renewable energy is highly weather-dependent, and the ongoing expansion of renewables is making our global power supply more vulnerable to changing weather conditions. Predicting how much power will be generated from the weather forecast is therefore important, especially for areas such as Orkney in Scotland.
In this repository I showcase how to:
- Build a retrainable ZenML pipeline
- Engineer features: build 2-dimensional numerical vectors from the corresponding wind cardinal directions
- Load data from Google Cloud BigQuery as a part of a ZenML pipeline
- Train your model remotely in Google Cloud Vertex AI
The goal is to create a pipeline that loads electricity production and wind forecast data from Google BigQuery, then prepares and transforms this data for a suitable model. The model is trained to predict how much electricity will be generated based on the wind forecast (wind speed and direction).
The figure above explains the data that we are going to work with. We can see that with increasing wind speed the electricity power production follows a sigmoid curve as expected.
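To make the sigmoid shape concrete, here is a minimal sketch of a logistic power curve. The function name `sigmoid_power` and the parameters `max_power`, `midpoint` and `steepness` are illustrative assumptions, not values fitted to the dataset in this repository.

```python
import math


def sigmoid_power(wind_speed, max_power=1.0, midpoint=10.0, steepness=0.5):
    """Logistic approximation of power output as a function of wind speed.

    All parameter values are hypothetical and only chosen to show the shape.
    """
    return max_power / (1.0 + math.exp(-steepness * (wind_speed - midpoint)))


# Output rises slowly at low wind speed, fastest around the midpoint,
# and flattens out as it approaches max_power.
for speed in (0, 5, 10, 15, 20):
    print(f"wind speed {speed:>2} -> relative power {sigmoid_power(speed):.3f}")
```

A curve like this is only a rough mental model; the pipeline itself trains a Random Forest Regressor rather than fitting a parametric curve.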
🪜 Steps of the pipeline:
- importer.py: Imports weather forecast and electricity power production from Google BigQuery
- preparator.py: Cleans and prepares the dataset
- transformer.py: Transforms cardinal directions (North, South...) into 2-dimensional feature vectors
- trainer.py: Trains a Random Forest Regressor
- evaluator.py: Evaluates the regressor on test data
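The idea behind the transformer step, mapping a cardinal direction to a 2-dimensional vector, can be sketched as follows. This is not the code from transformer.py: the 16-point label list `COMPASS_POINTS` and the helper `direction_to_vector` are assumptions made for illustration, and the labels in the actual dataset may differ.

```python
import math

# 16-point compass rose, listed clockwise starting from north.
# These labels are assumed; adapt them to the values in your data.
COMPASS_POINTS = [
    "N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
    "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW",
]


def direction_to_vector(direction):
    """Return (x, y) on the unit circle, with x pointing east and y pointing north."""
    index = COMPASS_POINTS.index(direction.upper())
    angle = 2 * math.pi * index / len(COMPASS_POINTS)  # clockwise from north
    return (math.sin(angle), math.cos(angle))


for name in ("N", "E", "S", "W", "NNW"):
    x, y = direction_to_vector(name)
    print(f"{name:>3} -> ({x:+.3f}, {y:+.3f})")
```

Encoding directions this way keeps neighbouring directions close together: N and NNW end up near each other on the unit circle, whereas a plain integer index would place them at opposite ends of the scale.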
Note: The data is included in this repository, so you can upload it to BigQuery in your own GCP project and follow the rest of this tutorial.
Using poetry (install):
git clone https://github.com/zenml-io/zenml-projects.git
cd zenml-projects/time-series-forecast
poetry install
Using requirements.txt:
git clone https://github.com/zenml-io/zenml-projects.git
cd zenml-projects/time-series-forecast
pip install -r requirements.txt
ZenML integrations:
zenml integration install -y sklearn gcp
Initialize ZenML repository:
zenml init
zenml up
I will show how to create Google Cloud resources for this project using the gcloud CLI. Follow this if you don't have it set up.
List the current configurations and check that project_id is set to your GCP project:
gcloud config list
If not, use:
gcloud config set project <PROJECT_ID>
Create a bucket:
gsutil mb -p <PROJECT_ID> gs://<BUCKET_NAME>
# Example:
gsutil mb -p zenml-vertex-ai gs://time-series-bucket
Upload the data set:
gsutil cp src/data/wind_forecast.csv gs://time-series-bucket
Create a dataset in BigQuery (BQ):
bq mk --dataset <PROJECT-ID>:<DATASET-NAME>
# Example:
bq mk --dataset computas-project-345810:zenml_dataset
Import data from Cloud Storage into BQ:
bq load \
--autodetect \
--source_format=CSV \
zenml_dataset.windforecast \
gs://time-series-bucket/wind_forecast.csv
Create a service account:
gcloud iam service-accounts create <NAME>
# Example:
gcloud iam service-accounts create zenml-sa
Grant permission to the service account:
gcloud projects add-iam-policy-binding <PROJECT_ID> --member="serviceAccount:<SA-NAME>@<PROJECT_ID>.iam.gserviceaccount.com" --role=<ROLE>
# Example:
gcloud projects add-iam-policy-binding zenml-vertex-ai --member="serviceAccount:zenml-sa@zenml-vertex-ai.iam.gserviceaccount.com" --role=roles/storage.admin
gcloud projects add-iam-policy-binding zenml-vertex-ai --member="serviceAccount:zenml-sa@zenml-vertex-ai.iam.gserviceaccount.com" --role=roles/aiplatform.admin
Generate a key file:
gcloud iam service-accounts keys create <FILE-NAME>.json --iam-account=<SA-NAME>@<PROJECT_ID>.iam.gserviceaccount.com
# Example:
gcloud iam service-accounts keys create credentials.json --iam-account=zenml-sa@zenml-vertex-ai.iam.gserviceaccount.com
Set the environment variable. To authenticate as the service account from your code, set an environment variable pointing to the key file wherever your code runs:
export GOOGLE_APPLICATION_CREDENTIALS=<KEY-FILE-LOCATION>
For the BigQuery step you also need to point to the same file:
import pandas as pd
import pandas_gbq
from google.oauth2 import service_account
from zenml.steps import BaseStepConfig, step

class BigQueryImporterConfig(BaseStepConfig):
    query: str = 'SELECT * FROM `computas_dataset.windforecast`'
    project_id: str = 'computas-project-345810'

@step
def bigquery_importer(config: BigQueryImporterConfig) -> pd.DataFrame:
    # Authenticate with the service account key file generated above
    credentials = service_account.Credentials.from_service_account_file('credentials.json')
    return pandas_gbq.read_gbq(config.query, project_id=config.project_id, credentials=credentials)
NOTE: You also need to change the query and your project_id accordingly.
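If you would rather not hard-code the key file name in the step, one option is to read it from the environment variable set earlier. The helper below is a minimal sketch under that assumption; `resolve_key_file` is a hypothetical name and is not part of this repository or of ZenML.

```python
import os
from pathlib import Path


def resolve_key_file(default="credentials.json"):
    """Return the service account key path, preferring GOOGLE_APPLICATION_CREDENTIALS.

    Falls back to `default` when the variable is unset, and fails early
    with a clear message when the file does not exist.
    """
    path = Path(os.environ.get("GOOGLE_APPLICATION_CREDENTIALS", default))
    if not path.is_file():
        raise FileNotFoundError(f"Service account key file not found: {path}")
    return str(path)


# Example usage inside the importer step (not executed here):
# credentials = service_account.Credentials.from_service_account_file(resolve_key_file())
```

Failing early like this surfaces a missing or misnamed key file before the BigQuery query is sent.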
Vertex AI and ZenML will use this bucket for output of any artifacts from the training run:
gsutil mb -l <REGION> gs://<BUCKET_NAME>
# Example:
gsutil mb -l europe-west1 gs://zenml-bucket
ZenML will use this registry to push your job images that Vertex will use.
a) Enable Container Registry
b) Authenticate your local docker cli with your GCP container registry:
docker pull busybox
docker tag busybox gcr.io/<PROJECT-ID>/busybox
docker push gcr.io/<PROJECT-ID>/busybox
Note that you may need to run gcloud auth configure-docker to authenticate your local docker cli with your GCP container registry before the docker push... command will work. See our documentation for more information on making this work.
Enable Vertex AI API
To be able to use custom Vertex AI jobs, you first need to enable the Vertex AI API in the Google Cloud console.
cd src
docker build --tag zenmlcustom:0.1 .
Set a GCP bucket as your artifact store:
zenml artifact-store register <NAME> --flavor=gcp --path=<GCS_BUCKET_PATH>
# Example:
zenml artifact-store register gcp-store --flavor=gcp --path=gs://zenml-bucket
Create a Vertex step operator:
zenml step-operator register <NAME> \
--flavor=vertex \
--project=<PROJECT-ID> \
--region=<REGION> \
--machine_type=<MACHINE-TYPE>
# Example:
zenml step-operator register vertex \
--flavor=vertex \
--project=zenml-core \
--region=europe-west1 \
--machine_type=n1-standard-4
List of available machines
Register a container registry:
zenml container-registry register <NAME> --flavor=default --uri=gcr.io/<PROJECT-ID>/<IMAGE>
# Example:
zenml container-registry register gcr_registry --flavor=default --uri=gcr.io/zenml-vertex-ai/busybox
Register the new stack (change names accordingly):
zenml stack register vertex_training_stack \
-o default \
-c gcr_registry \
-a gcp-store \
-s vertex
View all your stacks: zenml stack list
Activate the stack:
zenml stack set vertex_training_stack
Now we're ready. Execute:
python main.py
Documentation on Step Operators
More on Step Operators
Documentation on how to create a GCP service account
ZenML CLI documentation