From 1fa1a3eb7128c33e08804d58b2c26ab83a24b766 Mon Sep 17 00:00:00 2001
From: ravi-databricks <37003292+ravi-databricks@users.noreply.github.com>
Date: Fri, 26 Jul 2024 16:23:25 -0700
Subject: [PATCH 1/2] Corrected readme

---
 README.md | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index e8c9376..ab0d744 100755
--- a/README.md
+++ b/README.md
@@ -126,10 +126,6 @@ If you want to run existing demo files please follow these steps before running
    ```commandline
    pip install databricks-sdk
    ```

-```commandline
- databricks labs dlt-meta onboard
-```
-
 ```commandline
 dlt_meta_home=$(pwd)
 ```

@@ -137,11 +133,12 @@ If you want to run existing demo files please follow these steps before running
 ```commandline
 export PYTHONPATH=$dlt_meta_home
 ```
-![onboardingDLTMeta.gif](docs/static/images/onboardingDLTMeta.gif)
-
 ```commandline
 databricks labs dlt-meta onboard
 ```
+![onboardingDLTMeta.gif](docs/static/images/onboardingDLTMeta.gif)
+

 The above command will prompt you to provide onboarding details. If you have cloned the dlt-meta git repo, accept the defaults, which will launch the config from the demo folder.

 ![onboardingDLTMeta_2.gif](docs/static/images/onboardingDLTMeta_2.gif)
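Before running the onboard command, a quick environment sanity check can save a failed run. A minimal sketch, assuming a Unix-like shell; the import line is illustrative only and not part of the documented setup:

```commandline
echo $PYTHONPATH
python3 -c "import databricks.sdk; print('databricks-sdk import OK')"
```

If the import fails, re-run `pip install databricks-sdk` in the same Python environment.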
From c5c3c829cc6bc324d3a913fdb3267947b72ef4a9 Mon Sep 17 00:00:00 2001
From: ravi-databricks <37003292+ravi-databricks@users.noreply.github.com>
Date: Mon, 29 Jul 2024 17:11:42 -0700
Subject: [PATCH 2/2] Added demo to docsite

---
 demo/README.md                      | 108 +++++++++++++++++++++++++++-
 docs/content/demo/Append_FLOW_CF.md |  46 ++++++++++++
 docs/content/demo/Append_FLOW_EH.md |  63 ++++++++++++++++
 docs/content/demo/_index.md         |   4 +-
 4 files changed, 219 insertions(+), 2 deletions(-)
 create mode 100644 docs/content/demo/Append_FLOW_CF.md
 create mode 100644 docs/content/demo/Append_FLOW_EH.md

diff --git a/demo/README.md b/demo/README.md
index 8f15191..b699582 100644
--- a/demo/README.md
+++ b/demo/README.md
@@ -1,6 +1,11 @@
 # [DLT-META](https://github.com/databrickslabs/dlt-meta) DEMOs
 1. [DAIS 2023 DEMO](#dais-2023-demo): Showcases DLT-META's capability to create Bronze and Silver DLT pipelines automatically, in both initial and incremental modes.
 2. [Databricks Techsummit Demo](#databricks-tech-summit-fy2024-demo): Automated ingestion of 100s of data sources into bronze and silver DLT pipelines.
+ 3. [Append FLOW Autoloader Demo](#append-flow-autoloader-file-metadata-demo): Write to the same target from multiple sources using [dlt.append_flow](https://docs.databricks.com/en/delta-live-tables/flows.html#append-flows) and add a [File metadata column](https://docs.databricks.com/en/ingestion/file-metadata-column.html)
+ 4. [Append FLOW Eventhub Demo](#append-flow-eventhub-demo): Write to the same target from multiple Eventhub sources using [dlt.append_flow](https://docs.databricks.com/en/delta-live-tables/flows.html#append-flows)

 # DAIS 2023 DEMO

@@ -98,4 +103,105 @@ This demo will launch auto generated tables(100s) inside single bronze and silver DLT pipeline
   - Copy the displayed token
   - Paste to command prompt

# Append Flow Autoloader file metadata demo:
This demo performs the following tasks:
- Read from different source paths using Autoloader and write to the same target using the append_flow API
- Read from different delta tables and write to the same silver table using the append_flow API
- Add file_name and file_path to the target bronze table for the Autoloader source using the [File metadata column](https://docs.databricks.com/en/ingestion/file-metadata-column.html)
## Append flow with autoloader

1. Launch Terminal/Command prompt

2. Install [Databricks CLI](https://docs.databricks.com/dev-tools/cli/index.html)

3. ```commandline
   git clone https://github.com/databrickslabs/dlt-meta.git
   ```

4. ```commandline
   cd dlt-meta
   ```

5. Set the Python path environment variable in the terminal
   ```commandline
   dlt_meta_home=$(pwd)
   ```

   ```commandline
   export PYTHONPATH=$dlt_meta_home
   ```

6. ```commandline
   python demo/launch_af_cloudfiles_demo.py --cloud_provider_name=aws --dbr_version=15.3.x-scala2.12 --dbfs_path=dbfs:/tmp/DLT-META/demo/ --uc_catalog_name=ravi_dlt_meta_uc
   ```

- cloud_provider_name: aws, azure, or gcp
- dbr_version: Databricks Runtime version
- dbfs_path: Path on your Databricks workspace where the demo will be copied for launching DLT-META pipelines
- uc_catalog_name: Unity Catalog name
- You can provide `--profile=<databricks_profile_name>` if you already have a Databricks CLI profile configured; otherwise the command will prompt for host and token (see the example below)
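For example, with a saved CLI profile the launch might look like the following (the profile name `DEFAULT` is only an assumption; substitute your own):

```commandline
python demo/launch_af_cloudfiles_demo.py --cloud_provider_name=aws --dbr_version=15.3.x-scala2.12 --dbfs_path=dbfs:/tmp/DLT-META/demo/ --uc_catalog_name=ravi_dlt_meta_uc --profile=DEFAULT
```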
![af_am_demo.png](docs/static/images/af_am_demo.png)

# Append Flow Eventhub demo:
- Read from different Eventhub topics and write to the same target tables using the append_flow API

### Steps:
1. Launch Terminal/Command prompt

2. Install [Databricks CLI](https://docs.databricks.com/dev-tools/cli/index.html)

3. ```commandline
   git clone https://github.com/databrickslabs/dlt-meta.git
   ```

4. ```commandline
   cd dlt-meta
   ```
5. Set the Python path environment variable in the terminal
   ```commandline
   dlt_meta_home=$(pwd)
   ```
   ```commandline
   export PYTHONPATH=$dlt_meta_home
   ```
6. Eventhub prerequisites
- Requires a running Eventhub instance
- Requires two Eventhub topics: one for the main feed (eventhub_name) and one for the append flow feed (eventhub_name_append_flow)
- Create a Databricks secret scope for the Eventhub keys
  - ```commandline
    databricks secrets create-scope eventhubs_dltmeta_creds
    ```
  - ```commandline
    databricks secrets put-secret --json '{
        "scope": "eventhubs_dltmeta_creds",
        "key": "RootManageSharedAccessKey",
        "string_value": "<>"
    }'
    ```
- Create Databricks secrets to store the producer and consumer keys using the scope created above (a verification sketch follows below)
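To verify the scope and stored keys before launching, you can list them back; a minimal sketch assuming a recent unified Databricks CLI (only key metadata is shown, never the secret values):

```commandline
databricks secrets list-scopes
databricks secrets list-secrets eventhubs_dltmeta_creds
```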
- Following are the mandatory arguments for running the EventHubs demo:
  - cloud_provider_name: Cloud provider name, e.g. aws or azure
  - dbr_version: Databricks Runtime version, e.g. 15.3.x-scala2.12
  - uc_catalog_name: Unity Catalog name, e.g. ravi_dlt_meta_uc
  - dbfs_path: Path on your Databricks workspace where the demo will be copied for launching DLT-META pipelines, e.g. dbfs:/tmp/DLT-META/demo/
  - eventhub_namespace: Eventhub namespace, e.g. dltmeta
  - eventhub_name: Primary Eventhub name, e.g. dltmeta_demo
  - eventhub_name_append_flow: Secondary Eventhub name for the append flow feed, e.g. dltmeta_demo_af
  - eventhub_producer_accesskey_name: Producer access key name, e.g. RootManageSharedAccessKey
  - eventhub_consumer_accesskey_name: Consumer access key name, e.g. RootManageSharedAccessKey
  - eventhub_secrets_scope_name: Databricks secret scope name, e.g. eventhubs_dltmeta_creds
  - eventhub_port: Eventhub port, e.g. 9093

7. ```commandline
   python3 demo/launch_af_eventhub_demo.py --cloud_provider_name=aws --dbr_version=15.3.x-scala2.12 --dbfs_path=dbfs:/tmp/DLT-META/demo/ --uc_catalog_name=ravi_dlt_meta_uc --eventhub_name=dltmeta_demo --eventhub_name_append_flow=dltmeta_demo_af --eventhub_secrets_scope_name=eventhubs_dltmeta_creds --eventhub_namespace=dltmeta --eventhub_port=9093 --eventhub_producer_accesskey_name=RootManageSharedAccessKey --eventhub_consumer_accesskey_name=RootManageSharedAccessKey --eventhub_accesskey_secret_name=RootManageSharedAccessKey
   ```

![af_eh_demo.png](docs/static/images/af_eh_demo.png)

diff --git a/docs/content/demo/Append_FLOW_CF.md b/docs/content/demo/Append_FLOW_CF.md
new file mode 100644
index 0000000..2834371
--- /dev/null
+++ b/docs/content/demo/Append_FLOW_CF.md
@@ -0,0 +1,46 @@
---
title: "Append FLOW Autoloader Demo"
date: 2021-08-04T14:25:26-04:00
weight: 23
draft: false
---

### Append FLOW Autoloader Demo:
This demo performs the following tasks:
- Read from different source paths using Autoloader and write to the same target using the [dlt.append_flow](https://docs.databricks.com/en/delta-live-tables/flows.html#append-flows) API
- Read from different delta tables and write to the same silver table using the append_flow API
- Add file_name and file_path to the target bronze table for the Autoloader source using the [File metadata column](https://docs.databricks.com/en/ingestion/file-metadata-column.html)
## Append flow with autoloader

1. Launch Terminal/Command prompt

2. Install [Databricks CLI](https://docs.databricks.com/dev-tools/cli/index.html)

3. ```commandline
   git clone https://github.com/databrickslabs/dlt-meta.git
   ```

4. ```commandline
   cd dlt-meta
   ```

5. Set the Python path environment variable in the terminal
   ```commandline
   dlt_meta_home=$(pwd)
   ```

   ```commandline
   export PYTHONPATH=$dlt_meta_home
   ```

6. ```commandline
   python demo/launch_af_cloudfiles_demo.py --cloud_provider_name=aws --dbr_version=15.3.x-scala2.12 --dbfs_path=dbfs:/tmp/DLT-META/demo/ --uc_catalog_name=ravi_dlt_meta_uc
   ```

- cloud_provider_name: aws, azure, or gcp
- dbr_version: Databricks Runtime version
- dbfs_path: Path on your Databricks workspace where the demo will be copied for launching DLT-META pipelines
- uc_catalog_name: Unity Catalog name
- You can provide `--profile=<databricks_profile_name>` if you already have a Databricks CLI profile configured; otherwise the command will prompt for host and token

![af_am_demo.png](docs/static/images/af_am_demo.png)
\ No newline at end of file

diff --git a/docs/content/demo/Append_FLOW_EH.md b/docs/content/demo/Append_FLOW_EH.md
new file mode 100644
index 0000000..a4f995b
--- /dev/null
+++ b/docs/content/demo/Append_FLOW_EH.md
@@ -0,0 +1,63 @@
---
title: "Append FLOW Eventhub Demo"
date: 2021-08-04T14:25:26-04:00
weight: 24
draft: false
---

### Append FLOW Eventhub Demo:
- Read from different Eventhub topics and write to the same target tables using the [dlt.append_flow](https://docs.databricks.com/en/delta-live-tables/flows.html#append-flows) API

### Steps:
1. Launch Terminal/Command prompt

2. Install [Databricks CLI](https://docs.databricks.com/dev-tools/cli/index.html)

3. ```commandline
   git clone https://github.com/databrickslabs/dlt-meta.git
   ```

4. ```commandline
   cd dlt-meta
   ```
5. Set the Python path environment variable in the terminal
   ```commandline
   dlt_meta_home=$(pwd)
   ```
   ```commandline
   export PYTHONPATH=$dlt_meta_home
   ```
6. Eventhub prerequisites
- Requires a running Eventhub instance
- Requires two Eventhub topics: one for the main feed (eventhub_name) and one for the append flow feed (eventhub_name_append_flow)
- Create a Databricks secret scope for the Eventhub keys
  - ```commandline
    databricks secrets create-scope eventhubs_dltmeta_creds
    ```
  - ```commandline
    databricks secrets put-secret --json '{
        "scope": "eventhubs_dltmeta_creds",
        "key": "RootManageSharedAccessKey",
        "string_value": "<>"
    }'
    ```
- Create Databricks secrets to store the producer and consumer keys using the scope created above (a verification sketch follows below)
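As in the README walkthrough, you can list the scope and keys back to confirm they registered; a sketch assuming a recent unified Databricks CLI (secret values are never displayed):

```commandline
databricks secrets list-scopes
databricks secrets list-secrets eventhubs_dltmeta_creds
```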
- Following are the mandatory arguments for running the EventHubs demo:
  - cloud_provider_name: Cloud provider name, e.g. aws or azure
  - dbr_version: Databricks Runtime version, e.g. 15.3.x-scala2.12
  - uc_catalog_name: Unity Catalog name, e.g. ravi_dlt_meta_uc
  - dbfs_path: Path on your Databricks workspace where the demo will be copied for launching DLT-META pipelines, e.g. dbfs:/tmp/DLT-META/demo/
  - eventhub_namespace: Eventhub namespace, e.g. dltmeta
  - eventhub_name: Primary Eventhub name, e.g. dltmeta_demo
  - eventhub_name_append_flow: Secondary Eventhub name for the append flow feed, e.g. dltmeta_demo_af
  - eventhub_producer_accesskey_name: Producer access key name, e.g. RootManageSharedAccessKey
  - eventhub_consumer_accesskey_name: Consumer access key name, e.g. RootManageSharedAccessKey
  - eventhub_secrets_scope_name: Databricks secret scope name, e.g. eventhubs_dltmeta_creds
  - eventhub_port: Eventhub port, e.g. 9093

7. ```commandline
   python3 demo/launch_af_eventhub_demo.py --cloud_provider_name=aws --dbr_version=15.3.x-scala2.12 --dbfs_path=dbfs:/tmp/DLT-META/demo/ --uc_catalog_name=ravi_dlt_meta_uc --eventhub_name=dltmeta_demo --eventhub_name_append_flow=dltmeta_demo_af --eventhub_secrets_scope_name=eventhubs_dltmeta_creds --eventhub_namespace=dltmeta --eventhub_port=9093 --eventhub_producer_accesskey_name=RootManageSharedAccessKey --eventhub_consumer_accesskey_name=RootManageSharedAccessKey --eventhub_accesskey_secret_name=RootManageSharedAccessKey
   ```

![af_eh_demo.png](docs/static/images/af_eh_demo.png)

diff --git a/docs/content/demo/_index.md b/docs/content/demo/_index.md
index 9e3c67a..c7bbf9c 100644
--- a/docs/content/demo/_index.md
+++ b/docs/content/demo/_index.md
@@ -6,4 +6,6 @@ draft: false
 ---

 1. **DAIS 2023 DEMO**: Showcases DLT-META's capability to create Bronze and Silver DLT pipelines automatically, in both initial and incremental modes.
 2. **Databricks Techsummit Demo**: Automated ingestion of 100s of data sources into bronze and silver DLT pipelines.
+ 3. **Append FLOW Autoloader Demo**: Write to the same target from multiple sources using the append_flow API and add file metadata using the [File metadata column](https://docs.databricks.com/en/ingestion/file-metadata-column.html)
+ 4. **Append FLOW Eventhub Demo**: Write to the same target from multiple Eventhub sources using the append_flow API
\ No newline at end of file