Merge pull request #81 from databrickslabs/feature/v0.0.8_docs
Feature/v0.0.8 docs
ravi-databricks authored Jul 30, 2024
2 parents e3e526e + 72242c0 commit 759623e
Showing 5 changed files with 121 additions and 14 deletions.
9 changes: 3 additions & 6 deletions README.md
@@ -126,22 +126,19 @@ If you want to run existing demo files please follow these steps before running
pip install databricks-sdk
```

```commandline
databricks labs dlt-meta onboard
```

```commandline
dlt_meta_home=$(pwd)
```

```commandline
export PYTHONPATH=$dlt_meta_home
```
![onboardingDLTMeta.gif](docs/static/images/onboardingDLTMeta.gif)

```commandline
databricks labs dlt-meta onboard
```
![onboardingDLTMeta.gif](docs/static/images/onboardingDLTMeta.gif)


The above command will prompt you for onboarding details. If you have cloned the dlt-meta git repo, accept the defaults, which use the config from the demo folder.
![onboardingDLTMeta_2.gif](docs/static/images/onboardingDLTMeta_2.gif)
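
Put together, a fresh shell session for onboarding might look like this (a sketch assuming you clone the repo into the current directory and have the Databricks CLI authenticated):

```commandline
git clone https://github.com/databrickslabs/dlt-meta.git
cd dlt-meta
dlt_meta_home=$(pwd)
export PYTHONPATH=$dlt_meta_home
databricks labs dlt-meta onboard
```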

13 changes: 6 additions & 7 deletions demo/README.md
@@ -1,10 +1,9 @@
# [DLT-META](https://github.com/databrickslabs/dlt-meta) DEMOs
1. [DAIS 2023 DEMO](#dais-2023-demo): Showcases DLT-META's ability to create Bronze and Silver DLT pipelines automatically, in both initial and incremental modes.
2. [Databricks Techsummit Demo](#databricks-tech-summit-fy2024-demo): Ingests hundreds of data sources into bronze and silver DLT pipelines automatically.
3. [Append FLOW Autoloader Demo](#append-flow-autoloader-file-metadata-demo): Write to same target from multiple sources using append_flow and adding file metadata using
autoloaders _metadata column
3. [Append FLOW Eventhub Demo](#append-flow-eventhub-demo): Write to same target from multiple sources using append_flow and adding file metadata using
autoloaders _metadata column
3. [Append FLOW Autoloader Demo](#append-flow-autoloader-file-metadata-demo): Write to the same target from multiple sources using [dlt.append_flow](https://docs.databricks.com/en/delta-live-tables/flows.html#append-flows), adding file metadata via a [file metadata column](https://docs.databricks.com/en/ingestion/file-metadata-column.html)
4. [Append FLOW Eventhub Demo](#append-flow-eventhub-demo): Write to the same target tables from multiple Eventhub topics using [dlt.append_flow](https://docs.databricks.com/en/delta-live-tables/flows.html#append-flows)



# DAIS 2023 DEMO
@@ -107,12 +106,12 @@ This demo will launch auto generated tables(100s) inside single bronze and silver

- Paste into the command prompt


# Append Flow Autoloader file metadata demo:
This demo will perform the following tasks:
- Read from different source paths using Autoloader and write to the same target using the append_flow API
- Read from different Delta tables and write to the same silver table using the append_flow API
- Add file_name and file_path to target bronze table for autoloader source
## Append flow with autoloader
- Add file_name and file_path to the target bronze table for the autoloader source using a [file metadata column](https://docs.databricks.com/en/ingestion/file-metadata-column.html)

1. Launch Terminal/Command prompt

@@ -202,4 +201,4 @@ This demo will perform following tasks:
python3 demo/launch_af_eventhub_demo.py --cloud_provider_name=aws --dbr_version=15.3.x-scala2.12 --dbfs_path=dbfs:/tmp/DLT-META/demo/ --uc_catalog_name=ravi_dlt_meta_uc --eventhub_name=dltmeta_demo --eventhub_name_append_flow=dltmeta_demo_af --eventhub_secrets_scope_name=eventhubs_dltmeta_creds --eventhub_namespace=dltmeta --eventhub_port=9093 --eventhub_producer_accesskey_name=RootManageSharedAccessKey --eventhub_consumer_accesskey_name=RootManageSharedAccessKey --eventhub_accesskey_secret_name=RootManageSharedAccessKey
```

![af_eh_demo.png](docs/static/images/af_eh_demo.png)
46 changes: 46 additions & 0 deletions docs/content/demo/Append_FLOW_CF.md
@@ -0,0 +1,46 @@
---
title: "Append FLOW Autoloader Demo"
date: 2021-08-04T14:25:26-04:00
weight: 23
draft: false
---

### Append FLOW Autoloader Demo:
This demo will perform the following tasks:
- Read from different source paths using Autoloader and write to the same target using the [dlt.append_flow](https://docs.databricks.com/en/delta-live-tables/flows.html#append-flows) API
- Read from different Delta tables and write to the same silver table using the append_flow API
- Add file_name and file_path to the target bronze table for the autoloader source using a [file metadata column](https://docs.databricks.com/en/ingestion/file-metadata-column.html)
## Append flow with autoloader
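
For orientation, a minimal hand-written sketch of the pattern this demo exercises — several flows appending into one streaming table, with the file metadata column projected in — could look like the following. The table and path names are hypothetical; DLT-META generates the real pipeline from your onboarding config.

```python
import dlt
from pyspark.sql.functions import col

# One streaming target that multiple flows append into
dlt.create_streaming_table("bronze_events")

@dlt.append_flow(target="bronze_events", name="ingest_source_a")
def ingest_source_a():
    # `spark` is provided by the DLT runtime; the landing path is a placeholder
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/landing/source_a/")
        .select(
            "*",
            col("_metadata.file_name").alias("file_name"),
            col("_metadata.file_path").alias("file_path"),
        )
    )
```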

1. Launch Terminal/Command prompt

2. Install [Databricks CLI](https://docs.databricks.com/dev-tools/cli/index.html)

3. ```commandline
git clone https://github.com/databrickslabs/dlt-meta.git
```

4. ```commandline
cd dlt-meta
```

5. Set the Python environment variables in your terminal
```commandline
dlt_meta_home=$(pwd)
```

```commandline
export PYTHONPATH=$dlt_meta_home
```

6. ```commandline
python demo/launch_af_cloudfiles_demo.py --cloud_provider_name=aws --dbr_version=15.3.x-scala2.12 --dbfs_path=dbfs:/tmp/DLT-META/demo/ --uc_catalog_name=ravi_dlt_meta_uc
```

- cloud_provider_name: aws, azure, or gcp
- dbr_version: Databricks Runtime version
- dbfs_path: Path on your Databricks workspace where the demo will be copied for launching DLT-META pipelines
- uc_catalog_name: Unity Catalog name
- You can provide `--profile=<databricks_profile_name>` if you have already configured the Databricks CLI; otherwise the command prompt will ask for a host and token.
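
For instance, a run against Azure with an existing CLI profile might look like this (the catalog and profile names are placeholders to substitute):

```commandline
python demo/launch_af_cloudfiles_demo.py --cloud_provider_name=azure --dbr_version=15.3.x-scala2.12 --dbfs_path=dbfs:/tmp/DLT-META/demo/ --uc_catalog_name=<your_catalog> --profile=<your_profile>
```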

![af_am_demo.png](docs/static/images/af_am_demo.png)
63 changes: 63 additions & 0 deletions docs/content/demo/Append_FLOW_EH.md
@@ -0,0 +1,63 @@
---
title: "Append FLOW Eventhub Demo"
date: 2021-08-04T14:25:26-04:00
weight: 24
draft: false
---

### Append FLOW Eventhub Demo:
- Read from different Eventhub topics and write to the same target tables using the [dlt.append_flow](https://docs.databricks.com/en/delta-live-tables/flows.html#append-flows) API
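
As a rough sketch (not the demo's generated code), one append flow reading an Eventhub topic over the Kafka protocol could look like the following; the namespace, topic, scope, and key names are placeholders patterned on the arguments listed under the steps below.

```python
import dlt

dlt.create_streaming_table("bronze_eventhub")

@dlt.append_flow(target="bronze_eventhub", name="main_feed")
def main_feed():
    # Connection string fetched from the secrets scope created in the steps below;
    # `spark` and `dbutils` are provided by the Databricks runtime.
    connection = dbutils.secrets.get("eventhubs_dltmeta_creds", "RootManageSharedAccessKey")
    return (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "dltmeta.servicebus.windows.net:9093")
        .option("subscribe", "dltmeta_demo")
        .option("kafka.security.protocol", "SASL_SSL")
        .option("kafka.sasl.mechanism", "PLAIN")
        .option(
            "kafka.sasl.jaas.config",
            "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required "
            f'username="$ConnectionString" password="{connection}";',
        )
        .load()
    )
```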

### Steps:
1. Launch Terminal/Command prompt

2. Install [Databricks CLI](https://docs.databricks.com/dev-tools/cli/index.html)

3. ```commandline
git clone https://github.com/databrickslabs/dlt-meta.git
```

4. ```commandline
cd dlt-meta
```
5. Set the Python environment variables in your terminal
```commandline
dlt_meta_home=$(pwd)
```
```commandline
export PYTHONPATH=$dlt_meta_home
```
6. Eventhub
- Needs an Eventhub instance running
- Needs two Eventhub topics: one for the main feed (eventhub_name) and one for the append flow feed (eventhub_name_append_flow)
- Create a Databricks secrets scope for the Eventhub keys
- ```commandline
databricks secrets create-scope eventhubs_dltmeta_creds
```
- ```commandline
databricks secrets put-secret --json '{
"scope": "eventhubs_dltmeta_creds",
"key": "RootManageSharedAccessKey",
"string_value": "<<value>>"
}'
```
- Create Databricks secrets to store the producer and consumer keys using the scope created above (a quick sanity check follows this list)

- The following arguments are mandatory for running the EventHubs demo:
- cloud_provider_name: Cloud provider name e.g. aws or azure
- dbr_version: Databricks Runtime version e.g. 15.3.x-scala2.12
- uc_catalog_name: Unity Catalog name e.g. ravi_dlt_meta_uc
- dbfs_path: Path on your Databricks workspace where the demo will be copied for launching DLT-META pipelines e.g. dbfs:/tmp/DLT-META/demo/
- eventhub_namespace: Eventhub namespace e.g. dltmeta
- eventhub_name: Primary Eventhub name e.g. dltmeta_demo
- eventhub_name_append_flow: Secondary Eventhub name for the append flow feed e.g. dltmeta_demo_af
- eventhub_producer_accesskey_name: Producer access key name e.g. RootManageSharedAccessKey
- eventhub_consumer_accesskey_name: Consumer access key name e.g. RootManageSharedAccessKey
- eventhub_secrets_scope_name: Databricks secrets scope name e.g. eventhubs_dltmeta_creds
- eventhub_port: Eventhub port e.g. 9093
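
Before launching, a quick way to confirm the scope and secret exist (assuming a current Databricks CLI) is:

```commandline
databricks secrets list-secrets eventhubs_dltmeta_creds
```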

7. ```commandline
python3 demo/launch_af_eventhub_demo.py --cloud_provider_name=aws --dbr_version=15.3.x-scala2.12 --dbfs_path=dbfs:/tmp/DLT-META/demo/ --uc_catalog_name=ravi_dlt_meta_uc --eventhub_name=dltmeta_demo --eventhub_name_append_flow=dltmeta_demo_af --eventhub_secrets_scope_name=eventhubs_dltmeta_creds --eventhub_namespace=dltmeta --eventhub_port=9093 --eventhub_producer_accesskey_name=RootManageSharedAccessKey --eventhub_consumer_accesskey_name=RootManageSharedAccessKey --eventhub_accesskey_secret_name=RootManageSharedAccessKey
```

![af_eh_demo.png](docs/static/images/af_eh_demo.png)
4 changes: 3 additions & 1 deletion docs/content/demo/_index.md
@@ -6,4 +6,6 @@ draft: false
---

1. **DAIS 2023 DEMO**: Showcases DLT-META's ability to create Bronze and Silver DLT pipelines automatically, in both initial and incremental modes.
2. **Databricks Techsummit Demo**: Ingests hundreds of data sources into bronze and silver DLT pipelines automatically.
3. **Append FLOW Autoloader Demo**: Write to the same target from multiple sources using append_flow, adding file metadata via a [file metadata column](https://docs.databricks.com/en/ingestion/file-metadata-column.html)
4. **Append FLOW Eventhub Demo**: Write to the same target tables from multiple Eventhub topics using append_flow
