Merge pull request #83 from databrickslabs/82-create-demo-to-showcase-fanout-architecture-in-silver-layer-using-dlt-meta

Added demo to showcase:

- Silver fanout architecture
- Used a cars input dataset containing rows for different countries
- Created 5 silver tables from a single cars table based on filter conditions
ravi-databricks authored Aug 5, 2024
2 parents 759623e + e085a0a commit a15c517
Showing 20 changed files with 30,470 additions and 24 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -6,6 +6,7 @@
- Added support for Bring your own custom transformation: [Issue](https://github.com/databrickslabs/dlt-meta/issues/68)
- Added support to Unify PyPI releases with GitHub OIDC: [PR](https://github.com/databrickslabs/dlt-meta/pull/62)
- Added demo for append_flow and file_metadata options: [PR](https://github.com/databrickslabs/dlt-meta/issues/74)
- Added Demo for silver fanout architecture: [PR](https://github.com/databrickslabs/dlt-meta/pull/83)
- Added documentation in docs site for new features: [PR](https://github.com/databrickslabs/dlt-meta/pull/64)
- Added unit tests to showcase silver layer fanout examples: [PR](https://github.com/databrickslabs/dlt-meta/pull/67)
- Fixed issue for No such file or directory: '/demo' :[PR](https://github.com/databrickslabs/dlt-meta/issues/59)
56 changes: 55 additions & 1 deletion demo/README.md
@@ -3,6 +3,7 @@
2. [Databricks Techsummit Demo](#databricks-tech-summit-fy2024-demo): 100s of data sources ingestion in bronze and silver DLT pipelines automatically.
3. [Append FLOW Autoloader Demo](#append-flow-autoloader-file-metadata-demo): Write to same target from multiple sources using [dlt.append_flow](https://docs.databricks.com/en/delta-live-tables/flows.html#append-flows) and adding [File metadata column](https://docs.databricks.com/en/ingestion/file-metadata-column.html)
4. [Append FLOW Eventhub Demo](#append-flow-eventhub-demo): Write to same target from multiple sources using [dlt.append_flow](https://docs.databricks.com/en/delta-live-tables/flows.html#append-flows) and adding [File metadata column](https://docs.databricks.com/en/ingestion/file-metadata-column.html)
5. [Silver Fanout Demo](#silver-fanout-demo): This demo showcases the implementation of a fanout architecture in the silver layer.



@@ -35,7 +36,7 @@ This Demo launches Bronze and Silver DLT pipelines with following activities:
export PYTHONPATH=$dlt_meta_home
```

6. Run the command ```python demo/launch_dais_demo.py --source=cloudfiles --uc_catalog_name=<<uc catalog name>> --cloud_provider_name=aws --dbr_version=15.3.x-scala2.12 --dbfs_path=dbfs:/dais-dlt-meta-demo-automated_new```
6. Run the command ```python demo/launch_dais_demo.py --source=cloudfiles --uc_catalog_name=<<uc catalog name>> --cloud_provider_name=aws --dbr_version=15.3.x-scala2.12 --dbfs_path=dbfs:/dais-dlt-meta-demo-automated```
- cloud_provider_name : aws or azure or gcp
- dbr_version : Databricks Runtime version
- dbfs_path : Path on your Databricks workspace where the demo will be copied for launching DLT-META pipelines
@@ -202,3 +203,56 @@ This demo will perform following tasks:
```

![af_eh_demo.png](docs/static/images/af_eh_demo.png)


# Silver Fanout Demo
- This demo showcases the onboarding process for the silver fanout pattern.
- Run the onboarding process for the bronze cars table, which contains data from various countries.
- Run the onboarding process for the silver tables, which have a `where_clause` based on the country condition specified in [silver_transformations_cars.json](https://github.com/databrickslabs/dlt-meta/blob/main/demo/conf/silver_transformations_cars.json).
- Run the Bronze DLT pipeline, which produces the `cars` table.
- Run the Silver DLT pipeline, which fans the bronze `cars` table out into country-specific tables such as `cars_usa`, `cars_uk`, `cars_germany`, and `cars_japan` (see the sketch below).
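
Conceptually, the fanout works like the following minimal sketch of a DLT pipeline in which every country-specific silver table reads the same bronze `cars` source. The table names and filter expressions mirror the demo configuration, but this is an illustrative approximation, not the code DLT-META actually generates:

```python
import dlt

# One silver table per country filter, all reading the single bronze `cars` table.
# Names and expressions below mirror silver_transformations_cars.json.
fanout_targets = {
    "cars_usa": "country = 'United States'",
    "cars_germany": "country = 'Germany'",
    "cars_uk": "country = 'United Kingdom'",
    "cars_japan": "country = 'Japan'",
}

def make_silver_table(table_name, where_clause):
    @dlt.table(name=table_name, comment=f"Fanout of bronze cars where {where_clause}")
    def _silver():
        return (
            dlt.read("cars")  # single bronze source feeding every silver target
            .selectExpr(
                "concat(first_name, ' ', last_name) as full_name",
                "country", "brand", "model", "color", "cc_type",
            )
            .where(where_clause)
        )

for name, clause in fanout_targets.items():
    make_silver_table(name, clause)
```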

### Steps:
1. Launch Terminal/Command prompt

2. Install [Databricks CLI](https://docs.databricks.com/dev-tools/cli/index.html)

3. ```commandline
git clone https://github.com/databrickslabs/dlt-meta.git
```

4. ```commandline
cd dlt-meta
```
5. Set the Python environment variable in the terminal
```commandline
dlt_meta_home=$(pwd)
```
```commandline
export PYTHONPATH=$dlt_meta_home
```
6. Run the command ```python demo/launch_silver_fanout_demo.py --source=cloudfiles --uc_catalog_name=<<uc catalog name>> --cloud_provider_name=aws --dbr_version=15.3.x-scala2.12 --dbfs_path=dbfs:/dais-dlt-meta-silver-fanout```
- cloud_provider_name : aws or azure
- dbr_version : Databricks Runtime version
- dbfs_path : Path on your Databricks workspace where the demo will be copied for launching DLT-META pipelines
- You can provide `--profile=<your databricks cli profile name>` if you already have a Databricks CLI profile configured; otherwise the command prompt will ask for your workspace host and token (a complete example invocation is shown after the steps below).

- 6a. Databricks Workspace URL:
  - Enter your workspace URL, in the format `https://<instance-name>.cloud.databricks.com`. To get your workspace URL, see Workspace instance names, URLs, and IDs.

- 6b. Token:
  - In your Databricks workspace, click your Databricks username in the top bar, and then select User Settings from the drop-down.

  - On the Access tokens tab, click Generate new token.

  - (Optional) Enter a comment that helps you identify this token in the future, and change the token's default lifetime of 90 days. To create a token with no lifetime (not recommended), leave the Lifetime (days) box empty (blank).

  - Click Generate.

  - Copy the displayed token.

  - Paste it into the command prompt.
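
For reference, a complete invocation using an existing CLI profile might look like the following; the profile name `DEFAULT` and the catalog name `my_catalog` are placeholders assumed for illustration:

```commandline
python demo/launch_silver_fanout_demo.py \
    --source=cloudfiles \
    --uc_catalog_name=my_catalog \
    --cloud_provider_name=aws \
    --dbr_version=15.3.x-scala2.12 \
    --dbfs_path=dbfs:/dais-dlt-meta-silver-fanout \
    --profile=DEFAULT
```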

![silver_fanout_workflow.png](docs/static/images/silver_fanout_workflow.png)

![silver_fanout_dlt.png](docs/static/images/silver_fanout_dlt.png)
21 changes: 21 additions & 0 deletions demo/conf/onboarding_cars.template
@@ -0,0 +1,21 @@
[
{
"data_flow_id": "100",
"data_flow_group": "A1",
"source_system": "mysql",
"source_format": "cloudFiles",
"source_details": {
"source_path_demo": "{dbfs_path}/demo/resources/data/cars"
},
"bronze_database_demo": "{uc_catalog_name}.{bronze_schema}",
"bronze_table": "cars",
"bronze_reader_options": {
"cloudFiles.format": "csv",
"cloudFiles.rescuedDataColumn": "_rescued_data",
"header": "true"
},
"silver_database_demo": "{uc_catalog_name}.{silver_schema}",
"silver_table": "cars_usa",
"silver_transformation_json_demo": "{dbfs_path}/demo/conf/silver_transformations_cars.json"
}
]
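
The `{uc_catalog_name}`, `{bronze_schema}`, `{silver_schema}`, and `{dbfs_path}` placeholders are filled in when the demo is launched. Below is a minimal sketch of that substitution, assuming plain string replacement and example values for the catalog, schema, and path names; the real logic lives in the demo launch scripts and may differ:

```python
from pathlib import Path

# Example values, assumed for illustration only.
substitutions = {
    "{uc_catalog_name}": "my_catalog",
    "{bronze_schema}": "dltmeta_bronze",
    "{silver_schema}": "dltmeta_silver",
    "{dbfs_path}": "dbfs:/dais-dlt-meta-silver-fanout",
}

template_text = Path("demo/conf/onboarding_cars.template").read_text()
for placeholder, value in substitutions.items():
    template_text = template_text.replace(placeholder, value)

# Write the resolved onboarding config next to the template (hypothetical output path).
Path("demo/conf/onboarding_cars.json").write_text(template_text)
```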
29 changes: 29 additions & 0 deletions demo/conf/onboarding_fanout_cars.template
@@ -0,0 +1,29 @@
[
{
"data_flow_id": "101",
"data_flow_group": "A1",
"bronze_database_demo": "{uc_catalog_name}.{bronze_schema}",
"bronze_table": "cars",
"silver_database_demo": "{uc_catalog_name}.{silver_schema}",
"silver_table": "cars_germany",
"silver_transformation_json_demo": "{dbfs_path}/demo/conf/silver_transformations_cars.json"
},
{
"data_flow_id": "102",
"data_flow_group": "A1",
"bronze_database_demo": "{uc_catalog_name}.{bronze_schema}",
"bronze_table": "cars",
"silver_database_demo": "{uc_catalog_name}.{silver_schema}",
"silver_table": "cars_uk",
"silver_transformation_json_demo": "{dbfs_path}/demo/conf/silver_transformations_cars.json"
},
{
"data_flow_id": "103",
"data_flow_group": "A1",
"bronze_database_demo": "{uc_catalog_name}.{bronze_schema}",
"bronze_table": "cars",
"silver_database_demo": "{uc_catalog_name}.{silver_schema}",
"silver_table": "cars_japan",
"silver_transformation_json_demo": "{dbfs_path}/demo/conf/silver_transformations_cars.json"
}
]
50 changes: 50 additions & 0 deletions demo/conf/silver_transformations_cars.json
@@ -0,0 +1,50 @@
[
{
"target_table": "cars_usa",
"select_exp": [
"concat(first_name,' ',last_name) as full_name",
"country",
"brand",
"model",
"color",
"cc_type"
],
"where_clause": ["country = 'United States'"]
},
{
"target_table": "cars_germany",
"select_exp": [
"concat(first_name,' ',last_name) as full_name",
"country",
"brand",
"model",
"color",
"cc_type"
],
"where_clause": ["country = 'Germany'"]
},
{
"target_table": "cars_uk",
"select_exp": [
"concat(first_name,' ',last_name) as full_name",
"country",
"brand",
"model",
"color",
"cc_type"
],
"where_clause": ["country = 'United Kingdom'"]
},
{
"target_table": "cars_japan",
"select_exp": [
"concat(first_name,' ',last_name) as full_name",
"country",
"brand",
"model",
"color",
"cc_type"
],
"where_clause": ["country = 'Japan'"]
}
]
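
Each entry above maps a target silver table to a set of select expressions and a `where_clause` over the bronze `cars` table. A quick way to preview what each fanout target would contain, sketched with plain PySpark outside of DLT and assuming a SparkSession plus an existing bronze `cars` table resolvable by that name:

```python
import json
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

with open("demo/conf/silver_transformations_cars.json") as f:
    transformations = json.load(f)

bronze_cars = spark.read.table("cars")  # assumed bronze table name from the demo

for entry in transformations:
    df = bronze_cars.selectExpr(*entry["select_exp"])
    for clause in entry["where_clause"]:
        df = df.where(clause)
    print(entry["target_table"], df.count())  # row count per country-specific target
```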
Binary file modified demo/dbc/afam_eventhub_runners.dbc
Binary file not shown.
Binary file added demo/dbc/silver_fout_runners.dbc
Binary file not shown.
2 changes: 1 addition & 1 deletion demo/launch_af_cloudfiles_demo.py
@@ -81,7 +81,7 @@ def launch_workflow(self, runner_conf: DLTMetaRunnerConf):
"--profile": "provide databricks cli profile name, if not provide databricks_host and token",
"--uc_catalog_name": "provide databricks uc_catalog name, this is required to create volume, schema, table",
"--cloud_provider_name": "provide cloud provider name. Supported values are aws , azure , gcp",
"--dbr_version": "Provide databricks runtime spark version e.g 11.3.x-scala2.12",
"--dbr_version": "Provide databricks runtime spark version e.g 15.3.x-scala2.12",
"--dbfs_path": "Provide databricks workspace dbfs path where you want run integration tests \
e.g --dbfs_path=dbfs:/tmp/DLT-META/"
}
2 changes: 1 addition & 1 deletion demo/launch_af_eventhub_demo.py
@@ -78,7 +78,7 @@ def launch_workflow(self, runner_conf: DLTMetaRunnerConf):
"--profile": "provide databricks cli profile name, if not provide databricks_host and token",
"--uc_catalog_name": "provide databricks uc_catalog name, this is required to create volume, schema, table",
"--cloud_provider_name": "provide cloud provider name. Supported values are aws , azure , gcp",
"--dbr_version": "Provide databricks runtime spark version e.g 11.3.x-scala2.12",
"--dbr_version": "Provide databricks runtime spark version e.g 15.3.x-scala2.12",
"--dbfs_path": "Provide databricks workspace dbfs path where you want run integration tests \
e.g --dbfs_path=dbfs:/tmp/DLT-META/",
"--eventhub_name": "Provide eventhub_name e.g --eventhub_name=iot",
2 changes: 1 addition & 1 deletion demo/launch_dais_demo.py
@@ -181,7 +181,7 @@ def create_daisdemo_workflow(self, runner_conf: DLTMetaRunnerConf):
"--uc_catalog_name": "provide databricks uc_catalog name, \
this is required to create volume, schema, table",
"--cloud_provider_name": "provide cloud provider name. Supported values are aws , azure , gcp",
"--dbr_version": "Provide databricks runtime spark version e.g 11.3.x-scala2.12",
"--dbr_version": "Provide databricks runtime spark version e.g 15.3.x-scala2.12",
"--dbfs_path": "Provide databricks workspace dbfs path where you want run integration tests \
e.g --dbfs_path=dbfs:/tmp/DLT-META/"}
