Merge pull request #28 from databrickslabs/feature/dlt-meta-uc
Unity Catalog and Databricks Labs CLI Support
ravi-databricks authored Jan 5, 2024
2 parents f5f6c34 + b0e2e31 commit 2a93dd9
Showing 122 changed files with 4,377 additions and 2,251 deletions.
3 changes: 3 additions & 0 deletions .coveragerc
@@ -5,6 +5,9 @@ include = src/*.py
omit =
*/site-packages/*
tests/*
src/install.py
src/config.py
src/cli.py

[report]
exclude_lines =
7 changes: 6 additions & 1 deletion .gitignore
@@ -151,4 +151,9 @@ deployment-merged.yaml
.vscode/

# ignore integration test onboarding file.
integration-tests/conf/dlt-meta/onboarding.json
integration-tests/conf/dlt-meta/onboarding.json

.databricks
.databricks-login.json
demo/conf/onboarding.json
integration_tests/conf/onboarding.json
10 changes: 3 additions & 7 deletions CHANGELOG.md
@@ -1,13 +1,9 @@
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

**NOTE:** For CLI interfaces, we support the SemVer approach. However, we don't currently use SemVer for API components, which may lead to instability when using dbx API methods directly.

[Please read through the Keep a Changelog (~5min)](https://keepachangelog.com/en/1.0.0/).
## [v0.0.5]
- Enabled Unity Catalog support: [PR](https://github.com/databrickslabs/dlt-meta/pull/28)
- Added Databricks Labs CLI support: [PR](https://github.com/databrickslabs/dlt-meta/pull/28)

## [v0.0.4] - 2023-10-09
### Added
6 changes: 6 additions & 0 deletions Makefile
@@ -0,0 +1,6 @@
clean:
rm -fr build .databricks dlt_meta.egg-info

dev:
python3 -m venv .databricks
.databricks/bin/python -m pip install -e .
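
The new Makefile adds two convenience targets; typical usage from the repository root (a sketch — `make dev` simply wraps the venv creation and editable install shown above):

```
make dev    # create the .databricks virtualenv and install dlt-meta in editable mode
make clean  # remove build artifacts, the virtualenv, and dlt_meta.egg-info
```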
96 changes: 96 additions & 0 deletions README.md
@@ -68,7 +68,103 @@ With this framework you need to record the source and target metadata in an onbo

## Getting Started
Refer to the [Getting Started](https://databrickslabs.github.io/dlt-meta/getting_started)
### The Databricks Labs DLT-META CLI lets you run onboarding and deployment from an interactive Python terminal
#### Prerequisites:
- [Databricks CLI](https://docs.databricks.com/en/dev-tools/cli/tutorial.html)
- Python 3.8.0+
#### Steps:
- ```git clone https://github.com/databrickslabs/dlt-meta.git```
- ```cd dlt-meta```
- ```python -m venv .venv```
- ```source .venv/bin/activate```
- ```pip install databricks-sdk```
- ```databricks labs dlt-meta onboard```
- The above command prompts you for onboarding details. If you have cloned the dlt-meta git repo, accept the defaults, which use the config from the demo folder (an illustrative onboarding-file snippet appears at the end of this section).

```
Provide onboarding file path (default: demo/conf/onboarding.template):
Provide onboarding files local directory (default: demo/):
Provide dbfs path (default: dbfs:/dlt-meta_cli_demo):
Provide databricks runtime version (default: 14.2.x-scala2.12):
Run onboarding with unity catalog enabled?
[0] False
[1] True
Enter a number between 0 and 1: 1
Provide unity catalog name: ravi_dlt_meta_uc
Provide dlt meta schema name (default: dlt_meta_dataflowspecs_203b9da04bdc49f78cdc6c379d1c9ead):
Provide dlt meta bronze layer schema name (default: dltmeta_bronze_cf5956873137432294892fbb2dc34fdb):
Provide dlt meta silver layer schema name (default: dltmeta_silver_5afa2184543342f98f87b30d92b8c76f):
Provide dlt meta layer
[0] bronze
[1] bronze_silver
[2] silver
Enter a number between 0 and 2: 1
Provide bronze dataflow spec table name (default: bronze_dataflowspec):
Provide silver dataflow spec table name (default: silver_dataflowspec):
Overwrite dataflow spec?
[0] False
[1] True
Enter a number between 0 and 1: 1
Provide dataflow spec version (default: v1):
Provide environment name (default: prod): prod
Provide import author name (default: ravi.gawai):
Provide cloud provider name
[0] aws
[1] azure
[2] gcp
Enter a number between 0 and 2: 0
Do you want to update ws paths, catalog, schema details to your onboarding file?
[0] False
[1] True
```
- Go to your Databricks workspace and locate the onboarding job under: Workflows -> Job runs
- Once the onboarding job has finished, deploy the `bronze` and `silver` DLT pipelines using the command below:
- ```databricks labs dlt-meta deploy```
- The above command prompts you for DLT pipeline details. Provide the details for the schemas you created in the steps above.
- Bronze DLT
```
Deploy DLT-META with unity catalog enabled?
[0] False
[1] True
Enter a number between 0 and 1: 1
Provide unity catalog name: ravi_dlt_meta_uc
Deploy DLT-META with serverless?
[0] False
[1] True
Enter a number between 0 and 1: 1
Provide dlt meta layer
[0] bronze
[1] silver
Enter a number between 0 and 1: 0
Provide dlt meta onboard group: A1
Provide dlt_meta dataflowspec schema name: dlt_meta_dataflowspecs_203b9da04bdc49f78cdc6c379d1c9ead
Provide bronze dataflowspec table name (default: bronze_dataflowspec):
Provide dlt meta pipeline name (default: dlt_meta_bronze_pipeline_2aee3eb837f3439899eef61b76b80d53):
Provide dlt target schema name: dltmeta_bronze_cf5956873137432294892fbb2dc34fdb
```

- Silver DLT
- ```databricks labs dlt-meta deploy```
- The above command prompts you for DLT pipeline details. Provide the details for the schemas you created in the steps above.
```
Deploy DLT-META with unity catalog enabled?
[0] False
[1] True
Enter a number between 0 and 1: 1
Provide unity catalog name: ravi_dlt_meta_uc
Deploy DLT-META with serverless?
[0] False
[1] True
Enter a number between 0 and 1: 1
Provide dlt meta layer
[0] bronze
[1] silver
Enter a number between 0 and 1: 1
Provide dlt meta onboard group: A1
Provide dlt_meta dataflowspec schema name: dlt_meta_dataflowspecs_203b9da04bdc49f78cdc6c379d1c9ead
Provide silver dataflowspec table name (default: silver_dataflowspec):
Provide dlt meta pipeline name (default: dlt_meta_silver_pipeline_2147545f9b6b4a8a834f62e873fa1364):
Provide dlt target schema name: dltmeta_silver_5afa2184543342f98f87b30d92b8c76f
```
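
For reference, a minimal sketch of a single onboarding-file entry. The field names below are illustrative assumptions modeled on the dlt-meta documentation, not the authoritative schema; treat `demo/conf/onboarding.template` as the source of truth:

```
[
  {
    "data_flow_id": "100",
    "data_flow_group": "A1",
    "source_format": "cloudFiles",
    "source_details": {
      "source_path_dev": "tests/resources/data/customers"
    },
    "bronze_database_dev": "dltmeta_bronze",
    "bronze_table": "customers",
    "bronze_reader_options": {
      "cloudFiles.format": "json"
    },
    "silver_database_dev": "dltmeta_silver",
    "silver_table": "customers"
  }
]
```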
## More questions
Refer to the [FAQ](https://databrickslabs.github.io/dlt-meta/faq)
and DLT-META [documentation](https://databrickslabs.github.io/dlt-meta/)
84 changes: 84 additions & 0 deletions demo/README.md
@@ -0,0 +1,84 @@
# [DLT-META](https://github.com/databrickslabs/dlt-meta) DEMOs
1. [DAIS 2023 DEMO](#dais-2023-demo): Showcases DLT-META's ability to create Bronze and Silver DLT pipelines automatically, in both initial and incremental modes.
2. [Databricks Techsummit Demo](#databricks-tech-summit-fy2024-demo): Automated ingestion of hundreds of data sources into bronze and silver DLT pipelines.


# DAIS 2023 DEMO
This demo launches Bronze and Silver DLT pipelines with the following activities:
- Customer and Transactions feeds for the initial load
- Adds new Product and Stores feeds to the existing Bronze and Silver DLT pipelines through metadata changes
- Runs Bronze and Silver DLT in incremental mode for CDC events

### Steps:
1. Launch a terminal/command prompt

2. Install [Databricks CLI](https://docs.databricks.com/dev-tools/cli/index.html)

3. ```git clone https://github.com/databrickslabs/dlt-meta.git ```

4. ```cd dlt-meta```

5. Set the `PYTHONPATH` environment variable in your terminal:
```
export PYTHONPATH=<<local dlt-meta path>>
```

6. Run the command ```python demo/launch_dais_demo.py --username=<<your databricks username>> --source=cloudfiles --uc_catalog_name=<<uc catalog name>> --cloud_provider_name=aws --dbr_version=13.3.x-scala2.12 --dbfs_path=dbfs:/dais-dlt-meta-demo-automated_new```
    - cloud_provider_name : aws, azure, or gcp
    - dbr_version : Databricks Runtime version
    - dbfs_path : path in your Databricks workspace where the demo files are copied for launching DLT-META pipelines
    - You can pass `--profile=<databricks profile name>` if you already have a Databricks CLI profile configured; otherwise the command will prompt for host and token.

    - 6a. Databricks workspace URL:
      - Enter your workspace URL, in the format `https://<instance-name>.cloud.databricks.com`. To get your workspace URL, see Workspace instance names, URLs, and IDs.

    - 6b. Token:
      - In your Databricks workspace, click your Databricks username in the top bar, and then select User Settings from the drop-down.

      - On the Access tokens tab, click Generate new token.

      - (Optional) Enter a comment that helps you identify this token in the future, and change the token's default lifetime of 90 days. To create a token with no lifetime (not recommended), leave the Lifetime (days) box empty.

      - Click Generate.

      - Copy the displayed token.

      - Paste it into the command prompt.
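
If you prefer to skip the interactive host/token prompts entirely, here is a minimal sketch, assuming the demo script falls back to the Databricks SDK's default credential chain (which reads the standard `DATABRICKS_HOST` and `DATABRICKS_TOKEN` environment variables):

```
# Assumption: the demo script authenticates via the Databricks SDK's default
# credential chain, which reads these standard environment variables.
export DATABRICKS_HOST=https://<instance-name>.cloud.databricks.com
export DATABRICKS_TOKEN=<your-personal-access-token>
python demo/launch_dais_demo.py --username=<<your databricks username>> \
  --source=cloudfiles --uc_catalog_name=<<uc catalog name>> \
  --cloud_provider_name=aws --dbr_version=13.3.x-scala2.12 \
  --dbfs_path=dbfs:/dais-dlt-meta-demo-automated_new
```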

# Databricks Tech Summit FY2024 DEMO:
This demo launches hundreds of auto-generated tables inside a single bronze and silver DLT pipeline using dlt-meta.

1. Launch a terminal/command prompt

2. Install [Databricks CLI](https://docs.databricks.com/dev-tools/cli/index.html)

3. ```git clone https://github.com/databrickslabs/dlt-meta.git ```

4. ```cd dlt-meta```

5. Set the `PYTHONPATH` environment variable in your terminal:
```
export PYTHONPATH=<<local dlt-meta path>>
```

6. Run the command ```python demo/launch_techsummit_demo.py --username=<<your databricks username>> --source=cloudfiles --cloud_provider_name=aws --dbr_version=13.3.x-scala2.12 --dbfs_path=dbfs:/techsummit-dlt-meta-demo-automated```
    - cloud_provider_name : aws, azure, or gcp
    - dbr_version : Databricks Runtime version
    - dbfs_path : path in your Databricks workspace where the demo files are copied for launching DLT-META pipelines
    - You can pass `--profile=<databricks profile name>` if you already have a Databricks CLI profile configured; otherwise the command will prompt for host and token.

    - 6a. Databricks workspace URL:
      - Enter your workspace URL, in the format `https://<instance-name>.cloud.databricks.com`. To get your workspace URL, see Workspace instance names, URLs, and IDs.

    - 6b. Token:
      - In your Databricks workspace, click your Databricks username in the top bar, and then select User Settings from the drop-down.

      - On the Access tokens tab, click Generate new token.

      - (Optional) Enter a comment that helps you identify this token in the future, and change the token's default lifetime of 90 days. To create a token with no lifetime (not recommended), leave the Lifetime (days) box empty.

      - Click Generate.

      - Copy the displayed token.

      - Paste it into the command prompt.
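
For example, a non-interactive run that reuses an existing Databricks CLI profile (`DEFAULT` below is just a placeholder profile name):

```
# Reuse an existing Databricks CLI profile instead of entering host/token
# interactively; DEFAULT is a placeholder profile name.
python demo/launch_techsummit_demo.py \
  --username=<<your databricks username>> \
  --source=cloudfiles \
  --cloud_provider_name=aws \
  --dbr_version=13.3.x-scala2.12 \
  --dbfs_path=dbfs:/techsummit-dlt-meta-demo-automated \
  --profile=DEFAULT
```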
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
