This template was developed to support my own learning when I started studying Databricks/Spark, and I'm now making it available so other absolute beginners can have a good starting experience. You can use it even if you don't yet know how to create a DataFrame, or how to create a directory for writing files, tables, and databases.
- Create 3 PySpark DataFrames for practicing relational data transformations
- Create 4 folders to write data: current user directory, raw, structured, curated
- Create 1 database in the structured zone
- You can see the source code and learn from it
- Reset the environment (includes functions to clean up your environment)
- To learn how to create tables (in the Hive database), see the 04-Table_Reference notebook
- Python Unit Testing with unittest on Databricks
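The setup steps above can be sketched in PySpark. This is a minimal illustration, not the template's actual code: the DataFrame schema, the `/tmp/datalake` paths, and the `structured_db` database name are all assumptions (on Databricks, you would typically use `dbutils.fs.mkdirs` for DBFS paths instead of `os.makedirs`):

```python
# Sketch of the environment setup: one practice DataFrame, the data
# zone folders, and a database in the structured zone. All names and
# paths here are illustrative assumptions, not the template's own.
import os
from pyspark.sql import SparkSession

# On Databricks the `spark` session already exists; locally we build one.
spark = SparkSession.builder.appName("setup-sketch").getOrCreate()

# One of the three practice DataFrames (hypothetical schema and rows).
customers = spark.createDataFrame(
    [(1, "Ana"), (2, "Bruno")],
    ["customer_id", "name"],
)

# Folders for the data zones (on Databricks: dbutils.fs.mkdirs(path)).
for zone in ["raw", "structured", "curated"]:
    os.makedirs(f"/tmp/datalake/{zone}", exist_ok=True)

# A database located in the structured zone, then a table written to it.
spark.sql(
    "CREATE DATABASE IF NOT EXISTS structured_db "
    "LOCATION '/tmp/datalake/structured'"
)
customers.write.mode("overwrite").saveAsTable("structured_db.customers")

print(spark.table("structured_db.customers").count())  # 2
```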
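The unit-testing item above follows the standard `unittest` pattern. A minimal sketch, assuming a hypothetical helper function `add_full_name` (not part of the template); note that Databricks notebooks run the suite through an explicit runner rather than `unittest.main()`:

```python
# Minimal unittest sketch; `add_full_name` is a hypothetical function
# used only for illustration, not a helper from this template.
import unittest


def add_full_name(first, last):
    """Hypothetical transformation under test: join two name parts."""
    return f"{first} {last}".strip()


class FullNameTest(unittest.TestCase):
    def test_joins_names(self):
        self.assertEqual(add_full_name("Ada", "Lovelace"), "Ada Lovelace")

    def test_handles_empty_last_name(self):
        self.assertEqual(add_full_name("Ada", ""), "Ada")


# Build and run the suite explicitly, as a notebook test runner would.
suite = unittest.TestLoader().loadTestsFromTestCase(FullNameTest)
result = unittest.TextTestRunner(verbosity=2).run(suite)
print(result.wasSuccessful())  # True
```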
- Config-DataFrame
- Config-Directories
- Config-Database
- common
- Reset-Environment
- Helpers
- Test
- Test_Runner
- 01-Training_Python
- 02-Table-Reference
- Just import the data-engineering.dbc file into your Databricks Community account and run the 01-Training_Python notebook.