In this project, I developed an end-to-end data engineering pipeline following the workflow below:
I started by creating and structuring a Data Lake in Azure. This Data Lake was organized into three layers:
- Inbound Layer;
- Bronze Layer;
- Silver Layer.
The Inbound Layer is the entry point, where I landed the raw real estate data. From there, I used Databricks to apply transformations and promote the data through the Bronze and Silver layers of the Data Lake, as sketched below.
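To make the Databricks step concrete, here is a minimal Scala/Spark sketch of how the data could move from the Inbound layer into Bronze and then Silver as Delta tables. The storage account, container paths, file format, and the nested `anuncio` column are assumptions for illustration, not the course's actual code:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.current_timestamp

object RealEstatePipeline {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("real-estate-pipeline").getOrCreate()

    // Hypothetical ADLS Gen2 container paths; replace with the actual storage account.
    val inboundPath = "abfss://inbound@<storage-account>.dfs.core.windows.net/real-estate/"
    val bronzePath  = "abfss://bronze@<storage-account>.dfs.core.windows.net/real-estate/"
    val silverPath  = "abfss://silver@<storage-account>.dfs.core.windows.net/real-estate/"

    // Bronze: ingest the raw files as-is (assumed JSON here), adding only ingestion metadata.
    spark.read.json(inboundPath)
      .withColumn("ingestion_date", current_timestamp())
      .write.format("delta").mode("overwrite").save(bronzePath)

    // Silver: read the Bronze data and apply basic cleaning. The nested "anuncio"
    // struct used here is illustrative, not the course's actual schema.
    spark.read.format("delta").load(bronzePath)
      .select("anuncio.*", "ingestion_date")
      .dropDuplicates()
      .write.format("delta").mode("overwrite").save(silverPath)
  }
}
```

In the actual project this logic would live in Databricks notebooks, where the `SparkSession` is already provided by the cluster, so the setup boilerplate above would not be needed.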
Once the data flow was structured, I used Azure Data Factory to orchestrate and automate the pipeline, triggering its execution on a scheduled time interval.
This project was developed for a course I taught at Alura. You can access it by clicking on the link: Course's link
Technologies used in this project:
- Azure Data Lake Storage Gen 2;
- Azure Databricks;
- Azure Data Factory;
- Scala.
Email: millenagena@gmail.com