Data engineers are responsible for making data accessible to all the people who use it across an organization. That could mean creating a data warehouse for the analytics team, building a data pipeline for a front-end application, or summarizing massive datasets to be more user-friendly.
During this program, we will complete four courses and five projects. Throughout the projects, we will play the part of a data engineer at a music streaming company. We will work with the same type of data in each project, but with increasing data volume, velocity, and complexity. Here’s a course-by-course breakdown.
In this course, we will learn to create relational and NoSQL data models to fit the diverse needs of data consumers. In the project, we will build SQL (Postgres) and NoSQL (Apache Cassandra) data models using user activity data for a music streaming app.
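To give a rough feel for the two modeling styles, here is a minimal sketch that creates one relational table in Postgres and one query-oriented table in Cassandra. The connection settings, keyspace, and table/column names are placeholders for illustration, not the project’s actual schema.

```python
# Illustrative only: local Postgres and Cassandra instances are assumed,
# and the database, keyspace, and column names are placeholders.
import psycopg2
from cassandra.cluster import Cluster

# Relational (Postgres): a dimensional-style fact table for song plays.
pg_conn = psycopg2.connect("host=127.0.0.1 dbname=sparkifydb user=student password=student")
cur = pg_conn.cursor()
cur.execute("""
    CREATE TABLE IF NOT EXISTS songplays (
        songplay_id SERIAL PRIMARY KEY,
        start_time  TIMESTAMP NOT NULL,
        user_id     INT NOT NULL,
        song_id     VARCHAR,
        artist_id   VARCHAR,
        session_id  INT,
        location    VARCHAR,
        user_agent  VARCHAR
    );
""")
pg_conn.commit()

# NoSQL (Cassandra): the table is designed around the query it must answer,
# e.g. "which songs were played in a given session, in order?"
cluster = Cluster(["127.0.0.1"])
session = cluster.connect()
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS sparkify
    WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1};
""")
session.set_keyspace("sparkify")
session.execute("""
    CREATE TABLE IF NOT EXISTS songs_by_session (
        session_id INT,
        item_in_session INT,
        artist TEXT,
        song_title TEXT,
        length FLOAT,
        PRIMARY KEY (session_id, item_in_session)
    );
""")
```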
In this course, we will learn to create cloud-based data warehouses. In the project, we will build an ELT pipeline that extracts data from Amazon S3, stages it in Amazon Redshift, and transforms it into a set of dimensional tables.
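The ELT pattern here boils down to two SQL steps run against Redshift: a COPY that loads raw files from S3 into a staging table, and an INSERT … SELECT that transforms staged rows into a dimensional table. The sketch below assumes placeholder values for the cluster endpoint, bucket, IAM role, and table names.

```python
# Illustrative ELT sketch: all connection details, the S3 path, the IAM role,
# and the table definitions are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.xxxxxxxx.us-west-2.redshift.amazonaws.com",
    dbname="dev", user="awsuser", password="********", port=5439,
)
cur = conn.cursor()

# Extract + Load: stage raw JSON event logs directly from S3 into Redshift.
cur.execute("""
    COPY staging_events
    FROM 's3://my-bucket/log_data'
    IAM_ROLE 'arn:aws:iam::123456789012:role/myRedshiftRole'
    FORMAT AS JSON 'auto'
    REGION 'us-west-2';
""")

# Transform: populate a dimension table from the staged data.
cur.execute("""
    INSERT INTO users (user_id, first_name, last_name, gender, level)
    SELECT DISTINCT user_id, first_name, last_name, gender, level
    FROM staging_events
    WHERE user_id IS NOT NULL;
""")
conn.commit()
```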
In this course, we will learn more about the big data ecosystem, how to process massive datasets with Apache Spark, and how to store big data in a data lake. In the project, we will build an ETL pipeline for a data lake using Apache Spark and S3.
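A minimal PySpark sketch of that kind of data-lake ETL: read raw JSON from S3, derive a dimension table, and write it back to S3 as partitioned Parquet. The bucket paths and column names are assumptions, and the cluster is expected to have S3 (hadoop-aws) access configured.

```python
# Illustrative PySpark ETL: input/output buckets and columns are placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("sparkify-data-lake-etl")
         .getOrCreate())

# Extract: raw song metadata stored as JSON in S3.
song_data = spark.read.json("s3a://my-input-bucket/song_data/*/*/*/*.json")

# Transform: keep one row per song with the columns the analytics team needs.
songs_table = (song_data
               .select("song_id", "title", "artist_id", "year", "duration")
               .dropDuplicates(["song_id"]))

# Load: write partitioned Parquet back to the data lake.
(songs_table.write
 .mode("overwrite")
 .partitionBy("year", "artist_id")
 .parquet("s3a://my-output-bucket/songs/"))
```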
In this course, we will learn to schedule, automate, and monitor data pipelines using Apache Airflow. In the project, we will continue our work on the music streaming company’s data infrastructure by creating and automating a set of data pipelines.
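In Airflow, a pipeline is expressed as a DAG of tasks plus a schedule. The sketch below shows the shape of such a DAG with two placeholder tasks (staging followed by a data-quality check); the task bodies, names, and hourly schedule are illustrative, and the imports follow the Airflow 2.x style.

```python
# Illustrative DAG: task logic is stubbed out; IDs and schedule are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def stage_events(**context):
    # e.g. copy newly arrived event files from S3 into staging tables
    pass


def run_quality_checks(**context):
    # e.g. fail the run if a target table ended up empty
    pass


with DAG(
    dag_id="sparkify_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    stage = PythonOperator(task_id="stage_events", python_callable=stage_events)
    checks = PythonOperator(task_id="quality_checks", python_callable=run_quality_checks)

    # Quality checks only run after staging succeeds.
    stage >> checks
```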
In the Capstone project, we combine Twitter data, World Happiness Index data, and Earth surface temperature data to explore whether there is any correlation among them. The Twitter data is dynamic, while the other two datasets are static. The general idea of the project is to extract Twitter data, analyze its sentiment, and combine the resulting scores with the other datasets to look for insights.
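As a rough sketch of that flow, the snippet below scores tweet sentiment, aggregates it per country, and joins it against the static happiness-index data. The file paths, column names, and the use of TextBlob as the sentiment scorer are assumptions made for illustration, not the project’s prescribed tooling.

```python
# Illustrative capstone flow: file names, columns, and the TextBlob scorer
# are assumptions, not the project specification.
import pandas as pd
from textblob import TextBlob

tweets = pd.read_json("tweets.json", lines=True)        # dynamic Twitter extract
happiness = pd.read_csv("world_happiness_index.csv")    # static dataset

# Score each tweet's sentiment polarity in [-1, 1].
tweets["sentiment"] = tweets["text"].apply(lambda t: TextBlob(t).sentiment.polarity)

# Aggregate sentiment per country so it can be joined with the static data.
sentiment_by_country = (tweets
                        .groupby("country", as_index=False)["sentiment"]
                        .mean())

combined = sentiment_by_country.merge(happiness, on="country", how="inner")

# Simple correlation between average tweet sentiment and happiness score.
print(combined["sentiment"].corr(combined["happiness_score"]))
```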