Data-Engineer-Nanodegree-Projects-Udacity

Projects completed for the Udacity Data Engineer Nanodegree.

Course 1: Data Modeling

Introduction to Data Modeling

Understand the purpose of data modeling
Identify the strengths and weaknesses of different types of databases and data storage techniques
Create a table in Postgres and Apache Cassandra

Relational Data Models

Understand when to use a relational database
Understand the difference between OLAP and OLTP databases
Create normalized data tables
Implement denormalized schemas (e.g. star, snowflake)
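
As a rough illustration of the star schema idea above, here is a minimal sketch using Python's built-in sqlite3 module instead of Postgres, so it runs without a database server. The table and column names (dim_user, dim_song, fact_play) are hypothetical and are not taken from the course projects.

```python
import sqlite3


def build_star_schema(conn):
    """Create one fact table surrounded by two dimension tables."""
    conn.executescript(
        """
        CREATE TABLE dim_user (user_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE dim_song (song_id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE fact_play (
            play_id    INTEGER PRIMARY KEY,
            user_id    INTEGER REFERENCES dim_user(user_id),
            song_id    INTEGER REFERENCES dim_song(song_id),
            duration_s INTEGER
        );
        """
    )


def load_sample(conn):
    """Insert a few made-up rows so the query below has data to aggregate."""
    conn.executemany("INSERT INTO dim_user VALUES (?, ?)", [(1, "ana"), (2, "bo")])
    conn.executemany("INSERT INTO dim_song VALUES (?, ?)", [(10, "Intro"), (11, "Outro")])
    conn.executemany(
        "INSERT INTO fact_play VALUES (?, ?, ?, ?)",
        [(100, 1, 10, 30), (101, 1, 11, 45), (102, 2, 10, 30)],
    )


def seconds_per_user(conn):
    """Aggregate the fact table, joining one dimension for a readable label."""
    return conn.execute(
        """
        SELECT u.name, SUM(f.duration_s)
        FROM fact_play AS f
        JOIN dim_user AS u ON u.user_id = f.user_id
        GROUP BY u.name
        ORDER BY u.name
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
load_sample(conn)
print(seconds_per_user(conn))
```

The fact table holds only keys and measures, so an analytic query needs a single join per dimension it wants to report on.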

NoSQL Data Models

Understand when to use NoSQL databases and how they differ from relational databases
Select the appropriate primary key and clustering columns for a given use case
Create a NoSQL database in Apache Cassandra
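
The role of partition keys and clustering columns can be sketched with a toy in-memory model. This is plain Python, not the Cassandra driver or CQL, and the class and method names are hypothetical; it only assumes the usual behavior that the partition key groups rows together and the clustering column orders rows within a partition.

```python
from bisect import insort
from collections import defaultdict


class PartitionedTable:
    """Toy model: rows are grouped by partition key and kept sorted by clustering key."""

    def __init__(self):
        self._partitions = defaultdict(list)

    def insert(self, partition_key, clustering_key, value):
        # insort keeps each partition ordered by (clustering_key, value).
        insort(self._partitions[partition_key], (clustering_key, value))

    def read_partition(self, partition_key):
        # A query that names the partition key touches only that partition.
        return [value for _, value in self._partitions.get(partition_key, [])]


table = PartitionedTable()
table.insert("session-1", 2, "song B")
table.insert("session-1", 1, "song A")
table.insert("session-2", 1, "song C")
print(table.read_partition("session-1"))
```

Choosing the key therefore starts from the query: whatever the query filters on becomes the partition key, and whatever it sorts on becomes a clustering column.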

Project 1: Data Modeling with Postgres and Apache Cassandra

Course 2: Cloud Data Warehouses

Introduction to Data Warehouses

Understand Data Warehousing architecture
Run an ETL process to denormalize a database (3NF to Star)
Create an OLAP cube from facts and dimensions
Compare columnar vs. row-oriented approaches
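
To make the columnar vs. row-oriented comparison concrete, the sketch below stores the same made-up records both ways in plain Python. The field names are hypothetical; the point is only that an aggregate over one column reads a single list in the columnar layout, while the row layout visits every record.

```python
rows = [
    {"order_id": 1, "region": "EU", "amount": 20},
    {"order_id": 2, "region": "US", "amount": 35},
    {"order_id": 3, "region": "EU", "amount": 15},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


def total_row_oriented(records):
    # Every full row is visited even though only one field is needed.
    return sum(record["amount"] for record in records)


def total_columnar(columns):
    # Only the 'amount' column is read.
    return sum(columns["amount"])


columns = to_columns(rows)
print(total_row_oriented(rows), total_columnar(columns))
```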

Introduction to the Cloud with AWS

Understand cloud computing
Create an AWS account and understand its services
Set up Amazon S3, IAM, VPC, EC2, RDS PostgreSQL

Implementing Data Warehouses on AWS

Identify components of the Redshift architecture
Run an ETL process to load data from S3 into Redshift
Set up AWS infrastructure using Infrastructure as Code (IaC)
Design an optimized table by selecting the appropriate distribution style and sorting key

Project 2: Data Infrastructure on the Cloud

Course 3: Data Lakes with Spark

The Power of Spark

Understand the big data ecosystem
Understand when to use Spark and when not to use it

Data Wrangling with Spark

Manipulate data with Spark SQL and Spark DataFrames
Use Spark for ETL purposes

Debugging and Optimization

Troubleshoot common errors and optimize code using the Spark Web UI

Introduction to Data Lakes

Understand the purpose and evolution of data lakes
Implement data lakes on Amazon S3, EMR, Athena, and AWS Glue
Use Spark to run ELT processes and analytics on data of diverse sources, structures, and vintages
Understand the components and issues of data lakes

Project 3: Big Data with Spark

Course 4: Automate Data Pipelines

Data Pipelines

Create data pipelines with Apache Airflow
Set up task dependencies
Create data connections using hooks
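
Task dependencies in a pipeline form a directed acyclic graph (DAG). The sketch below does not use Airflow; it relies on graphlib from the Python standard library (3.9+) to show how a valid execution order follows from the declared dependencies. The task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it starts.
dependencies = {
    "stage_events": {"begin"},
    "stage_songs": {"begin"},
    "load_fact": {"stage_events", "stage_songs"},
    "quality_check": {"load_fact"},
}

# static_order() yields the tasks so that every task follows its prerequisites.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

The two staging tasks have no dependency on each other, so their relative order is not fixed; a scheduler is free to run them in parallel.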

Data Quality

Track data lineage
Set up data pipeline schedules
Partition data to optimize pipelines
Write tests to ensure data quality
Backfill data
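
A data quality test can be as simple as a query whose result must meet an expectation. The following is a minimal sketch using sqlite3 as a stand-in for the warehouse; the function names and the users table are hypothetical, and the table and column names are assumed to come from trusted pipeline configuration because SQL identifiers cannot be bound as parameters.

```python
import sqlite3


def check_not_empty(conn, table):
    """Fail if the table has no rows; return the row count otherwise."""
    count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    if count == 0:
        raise ValueError(f"Data quality check failed: {table} is empty")
    return count


def check_no_nulls(conn, table, column):
    """Fail if any row has NULL in the given column."""
    nulls = conn.execute(
        f'SELECT COUNT(*) FROM "{table}" WHERE "{column}" IS NULL'
    ).fetchone()[0]
    if nulls > 0:
        raise ValueError(
            f"Data quality check failed: {table}.{column} has {nulls} NULL value(s)"
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ana"), (2, None)])

print(check_not_empty(conn, "users"))
check_no_nulls(conn, "users", "user_id")
```

Raising an exception lets the surrounding pipeline mark the task as failed instead of silently loading bad data.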

Production Data Pipelines

Build reusable and maintainable pipelines
Build your own Apache Airflow plugins
Implement subDAGs
Set up task boundaries
Monitor data pipelines

Project 4: Data Pipelines with Airflow