This repo contains the code and pipelines explained in the book *Data Engineering with Python*.
| Software required | OS used |
|---|---|
| Python 3.x, Spark 3.x, NiFi 1.x, MySQL 8.0.x, Elasticsearch 7.x, Kibana 7.x, Apache Kafka 2.x | Linux (any distro) |
- airflow-dag: the Airflow DAG modules used in this repo
- great_expectations: the components of a local Great Expectations deployment
- kafka-producer-consumer: Python modules that produce to and consume from Kafka topics
- load-database: modules that load and query data in MySQL
- load-nosql: modules that load and query data in Elasticsearch
- nifi-datalake: NiFi pipelines that simulate reading data from a data lake
- nifi-files: files derived from the NiFi template pipelines
- nifi-scanfiles: dictionary files read by the ScanContent processor (e.g. VIP)
- nifi-scripts: shell scripts used with ExecuteStreamCommand in NiFi
- nifi-templates: assorted Apache NiFi pipeline templates
- nifi-versioning: NiFi pipelines under version control (NiFi Registry)
- pyspark: Jupyter notebooks that process data with PySpark
- scooter-data: the scooter dataset and pandas data-wrangling modules
- sql-user: the SQL statement that creates a MySQL user and grants its privileges
- writing-reading-data: modules that create and read fake data
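As an illustration of the kind of module found in writing-reading-data, here is a minimal sketch (the field names and functions below are assumptions for illustration, not the repo's actual code) that generates fake records and round-trips them through a CSV file:

```python
import csv
import random

# Illustrative pool of fake names; the book's code uses a data-faking library.
NAMES = ["Alice", "Bob", "Carol", "Dave"]

def make_fake_records(n, seed=42):
    """Generate n fake (name, age, city) records with a fixed seed."""
    rng = random.Random(seed)
    return [
        {"name": rng.choice(NAMES), "age": rng.randint(18, 80), "city": "Austin"}
        for _ in range(n)
    ]

def write_records(path, records):
    """Write the records to a CSV file with a header row."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "age", "city"])
        writer.writeheader()
        writer.writerows(records)

def read_records(path):
    """Read the records back as a list of dicts (all values are strings)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))
```

Seeding the generator keeps the fake data reproducible between runs, which makes the downstream pipelines easier to test.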
To set up the working environment, run:

```shell
$ source start-working-environment
```
To stop/kill the working environment, run:

```shell
$ ./stop-working-environment
```
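The start/stop scripts themselves are not reproduced here, but a script like start-working-environment typically brings up each backing service in turn. A hypothetical dry-run sketch (the service names are assumptions; a real script would invoke `systemctl start` or each vendor's own start script instead of echoing):

```shell
#!/bin/bash
# Hypothetical sketch of start-working-environment: loop over the
# services the pipelines depend on and start each one. This dry-run
# version only echoes what it would do.
SERVICES="mysql elasticsearch kibana nifi kafka"

start_all() {
  for svc in $SERVICES; do
    # Real script: sudo systemctl start "$svc" (or the vendor start script)
    echo "starting $svc"
  done
}

start_all
```

A matching stop script would iterate over the same list in reverse with the corresponding stop commands.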
To create the MySQL user, run the following as the root user:

```shell
$ mysql -u root -p -e "SOURCE sql-user/create-user.sql"
```
This will also grant access to the databases used in this repo.
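The contents of sql-user/create-user.sql might look roughly like the following (the user name, password, and database name below are placeholders, not the repo's actual values):

```sql
-- Hypothetical sketch: create the user, grant it access to the
-- databases used by the pipelines, and reload the grant tables.
CREATE USER 'dataeng'@'localhost' IDENTIFIED BY 'changeme';
GRANT ALL PRIVILEGES ON mydb.* TO 'dataeng'@'localhost';
FLUSH PRIVILEGES;
```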