E.ON Course - Big Data with PySpark

Description

This project served as the final assignment for the Hands-On Advanced Analytics with Apache Spark course. The training spanned 5 weeks and focused on mastering big data technologies. The project was completed at Fii practic.

Languages and Utilities Used

Python
PySpark
Jupyter Notebook

Implementation Details

The dataset contained 3,549,246 entries.
The primary objective of the project was to clean the dataset, addressing both the inconsistencies our trainers introduced on purpose and more realistic ones of the kind found in real-world data.
Upon completion of the cleaning process, we performed data aggregation.

Project Task

For detailed tasks, please refer to the Tasks document.

