This repository contains my submission for the Final Project: Data Analysis using Spark. The original files were provided by the IBM Skills Network as part of the Introduction to Big Data with Spark and Hadoop course on Coursera. I have made modifications to fulfill the project requirements.
You are welcome to use this repository as a reference or starting point for your own project.
If you choose to fork this repository, please ensure that you comply with the terms of the Apache License and give proper credit to the original authors.
As a data engineer, I’ve been assigned by our HR department to design a robust data pipeline capable of ingesting employee data in CSV format. My responsibilities include analyzing the data, implementing necessary transformations, and enabling the extraction of valuable insights from the processed data.
- Create a DataFrame from a CSV file
- Define a schema for the data
- Perform transformations and actions using Spark SQL (a minimal sketch of these steps follows below)
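A minimal PySpark sketch of these steps is shown below. It is illustrative only: the column names and types in the schema (emp_id, emp_name, salary, age, emp_dept) are assumptions and should be adjusted to match the columns in the actual employees.csv file.

```python
# Minimal sketch of the pipeline steps above. The schema is an assumption
# for illustration; adjust the column names/types to match employees.csv.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

spark = SparkSession.builder.appName("DataAnalysisUsingSpark").getOrCreate()

# 1. Define a schema for the data (hypothetical columns)
schema = StructType([
    StructField("emp_id", IntegerType(), nullable=True),
    StructField("emp_name", StringType(), nullable=True),
    StructField("salary", IntegerType(), nullable=True),
    StructField("age", IntegerType(), nullable=True),
    StructField("emp_dept", StringType(), nullable=True),
])

# 2. Create a DataFrame from the CSV file using that schema
employees_df = spark.read.csv("employees.csv", header=True, schema=schema)

# 3. Register a temporary view and run Spark SQL transformations/actions
employees_df.createOrReplaceTempView("employees")
spark.sql("""
    SELECT emp_dept, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY emp_dept
    ORDER BY avg_salary DESC
""").show()

spark.stop()
```

Defining the schema explicitly avoids a second pass over the data for type inference and surfaces malformed rows early, which is why it is listed as its own step.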
Install the required libraries using the provided requirements.txt file. The command is:
python3 -m pip install -r requirements.txt
Download the required employees.csv file using the terminal command:
wget https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-BD0225EN-SkillsNetwork/data/employees.csv
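As a quick sanity check after the download (a hedged sketch; it assumes employees.csv landed in the current working directory), the file can be previewed with schema inference before building the full pipeline:

```python
# Preview the downloaded CSV: read it with schema inference and show a few
# rows. Assumes employees.csv is in the current working directory.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("EmployeesPreview").getOrCreate()
preview_df = spark.read.csv("employees.csv", header=True, inferSchema=True)
preview_df.printSchema()
preview_df.show(5, truncate=False)
spark.stop()
```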
- IBM Skills Network © IBM Corporation 2023. All rights reserved.