This repo contains PySpark implementations of real-world use cases: batch data processing, streaming data processing from sources such as Kafka and sockets, Spark optimizations, solutions to business-specific big data processing scenarios, and machine learning use cases.
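As a rough illustration of the streaming sources mentioned above, here is a minimal Structured Streaming sketch. It is not taken from the repo: the function names, the `startingOffsets` choice, and all host, port, and topic values are assumptions for illustration. The Kafka source also assumes the `spark-sql-kafka` connector package is available to the Spark session.

```python
def kafka_source_options(bootstrap_servers, topic):
    """Build the option map for Spark's Kafka source (hypothetical helper)."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        # Assumption: only read records that arrive after the query starts.
        "startingOffsets": "latest",
    }


def read_kafka_values(spark, bootstrap_servers, topic):
    """Return a streaming DataFrame with each Kafka record value as a string."""
    stream = (
        spark.readStream.format("kafka")
        .options(**kafka_source_options(bootstrap_servers, topic))
        .load()
    )
    # Kafka delivers keys and values as binary, so cast before further use.
    return stream.selectExpr("CAST(value AS STRING) AS value")


def read_socket_lines(spark, host, port):
    """Return a streaming DataFrame with one row per text line from a socket."""
    return (
        spark.readStream.format("socket")
        .option("host", host)
        .option("port", port)
        .load()
    )
```

Either function takes an existing `SparkSession` and returns a streaming DataFrame; nothing runs until a sink is attached with `writeStream` and the query is started.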
Technically a RESTful API, but built with the PySpark module instead of the restful module. It serves as a template for using PySpark in website development.
Design and implementation of several Spark applications, each running a different job to analyze the Covid-19 dataset created by Our World in Data.
Python scripts that use the PySpark API to convert a large dataset (about 3.5 GB) of flight data into various storage formats, such as CSV, JSON, and Sequence files.
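A format conversion of this kind could look roughly like the sketch below. It is an assumption-laden illustration, not the repo's actual scripts: the function names, the output directory layout, and the choice of JSON strings as SequenceFile values are all hypothetical, and the source is assumed to be a CSV file with a header row.

```python
import json


def row_to_pair(row_dict, key_field):
    """Turn one record into a (key, JSON string) pair for a SequenceFile."""
    return (str(row_dict[key_field]), json.dumps(row_dict, sort_keys=True))


def convert(spark, src_csv, out_dir, key_field):
    """Write the same dataset as CSV, JSON, and a SequenceFile (hypothetical layout)."""
    # header=True treats the first line of the source as column names.
    df = spark.read.csv(src_csv, header=True)

    df.write.mode("overwrite").csv(f"{out_dir}/csv", header=True)
    df.write.mode("overwrite").json(f"{out_dir}/json")

    # SequenceFiles hold key-value pairs, so go through a pair RDD.
    # saveAsSequenceFile has no overwrite mode and fails if the path exists.
    pairs = df.rdd.map(lambda row: row_to_pair(row.asDict(), key_field))
    pairs.saveAsSequenceFile(f"{out_dir}/sequence")
```

With an active `SparkSession` named `spark`, a call such as `convert(spark, "flights.csv", "out", "flight_id")` would produce the three outputs; the file, directory, and column names here are placeholders.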