
prasadpatil99/PySpark


PySpark

PySpark is the Python API for Apache Spark, a distributed framework that analyzes big data in parallel across a cluster. Because it spreads work over many machines, PySpark can handle datasets too large for pandas, which runs on a single machine, and on such data it is typically faster. It also offers features such as querying data with SQL and HiveQL, parallel processing on clusters with RDDs (Resilient Distributed Datasets), and more.

Dependencies

$ pip install pyspark

Author

  • Prasad Patil