Author: Mahmoud Parsian (mahmoud.parsian@yahoo.com)
- This book is about PySpark (Python API for Spark)
- Introductory book on how to solve data problems using PySpark
- Learn how to use mappers, filters, and reducers
- Learn how to partition data for fast queries
- Learn how to use the mapPartitions() transformation
- Learn how to use the reduceByKey(), groupByKey(), and combineByKey() transformations
- Learn how to use Spark's transformations and actions for solving real problems
- Learn how to use RDDs and DataFrames
- Learn how to read/write data from many data sources
- Learn how to use logistic regression
- Learn how to use Spark's reduction transformations
- Learn how to use GraphFrames
- Learn how to use motifs in GraphFrames
- Learn how to use monoids in MapReduce algorithms
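
As a rough illustration of the monoid topic in the list above, here is a minimal sketch in plain Python (not PySpark, and not taken from the book). It computes a per-key mean by combining (sum, count) pairs, which is associative and has an identity element, so partial results from separate partitions can be merged in any grouping. The sample data and the names `to_pair`, `combine`, `reduce_partition`, and `merge` are hypothetical, chosen only for this example.

```python
from functools import reduce

# Hypothetical sample data: (key, value) records split across two "partitions".
partitions = [
    [("a", 2), ("b", 4), ("a", 4)],
    [("a", 6), ("b", 8)],
]

# Identity element of the (sum, count) monoid.
IDENTITY = (0, 0)


def to_pair(value):
    """Map a single number to a (sum, count) pair."""
    return (value, 1)


def combine(x, y):
    """Associative combine: add sums and counts component-wise."""
    return (x[0] + y[0], x[1] + y[1])


def reduce_partition(records):
    """Combine all values per key within one partition."""
    acc = {}
    for key, value in records:
        acc[key] = combine(acc.get(key, IDENTITY), to_pair(value))
    return acc


def merge(left, right):
    """Merge two per-key partial results using the same combine function."""
    merged = dict(left)
    for key, pair in right.items():
        merged[key] = combine(merged.get(key, IDENTITY), pair)
    return merged


# Reduce each partition locally, then merge the partial results.
partials = [reduce_partition(p) for p in partitions]
totals = reduce(merge, partials, {})

# Only at the very end is the (sum, count) pair turned into a mean.
means = {key: s / c for key, (s, c) in totals.items()}
print(means)  # {'a': 4.0, 'b': 6.0}
```

Averaging the raw means of each partition would not give the right answer in general, because a mean of means is not associative. Carrying (sum, count) pairs avoids that, and a combine function of this shape is the kind one would pass to a reduction such as reduceByKey().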
chap01: Introduction to PySpark
chap02: Hello World
chap03: Data Abstractions
chap04: Getting Started -- Sample Chapter
chap05: Transformations in Spark
chap06: Reductions in Spark
chap07: DataFrames and SQL
chap08: Spark DataSources
chap09: Logistic Regression
chap10: Movie Recommendations
chap11: Graph Algorithms
chap12: Design Patterns and Monoids
Appendix A: How To Install Spark
Appendix B: How to Use Lambda Expressions
Appendix C: Questions And Answers (50+ QA)
chap13: FP-Growth
chap14: LDA
chap15: Linear Regression