Created by Juliet Houghland + Sandy Ryza (juliet@cloudera.com)
The source notebook demonstrates building a churn prediction model using Spark
and Spark MLlib's Pipeline API for cross-validation and model tuning. The Pipeline API is available in PySpark in version 1.6 or higher.
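For readers new to the tuning step, the sketch below shows in plain Python what grid search with k-fold cross-validation does conceptually. It is not the notebook's code and does not use Spark. The toy data, the threshold "model", and every function name here are hypothetical. In PySpark, the ParamGridBuilder and CrossValidator classes in pyspark.ml.tuning play roles loosely comparable to param_grid and cross_validate.

```python
from itertools import product
from statistics import mean


def param_grid(options):
    """Expand {'name': [values]} into one dict per parameter combination."""
    names = sorted(options)
    return [dict(zip(names, combo))
            for combo in product(*(options[n] for n in names))]


def k_folds(rows, k):
    """Yield (train, validation) splits; fold i holds out rows i, i+k, i+2k, ..."""
    for i in range(k):
        validation = rows[i::k]
        train = [r for j, r in enumerate(rows) if j % k != i]
        yield train, validation


def cross_validate(rows, options, fit, score, k=3):
    """Return (best_params, best_average_score); a higher score is better."""
    best = None
    for params in param_grid(options):
        avg = mean(score(fit(train, params), val)
                   for train, val in k_folds(rows, k))
        if best is None or avg > best[1]:
            best = (params, avg)
    return best


# Hypothetical toy data: (day_minutes, churned) pairs, invented for illustration.
rows = [(100, 0), (120, 0), (150, 0), (180, 0), (200, 0),
        (260, 1), (280, 1), (300, 1), (320, 1)]


def fit(train, params):
    """Toy 'model': a usage threshold set to scale * mean training minutes."""
    return params["scale"] * mean(minutes for minutes, _ in train)


def score(threshold, validation):
    """Accuracy of predicting churn whenever minutes exceed the threshold."""
    hits = sum(int(minutes > threshold) == churned
               for minutes, churned in validation)
    return hits / len(validation)


best_params, best_score = cross_validate(
    rows, {"scale": [0.5, 1.0, 1.5]}, fit, score, k=3)
print(best_params, best_score)
```

Each parameter combination is scored on every held-out fold and the averages are compared, so the chosen setting is never judged on the rows it was fitted on.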
Status: Demo Ready
Use Case: Telco Churn Prediction
Steps:
- Open a terminal and run setup.sh
- Create a Python Session and run setup.py
- In your Python session, run ds-for-telco.py
- When finished, run cleanup.sh in the terminal
Recommended Session Sizes: 2 CPU, 4 GB RAM
Estimated Runtime:
ds-for-telco.py --> approx 1 min
Recommended Jobs/Pipeline:
None
Demo Script:
TBD
Related Content:
http://blog.cloudera.com/blog/2016/02/how-to-predict-telco-churn-with-apache-spark-mllib/