xgboost in python and pyspark (using py4j to call jvm-packages)
xgboost4j version: 0.82
TODO: xgboost4j is not the latest version, since 0.90 only supports python3 and spark 2.4
- download the xgboost4j-0.82 jar files from xgboost-jars
- copy them to pyspark_xgb/jars
- rename them to xgboost4j-0.82.jar and xgboost4j-spark-0.82.jar respectively
- set your SPARK_HOME and JAVA_HOME in pyspark/start.sh
- [opt] change spark-submit parameters if needed
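
The setup steps above can be sketched as a small pre-flight check that also assembles the spark-submit command line. The contents of start.sh are not shown here, so this is only an assumption about what such a launcher needs: `build_spark_submit` and `JAR_NAMES` are hypothetical names, and the only spark-submit option relied on is `--jars`, which takes a comma-separated list of jar paths.

```python
import os

# The two file names the setup steps ask you to rename the jars to.
JAR_NAMES = ["xgboost4j-0.82.jar", "xgboost4j-spark-0.82.jar"]


def build_spark_submit(script, jars_dir="pyspark_xgb/jars", env=None, extra_args=()):
    """Return the argv list for a spark-submit call that ships both jars.

    Hypothetical helper, not part of this repository.
    """
    env = os.environ if env is None else env

    # Both variables are named in the setup steps; fail early if either is unset.
    missing_env = [k for k in ("SPARK_HOME", "JAVA_HOME") if not env.get(k)]
    if missing_env:
        raise ValueError("unset environment variables: " + ", ".join(missing_env))

    jars = [os.path.join(jars_dir, name) for name in JAR_NAMES]
    missing_jars = [p for p in jars if not os.path.isfile(p)]
    if missing_jars:
        raise ValueError("missing jar files: " + ", ".join(missing_jars))

    spark_submit = os.path.join(env["SPARK_HOME"], "bin", "spark-submit")
    # extra_args is where optional tuning flags (for example --master) would go.
    return [spark_submit, "--jars", ",".join(jars)] + list(extra_args) + [script]
```

Calling `build_spark_submit("train_binary.py", extra_args=["--master", "local[2]"])` returns a list that can be passed to `subprocess.call`. Raising on a missing jar surfaces a skipped download or rename step before Spark starts.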
python version 2.7
- binary logistic
  python python_xgb/train_binary.py
- multi classification
  python python_xgb/train_multi.py
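
The two training scripts are not reproduced here. As a rough illustration of how a binary and a multi-class run typically differ in the xgboost Python package, the sketch below builds the parameter dict for each case. `make_params` and `train` are hypothetical helpers, and the default values for `max_depth` and `eta` are assumptions rather than the settings used in train_binary.py or train_multi.py.

```python
def make_params(num_class=2, max_depth=3, eta=0.1):
    """Build an XGBoost parameter dict for binary or multi-class training.

    Hypothetical helper; the default values are illustrative only.
    """
    if num_class < 2:
        raise ValueError("num_class must be at least 2")

    params = {"max_depth": max_depth, "eta": eta}
    if num_class == 2:
        # Binary case: the model outputs one probability per row.
        params["objective"] = "binary:logistic"
        params["eval_metric"] = "logloss"
    else:
        # Multi-class case: one probability per class, so num_class is required.
        params["objective"] = "multi:softprob"
        params["num_class"] = num_class
        params["eval_metric"] = "mlogloss"
    return params


def train(features, labels, num_class=2, num_boost_round=10):
    """Train a booster on array-like data; needs the xgboost package installed."""
    import xgboost as xgb  # imported lazily so make_params works without it

    dtrain = xgb.DMatrix(features, label=labels)
    return xgb.train(make_params(num_class), dtrain, num_boost_round=num_boost_round)
```

The code avoids Python 3 only syntax so that it also runs under the Python 2.7 interpreter this section targets.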
spark version 2.3.*
- binary logistic
  pyspark_xgb/start.sh train_binary.py
- multi classification
  pyspark_xgb/start.sh train_multi.py
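
Since the PySpark side reaches the jvm-packages through py4j, a minimal sketch of that lookup may help. It is not taken from the training scripts: `resolve_jvm_class` and `new_xgb_classifier` are hypothetical helpers, `_jvm` is a private PySpark attribute, and the class name `ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier` with a no-argument constructor is an assumption about the 0.82 xgboost4j-spark jar.

```python
def resolve_jvm_class(jvm, class_name):
    """Walk a dotted class name on a py4j JVM view, one attribute at a time."""
    target = jvm
    for part in class_name.split("."):
        target = getattr(target, part)
    return target


def new_xgb_classifier(spark):
    """Instantiate the Scala estimator from a live SparkSession.

    Assumes both xgboost4j jars were passed to spark-submit via --jars.
    """
    # _jvm is a private attribute that exposes the py4j view of the driver JVM.
    jvm = spark.sparkContext._jvm
    cls = resolve_jvm_class(jvm, "ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier")
    # Assumption: the Scala class offers a no-argument constructor.
    return cls()
```

py4j does not raise while resolving an unknown package name; it returns a package placeholder instead. A missing jar therefore usually shows up only at the `cls()` call, as an error saying a `JavaPackage` object is not callable.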
run the program within docker
it takes some time to build the images ...
cd docker
docker build -t xgb:latest . --no-cache
docker run -i -t xgb:latest /bin/bash
cd xgboost-python-pyspark