Create a local environment that mirrors production as closely as possible, so you can stay autonomous while working on Spark projects.
This project contains all the configuration files needed to create:
- A Dockerized environment
- A local but genuinely distributed environment:
- 1 Namenode
- 1 Datanode (increase as you wish; see the scaling example after this list)
- YARN resource manager
- 3 YARN node managers
- YARN history server
- Spark history server
- Spark shell
- Alignment with the exact Hadoop component versions used in production
- Deployment to the Dockerized cluster via the sbt command line
- Loading data into HDFS via Docker volumes mounted from within the project folder
- Access to the Spark history web UI for inspection :)
- Access to YARN logs for debugging :)
- Access to the Spark shell for fiddling :)
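If one datanode is not enough, the datanode service can usually be scaled directly through docker-compose. This is only a sketch: it assumes the compose file does not pin a container_name or host ports for the datanode service.
# scale the datanode service to 3 instances
docker-compose up -d --scale datanode=3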
echo "127.0.0.1 namenode datanode resourcemanager nodemanager nodemanager-1 nodemanager-2 nodemanager-3 historyserver spark-master spark-worker spark-history" >> /etc/hosts
# Start the cluster if it has already been built
docker-compose up -d
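# optional sanity check: list the running containers and peek at the namenode logs
docker-compose ps
docker logs namenode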
# Load the dev data placed in the data directory into HDFS
docker exec -it namenode bash /scripts/hdfs-loader.sh
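# optional: verify the load (this assumes hdfs is on the PATH inside the namenode container; adjust the path to wherever the loader script writes)
docker exec -it namenode hdfs dfs -ls /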
# Build the project, package it as a Docker image and deploy it to the cluster (run the aliases inside the sbt shell)
sbt
;clean;reload;compile;docker;dockerComposeUp
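# The same pipeline can be run non-interactively; docker and dockerComposeUp presumably come from the sbt-docker and sbt-docker-compose plugins
sbt ";clean;reload;compile;docker;dockerComposeUp"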
# Open an interactive Spark shell inside the cluster
docker exec -it spark-shell /spark/bin/spark-shell
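# To point the shell at the YARN cluster explicitly (an assumption: the spark-shell container has HADOOP_CONF_DIR set to the cluster config)
docker exec -it spark-shell /spark/bin/spark-shell --master yarn --deploy-mode client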
# YARN history server web UI
chrome|firefox http://localhost:8188
# Spark history server web UI
chrome|firefox http://localhost:18080
# HDFS Namenode web UI
chrome|firefox http://localhost:9870
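# To pull the logs of a finished application from the command line instead of the UI
# (assumes the resource manager container is named resourcemanager and yarn is on its PATH; replace <application_id> with yours)
docker exec -it resourcemanager yarn logs -applicationId <application_id>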
# Tear everything down: this stops and removes ALL containers on the machine and prunes unused volumes and networks
docker stop $(docker ps -a -q) && docker rm $(docker ps -a -q) && docker volume prune -f && docker network prune -f
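# A gentler alternative that only removes this project's containers, networks and volumes
docker-compose down -v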