Follow the instructions on the Get Docker CE for Ubuntu page.
Follow the instructions on the Post-installation steps for Linux page.
- Go to your terminal.
- Clone this repository and go inside it
    git clone https://github.com/mjaglan/docker-spark-yarn-cluster-mode.git
    cd docker-spark-yarn-cluster-mode
- Run the following script
    # Here, N = number of slave nodes to create (default value is 3).
    . ./restart-all.sh N
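As an illustration of the "default value is 3" behaviour, here is a minimal sketch of how a script might resolve an optional node count. The function name `resolve_node_count` is hypothetical and is not taken from restart-all.sh; the actual script may handle its argument differently.

```shell
# Hypothetical helper, NOT copied from restart-all.sh.
# Prints the slave-node count: the first argument if given, otherwise 3.
resolve_node_count() {
  n="${1:-3}"
  # Reject values that are empty, contain a non-digit, or start with 0.
  case "$n" in
    ''|*[!0-9]*|0*)
      echo "error: N must be a positive integer, got '$n'" >&2
      return 1
      ;;
  esac
  echo "$n"
}

resolve_node_count      # no argument: falls back to the default
resolve_node_count 5    # explicit value
```

Validating the argument up front means a typo such as `. ./restart-all.sh three` would fail early rather than partway through container creation.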
After the Hadoop multi-node cluster has started, spark-services.sh runs the following commands:
- Basic Hadoop filesystem information and statistics
    Configured Capacity: 37912903680 (35.31 GB)
    Present Capacity: 11530969088 (10.74 GB)
    DFS Remaining: 11530944512 (10.74 GB)
    DFS Used: 24576 (24 KB)
    DFS Used%: 0.00%
    Under replicated blocks: 0
    Blocks with corrupt replicas: 0
    Missing blocks: 0
    Missing blocks (with replication factor 1): 0
    -------------------------------------------------
    Live datanodes (3):
    ...
- Java Virtual Machine Process Status Tool (jps)
    <pid> <process name>
    838 org.apache.spark.deploy.master.Master --host testbed-master --port 7077 --webui-port 8080
    142 org.apache.hadoop.hdfs.server.namenode.NameNode
    428 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
    579 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
- Spark Example: org.apache.spark.examples.SparkPi
- NameNode can be accessed in a browser at http://CONTAINER-IP:50070/
- Resource Manager can be accessed in a browser at http://CONTAINER-IP:8088/
- Secondary NameNode can be accessed in a browser at http://CONTAINER-IP:50090/
- Spark Master can be accessed in a browser at http://CONTAINER-IP:8080/
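To fill in CONTAINER-IP in the URLs above, one option is to ask Docker for the address it assigned to the container. The following is a minimal sketch: the helper names `container_ip` and `ui_url` are made up for illustration, and the container name in the example is a placeholder to be replaced with a name or ID shown by `docker ps`.

```shell
# Print the IP address Docker assigned to a container.
# $1 is the container name or ID as shown by `docker ps`.
# If the container is attached to several networks, the addresses
# are printed back to back, so this assumes a single network.
container_ip() {
  docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' "$1"
}

# Build a web UI URL from an IP address ($1) and a port ($2).
ui_url() {
  printf 'http://%s:%s/\n' "$1" "$2"
}

# Example (placeholder container name, Spark Master UI port from the list above):
# ui_url "$(container_ip my-master-container)" 8080
```

Keeping the port as a parameter lets the same helper produce the URL for any of the web UIs listed above.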
Docker version 17.06.0-ce
Ubuntu Trusty 14.04 Host OS
Eclipse IDE for Java EE Developers Oxygen (4.7.0)
Eclipse Docker Tooling 3.1.0