Install Ubuntu On Windows Using WSL

Search for Windows PowerShell in the Windows search bar, then select Run as administrator
To install WSL, run the following in PowerShell:

  wsl --install

After WSL has been installed, restart the computer
Open Microsoft Store, search for Ubuntu, and install it

Configure Ubuntu

After Ubuntu has been installed, open Ubuntu
To refresh the package index and install the latest updates, run:

  sudo apt update && sudo apt upgrade

Set Up Local Docker Hadoop Cluster

Prerequisites: Docker and Docker Compose

In the command prompt, change to the directory where you want the repository to be cloned
To clone the repository, run:

  git clone https://github.com/hanashah-01/docker-hadoop-with-python-mapreduce.git

Change to the directory that contains docker-compose.yml. After the clone above, this is the repository root, so type 'cd docker-hadoop-with-python-mapreduce'
To start the Docker containers in the background, run:

  docker-compose up -d

To confirm that the containers are running, run:

  docker ps
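The check above can also be scripted. The following is a minimal sketch, assuming the containers are named namenode, datanode, resourcemanager and nodemanager (the names used by the commands later in this guide) and that Python 3 is available on the host. The function names are hypothetical and are not part of the repository.

```python
import subprocess

# Container names taken from the commands in this guide (an assumption:
# adjust the list if your docker-compose.yml uses different names).
EXPECTED = ["namenode", "datanode", "resourcemanager", "nodemanager"]


def missing_containers(ps_names_output, expected=EXPECTED):
    """Return the expected names that do not appear in the given name list.

    ps_names_output is text with one running container name per line.
    """
    running = {line.strip() for line in ps_names_output.splitlines() if line.strip()}
    return [name for name in expected if name not in running]


def running_container_names():
    """Ask Docker for the names of running containers, one per line."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout


def report():
    """Print which expected containers, if any, are not running."""
    missing = missing_containers(running_container_names())
    if missing:
        print("Not running: " + ", ".join(missing))
    else:
        print("All expected containers are running.")
```

Calling report() on the host prints the containers that still need to come up. Keeping the parsing in missing_containers separate from the Docker call means it can be checked without Docker installed.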

Running Python MapReduce function

To open a shell in the Hadoop cluster's namenode container, run:

  docker exec -it namenode bash

To list the HDFS root and create the folder structure for the files, run:

  hdfs dfs -ls /

  hdfs dfs -mkdir -p /user/root

Exit the container. Then, to copy the input folder, mapper.py and reducer.py from the host to the namenode container, run:

  docker cp input namenode:/tmp

  docker cp mapper.py namenode:/tmp

  docker cp reducer.py namenode:/tmp

Enter the namenode container again with 'docker exec -it namenode bash'. To create the input folder in HDFS, run:

  hdfs dfs -mkdir /user/root/input

Change directory to /tmp
To copy the input files into the HDFS input folder, run:

  hdfs dfs -put input/* /user/root/input

To locate the Hadoop streaming library JAR file, run:

  find / -name 'hadoop-streaming*.jar'
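The same search can be done from Python once it is installed in the container, which avoids hard-coding the Hadoop version in later commands. This is a minimal sketch, not part of the repository. The function name is hypothetical, and the code is written to run on both Python 2 and Python 3 because the version installed in the container may vary.

```python
import fnmatch
import os


def find_streaming_jars(root="/"):
    """Return the sorted paths under root whose file name matches
    hadoop-streaming*.jar, the same pattern used by the find command."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, "hadoop-streaming*.jar"):
            matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

Passing a narrower root, such as the Hadoop installation directory, makes the search faster than walking the whole file system.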

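To run the MapReduce program from /tmp, where mapper.py and reducer.py were copied, run the following. Replace the JAR path with the one reported by the find command if it differs:

  hadoop jar /opt/hadoop-3.2.1/share/hadoop/tools/lib/hadoop-streaming-3.2.1.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input /user/root/input/* -output /user/root/output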
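The contents of mapper.py and reducer.py are not shown in this guide, so the following is only a hypothetical word-count sketch of what a Hadoop Streaming mapper and reducer can look like. A streaming mapper reads lines from standard input and writes tab-separated key/value lines. Hadoop sorts those lines by key and the reducer then reads them from standard input. The function names are assumptions, and the code avoids Python 3 only syntax because the container may provide Python 2.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper logic: emit 'word<TAB>1' for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word


def reduce_lines(sorted_lines):
    """Reducer logic: sum the counts of consecutive lines sharing a key.

    The input must already be sorted by key, as Hadoop Streaming guarantees.
    """
    pairs = (
        line.rstrip("\n").split("\t", 1)
        for line in sorted_lines
        if "\t" in line  # skip blank or malformed lines
    )
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _word, count in group)
        yield "%s\t%d" % (word, total)


def simulate(lines):
    """Local stand-in for the streaming pipeline: map, sort by key, reduce."""
    return list(reduce_lines(sorted(map_lines(lines))))


def run(stage):
    """Entry point for a real script: pass map_lines or reduce_lines."""
    for out in stage(sys.stdin):
        sys.stdout.write(out + "\n")
```

In a real job the mapper script would end with run(map_lines) and the reducer script with run(reduce_lines). For Hadoop to execute them by file name, each script would also typically need a shebang line and execute permission. The simulate function makes it possible to try the logic on a few lines of text before submitting the job.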

Python Configuration In Container

Python must be installed in the containers before the MapReduce job is run, because the mapper and reducer scripts are executed inside them

To install Python in each container, run:

  docker exec -it namenode bash -c "apt update && apt install python -y"

  docker exec -it datanode bash -c "apt update && apt install python -y"

  docker exec -it resourcemanager bash -c "apt update && apt install python -y"

  docker exec -it nodemanager bash -c "apt update && apt install python -y"
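The four commands above differ only in the container name, so they can be driven from a loop. This is a minimal sketch, assuming Python 3 on the host. The function names are hypothetical. The -it flags are left out on purpose, because a script usually has no interactive terminal to attach.

```python
import subprocess

# Container names copied from the commands above.
CONTAINERS = ["namenode", "datanode", "resourcemanager", "nodemanager"]


def install_command(container):
    """Build the docker exec argument list that installs Python in one container."""
    return [
        "docker", "exec", container,
        "bash", "-c", "apt update && apt install python -y",
    ]


def install_python_everywhere(containers=CONTAINERS):
    """Run the install command in each container, stopping at the first failure."""
    for container in containers:
        subprocess.run(install_command(container), check=True)
```

Calling install_python_everywhere() on the host runs the same installation in all four containers in turn.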
