A Debian vagrant box with Hadoop
- Install a git client (for example git SCM) and clone this repository:
git clone https://github.com/SergioSim/debian-hadoop.git
- Install VirtualBox 7.0.10 with the extension pack 7.0.10
- Install vagrant
Install & start the vagrant box using VirtualBox:
Navigate to the debian-hadoop directory and run the following command:
$ vagrant up
Install & start the vagrant box using Docker:
$ vagrant up --provider docker
Connect to the vagrant box (VM):
$ vagrant ssh
Check the vagrant box (VM) status:
$ vagrant status
You can also check the status of all vagrant boxes you have installed with:
$ vagrant global-status
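If the status is needed in a script, the machine-readable output is easier to parse than the human-readable table. The following is a minimal sketch, assuming the comma-separated timestamp,target,type,data layout of vagrant's machine-readable output; the function name vagrant_state is hypothetical and not part of this repository.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): reads the output of
# `vagrant status --machine-readable` on stdin and prints the state of each
# machine, one per line (for example "running" or "poweroff").
vagrant_state() {
    # Fields are: timestamp,target,type,data. Keep only the rows whose type
    # is exactly "state" (this skips "state-human-short" and similar rows).
    awk -F, '$3 == "state" { print $4 }'
}

# Example usage on the host, from the debian-hadoop directory:
#   vagrant status --machine-readable | vagrant_state
```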
Power-off the vagrant box:
$ vagrant halt
Remove the vagrant box from disk:
$ vagrant destroy
Start Jupyter Notebook:
$ jupyter notebook --ip=0.0.0.0
Once you have installed, started and connected to the vagrant box,
check whether the Hadoop processes (HDFS & YARN) are running:
vagrant@bullseye:~$ jps
The command should output something like:
4416 DataNode
4978 NodeManager
4580 SecondaryNameNode
5351 Jps
4873 ResourceManager
4287 NameNode
If no Hadoop processes show up, try to start Hadoop:
vagrant@bullseye:~$ start-dfs.sh
vagrant@bullseye:~$ start-yarn.sh
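The check above can also be scripted. The following is a minimal sketch, assuming the five daemon names shown in the sample jps output are the ones expected on this box; the function name missing_hadoop_daemons is hypothetical and not part of this repository. It reads jps output on standard input and prints every expected daemon that is absent.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): reads `jps` output on
# stdin and prints the name of every expected Hadoop daemon that is missing.
missing_hadoop_daemons() {
    jps_output=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # Compare against the whole second field, so that "NameNode"
        # is not satisfied by a "SecondaryNameNode" line.
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            printf '%s\n' "$daemon"
        fi
    done
}

# Example usage inside the VM:
#   jps | missing_hadoop_daemons
```

Empty output means all five daemons were found. Any name printed points at the script to run: start-dfs.sh for the HDFS daemons (NameNode, DataNode, SecondaryNameNode) and start-yarn.sh for the YARN daemons (ResourceManager, NodeManager).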
Symptom: the vagrant up command succeeded but Hadoop processes didn't start.
If you are running the box for the first time, try to reinstall it by forcing the provisioners to run again:
$ vagrant up --provision
Otherwise, if it was working previously, try to start Hadoop with start-dfs.sh and start-yarn.sh.
Symptom: the vagrant ssh command fails with Permission denied (publickey).
- Get the location of your vagrant ssh private key:
$ vagrant ssh-config
Then copy the path of the IdentityFile.
- Run the following command instead of vagrant ssh to connect to the VM:
$ ssh -i path/to/identity/file/private_key -p 2222 vagrant@127.0.0.1
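The two steps above can be combined so that the key path does not have to be copied by hand. This is a minimal sketch, assuming the usual indented "Option value" layout of vagrant ssh-config output and a value that contains no spaces; the function name ssh_config_value is hypothetical and not part of this repository.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): reads the output of
# `vagrant ssh-config` on stdin and prints the first value of the option
# named in $1 (for example IdentityFile or Port).
# Assumption: the value contains no whitespace.
ssh_config_value() {
    awk -v key="$1" '$1 == key && !seen { print $2; seen = 1 }'
}

# Example usage on the host, from the debian-hadoop directory:
#   key=$(vagrant ssh-config | ssh_config_value IdentityFile)
#   port=$(vagrant ssh-config | ssh_config_value Port)
#   ssh -i "$key" -p "$port" vagrant@127.0.0.1
```

Reading the port from the same output avoids relying on 2222, which vagrant may change if that port is already in use on the host.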
To apply changes done in config/vagrant, run the prerequisites provisioner:
$ vagrant provision --provision-with prerequisites
To apply changes done in config/hadoop, run the install_hadoop provisioner:
$ vagrant provision --provision-with install_hadoop
Warning: this will format HDFS (removing all files on HDFS).