How to use HDFS/Spark Workbench

To start an HDFS/Spark Workbench:

    docker-compose up -d

Scaling up spark-workers does not work with docker-compose. For a distributed setup, see the swarm folder.

Starting workbench with Hive support

Start the services in the order below. Before running each command, check that the services started by the previous command are running correctly (for example, with docker logs servicename).

    docker-compose -f docker-compose-hive.yml up -d namenode hive-metastore-postgresql
    docker-compose -f docker-compose-hive.yml up -d datanode hive-metastore
    docker-compose -f docker-compose-hive.yml up -d hive-server
    docker-compose -f docker-compose-hive.yml up -d spark-master spark-worker spark-notebook hue

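The staged startup above can be scripted. The following is a minimal sketch, assuming the docker-compose v1 CLI and docker-compose-hive.yml in the current directory; the function names and the TIMEOUT_SECS variable are hypothetical, not part of this repo. It only waits until each service's containers report a running state, which is weaker than "running correctly", so docker logs is still the way to confirm that a service is healthy.

```shell
#!/usr/bin/env bash
# Sketch: start the Hive-enabled workbench in stages, waiting for each
# stage's containers to report "running" before starting the next stage.
# Assumption: docker-compose v1 CLI, docker-compose-hive.yml in the cwd.
# A running container does not prove the service inside is healthy.
set -euo pipefail

COMPOSE_FILE="docker-compose-hive.yml"
TIMEOUT_SECS="${TIMEOUT_SECS:-120}"   # hypothetical per-service timeout

# Succeed only if the compose service has containers and all are running.
service_running() {
  local ids id
  ids="$(docker-compose -f "$COMPOSE_FILE" ps -q "$1")"
  [ -n "$ids" ] || return 1
  for id in $ids; do
    [ "$(docker inspect -f '{{.State.Running}}' "$id")" = "true" ] || return 1
  done
}

# Retry a command once per second until it succeeds or the timeout expires.
wait_until() {
  local timeout="$1"; shift
  local waited=0
  until "$@"; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Start one stage, then wait for each of its services to be running.
start_stage() {
  docker-compose -f "$COMPOSE_FILE" up -d "$@"
  local svc
  for svc in "$@"; do
    if ! wait_until "$TIMEOUT_SECS" service_running "$svc"; then
      echo "$svc did not reach a running state; check docker logs" >&2
      return 1
    fi
  done
}

# Only touch Docker when explicitly asked to.
if [ "${1:-}" = "start" ]; then
  start_stage namenode hive-metastore-postgresql
  start_stage datanode hive-metastore
  start_stage hive-server
  start_stage spark-master spark-worker spark-notebook hue
fi
```

Saved under a hypothetical name such as staged-hive-start.sh, it is run with: bash staged-hive-start.sh start
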
Interfaces

Namenode: http://localhost:50070
Datanode: http://localhost:50075
Spark-master: http://localhost:8080
Spark-notebook: http://localhost:9001
Hue (HDFS Filebrowser): http://localhost:8088/home

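To confirm that the interfaces above are reachable after startup, a small probe can help. This is a minimal sketch, assuming curl is installed on the Docker host and the ports are published on localhost; the check_ui function is hypothetical. Any HTTP status (including a redirect to a login page) counts as reachable, while 000 means no response arrived.

```shell
#!/usr/bin/env bash
# Sketch: probe each web interface and print "<name> <HTTP status>".
# Assumption: curl is installed and the UIs are published on localhost.
set -uo pipefail

# Print the status for one UI; return non-zero if no HTTP response arrived.
check_ui() {
  local name="$1" url="$2" code
  # On connection failure curl reports 000 (or is missing entirely).
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url")" || code="000"
  echo "$name $code"
  [ "$code" != "000" ]
}

# Only probe when explicitly asked to.
if [ "${1:-}" = "check" ]; then
  check_ui namenode       http://localhost:50070
  check_ui datanode       http://localhost:50075
  check_ui spark-master   http://localhost:8080
  check_ui spark-notebook http://localhost:9001
  check_ui hue            http://localhost:8088/home
fi
```

Run it with: bash check-ui.sh check (the script name is hypothetical).
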
Important

After logging in to Hue, you might encounter the error "NoReverseMatch: u'about' is not a registered namespace". I disabled the 'about' page (the default landing page) because it caused the Docker container to hang. To access Hue when you see this error, append /home to the URI: http://docker-host-ip:8088/home

Docs

The motivation behind this repo and an example of its usage are described on the @BDE2020 Blog.

Count Example for Spark Notebooks

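    import org.apache.spark.sql.SparkSession

    // Reuse the notebook's existing session if there is one, otherwise create it
    val spark = SparkSession
      .builder()
      .appName("Simple Count Example")
      .getOrCreate()

    // Read the file as a Dataset[String], one element per line;
    // the path is resolved against the default filesystem (fs.defaultFS)
    val tf = spark.read.textFile("/data.csv")
    tf.count() // number of lines in the file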

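The count example expects a file at /data.csv. One way to put a local CSV there is sketched below, assuming the default docker-compose.yml in the current directory, that the namenode image has the hdfs CLI on its PATH, and that the notebook's Spark session uses this HDFS as its default filesystem. The upload_csv function and the script name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: copy a local CSV into HDFS so the count example has input.
# Assumptions: default docker-compose.yml in the cwd, hdfs CLI on the
# namenode container's PATH, Spark's fs.defaultFS pointing at this HDFS.
set -euo pipefail

upload_csv() {
  local src="$1" dest="${2:-/data.csv}"
  if [ ! -f "$src" ]; then
    echo "no such file: $src" >&2
    return 2
  fi
  # -T disables the pseudo-TTY so the local file can be piped in on stdin;
  # "hdfs dfs -put -f - DEST" reads stdin and overwrites DEST if it exists.
  docker-compose exec -T namenode hdfs dfs -put -f - "$dest" < "$src"
}

# Only upload when a source file is given on the command line.
if [ "$#" -ge 1 ]; then
  upload_csv "$@"
fi
```

For example: bash upload-csv.sh ./data.csv (an optional second argument overrides the HDFS destination, which defaults to /data.csv).
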
Maintainer

Ivan Ermilov @earthquakesan

GarrettLab-UF/docker-hadoop-spark-workbench

Folders and files

Latest commit

History

Repository files navigation

How to use HDFS/Spark Workbench

Starting workbench with Hive support

Interfaces

Important

Docs

Count Example for Spark Notebooks

Maintainer

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages