SparkMonitor: update readme
diocas committed Oct 16, 2020
1 parent c327cad commit 063bdd0
Showing 1 changed file with 44 additions and 6 deletions.
50 changes: 44 additions & 6 deletions SparkMonitor/README.md
@@ -1,13 +1,13 @@
# SparkMonitor

SparkMonitor is an extension for Jupyter Notebook that enables the live monitoring of Apache Spark Jobs spawned from a notebook. The extension provides several features to monitor and debug a Spark job from within the notebook interface itself.
SparkMonitor is an extension for Jupyter that enables the live monitoring of Apache Spark Jobs spawned from a notebook. The extension provides several features to monitor and debug a Spark job from within the notebook interface itself.

![jobdisplay](https://user-images.githubusercontent.com/6822941/29753710-ff8849b6-8b94-11e7-8f9c-bdc59bf72143.gif)

It was originally developed as part of Google Summer of Code by @krishnan-r. The original repo can be seen here: https://github.com/krishnan-r/sparkmonitor
It was originally developed as part of Google Summer of Code by [@krishnan-r](https://github.com/krishnan-r). The original repo can be seen here: https://github.com/krishnan-r/sparkmonitor


This extension is composed of a Python package named `sparkmonitor`, which installs the nbextension and the kernel extension, and an NPM package named `@swan-cern/sparkmonitor` for the JupyterLab extension.
This extension is composed of a Python package named `sparkmonitor`, which installs the nbextension and the kernel extension, and an NPM package named `@swan-cern/sparkmonitor` for the JupyterLab extension (still under development).


## Requirements
@@ -30,18 +30,56 @@ jupyter lab build
```

To enable the kernel extension, create the default profile configuration files (skip this step if the config file already exists) and configure the kernel to load the extension on startup. The setting is added to the configuration files in the user's home directory.

```bash
ipython profile create
echo "c.InteractiveShellApp.extensions.append('sparkmonitor.kernelextension')" >> $(ipython profile locate default)/ipython_kernel_config.py
```
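Running the `echo ... >>` command more than once appends the same line again. As a minimal sketch, assuming you would rather make this step repeatable, the helper below adds the line only when it is missing. The function name `enable_kernel_extension` is hypothetical and not part of `sparkmonitor`; the path you pass in is expected to be the `ipython_kernel_config.py` file inside the directory printed by `ipython profile locate default`.

```python
from pathlib import Path

# The exact line the README appends to ipython_kernel_config.py.
EXTENSION_LINE = "c.InteractiveShellApp.extensions.append('sparkmonitor.kernelextension')"


def enable_kernel_extension(config_path):
    """Append EXTENSION_LINE to config_path unless an identical line exists.

    Hypothetical helper, not part of sparkmonitor.
    Returns True if the file was modified, False if nothing was changed.
    """
    path = Path(config_path)
    existing = path.read_text() if path.exists() else ""

    # Compare whole lines so a commented-out copy does not count as enabled.
    if any(line.strip() == EXTENSION_LINE for line in existing.splitlines()):
        return False

    # Start on a fresh line if the file does not already end with a newline.
    prefix = "" if existing == "" or existing.endswith("\n") else "\n"
    with path.open("a") as handle:
        handle.write(prefix + EXTENSION_LINE + "\n")
    return True
```

Calling it twice on the same file leaves a single copy of the line, so it is safe to run from a provisioning script.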

## Configuration
By default the Spark Web UI runs on `localhost:4040`. If this is not the case, setting the environment variable `SPARKMONITOR_UI_HOST` and `SPARKMONITOR_UI_PORT` overrides the default Spark UI hostname `localhost` and port 4040 used by the Spark UI proxy.
<!-- ## Configuration
By default the Spark Web UI runs on `localhost:4040`. If this is not the case, setting the environment variable `SPARKMONITOR_UI_HOST` and `SPARKMONITOR_UI_PORT` overrides the default Spark UI hostname `localhost` and port 4040 used by the Spark UI proxy. -->

## Usage

To use the extension, register the monitor's listener in the Spark configuration, like so:

```conf
spark.extraListeners = sparkmonitor.listener.JupyterSparkMonitorListener
# Pick one of the following (the jars live inside the sparkmonitor module):
# For Spark 2
spark.driver.extraClassPath = /usr/local/lib/sparkmonitor/listener_2.11.jar
# For Spark 3
spark.driver.extraClassPath = /usr/local/lib/sparkmonitor/listener_2.12.jar
```
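If you build the configuration in code instead of a properties file, the same two settings can be assembled programmatically. The sketch below is an illustration only: the function name `sparkmonitor_settings` and the `module_dir` parameter are hypothetical, and the jar file names are simply the ones listed in the snippet above.

```python
import posixpath

# Listener jar for each Spark major version, as listed in the README snippet.
LISTENER_JARS = {2: "listener_2.11.jar", 3: "listener_2.12.jar"}


def sparkmonitor_settings(spark_major, module_dir):
    """Return the Spark properties needed to register the monitor listener.

    Hypothetical helper, not part of sparkmonitor.
    spark_major: 2 or 3.
    module_dir: directory of the installed sparkmonitor module (assumed
                to contain the listener jars).
    """
    if spark_major not in LISTENER_JARS:
        raise ValueError(f"unsupported Spark major version: {spark_major}")
    return {
        "spark.extraListeners": "sparkmonitor.listener.JupyterSparkMonitorListener",
        "spark.driver.extraClassPath": posixpath.join(
            module_dir, LISTENER_JARS[spark_major]
        ),
    }
```

With `pyspark` installed, the resulting dictionary can be applied with `SparkConf().setAll(sparkmonitor_settings(3, "/usr/local/lib/sparkmonitor").items())`; adjust the directory to wherever the module is installed on your system.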

To simplify configuration, the kernel extension, when correctly installed, makes the variable `swan_spark_conf` available inside your notebook with everything already set.
To use it, pass it to the SparkContext like so:

```python
SparkContext.getOrCreate(conf=swan_spark_conf)
```

Complete example:

```python
from pyspark import SparkContext
sc = SparkContext.getOrCreate(conf=swan_spark_conf)  # Start the Spark context
rdd = sc.parallelize([1, 2, 4, 8])
rdd.count()
```

## Troubleshooting

If you are not seeing the frontend, check if it's installed:
Check that the server extension and the nbextension are correctly installed:
```bash
jupyter nbextension list
jupyter serverextension list
```

If the problem is with the kernel extension, check the logs to see whether it was loaded or whether there was a problem with the IPython profile.
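Besides reading the logs, a quick check can be run from a notebook cell. The sketch below is an assumption-laden illustration rather than part of `sparkmonitor`: the helper name `kernel_extension_report` is hypothetical. It only tests whether the extension module name appears among the loaded IPython extensions and whether `swan_spark_conf` is present in the notebook namespace.

```python
def kernel_extension_report(loaded_extensions, namespace):
    """Summarise whether the sparkmonitor kernel extension looks active.

    Hypothetical helper, not part of sparkmonitor.
    loaded_extensions: collection of loaded IPython extension module names.
    namespace: mapping of variable names defined in the notebook.
    """
    return {
        # True when IPython reports the extension module as loaded.
        "extension_loaded": "sparkmonitor.kernelextension" in loaded_extensions,
        # True when the preconfigured SparkConf variable is defined.
        "conf_available": "swan_spark_conf" in namespace,
    }
```

Inside a notebook, it could be called as `kernel_extension_report(get_ipython().extension_manager.loaded, globals())`. If both values are `False`, the profile configuration is the likely place to look.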

If you are not seeing the frontend JupyterLab extension, check if it's installed:

```bash
jupyter labextension list