[DPE-2438] JupyterLab Spark integration (#43)
deusebio authored Oct 4, 2023
1 parent 0446775 commit 55b1279
Showing 3 changed files with 43 additions and 4 deletions.
README.md: 18 changes (17 additions, 1 deletion)
@@ -63,12 +63,28 @@ For more information about spark-client API and `spark8t` tooling, please refer

### Starting Pebble services

The Charmed Spark rock image ships with Pebble included to manage its services. To start a service, pass `\; start <service-name>` as the container command, as in the examples below.

#### Starting History Server

```shell
docker run ghcr.io/canonical/charmed-spark:3.4.1-22.04_edge \; start history-server
```
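
The command above starts the service inside the container; to browse the History Server UI from the host you would typically also publish its web port. A minimal sketch, assuming the default Spark History Server port 18080 and a `spark-defaults.conf` mounted at the path the service reads it from (`/etc/spark8t/conf/spark-defaults.conf`); paths are illustrative:

```shell
# Sketch: publish the History Server UI (default port 18080) and mount a
# spark-defaults.conf pointing at your event-log directory (paths illustrative).
docker run \
  -v /path/to/spark-defaults.conf:/etc/spark8t/conf/spark-defaults.conf \
  -p 18080:18080 \
  ghcr.io/canonical/charmed-spark:3.4.1-22.04_edge \
  \; start history-server
```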

#### Starting Jupyter Notebook

```shell
docker run \
  -v /path/to/kube/config:/var/lib/spark/.kube/config \
  -p <port>:8888 \
  ghcr.io/canonical/charmed-spark:3.4.1-22.04_edge \
  \; --args jupyter --username <spark-service-account> --namespace <spark-namespace> \
  \; start jupyter
```

Make sure the `<spark-service-account>` already exists in the `<spark-namespace>`; it can be created beforehand with the `spark8t` CLI.
Once the service is up, the JupyterLab server is reachable at `http://localhost:<port>`.
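
A minimal sketch of creating that account, assuming the `spark-client.service-account-registry` CLI shipped alongside `spark-client.pyspark` in the image (command name and flags are an assumption; adjust to your spark8t tooling):

```shell
# Sketch (assumption): create the Spark service account the Jupyter service will use.
# Requires a kubeconfig with permission to create service accounts in the namespace.
docker run --rm \
  -v /path/to/kube/config:/var/lib/spark/.kube/config \
  --entrypoint spark-client.service-account-registry \
  ghcr.io/canonical/charmed-spark:3.4.1-22.04_edge \
  create --username <spark-service-account> --namespace <spark-namespace>
```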

## Developers and Contributing

Please see [CONTRIBUTING.md](https://github.com/canonical/charmed-spark-rock/blob/3.4-22.04/edge/CONTRIBUTING.md) for contribution guidelines and developer guidance.
rockcraft.yaml: 27 changes (25 additions, 2 deletions)
Expand Up @@ -14,7 +14,7 @@ environment:
  SPARK_HOME: /opt/spark
  SPARK_CONFS: /etc/spark8t/conf
  JAVA_HOME: /usr/lib/jvm/java-11-openjdk-amd64
  PYTHONPATH: /opt/spark/python:/opt/spark8t/python/dist:/usr/lib/python3/dist-packages
  PYTHONPATH: /opt/spark/python:/opt/spark8t/python/dist:/usr/lib/python3.10/site-packages
  PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/spark:/opt/spark/bin:/opt/spark/python/bin:/opt/spark-client/python/bin
  HOME: /var/lib/spark
  KUBECONFIG: /var/lib/spark/.kube/config
@@ -45,6 +45,14 @@ services:
    startup: disabled
    environment:
      SPARK_PROPERTIES_FILE: /etc/spark8t/conf/spark-defaults.conf
  jupyter:
    command: "spark-client.pyspark [ --username spark --namespace spark ]"
    summary: "This is the Spark-powered Jupyter service"
    override: replace
    startup: disabled
    environment:
      PYSPARK_DRIVER_PYTHON: jupyter
      PYSPARK_DRIVER_PYTHON_OPTS: "lab --no-browser --port=8888 --ip=0.0.0.0 --NotebookApp.token='' --notebook-dir=/var/lib/spark/notebook"
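
For context, `PYSPARK_DRIVER_PYTHON` and `PYSPARK_DRIVER_PYTHON_OPTS` are standard PySpark settings: `pyspark` launches the program named by the former, with the latter as its options, as the driver front end, which is how this service brings up JupyterLab instead of the plain Python REPL. A rough shell equivalent of what the Pebble `jupyter` service runs with its default arguments (a sketch, not part of the image):

```shell
# Sketch: roughly what the Pebble "jupyter" service above amounts to.
# pyspark honours these variables and launches JupyterLab as the driver front end.
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="lab --no-browser --port=8888 --ip=0.0.0.0 --NotebookApp.token='' --notebook-dir=/var/lib/spark/notebook"
spark-client.pyspark --username spark --namespace spark
```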

parts:
  spark:
@@ -116,7 +124,6 @@ parts:

  spark8t:
    plugin: nil
    after: [ hadoop ]
    build-packages:
      - wget
      - ssl-cert
@@ -126,9 +133,24 @@
    overlay-script: |
      mkdir -p $CRAFT_PART_INSTALL/opt/spark8t/python/dist
      pip install --target=${CRAFT_PART_INSTALL}/opt/spark8t/python/dist https://github.com/canonical/spark-k8s-toolkit-py/releases/download/v0.0.1/spark8t-0.0.1-py3-none-any.whl
      rm usr/bin/pip*
    stage:
      - opt/spark8t/python/dist

  jupyter:
    plugin: python
    source: .
    python-packages:
      - jupyterlab
    stage-packages:
      - python3-venv
    organize:
      lib: usr/lib
      bin: usr/bin
      share: usr/share
    stage:
      - usr

  kubectl:
    plugin: nil
    build-packages:
@@ -206,5 +228,6 @@ parts:
      chmod -R 750 opt/spark
      mkdir -p var/lib/spark
      mkdir -p var/lib/spark/notebook
      chown -R ${SPARK_GID}:${SPARK_UID} var/lib/spark
      chmod -R 770 var/lib/spark
tests/integration/resources/testpod.yaml: 2 changes (1 addition, 1 deletion)
@@ -7,4 +7,4 @@ spec:
    - image: ghcr.io/canonical/test-charmed-spark:3.4.1
      name: spark
      ports:
        - containerPort: 18080
