[docker] Add jupyter based demo of single colo batch push workflow #1271

Merged (8 commits) on Nov 4, 2024


@ZacAttack ZacAttack commented Oct 31, 2024

[docker] Add jupyter based demo of single colo batch push workflow

This adds a new interactive demo and explanation of a typical Venice batch push workflow. It walks through downloading a dataset from Hugging Face, using Spark to convert the Parquet file format to Avro, preparing a Venice store, and then using Spark and VPJ to push to the Venice cluster, all from the Jupyter notebook.

To run this demo do the following:

Build the Image
From the repository root directory run:
./docker/build-venice-docker-images.sh

Run and compose the containers
Once the build finishes, start the containers with:

docker compose -f ./docker/docker-compose-single-dc-setup.yaml up -d

Connect to Jupyter
To get the Jupyter link, open the logs of the running venice-client-jupyter container; how you do this depends on your environment. If you have the Docker Desktop app, navigate to the list of running containers and click on the venice-client-jupyter container. In the log view you'll see a link that looks something like:

http://127.0.0.1:8888/lab?token=<some token string>
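
Without Docker Desktop, the same link can usually be pulled from the container logs on the command line. The following is a minimal sketch, not part of this PR: the helper name `extract_jupyter_url` is hypothetical, and it assumes the container is named exactly `venice-client-jupyter` and that the link has the host, port, and path shown above. If your compose project prefixes container names, check `docker ps` for the actual name.

```shell
# Hypothetical helper (not part of the PR): read log text on stdin and print
# the first JupyterLab link that carries a token.
# Assumes the link has the form http://127.0.0.1:8888/lab?token=<token>.
extract_jupyter_url() {
  grep -Eo 'http://127\.0\.0\.1:8888/lab\?token=[^[:space:]]+' | head -n 1
}

# Usage, once the containers are up (container name is an assumption):
#   docker logs venice-client-jupyter 2>&1 | extract_jupyter_url
# The 2>&1 is there because Jupyter typically writes its startup log to stderr.
```

If nothing is printed, the server may still be starting; rerunning the command after a few seconds is usually enough.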

Open this link in your browser to reach the Jupyter notebook UI. In the file explorer on the left, double-click the file called Venice_Demo.ipynb. From there, you can read and run the tutorial. Have fun!

Resolves #XXX

How was this PR tested?

Does this PR introduce any user-facing changes?

  • No. You can skip the rest of this section.
  • Yes. Make sure to explain your proposed changes and call out the behavior change.

This adds a new interactive demo and explanation of a typical Venice batch push workflow. It walks through
downloading a dataset from Hugging Face, using Spark to convert the Parquet file format to Avro, preparing a
Venice store, and then using Spark and VPJ to push to the Venice cluster, all from the Jupyter notebook.

@sushantmane sushantmane left a comment

Looks great, and it works perfectly—thanks so much, @ZacAttack! I left a few comments, and once those are addressed, it’s good to go. 🚀

Review comments (outdated, resolved):
docker/venice-client-jupyter/Venice_Demo.ipynb
docker/venice-client-jupyter/Venice_Demo.ipynb
docker/docker-compose-single-dc-setup.yaml

@sushantmane sushantmane left a comment

LGTM! Thanks, @ZacAttack!

@ZacAttack ZacAttack merged commit 36d482e into linkedin:main Nov 4, 2024
45 checks passed