First, go to your GitHub profile settings and make sure you have an SSH key uploaded.
Next, go to the GitHub repo of the image that you'd like to work on and create
a fork. To do this, click on the fork
button and then Create fork
.
After you create your fork of the new image repository, you should disable
GitHub Actions only for your fork. To do this, navigate to Settings
-->
Actions
--> General
and select Disable actions
. Then click Save
:
Now clone the primary image repo on your local device. You can get the URL to do
this by clicking on the green Code
button in the primary image repo (not your fork)
and clicking on SSH
and copying the text located therein:
Now cd
in to a handy folder on your device, and clone the primary repo by
running the following command (replace <image-name>
with the actual name
of the image):
git clone git@github.com:berkeley-dsep-infra/<image-name>.git
Now cd
in to <image-name>
and set up your local repo to point both at the primary
image repo (upstream
) and your fork (origin
). After the initial clone,
origin
will be pointing to the main repo and we'll need to change that.
cd <image-name>
git remote rename origin upstream # rename origin to upstream
git remote add origin git@github.com:<your github username>/<image-name>.git # add your fork as origin
To confirm these changes, run git remote -v
and see if everything is correct:
$ cd <image-name>
$ git remote -v # confirm the settings
origin git@github.com:<your github username>/<image-name>.git (fetch)
origin git@github.com:<your github username>/<image-name>.git (push)
upstream git@github.com:berkeley-dsep-infra/<image-name>.git (fetch)
upstream git@github.com:berkeley-dsep-infra/<image-name>.git (push)
Now you can sync your local repo from upstream
, and push those changes to your
fork (origin
):
git checkout main && \
git fetch --prune --all && \
git rebase upstream/main && \
git push origin main
%% State diagram documentation at
%% https://mermaid.js.org/syntax/stateDiagram.html
stateDiagram-v2
image_repo: github.com/berkeley-dsep-infra/hubname-user-image
user_repo: github.com/username/hubname-user-image
image_test_build: Image is built and tested
image_push_build: Image is built and pushed to registry
pr_created: A pull request is automatically<br/>created in the Datahub repo
deploy_to_staging: Hub is deployed to staging
contributor_tests: The contributor logs into the<br/>staging hub and tests the image.
deploy_to_prod: Hub is deployed to prod
image_repo --> user_repo: Contributor forks the image repo.
user_repo --> image_repo: Contributor creates a PR.
image_repo --> image_test_build
image_test_build --> image_push_build: Test build passes and Datahub staff merge pull request
image_push_build --> pr_created
pr_created --> deploy_to_staging: Datahub staff review and merge to staging
deploy_to_staging --> contributor_tests
contributor_tests --> deploy_to_prod: Datahub staff create a PR to merge to prod
These steps will be explained in detail below.
When developing for this deployment, always work in a fork of this repo. You should also make sure that your repo is up-to-date with this one prior to making changes. This is because other contributors may have pushed changes after you last synced with this repo but before you upstreamed your changes.
git checkout main && \
git fetch --prune --all && \
git rebase upstream/main && \
git push origin main
To create a new feature branch and switch to it, run the following command:
git checkout -b <branch name>
After you make your changes, you can use the following commands to see
what's been modified and check out the diffs: git status
and git diff
.
You need to use repo2docker to build and use/test the image on your own device before you push and create a pull request. It's better (and typically faster) to do this first before using CI/CD. There's no need to waste GitHub Action minutes to test build images when you can do this on your own device!
Run repo2docker
from inside the cloned image repo. To run on a linux/WSL2
or regular linux shell:
repo2docker . # <--- the path to the repo
If you are using an ARM CPU (Apple M* silicon), you will need to run
jupyter-repo2docker
with the following arguments:
jupyter-repo2docker --user-id=1000 --user-name=jovyan \
--Repo2Docker.platform=linux/amd64 \
--target-repo-dir=/home/jovyan/.cache \
-e PLAYWRIGHT_BROWSERS_PATH=/srv/conda \
. # <--- the path to the repo
If you just want to see if the image builds, but not automatically launch the
server, add --no-run
to the arguments (before the final .
).
When you're ready to push these changes, first you'll need to stage them for a commit:
git add <file1> <file2> ...
Commit these changes locally:
git commit -m "some pithy commit description"
Now push to your fork:
git push origin <branch name>
Once you've pushed to your fork, you can go to the image repo and there should
be a big green button on the top that says Compare and pull request
.
Click on that, check out the commits and file diffs, edit the title and
description if needed and then click Create pull request
.
If you're having issues, you can refer to the GitHub documentation for pull
requests.
Keep the choice for base
in the GitHub PR user interface, while the choice
for head
is your fork.
Once this is complete and if there are no problems, a GitHub action will automatically build and test the image. If this fails, please check the output of the workflow in the action, and make any changes required to get the build to pass.
Once the image build has completed successfully, you can request that someone review the PR before merging, or you can merge yourself if you are confident. This merge will trigger a second giuthub workflow that builds the image again, pushes it to the appropriate location in our Google Artifact Registry and finally creates a pull request in the Datahub. This pull request updates the hash used to identify the freshly built image.
Let the Datahub staff know that the pull request was created, and they will
review and merge it into the staging
branch. You can notify them via a
corresponding GitHub Issue, or on the UCTech #datahubs slack channel.
Once it's been merged to staging
, it will automatically deploy the new image
to the hub's staging environment in a few minutes and you'll be able to test it
out!
When you're happy with the results, let Datahub staff know, and they will
create another pull request to deploy the new image to prod
.