Add GCS as a storage solution
chiayi committed Dec 1, 2023
1 parent 7048aad commit 2090fd4
Showing 10 changed files with 242 additions and 23 deletions.
10 changes: 10 additions & 0 deletions gke-platform/modules/gke_standard/main.tf
@@ -46,6 +46,16 @@ resource "google_container_cluster" "ml_cluster" {
workload_pool = "${var.project_id}.svc.id.goog"
}

addons_config {
gcp_filestore_csi_driver_config {
enabled = true
}

gcs_fuse_csi_driver_config {
enabled = true
}
}

release_channel {
channel = "RAPID"
}
38 changes: 22 additions & 16 deletions jupyter-on-gke/README.md
@@ -32,17 +32,13 @@ Preinstall the following on your computer:
1. If needed, git clone https://github.com/GoogleCloudPlatform/ai-on-gke

2. Build the Jupyterhub Image following [README](https://github.com/GoogleCloudPlatform/ai-on-gke/blob/main/jupyter-on-gke/authentication/README.MD). This is an important step for Authentication. (Currently Enabled By Default)

3. Once the image is built, navigate to `ai-on-gke/jupyter-on-gke/`

4. Edit `variables.tf` with your GCP settings. The `<your user name>` that you specify will become a K8s namespace for your Jupyterhub services. For more information about what the variables do, visit [here](https://github.com/GoogleCloudPlatform/ai-on-gke/blob/main/jupyter-on-gke/variable_definitions.md)
2. Edit `variables.tf` with your GCP settings. The `<your user name>` that you specify will become a K8s namespace for your Jupyterhub services. For more information about what the variables do, visit [here](https://github.com/GoogleCloudPlatform/ai-on-gke/blob/main/jupyter-on-gke/variable_definitions.md)
**Important Note:**
If using this with the Ray module (`ai-on-gke/ray-on-gke/`), it is recommended to use the same k8s namespace
for both i.e. set this to the same namespace as `ai-on-gke/ray-on-gke/user/variables.tf`.
If not, set `enable_create_namespace` to `true` so a new k8s namespace is created for the Jupyter resources.

5. If you have not enabled the IAP API before or created a Brand for your project, you can proceed to the next step. If not, ensure that the following variables within `variables.tf` are set:
3. If you have not enabled the IAP API before or created a Brand for your project, you can proceed to the next step. If not, ensure that the following variables within `variables.tf` are set:

* enable_iap_service - Enables the IAP service API. Leave as false if IAP is enabled before.
* brand - creates a brand for the project, only one is currently allowed per project. If there is already a brand, leave the variable empty.
@@ -60,9 +56,9 @@ If not, set `enable_create_namespace` to `true` so a new k8s namespace is create

![IAP API Screen](./images/consent_screen_screenshot.png)

5. Run `terraform init`
4. Run `terraform init`

6. Edit the `./allowlist` file to set the application users allowlist for Jupyterhub. These are the kinds of principals you can have:
5. Edit the `./allowlist` file to set the application users allowlist for Jupyterhub. These are the kinds of principals you can have:

* allUsers
* allAuthenticatedUsers
@@ -76,21 +72,21 @@ If not, set `enable_create_namespace` to `true` so a new k8s namespace is create

**Note:** Separate each principal with a new line

7. Find the name and location of the GKE cluster you want to use.
6. Find the name and location of the GKE cluster you want to use.
Run `gcloud container clusters list --project=<your GCP project>` to see all the available clusters.

Note: If you created the GKE cluster via the ai-on-gke/gke-platform repo, you can get the cluster info from `ai-on-gke/gke-platform/variables.tf`

8. Run `gcloud container clusters get-credentials %gke_cluster_name% --location=%location%`
7. Run `gcloud container clusters get-credentials %gke_cluster_name% --location=%location%`
Configuring `gcloud` [instructions](https://cloud.google.com/sdk/docs/initializing)

9. Run `terraform apply`
8. Run `terraform apply`

## Securing your Jupyter Endpoint

To secure the Jupyter endpoint, this example enables IAP by default. It is _strongly recommended_ to keep this configuration. If you wish to disable it, set the `add_auth` flag to `false` in the `variables.tf` file.

10. After installing Jupyterhub, you will need to retrieve the name of the backend-service from GCP using the following command:
9. After installing Jupyterhub, you will need to retrieve the name of the backend-service from GCP using the following command:

```cmd
gcloud compute backend-services list --project=%PROJECT_ID%
@@ -102,13 +98,13 @@ To secure the Jupyter endpoint, this example enables IAP by default. It is _stro
gcloud compute backend-services describe SERVICE_NAME --project=%PROJECT_ID% --global
```
11. Once you get the name of the backend-service, replace the variable in the [variables.tf](https://github.com/GoogleCloudPlatform/ai-on-gke/blob/main/jupyter-on-gke/variables.tf) file.
10. Once you get the name of the backend-service, replace the variable in the [variables.tf](https://github.com/GoogleCloudPlatform/ai-on-gke/blob/main/jupyter-on-gke/variables.tf) file.
12. Re-run `terraform apply`
11. Re-run `terraform apply`
13. Navigate to the [GCP IAP Cloud Console](https://console.cloud.google.com/security/iap) and select your backend-service checkbox.
12. Navigate to the [GCP IAP Cloud Console](https://console.cloud.google.com/security/iap) and select your backend-service checkbox.
14. Click on `Add Principal`, enter the new principal, and under `Cloud IAP` select the role `IAP-secured Web App User`
13. Click on `Add Principal`, enter the new principal, and under `Cloud IAP` select the role `IAP-secured Web App User`
> **_NOTE:_** Your managed certificate may take some time to finish provisioning. On average around 10-15 minutes.
@@ -134,6 +130,16 @@ Continue to Step 3 of [below](#if-auth-is-enabled).
4. Select profile and open a Jupyter Notebook
## Persistent Storage
Currently there are 2 choices for storage:
1. Default Jupyterhub Storage - `pd.csi.storage.gke.io` with reclaim policy `Delete`
2. GCSFuse - `gcsfuse.csi.storage.gke.io` uses GCS buckets and requires users to pre-create a bucket named in the format `gcsfuse-{username}`
For more information about Persistent storage and the available options, visit [here](https://github.com/GoogleCloudPlatform/ai-on-gke/blob/main/jupyter-on-gke/storage.md)
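
Since the GCSFuse option expects the bucket to exist before the notebook starts, the following is a minimal sketch of how a bucket following the `gcsfuse-{username}` convention might be pre-created. The helper names, the project `my-project`, and the location `us-central1` are illustrative assumptions, not part of this repository. It also assumes the username expands to a plain lowercase string that is valid in a bucket name. The helper only prints the `gcloud` command so it can be reviewed before running.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the bucket name the GCSFuse profile expects.
bucket_name_for_user() {
  local username="$1"
  printf 'gcsfuse-%s\n' "$username"
}

# Dry-run helper: print (do not execute) the bucket creation command.
# Project and location are placeholders supplied by the caller.
print_create_bucket_cmd() {
  local username="$1" project="$2" location="$3"
  printf 'gcloud storage buckets create gs://%s --project=%s --location=%s\n' \
    "$(bucket_name_for_user "$username")" "$project" "$location"
}

# Example: show the command for a user named "alice".
print_create_bucket_cmd "alice" "my-project" "us-central1"
```

Once the printed command looks right, it can be run directly, or piped to `bash`.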
## Running GPT-J-6B
This example is adapted from Ray AIR's examples [here](https://docs.ray.io/en/master/ray-air/examples/gptj_serving.html).
Binary file added jupyter-on-gke/images/gcs_bucket.png
80 changes: 78 additions & 2 deletions jupyter-on-gke/jupyter_config/config-selfauth.yaml
@@ -23,7 +23,7 @@
# Available chart versions: https://jupyterhub.github.io/helm-chart/
hub:
image:
name: <Repo name here>
name: us-docker.pkg.dev/ai-on-gke/jupyterhub-authentication-class/jupyter-auth-class
tag: latest
config:
JupyterHub:
@@ -78,10 +78,17 @@ singleuser:
ephemeral-storage: 10Gi
nodeSelector:
iam.gke.io/gke-metadata-server-enabled: "true"
extraEnv:
JUPYTER_ALLOW_INSECURE_WRITES: "true"
image:
name: jupyter/tensorflow-notebook
tag: python-3.10
startTimeout: 1000
extraAnnotations:
gke-gcsfuse/volumes: "true"
storage:
dynamic:
pvcNameTemplate: claim-{username}
# More info on kubespawner overrides: https://jupyterhub-kubespawner.readthedocs.io/en/latest/spawner.html#kubespawner.KubeSpawner
# profile example:
# - display_name: "Learning Data Science"
@@ -101,11 +108,57 @@ singleuser:
# - >
# gitpuller https://github.com/data-8/materials-fa17 master materials-fa;
profileList:
- display_name: "Basic"
- display_name: "CPU"
description: "Creates CPU VMs as the compute for notebook execution."
profile_options:
storage:
display_name: "Storage"
choices:
DefaultStorage:
display_name: "DefaultStorage"
kubespawner_override:
default: true
GCSFuse:
display_name: "GCSFuse"
kubespawner_override:
volume_mounts:
- name: gcs-fuse-csi-ephemeral
mountPath: /home/jovyan
volumes:
- name: gcs-fuse-csi-ephemeral
csi:
driver: gcsfuse.csi.storage.gke.io
volumeAttributes:
bucketName: gcsfuse-{username}
mountOptions: "uid=1000,gid=100,o=noexec,implicit-dirs,dir-mode=777,file-mode=777"
node_selector:
iam.gke.io/gke-metadata-server-enabled: "true"
default: true
- display_name: "GPU T4"
description: "Creates GPU VMs (T4) as the compute for notebook execution"
profile_options:
storage:
display_name: "Storage"
choices:
DefaultStorage:
display_name: "DefaultStorage"
kubespawner_override:
default: true
GCSFuse:
display_name: "GCSFuse"
kubespawner_override:
volume_mounts:
- name: gcs-fuse-csi-ephemeral
mountPath: /home/jovyan
volumes:
- name: gcs-fuse-csi-ephemeral
csi:
driver: gcsfuse.csi.storage.gke.io
volumeAttributes:
bucketName: gcsfuse-{username}
mountOptions: "uid=1000,gid=100,o=noexec,implicit-dirs,dir-mode=777,file-mode=777"
node_selector:
iam.gke.io/gke-metadata-server-enabled: "true"
kubespawner_override:
image: jupyter/tensorflow-notebook:python-3.10
extra_resource_limits:
@@ -116,6 +169,29 @@ singleuser:
cloud.google.com/gke-accelerator: "nvidia-tesla-t4"
- display_name: "GPU A100"
description: "Creates GPU VMs (A100) as the compute for notebook execution"
profile_options:
storage:
display_name: "Storage"
choices:
DefaultStorage:
display_name: "DefaultStorage"
kubespawner_override:
default: true
GCSFuse:
display_name: "GCSFuse"
kubespawner_override:
volume_mounts:
- name: gcs-fuse-csi-ephemeral
mountPath: /home/jovyan
volumes:
- name: gcs-fuse-csi-ephemeral
csi:
driver: gcsfuse.csi.storage.gke.io
volumeAttributes:
bucketName: gcsfuse-{username}
mountOptions: "uid=1000,gid=100,o=noexec,implicit-dirs,dir-mode=777,file-mode=777"
node_selector:
iam.gke.io/gke-metadata-server-enabled: "true"
kubespawner_override:
image: jupyter/tensorflow-notebook:python-3.10
extra_resource_limits:
17 changes: 13 additions & 4 deletions jupyter-on-gke/jupyterhub.tf
@@ -60,7 +60,7 @@ resource "google_project_service" "project_service" {
disable_on_destroy = false
}

# Creates a "Brand", equivalent to the OAuth consent screen on GCP UI
# Creates a "Brand", equivalent to the OAuth consent screen on Cloud console
resource "google_iap_brand" "project_brand" {
count = var.brand != "" ? 1 : 0
support_email = var.support_email
@@ -70,7 +70,7 @@ resource "google_iap_brand" "project_brand" {

# Creates the OAuth client used in IAP
resource "google_iap_client" "iap_oauth_client" {
count = var.client_id != "" ? 0 : 1
count = var.client_id != "" ? 0 : 1
display_name = "Jupyter-Client"
brand = "projects/${data.google_project.project.number}/brands/${data.google_project.project.number}"
}
@@ -93,17 +93,26 @@ module "iap_auth" {
project_id = var.project_id
namespace = var.namespace
service_name = var.service_name
client_id = var.client_id != "" ? var.client_id : google_iap_client.iap_oauth_client[0].client_id
client_secret = var.client_id != "" ? var.client_secret : google_iap_client.iap_oauth_client[0].secret
client_id = var.client_id != "" ? var.client_id : google_iap_client.iap_oauth_client[0].client_id
client_secret = var.client_id != "" ? var.client_secret : google_iap_client.iap_oauth_client[0].secret
url_domain_addr = var.url_domain_addr
url_domain_name = var.url_domain_name

depends_on = [
helm_release.jupyterhub,
kubernetes_namespace.namespace,
module.workload_identity_service_account
]
}

module "workload_identity_service_account" {
source = "./service_accounts_module"

project_id = var.project_id
namespace = var.namespace
service_account = "jupyter-service-account"
}

resource "helm_release" "jupyterhub" {
name = "jupyterhub"
repository = "https://jupyterhub.github.io/helm-chart"
51 changes: 51 additions & 0 deletions jupyter-on-gke/service_accounts_module/service_accounts.tf
@@ -0,0 +1,51 @@
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

resource "google_service_account" "sa" {
  project      = var.project_id
  account_id   = var.service_account
  display_name = "Jupyterhub service account"
}

resource "google_service_account_iam_binding" "workload-identity-user" {
  service_account_id = google_service_account.sa.name
  role               = "roles/iam.workloadIdentityUser"

  members = [
    "serviceAccount:${var.project_id}.svc.id.goog[${var.namespace}/default]",
  ]
}

resource "google_project_iam_binding" "cloud_role" {
  project = var.project_id
  for_each = toset([
    "roles/storage.admin",
    "roles/artifactregistry.reader"
  ])
  role = each.key
  members = [
    "serviceAccount:${var.project_id}.svc.id.goog[${var.namespace}/default]",
  ]
}

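resource "kubernetes_annotations" "default" {
  api_version = "v1"
  kind        = "ServiceAccount"
  metadata {
    name = "default"
    # Target the same namespace as the workload identity binding above;
    # without this, the provider's default namespace is used.
    namespace = var.namespace
  }
  annotations = {
    "iam.gke.io/gcp-service-account" = "${google_service_account.sa.account_id}@${var.project_id}.iam.gserviceaccount.com"
  }
}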
30 changes: 30 additions & 0 deletions jupyter-on-gke/service_accounts_module/variables.tf
@@ -0,0 +1,30 @@
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

variable "project_id" {
  type        = string
  description = "GCP project id"
}

variable "namespace" {
  type        = string
  description = "Kubernetes namespace where resources are deployed"
  default     = "jup"
}

variable "service_account" {
  type        = string
  description = "Google Cloud IAM service account for authenticating with GCP services"
  default     = "jup-system-account"
}
26 changes: 26 additions & 0 deletions jupyter-on-gke/service_accounts_module/versions.tf
@@ -0,0 +1,26 @@
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "4.56.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "2.18.1"
    }
  }
}
11 changes: 11 additions & 0 deletions jupyter-on-gke/storage.md
@@ -0,0 +1,11 @@
# Persistent Storage

## GCSFuse

**Important Note:** To use this option, a GCS bucket named in the format `gcsfuse-{username}` must already exist in the project.

GCSFuse allows users to mount GCS buckets as their local filesystem. This option also makes the files easy to access from the Cloud console:

![Profiles Page](images/gcs_bucket.png)

Since this bucket lives in GCS, it has built-in permission control and can be accessed from outside the cluster.
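
As an illustration of access from outside the cluster, the sketch below prints a `gcloud storage ls` command for a user's bucket. The helper names and the user `alice` are hypothetical. It assumes the caller already has IAM permission to read the bucket and that the username maps directly onto the bucket name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: bucket URL for a notebook user, following the
# gcsfuse-{username} naming convention.
bucket_url_for_user() {
  printf 'gs://gcsfuse-%s\n' "$1"
}

# Dry-run helper: print the command that would list the user's files
# from any machine with gcloud configured, not only from inside the cluster.
print_list_cmd() {
  printf 'gcloud storage ls %s/\n' "$(bucket_url_for_user "$1")"
}

# Example: show the listing command for a user named "alice".
print_list_cmd "alice"
```

Files written from the notebook under `/home/jovyan` would be expected to appear as objects in this bucket, since that is the mount path used by the GCSFuse profile.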