update default RayCluster name to 'ray-cluster' (#417)
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
andrewsykim authored Mar 21, 2024
1 parent d42f79d commit bb0524d
Showing 26 changed files with 66 additions and 66 deletions.
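Every string touched by this commit appears to follow one pattern: the cluster name, then `-kuberay-head-svc` for the head Service, or `-kuberay-head-` and `-kuberay-worker-workergroup-` for pod name prefixes. The sketch below only illustrates that pattern as inferred from the changed lines. The helper names are hypothetical and not part of the repository, and the `workergroup` suffix is assumed to match the worker group name used in these examples.

```python
# Hypothetical helpers; the naming pattern is inferred from the strings in this diff.
DEFAULT_CLUSTER_NAME = "ray-cluster"  # new default introduced by this commit


def head_service_name(cluster_name: str = DEFAULT_CLUSTER_NAME) -> str:
    """Return the Ray head Service name, e.g. ray-cluster-kuberay-head-svc."""
    return f"{cluster_name}-kuberay-head-svc"


def ray_client_address(cluster_name: str = DEFAULT_CLUSTER_NAME, port: int = 10001) -> str:
    """Return the ray:// address used by ray.init() in the notebooks."""
    return f"ray://{head_service_name(cluster_name)}:{port}"


def pod_prefixes(cluster_name: str = DEFAULT_CLUSTER_NAME) -> tuple[str, str]:
    """Return the (head, worker) pod name prefixes shown by kubectl get pods."""
    return (
        f"{cluster_name}-kuberay-head-",
        f"{cluster_name}-kuberay-worker-workergroup-",
    )


if __name__ == "__main__":
    print(ray_client_address())                   # address after this commit
    print(ray_client_address("example-cluster"))  # address before this commit
```

Anything that overrides `ray_cluster_name` keeps its own prefix, so only callers that relied on the old default need their addresses updated.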
2 changes: 1 addition & 1 deletion applications/rag/README.md
@@ -75,7 +75,7 @@ Ensure your k8s client is using the correct cluster by running:
gcloud container clusters get-credentials ${CLUSTER_NAME:?} --location ${CLUSTER_REGION:?}
```

1. Verify Kuberay is setup: run `kubectl get pods -n ${NAMESPACE:?}`. There should be a Ray head (and Ray worker pod on GKE Standard only) in `Running` state (prefixed by `example-cluster-kuberay-head-` and `example-cluster-kuberay-worker-workergroup-`).
1. Verify Kuberay is setup: run `kubectl get pods -n ${NAMESPACE:?}`. There should be a Ray head (and Ray worker pod on GKE Standard only) in `Running` state (prefixed by `ray-cluster-kuberay-head-` and `ray-cluster-kuberay-worker-workergroup-`).

2. Verify Jupyterhub service is setup:
* Fetch the service IP/Domain:
@@ -261,7 +261,7 @@
"source": [
"import ray, time\n",
"from ray.job_submission import JobSubmissionClient\n",
"client = JobSubmissionClient(\"ray://example-cluster-kuberay-head-svc:10001\")"
"client = JobSubmissionClient(\"ray://ray-cluster-kuberay-head-svc:10001\")"
]
},
{
@@ -230,7 +230,7 @@
"source": [
"import ray\n",
"from ray.job_submission import JobSubmissionClient\n",
"client = JobSubmissionClient(\"ray://example-cluster-kuberay-head-svc:10001\")"
"client = JobSubmissionClient(\"ray://ray-cluster-kuberay-head-svc:10001\")"
]
},
{
@@ -268,9 +268,9 @@
"metadata": {},
"outputs": [],
"source": [
"# Need to run kubectl port-forward -n <namespace> service/example-cluster-kuberay-head-svc 8265:8265 to see the UI\n",
"# Need to run kubectl port-forward -n <namespace> service/ray-cluster-kuberay-head-svc 8265:8265 to see the UI\n",
"# Fetch job status\n",
"!ray job status {job_id} --address \"ray://example-cluster-kuberay-head-svc:10001\" "
"!ray job status {job_id} --address \"ray://ray-cluster-kuberay-head-svc:10001\" "
]
}
],
2 changes: 1 addition & 1 deletion applications/ray/kuberay-tpu-webhook/tests/tpu-test.py
@@ -9,7 +9,7 @@ def tpu_cores():


ray.init(
address="ray://example-cluster-kuberay-head-svc:10001",
address="ray://ray-cluster-kuberay-head-svc:10001",
runtime_env={
"pip": [
"jax[tpu]==0.4.11",
@@ -57,14 +57,14 @@ pod/kuberay-operator-64b7b88759-5ppfw 1/1 Running 0 95
```
$ kubectl get all -n example
NAME READY STATUS RESTARTS AGE
pod/example-cluster-kuberay-head-9x2q6 2/2 Running 0 3m12s
pod/example-cluster-kuberay-worker-workergroup-95nm2 2/2 Running 0 3m12s
pod/example-cluster-kuberay-worker-workergroup-tfg9n 2/2 Running 0 3m12s
pod/ray-cluster-kuberay-head-9x2q6 2/2 Running 0 3m12s
pod/ray-cluster-kuberay-worker-workergroup-95nm2 2/2 Running 0 3m12s
pod/ray-cluster-kuberay-worker-workergroup-tfg9n 2/2 Running 0 3m12s
pod/kuberay-operator-64b7b88759-5ppfw 1/1 Running 0 4m4s
pod/tensorflow-0 2/2 Running 0 16s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/example-cluster-kuberay-head-svc ClusterIP 10.8.10.33 <none> 10001/TCP,8265/TCP,8080/TCP,6379/TCP,8000/TCP 3m12s
service/ray-cluster-kuberay-head-svc ClusterIP 10.8.10.33 <none> 10001/TCP,8265/TCP,8080/TCP,6379/TCP,8000/TCP 3m12s
service/kuberay-operator ClusterIP 10.8.14.245 <none> 8080/TCP 4m4s
service/tensorflow ClusterIP None <none> 8888/TCP 16s
service/tensorflow-jupyter LoadBalancer 10.8.3.9 <pending> 80:31891/TCP 16s
@@ -96,7 +96,7 @@ http://tensorflow-0:8888/?token=<TOKEN> :: /home/jovyan
12. Follow the comments and execute the cells in the notebook to run a distributed training job and then inference on the tuned model
13. Port forward the ray service port to examine the ray dashboard for jobs progress details, The dashboard is reachable at localhost:8286 in the local browser
```
kubectl port-forward -n example service/example-cluster-kuberay-head-svc 8265:8265
kubectl port-forward -n example service/ray-cluster-kuberay-head-svc 8265:8265
```
14. During an ongoing traing, the pod resource usage of CPU, Memory, GPU, GPU Memory can be visualized with the GKE Cloud Console for the workloads
example ![Ray Head resources](https://github.com/GoogleCloudPlatform/ai-on-gke/blob/main/ray-on-gke/raytrain-examples/images/ray-head-resources.png) and ![Ray Worker resources](https://github.com/GoogleCloudPlatform/ai-on-gke/blob/main/ray-on-gke/raytrain-examples/images/ray-worker-resources.png)
@@ -13,7 +13,7 @@
# limitations under the License.

resource "helm_release" "ray-cluster" {
name = "example-cluster"
name = "ray-cluster"
repository = "https://ray-project.github.io/kuberay-helm/"
chart = "ray-cluster"
namespace = var.namespace
2 changes: 1 addition & 1 deletion applications/ray/variables.tf
@@ -77,7 +77,7 @@ variable "create_ray_cluster" {

variable "ray_cluster_name" {
type = string
default = "example-cluster"
default = "ray-cluster"
}

variable "enable_gpu" {
2 changes: 1 addition & 1 deletion applications/ray/workloads.tfvars
@@ -38,5 +38,5 @@ workload_identity_service_account = "ray-service-account"
create_gcs_bucket = true
gcs_bucket = "ray-bucket-zydg"
create_ray_cluster = true
ray_cluster_name = "example-cluster"
ray_cluster_name = "ray-cluster"
enable_grafana_on_ray_dashboard = false
4 changes: 2 additions & 2 deletions cloudbuild.yaml
@@ -96,7 +96,7 @@ steps:
# Make sure pods are running
kubectl wait --all pods -n ml-$SHORT_SHA-$_BUILD_ID --for=condition=Ready --timeout=300s
kubectl port-forward -n ml-$SHORT_SHA-$_BUILD_ID service/example-cluster-kuberay-head-svc 8265:8265 &
kubectl port-forward -n ml-$SHORT_SHA-$_BUILD_ID service/ray-cluster-kuberay-head-svc 8265:8265 &
# Wait port-forwarding to take its place
sleep 5s
@@ -218,7 +218,7 @@ steps:
# Validate Ray: Make sure pods are running
kubectl wait --all pods -n rag-$SHORT_SHA-$_BUILD_ID --for=condition=Ready --timeout=300s
kubectl port-forward -n rag-$SHORT_SHA-$_BUILD_ID service/example-cluster-kuberay-head-svc 8265:8265 &
kubectl port-forward -n rag-$SHORT_SHA-$_BUILD_ID service/ray-cluster-kuberay-head-svc 8265:8265 &
# Wait port-forwarding to take its place
sleep 5s
2 changes: 1 addition & 1 deletion modules/kuberay-cluster/variables.tf
@@ -20,7 +20,7 @@ variable "project_id" {
variable "name" {
type = string
description = "Name of the ray cluster"
default = "example-cluster"
default = "ray-cluster"
}

variable "db_region" {
6 changes: 3 additions & 3 deletions ray-on-gke/README.md
@@ -26,7 +26,7 @@ Validate that the RayCluster is ready:
```
$ kubectl get raycluster
NAME DESIRED WORKERS AVAILABLE WORKERS STATUS AGE
example-cluster-kuberay 1 1 ready 3m41s
ray-cluster-kuberay 1 1 ready 3m41s
```

### Install Ray
@@ -39,7 +39,7 @@ To submit a Ray job, first establish a connection to the Ray head. For this exam
to connect to the Ray head via localhost.

```bash
$ kubectl -n ml port-forward service/example-cluster-kuberay-head-svc 8265 &
$ kubectl -n ml port-forward service/ray-cluster-kuberay-head-svc 8265 &
```

Submit a Ray job that prints resources available in your Ray cluster:
@@ -79,7 +79,7 @@ To use the client, first establish a connection to the Ray head. For this exampl
to connect to the Ray head Service via localhost.
```bash
$ kubectl -n ml port-forward service/example-cluster-kuberay-head-svc 10001 &
$ kubectl -n ml port-forward service/ray-cluster-kuberay-head-svc 10001 &
```
Next, define a Python script containing remote code you want to run on your Ray cluster. Similar to the previous example,
4 changes: 2 additions & 2 deletions ray-on-gke/examples/notebooks/gpt-j-online.ipynb
@@ -142,7 +142,7 @@
"outputs": [],
"source": [
"ray.init(\n",
" address=\"ray://example-cluster-kuberay-head-svc:10001\",\n",
" address=\"ray://ray-cluster-kuberay-head-svc:10001\",\n",
" runtime_env={\n",
" \"pip\": [\n",
" \"IPython\",\n",
@@ -248,7 +248,7 @@
"\n",
"sample_input = {\"text\": prompt}\n",
"\n",
"output = requests.post(\"http://example-cluster-kuberay-head-svc:8000/\", json=[sample_input]).json()\n",
"output = requests.post(\"http://ray-cluster-kuberay-head-svc:8000/\", json=[sample_input]).json()\n",
"print(output)"
]
},
2 changes: 1 addition & 1 deletion ray-on-gke/examples/notebooks/jax-tpu.ipynb
@@ -64,7 +64,7 @@
"import ray\n",
"\n",
"ray.init(\n",
" address=\"ray://example-cluster-kuberay-head-svc:10001\",\n",
" address=\"ray://ray-cluster-kuberay-head-svc:10001\",\n",
" runtime_env={\n",
" \"pip\": [\n",
" \"jax[tpu]==0.4.11\",\n",
2 changes: 1 addition & 1 deletion ray-on-gke/examples/notebooks/ray-dist-mnist.ipynb
@@ -61,7 +61,7 @@
"outputs": [],
"source": [
"ray.init(\n",
" address=\"ray://example-cluster-kuberay-head-svc:10001\",\n",
" address=\"ray://ray-cluster-kuberay-head-svc:10001\",\n",
" runtime_env={\n",
" \"pip\": [\n",
" \"IPython\",\n",
@@ -24,7 +24,7 @@
"import ray\n",
"\n",
"ray.init(\n",
" address=\"ray://example-cluster-kuberay-head-svc:10001\",\n",
" address=\"ray://ray-cluster-kuberay-head-svc:10001\",\n",
" runtime_env={\n",
" \"pip\": [\n",
" \"IPython\",\n",
22 changes: 11 additions & 11 deletions ray-on-gke/examples/notebooks/ray_basic.ipynb
@@ -133,7 +133,7 @@
}
],
"source": [
"ray.init(\"ray://example-cluster-kuberay-head-svc:10001\")"
"ray.init(\"ray://ray-cluster-kuberay-head-svc:10001\")"
]
},
{
@@ -147,25 +147,25 @@
"output_type": "stream",
"text": [
"Iteration 0\n",
"Counter({('example-cluster-kuberay-worker-workergroup-9bnxn', 'example-cluster-kuberay-worker-workergroup-9bnxn'): 38, ('example-cluster-kuberay-worker-workergroup-9bnxn', 'example-cluster-kuberay-head-fd9g6'): 27, ('example-cluster-kuberay-head-fd9g6', 'example-cluster-kuberay-head-fd9g6'): 26, ('example-cluster-kuberay-head-fd9g6', 'example-cluster-kuberay-worker-workergroup-9bnxn'): 9})\n",
"Counter({('ray-cluster-kuberay-worker-workergroup-9bnxn', 'ray-cluster-kuberay-worker-workergroup-9bnxn'): 38, ('ray-cluster-kuberay-worker-workergroup-9bnxn', 'ray-cluster-kuberay-head-fd9g6'): 27, ('ray-cluster-kuberay-head-fd9g6', 'ray-cluster-kuberay-head-fd9g6'): 26, ('ray-cluster-kuberay-head-fd9g6', 'ray-cluster-kuberay-worker-workergroup-9bnxn'): 9})\n",
"Iteration 1\n",
"Counter({('example-cluster-kuberay-head-fd9g6', 'example-cluster-kuberay-head-fd9g6'): 31, ('example-cluster-kuberay-worker-workergroup-9bnxn', 'example-cluster-kuberay-worker-workergroup-9bnxn'): 26, ('example-cluster-kuberay-head-fd9g6', 'example-cluster-kuberay-worker-workergroup-9bnxn'): 23, ('example-cluster-kuberay-worker-workergroup-9bnxn', 'example-cluster-kuberay-head-fd9g6'): 20})\n",
"Counter({('ray-cluster-kuberay-head-fd9g6', 'ray-cluster-kuberay-head-fd9g6'): 31, ('ray-cluster-kuberay-worker-workergroup-9bnxn', 'ray-cluster-kuberay-worker-workergroup-9bnxn'): 26, ('ray-cluster-kuberay-head-fd9g6', 'ray-cluster-kuberay-worker-workergroup-9bnxn'): 23, ('ray-cluster-kuberay-worker-workergroup-9bnxn', 'ray-cluster-kuberay-head-fd9g6'): 20})\n",
"Iteration 2\n",
"Counter({('example-cluster-kuberay-head-fd9g6', 'example-cluster-kuberay-head-fd9g6'): 33, ('example-cluster-kuberay-worker-workergroup-9bnxn', 'example-cluster-kuberay-worker-workergroup-9bnxn'): 25, ('example-cluster-kuberay-worker-workergroup-9bnxn', 'example-cluster-kuberay-head-fd9g6'): 22, ('example-cluster-kuberay-head-fd9g6', 'example-cluster-kuberay-worker-workergroup-9bnxn'): 20})\n",
"Counter({('ray-cluster-kuberay-head-fd9g6', 'ray-cluster-kuberay-head-fd9g6'): 33, ('ray-cluster-kuberay-worker-workergroup-9bnxn', 'ray-cluster-kuberay-worker-workergroup-9bnxn'): 25, ('ray-cluster-kuberay-worker-workergroup-9bnxn', 'ray-cluster-kuberay-head-fd9g6'): 22, ('ray-cluster-kuberay-head-fd9g6', 'ray-cluster-kuberay-worker-workergroup-9bnxn'): 20})\n",
"Iteration 3\n",
"Counter({('example-cluster-kuberay-head-fd9g6', 'example-cluster-kuberay-head-fd9g6'): 32, ('example-cluster-kuberay-worker-workergroup-9bnxn', 'example-cluster-kuberay-head-fd9g6'): 26, ('example-cluster-kuberay-head-fd9g6', 'example-cluster-kuberay-worker-workergroup-9bnxn'): 23, ('example-cluster-kuberay-worker-workergroup-9bnxn', 'example-cluster-kuberay-worker-workergroup-9bnxn'): 19})\n",
"Counter({('ray-cluster-kuberay-head-fd9g6', 'ray-cluster-kuberay-head-fd9g6'): 32, ('ray-cluster-kuberay-worker-workergroup-9bnxn', 'ray-cluster-kuberay-head-fd9g6'): 26, ('ray-cluster-kuberay-head-fd9g6', 'ray-cluster-kuberay-worker-workergroup-9bnxn'): 23, ('ray-cluster-kuberay-worker-workergroup-9bnxn', 'ray-cluster-kuberay-worker-workergroup-9bnxn'): 19})\n",
"Iteration 4\n",
"Counter({('example-cluster-kuberay-head-fd9g6', 'example-cluster-kuberay-head-fd9g6'): 30, ('example-cluster-kuberay-worker-workergroup-9bnxn', 'example-cluster-kuberay-head-fd9g6'): 27, ('example-cluster-kuberay-worker-workergroup-9bnxn', 'example-cluster-kuberay-worker-workergroup-9bnxn'): 23, ('example-cluster-kuberay-head-fd9g6', 'example-cluster-kuberay-worker-workergroup-9bnxn'): 20})\n",
"Counter({('ray-cluster-kuberay-head-fd9g6', 'ray-cluster-kuberay-head-fd9g6'): 30, ('ray-cluster-kuberay-worker-workergroup-9bnxn', 'ray-cluster-kuberay-head-fd9g6'): 27, ('ray-cluster-kuberay-worker-workergroup-9bnxn', 'ray-cluster-kuberay-worker-workergroup-9bnxn'): 23, ('ray-cluster-kuberay-head-fd9g6', 'ray-cluster-kuberay-worker-workergroup-9bnxn'): 20})\n",
"Iteration 5\n",
"Counter({('example-cluster-kuberay-head-fd9g6', 'example-cluster-kuberay-head-fd9g6'): 41, ('example-cluster-kuberay-worker-workergroup-9bnxn', 'example-cluster-kuberay-worker-workergroup-9bnxn'): 32, ('example-cluster-kuberay-worker-workergroup-9bnxn', 'example-cluster-kuberay-head-fd9g6'): 15, ('example-cluster-kuberay-head-fd9g6', 'example-cluster-kuberay-worker-workergroup-9bnxn'): 12})\n",
"Counter({('ray-cluster-kuberay-head-fd9g6', 'ray-cluster-kuberay-head-fd9g6'): 41, ('ray-cluster-kuberay-worker-workergroup-9bnxn', 'ray-cluster-kuberay-worker-workergroup-9bnxn'): 32, ('ray-cluster-kuberay-worker-workergroup-9bnxn', 'ray-cluster-kuberay-head-fd9g6'): 15, ('ray-cluster-kuberay-head-fd9g6', 'ray-cluster-kuberay-worker-workergroup-9bnxn'): 12})\n",
"Iteration 6\n",
"Counter({('example-cluster-kuberay-head-fd9g6', 'example-cluster-kuberay-head-fd9g6'): 30, ('example-cluster-kuberay-head-fd9g6', 'example-cluster-kuberay-worker-workergroup-9bnxn'): 28, ('example-cluster-kuberay-worker-workergroup-9bnxn', 'example-cluster-kuberay-worker-workergroup-9bnxn'): 21, ('example-cluster-kuberay-worker-workergroup-9bnxn', 'example-cluster-kuberay-head-fd9g6'): 21})\n",
"Counter({('ray-cluster-kuberay-head-fd9g6', 'ray-cluster-kuberay-head-fd9g6'): 30, ('ray-cluster-kuberay-head-fd9g6', 'ray-cluster-kuberay-worker-workergroup-9bnxn'): 28, ('ray-cluster-kuberay-worker-workergroup-9bnxn', 'ray-cluster-kuberay-worker-workergroup-9bnxn'): 21, ('ray-cluster-kuberay-worker-workergroup-9bnxn', 'ray-cluster-kuberay-head-fd9g6'): 21})\n",
"Iteration 7\n",
"Counter({('example-cluster-kuberay-head-fd9g6', 'example-cluster-kuberay-head-fd9g6'): 33, ('example-cluster-kuberay-worker-workergroup-9bnxn', 'example-cluster-kuberay-head-fd9g6'): 24, ('example-cluster-kuberay-worker-workergroup-9bnxn', 'example-cluster-kuberay-worker-workergroup-9bnxn'): 23, ('example-cluster-kuberay-head-fd9g6', 'example-cluster-kuberay-worker-workergroup-9bnxn'): 20})\n",
"Counter({('ray-cluster-kuberay-head-fd9g6', 'ray-cluster-kuberay-head-fd9g6'): 33, ('ray-cluster-kuberay-worker-workergroup-9bnxn', 'ray-cluster-kuberay-head-fd9g6'): 24, ('ray-cluster-kuberay-worker-workergroup-9bnxn', 'ray-cluster-kuberay-worker-workergroup-9bnxn'): 23, ('ray-cluster-kuberay-head-fd9g6', 'ray-cluster-kuberay-worker-workergroup-9bnxn'): 20})\n",
"Iteration 8\n",
"Counter({('example-cluster-kuberay-head-fd9g6', 'example-cluster-kuberay-head-fd9g6'): 38, ('example-cluster-kuberay-worker-workergroup-9bnxn', 'example-cluster-kuberay-worker-workergroup-9bnxn'): 29, ('example-cluster-kuberay-head-fd9g6', 'example-cluster-kuberay-worker-workergroup-9bnxn'): 18, ('example-cluster-kuberay-worker-workergroup-9bnxn', 'example-cluster-kuberay-head-fd9g6'): 15})\n",
"Counter({('ray-cluster-kuberay-head-fd9g6', 'ray-cluster-kuberay-head-fd9g6'): 38, ('ray-cluster-kuberay-worker-workergroup-9bnxn', 'ray-cluster-kuberay-worker-workergroup-9bnxn'): 29, ('ray-cluster-kuberay-head-fd9g6', 'ray-cluster-kuberay-worker-workergroup-9bnxn'): 18, ('ray-cluster-kuberay-worker-workergroup-9bnxn', 'ray-cluster-kuberay-head-fd9g6'): 15})\n",
"Iteration 9\n",
"Counter({('example-cluster-kuberay-head-fd9g6', 'example-cluster-kuberay-head-fd9g6'): 28, ('example-cluster-kuberay-worker-workergroup-9bnxn', 'example-cluster-kuberay-head-fd9g6'): 27, ('example-cluster-kuberay-head-fd9g6', 'example-cluster-kuberay-worker-workergroup-9bnxn'): 26, ('example-cluster-kuberay-worker-workergroup-9bnxn', 'example-cluster-kuberay-worker-workergroup-9bnxn'): 19})\n",
"Counter({('ray-cluster-kuberay-head-fd9g6', 'ray-cluster-kuberay-head-fd9g6'): 28, ('ray-cluster-kuberay-worker-workergroup-9bnxn', 'ray-cluster-kuberay-head-fd9g6'): 27, ('ray-cluster-kuberay-head-fd9g6', 'ray-cluster-kuberay-worker-workergroup-9bnxn'): 26, ('ray-cluster-kuberay-worker-workergroup-9bnxn', 'ray-cluster-kuberay-worker-workergroup-9bnxn'): 19})\n",
"Success!\n"
]
}