
Merge branch 'main' into etang/finetuning-logging
SumanthRH authored Sep 18, 2024
2 parents c3962ad + 7c03379 commit c6c3b7b
Showing 25 changed files with 791 additions and 447 deletions.
4 changes: 3 additions & 1 deletion .github/workflows/pre-commit.yaml
@@ -11,10 +11,12 @@ jobs:
steps:
- uses: actions/checkout@v3
- uses: actions/setup-python@v3
with:
python-version: '3.9'

# Install pre-commit dependencies
- name: Install pre-commit
-run: pip install pre-commit jupyter
+run: pip install pre-commit==3.8.0 jupyter==1.1.1

# Run pre-commit hooks with verbose logging
- name: Run pre-commit
2 changes: 1 addition & 1 deletion templates/batch-llm/README.ipynb
@@ -423,7 +423,7 @@
"### Monitoring Dataset execution\n",
"We can use the Ray Dashboard to monitor the Dataset execution. In the Ray Dashboard tab, navigate to the Job page and open the \"Ray Data Overview\" section. Click on the link for the running job, and open the \"Ray Data Overview\" section to view the details of the batch inference execution:\n",
"\n",
-"<img src=\"assets/ray-data-jobs.png\" width=900px/>"
+"<img src=\"assets/ray-data-jobs.png\" width=900px />"
]
},
{
2 changes: 1 addition & 1 deletion templates/batch-llm/README.md
@@ -288,7 +288,7 @@ print(f"Batch inference result is written into {output_path}.")
### Monitoring Dataset execution
We can use the Ray Dashboard to monitor the Dataset execution. In the Ray Dashboard tab, navigate to the Job page and open the "Ray Data Overview" section. Click on the link for the running job, and open the "Ray Data Overview" section to view the details of the batch inference execution:

-<img src="https://raw.githubusercontent.com/anyscale/templates/main/templates/batch-llm/assets/ray-data-jobs.png" width=900px/>
+<img src="https://raw.githubusercontent.com/anyscale/templates/main/templates/batch-llm/assets/ray-data-jobs.png" width=900px />

### Handling GPU out-of-memory failures
If you run into CUDA out of memory, your batch size is likely too large. Decrease the batch size as described above.
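A simple recovery policy, sketched here as a hypothetical helper rather than anything this template ships, is to halve the batch size after each out-of-memory failure until a floor is reached:

```python
def next_batch_size_after_oom(batch_size, minimum=1):
    # Back off after a CUDA out-of-memory error: halve the batch size,
    # never going below `minimum`. Raise if already at the floor, since
    # retrying the same size would just fail again.
    if batch_size <= minimum:
        raise RuntimeError(
            f"Batch size already at minimum ({minimum}); cannot reduce further."
        )
    return max(batch_size // 2, minimum)
```

For example, a batch size of 64 would step down through 32, 16, 8, and so on until inference fits in GPU memory.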
607 changes: 477 additions & 130 deletions templates/e2e-llm-workflows/README.ipynb

Large diffs are not rendered by default.

375 changes: 197 additions & 178 deletions templates/e2e-llm-workflows/README.md

Large diffs are not rendered by default.

3 changes: 2 additions & 1 deletion templates/e2e-llm-workflows/deploy/jobs/ft.yaml
@@ -1,5 +1,6 @@
name: e2e-llm-workflows
entrypoint: llmforge anyscale finetune configs/training/lora/llama-3-8b.yaml
-image_uri: localhost:5555/anyscale/llm-forge:0.5.3
+image_uri: localhost:5555/anyscale/llm-forge:0.5.4
requirements: []
max_retries: 1
excludes: ["assets"]
75 changes: 0 additions & 75 deletions templates/e2e-llm-workflows/src/ft.py

This file was deleted.

52 changes: 51 additions & 1 deletion templates/e2e-llm-workflows/src/utils.py
@@ -1,8 +1,19 @@
from google.cloud import storage
from contextlib import contextmanager
import os
from tempfile import TemporaryDirectory
import boto3
from urllib.parse import urlparse
from ray.data import Dataset


-def download_files_from_bucket(bucket, path, local_dir):
+def download_files_from_s3(s3_uri, local_dir):
parsed_uri = urlparse(s3_uri)
if parsed_uri.scheme != "s3":
raise ValueError(f"Expected S3 URI, got {s3_uri}")
bucket = parsed_uri.netloc
path = parsed_uri.path.lstrip("/")

s3 = boto3.client('s3')
paginator = s3.get_paginator('list_objects_v2')
page_iterator = paginator.paginate(Bucket=bucket, Prefix=path)
@@ -13,3 +24,42 @@ def download_files_from_bucket(bucket, path, local_dir):
os.makedirs(os.path.dirname(local_path), exist_ok=True)
s3.download_file(bucket, key, local_path)
print(f"Downloaded {key} to {local_path}")

def download_files_from_gcs(gcs_uri, local_dir):
parsed_uri = urlparse(gcs_uri)
if parsed_uri.scheme != "gs":
raise ValueError(f"Expected GCS URI, got {gcs_uri}")
bucket_name = parsed_uri.netloc
prefix = parsed_uri.path.lstrip("/")

storage_client = storage.Client()
bucket = storage_client.bucket(bucket_name)
blobs = bucket.list_blobs(prefix=prefix)
for blob in blobs:
# Skip in case the blob is the root folder
if blob.name.rstrip("/") == prefix:
continue
local_path = os.path.join(local_dir, blob.name)
os.makedirs(os.path.dirname(local_path), exist_ok=True)
blob.download_to_filename(local_path)
print(f"Downloaded {blob.name} to {local_path}")

def download_files_from_remote(uri, local_dir):
parsed_uri = urlparse(uri)
if parsed_uri.scheme == "gs":
download_files_from_gcs(uri, local_dir)
elif parsed_uri.scheme == "s3":
download_files_from_s3(uri, local_dir)
else:
raise ValueError(f"Expected S3 or GCS URI, got {uri}")


@contextmanager
def get_dataset_file_path(dataset: Dataset):
"""Write a Ray `Dataset` to a single temporary JSON file on disk
and yield the path to the file."""
with TemporaryDirectory() as temp_path:
dataset.repartition(1).write_json(temp_path)
assert len(os.listdir(temp_path)) == 1, "The dataset should be written to a single file"
dataset_file_path = f"{temp_path}/{os.listdir(temp_path)[0]}"
yield dataset_file_path
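The new helpers in `utils.py` combine two small patterns worth noting: dispatch on the URI scheme (`s3://` vs. `gs://`) and a context manager that materializes data into a temporary file whose lifetime is tied to the `with` block. A minimal, self-contained sketch of both, using plain JSON rows instead of a Ray `Dataset` and a hypothetical `classify_remote_uri` in place of the real downloaders:

```python
import json
import os
from contextlib import contextmanager
from tempfile import TemporaryDirectory
from urllib.parse import urlparse


def classify_remote_uri(uri):
    # Mirrors download_files_from_remote's routing: pick a backend from
    # the URI scheme, rejecting anything that is not s3:// or gs://.
    scheme = urlparse(uri).scheme
    if scheme == "s3":
        return "s3"
    if scheme == "gs":
        return "gcs"
    raise ValueError(f"Expected S3 or GCS URI, got {uri}")


@contextmanager
def rows_as_json_file(rows):
    # Analogous to get_dataset_file_path: write the rows to a single
    # JSON-lines file in a temporary directory and yield its path.
    # The directory (and the file) are deleted when the context exits.
    with TemporaryDirectory() as temp_path:
        file_path = os.path.join(temp_path, "data.jsonl")
        with open(file_path, "w") as f:
            for row in rows:
                f.write(json.dumps(row) + "\n")
        yield file_path
```

Downstream code gets an ordinary file path inside the `with` block, and nothing leaks afterwards: for instance, `classify_remote_uri("gs://my-bucket/data")` returns `"gcs"`, and the path yielded by `rows_as_json_file` no longer exists once the block ends.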
6 changes: 3 additions & 3 deletions templates/endpoints_v2/README.ipynb
@@ -186,11 +186,11 @@
"source": [
"After the command runs, click the deploy notification (or navigate to ``Home > Services``) to access the Service UI:\n",
"\n",
-"<img src=\"assets/service-notify.png\" width=500px/>\n",
+"<img src=\"assets/service-notify.png\" width=500px />\n",
"\n",
"Navigate to the Service UI and wait for the service to reach \"Active\". It will begin in \"Starting\" state:\n",
"\n",
-"<img src=\"assets/service-starting.png\" width=600px/>"
+"<img src=\"assets/service-starting.png\" width=600px />"
]
},
{
@@ -204,7 +204,7 @@
"\n",
"You can also find this information by clicking the \"Query\" button in the Service UI.\n",
"\n",
-"<img src=\"assets/service-query.png\" width=600px/>"
+"<img src=\"assets/service-query.png\" width=600px />"
]
},
{
6 changes: 3 additions & 3 deletions templates/endpoints_v2/README.md
@@ -123,11 +123,11 @@ To deploy an application with one model as an Anyscale Service, update the file

After the command runs, click the deploy notification (or navigate to ``Home > Services``) to access the Service UI:

-<img src="https://raw.githubusercontent.com/anyscale/templates/main/templates/endpoints_v2/assets/service-notify.png" width=500px/>
+<img src="https://raw.githubusercontent.com/anyscale/templates/main/templates/endpoints_v2/assets/service-notify.png" width=500px />

Navigate to the Service UI and wait for the service to reach "Active". It will begin in "Starting" state:

-<img src="https://raw.githubusercontent.com/anyscale/templates/main/templates/endpoints_v2/assets/service-starting.png" width=600px/>
+<img src="https://raw.githubusercontent.com/anyscale/templates/main/templates/endpoints_v2/assets/service-starting.png" width=600px />


## Step 5 - Query the service endpoint
@@ -136,7 +136,7 @@ The above command should print something like `(anyscale +2.9s) curl -H 'Authori

You can also find this information by clicking the "Query" button in the Service UI.

-<img src="https://raw.githubusercontent.com/anyscale/templates/main/templates/endpoints_v2/assets/service-query.png" width=600px/>
+<img src="https://raw.githubusercontent.com/anyscale/templates/main/templates/endpoints_v2/assets/service-query.png" width=600px />


```python
4 changes: 2 additions & 2 deletions templates/intro-batch-inference/README.ipynb
@@ -90,7 +90,7 @@
"\n",
"In the Ray Dashboard tab, navigate to the Job page and open the \"Ray Data Overview\" section to view the details of the batch inference execution:\n",
"\n",
-"<img src=\"assets/ray-data-job.png\" width=800px/>\n",
+"<img src=\"assets/ray-data-job.png\" width=800px />\n",
"\n"
]
},
@@ -134,7 +134,7 @@
"\n",
"The remaining is the same as in the code we ran above. To test this out, first make sure to either enable *Auto-select worker nodes* or configure your workspace cluster to have GPU worker nodes:\n",
"\n",
-"<img src=\"assets/ray-data-gpu.png\" width=300px/>\n",
+"<img src=\"assets/ray-data-gpu.png\" width=300px />\n",
"\n",
"Run the below cell to test out the new code using GPUs:"
]
4 changes: 2 additions & 2 deletions templates/intro-batch-inference/README.md
@@ -70,7 +70,7 @@ Note that above we called ``ds.show()`` in order to print the results to the con

In the Ray Dashboard tab, navigate to the Job page and open the "Ray Data Overview" section to view the details of the batch inference execution:

-<img src="https://raw.githubusercontent.com/anyscale/templates/main/templates/intro-batch-inference/assets/ray-data-job.png" width=800px/>
+<img src="https://raw.githubusercontent.com/anyscale/templates/main/templates/intro-batch-inference/assets/ray-data-job.png" width=800px />



@@ -101,7 +101,7 @@ To use GPUs for inference, make the following changes to your code:

The remaining is the same as in the code we ran above. To test this out, first make sure to either enable *Auto-select worker nodes* or configure your workspace cluster to have GPU worker nodes:

-<img src="https://raw.githubusercontent.com/anyscale/templates/main/templates/intro-batch-inference/assets/ray-data-gpu.png" width=300px/>
+<img src="https://raw.githubusercontent.com/anyscale/templates/main/templates/intro-batch-inference/assets/ray-data-gpu.png" width=300px />

Run the below cell to test out the new code using GPUs:

2 changes: 1 addition & 1 deletion templates/intro-jobs/README.ipynb
@@ -92,7 +92,7 @@
"\n",
"You should see the job state and its output on the overview page.\n",
"\n",
-"<img src=\"assets/anyscale-job.png\" height=400px>"
+"<img src=\"assets/anyscale-job.png\" height=400px />"
]
},
{
2 changes: 1 addition & 1 deletion templates/intro-jobs/README.md
@@ -59,7 +59,7 @@ You can view active and historical job runs at (`Home > Jobs`). Click into the j

You should see the job state and its output on the overview page.

-<img src="https://raw.githubusercontent.com/anyscale/templates/main/templates/intro-jobs/assets/anyscale-job.png" height=400px>
+<img src="https://raw.githubusercontent.com/anyscale/templates/main/templates/intro-jobs/assets/anyscale-job.png" height=400px />

## Submitting a Job programmatically

8 changes: 4 additions & 4 deletions templates/intro-services/README.ipynb
@@ -234,11 +234,11 @@
"\n",
"By clicking on the **Running** service, you can view the status of deployments and how many replicas each contains. For example, your `FastAPIDeployment` has `1` replica.\n",
"\n",
-"<img src=\"assets/service-overview.png\" height=400px>\n",
+"<img src=\"assets/service-overview.png\" height=400px />\n",
"\n",
"In the Logs, you can search for the message “Handling request!” to view each request for easier debugging.\n",
"\n",
-"<img src=\"assets/service-logs.png\" height=400px>\n"
+"<img src=\"assets/service-logs.png\" height=400px />\n"
]
},
{
@@ -275,7 +275,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"<img src=\"assets/service-replicas.png\" height=400px>\n",
+"<img src=\"assets/service-replicas.png\" height=400px />\n",
"\n",
"**Note**: This approach is a way to quickly modify scale for this example. As a best practice in production, define [autoscaling behavior](https://docs.anyscale.com/platform/services/scale-a-service#autoscaling) in the [ServiceConfig](https://docs.anyscale.com/reference/service-api#serviceconfig) contained in a `config.yaml` file. The number of worker nodes that Anyscale launches dynamically scales up and down in response to traffic and is scoped by the overall cluster compute config you define.\n"
]
@@ -304,7 +304,7 @@
"source": [
"In the service overview page, you can monitor the status of the update and see Ray Serve shut down the previous cluster.\n",
"\n",
-"<img src=\"assets/service-rollout.png\" height=400px>\n",
+"<img src=\"assets/service-rollout.png\" height=400px />\n",
"\n",
"**Note**: Using this command triggers an automatic rollout which gradually shifts traffic from the previous cluster, or primary version, to the incoming cluster, or canary version. To learn more about configuring rollout behavior, see [Update a service](https://docs.anyscale.com/platform/services/update-a-service).\n"
]
8 changes: 4 additions & 4 deletions templates/intro-services/README.md
@@ -160,11 +160,11 @@ To view the service, navigate to 🏠 **> Services > `my_service`**. On this pag

By clicking on the **Running** service, you can view the status of deployments and how many replicas each contains. For example, your `FastAPIDeployment` has `1` replica.

-<img src="https://raw.githubusercontent.com/anyscale/templates/main/templates/intro-services/assets/service-overview.png" height=400px>
+<img src="https://raw.githubusercontent.com/anyscale/templates/main/templates/intro-services/assets/service-overview.png" height=400px />

In the Logs, you can search for the message “Handling request!” to view each request for easier debugging.

-<img src="https://raw.githubusercontent.com/anyscale/templates/main/templates/intro-services/assets/service-logs.png" height=400px>
+<img src="https://raw.githubusercontent.com/anyscale/templates/main/templates/intro-services/assets/service-logs.png" height=400px />


### Configure scaling
@@ -193,7 +193,7 @@ my_app = FastAPIDeployment.bind()
```


-<img src="https://raw.githubusercontent.com/anyscale/templates/main/templates/intro-services/assets/service-replicas.png" height=400px>
+<img src="https://raw.githubusercontent.com/anyscale/templates/main/templates/intro-services/assets/service-replicas.png" height=400px />

**Note**: This approach is a way to quickly modify scale for this example. As a best practice in production, define [autoscaling behavior](https://docs.anyscale.com/platform/services/scale-a-service#autoscaling) in the [ServiceConfig](https://docs.anyscale.com/reference/service-api#serviceconfig) contained in a `config.yaml` file. The number of worker nodes that Anyscale launches dynamically scales up and down in response to traffic and is scoped by the overall cluster compute config you define.

@@ -210,7 +210,7 @@ To deploy the update, execute the following command to trigger a staged rollout

In the service overview page, you can monitor the status of the update and see Ray Serve shut down the previous cluster.

-<img src="https://raw.githubusercontent.com/anyscale/templates/main/templates/intro-services/assets/service-rollout.png" height=400px>
+<img src="https://raw.githubusercontent.com/anyscale/templates/main/templates/intro-services/assets/service-rollout.png" height=400px />

**Note**: Using this command triggers an automatic rollout which gradually shifts traffic from the previous cluster, or primary version, to the incoming cluster, or canary version. To learn more about configuring rollout behavior, see [Update a service](https://docs.anyscale.com/platform/services/update-a-service).

12 changes: 6 additions & 6 deletions templates/intro-tune/README.ipynb
@@ -47,7 +47,7 @@
"\n",
"You should see during the run a table of the trials created by Tune. One trial is created for each individual value of `x` in the grid sweep. The table shows where the trial was run in the cluster, how long the trial took, and reported metrics:\n",
"\n",
-"<img src=\"assets/tune-status.png\" width=800px/>\n",
+"<img src=\"assets/tune-status.png\" width=800px />\n",
"\n",
"On completion, it returns a `ResultGrid` object that captures the experiment results. This includes the reported trial metrics, the path where trial results are saved:\n",
"\n",
@@ -73,7 +73,7 @@
"\n",
"To view the stdout and stderr of the trial, use the ``Logs`` tab in the Workspace UI. Navigate to the log page and search for \"hello\", and you'll be able to see the logs printed for each trial run in the cluster:\n",
"\n",
-"<img src=\"assets/tune-logs.png\" width=800px/>\n",
+"<img src=\"assets/tune-logs.png\" width=800px />\n",
"\n",
"Tune also saves a number of input and output metadata files for each trial to storage, you can view them by querying the returned result object:\n",
"- ``params.json``: The input parameters of the trial\n",
@@ -258,7 +258,7 @@
"source": [
"During and after the execution, Tune reports a table of current trial status and reported accuracy. You can find the configuration that achieves the highest accuracy on the validation set:\n",
"\n",
-"<img src=\"assets/tune-output.png\" width=600px/>\n"
+"<img src=\"assets/tune-output.png\" width=600px />\n"
]
},
{
@@ -292,15 +292,15 @@
"\n",
"First, let's view the run in the Jobs sub-tab and click through to into the job view. Here, you can see an overview of the job, and the status of the individual actors Tune has launched to parallelize the job:\n",
"\n",
-"<img src=\"assets/tune-jobs-1.png\" width=800px/>\n",
+"<img src=\"assets/tune-jobs-1.png\" width=800px />\n",
"\n",
"You can further click through to the actors sub-page and view the status of individual running actors. Inspect trial logs, CPU profiles, and memory profiles using this page:\n",
"\n",
-"<img src=\"assets/tune-jobs-2.png\" width=800px/>\n",
+"<img src=\"assets/tune-jobs-2.png\" width=800px />\n",
"\n",
"Finally, we can observe the holistic execution of the job in the cluster in the Metrics sub-tab. When running the above job on a 36-CPU cluster, we can see that Tune was able to launch ~16 concurrent actors for trial execution, with each actor assigned 2 CPU slots as configured:\n",
"\n",
-"<img src=\"assets/tune-metrics.png\" width=800px/>\n"
+"<img src=\"assets/tune-metrics.png\" width=800px />\n"
]
},
{

0 comments on commit c6c3b7b
