
[batch infer] Update batch inference template to use RayLLMBatch #346

Merged
merged 7 commits into main on Oct 22, 2024

Conversation

@rickyyx (Contributor) commented Sep 24, 2024

Update the current batch LLM inference template to use RayLLM-Batch.

@rickyyx (Contributor, Author) commented Sep 24, 2024

DON'T MERGE, and don't delete the branch.

```
    prompt_for_hugging_face_token,
)
```
RayLLM-Batch is a library for running batch inference for LLMs. It uses Ray Data for data processing and defines an easy, flexible interface for users to define their own workloads. In this tutorial, we implement a workload based on the [`CNNDailyMail`](https://huggingface.co/datasets/abisee/cnn_dailymail) dataset, a collection of news articles, and summarize each article with our batch inference pipeline. We cover how to customize the workload in later sections.

Contributor:

same comment
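
For context on the workload described in the quoted tutorial text above, a minimal sketch of pulling the CNNDailyMail articles into Ray Data might look like the following. This is illustrative only, not the template's actual code; the `3.0.0` config name and the `article` column are taken from the Hugging Face dataset card.

```
import ray
from datasets import load_dataset

# Load the news articles from Hugging Face and hand them to Ray Data, which
# RayLLM-Batch uses for data processing.
hf_ds = load_dataset("abisee/cnn_dailymail", "3.0.0", split="test")
ds = ray.data.from_huggingface(hf_ds)

# Keep only the text to summarize; the batch inference pipeline would turn
# each article into a summarization prompt.
ds = ds.select_columns(["article"])
print(ds.take(1))
```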

@scottsun94 (Contributor) commented:

Re: the compute config change, the previous config seems to have had different variations of g5 and L4 instance types. I guess that helps with finding GPUs. I'm not sure whether this change will make it harder to find GPUs and slower to start the cluster.

@scottsun94 (Contributor) commented:

@shomilj to review the compute config.

@angelinalg if you want to take a look at the content.

Comment on lines +2 to +4:

```
name: head
# TODO(ricky): We need head node to have CUDA due to eager import from rayllm_batch now.
instance_type: g5.xlarge
```

Contributor:

We generally don't want to encourage the pattern of using GPU head nodes, since it leads to people running code on the head node that takes the workload down due to OOM (an antipattern).

Can you use a CPU head node instead, with scheduling disabled (see the basic serverless config that existing templates use), and wrap your code in an actor or something that runs on the workers?
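
For reference, a minimal sketch of that actor-wrapping pattern (illustrative only, not this PR's code; the `BatchInferenceRunner` name is made up): requesting a GPU in the actor's resource spec makes Ray place it on a GPU worker, so CUDA-dependent imports and inference never run on a CPU-only head node.

```
import ray

ray.init()

# Requesting num_gpus=1 forces Ray to schedule this actor on a GPU worker
# node, never on a CPU-only head node.
@ray.remote(num_gpus=1)
class BatchInferenceRunner:
    def run(self) -> str:
        # Heavy, CUDA-dependent imports and the actual inference code would
        # run here, inside the actor process on the worker. As a stand-in,
        # just report which GPU the actor was assigned.
        return f"running on GPU ids {ray.get_gpu_ids()}"

runner = BatchInferenceRunner.remote()
print(ray.get(runner.run.remote()))
```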

rickyyx (Author):

This makes sense. Instead of temporarily fixing this by wrapping the code in an actor, I think we will fix it by addressing the root cause soon, which is to make this code runnable on CPU itself (it should be; we're just not lazy-importing vLLM as of now).
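
A minimal sketch of that lazy-import fix (illustrative; `build_engine` and its argument are hypothetical, not rayllm_batch's actual API): the module stays importable on a CPU-only head node because the CUDA-dependent vLLM import only happens when an engine is actually constructed.

```
# No module-level `import vllm`, so importing this module (e.g. on the head
# node while building configs) does not require CUDA.

def build_engine(model_id: str):
    # Deferred on purpose: only the worker process that constructs the
    # engine needs vLLM (and CUDA) to be importable.
    from vllm import LLM

    return LLM(model=model_id)
```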

Co-authored-by: Huaiwei Sun <scottsun94@gmail.com>

For a Python script version of the code in this workspace template, refer to `main.py`.
<!-- TODO: add a link for the RayLLM-Batch API reference -->
This template shows you how to run batch inference for LLMs using RayLLM-Batch.

Contributor:

Will you be including/updating main.py in this PR as well?

rickyyx (Author):

Since we don't want to run `anyscale submit job` for now (the log outputs just aren't handled properly yet), I didn't add it.

I was actually planning to remove main.py, good catch. I'm open to keeping main.py as well; no strong preference.

Contributor:

Sounds good to remove it. Yeah, the main point of main.py was to run as a job, and also to allow the user to run the notebook code as one continuous script.

```
}
from rayllm_batch import init_engine_from_config

# Read the model configs from the path.
model_config_path = "configs/llama-3.1-8b-a10g.yaml"
```

Contributor:

Should we add code to auto-detect whether the user is on AWS or GCP, and choose the right config accordingly (e.g. the same logic as in `_on_gcp_cloud()`)?

rickyyx (Author):

Oh nice, that would actually be great. Let me do that.
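
A minimal sketch of what that auto-detection could look like (illustrative only: probing the GCE metadata server is one common way to detect GCP, and the L4 config filename is a hypothetical stand-in, not a file from this PR):

```
import requests


def _on_gcp_cloud(timeout_s: float = 0.5) -> bool:
    """Best-effort GCP detection by probing the GCE metadata server."""
    try:
        resp = requests.get(
            "http://metadata.google.internal/computeMetadata/v1/",
            headers={"Metadata-Flavor": "Google"},
            timeout=timeout_s,
        )
        return resp.ok
    except requests.RequestException:
        return False


# Pick the model config matching the accelerators available on this cloud.
# The GCP/L4 filename below is hypothetical.
model_config_path = (
    "configs/llama-3.1-8b-l4.yaml" if _on_gcp_cloud() else "configs/llama-3.1-8b-a10g.yaml"
)
```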

rickyyx and others added 2 commits on September 25, 2024:

Addressed comments and fixed GCP compute not found.

Co-authored-by: rickyx <rickyx@anysale.com>

@comaniac (Contributor) left a comment:

LGTM. We will have a follow-up to support a CPU head node by lazily importing vLLM.

@rickyyx merged commit 7aec451 into main on Oct 22, 2024. 2 checks passed.

@rickyyx (Contributor, Author) commented Oct 22, 2024

Don't delete the branch for now.
