
Fix mountPath to use /tmp instead of /data #1584

Open · wants to merge 2 commits into main
Conversation

alvarobartt
Contributor

Description

This PR updates the mountPath to point to /tmp instead of /data. In the TGI DLCs, the HF_HOME environment variable is set to /tmp rather than the default /data used by the upstream TGI image (see its Dockerfile). This has always been the case for the TGI DLC, but when the container URI in the Kubernetes manifests was switched from the default TGI image (hosted on GitHub) to the TGI DLC in GoogleCloudPlatform/ai-on-gke#816, the mountPath was not updated. It still pointed to /data, which caused issues when running the examples.
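The edit itself is small; below is a sketch of the relevant stanza in each text-generation-inference.yaml. The container and volume names here are illustrative assumptions, not copied from the actual manifests:

```yaml
# Illustrative fragment only: container and volume names are assumptions.
containers:
  - name: text-generation-inference
    volumeMounts:
      - name: data        # assumed volume name
        mountPath: /tmp   # previously /data; the TGI DLC sets HF_HOME=/tmp
volumes:
  - name: data
    emptyDir: {}          # assumed backing volume
```

Mounting the cache volume at the path HF_HOME points to is what keeps downloaded model weights on the intended volume rather than the container's writable layer.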

This has already been fixed within this repository in this PR, and also within GoogleCloudPlatform/ai-on-gke#852.

cc @annapendleton @raushan2016

Closes #1581

Tasks

  • The contributing guide has been read and followed.
  • The samples added / modified have been fully tested.
  • Workflow files have been added / modified, if applicable.
  • Region tags have been properly added, if new samples.
  • Editable variables have been used, where applicable.
  • All dependencies are set to up-to-date versions, as applicable.
  • Merge this pull request for me once it is approved.


@code-review-assist bot left a comment


Hi team,

I'm currently reviewing this pull request and will provide feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly understand the changes:

This pull request, authored by alvarobartt, addresses an issue where the mountPath in several Kubernetes YAML files was incorrectly set to /data instead of /tmp. The issue stemmed from a mismatch between the default image and a Text Generation Inference (TGI) Docker image, where the HF_HOME environment variable points to /tmp.

Intent: To correct the mountPath to /tmp for consistency with the TGI DLCs and resolve issues encountered when using the TGI DLCs in Kubernetes manifests.

Changes:

The pull request modifies four YAML files, updating the mountPath from /data to /tmp within the volumeMounts section:

  • ai-ml/llm-multiple-gpus/falcon-40b/text-generation-inference.yaml (line 54)
  • ai-ml/llm-multiple-gpus/llama2-70b/text-generation-inference.yaml (line 59)
  • ai-ml/llm-multiple-gpus/llama3-70b/text-generation-inference.yaml (line 61)
  • ai-ml/llm-multiple-gpus/mixtral-8x7b/text-generation-inference.yaml (line 59)

Activity: The PR description mentions that this fix has already been implemented in another repository and references related pull requests (GoogleCloudPlatform/ai-on-gke#816 and GoogleCloudPlatform/ai-on-gke#852).

This change ensures consistency across different environments and resolves potential runtime issues.


A little haiku to lighten the mood:

/data path is wrong,
/tmp now holds the secrets,
Code flows smoothly on.


@code-review-assist bot left a comment


This PR updates the mountPath for several LLM deployments to use /tmp instead of /data, aligning with the HF_HOME environment variable setting in the TGI DLC images. This change is necessary to avoid issues when using TGI DLC images as the container URI in Kubernetes manifests. The PR description clearly explains the problem, its cause, and the solution. The references to related PRs provide helpful context. Overall, the changes are well-justified and effectively address the identified issue.


@code-review-assist bot left a comment


This PR makes a necessary change to ensure compatibility between the Kubernetes manifests and the TGI DLC image. The change is well-explained and the provided context is helpful. A minor suggestion would be to standardize the comments across all affected files for better consistency.


@raushan2016 commented on Jan 8, 2025


There is still an issue in this sample. It fails with:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.11/site-packages/text_generation_server/models/flash_causal_lm.py", line 1269, in warmup
    _, batch, _ = self.generate_token(batch)
  File "/opt/conda/lib/python3.11/contextlib.py", line 81, in inner
    return func(*args, **kwds)
  File "/opt/conda/lib/python3.11/site-packages/text_generation_server/models/flash_causal_lm.py", line 1730, in generate_token
    prefill_logprobs_tensor = torch.log_softmax(out, -1)
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 514.00 MiB. GPU 0 has a total capacity of 21.96 GiB of which 393.12 MiB is free. Including non-PyTorch memory, this process has 0 bytes memory in use. Of the allocated memory 20.87 GiB is allocated by PyTorch, and 382.89 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
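As an aside, the traceback itself suggests one mitigation to try: setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to reduce fragmentation. In a Kubernetes manifest that would be an extra entry in the container's env list (a sketch only; whether it resolves this particular OOM is untested):

```yaml
# Hypothetical mitigation, taken from the hint in the traceback above.
env:
  - name: PYTORCH_CUDA_ALLOC_CONF
    value: expandable_segments:True
```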



I even tried with g2-standard-48 and it still keeps crashing. Can you please run some tests to validate?


Feel free to request access for the full logs, but the error above should be enough to find the issue by diffing the TGI default and TGI DLC images, since the only differences are the image and the mountPath:
https://docs.google.com/spreadsheets/d/1hKZP9X2ueP-Zvnb9zIXMLk6LeGXvfr-mGR6z-NGfA3s/edit?gid=1556804789#gid=1556804789


@raushan2016 commented on Jan 9, 2025


It worked with a2-highgpu-2g, which has 40 GB of GPU memory per A100. That suggests something is not right with the DLC image that breaks running Llama 3 70B on L4 GPUs.

alvarobartt (Contributor, Author)


Thanks for the detailed report @raushan2016, let me run some tests on our end to investigate and I'll ping you as soon as those are completed!
