Fix mountPath to use /tmp instead of /data #1584

Open · wants to merge 2 commits into base: main
@@ -51,7 +51,10 @@ spec:
volumeMounts:
- mountPath: /dev/shm
name: dshm
- mountPath: /data
# mountPath is set to /tmp since that is the path the HF_HOME environment
# variable points to, i.e. where the model downloaded from the Hub will
# be stored
- mountPath: /tmp
name: ephemeral-volume
volumes:
- name: dshm
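For context, here is a sketch of how the changed mount fits into the full pod spec; the container name and the `emptyDir` volume definitions are assumptions inferred from the surrounding hunks, not part of this PR:

```yaml
spec:
  containers:
    - name: tgi  # container name is an assumption
      volumeMounts:
        - mountPath: /dev/shm
          name: dshm
        # /tmp is where HF_HOME points in the DLC image, so the model
        # weights downloaded from the Hub land on the ephemeral volume
        - mountPath: /tmp
          name: ephemeral-volume
  volumes:
    - name: dshm
      emptyDir:
        medium: Memory
    - name: ephemeral-volume
      emptyDir: {}
```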
@@ -56,7 +56,10 @@ spec:
volumeMounts:
- mountPath: /dev/shm
name: dshm
- mountPath: /data
# mountPath is set to /tmp since that is the path the HF_HOME environment
# variable points to, i.e. where the model downloaded from the Hub will
# be stored
- mountPath: /tmp
name: ephemeral-volume
volumes:
- name: dshm
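The rationale behind the mount-path change can be illustrated with a small sketch of how the Hugging Face Hub cache directory is derived from `HF_HOME` (a simplified, hypothetical re-implementation of the lookup `huggingface_hub` performs, not the library's actual code):

```python
import os

def hf_cache_dir(env):
    """Resolve the Hub download cache the way huggingface_hub does by
    default: $HF_HOME/hub, falling back to ~/.cache/huggingface/hub."""
    home = env.get(
        "HF_HOME",
        os.path.join(env.get("HOME", "/root"), ".cache", "huggingface"),
    )
    return os.path.join(home, "hub")

# In the TGI DLC image HF_HOME is set to /tmp, so the weights are cached
# under /tmp and the ephemeral volume must be mounted there, not at /data.
print(hf_cache_dir({"HF_HOME": "/tmp"}))  # -> /tmp/hub
print(hf_cache_dir({"HOME": "/root"}))    # -> /root/.cache/huggingface/hub
```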
@raushan2016 raushan2016 Jan 8, 2025

There's still some issue in this sample. It fails with:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.11/site-packages/text_generation_server/models/flash_causal_lm.py", line 1269, in warmup
    _, batch, _ = self.generate_token(batch)
  File "/opt/conda/lib/python3.11/contextlib.py", line 81, in inner
    return func(*args, **kwds)
  File "/opt/conda/lib/python3.11/site-packages/text_generation_server/models/flash_causal_lm.py", line 1730, in generate_token
    prefill_logprobs_tensor = torch.log_softmax(out, -1)
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 514.00 MiB. GPU 0 has a total capacity of 21.96 GiB of which 393.12 MiB is free. Including non-PyTorch memory, this process has 0 bytes memory in use. Of the allocated memory 20.87 GiB is allocated by PyTorch, and 382.89 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
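Not part of this PR, but for completeness: the `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` workaround suggested at the end of the traceback could be passed to the container roughly like this (container name and spec placement are assumptions):

```yaml
containers:
  - name: tgi  # container name is an assumption
    env:
      - name: PYTORCH_CUDA_ALLOC_CONF
        value: expandable_segments:True
```

Note this only mitigates fragmentation; it will not help if the model genuinely does not fit in the GPU memory available.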



I even tried with g2-standard-48 and it still keeps crashing. Can you please run some tests to validate?


For all the logs, feel free to request access. But the error above should be enough to find the issue and the diff between the TGI default and TGI DLC images, as the only differences are the image and the mount path:
https://docs.google.com/spreadsheets/d/1hKZP9X2ueP-Zvnb9zIXMLk6LeGXvfr-mGR6z-NGfA3s/edit?gid=1556804789#gid=1556804789


@raushan2016 raushan2016 Jan 9, 2025


It worked with a2-highgpu-2g, which has 40G (A100) GPU memory. That means there is something not right with the DLC image that breaks LLAMA3-70B running on L4 GPUs.

Contributor Author


Thanks for the detailed report @raushan2016, let me run some tests on our end to investigate and I'll ping you as soon as those are completed!

@@ -58,7 +58,10 @@ spec:
volumeMounts:
- mountPath: /dev/shm
name: dshm
- mountPath: /data
# mountPath is set to /tmp since that is the path the HF_HOME environment
# variable points to, i.e. where the model downloaded from the Hub will
# be stored
- mountPath: /tmp
name: ephemeral-volume
volumes:
- name: dshm
@@ -56,7 +56,10 @@ spec:
volumeMounts:
- mountPath: /dev/shm
name: dshm
- mountPath: /data
# mountPath is set to /tmp since that is the path the HF_HOME environment
# variable points to, i.e. where the model downloaded from the Hub will
# be stored
- mountPath: /tmp
name: ephemeral-volume
volumes:
- name: dshm