fix broken cuda and rocm images #263

Merged
merged 25 commits into from
Sep 20, 2024

baptistecolle
Collaborator

@baptistecolle baptistecolle commented Sep 17, 2024

AutoAWQ changed its setup.py, which broke the build of our Docker image.

@baptistecolle baptistecolle marked this pull request as ready for review September 17, 2024 14:34
@IlyasMoutawwakil
Member

IlyasMoutawwakil commented Sep 17, 2024

Can you fix the same TORCH_VERSION issue in the ROCm image as well? (We build the same packages there.)

@baptistecolle
Collaborator Author

Done, the ROCm image is fixed.

@IlyasMoutawwakil
Member

IlyasMoutawwakil commented Sep 18, 2024

I'm not sure about that (see the failing PyTorch tests). I think https://github.com/casper-hansen/AutoAWQ/blob/main/setup.py#L10 is triggering the installation of ipex: https://github.com/huggingface/optimum-benchmark/actions/runs/10908257743/job/30273811027?pr=263#step:5:129

@baptistecolle baptistecolle marked this pull request as draft September 18, 2024 06:56
@baptistecolle baptistecolle changed the title from "fix broken cuda image" to "fix broken docker images" Sep 19, 2024
@baptistecolle baptistecolle changed the title from "fix broken docker images" to "fix broken cuda and rocm images" Sep 19, 2024
@baptistecolle baptistecolle marked this pull request as ready for review September 20, 2024 06:42
@baptistecolle
Collaborator Author

baptistecolle commented Sep 20, 2024

Everything related to the Docker images broken by the AutoAWQ update should now be fixed.

There are still some failing tests in our pipeline as of today:

(because of #265, an issue with the codecarbon lock)
  • API CPU Tests / run_api_cpu_tests (pull_request)
  • API CUDA Tests / run_api_cuda_tests (pull_request)

(because of the deprecated is_torch_tpu_available method; the fix is in optimum#2028, "Fix is_torch_tpu_available in ORT Trainer", so we need to wait for the next optimum release)
  • CLI CUDA Torch-ORT Multi-GPU Tests / run_cli_cuda_torch_ort_multi_gpu_tests
  • CLI CUDA Torch-ORT Single-GPU Tests / run_cli_cuda_torch_ort_single_gpu_tests

@baptistecolle baptistecolle marked this pull request as draft September 20, 2024 08:05
baptistecolle and others added 3 commits September 20, 2024 10:31
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
@IlyasMoutawwakil IlyasMoutawwakil marked this pull request as ready for review September 20, 2024 09:52
@IlyasMoutawwakil IlyasMoutawwakil merged commit 39ca491 into main Sep 20, 2024
52 of 60 checks passed
2 participants