Delete implicit in dockerfile #1288

Merged
merged 5 commits into from
Aug 14, 2023
17 changes: 7 additions & 10 deletions Dockerfile.tmpl
@@ -107,15 +107,6 @@ RUN pip uninstall -y pyarrow && \
/tmp/clean-layer.sh
{{ end }}

# Install implicit
{{ if eq .Accelerator "gpu" }}
RUN mamba install -y implicit implicit-proc=*=gpu && \
/tmp/clean-layer.sh
{{ else }}
RUN mamba install -y implicit && \
/tmp/clean-layer.sh
{{ end}}

# Install PyTorch
{{ if eq .Accelerator "gpu" }}
COPY --from=torch_whl /tmp/whl/*.whl /tmp/torch/
@@ -172,7 +163,9 @@ RUN pip install spacy && \
{{ if eq .Accelerator "gpu" }}
# Install GPU-only packages
# No specific package for nnabla-ext-cuda 11.x minor versions.
RUN pip install pycuda \
RUN export PATH=/usr/local/cuda/bin:$PATH && \
Contributor
@djherbis @psbang Any reason why you updated PATH and CUDA_ROOT? Also, a comment explaining this would be helpful to future readers.

Contributor Author
I found it as a solution on Stack Overflow somewhere. I'll make sure to add the links next time.

Contributor
A solution for what? You are removing the package in the end, no?

Contributor Author
I was getting "Failed building wheel for pycuda" errors, and I found a solution here: https://forums.developer.nvidia.com/t/cant-install-pycuda/238230
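Editorial sketch of what the two exports in this hunk do, under stated assumptions: the CUDA toolkit is installed under /usr/local/cuda (the path used in the diff), and the pycuda source build locates the toolkit through `nvcc` on `PATH` or through `CUDA_ROOT`. The check below is illustrative only and is not part of the PR.

```shell
# Assumption: the CUDA toolkit lives under /usr/local/cuda, as in the diff.
export PATH=/usr/local/cuda/bin:$PATH
export CUDA_ROOT=/usr/local/cuda

# Report whether the CUDA compiler is now visible to a build started from this shell.
if command -v nvcc >/dev/null 2>&1; then
    echo "nvcc found at $(command -v nvcc)"
else
    echo "nvcc not found on PATH; a pycuda source build would likely fail"
fi
```

Because both variables are exported inside a single `RUN` instruction, they apply only to that build step and do not persist into the final image environment.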

export CUDA_ROOT=/usr/local/cuda && \
pip install pycuda \
pynvrtc \
pynvml && \
/tmp/clean-layer.sh
@@ -646,7 +639,11 @@ RUN sed -i '/from tensorflow_hub import uncompressed_module_resolver/a from tens
# python -m nb_conda_kernels.install --disable

# Force only one libcusolver
{{ if eq .Accelerator "gpu" }}
RUN rm /opt/conda/bin/../lib/libcusolver.so.11 && ln -s /usr/local/cuda/lib64/libcusolver.so.11 /opt/conda/bin/../lib/libcusolver.so.11
{{ else }}
RUN ln -s /usr/local/cuda/lib64/libcusolver.so.11 /opt/conda/bin/../lib/libcusolver.so.11
Contributor
@rosbo, Aug 15, 2023

@psbang Same here, I am a bit surprised to see a CUDA-related command in the non-GPU image build... A comment would be helpful.

Contributor Author
In this case, the CPU build was failing because the directory it was trying to delete did not exist. For some reason that directory did exist in the GPU build, so this was the only way I was able to make both builds succeed.

Contributor
Do you remember what process was trying to delete this directory? The underlying issue is likely that we were running a command that shouldn't have been run in the CPU build and was missing the {{ if eq .Accelerator "gpu" }} guard.

Contributor Author
Line 643 in the updated file. I didn't change that line; I just added the if-else statements around it.
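Editorial sketch, not part of the PR: the failure described in this thread comes from `rm` exiting with an error when its target is missing. One way to make the replacement tolerate both cases is `ln -sfn`, which removes an existing destination before creating the link. The helper name `force_symlink` and the scratch-directory paths are hypothetical and only illustrate the behavior.

```shell
# Hypothetical helper: point LINK at TARGET whether or not LINK already exists.
force_symlink() {
    target=$1
    link=$2
    # -f replaces an existing destination, so no separate rm is needed and
    # the command does not fail when the link is absent.
    # -n treats an existing symlink as a file instead of following it.
    ln -sfn "$target" "$link"
}

# Demonstration in a scratch directory rather than /opt/conda.
demo_dir=$(mktemp -d)
touch "$demo_dir/libcusolver.so.11.real"
force_symlink "$demo_dir/libcusolver.so.11.real" "$demo_dir/libcusolver.so.11"  # link absent
force_symlink "$demo_dir/libcusolver.so.11.real" "$demo_dir/libcusolver.so.11"  # link present
readlink "$demo_dir/libcusolver.so.11"
```

This only removes the build failure. It does not settle the reviewer's question of whether a CPU image should carry a link into /usr/local/cuda at all, where the target may not exist.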

{{ end }}

# b/270147159 conda ships with a version of libtinfo which is missing version info causing warnings, replace it with a good version.
RUN rm /opt/conda/lib/libtinfo.so.6 && ln -s /usr/lib/x86_64-linux-gnu/libtinfo.so.6 /opt/conda/lib/libtinfo.so.6
38 changes: 0 additions & 38 deletions tests/test_implicit.py

This file was deleted.