ImportError: /opt/conda/lib/python3.8/site-packages/transformer_engine_extensions.cpython-38-x86_64-linux-gnu.so: undefined symbol #30
Hello! According to your error message, it seems that my Docker image is a bit out of date with regard to CUDA.
I'll try to update the Dockerfile.
Hi, I updated the Dockerfile.
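If the outdated CUDA base was the culprit, the update presumably amounts to bumping the FROM line in the Dockerfile to a newer NGC PyTorch base, roughly like this (the exact tag is only an assumption, borrowed from the suggestion further down):
# Dockerfile: newer NGC PyTorch base image with a more recent CUDA / PyTorch
FROM nvcr.io/nvidia/pytorch:24.07-py3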
I ran "docker build -t $TAG_NAME ." again, but it reports errors. Do I have to delete the old Docker image and rebuild?
What are the errors? And yes, you may need to delete the original image and rebuild.
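If it helps, removing the old image and rebuilding without cache would look roughly like this (tag name taken from your command above):
$ docker image rm $TAG_NAME
$ docker build --no-cache -t $TAG_NAME .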
I rebuilt it again, but the errors still seem to be there.
$ docker run --gpus all --rm -p 8123:8123 cfcreator:latest
OK, I caught another potential mistake: the NOTE about the SHMEM allocation limit means the shared memory of your Docker container is restricted to 64MB, which may cause errors. You can try the flags it suggests, or manually link your server's shared memory to the container.
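For reference, both options would look roughly like this (the image name just follows your earlier command, and mounting /dev/shm is one common way to share the host's shared memory):
$ docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --rm -p 8123:8123 cfcreator:latest
$ docker run --gpus all -v /dev/shm:/dev/shm --rm -p 8123:8123 cfcreator:latest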
Seems like another error?
$ docker run --gpus 0 --rm -p 8123:8123 $TAG_NAME:latest --ipc=host --ulimit memlock=-1 --ulimit stack=67108864
Maybe you need to put the --ipc and --ulimit flags before $TAG_NAME:latest; anything placed after the image name is passed to the container as its command instead of being parsed by docker run.
Yes! You are right. But the ImportError still seems to exist.
$ docker run --gpus 0 --rm -p 8123:8123 --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 $TAG_NAME:latest
Hmmm, here's another guess: maybe the CUDA driver on your physical server (i.e., 510.108.03) is too old for the latest PyTorch. To verify: can you run other PyTorch 2.0 projects on your server?
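For example, a quick sanity check of the PyTorch/CUDA setup inside the container would be something like (assuming python is on the PATH):
$ python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"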
You may want to retry with the latest PyTorch NGC container: nvcr.io/nvidia/pytorch:24.07-py3
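Roughly like this; the transformer_engine.pytorch import just reproduces the step that fails in the traceback below, assuming Transformer Engine is bundled in that image:
$ docker pull nvcr.io/nvidia/pytorch:24.07-py3
$ docker run --gpus all --rm --ipc=host nvcr.io/nvidia/pytorch:24.07-py3 python -c "import torch, transformer_engine.pytorch; print(torch.__version__)"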
Hi, after installing carefree-creator in Docker, it reports an error when running. How can I solve it?
WARNING: CUDA Minor Version Compatibility mode ENABLED.
Using driver version 510.108.03 which has support for CUDA 11.6. This container
was built with CUDA 11.8 and will be run in Minor Version Compatibility mode.
CUDA Forward Compatibility is preferred over Minor Version Compatibility for use
with this container but was unavailable:
[[Forward compatibility was attempted on non supported HW (CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE) cuInit()=804]]
See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.
NOTE: The SHMEM allocation limit is set to the default of 64MB. This may be
insufficient for PyTorch. NVIDIA recommends the use of the following flags:
docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 ...
Traceback (most recent call last):
File "/opt/conda/bin/cfcreator", line 5, in
from cfcreator.cli import main
File "/opt/conda/lib/python3.8/site-packages/cfcreator/init.py", line 2, in
from .common import *
File "/opt/conda/lib/python3.8/site-packages/cfcreator/common.py", line 22, in
from cflearn.zoo import DLZoo
File "/opt/conda/lib/python3.8/site-packages/cflearn/init.py", line 3, in
from .schema import *
File "/opt/conda/lib/python3.8/site-packages/cflearn/schema.py", line 27, in
from accelerate import Accelerator
File "/opt/conda/lib/python3.8/site-packages/accelerate/init.py", line 3, in
from .accelerator import Accelerator
File "/opt/conda/lib/python3.8/site-packages/accelerate/accelerator.py", line 34, in
from .checkpointing import load_accelerator_state, load_custom_state, save_accelerator_state, save_custom_state
File "/opt/conda/lib/python3.8/site-packages/accelerate/checkpointing.py", line 24, in
from .utils import (
File "/opt/conda/lib/python3.8/site-packages/accelerate/utils/init.py", line 112, in
from .launch import (
File "/opt/conda/lib/python3.8/site-packages/accelerate/utils/launch.py", line 27, in
from ..utils.other import merge_dicts
File "/opt/conda/lib/python3.8/site-packages/accelerate/utils/other.py", line 24, in
from .transformer_engine import convert_model
File "/opt/conda/lib/python3.8/site-packages/accelerate/utils/transformer_engine.py", line 21, in
import transformer_engine.pytorch as te
File "/opt/conda/lib/python3.8/site-packages/transformer_engine/init.py", line 7, in
from . import pytorch
File "/opt/conda/lib/python3.8/site-packages/transformer_engine/pytorch/init.py", line 6, in
from .module import LayerNormLinear
File "/opt/conda/lib/python3.8/site-packages/transformer_engine/pytorch/module.py", line 16, in
import transformer_engine_extensions as tex
ImportError: /opt/conda/lib/python3.8/site-packages/transformer_engine_extensions.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN3c106SymInt8toSymIntENS_13intrusive_ptrINS_14SymIntNodeImplENS_6detail34intrusive_target_default_null_typeIS2_EEEE
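For what it's worth, the undefined symbol demangles to c10::SymInt::toSymInt(...), which usually means transformer_engine_extensions was compiled against a different PyTorch version (ABI) than the one installed in the container. A few ways to compare the two (assuming pip metadata is available for the packages):
$ python -c "import torch; print(torch.__version__, torch.version.cuda)"
$ pip show transformer_engine
$ nm -D /opt/conda/lib/python3.8/site-packages/transformer_engine_extensions.cpython-38-x86_64-linux-gnu.so | grep SymInt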