This repository contains pre-built Docker images that run on any cloud or on-premise platform. Running in the provided containers is the recommended approach: they are tested in both GPU and CPU setups, and they serve as the basis for a containerized distributed scheme.
This file contains installation instructions for Amazon SageMaker; they also apply to the corresponding EC2 instances.
- Familiarity with the AWS cloud is assumed.
- A root or IAM account is configured.
- ***Disclaimer:*** AWS is a paid service, and any computation incurs costs.
- Navigate to the console for your selected region.
- Create or start your SageMaker instance.
- Open JupyterLab.
- Upload your files.
- Click Terminal among the available options and run
nvidia-smi
to make sure that the drivers are successfully installed.
- Build and start the containers according to the instructions
docker compose build && docker compose up -d
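The two checks above (drivers present, then build and start) can be combined into a small guard. This is only a sketch: `missing_tools` is a hypothetical helper name, not something shipped with the repository, and it only checks that the commands exist on PATH, not that the GPU is healthy.

```shell
# Hypothetical helper: print every command from the argument list that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example use in the SageMaker terminal (commented out, so sourcing this has no side effects):
# missing="$(missing_tools nvidia-smi docker)"
# if [ -z "$missing" ]; then
#   docker compose build && docker compose up -d
# else
#   echo "missing: $missing"
# fi
```

An empty result means both tools were found; anything printed is the list of commands to install first.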
Alternatively, you can set up an appropriate image on EC2 together with the drivers, and install the application as per the guide.
These instructions set up PyTorch with CUDA on an EC2 instance with a plain Ubuntu AMI.
- Verify that the instance has a CUDA-capable GPU
lspci | grep -i nvidia
- Install kernel headers and development packages
sudo apt-get install linux-headers-$(uname -r)
- Download a CUDA keyring for your distribution $distro and architecture $arch
wget https://developer.download.nvidia.com/compute/cuda/repos/$distro/$arch/cuda-keyring_1.1-1_all.deb
For example, for Ubuntu 22.04 on x86_64 the command looks as follows:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
- Add the downloaded keyring package
sudo dpkg -i cuda-keyring_1.1-1_all.deb
- Update the APT repository cache
sudo apt-get update
- Install the drivers
sudo apt-get -y install cuda-drivers
- Reboot the instance
sudo reboot
- Verify the installation
nvidia-smi
Note the CUDA Version displayed in the upper-right corner of the output, as the PyTorch build needs to be compatible with it.
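If you want to read that value in a script rather than by eye, something along these lines may work. It is a sketch under two assumptions: that the `nvidia-smi` header contains the text `CUDA Version: <major>.<minor>`, and that a PyTorch build is usable when the CUDA version it was built for is not newer than the one the driver reports. The function names are hypothetical.

```shell
# Hypothetical helper: extract the CUDA version from nvidia-smi output read on stdin.
cuda_version_from_smi() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p'
}

# Hypothetical helper: succeed when version $1 is >= version $2.
# Assumes both arguments have the form major.minor (e.g. 12.2).
cuda_at_least() {
  have_major=${1%%.*}; have_minor=${1#*.}
  want_major=${2%%.*}; want_minor=${2#*.}
  [ "$have_major" -gt "$want_major" ] ||
    { [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; }
}

# Example use on the instance (commented out, so sourcing this has no side effects):
# driver_cuda="$(nvidia-smi | cuda_version_from_smi)"
# cuda_at_least "$driver_cuda" 12.1 && echo "driver supports a CUDA 12.1 build"
```

The `12.1` in the usage comment is only an illustrative target; substitute the CUDA version of the PyTorch build you intend to install.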
NOTE: At this stage NVIDIA recommends following the post-installation actions. I skipped them and everything worked, but unexpected errors might occur.
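Returning to the keyring download step: the `$distro` and `$arch` parts of the URL can be assembled rather than typed by hand. The sketch below assumes the distro tag is the `ID` and `VERSION_ID` fields of `/etc/os-release` joined with the dot removed, which matches the Ubuntu 22.04 example above but has not been checked for other distributions. On non-x86_64 machines `uname -m` may not match the directory name NVIDIA uses, so treat that part as an assumption as well. Both function names are hypothetical.

```shell
# Hypothetical helper: os-release fields to distro tag, e.g. "ubuntu" "22.04" -> ubuntu2204.
distro_tag() {
  printf '%s%s\n' "$1" "$(printf '%s' "$2" | tr -d '.')"
}

# Hypothetical helper: build the keyring URL from a distro tag and an architecture.
cuda_keyring_url() {
  printf 'https://developer.download.nvidia.com/compute/cuda/repos/%s/%s/cuda-keyring_1.1-1_all.deb\n' "$1" "$2"
}

# Example use on the instance (commented out, so sourcing this has no side effects):
# . /etc/os-release
# wget "$(cuda_keyring_url "$(distro_tag "$ID" "$VERSION_ID")" "$(uname -m)")"
```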
I used conda, but pip+venv should also work.
- Install conda
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
- Initialize conda
~/miniconda3/bin/conda init bash
- Reload bash
source ~/.bashrc
- Create a new conda environment
conda create -n env
- Activate the newly created environment
conda activate env
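Before installing packages, a script can confirm that the expected environment is the active one. The sketch below relies on the `CONDA_DEFAULT_ENV` variable that conda sets on activation; `in_conda_env` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only when the named conda environment is active.
in_conda_env() {
  [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]
}

# Example use (commented out, so sourcing this has no side effects):
# in_conda_env env || echo "run 'conda activate env' first"
```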