AWS installation

The current repository contains pre-built Docker images that run on any cloud or on-premise platform. Running in these containers is the recommended approach: they are compatible with and tested on both GPU and CPU setups, and they serve as the basis for the containerized distributed scheme.

This file contains installation instructions for Amazon SageMaker, but they also apply to the corresponding EC2 instances.

Pre-requisites

Familiarity with the AWS cloud is assumed.
A root or IAM account is configured.
Disclaimer: AWS is a paid service, and any computation incurs costs.
Navigate to the console for your selected region.
Create or start your SageMaker instance.
Open JupyterLab.
Upload your files.
Click Terminal among the available options. Run nvidia-smi to make sure that the drivers are successfully installed.
Run docker compose build && docker compose up -d according to the instructions.
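
The last two steps can be combined into a small pre-flight check, so the containers are only built when the GPU driver and Docker are actually present. This is a minimal sketch, not part of the repository: the function names missing_commands and start_stack are hypothetical, and it assumes the terminal is opened in the directory that holds the compose file.

```shell
# Print the name of every command from the argument list that is not on PATH.
missing_commands() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}

# Hypothetical helper: check the tools, show the GPU status, then build and start.
start_stack() {
  missing="$(missing_commands nvidia-smi docker)"
  if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose: one message per missing command.
    printf 'Missing required command: %s\n' $missing >&2
    return 1
  fi
  nvidia-smi && docker compose build && docker compose up -d
}
```

Calling start_stack in the SageMaker terminal runs the same commands as the steps above and stops early with a message if nvidia-smi or docker is not installed.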

Alternatively, you can set up an appropriate image on EC2 together with the drivers and install the application as per the guide.

CUDA ON EC2 FROM SCRATCH

These instructions help set up PyTorch with CUDA on an EC2 instance with a plain Ubuntu AMI.

Pre-installation actions

Verify that the instance has a CUDA-capable GPU

lspci | grep -i nvidia

Install kernel headers and development packages

sudo apt-get install linux-headers-$(uname -r)

NVIDIA drivers installation

Download a CUDA keyring for your distribution $distro and architecture $arch

wget https://developer.download.nvidia.com/compute/cuda/repos/$distro/$arch/cuda-keyring_1.1-1_all.deb

For example, for Ubuntu 22.04 on x86_64 the command looks as follows:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
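
The $distro value in the example is the distribution ID followed by the version number without the dot. The sketch below builds the keyring URL from those parts. The helper names cuda_distro and cuda_keyring_url are hypothetical, and the mapping is only confirmed here for the Ubuntu 22.04 / x86_64 case shown above; for other distributions or architectures, check the directory names in the NVIDIA repository before relying on it.

```shell
# Build the repository directory name, e.g. "ubuntu" + "22.04" -> "ubuntu2204".
cuda_distro() {
  printf '%s%s\n' "$1" "$(printf '%s' "$2" | tr -d '.')"
}

# Build the keyring URL from the distro directory name and the architecture.
cuda_keyring_url() {
  printf 'https://developer.download.nvidia.com/compute/cuda/repos/%s/%s/cuda-keyring_1.1-1_all.deb\n' "$1" "$2"
}

# Reproduces the URL from the Ubuntu 22.04 example.
cuda_keyring_url "$(cuda_distro ubuntu 22.04)" x86_64
```

On the instance itself, the two inputs can be read from /etc/os-release (the ID and VERSION_ID fields) and from uname -m, then passed to wget as in the command above.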

Add the downloaded keyring package

sudo dpkg -i cuda-keyring_1.1-1_all.deb

Update the APT repository cache

sudo apt-get update

Install the drivers

sudo apt-get -y install cuda-drivers

Reboot the instance

sudo reboot

Verify the installation

nvidia-smi

Note the CUDA Version displayed in the upper-right corner of the output, as the PyTorch build needs to be compatible with it.

NOTE: At this stage NVIDIA recommends following the Post-installation actions. I skipped them and it worked, but some unexpected errors might occur.
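
If you want the CUDA version as a plain value, for example to record it before choosing a PyTorch build, it can be extracted from the nvidia-smi header. This is a minimal sketch: the function name cuda_version_from_smi is hypothetical, and it assumes the header contains the text "CUDA Version: X.Y" as current driver releases print it.

```shell
# Read nvidia-smi output on stdin and print the value after "CUDA Version:".
cuda_version_from_smi() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Query the driver only when nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi | cuda_version_from_smi
fi
```

The function prints nothing when the header does not match, which is a cue to read the nvidia-smi output by eye instead.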

PyTorch installation

Install package manager

I used conda, but pip + venv should also work.

Install conda

mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh

Initialize conda

~/miniconda3/bin/conda init bash

Reload bash

source ~/.bashrc

Create a new conda environment

conda create -n env

Activate the newly created environment

conda activate env
