Copper is a read-only cooperative caching layer designed to enable scalable data loading across large numbers of compute nodes. It avoids the I/O bottleneck in the storage network by using the compute network for data movement instead.
The current intended use of copper is to improve the performance of Python imports and dynamic shared library loading on Aurora; however, copper can be used to improve the performance of any type of redundant data loading on a supercomputer.
More documentation can be found here: readthedocs
Copper currently supports the following file system operations:

- init
- open
- read
- readdir
- readlink
- getattr
- ioctl
- destroy
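Every access made under the copper view directory is served by the intercepted operations listed above. As a rough sketch (the project path and file name below are hypothetical; `CU_FUSE_MNT_VIEWDIR` matches the run example further down):

```bash
# A minimal sketch: once copper is mounted, prefix an original path with the view
# directory and the access is served through copper instead of the storage network.
CU_FUSE_MNT_VIEWDIR=/tmp/${USER}/copper                                 # view directory used in the run example below
ls -l "${CU_FUSE_MNT_VIEWDIR}/lus/flare/projects"                       # readdir + getattr through copper
cat "${CU_FUSE_MNT_VIEWDIR}/lus/flare/projects/myproject/config.yaml"   # hypothetical file: open + read through copper
```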
module load copper
CUPATH=$COPPER_ROOT/bin/cu_fuse # If you are building copper on your own, set this path to your cu_fuse binary
LOGDIR=~/copper-logs/${PBS_JOBID}
CU_FUSE_MNT_VIEWDIR=/tmp/${USER}/copper
mkdir -p ${LOGDIR} # To create the copper log directory
clush --hostfile ${PBS_NODEFILE} "mkdir -p ${CU_FUSE_MNT_VIEWDIR}" # To create the FUSE view directory on every compute node
read -r -d '' CMD << EOM
numactl --physcpubind="0-3"
$CUPATH/cu_fuse
-tpath / # / will be mounted under CU_FUSE_MNT_VIEWDIR
-vpath ${CU_FUSE_MNT_VIEWDIR} # To provide the fuse mounted location
-log_output_dir ${LOGDIR} # To provide where the copper logs will be stored
-log_level 6 # To provide the level of copper logging (6 = most verbose, 0 = least verbose)
-log_type file # To direct logging to file / stdout / both
-net_type cxi # To provide the network protocol
-nf ${PBS_NODEFILE} # To provide the hostlist where cu_fuse will be mounted
-trees 1 # To provide the number of trees to form in the overlay network
-max_cacheable_byte_size $((10*1024*1024)) # To provide the maximum size (in bytes) of an access that is served through copper
-s ${CU_FUSE_MNT_VIEWDIR} # To start fuse in single threaded mode.
EOM
clush --hostfile ${PBS_NODEFILE} $CMD # To start copper on all the compute nodes
# Instead of clush, you can also use the following to start copper as a background process on all compute nodes:
# WCOLL=$PBS_NODEFILE pdsh $CMD
# mpirun --np ${NRANKS} --ppn ${RANKS_PER_NODE} --cpu-bind=list:0-3 sh ./scripts/filesystem/mnt_fs.sh &
time mpirun --np ${NRANKS} --ppn ${RANKS_PER_NODE} --cpu-bind=${CPU_BINDING} --genvall \
--genv=PYTHONPATH=${CU_FUSE_MNT_VIEWDIR}/lus/flare/projects/Aurora_deployment/kaushik/copper/july12/copper/run/copper_conda_env \
python3 -c "import torch; print(torch.__file__)"
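If the import is served through copper, the printed `torch.__file__` should start with `${CU_FUSE_MNT_VIEWDIR}`. A quick sanity-check sketch (run on a compute node while copper is still mounted, with `PYTHONPATH` set as in the command above):

```bash
# Check whether torch resolves from under the copper view or from the underlying filesystem.
python3 -c "import torch; print(torch.__file__)" | grep -q "${CU_FUSE_MNT_VIEWDIR}" \
    && echo "torch imported through copper" \
    || echo "torch imported from the underlying filesystem"
```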
clush --hostfile ${PBS_NODEFILE} "fusermount3 -u ${CU_FUSE_MNT_VIEWDIR}" # To unmount the FUSE view on every node
clush --hostfile ${PBS_NODEFILE} "rm -rf ${CU_FUSE_MNT_VIEWDIR}" # To remove the view directory
clush --hostfile ${PBS_NODEFILE} "pkill -9 cu_fuse" # To kill any remaining cu_fuse processes
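Optionally, you can confirm that the teardown completed on every node before the job exits; this check uses standard tools and is not part of copper itself:

```bash
# Verify that the FUSE view is unmounted and no cu_fuse processes are left behind.
clush --hostfile ${PBS_NODEFILE} "mountpoint -q ${CU_FUSE_MNT_VIEWDIR} && echo still-mounted || echo unmounted"
clush --hostfile ${PBS_NODEFILE} "pgrep -x cu_fuse > /dev/null && echo cu_fuse-still-running || echo clean"
```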
To build copper, we recommend installing the Mochi dependencies using Spack and its environment feature. You can find the instructions to install Spack here. The required Mochi services are margo, mercury, thallium, and cereal. The instructions to install these Mochi services can be found here.
Assuming you have a Mochi environment set up correctly, you should now be able to build by running the following commands.
git clone https://github.com/spack/spack.git
. copper/gitrepos/git-spack-repo/spack/share/spack/setup-env.sh
git clone https://github.com/mochi-hpc-experiments/platform-configurations.git
cd copper/gitrepos/git-mochi-repos/platform-configurations/ANL/Aurora
# Compare the spack.yaml here with copper/scripts/build_helper/aurora_spack.yaml
git clone https://github.com/mochi-hpc/mochi-spack-packages.git
spack repo add copper/gitrepos/git-mochi-repos/mochi-spack-packages
module load cmake # on aurora
spack env create kaushik_env_1 spack.yaml
spack env activate kaushik_env_1
spack add mochi-margo
spack install
# In case of any issue with the Spack environment, completely delete Spack and start again
spack env remove kaushik_env_1
# From the next time onwards, you only need the following
. copper/gitrepos/git-spack-repo/spack/share/spack/setup-env.sh
spack env activate kaushik_env_1
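Before building copper itself, it can help to confirm that the Mochi dependencies listed above are visible in the active Spack environment (the package names here assume the mochi-spack-packages naming):

```bash
# Optional check: list the Mochi-related packages installed in the active environment.
spack find mochi-margo mercury mochi-thallium cereal
```

Then clone copper and prepare the build helper environment file: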
git clone https://github.com/argonne-lcf/copper
cd copper/scripts/build_helper/
cp default_env.sh env.sh
Set the following variables in the copied `env.sh`
export VIEW_DIR=<PATH_TO_MOUNT_DIRECTORY>
export FUSE3_LIB=<PATH_TO_FUSE_LIBRARY>
export FUSE3_INCLUDE=<PATH_TO_FUSE_HEADER>
export PY_PACKAGES_DIR=<PATH_TO_PY_ENV>
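For illustration, a filled-in `env.sh` might look like the following; every path here is a hypothetical example and must be replaced with the actual locations on your system:

```bash
# Hypothetical example values -- replace each path with your own installation locations.
export VIEW_DIR=/tmp/${USER}/copper                   # directory where the FUSE view will be mounted
export FUSE3_LIB=/usr/lib64/libfuse3.so               # fuse3 shared library
export FUSE3_INCLUDE=/usr/include/fuse3               # fuse3 headers
export PY_PACKAGES_DIR=${HOME}/envs/copper_conda_env  # your Python environment
```

Finally, activate your Mochi Spack environment and run the build script: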
spack env activate <MOCHI_ENV>
sh copper/scripts/build_helper/build.sh