Sync meeting 2023-09-25 with CernVM-FS developers on Best Practices for CernVM-FS on HPC tutorial
- https://github.com/multixscale/cvmfs-tutorial-hpc-best-practices
- online tutorial, focused on (Euro)HPC system administrators
- aiming for Fall 2023 (Sept-Oct-Nov)
- collaboration between MultiXscale/EESSI partners and CernVM-FS developers
- tutorial + improvements to CernVM-FS docs
- similar approach to introductory tutorial by Kenneth & Bob in 2021, see https://cvmfs-contrib.github.io/cvmfs-tutorial-2021/
- format: tutorial website (+ CVMFS docs) + accompanying slide deck
Attending:
- CernVM-FS: Laura, Valentin
- EESSI/MultiXscale: Lara, Alan, Bob, Kenneth
- GitHub repository @ https://github.com/multixscale/cvmfs-tutorial-hpc-best-practices
- Progress on tutorial contents
- Introduction to CVMFS split up into subsections (see PR #14)
- suggestions by Jakob/Lara largely taken into account
- TODO:
- revisit subsection on caching, needs more structure
- flesh out "Terminology" + "Example repositories" subsections
- visuals?
- should also mention systems where CernVM-FS is used (like ComputeCanada, NERSC, EuroHPC Vega)
- EESSI section: PR #12
- (still) to be reviewed/merged [Kenneth]
- "Accessing a repository" section [Bob?]
- Installing + configuring CernVM-FS client (see install sketch after this list)
- squid proxy (offline nodes + LRU cache in network)
- Performance aspects
- focused call on benchmarking with Alan, Laura, Kenneth
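A minimal sketch of installing + configuring a CernVM-FS client, assuming a RHEL-like node and that the EESSI CernVM-FS configuration package is already in place (package URL and parameters as documented by CernVM-FS; the cache size is an arbitrary example value):

```
# install CernVM-FS client from the CERN package repository
sudo yum install -y https://ecsft.cern.ch/dist/cvmfs/cvmfs-release/cvmfs-release-latest.noarch.rpm
sudo yum install -y cvmfs

# base setup (autofs integration, cvmfs user, ...)
sudo cvmfs_config setup

# minimal client configuration: single client, no squid proxy (yet)
sudo tee /etc/cvmfs/default.local > /dev/null << 'EOF'
CVMFS_CLIENT_PROFILE=single
CVMFS_QUOTA_LIMIT=10000   # client cache size limit, in MB (example value)
EOF

# verify that the repository can be mounted
cvmfs_config probe pilot.eessi-hpc.org
```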
- date for online tutorial?
- Mon 4 Dec 2023 (13:30-17:00 CET with 30min break)
- still realistic?
- promote via CASTIEL2?
- should be announced at least 1 month in advance => by end of Oct'23
- announce via MultiXscale website + EuroHPC portal (https://hpc-portal.eu)
- registration via UGent event manager?
- next sync meetings
- Thu 28 Sept'23 16:00 CEST on performance aspects (Alan, Lara, Kenneth)
- Mon 23 Oct'23 14:00 CEST: go/no-go for tutorial on Mon 4 Dec'23
- scenarios for performance benchmarks
- performance data to collect
- #files + data volume + bandwidth
- timing for the command
- software
- GROMACS (# files?)
gmx --version
- TensorFlow (~20sec startup?!)
python -c 'import tensorflow'
- diff for x86_64 vs aarch64? (~1GB on aarch64, ~2GB on x86_64)
- Python script with lots of imports as "extreme" example?
- tensorflow, pandas, scipy, numpy, h5py, ...
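Timed from the shell, that "extreme" import test could simply be a one-liner (package set taken from the list above; the exact selection is arbitrary):

```
# time a Python process that imports several heavy packages (many small files)
time python -c 'import tensorflow, pandas, scipy, numpy, h5py'
```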
- CVMFS scenarios (cache-reset sketch after this list)
- only client cache (single node)
- cold client cache (out-of-network proxy + Stratum-1)
- hot client cache
- on local disk vs in-memory client cache?
- squid proxy (to deal with 2-node scenario; minimal config sketch below)
- see https://cvmfs-contrib.github.io/cvmfs-tutorial-2021/03_stratum1_proxies/#32-setting-up-a-proxy
- cold squid, cold client caches
- hot squid, cold client caches
- hot client caches
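A minimal squid configuration in the spirit of the 2021 tutorial linked above; the network range and cache sizes are placeholders to adapt:

```
# minimal /etc/squid/squid.conf for a CernVM-FS forward proxy
sudo tee /etc/squid/squid.conf > /dev/null << 'EOF'
http_port 3128
# placeholder: cluster-internal network
acl local_nodes src 192.168.0.0/24
http_access allow local_nodes
http_access deny all
# in-memory (LRU) cache
cache_mem 1024 MB
maximum_object_size 1024 MB
# 5 GB on-disk cache
cache_dir ufs /var/spool/squid 5000 16 256
EOF
sudo squid -k parse && sudo systemctl restart squid

# clients then point at the proxy via /etc/cvmfs/default.local:
#   CVMFS_HTTP_PROXY="http://<proxy-host>:3128"
```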
- private Stratum-1 (2-node scenario)
- cold squid, cold client caches
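To make the cold/hot distinctions reproducible, caches can be reset between runs; a sketch (cvmfs_config wipecache and squid -z are standard commands; the squid cache path is the usual default):

```
# cold client cache: wipe the local CernVM-FS cache
sudo cvmfs_config wipecache

# cold squid cache: stop squid, clear its cache directory, re-initialize
sudo systemctl stop squid
sudo rm -rf /var/spool/squid/*
sudo squid -z
sudo systemctl start squid

# hot caches: just repeat the measurement without clearing anything
```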
- comparison with common practice
- compare with timings for the same software stack on local disk (e.g. ext4) and on GPFS/Lustre (@ HPC-UGent) at the end
- other
- loopback cache (offline nodes)
- alien cache
- impact of EESSI compat layer?
- vs software installed on top of bare OS
- try these experiments both with and without EESSI?
- can we figure out how many files were pulled in for compat layer?
sudo cvmfs_talk -i pilot.eessi-hpc.org cache list
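The "#files + data volume" numbers could be pulled from the client itself; a sketch building on the command above:

```
# number of files currently in the client cache for this repository
sudo cvmfs_talk -i pilot.eessi-hpc.org cache list | wc -l

# client counters: downloaded files, transferred bytes, cache hit rate, ...
sudo cvmfs_config stat -v pilot.eessi-hpc.org
```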
Attending (follow-up call on performance benchmarking, Thu 28 Sept'23):
- CernVM-FS: Laura
- EESSI/MultiXscale: Alan, Lara, Kenneth
- Laura's PR https://github.com/cvmfs/cvmfs/pull/3372
- use cases
- GROMACS binary
- TensorFlow import
- (ROOT)
- hot vs warm vs cold cache
- Laura's benchmark script repeats each run 20 times
- warm cache means kernel cache is cleared between runs
- hot cache is without clearing kernel cache
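For the "warm" runs, clearing the kernel caches between measurements presumably boils down to the standard recipe:

```
# drop kernel page cache, dentries and inodes between runs
# ("warm": CernVM-FS client cache still populated, kernel caches empty)
sync
echo 3 | sudo tee /proc/sys/vm/drop_caches
```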
- scenarios
- private Stratum-1 with and without proxy
- also test with repo not mounted yet (takes a couple of seconds for autofs to kick in)
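The "not mounted yet" starting point can be forced by unmounting the repositories first:

```
# unmount all mounted CernVM-FS repositories;
# the next access goes through autofs again (couple of seconds)
sudo cvmfs_config umount
time ls /cvmfs/pilot.eessi-hpc.org
```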
- GeoAPI impact
- see Alan's script on comparing Stratum-1's @ https://github.com/EESSI/eessi-demo/pull/24
- see also https://cvmfs.readthedocs.io/en/stable/cpt-telemetry.html
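To inspect what the GeoAPI gives a client (CVMFS_USE_GEOAPI is the relevant client parameter; setting it to "no" in /etc/cvmfs/default.local disables geo-sorting for comparison):

```
# show the client's current (ordered) list of Stratum-1 servers
sudo cvmfs_talk -i pilot.eessi-hpc.org host info

# re-probe the servers, ordering them via the GeoAPI
sudo cvmfs_talk -i pilot.eessi-hpc.org host probe geo
```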