Sync meeting 2023-09-25 with CernVM-FS developers on Best Practices for CernVM-FS on HPC tutorial
- https://github.com/multixscale/cvmfs-tutorial-hpc-best-practices
- online tutorial, focused on (Euro)HPC system administrators
- aiming for Fall 2023 (Sept-Oct-Nov)
- collaboration between MultiXscale/EESSI partners and CernVM-FS developers
- tutorial + improvements to CernVM-FS docs
- similar approach to introductory tutorial by Kenneth & Bob in 2021, see https://cvmfs-contrib.github.io/cvmfs-tutorial-2021/
- format: tutorial website (+ CVMFS docs) + accompanying slide deck
Attending:
- CernVM-FS: Laura, Valentin
- EESSI/MultiXscale: Lara, Alan, Bob, Kenneth
- GitHub repository @ https://github.com/multixscale/cvmfs-tutorial-hpc-best-practices
- Progress on tutorial contents
- Introduction to CVMFS split up into subsections (see PR #14)
- suggestions by Jakob/Lara largely taken into account
- TODO:
- revisit subsection on caching, needs more structure
- flesh out "Terminology" + "Example repositories" subsections
- visuals?
- should also mention systems where CernVM-FS is used (like ComputeCanada, NERSC, EuroHPC Vega)
- EESSI section: PR #12
- (still) to be reviewed/merged [Kenneth]
- "Accessing a repository" section [Bob?]
- Installing + configuring CernVM-FS client (see install sketch after this list)
- squid proxy (offline nodes + LRU cache in network)
- Performance aspects
- focused call on benchmarking with Alan, Laura, Kenneth
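A minimal sketch of installing + configuring a CernVM-FS client, assuming a RHEL-like node and that the EESSI CernVM-FS configuration package is already in place (package URL and parameters as documented by CernVM-FS; the cache size is an arbitrary example value):

```
# install CernVM-FS client from the CERN package repository
sudo yum install -y https://ecsft.cern.ch/dist/cvmfs/cvmfs-release/cvmfs-release-latest.noarch.rpm
sudo yum install -y cvmfs

# base setup (autofs integration, cvmfs user, ...)
sudo cvmfs_config setup

# minimal client configuration: single client, no squid proxy (yet)
sudo tee /etc/cvmfs/default.local > /dev/null << 'EOF'
CVMFS_CLIENT_PROFILE=single
CVMFS_QUOTA_LIMIT=10000   # client cache size limit, in MB (example value)
EOF

# verify that the repository can be mounted
cvmfs_config probe pilot.eessi-hpc.org
```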
- date for online tutorial?
- Mon 4 Dec 2023 (13:30-17:00 CET with 30min break)
- still realistic?
- promote via CASTIEL2?
- should be announced at least 1 month in advance => by end of Oct'23
- announce via MultiXscale website + EuroHPC portal (https://hpc-portal.eu)
- registration via UGent event manager?
- next sync meetings
- Thu 28 Sept'23 16:00 CEST on performance aspects (Alan, Lara, Kenneth)
- Mon 23 Oct'23 14:00 CEST: go/no-go for tutorial on Mon 4 Dec'23
- scenarios for performance benchmarks
- performance data to collect
- #files + data volume + bandwidth
- timing for the command
- software
- GROMACS (# files?)
gmx --version
- TensorFlow (~20sec startup?!)
python -c 'import tensorflow'
- diff for x86_64 vs aarch64? (~1GB on aarch64, ~2GB on x86_64)
- Python script with lots of imports as "extreme" example?
- tensorflow, pandas, scipy, numpy, h5py, ...
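Timed from the shell, that "extreme" import test could simply be a one-liner (package set taken from the list above; the exact selection is arbitrary):

```
# time a Python process that imports several heavy packages (many small files)
time python -c 'import tensorflow, pandas, scipy, numpy, h5py'
```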
- CVMFS scenarios (cache-reset sketch after this list)
- only client cache (single node)
- cold client cache (out-of-network proxy + Stratum-1)
- hot client cache
- on local disk vs in-memory client cache?
- squid proxy (to deal with 2-node scenario; minimal config sketch below)
- see https://cvmfs-contrib.github.io/cvmfs-tutorial-2021/03_stratum1_proxies/#32-setting-up-a-proxy
- cold squid, cold client caches
- hot squid, cold client caches
- hot client caches
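A minimal squid configuration in the spirit of the 2021 tutorial linked above; the network range and cache sizes are placeholders to adapt:

```
# minimal /etc/squid/squid.conf for a CernVM-FS forward proxy
sudo tee /etc/squid/squid.conf > /dev/null << 'EOF'
http_port 3128
# placeholder: cluster-internal network
acl local_nodes src 192.168.0.0/24
http_access allow local_nodes
http_access deny all
# in-memory (LRU) cache
cache_mem 1024 MB
maximum_object_size 1024 MB
# 5 GB on-disk cache
cache_dir ufs /var/spool/squid 5000 16 256
EOF
sudo squid -k parse && sudo systemctl restart squid

# clients then point at the proxy via /etc/cvmfs/default.local:
#   CVMFS_HTTP_PROXY="http://<proxy-host>:3128"
```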
- private Stratum-1 (2-node scenario)
- cold squid, cold client caches
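To make the cold/hot distinctions reproducible, caches can be reset between runs; a sketch (cvmfs_config wipecache and squid -z are standard commands; the squid cache path is the usual default):

```
# cold client cache: wipe the local CernVM-FS cache
sudo cvmfs_config wipecache

# cold squid cache: stop squid, clear its cache directory, re-initialize
sudo systemctl stop squid
sudo rm -rf /var/spool/squid/*
sudo squid -z
sudo systemctl start squid

# hot caches: just repeat the measurement without clearing anything
```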
- comparison with common practice
- compare with timings for the same software stack on local disk (e.g. ext4) and on GPFS/Lustre (@ HPC-UGent) at the end
- other
- loopback cache (offline nodes)
- alien cache
- impact of EESSI compat layer?
- vs software installed on top of bare OS
- try these experiments both with and without EESSI?
- can we figure out how many files were pulled in for compat layer?
sudo cvmfs_talk -i pilot.eessi-hpc.org cache list
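The "#files + data volume" numbers could be pulled from the client itself; a sketch building on the command above:

```
# number of files currently in the client cache for this repository
sudo cvmfs_talk -i pilot.eessi-hpc.org cache list | wc -l

# client counters: downloaded files, transferred bytes, cache hit rate, ...
sudo cvmfs_config stat -v pilot.eessi-hpc.org
```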
Attending (follow-up call on performance benchmarking, Thu 28 Sept'23):
- CernVM-FS: Laura
- EESSI/MultiXscale: Alan, Lara, Kenneth
- Laura's PR https://github.com/cvmfs/cvmfs/pull/3372
- use cases
- GROMACS binary
- TensorFlow import
- (ROOT)
- hot vs warm vs cold cache
- Laura's benchmark script repeats each run 20 times
- warm cache means kernel cache is cleared between runs
- hot cache is without clearing kernel cache
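For the "warm" runs, clearing the kernel caches between measurements presumably boils down to the standard recipe:

```
# drop kernel page cache, dentries and inodes between runs
# ("warm": CernVM-FS client cache still populated, kernel caches empty)
sync
echo 3 | sudo tee /proc/sys/vm/drop_caches
```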
- scenarios
- private Stratum-1 with and without proxy
- also test with repo not mounted yet (takes a couple of seconds for autofs to kick in)
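The "not mounted yet" starting point can be forced by unmounting the repositories first:

```
# unmount all mounted CernVM-FS repositories;
# the next access goes through autofs again (couple of seconds)
sudo cvmfs_config umount
time ls /cvmfs/pilot.eessi-hpc.org
```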
- GeoAPI impact
- see Alan's script on comparing Stratum-1's @ https://github.com/EESSI/eessi-demo/pull/24
- see also https://cvmfs.readthedocs.io/en/stable/cpt-telemetry.html
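To inspect what the GeoAPI gives a client (CVMFS_USE_GEOAPI is the relevant client parameter; setting it to "no" in /etc/cvmfs/default.local disables geo-sorting for comparison):

```
# show the client's current (ordered) list of Stratum-1 servers
sudo cvmfs_talk -i pilot.eessi-hpc.org host info

# re-probe the servers, ordering them via the GeoAPI
sudo cvmfs_talk -i pilot.eessi-hpc.org host probe geo
```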