Conference call notes 20210414
ocaisa edited this page Apr 14, 2021
·
12 revisions
Notes on the 170th EasyBuild conference call, Wednesday April 14th 2021 (08:00 UTC)
Alphabetical list of attendees (9):
- Mikael Öhman (Chalmers University of Technology, Sweden)
- Alan O'Cais (Jülich Supercomputing Centre, Germany)
- Sebastian Achilles (Jülich Supercomputing Centre, Germany)
- Jörg Saßmannshausen (NIHR Biomedical Research Centre, UK)
- Terje Kvernes (University of Oslo, Norway)
- Kurt Lust (Univ. of Antwerp, Belgium + LUMI User Support Team)
- Robert Mijakovic (LuxProvide)
- Alex Domingo (Vrije Universiteit Brussel, Belgium)
- Alexander Grund (TU Dresden, Germany)
- update on recent developments
- 2021a update of common toolchains
- outlook to component versions
- BLAS/LAPACK component: OpenBLAS vs BLIS, maybe FlexiBLAS?
- collapsing foss and fosscuda toolchains
- bintray
- Q&A
- next release: a month or so since we just released last week
- project for next release: (not created yet)
- to maintainers: add issues/PRs you consider important there!
- recent changes
- framework
- bug fixes
- Catch problems early on if `--github-user` is not specified for `--new-pr` & co (PR #3644)
- enhancements
- Avoid module call for `unuse()` for Lmod and set `$MODULEPATH` directly (PR #3633)
- Setting `$MODULEPATH` directly is a lot faster
- When a priority is used, the result is not exactly the same
- Another PR removed the use of priority to further enable this where possible
- Do the same for `module unuse`
- update `validate_github_token` function to accept GitHub token in new format (PR #3632)
- mention easyblocks PR in gist when uploading test report for it + fix `clean_gists.py` script (PR #3622)
- add templates for architecture independent Python wheels (PR #3618)
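The idea of editing `$MODULEPATH` directly instead of calling the module tool can be sketched as below. This is only an illustration, not the framework code from PR #3633: the function name `unuse_modulepath` and its `environ` argument are hypothetical, and it ignores any extra state the module tool may keep (which is presumably why priorities complicate matters).

```python
import os


def unuse_modulepath(path, environ=None):
    """Remove `path` from $MODULEPATH without spawning the module tool.

    Hypothetical helper for illustration; returns the remaining entries.
    """
    environ = os.environ if environ is None else environ
    # Split the current value and drop empty entries.
    entries = [p for p in environ.get("MODULEPATH", "").split(os.pathsep) if p]
    # Compare normalised paths so that '/apps/modules/' matches '/apps/modules'.
    target = os.path.normpath(path)
    kept = [p for p in entries if os.path.normpath(p) != target]
    environ["MODULEPATH"] = os.pathsep.join(kept)
    return kept
```

Passing a plain dictionary as `environ` makes it possible to try this out without touching the real environment.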
- changes
- easyblocks
- bug fixes
- fix permission on MATLAB installer config file so it can be written to (PR #2385)
- Improve Python package version check and add unversioned_packages EC param (PR #2377)
- Make the CUDA stub libs take preference over system libs when linking (PR #2373)
- If you install CUDA 11 on a system with only CUDA 10 drivers, using stub libraries allows you to link for use on another system
- The problem was that the link order gave priority to system paths, which led to missing symbols
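The ordering issue for the CUDA stub libraries can be illustrated with a small sketch. It assumes the stubs live in `lib64/stubs` under the CUDA installation and that the search path is a colon-separated string that is consulted in order; the function name `prefer_cuda_stubs` is hypothetical and this is not the code from PR #2373.

```python
import os


def prefer_cuda_stubs(cuda_root, library_path=""):
    """Return a search path string with the CUDA stubs directory in front.

    Hypothetical helper: any existing occurrence of the stubs directory is
    moved to the front so that it is found before system library paths.
    """
    stubs_dir = os.path.join(cuda_root, "lib64", "stubs")
    entries = [p for p in library_path.split(os.pathsep) if p and p != stubs_dir]
    return os.pathsep.join([stubs_dir] + entries)
```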
- also set `$TORCH_CUDA_ARCH_LIST` for PyTorch tests (PR #2363)
- enhancements
- changes
- (nothing major)
- easyconfigs
- over 50 merged easyconfig PRs since last conf call
- bug fixes
- (nothing major)
- enhancements
- (nothing major)
- new software
- `goblf/2020b` (PR #12381)
- noteworthy software updates
- GCCcore 10.3.0
- Will things break?...have to wait and see
- PRs in preparation
- the idea of automating version updates was raised
- this has come up many times in the past but no-one has really tried to tackle it
- Started work on OpenMPI 4.1.1 (probably `foss/2021a`), waiting on release
- noteworthy changes
- (none)
- to merge/fix/tackle soon
- framework
- bug fixes
- performance improvements for easyconfig parsing (PR #3555)
- Re-enable write permissions when installing with read-only-installdir (PR #3649)
- The only problem would be for groups responsible for installations of things like Python/R: they would lose the ability to install additional modules
- JSC had a case where a software package silently used `pip` to install additional packages, which caused the pip check to fail for any other package. This would have prevented that.
- enhancements
- changes
- (nothing major)
- easyblocks
- bug fixes
- treat files/directories of unpacked sources equally in `PackedBinary` (PR #2306)
- enhancements
- enhance CUDA support in CP2K easyblock (WIP) (PR #2349)
- this could use a review
- currently requires a single value in `--cuda-compute-capabilities` EasyBuild configuration option or `cuda_compute_capabilities` easyconfig parameter
- do we need custom easyconfig parameters to easily enable/disable the different GPU capabilities supported by CP2K?
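The "single value" restriction could look roughly like the sketch below. Everything here is an assumption made for illustration: the `GPU_TARGETS` mapping, its entries and the function name `pick_gpu_target` are hypothetical and are not taken from PR #2349.

```python
# Illustrative mapping only; the real easyblock may use other keys or targets.
GPU_TARGETS = {"3.5": "K40", "3.7": "K80", "6.0": "P100", "7.0": "V100"}


def pick_gpu_target(cuda_compute_capabilities):
    """Return the single GPU target for a list of CUDA compute capabilities.

    Raises ValueError unless exactly one distinct, known capability is given.
    """
    unique = sorted(set(cuda_compute_capabilities))
    if len(unique) != 1:
        raise ValueError("expected exactly one compute capability, got %s" % unique)
    try:
        return GPU_TARGETS[unique[0]]
    except KeyError:
        raise ValueError("no GPU target known for compute capability %s" % unique[0]) from None
```

A custom easyconfig parameter, as raised in the question above, would be one way to override such a mapping per installation.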
- add Java wrapper support to OpenMPI (PR #2360)
- still missing a matching easyconfig PR that leverages this?
- enable installation of samples for CUDA > 10.1 (PR #2374)
- changes
- (nothing major)
- new software
- easyconfigs
- bug fixes
- (nothing major)
- enhancements
- (nothing major)
- new software
- software updates
- PyTorch 1.8.0 (PR #12347)
- PyTorch upstream only tests against MKL, which is probably connected to failures in their test suite
- should probably be bumped to 1.8.1
- outlook to component versions
- GCC 10.3 (ready to go?)
- Merged
- OpenMPI 4.1.1 (out soon)
- Waiting for the release, but in progress
- Intel oneAPI versions of compilers, MPI, MKL?
- Still need to look at this, 2021.2 is out
- What is the GCC compatibility?
- Python version?
- 3.9 has issues (latest is `3.9.4`), should we stick with 3.8? Might be worth some more experiments?
- Problems will only really show up with complex builds like TensorFlow or PyTorch
- Python 3.9 is marked for support in TF 2.5 (currently a release candidate)
- Looks like support may be merged in PyTorch (in next release)
- Next release of v2 and v1 will probably support this
- BLAS/LAPACK component: OpenBLAS vs BLIS, maybe FlexiBLAS?
- It's complicated, no clear answer
- What is best depends on the architecture
- On Intel, MKL is best
- On AMD, BLIS is best for BLAS 3; for LAPACK it is not so clear (especially with threads)
- The idea would be to use FlexiBLAS to choose the best backend
- Not sure if this is realistic, and it would also be a moving target
- Maybe you could write an auto-tuner
- There are a few variables: size of matrix, and number of cores
- Would need hooks to choose per function
- Could lead to toolchains that look the same but are not the same under the hood
- Could lead to bugs that are hard to resolve
- If we want to give users the power to change libraries themselves, we need to teach them how to do that
- We could go conservative and ship a static configuration and document how that can be tuned by the site
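The conservative option (ship a static configuration that a site can tune) might be sketched as follows. It assumes the backend is selected through the `FLEXIBLAS` environment variable; the backend names, the vendor strings and both function names are hypothetical placeholders that would have to match what a site actually configured.

```python
import os

# Hypothetical static defaults: CPU vendor string -> FlexiBLAS backend name.
DEFAULT_BACKENDS = {"GenuineIntel": "imkl", "AuthenticAMD": "blis"}
FALLBACK_BACKEND = "openblas"


def choose_backend(cpu_vendor, site_overrides=None):
    """Pick a backend name from the static defaults, letting a site override them."""
    backends = dict(DEFAULT_BACKENDS)
    backends.update(site_overrides or {})
    return backends.get(cpu_vendor, FALLBACK_BACKEND)


def backend_environment(cpu_vendor, site_overrides=None, environ=None):
    """Return a copy of the environment with $FLEXIBLAS set to the chosen backend."""
    env = dict(os.environ if environ is None else environ)
    env["FLEXIBLAS"] = choose_backend(cpu_vendor, site_overrides)
    return env
```

A per-function or auto-tuned choice, as discussed above, would need considerably more than this lookup table.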
- collapsing `foss` and `fosscuda` toolchains
- see https://github.com/easybuilders/easybuild-easyconfigs/issues/12484
- Will need to enhance MPI library, can't change UCX
- `OMPI_MCA_mca_component_path` approach looks promising, maybe we should check with Jeff Squyres
- Build UCX as normal, also `UCX+CUDA` (different name so we can have both at same time)
- Need 2 OpenMPI builds, build additional MCA components and set envvar to point to them in second OpenMPI build
- Second build only enriches first installation
- For a module hierarchy, the second build would sit in the tree of the first
- Don't want to force CUDA on people
- Keeping CUDA as a versionsuffix would allow us to bump a version of the software to an updated CUDA easily
- HMNS should be updated to be aware of `intel-compilers` component
- Pretty trivial, we should do this, need to open an issue
- If people have stuff that must go in, add it to the project page or ask a maintainer to: Project - 2021a common toolchains
- Tracking Issue #12099
- Boost is the biggest one affected by the bintray shutdown
- They give some examples of what you need to do
- Maybe we should be more aggressive in making copies of sources and making them available
- If we know the licences we can automate this
- Number of licences recognised by the EB parameter is very limited and there is no way to introduce a new one
- What we have could be improved here
- What happens if a licence changes?
- Could have a CI job to check sources for us
- Perhaps more suited to a regression test for releases
- Might need some work to only
- Should we have an easier way of making a ticket if a download fails?
- Could give some advice on the command line on how to trigger this
- Could we have a webhook?
- Could have a callback, but that would need to be toggled for privacy
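A CI job or release regression test that checks source URLs could start from something like this sketch. It is an assumption-laden illustration, not an existing EasyBuild feature: the function names are hypothetical, only a plain HTTP `HEAD` request is made, and servers that reject `HEAD` would be reported as broken.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status of a HEAD request, or None if nothing answered."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, ...
        return None


def find_broken_sources(source_urls, get_status=http_status):
    """Return (url, status) pairs for every URL that did not answer with 200."""
    broken = []
    for url in source_urls:
        status = get_status(url)
        if status != 200:
            broken.append((url, status))
    return broken
```

The `get_status` argument is there so the check can be exercised without network access, for example with a lookup into a dictionary of canned responses.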
- Perl easyconfigs install the same package multiple times (PR #12575), PR opened to behave similar to R packages. Should cut down on installation times.
- Feature request open to also set group permissions on build and temporary directories
- For disaster recovery, can there be an automatic way to create an easystack file?
- This is hard, as you need the easyblock, the hooks, the EB version,...
- The `reprod` directory of each install does store this information but there is no automated way of picking this information up and using it
- Need to design your infra to make this possible
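As a first step towards picking that information up, the stored easyconfigs could be collected with a sketch like the one below. It assumes each installation keeps its easyconfig as a `.eb` file under `easybuild/reprod/` inside the installation directory; the function name is hypothetical, and easyblocks, hooks and the EB version are not handled at all.

```python
from pathlib import Path


def find_reprod_easyconfigs(install_root):
    """Return the sorted easyconfig files found in reprod directories.

    Searches recursively below `install_root` for easybuild/reprod/*.eb.
    """
    root = Path(install_root)
    return sorted(root.glob("**/easybuild/reprod/*.eb"))
```

The resulting list is only raw input: turning it into an easystack file would still need the design work mentioned above.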
- Running EB unit tests using Lmod 6 (available from Ubuntu/Debian) fails because Lmod 6 is deprecated
- We should update the docs to reflect this
- Some test failures can happen due to the language environment
- If this is not in the test suite, we should fix that
- AOCC needs to specify Clang version
- Why is this necessary?
- Detection mechanism does not work for 3.0 (on CentOS and EB 4.3.4)
- There is a mapping in the easyblock