Sync meeting 2024 09 10
- Monthly, every 2nd Tuesday of the month at 10:00 CE(S)T
- Notes of previous meetings at https://github.com/multixscale/meetings/wiki
- Tue 8 Oct 2024 10:00 CEST
- Tue 12 Nov 2024 10:00 CET (prep for special project review?)
- Tue 10 Dec 2024 10:00 CET (post-mortem of special project review?)
attending:
- Neja (NIC)
- Alan (UB)
- Kenneth (UGent)
- ... (SURF)
- Thomas & Richard (UiB)
- Bob, Pedro (RUG)
- Julián (BSC)
- Helena, Susana (HPCNow!)
- notes of previous meeting of 13 August?
- shared drive moved to https://ubarcelona-my.sharepoint.com (same password)
- 2024Q3 quarterly report
- due by next sync meeting (early Oct)
- reported risk of spending more PMs than planned in 2024Q2
- not a real issue, we're catching up from underspending in 2023
- Milestone 3 (M18 - June 2024, lead: UStuttgart)
- Milestone name: "First portable test run on two systems with different architectures (e.g. with and without accelerators)"
- Means of validation: "Performance and scalability plots available for the application on the two architectures"
- done, see also https://www.eessi.io/docs/blog/2024/06/28/espresso-portable-test-run-eurohpc
- amendment to MultiXscale grant agreement
- convert PMs to travel budget => need to cut PMs from some tasks
- T5.1 and T5.3 were finished M12, did we underspend PMs there?
- WP status updates
- [SURF] WP1 Developing a Central Platform for Scientific Software on Emerging Exascale Technologies
- [UGent] T1.1 Stable (EESSI) - D1.3 due M24 (Dec'24)
- more software added, now:
- ~388 different software projects
- ~731 software installations per CPU target
- over 6,500 software installations in total (across all CPU targets)
- particularly relevant for MultiXscale: pystencils, MetalWalls, pyMBE, Extrae (POP tool), ESPResSo for A64FX (Deucalion)
- dev.eessi.io => see notes + support issue #61
  - dedicated Magic Castle cluster in Azure [Alan,Kenneth]
    - was created, but is not fully operational yet
    - not correctly deployed due to incorrect Terraform Cloud token being used
  - TODO:
    - install and configure bot on dedicated MC cluster for dev.eessi.io
    - implement bot/build.sh script in dev.eessi.io repo, leveraging scripts from software-layer repo
    - flesh out first use case: building specific commits of ESPResSo (see sketch below)
  - showing good progress on this would be good for the upcoming special project review
    - would help to tackle the remark that there's too much of a disconnect between technical and scientific WPs
  - we need to plan who will actively contribute, and how [Kenneth,Pedro?]
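As a rough illustration of the ESPResSo-commit use case in the TODO above, a hypothetical EasyBuild easyconfig sketch is shown below; the easyblock, toolchain, commit hash and dependencies are placeholder assumptions, not taken from an actual dev.eessi.io or software-layer easyconfig.

```python
# Hypothetical sketch of an EasyBuild easyconfig that pins a specific ESPResSo
# commit (all concrete values below are placeholders, not from a real easyconfig).
easyblock = 'CMakeMake'

name = 'ESPResSo'
version = '4.3-dev'
versionsuffix = '-commit-0123abc'  # placeholder commit identifier

homepage = 'https://espressomd.org'
description = "Development snapshot of ESPResSo built from a pinned git commit"

toolchain = {'name': 'foss', 'version': '2023a'}  # assumed toolchain

# EasyBuild's git_config source specification allows pinning an exact commit
sources = [{
    'filename': '%(name)s-%(version)s.tar.gz',
    'git_config': {
        'url': 'https://github.com/espressomd',
        'repo_name': 'espresso',
        'commit': '0123abc',  # placeholder: the commit requested via dev.eessi.io
    },
}]

builddependencies = [('CMake', '3.26.3')]  # illustrative versions only
dependencies = [('Python', '3.11.3')]

moduleclass = 'chem'
```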
- NVIDIA GPU support => see notes + support issue #59
  - ground work has been done end of 2023 (see EESSI docs)
  - bot was enhanced recently to support accel filter
  - TODO:
    - enhance script(s) in software-layer repo
      - auto-detect GPU model/architecture (enhance archdetect) - see sketch below
      - pick up accel directive from the bot and change software installation prefix accordingly
    - install GPU software in proper location: ESPResSo (?), LAMMPS, MetalWalls (?), TensorFlow, PyTorch, ...
  - proper NVIDIA GPU support is due by M24 (deliverable D1.3)
    - => we shouldn't wait for dev.eessi.io to be operational
  - we need to plan who will actively contribute, and how [Kenneth,Lara]
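A minimal, hypothetical sketch of the auto-detection step from the TODO above (the real archdetect is a bash script in the software-layer repo): map the GPU's compute capability, as reported by nvidia-smi on drivers recent enough to support the compute_cap query field, to an EESSI-style accel/nvidia/ccXY subdirectory.

```python
# Hypothetical sketch, NOT the actual archdetect implementation: derive the
# EESSI accelerator subdirectory (e.g. accel/nvidia/cc80 for an A100) from the
# compute capability of the first NVIDIA GPU reported by nvidia-smi.
import subprocess


def nvidia_accel_subdir() -> str:
    """Return an accel/nvidia/ccXY subdirectory for the first detected GPU."""
    result = subprocess.run(
        ['nvidia-smi', '--query-gpu=compute_cap', '--format=csv,noheader'],
        capture_output=True, text=True, check=True,
    )
    compute_cap = result.stdout.splitlines()[0].strip()  # e.g. '8.0'
    return 'accel/nvidia/cc' + compute_cap.replace('.', '')  # -> 'accel/nvidia/cc80'


if __name__ == '__main__':
    print(nvidia_accel_subdir())
```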
- need to review description of Task 1.1, make sure all subtasks are covered
- => need to update project planning (Caspar, Kenneth)
- "we will benchmark software from the shared software stack and compare the performance against on-premise software stacks to identify potential performance limitations, ..."
- ESPResSo + LAMMPS + OpenFOAM + ALL(?) (MultiXscale), GROMACS (BioExcel)
- "increase stability of the shared software stack ... pro-actively by developing monitoring tools"
- proper monitoring for CVMFS network (S0 + S1s)
- active work-in-progress by RUG, see also meeting notes
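A hypothetical sketch (not the actual RUG work-in-progress) of what such CVMFS network monitoring could check: fetch the standard CernVM-FS manifest (.cvmfspublished) from the Stratum 0 and each Stratum 1, and compare the published revision ('S' field) and timestamp ('T' field); the hostnames are placeholders.

```python
# Hypothetical monitoring sketch: compare the repository revision served by the
# Stratum 0 and the Stratum 1 replicas, by parsing .cvmfspublished (the standard
# CernVM-FS manifest, where 'S' holds the revision and 'T' the publish timestamp).
# Hostnames below are placeholders, not the actual EESSI stratum servers.
import time
import urllib.request

REPO = 'software.eessi.io'
SERVERS = {
    'stratum0': 'http://stratum0.example.org',
    'stratum1-a': 'http://s1-a.example.org',
    'stratum1-b': 'http://s1-b.example.org',
}


def manifest_fields(server_url: str) -> dict:
    """Fetch and parse the .cvmfspublished manifest of REPO from one server."""
    url = f'{server_url}/cvmfs/{REPO}/.cvmfspublished'
    with urllib.request.urlopen(url, timeout=10) as resp:
        raw = resp.read()
    fields = {}
    for line in raw.split(b'\n'):
        if line == b'--':  # signature section starts here, stop parsing
            break
        if line:
            fields[chr(line[0])] = line[1:].decode()
    return fields


def main():
    revisions = {}
    for name, url in SERVERS.items():
        fields = manifest_fields(url)
        revisions[name] = int(fields['S'])
        published = time.ctime(int(fields['T']))
        print(f'{name}: revision {fields["S"]}, published {published}')
    lag = max(revisions.values()) - min(revisions.values())
    if lag > 0:
        print(f'WARNING: stratum servers differ by {lag} revision(s)')


if __name__ == '__main__':
    main()
```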
- [RUG] T1.2 Extending support - D1.4 due M30 (June'25)
- optimized installations for AMD Genoa Zen4 (~64% done) + A64FX (~23% done) are still a work-in-progress
- Intel Sapphire Rapids & NVIDIA Grace (for JUPITER) to start
- AMD ROCm (see planning issue #31 + support issue #71)
- effort led by Pedro/Bob (RUG)
- [SURF] T1.3 Test suite - D1.5 due M30 (June'25)
- additional tests implemented: CP2K, LAMMPS, mpi4py, PyTorch
- WIP: test for MetalWalls
- various small enhancements
- due for another release?
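The EESSI test suite is built on ReFrame; a generic, hypothetical sketch of a run-only test in that style is shown below (e.g. for the work-in-progress MetalWalls test); the executable name, input file and sanity pattern are placeholders, not taken from the actual test suite.

```python
# Hypothetical sketch of a ReFrame run-only test in the style of the EESSI test
# suite; executable, input file and sanity pattern are placeholders, NOT the
# actual (work-in-progress) MetalWalls test.
import reframe as rfm
import reframe.utility.sanity as sn


@rfm.simple_test
class metalwalls_smoke_check(rfm.RunOnlyRegressionTest):
    valid_systems = ['*']               # EESSI tests aim to be system-agnostic
    valid_prog_environs = ['default']
    executable = 'mw'                   # assumed binary name, for illustration
    executable_opts = ['runtime.inpt']  # assumed input file

    @sanity_function
    def assert_run_completed(self):
        # a real test would also define performance functions and scaling parameters
        return sn.assert_found(r'Total execution time', self.stdout)
```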
- [BSC] T1.4 RISC-V (due M48, D1.6)
- see https://www.eessi.io/docs/repositories/riscv.eessi.io
- see blog post on installing Extrae (POP tool)
- Julián is working on getting CernVM-FS deployed natively on the RISC-V hardware they have at BSC
- [SURF] T1.5 Consolidation (starts M25 - Jan'25)
- continuation of effort on EESSI (T1.1, etc.) (not started yet)
- [UGent] WP5 Building, Supporting and Maintaining a Central Shared Stack of Optimized Scientific Software Installations
- [SURF] T5.2 Monitoring/testing, D5.3 due M30 (June'25)
- dashboard to present test results is work-in-progress @ SURF
- [UGent] T5.4 support/maintenance - D5.4 due M48 (Dec'26)
- support portal + bi-weekly rotation working well
- total: 84 issues (29 open, 55 closed)
- [UB] WP6 Community outreach, education, and training
- deliverables due: D6.2 (M24 - Dec'24), D6.3 (M30 - June'25)
- EESSI as "package manager" backend in Ramble (testing tool by Google)
- see EESSI tutorial in Ramble docs
- indirect result of chat at Google booth @ ISC'24 in Hamburg (June 2024)
- EESSI was used at EUMASTER4HPC event at IT4I (20 Aug 2024) by lecturer Tomas Martinovic as an easy way to use a rich installation of R during his course
- some trouble with MPI due to missing integration with Slurm
- => could look into "use cases" section in EESSI docs to cover fixes/workarounds for known problems
- Alan: some related work on this by Pedro @ HPCNow!
- we could do a short interview on this with him, and then report on it on MultiXscale website + EESSI blog
- [Alan] invited speaker for Nordic Industry Days (early Sept'24, Copenhagen)
- AI focused event
- TrustLLM, European project
- Morris Riedel (Univ. of Iceland) is involved, interested in collaboration on EESSI, evaluate performance (of PyTorch for example)
- [Thomas] presentation @ CernVM workshop on EESSI (16-18 Sept 2024, Geneva)
- we need a section on the website to have a record of upcoming events like this
- could be done via news item, see https://www.multixscale.eu/blog-posts/
- also via social media (Twitter, LinkedIn, etc.)
- contact Susana/Neja via email to get social media posts on this
- [Richard] public webinar Introduction to EESSI (3 Oct 2024)
- would be good if Norwegian NCC could be involved/made aware of this
- => to be promoted via MultiXscale/EESSI channel
- [Alan] First ambassador event: "Introduction to EESSI" on 4 Oct 2024 (see news post on MultiXscale website)
- EESSI will be available on Austrian systems as a result of this
- number of registered participants?
- unclear, done via Austrian partner, who will also do post-event survey
- Neja: need to check which questions will be asked in survey
- do we need to promote this a bit more, via EESSI/EasyBuild mailing list/Slack, etc.?
- yes
- EuroHPC User Days (22-23 Oct 2024, Amsterdam)
- attending: Kenneth/Lara (UGent), Thomas/Richard (UiB), Bob?/Pedro? (RUG)
- paper submitted to get a talk slot
- in touch with organisation w.r.t. participation in CoE session
- "Walk-in networking sessions focusing on specific EuroHPC user needs: provide your feedback and get some advice"
- bring your MultiXscale T-shirt!
- Netherlands eScience Center (Dutch national center of expertise for research software, ~60 RSEs) got in touch with Bob to give a talk (31 Oct'24, Amsterdam)
- unclear if that's a public event, but can do a write-up afterwards
- [Eli/HPCNow!] EESSI Birds-of-a-Feather session accepted at Supercomputing'24 (Atlanta, US)
- can reuse material from BoF session @ ISC'24 in Hamburg
- [Pedro] submitted talk for SURF Advanced Computing Days (12 Dec'24, Utrecht)
- talk not accepted yet
- [Eli?] EESSI tutorial at HiPEAC 2025 accepted (20-22 Jan'25)
- [Jean-Noël] ESPResSo summer school
- [HPCNow] WP7 Dissemination, Exploitation & Communication
- T7.1 Scientific applications provisioned on demand (lead: HPCNow) (started M13, runs until M48)
- WIP by Pedro (HPCNow!)
- working on MPI injection for AWS ParallelCluster (AWS HPC Recipe library)
- no CUDA support in OpenMPI 4.x in AWS, but it's there in OpenMPI 5.x
- more or less ready, not available yet at startup
- has to be tested on various OSs
- will result in contribution to AWS HPC Recipe library + documentation on this
- AWS HPC Recipe library will use a script that we control
- future work: integration of EESSI into Open OnDemand
- could be a collaboration with them, cfr. discussion at ISC booth
- Task 7.2 - Dissemination and communication activities (lead: NIC)
- more EESSI stickers?
- Neja has more stickers: 1k EESSI + 1k MultiXscale
- send an email to Neja if you need some shipped
- Susana: improved/new video to be displayed at EuroHPC booth at SC'24 in Atlanta?
- no budget for something professional, so we'll need to do it ourselves
- most important thing is that we give them something to display
- deadline: 11 Oct'24
- is there any outdated info in current video?
- would be nice to include interview with Matej (towards general public)
- no sound for SC video!
- but we can add subtitles
- missing on MultiXscale website: organisation info (SC members, WP leads, etc.)
- Susana is trying to get more followers on Facebook, now 67 followers
- Task 7.3 - Sustainability (lead: NIC, started M18, due M42)
- Legal entity for EESSI needs to be looked into
- Alan has done some homework on this
- only for EESSI, not for MultiXscale?
- need to frame it as a way of making results from MultiXscale sustainable through EESSI, by including the software there
- subcontracting money available for this
- we should explore options ourselves a bit first
- review mentioned that there's "no service" from scientific WPs
- bit unfair, see trainings (like upcoming CECAM event), etc.
- see also EESSI in CI context used by pyMBE (on top of Espresso)
- Task 7.4 - Industry-oriented training activities (lead: HPCNow)
- CECAM flagship event is also in scope here
- work done on integrating EESSI in ParallelCluster, could do a webinar/tutorial on this
- [NIC] WP8 (Management and Coordination)
- amendment in the works
- Neja will start looking into that after holiday in July
- waiting for feedback from IIT on travel budget
- maybe not everyone will do this because of having to get approval from co-funding agency
- unclear what would happen if EU approves the change, but co-funding agency doesn't
- not a problem for UGent/UB since travel budget is entirely in EU budget
- Milestone 8 would be moved to KPI for T8.3: report on how many applications are installed on how many systems
- @Neja: all partners should be informed that amendment is in the works and will be submitted soon
- next General Assembly meeting
- 23-24 Jan'25 in Barcelona/Sitges
- coupled to HiPEAC'25 (20-22 Jan 2025)
- 2 deliverables were due 5 July'24 (in response to project review)
- one on co-design (by Alan)
- focus on collaborating with projects like EUPILOT, EPI, EUPEX (rather than contacting vendors directly)
- one for scientific WPs
- both submitted
- next general MultiXscale meeting
- Mon 23 Sept 2024, 14:00 CEST
- discussing amendment?
- prep for special project review?
- who will prepare/present what will be discussed
- CI/CD call for EuroHPC
- is 100% funded (not 50/50 EU/countries)
- not published yet
- request for success story by CASTIEL2
- ideally by end of June, at the latest by end of August
- status: rounds of editing going on, should be published soon [Neja,Alan,Caspar]