Sync meeting 2024 12 09
- Monthly, every 2nd Tuesday of the month at 10:00 CE(S)T
- Notes of previous meetings at https://github.com/multixscale/meetings/wiki
- Next meeting: Tue 14 Jan 2025 10:00 CET
attending:
- Kenneth, Lara (UGent)
- Alan (CECAM/UB)
- Thomas (UiB)
- Caspar (SURF)
- Bob, Pedro (RUG)
- Neja (NIC)
- Pedro, Susana (HPCNow!)
- Special Project Review
- Presentations themselves went well.
- In-person feedback at the end of the day
- positive
- increase in work
- efforts + measures put in place are appreciated, on the good track
- role of each partner is clarified
- shortcomings to discuss/highlight
- still worried about multiscale simulations + scaling/coupling, especially that so far no (or very little) actual coupling has been done. This is considered problematic, since as a CoE we are also supposed to teach by example, and we cannot do that until we have done the work ourselves. This appears to be one of the reasons why postponing the deliverables was rejected: in their opinion, if we postpone these deliverables, we will not have sufficient time to disseminate the results later down the line.
- DX.1 (on coupling) will need to be revised
- DX.2 on co-design accepted
- proposed amendments
- not acceptable for technical/scientific parts
- delaying deliverables too much would postpone everything
- incl. deliverable 3.1, will be delivered M24
- economic part for Spain OK
- travel budget
- could be discussed w.r.t collaboration with CASTIEL2
- also incl. moving of PMs between tasks
- but this part of amendment was actually rejected...
- will be discussed during GA meeting in Jan'25
- plan is to try again with a dedicated amendment to revise travel budget
- next review
- more efforts/activity in multiscale simulations
- too much "on a single scale", more coupling work needed
- dissemination and training
- expect this to be improved
- training materials for multiscale methods
- core of project is running software projects together for multiscale projects
- should focus less on performance optimizations of individual projects
- breakdown of multiscale simulations across software projects and the coupling between them
- formal report within 30 days
- Email from EU Project Officer, 4 Dec 2024
- Requests 1–4 (e.g., moving due dates, renaming deliverables, etc.) cannot be accepted.
- Requests 5-6 may be potentially accepted if they are both revised in a way to carefully rethink and adequately re-design the concerned Task(s) (e.g. Task 4.1) toward the "Multiscale" relevant objective, as per the PMON-101093169-1 Review Report and its specific recommendations (GENERAL PROJECT REVIEW CONSOLIDATED REPORT (HE) - Ref. Ares(2024)3295238 - 06/05/2024).
- Request 7: Change of PM effort in Task 5.2, reallocating 4 PMs from UB and UiB to Task 1.3 to support collaboration with CASTIEL 2 and other CoEs for developing pilot CI/CD pipelines for wider adoption in the EuroHPC ecosystem. It may be accepted.
- Request 8: DX.1 and DX.2 can be added to the GA. It may be accepted.
- Request 9: In Grant Agreement n.101093169, MS8 does not specify where the applications developed will be available so it cannot be accepted.
- table with apps as columns + systems/partitions as rows (see the sketch after this list)
- entries could have different colors to reflect native installation via EESSI, user-mounting, etc.
- input like this is also requested for a CASTIEL2 deliverable
- Request 10: Elimination of references to OpenFOAM. It may be accepted.
- Request 11: Rephrasing NIC's justification for other goods, works, and services in Table 10, specifically for promotional material at EuroHPC JU conferences and CASTIEL 2 workshops. In the light of the EU "Green Deal" Priority, it cannot be accepted.
- Request 12: Adjusting Annex 2 to reallocate budget (NIC from C3, SURF 1 PM, RUG 1 PM, UGent and UiB 1 PM from A1 personnel category to travel budget under C1). In the light of the EU "Green Deal" Priority, it cannot be accepted.
- Request 13: Mitigation of risk regarding co-funding for the beneficiary partner, University of Barcelona. It may be accepted.
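A minimal sketch of the apps-by-systems overview table proposed under Request 9 above; the systems, applications, and status entries are purely illustrative, and in a rendered version the statuses could be color-coded:

| System / partition | ESPResSo | LAMMPS | ... |
|--------------------|----------|--------|-----|
| Vega (CPU) | native (EESSI) | native (EESSI) | ... |
| Karolina (GPU) | user-mounted | native (EESSI) | ... |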
- Upcoming deliverables (M24):
- we should schedule an initial sync meeting to kickstart the effort on these deliverables
- for D1.3: Report on stable, shared software stack (UB): UB lead - Kenneth => Caspar will take the lead in practice
- self-assigning of sections being done by Caspar, Thomas, ...
- for D6.2: Training Activity Technical Support Infrastructure, UiB lead - Thomas => In practice Alan will take the lead
- Probably go into some detail on how to configure Magic Castle. There's not much to say otherwise.
- Something about how Lhumos, Magic Castle and EESSI are used in conjunction
- Also use of MkDocs to set up the tutorial website (see the sketch below)
- Preserve training material by putting it online (+ Green Deal)
- Alan thinks he can do most of it himself, Thomas can have a look after finishing his parts in D1.3
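As a pointer for the MkDocs item above, a minimal sketch of the usual MkDocs workflow for setting up a tutorial website; the project name and the mkdocs-material theme are illustrative assumptions, not something D6.2 prescribes:

```bash
# minimal sketch: scaffold, preview, and publish a tutorial site with MkDocs
pip install mkdocs mkdocs-material   # mkdocs-material theme is an assumption
mkdocs new multixscale-tutorial      # project name is illustrative
cd multixscale-tutorial
mkdocs serve       # live preview at http://127.0.0.1:8000
mkdocs gh-deploy   # publish via GitHub Pages
```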
- for D7.2: Intermediate report on Dissemination, Communication and Exploitation, HPCNow lead - Eli
- Work in progress, +/- 50% done
- Deliverable is fully done by HPCNow!
- Summary of the activities in WP7
- for D8.5: Project Data Management Plan - final, NIC lead - Neja
- Will include FAIR principles
- Mention of Zenodo, all deliverables are there
- provides DOI (citable), etc.
- Write something about how the software will be stored (GitHub is already "FAIR")
- we should start tagging versions of (more) key EESSI components
- missing for layer repos
- could be done via a GitHub Action that auto-tags "releases" every month (`2024.12`); see the sketch at the end of this section
- already done for bot + EESSI test suite
- also relevant for scientific codes
- cf. Tilen's LAMMPS plugin, which is in some sense tied to specific LAMMPS versions
- ESPResSo does not have Zenodo yet
- we could expose the contents of EESSI repos via a web interface
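A minimal sketch of the monthly auto-tagging idea from the D8.5 notes above, as the core of a scheduled CI job (e.g. a GitHub Actions cron workflow); the YYYY.MM tag scheme follows the `2024.12` example, everything else is an assumption:

```bash
# minimal sketch: tag the current state of the repo with a YYYY.MM "release";
# intended to run from a scheduled CI job (e.g. a GitHub Actions cron trigger)
tag="$(date +%Y.%m)"   # e.g. 2024.12
# only create the tag if it does not exist yet
if ! git rev-parse -q --verify "refs/tags/${tag}" >/dev/null; then
    git tag -a "${tag}" -m "monthly release ${tag}"
    git push origin "${tag}"
fi
```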
- Upcoming milestones (M24) are all taken care of:
- update required on Milestone 2, request by review
- shared software stack w/ 1 accel supported + list of dependencies defined in WP2-4
- New link was included in the portal. This update was mentioned in the special review.
- Now we need to wait for the review feedback to see if it gets accepted
- Milestone 4, M21: First training event supported by the Ambassador Program. [WP1/WP5/WP6] (UB)
- Oct 4th training in Vienna: https://events.vsc.ac.at/event/141
- 2nd Ambassador event: MultiXscale hackathon in Slovenia (Dec'24)
- Milestone 5, M24: WP4 Pre-exascale workload executed on EuroHPC architecture. [WP2/WP3/WP4] (NIC)
- was done through Tilen, see post on MultiXscale website
- WP status updates
- [SURF] WP1 Developing a Central Platform for Scientific Software on Emerging Exascale Technologies
- [UGent] T1.1 Stable (EESSI) - D1.3 due M24 (Dec'24)
- `dev.eessi.io`: Tiger team is making very good progress. See meeting notes
- Done:
- Project-specific prefixes work, but for now the prefix needs to be hardcoded
- Next step: do that based on the repo from which the build is coming
- Challenge: getting the repo name from the environment isn't easy (see the sketch after this block)
- Make sure accelerator builds end up in the right prefix => Should work, but still be tested
- TODO's from last time:
- Once docs are merged, get a scientific software developer to experiment with it => Jean-Noël?
- WIP:
- Meeting planned in January with Tilen to have him experiment with `dev.eessi.io`
- Meeting tomorrow to see what still needs to be implemented before the January meeting
- Documentation effort to describe what we should do if we want to onboard a new code / repo to build for `dev.eessi.io`
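A hypothetical sketch of the "derive the prefix from the repo" idea above; the environment variable and the prefix layout are assumptions, and getting the repo name into the build environment is exactly the open challenge mentioned:

```bash
# hypothetical sketch: derive the dev.eessi.io project prefix from the source
# repo of a build, instead of hardcoding it; PR_TARGET_REPO is an assumed
# variable (e.g. 'myorg/mycode'), not something the bot is known to export
project="${PR_TARGET_REPO##*/}"                           # 'myorg/mycode' -> 'mycode'
export EESSI_DEV_PREFIX="/cvmfs/dev.eessi.io/${project}"  # assumed prefix layout
```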
- NVIDIA GPU support: Tiger team is making really good progress. See meeting notes
- Key results:
- ...
- TODO from last time:
- [WIP] updating the `SitePackage.lua` for proper GPU support (see PR #798) => Still waiting for review
- Actual GPU nodes in build cluster (now cross-compiling, not running GPU tests), or at least on partner clusters
- service account in place for EESSI at HPC-UGent + Deucalion => Lara is aiming to get it running next week
- WIP @ Snellius => Likely not before end-of-year
- Service account on Vega => Kenneth will request this by e-mail
- Adapt bot to accept arguments to allocate/build on GPU nodes
- For now hardcoding it in the config, but still TODO
- Decide on and expand combinations of CPU & GPU architectures
- will be determined by where we can get service accounts for EESSI?
- should definitely cover EuroHPC systems
- maybe also look into generic CPU + GPU?
- Re-install GPU software in proper location: ESPResSo (?), LAMMPS, MetalWalls (?), TensorFlow, PyTorch, ...
- WIP for TensorFlow, PyTorch => Status?
- Should we plan new GPU Tiger Team meeting in January? (after deliverables are done)
- Kenneth will make a date-picker for this, aiming for 2nd week of January
- "we will benchmark software from the shared software stack and compare the performance against on-premise software stacks to identify potential performance limitations, ..."
- ESPResSo + LAMMPS + ALL(?) (MultiXscale), GROMACS (BioExcel)
- @Satish:
- Comparison done for GROMACS
- No surprises here, job time is a bit longer because it sources things from EESSI. But performance numbers are the same
- Working on ESPResSo (needed a local ESPResSo)
- TODO: LAMMPS
- [RUG] T1.2 Extending support - D1.4 due M30 (June'25)
- `zen4` almost on par with the rest
- Need to symlink everything for 2022b from `zen3`. See https://gitlab.com/eessi/support/-/issues/37#note_2159031831 (@Lara?); a sketch follows below this task
- Then merge https://github.com/EESSI/software-layer/pull/766
- NVIDIA Grace
- @Thomas: any update?
- AMD ROCm (see planning issue #31 + support issue #71)
- @Pedro/Bob: any update?
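A hedged sketch of the `zen3` -> `zen4` symlinking mentioned under T1.2; the CVMFS paths and the selection of 2022b-based installations are assumptions, and the linked GitLab issue has the authoritative procedure:

```bash
# illustrative sketch: expose zen3 installations built with the 2022b toolchain
# under the zen4 tree via symlinks; paths are assumed, see the GitLab issue
shopt -s nullglob   # skip the loop entirely if nothing matches
src=/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/software
dst=/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen4/software
for inst in "${src}"/*/*2022b*; do
    rel="${inst#${src}/}"                  # e.g. SciPy-bundle/2023.02-gfbf-2022b
    mkdir -p "${dst}/$(dirname "${rel}")"
    [ -e "${dst}/${rel}" ] || ln -s "${inst}" "${dst}/${rel}"
done
```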
- [SURF] T1.3 Test suite - D1.5 due M30 (June'25)
- [BSC] T1.4 RISC-V (due M48, D1.6)
- ... (is build bot active? Who can control it?)
- [SURF] T1.5 Consolidation (starts M25 - Jan'25)
- continuation of effort on EESSI (T1.1, etc.) (not started yet)
- [UGent] WP5 Building, Supporting and Maintaining a Central Shared Stack of Optimized Scientific Software Installations
- [SURF] T5.2 Monitoring/testing, D5.3 due M30 (June'25)
- Plan to separate dashboard & database into two separate VMs (security) => Status?
- Vega agreed to make test data public. Karolina is waiting for response from their director.
- [UGent] T5.4 support/maintenance - D5.4 due M48 (Dec'26)
- ...
- [UB] WP6 Community outreach, education, and training
- deliverables due: D6.2 (M24 - Dec'24), D6.3 (M30 - June'25)
- Past activities:
- We won the HPCwire Readers' Choice Award in the "Best HPC Programming Tool or Technology" category
- [Eli/HPCNow!] EESSI Birds-of-a-Feather session accepted at Supercomputing'24 (Atlanta, US)
- Great job by Lara and Helena
- 40 attendees or so
- 2 RPi giveaways
- Upcoming activities:
- Epicure Webinar on EasyBuild, 13 Dec 2024
- Pedro, Kenneth, Caspar will teach
- 1.5h, covering EasyBuild and EESSI-extend (see the sketch after this list)
- Slides here
- [Pedro] submitted talk for SURF Advanced Computing Days (12 Dec'24, Utrecht)
- Yes
- [Alan] EESSI tutorial at HiPEAC 2025 accepted (20-22 Jan'25)
- Who will organize? => Alan, Lara, Eli for sure. Alan will take the lead on this.
- [Lara] Also at HiPEAC: another workshop (about CoEs); Lara will present there
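For context on what the Epicure webinar covers, a minimal sketch of the user-side flow: initialize EESSI, then use the EESSI-extend module to install additional software on top of the stack with EasyBuild (the module version and easyconfig name are illustrative):

```bash
# minimal sketch: initialize the EESSI environment, then build extra software
# on top of it with EasyBuild via EESSI-extend (names/versions illustrative)
source /cvmfs/software.eessi.io/versions/2023.06/init/bash
module load EESSI-extend/2023.06-easybuild
eb --robot SomeApp-1.0-foss-2023a.eb   # hypothetical easyconfig
```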
- [HPCNow] WP7 Dissemination, Exploitation & Communication
- podcast interview for EuroHPC podcast
- Any updates? Date planned yet?
- Contacted HPCwire to see if they can make an article about EESSI
- Separate from that, we as MultiXscale should probably make a press release
- Maybe with a quote from EuroHPC folks?
- Susana can take the lead, Kenneth can maybe provide the quote
- For inspiration:
- T7.1 Scientific applications provisioned on demand (lead: HPCNow) (started M13, runs until M48)
- EESSI as 'paid layer' on top of AWS ParallelCluster: WIP. Status?
- Task 7.2 - Dissemination and communication activities (lead: NIC)
- Updates ... ?
- Task 7.3 - Sustainability (lead: NIC, started M18, due M42)
- Updates ... ?
- Task 7.4 - Industry-oriented training activities (lead: HPCNow)
- Updates ... ?
- [NIC] WP8 (Management and Coordination)
- Amendment not accepted in current form
- Will require revision
- Who takes the lead, @Neja? Input needed from others?
- Neja is waiting for IIT, who need to make the changes; then it will be resubmitted, hopefully this Friday
- NIC received an interest payment because the payment for the first reporting period was late
- Could this potentially be used for e.g. RPi5 give-aways? Other purposes?
- next General Assembly meeting
- 23-24 Jan'25 in Barcelona/Sitges
- venue is BSC
- coupled to HiPEAC'25 (20-22 Jan 2025)
- We need to promote the workshop at HiPEAC more!
- Registration is quite pricey, so we'll need to limit who actually attends?
- Try to setup online participation (Kenneth / Caspar would be interested)
- Should be possible in the room. Maybe use Alan's mic.
- Agenda is in the shared drive => project management => GA
- Ideas for services (item on the agenda):
- Support for HPC resources
- Training
- CI/CD call for EuroHPC
- is 100% funded (not 50/50 EU/countries) => though Thomas saw a CI/CD item in a table that was 50/50 funded
- Alan has seen draft text
- Budget is in flux (originally 6M/3 years, now changing)