
Sync meeting 2024 12 09


MultiXscale WP1+WP5 sync meetings


Next meetings

  • Tue 14 Jan 2025 10:00 CET

Agenda/notes 2024-12-09

attending:

  • Kenneth, Lara (UGent)

  • Alan (CECAM/UB)

  • Thomas (UiB)

  • Caspar (SURF)

  • Bob, Pedro (RUG)

  • Neja (NIC)

  • Pedro, Susana (HPCNow!)

  • Special Project Review

    • Presentations themselves went well.
    • In-person feedback at the end of the day
      • positive
        • increase in work
        • efforts + measures put in place are appreciated, on the good track
        • role of each partner is clarified
      • shortcomings to discuss/highlight
        • still worried about multiscale simulations + scaling/coupling, especially that so far no (or very little) actual coupling has been done. This is considered problematic, since as a CoE we are also supposed to teach by example, and we cannot do that until we have done the work ourselves. This appears to be one of the reasons why postponing the deliverables was rejected: their view is that if we postpone these deliverables, we will not have sufficient time to disseminate the results later down the line.
        • DX.1 (on coupling) will need to be revised
        • DX.2 on co-design accepted
      • proposed amendments
        • not acceptable for technical/scientific parts
          • delaying deliverables too much would postpone everything
          • incl. deliverable 3.1, will be delivered M24
        • economical part of Spain OK
        • travel budget
          • could be discussed w.r.t. collaboration with CASTIEL2
            • also incl. moving of PMs between tasks
            • but this part of amendment was actually rejected...
            • will be discussed during GA meeting in Jan'25
          • plan is to try again with a dedicated amendment to revise travel budget
      • next review
        • more efforts/activity in multiscale simulations
          • too much "on a single scale", more coupling work needed
        • dissemination and training
          • expect this to be improved
          • training materials for multiscale methods
      • core of the project is running software projects together for multiscale simulations
        • should focus less on performance optimizations of individual projects
        • breakdown of multiscale simulations across software projects and the coupling between them
      • formal report within 30 days
    • Email from EU Project Officer, 4 Dec 2024
      • Requests 1–4 (e.g., moving due dates, renaming deliverables, etc.) cannot be accepted.
      • Requests 5-6 may be potentially accepted if they are both revised in a way to carefully rethink and adequately re-design the concerned Task(s) (e.g. Task 4.1) toward the "Multiscale" relevant objective, as per the PMON-101093169-1 Review Report and its specific recommendations (GENERAL PROJECT REVIEW CONSOLIDATED REPORT (HE) - Ref. Ares(2024)3295238 - 06/05/2024).
      • Request 7: Change of PM effort in Task 5.2, reallocating 4 PMs from UB and UiB to Task 1.3 to support collaboration with CASTIEL 2 and other CoEs for developing pilot CI/CD pipelines for wider adoption in the EuroHPC ecosystem. It may be accepted.
      • Request 8: DX.1 and DX.2 can be added to the GA. It may be accepted.
      • Request 9: In Grant Agreement n.101093169, MS8 does not specify where the applications developed will be available, so it cannot be accepted.
        • table with apps as columns + systems/partitions as rows (see the sketch after this list)
        • entries could have different colors to reflect native installation of EESSI, user-mounting, etc.
        • input like this is also requested for CASTIEL2 deliverable
      • Request 10: Elimination of references to OpenFOAM. It may be accepted.
      • Request 11: Rephrasing NIC's justification for other goods, works, and services in Table 10, specifically for promotional material at EuroHPC JU conferences and CASTIEL 2 workshops. In the light of the EU "Green Deal" Priority, it cannot be accepted.
      • Request 12: Adjusting Annex 2 to reallocate budget (NIC from C3, SURF 1 PM, RUG 1 PM, UGent and UiB 1 PM from A1 personnel category to travel budget under C1). In the light of the EU "Green Deal" Priority, it cannot be accepted.
      • Request 13: Mitigation of risk regarding co-funding for the beneficiary partner, University of Barcelona. It may be accepted.
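A hypothetical sketch of the apps-vs-systems table proposed under Request 9. The system and application names are taken from elsewhere in these notes; the cells (left as "…") would state how EESSI is available on each system/partition (e.g. native installation of EESSI, user-mounting), possibly colour-coded:

| System / partition | ESPResSo | LAMMPS | GROMACS |
|--------------------|----------|--------|---------|
| Vega               | …        | …      | …       |
| Karolina           | …        | …      | …       |
| Snellius           | …        | …      | …       |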
  • Upcoming deliverables (M24):

    • we should schedule an initial sync meeting to kickstart the effort on these deliverables
      • for D1.3: Report on stable, shared software stack (UB): UB lead - Kenneth => Caspar will take the lead in practice
        • self-assigning of sections being done by Caspar, Thomas, ...
      • for D6.2: Training Activity Technical Support Infrastructure, UiB lead - Thomas => In practice Alan will take the lead
        • Probably go into some detail on how to configure Magic Castle. There's not much to say otherwise.
        • Something about how Lhumos, Magic Castle and EESSI are used in conjunction
        • Also use of MkDocs to set up tutorial website
        • Preserve training material by doing it online (+ Green deal)
        • Alan thinks he can do most of it himself, Thomas can have a look after finishing his parts in D1.3
      • for D7.2: Intermediate report on Dissemination, Communication and Exploitation, HPCNow lead - Eli
        • Work in progress, +/- 50% done
        • Deliverable is fully done by HPCNow!
        • Summary of the activities in WP7
      • for D8.5: Project Data Management Plan - final, NIC lead - Neja
        • Will include FAIR principles
        • Mention of Zenodo, all deliverables are there
          • provides DOI (citable), etc.
        • Write something about how the software will be stored (GitHub is already "FAIR")
          • we should start tagging versions of (more) key EESSI components
            • missing for layer repos
              • could be done via a GitHub Action that auto-tags "releases" every month (e.g. 2024.12); see the sketch below this list
            • already done for bot + EESSI test suite
          • also relevant for scientific codes
            • cfr. Tilen's LAMMPS plugin, which is in some sense tied to specific LAMMPS versions
            • ESPResSo is not on Zenodo yet
          • we could expose the contents of EESSI repos via a web interface
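As a possible starting point for the monthly auto-tagging idea mentioned above, a minimal sketch in Python. It assumes the script runs inside a checkout of the relevant layer repo with permission to push tags (e.g. invoked from a scheduled GitHub Actions workflow) and follows the YYYY.MM format of the 2024.12 example:

```python
# Sketch: create and push a monthly tag like "2024.12" for an EESSI layer repo.
# Assumes it runs in a clone that is allowed to push tags
# (e.g. invoked from a scheduled GitHub Actions workflow).
import subprocess
from datetime import datetime, timezone


def create_monthly_tag() -> None:
    tag = datetime.now(timezone.utc).strftime("%Y.%m")

    # skip if this month's tag already exists
    existing = subprocess.run(["git", "tag", "-l", tag],
                              capture_output=True, text=True, check=True)
    if tag in existing.stdout.split():
        print(f"tag {tag} already exists, nothing to do")
        return

    subprocess.run(["git", "tag", "-a", tag, "-m", f"monthly release {tag}"], check=True)
    subprocess.run(["git", "push", "origin", tag], check=True)
    print(f"created and pushed tag {tag}")


if __name__ == "__main__":
    create_monthly_tag()
```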
  • Upcoming milestones (M24) are all taken care of:

    • update required on Milestone 2, request by review
      • shared software stack w/ 1 accel supported + list of dependencies defined in WP2-4
      • New link was included in the portal. This update was mentioned in the special review.
      • Now we need to wait for the review feedback to see if it gets accepted
    • Milestone 4, M21: First training event supported by the Ambassador Program. [WP1/WP5/WP6] (UB)
    • Milestone 5, M24: WP4 Pre-exascale workload executed on EuroHPC architecture. [WP2/WP3/WP4] (NIC)
  • WP status updates

    • [SURF] WP1 Developing a Central Platform for Scientific Software on Emerging Exascale Technologies
      • [UGent] T1.1 Stable (EESSI) - D1.3 due M24 (Dec'24)
        • dev.eessi.io: Tiger team is making very good progress. See meeting notes
          • Done:
            • Project-specific prefixes work, but now need to hardcode that prefix
              • Next step: do that based on the repo from which the build is coming (see the sketch after this block)
              • Challenge: getting the repo name from the environment isn't easy
            • Make sure accelerator builds end up in the right prefix => Should work, but still be tested
          • TODO's from last time:
            • Once docs are merged, get scientific software developer to experiment with it => Jean-Noël?
          • WIP:
            • Meeting planned in January with Tilen to have him experiment with dev.eessi.io
            • Meeting tomorrow to see what still needs to be implemented before the January meeting
            • Documentation effort to describe what we should do if we want to onboard a new code / repo to build for dev.eessi.io
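A minimal sketch of the prefix idea above, assuming the build environment can expose the name of the repository that triggered the build. The DEV_REPO variable name and the mapping are hypothetical; how to actually obtain the repo name is exactly the open question noted above:

```python
# Sketch: derive a project-specific dev.eessi.io prefix from the repository
# that a build request comes from, instead of hardcoding it.
# DEV_REPO is a hypothetical environment variable; the real mechanism for
# passing the repo name to the build job is still to be decided.
import os


def derive_dev_prefix(default_project: str = "unknown") -> str:
    repo = os.environ.get("DEV_REPO", "")      # e.g. "multixscale/some-dev-repo" (made up)
    project = repo.rsplit("/", 1)[-1] or default_project
    return f"/cvmfs/dev.eessi.io/{project}"


if __name__ == "__main__":
    print(derive_dev_prefix())
```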
        • NVIDIA GPU support Tiger team making really good progress. See meeting notes
          • Key results:
            • ...
          • TODO from last time:
            • [WIP] updating the SitePackage.lua for proper GPU support (see PR #798) => Still waiting for review
            • Actual GPU nodes in build cluster (now cross-compiling, not running GPU tests) or at least on partner clusters
              • service account in place for EESSI at HPC-UGent + Deucalion => Lara is aiming to get it running next week
              • WIP @ Snellius => Likely not before end-of-year
              • Service account on Vega => Kenneth will request this by e-mail
            • Adapt bot to accept arguments to allocate/build on GPU nodes
              • For now hardcoding it in the config, but still TODO
            • Decide on and expand combinations of CPU & GPU architectures
              • will be determined by where we can get service accounts for EESSI?
              • should definitely cover EuroHPC systems
              • maybe also look into generic CPU + GPU?
            • Re-install GPU software in proper location: ESPResSo (?), LAMMPS, MetalWalls (?), TensorFlow, PyTorch, ...
              • WIP for TensorFlow, PyTorch => Status?
          • Should we plan new GPU Tiger Team meeting in January? (after deliverables are done)
            • Kenneth will make a date-picker for this, aiming for 2nd week of January
        • "we will benchmark software from the shared software stack and compare the performance against on-premise software stacks to identify potential performance limitations, ..."
          • ESPResSo + LAMMPS + ALL(?) (MultiXscale), GROMACS (BioExcel)
            • @Satish:
              • Comparison done for GROMACS
                • No surprises here: job time is a bit longer because it sources things from EESSI, but the performance numbers are the same
              • Working on ESPResSo (needed a local ESPResSo)
              • TODO: LAMMPS
      • [RUG] T1.2 Extending support - D1.4 due M30 (June'25)
      • [SURF] T1.3 Test suite - D1.5 due M30 (June'25)
        • Ongoing effort: porting tests to use the eessi_mixin class (see the sketch after this block)
          • Open PRs for PyTorch, QuantumESPRESSO
        • Docs on how to use eessi_mixin are merged here
        • Devised a strategy for dealing with the issue that ReFrame stages the same data once per test instance
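For reference, a rough sketch of what porting a test to the mixin pattern looks like. The import path, the base-class ordering, and the exact attributes the mixin requires are assumptions based on the test suite docs linked above; this only illustrates the inheritance pattern and is not a drop-in test:

```python
# Rough sketch of a ReFrame test using the eessi_mixin pattern (not drop-in).
# The import path and required attributes are assumptions; see the EESSI
# test suite documentation for the authoritative list.
import reframe as rfm
import reframe.utility.sanity as sn

from eessi.testsuite.eessi_mixin import EESSI_Mixin  # assumed import path


@rfm.simple_test
class ExampleEESSITest(rfm.RunOnlyRegressionTest, EESSI_Mixin):
    # attributes the mixin is assumed to expect; the mixin takes care of
    # valid systems/partitions, module loading, scales, etc.
    device_type = 'cpu'
    module_name = 'LAMMPS'   # hypothetical example module
    executable = 'lmp'
    executable_opts = ['-h']
    time_limit = '10m'

    @sanity_function
    def assert_help_printed(self):
        # minimal sanity check on the help output
        return sn.assert_found(r'Large-scale Atomic/Molecular', self.stdout)
```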
      • [BSC] T1.4 RISC-V (due M48, D1.6)
        • ... (is build bot active? Who can control it?)
      • [SURF] T1.5 Consolidation (starts M25 - Jan'25)
        • continuation of effort on EESSI (T1.1, etc.) (not started yet)
    • [UGent] WP5 Building, Supporting and Maintaining a Central Shared Stack of Optimized Scientific Software Installations
      • [SURF] T5.2 Monitoring/testing, D5.3 due M30 (June'25)
        • Plan to separate dashboard & database into two separate VMs (security) => Status?
        • Vega agreed to make test data public. For Karolina, we are waiting for a response from their director.
      • [UGent] T5.4 support/maintenance - D5.4 due M48 (Dec'26)
        • ...
    • [UB] WP6 Community outreach, education, and training
      • deliverables due: D6.2 (M24 - Dec'24), D6.3 (M30 - June'25)
      • Past activities:
        • We won the HPCwire Readers' Choice Award in the "Best HPC Programming Tool or Technology" category
        • [Eli/HPCNow!] EESSI Birds-of-a-Feather session accepted at Supercomputing'24 (Atlanta, US)
          • Great job by Lara and Helena
          • 40 attendees or so
          • 2 RPi giveaways
      • Upcoming activities:
        • Epicure Webinar on EasyBuild, 13 Dec 2024
          • Pedro, Kenneth, Caspar will teach
          • 1.5h, covering EasyBuild and EESSI-extend
          • Slides here
        • [Pedro] submitted talk for SURF Advanced Computing Days (12 Dec'24, Utrecht)
          • Yes
        • [Alan] EESSI tutorial at HiPEAC 2025 accepted (20-22 Jan'25)
          • Who will organize? => Alan, Lara, Eli for sure. Alan will take the lead on this.
        • [Lara] Also at HiPEAC: another workshop (about CoEs), Lara will present workshop there
    • [HPCNow] WP7 Dissemination, Exploitation & Communication
    • [NIC] WP8 (Management and Coordination)
      • Amendment not accepted in current form
        • Will require revision
        • Who takes the lead, @Neja? Input needed from others?
          • Neja is waiting for IIT, who need to make the changes; then it will be resubmitted, hopefully this Friday
      • NIC received an interest payment because the payment for the first reporting period was late
        • Could this potentially be used for e.g. RPi5 give-aways? Other purposes?
      • next General Assembly meeting
        • 23-24 Jan'25 in Barcelona/Sitges
          • venue is BSC
          • coupled to HiPEAC'25 (20-22 Jan 2025)
          • We need to promote the workshop at HiPEAC more!
          • Registration is quite pricey, so we'll need to limit who actually attends?
          • Try to setup online participation (Kenneth / Caspar would be interested)
            • Should be possible in the room. Maybe use Alan's mic.
          • Agenda is in the shared drive => project management => GA
          • Ideas for services (item on the agenda):
            • Support for HPC resources
            • Training

Other topics

  • CI/CD call for EuroHPC
    • is 100% funded (not 50/50 EU/countries) => Thomas saw a CI/CD item in a table that was 50/50 funded
    • Alan has seen draft text
    • Budget is in flux (originally 6M/3 years, now changing)