Sync meeting 2023 07 11
- Monthly, every 2nd Tuesday of the month at 10:00 CE(S)T
- Notes of previous meetings at https://github.com/multixscale/meetings/wiki
attending: Caspar, Elisabeth, Killian, Bob, Maxim, Xavier, Xin, Satish
overview of MultiXscale planning
WP status updates
- [SURF] WP1 Developing a Central Platform for Scientific Software on Emerging Exascale Technologies
- [UGent] T1.1 Stable (EESSI) - due M12+M24
- 2023.06 compat layer deployed
- OpenSSL 1.1.1 used, OpenSSL 3.x gave too many issues
- software layer started
- we had a meeting to show how to work with the bot
- many PRs with additions
- Init scripts for 2023.06 are present, so you can give it a try
- All software deployed by the bot
- New stratum 0 server installed with OS. TODO: raid setup
- Yubikeys are available as well, still need to set them up and distribute a few of them to other people
- RUG stratum 1 has bandwidth issues, potentially due to firewall. TODO: move to different VLAN to avoid RUG firewall.
- Idea: use CDN for stratum 1
- For instance, Cloudflare or AWS CloudFront
- Challenge: might confuse the GEO API
- We may want to look into writing our own Python script for picking the best Stratum 1, replacing the GEO API functionality
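The idea of replacing the GEO API with our own Stratum 1 picker could look roughly like the sketch below: rank the candidate servers by the time it takes to fetch the repository manifest (`.cvmfspublished`) and prefer the fastest reachable one. The hostnames are placeholders, not the real EESSI Stratum 1 list.

```python
# Sketch of a client-side Stratum 1 picker (hypothetical hostnames).
import time
import urllib.request

# Placeholder server list; the real EESSI Stratum 1 URLs would go here.
STRATUM1_SERVERS = [
    "http://s1-example-eu.example.org/cvmfs/software.eessi.io",
    "http://s1-example-us.example.org/cvmfs/software.eessi.io",
]

def measure_latency(base_url, timeout=5):
    """Time (seconds) to fetch the repo manifest, or None if unreachable."""
    url = f"{base_url}/.cvmfspublished"
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None

def pick_best(servers, measure=measure_latency):
    """Return reachable servers ordered by measured latency, fastest first."""
    timings = [(measure(server), server) for server in servers]
    reachable = [(t, server) for t, server in timings if t is not None]
    return [server for _, server in sorted(reachable)]
```

Usage would be `pick_best(STRATUM1_SERVERS)[0]` to get the preferred server; measuring actual response time sidesteps the GEO API's geography-based guess entirely, at the cost of a few probe requests at client setup.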
- [RUG] T1.2 Extending support (starts M9, due M30)
- [SURF] T1.3 Test suite - due M12+M24
- Regular runs set up on Vega; scripts for the cronjob and config exist, but the PRs are still open
- Config for AWS CITC here, but CPU autodetection is failing on Graviton nodes
- Caspar opened an issue on the ReFrame repository: https://github.com/reframe-hpc/reframe/issues/2950
- Second high-level application test (TensorFlow): under review
- Detected binding issues on hyperthreading systems (Vega). First attempt to solve this failed; better to merge the TensorFlow test first.
- Satish has been working on a test for running the OSU Micro Benchmarks: https://github.com/EESSI/test-suite/pull/54
- [BSC] T1.4 RISC-V (starts M13)
- [SURF] T1.5 Consolidation (starts M25)
- [UGent] WP5 Building, Supporting and Maintaining a Central Shared Stack of Optimized Scientific Software Installations
- [UGent] T5.1 Support portal - due M12
- Support portal meetings #1, #2, and #3
- GitLab selected for the support portal
- Option for self-hosting via Docker Compose if we hit limits. We haven't found anything that would limit us so far, and it seems easy to set up.
- Experimented with some test issues, both from the 'customer' and the 'support' point of view. No major issues, but guidelines on the use of labels are required.
- Instructions for support staff here
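The self-hosting fallback mentioned above could be as simple as the Docker Compose sketch below, based on GitLab's standard `gitlab-ce` container image; the hostname, ports, and volume paths are placeholders, not a tested MultiXscale setup.

```yaml
# Hypothetical docker-compose.yml for a self-hosted GitLab CE support portal.
services:
  gitlab:
    image: gitlab/gitlab-ce:latest
    hostname: support.example.org   # placeholder hostname
    ports:
      - "80:80"
      - "443:443"
      - "2222:22"                   # git-over-SSH on a non-default host port
    volumes:
      - ./config:/etc/gitlab        # GitLab configuration
      - ./logs:/var/log/gitlab      # logs
      - ./data:/var/opt/gitlab      # repositories, issues, database
    restart: unless-stopped
```

With the three volumes on persistent storage, migrating issues off a hosted instance later would mainly be an export/import exercise rather than a re-installation.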
- [SURF] T5.2 Monitoring/testing (starts M9)
- [UiB] T5.3 community contributions (bot) - due M12
- Bot tutorial: https://github.com/EESSI/meetings/wiki/Sync-meeting-on-EESSI-software-layer-(2023-06-23)
- More people with build permissions (Thomas, Bob, Kenneth, Caspar, Alan, Lara, Maxim, Satish, Xin, Richard)
- More people with deploy permissions (Thomas, Bob, Kenneth, Caspar, Alan)
- More people have actually used the bot (at least once), see e.g. these PRs
- Contribution policy initial version
- [UGent] T5.4 support/maintenance (starts M13)
- [UB] WP6 Community outreach, education, and training
- Follow-up meeting with CernVM-FS developer about training event
- Initial content ideas posted to the tutorial website: https://multixscale.github.io/cvmfs-tutorial-hpc-best-practices/
- Discussed who is going to write/review each section (for most of them: one person from EESSI, one from the CernVM-FS team); issues for this have been opened on the GitHub repository
- Next meeting on September 5th
- Magic Castle tutorial at SC23 was accepted, so we are also working on that, as AOC will now attend
- Approached the project officer to ask for an exception for funding. She was very firm that no trips other than those listed in the proposal would be approved:
  > According to Grant Agreement 6.1 eligible “costs must be incurred in connection with the action as described in Annex 1 and necessary for its implementation.” ... Taken that the Centres of Excellence have a technical focus on their flagships codes, dissemination is not central in the action and the visit of SC is not necessary to achieve the objectives of the call. Costs that are not specifically linked to the action are covered by the indirect costs.
- This is worrisome, as it would seem to exclude any conference (unless EuroHPC is participating, I would expect, but even then it is not clear).
- [HPCNow] WP7 Dissemination, Exploitation & Communication
- Newsletter every 6 months; one draft is ready and has been sent to Neja, Matteu, and Alan. Will probably go out next week.
- Finishing the slides for the elevator pitch.
- Format of the slides is not so nice and the template is quite limited; making a more extensive template
Other updates
- Who has access to the root account of AWS CITC?
- Alan, Kenneth, Bob
- All partners, please ensure that the Excel sheet with the list of people working on the project is up to date:
- See the Excel sheet in the base directory of the MultiXscale shared folder
Action items:
- Q2 reporting: Please fill in your numbers & activities for WP1 and 5 (and other WPs probably also appreciate this :))
- https://github.com/multixscale/meetings/wiki/Sync-meeting-2023-06-13
- https://github.com/multixscale/meetings/wiki/Sync-meeting-2023-05-09
- https://github.com/multixscale/meetings/wiki/Sync-meeting-2023-04-11
- https://github.com/multixscale/meetings/wiki/Sync-meeting-2023-03-14
- https://github.com/multixscale/meetings/wiki/Sync-meeting-2023-02-14
- https://github.com/multixscale/meetings/wiki/sync-meeting-2023-01-10