
Meeting 2017 01

Tomislav Janjusic edited this page Jan 6, 2024 · 1 revision

January 2017 Open MPI Developer's Meeting

Logistics:

  • Start: 9am, Jan 24th
  • Finish: 1pm, Jan 26th
  • Attendance fee: NONE

Location:

Remote attendance

Webex connectivity to the meeting rooms is possible, but the dial-in information is not posted on this public wiki page. Please contact Jeff Squyres if you'd like to attend remotely.

Attendees

Please add your name to the wiki list below if you are coming to the meeting:

  1. Ralph Castain (Intel)
  2. George Bosilca (UTK)
  3. Shinji Sumimoto (Fujitsu)
  4. Howard Pritchard (LANL)
  5. Geoff Paulsen (IBM)
  6. Jeff Squyres (Cisco)
  7. Andrew Friedley (Intel)
  8. Annu Dasari (Intel)
  9. Artem Polyakov (Mellanox)
  10. Joshua Ladd (Mellanox)
  11. Nathan Hjelm (LANL)
  12. Matias Cabral (Intel)
  13. Geoffroy Vallee (ORNL)
  14. David Bernholdt (ORNL)
  15. Brian Barrett (AWS)
  16. Sylvain Jeaugey (NVIDIA)
  17. Chris Chambreau (LLNL)
  18. ***** Jeff has registered everyone up to this point. If additional people sign up, please let Jeff Squyres know so that he can get you a Cisco badge and sign you up for the guest wifi. Thanks!

Attending Remotely:

  1. Josh Hursey - IBM (Available from 6:30am-3pm Pacific) (Added a ☎️ icon next to the items I'd like to call in for, if possible)
  2. Murali Emani - Livermore

Cisco aps

Topics Still To Discuss

Thurs morning

  • ☎️ v3.x planning
    • Select release managers
  • ☎️ OpenMP + OMPI
    • Can we use PMIx to coordinate binding between the layers and the RM?
    • Coordinate meeting for broader audience, point people to GoogleGroup mailing list
  • Per https://www.mail-archive.com/devel@lists.open-mpi.org/msg19874.html:
    • Are the defaults hostile to running by default on RoCE?
    • Is the btl openib code bit-rotted / defunct? Should we amend the docs and disable/remove the code?
  • ☎️ Continuation of the -prot forward-adaptation proposal from IBM (which morphed into -net discussions at the February and August 2016 meetings)
  • ☎️ Update: IBM features coming upstream
    • PR #2773 Hook framework (e.g., for implementing licensing, -prot)
    • PR #2272 stacktrace improvements
  • ☎️ Check hetero-node detection
  • Check nightly tarball generation logic
  • Ralph/Nathan: debug static ports, DVM at scale, remaining race conditions in OOB

Topics Covered (see notes for discussion/resolution)

Tues morning

  • ☎️ OMPI v2.1 planning
    • PMIx v1.2.1 or v2.0? [telecon: v1.2.1]
      • Auto-set RTE barriers off when async modex?
      • ORTE-related scaling updates
  • Jeff/Howard: issue / PR cleanup
    • Let's review all old issues and PRs.
    • For those that are still relevant, let's assign milestones.
    • And let's close those that are no longer relevant.
  • ☎️ Revisit --host and --hostfile behavior. See PR #1353
  • ☎️ Performance regression in MTLs. See Issue #2644
  • ☎️ Discuss work remaining to do for Issue #2151 ("'nonblocking3' BVT test fails")
    • Discussion started at SC'16, but no progress since then.

Tues afternoon

  • ☎️ Overview of new DOE Exascale (ECP) project focusing on Open MPI
  • ☎️ Exposing PMIx functionality
    • Martin Schulz proposes MPI_T wrapper - do we want to pursue it?
  • ☎️ Memory footprint reduction
    • Critical on large-scale, complex architectures
    • HWLOC topology tree is a driver - can we devise strategies for not holding this object in memory for the entire execution?
    • See this PR for details and possible solutions
    • Can we do this for v2.x?
  • Should we allow variadic macros? They're part of C99.
    • E.g., OBJ_NEW to allow constructors with arguments.
  • Move the verbosity, enable, priority (and other?) parameters into the MCA super class so that they are shared across all MCA components?
  • Do we still care about 32 bit on x86 platforms?
    • I.e., should we disable it in configure?
    • Only asking in order to trim cases that we don't care about -- there's no specific technical reason to kill / keep 32 bit on x86.
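
For the variadic macro item above, here is a minimal sketch of what a constructor-with-arguments allocation macro could look like in C99. Everything in it is hypothetical and for illustration only: `demo_obj_t`, `demo_obj_construct`, and `DEMO_OBJ_NEW` are made-up names, and this is not how Open MPI's `OBJ_NEW` or its object system is implemented. The sketch assumes the constructor returns the object pointer, so the macro stays a plain expression.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object type (NOT Open MPI's opal_object_t). */
typedef struct {
    int rank;
    int size;
} demo_obj_t;

/* Hypothetical constructor that takes arguments.
 * Returns the object so the macro below can be used as an expression;
 * passes NULL through if the allocation failed. */
static demo_obj_t *demo_obj_construct(demo_obj_t *obj, int rank, int size)
{
    if (NULL != obj) {
        obj->rank = rank;
        obj->size = size;
    }
    return obj;
}

/* C99 variadic macro: allocate storage for `type`, then forward the
 * remaining arguments to the constructor `ctor`.
 * Assumption: at least one constructor argument is supplied, since
 * strict C99 does not allow an empty __VA_ARGS__. */
#define DEMO_OBJ_NEW(type, ctor, ...) \
    ctor((type *) malloc(sizeof(type)), __VA_ARGS__)

/* Small usage example; returns 0 on success, 1 on allocation failure. */
static int demo_obj_example(void)
{
    demo_obj_t *obj = DEMO_OBJ_NEW(demo_obj_t, demo_obj_construct, 3, 16);
    if (NULL == obj) {
        return 1;
    }
    assert(3 == obj->rank);
    assert(16 == obj->size);
    free(obj);
    return 0;
}
```

Calling `demo_obj_example()` from a `main` exercises the macro. The trade-off to discuss is that every compiler the project supports would need to accept C99 `__VA_ARGS__`.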

Wed morning

  • Jeff/Ralph: finish SPI onboarding, SPI logo on web site?
  • ☎️ Fujitsu Status Update
  • BTL replacement with UCX and libfabric?
  • ☎️ btls are opened even if pml and osc components will not use them.
    • Can we devise a way to avoid loading the btls if they are not going to be used in a run?
    • This is a bit of a chicken-and-egg problem due to component selection...

Wed afternoon

  • MTL thread support
    • info requested on OMPI threading model
    • what level of thread support will OMPI shoot for?
  • ☎️ v3.x planning
    • Select release managers
    • When do we branch?
    • Default settings for v3.x
      • async modex = ON?
      • no RTE barrier on init = ON?
    • What, if any, upgrade path is there for existing applications? Recompile / relink applications? Command line differences?
      • If no need to rev major version, should we?
    • Binary / API backwards compatibility testing? per commit?
  • ☎️ Performance improvements from master over v2.x? Where do we need to work?
  • ☎️ Performance Regression Monitoring options (e.g. a dashboard?)
  • OMPIO improvements - MPI persistent request extensions to allow picking up newer ROMIO (IBM?)