
WeeklyTelcon_20230214


Open MPI Weekly Telecon

  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees (on Web-ex)

  • Attendance not captured.

Not here today, but kept here for easy cut-n-paste in the future.

  • Geoffrey Paulsen (IBM)
  • Jeff Squyres (Cisco)
  • Austen Lauria (IBM)
  • Brendan Cunningham (Cornelis Networks)
  • Brian Barrett (Amazon)
  • Edgar Gabriel (AMD)
  • Josh Fisher (Cornelis Networks)
  • Josh Hursey (IBM)
  • Luke Robison (Amazon)
  • Matthew Dosanjh (Sandia)
  • Thomas Naughton (ORNL)
  • Todd Kordenbrock (Sandia)
  • Tomislav Janjusic (nVidia)
  • William Zhang (AWS)
  • Howard Pritchard (LANL)
  • Joseph Schuchart (UTK)
  • David Bernholdt

New Items

  • Mellanox CI is failing. May be similar to a configure issue Edgar is seeing, where PRRTE tries to build as external but no external PRRTE is installed on the machine.

    • Sometimes this happens if there's one installed in the prefix used for Open MPI.
    • Edgar will debug a bit on his side, and Tommy will look into the Mellanox CI side.
  • New: 32-bit support (64-bit came out 20 years ago)

    • Debian noticed that Open MPI fails to configure (and thus build) on 32-bit.
      • This breaks a bunch of other packages that then can't be built.
    • But are there real users? Or just inertia?
      • Looks like inertia; for example, the Boost library could just turn off MPI support for 32-bit builds.
    • Debian is sticking with Open MPI v4.1.x for its immediate need.
    • Let's check back in a week on an estimate for scoping the 32-bit work.
      • We do have 32-bit testing that's turned off, so if we decide to test, it's easy to re-enable.
  • Issue #11347: Versioning is wrong in v5.0.x

    • We agreed v4.0.x -> v4.1.x -> v5.0.x should be ABI compatible.
      • Compile an MPI application with v4.0.x, then rm -rf the Open MPI installation, install v5.0.0 into the same location, and the application should just work (see the sketch after this list).
      • Did we figure out the Fortran ABI break?
        • From memory: yes, we did break the Fortran ABI.
        • We broke ABI in a very narrow case: when Fortran is compiled with 8-byte integers and C with 4-byte ints.
        • Two other things may or may not break ABI.
        • We changed some intents and asyncs, and went from named interfaces to unnamed ones.
          • Unsure if this affects ABI.
      • For ABI we mostly just care about C and mpif.h.
      • The Fortran library has its own .so versioning.
    • Blocker for the next v5.0.0 rc: get the library versioning correct.
    • When we talk about ABI, Fortran will be nuanced.
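    A minimal sketch of the ABI check described above, assuming the usual mpicc wrapper: build this once against v4.0.x, replace the installation with v5.0.0 in the same prefix, then re-run the existing binary without recompiling.

        /* abi_smoke.c - build with: mpicc abi_smoke.c -o abi_smoke
         * Run it, then swap the Open MPI install underneath it (rm -rf
         * the old prefix, install v5.0.0 into the same location), and
         * run the SAME binary again. If the ABI promise holds, it still
         * works without recompilation. */
        #include <stdio.h>
        #include <mpi.h>

        int main(int argc, char **argv)
        {
            int rank, size;
            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);
            printf("rank %d of %d: ABI smoke test passed\n", rank, size);
            MPI_Finalize();
            return 0;
        }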
  • Comm size issue: Issue #11373.

    • Edgar created a PR to fix the communicator size to match v4.1.x, to maintain backward compatibility in v5.0.0 for apps built against v4.1.x.
  • Austen said he'd try to find time to run the ABI checker.

    • Some GNU ABI checker tool might help (possibly libabigail's abidiff).

v4.1.x

  • Need to pull in a PMIx v3.1.x update.
  • Fix a CUDA issue caused by a bad cherry-pick earlier.
    • Reworking a PR; in progress.
  • Made a minor change for another rc; trying to get the rc built.

v5.0.x

  • ROMIO issue not yet resolved (see the packaging item below).

  • The rc from last week got pushed to this week.

    • Still waiting on https://github.com/open-mpi/ompi/issues/11354
    • Maybe related to the --enable-mca-dso build option?
      • The accelerator framework initially picks CUDA and then disqualifies it, but at teardown it still tries to tear down CUDA.
        • The reason it does this is that CUDA now uses delayed startup, so the component will still be marked enabled.
        • There's another variable tracking whether CUDA was actually initialized (see the sketch below).
      • This should also be on main (despite a comment saying otherwise).
    • Howard said after the call that this isn't a blocker for rc10.
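    A minimal sketch of the guarded-teardown pattern described above, with hypothetical names (the accel_cuda_* functions and flags are illustrative, not Open MPI's actual internal API): with delayed startup, "component enabled" and "CUDA initialized" are separate states, and teardown must check the latter.

        #include <stdbool.h>

        static bool cuda_component_enabled = false; /* selected at startup */
        static bool cuda_initialized       = false; /* set on first real use */

        /* Hypothetical lazy init: only runs when a device buffer shows up. */
        static void accel_cuda_lazy_init(void)
        {
            if (!cuda_initialized) {
                /* ... cuInit()/context setup would happen here ... */
                cuda_initialized = true;
            }
        }

        static void accel_cuda_fini(void)
        {
            /* The bug pattern: tearing down whenever the component is
             * enabled produces CUDA runtime errors if CUDA was never
             * actually initialized. Guard on cuda_initialized instead. */
            if (cuda_initialized) {
                /* ... release CUDA contexts/resources here ... */
                cuda_initialized = false;
            }
            cuda_component_enabled = false;
        }

        int main(void)
        {
            cuda_component_enabled = true; /* framework picked CUDA */
            /* App never touches CUDA, so accel_cuda_lazy_init() never runs. */
            (void)accel_cuda_lazy_init;    /* silence unused-function warning */
            accel_cuda_fini();             /* safe: guard skips CUDA teardown */
            return 0;
        }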
  • Howard has had an issue using external compilers with the accelerator framework.

    • Issue #11354
  • CUDA framework issue #11354: Howard is working on it.

    • If you disable building sm-cuda, the problem doesn't occur.
    • With --enable-mca-dso, we don't see this.
    • We don't see it if the app initializes CUDA before MPI_Init (maybe).
    • It takes a combination of factors to see this.
    • If the application is actually using CUDA, then everything works.
    • The problem is when the app doesn't use CUDA, but sm-cuda initializes it anyway (even though the application doesn't need CUDA).
      • It calls into the framework to initialize.
      • At finalize it makes calls into the accelerator and gets CUDA runtime errors.
    • We think we want sm-cuda when running on a single node.
    • Was it just the IPC, or also something else? We believe it was the IPC stuff.
      • There's no IPC support in the accelerator framework, just a direct dependency on CUDA.
    • The CUDA collective component never directly uses CUDA buffers; it just checks and then memcopies into host memory.
      • All of this does use the accelerator framework.
      • These three components added a direct CUDA dependency because they call CUDA directly instead of calling through the framework (see the sketch after this list):
        • btl/smcuda
        • rcache/rgpusm
        • rcache/gpusm
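    A short sketch of the design point, with hypothetical names (the dispatch table below stands in for the accelerator framework's real interface): a component that calls through the framework stays backend-agnostic, while calling CUDA directly is what creates the hard dependency described above.

        #include <stddef.h>
        #include <stdio.h>
        #include <string.h>

        /* Stand-in for the accelerator framework's module interface. */
        typedef struct {
            int (*check_addr)(const void *addr); /* nonzero if device memory */
            int (*memcpy_to_host)(void *dst, const void *src, size_t len);
        } accel_module_t;

        /* Null backend: treats everything as host memory. A CUDA backend
         * would wrap the equivalent CUDA calls behind the same two slots,
         * so callers never name CUDA directly. */
        static int null_check_addr(const void *addr) { (void)addr; return 0; }
        static int null_memcpy_to_host(void *dst, const void *src, size_t len)
        {
            memcpy(dst, src, len);
            return 0;
        }
        static accel_module_t null_accel = { null_check_addr, null_memcpy_to_host };
        static accel_module_t *accel = &null_accel; /* selected backend */

        /* Check-then-copy pattern from the discussion: detect a device
         * buffer and stage it into host memory before the collective. */
        static int pack_for_collective(void *host_buf, const void *user_buf, size_t len)
        {
            if (accel->check_addr(user_buf)) {
                return accel->memcpy_to_host(host_buf, user_buf, len);
            }
            memcpy(host_buf, user_buf, len); /* already host memory */
            return 0;
        }

        int main(void)
        {
            char src[16] = "host buffer", dst[16];
            pack_for_collective(dst, src, sizeof(src));
            printf("%s\n", dst);
            return 0;
        }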
  • ROMIO isn't included in packaging properly.

    • Issue #11364: Austen is taking a look; we might have missed something.
  • Waiting on PMIx and PRRTE submodule update.

    • Ralph pestered us to please merge it; just merged on main.
    • Merged; it will make rc10.
  • Need documentation for v5.0.0

  • Manpages need an audit before release.

    • Double-check --prefix behavior.
    • It's not the same behavior as in v4.1.x.
  • What is the status of HAN?

    • Priority bump of HAN: PR #11362 is against main; we need one for v5.0.x.
    • Joseph pushed a bunch of data, but he's not on the call. Go read it.
    • Joseph had some more experiments: the HAN collective component with the shared-memory PR looked pretty good compared to tuned and another component.
      • Comparing HAN with the shared-memory component.
      • At how many processes per node (ppr)? Between 2 and 64.
    • The numbers are better; it would be good to document this.
      • In the OSU benchmarks there's always a barrier before the operation. If the barrier and the operation match up well, you get lower latency.
      • We'd talked about supplying some docs about how HAN is great, and why we're enabling it by default for v5.0.0.
        • We'd like to include instructions for users on how to reproduce the results as well.
        • Document this in ECP.
      • Our current resolution is to enable it as is, and fix current regressions in future releases.
      • What else is needed to enable it by default?
        • We just need to flip a switch (see the sketch after this list).
        • The module that Joseph has for HAN shared memory at the moment would need some work to add additional collectives.
        • And it relies on XPMEM being available.
        • So for now, just enable HAN for the collectives we have, and later enable it for other collectives.
        • George would like to re-use what tuned does without reimplementing everything; a shared-memory component is a better choice, but needs more work.
        • If we don't enable HAN by default now, it's v5.1 (best case) before it's enabled.
          • The trade-offs lean toward turning it on and fixing whatever problems might be there.
        • There is a PR for tuned that increases the default segment size and changes tuned's algorithms for shared memory.
        • Need to start moving forward, rather than doing more analysis.
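    A minimal sketch of the "switch" in question. The real change would be bumping the component's default priority in the Open MPI source; the runtime equivalent below uses Open MPI's OMPI_MCA_* environment-variable convention. The parameter name coll_han_priority follows the usual <framework>_<component>_priority pattern and should be confirmed with ompi_info.

        #include <stdlib.h>
        #include <mpi.h>

        int main(int argc, char **argv)
        {
            /* Raise HAN's selection priority above tuned's default so
             * coll/han wins component selection; must be set before
             * MPI_Init() reads MCA parameters. Equivalent to:
             *   mpirun --mca coll_han_priority 100 ./a.out */
            setenv("OMPI_MCA_coll_han_priority", "100", 1);

            MPI_Init(&argc, &argv);
            /* ... collectives now prefer HAN where it implements them ... */
            MPI_Finalize();
            return 0;
        }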

Main branch

Administration Topics
