WeeklyTelcon_20171107

Geoffrey Paulsen edited this page Jan 9, 2018 · 1 revision

Open MPI Weekly Telcon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees

  • Jeff Squyres
  • Geoff Paulsen (IBM)
  • Brian
  • Edgar Gabriel
  • Geoffroy Vallee
  • Todd Kordenbrock
  • Thomas Naughton
  • Ralph
  • Howard Pritchard
  • Josh Hursey
  • Nathan Hjelm
  • Mohan
  • George

Agenda

  • Set Async Modex options - have them fixed in a branch for PMIx.
    • have to turn off dstore for this to work. (3 bugs in dstore)
    • Brian looking at a solution.
    • UCX can use instant-on by doing the fence in UCX instead of in PMIx.
    • Any interconnect can use the instant-on approach. If you have an app that's sparsely connected, these parameters will get you a much faster start.
      • Most fabrics (other than the uGNI BTL) need a barrier before communication.
    • ONLY works if dstore is off, since dstore incorrectly tells folks that the data has arrived.
    • Decision: If users set params to use Async Modex, it will auto disable dstore.
  • 3 things to fix in dstore.
    • Runtime option - disable dstore if async is requested.
    • Fix cross versioning issue
    • return a pointer to memory where something is stored, but can't today because they're packing data.

Review v2.0.x Milestones v2.0.4

  • Decided to merge a few other things yesterday.
  • IBM Jenkins was problematic - just ignore until they can get back online.
  • Howard can't get to AWS instance anymore.
    • This is new, Brian can't look at it today.
  • Schedule: At this point, it's looking like a post-Supercomputing release.

Review v2.x Milestones v2.1.2

  • Still over the New Year horizon

Review v3.0.x Milestones v3.0

  • v3.0.1 - No release candidate yet. Hope to get to it this week, but
    • No real timeline.
  • Are there any current blockers?
    • Jenkins server is off.
  • Edgar how are we on IO?
    • If 3.0.1 or 3.1 are not getting out this week, we should try to get bugfix from yesterday in.
    • Edgar just pulled it into master, will create PRs to v3.0.x and v3.1.x soon.
    • Will hold RC for this.
    • Issue 4453
  • Performance of IO was a bit scary, but a necessary hit for correctness.

Review v3.1.x Milestones v3.1 (https://github.com/open-mpi/ompi/milestone/27)

  • Schedule -
    • Outlook - Probably will not get out by Supercomputing. :(
  • Brian will send out requests to start testing v3.1
    • Cisco did last week.
  • Add v3.1 to MTT tests
    • Database is active now to accept v3.1 tests.
    • MTT disk full issue has been resolved.

Review Master Master Pull Requests

  • Can't tell how master is doing.

MTT / Jenkins Testing Dev

  • What are we testing?
    • We test builds on a number of platforms, a number of compilers, and a number of configurations
  • The tests are a bit more limited than we'd like.
  • They make sure ompi_info runs, and run example programs using shared memory.
  • Adding tests to run single-node is pretty easy.
  • George: OpenSHMEM is broken in master, and George had to go back to v2.1 to get it to work.
    • UCX couldn't find the remote peer. He can use UCX as an MTL in Open MPI.
    • Challenge with OSHMEM - we removed support for BTLs, so have to have a transport provider that supports OSHMEM, which is only UCX.
    • We need to set up a test build with UCX.
  • Howard has some ARM boxes to test multi-box OSHMEM
  • IBM will enable OSHMEM + UCX in coming weeks.
  • PR - The new compare-and-swap function will return a new type in order to use the new C11 atomics.
    • Suggested using a configure based switch, and configure figures out how to #define.
    • Configury can figure out what doesn't work, and enable renaming of types, etc.
  • Configure will detect C11 atomics under the covers if the compiler is C11.
    • If we use C11 signature, we'll need to use generics.
    • Those generics will be protected by the same macro. Compilers in general implemented generics before C11.
    • It's a mess because there is no standardization around generics.
    • Nathan: This is why this is the final step.
    • Any operation on an atomic int becomes atomic. So we need a new OPAL_Atomic type that will be either an atomic int or a C volatile.
    • But you're right, for generics to work, types have to match exactly.
    • If you're using attributes, it's even worse.
  • ONLY need for a subset of types: size_t, int32, int64, pointer, and a couple others.
    • If you want to use the OPAL Atomic interface, you MUST use one of the six supported types.
  • Need to ensure that all supported compilers support generics.
    • But they didn't standardize the naming of the type.
    • Open MPI WOULD be standardizing the name of the type internal to our code.
  • Eventually Nathan wants to move to where atomics can be used automatically with C11.
    • But sometimes it's slow (and correct)
      • If you know what you're doing, then can cast away atomic for speed.
  • Will not affect external MPI_Ops (they're not atomics).
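
The atomics discussion above can be illustrated with a minimal sketch. It is not the code from the PR: every `demo_*` name and the `DEMO_HAVE_C11_ATOMICS` macro are hypothetical stand-ins for whatever configure and OPAL would actually define, only two of the supported types are shown, and the non-C11 fallback assumes the GCC/Clang `__sync` builtins are available.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a macro that configure would define.
 * Here it is derived from the compiler's own feature macros. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && !defined(__STDC_NO_ATOMICS__)
#define DEMO_HAVE_C11_ATOMICS 1
#include <stdatomic.h>
#else
#define DEMO_HAVE_C11_ATOMICS 0
#endif

#if DEMO_HAVE_C11_ATOMICS

/* One internal name per supported type; with C11 it is a real atomic type. */
typedef _Atomic int32_t demo_atomic_int32_t;
typedef _Atomic int64_t demo_atomic_int64_t;

static inline bool demo_cas_32(demo_atomic_int32_t *addr, int32_t *expected, int32_t desired)
{
    return atomic_compare_exchange_strong(addr, expected, desired);
}

static inline bool demo_cas_64(demo_atomic_int64_t *addr, int64_t *expected, int64_t desired)
{
    return atomic_compare_exchange_strong(addr, expected, desired);
}

/* Generic selection needs an exact type match: passing a plain int32_t *
 * fails to compile because there is no default association. */
#define demo_cas(addr, expected, desired)                      \
    _Generic((addr),                                           \
             demo_atomic_int32_t *: demo_cas_32,               \
             demo_atomic_int64_t *: demo_cas_64)((addr), (expected), (desired))

#else

/* Fallback: the same names map to volatile integers (assumes __sync builtins). */
typedef volatile int32_t demo_atomic_int32_t;
typedef volatile int64_t demo_atomic_int64_t;

static inline bool demo_cas_32(demo_atomic_int32_t *addr, int32_t *expected, int32_t desired)
{
    int32_t old = __sync_val_compare_and_swap(addr, *expected, desired);
    bool swapped = (old == *expected);
    *expected = old; /* mimic C11: report the value actually seen */
    return swapped;
}

static inline bool demo_cas_64(demo_atomic_int64_t *addr, int64_t *expected, int64_t desired)
{
    int64_t old = __sync_val_compare_and_swap(addr, *expected, desired);
    bool swapped = (old == *expected);
    *expected = old;
    return swapped;
}

/* Without _Generic, dispatch on operand size instead. */
#define demo_cas(addr, expected, desired)                                              \
    (sizeof(*(addr)) == sizeof(int32_t)                                                \
         ? demo_cas_32((demo_atomic_int32_t *) (addr), (int32_t *) (expected),         \
                       (int32_t) (desired))                                            \
         : demo_cas_64((demo_atomic_int64_t *) (addr), (int64_t *) (expected),         \
                       (int64_t) (desired)))

#endif

/* Self-check showing the call pattern; returns the last observed value. */
static int32_t demo_selfcheck(void)
{
    demo_atomic_int32_t flag = 0;
    int32_t expected = 0;

    bool swapped = demo_cas(&flag, &expected, 1); /* 0 -> 1 succeeds */
    assert(swapped);

    swapped = demo_cas(&flag, &expected, 2); /* expected is stale, so this fails */
    assert(!swapped && 1 == expected);       /* and expected is refreshed to 1 */

    return expected;
}
```

Callers only ever see the `demo_atomic_*` names and `demo_cas`, so the choice between C11 atomics and the volatile fallback stays in one configure-controlled place, which is the shape of the approach suggested in the discussion.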

This week Discussion Points.

Oldest PR

Oldest Issue

Next face-to-face meeting

  • Jan / Feb
  • Possible locations: San Jose, Portland, Albuquerque, Dallas
  • Discuss What to do for partner's broken CI pieces?
  • Big section of going through old issues and old PRs.

Status Updates:

Status Update Rotation

  1. Mellanox, Sandia, Intel
  2. LANL, Houston, IBM, Fujitsu
  3. Amazon,
  4. Cisco, ORNL, UTK, NVIDIA

