
WeeklyTelcon_20180417

Geoffrey Paulsen edited this page Jan 15, 2019 · 1 revision

Open MPI Weekly Telecon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees

  • Geoff Paulsen
  • Jeff Squyres
  • Josh Hursey
  • Brian
  • David Bernholdt
  • Edgar Gabriel
  • Howard Pritchard
  • Nathan Hjelm
  • Thomas Naughton
  • Todd Kordenbrock
  • Xin Zhao

Agenda/New Business

Minutes

Review v2.x Milestones v2.1.3

  • v2.1.4 - Targeting Oct 15th.
    • Merged in a bunch of stuff.
    • Some one-sided multithreaded bugs came up.
      • Not worth fixing in v2.1.x, so instead we pulled the configury changes from v2.0.x into v2.1.x.
  • No new news on v2.1.x

Review v3.0.x Milestones v3.0.2

  • v3.0.1 went out the door.
    • Oops: the PMIx compatibility pieces did not make it into the embedded PMIx.
  • v3.0.2 is open for bug fixes, with a quick turnaround.
    • Shooting for May 1st.
    • Will pre-emptively fix the PMIx compatibility pieces to pick up PMIx v1.2.5 clients.
    • This will bring in PMIx compatibility with OMPI clients (mpirun/orted/libmpi) from OMPI v2.1.3.
  • The memkind disable needs to get into v3.0.2; it is either already taken care of or waiting to be.
  • PR (fix ppc64 big-endian) can't merge until 4563 is merged.
    • We thought Nathan was going to fix the hang and then merge.
    • Given this is the same issue as ARM, where we don't have a block, thought we'd just remove
    • We now understand the problem: it is not silent data corruption, just a hang.

Review v3.1.x Milestones v3.1.0

  • Schedule - ASAP - but blockers keep getting filed.
    • No one seems particularly eager to get it out.
    • Not getting any
  • Blockers:
    1. A high level of failures in Cisco MTT. Pretty sure it's not unique to v3.1.x; it's also happening on v3.0.x.
      • The only non-default flag setting is --plm_base_verbose.
      • Happens under SLURM when launching with mpirun, not with direct launch.
      • Jeff is still investigating.
    2. UCX OSC is failing in IBM tests on v3.1.x and on master. Geoff will post an issue and @xin on it.
  • Not going to merge in the PMIx patch unless someone says they really want it, since it would require a new RC.

Review Master Pull Requests

  • OSHMEM v1.4 - not sure whether we have to drop the deprecated APIs; curious whether OMPI is dropping deprecated APIs...
    • Only remove things that were removed from the OSHMEM standard, not things that are deprecated, since "deprecated" means an API will be removed in a future version of the standard. If some APIs were removed from the standard, ask the OSHMEM mailing list for their thoughts.

Other topics

  • v4.0 release manager
    • Howard and Geoff have volunteered, but others are welcome to volunteer too.
    • Start talking about it now. Plan to branch mid-July.
    • Brian has automated much of the manual work, so most of the remaining work is coordinating with members, encouraging testing and fixing, etc.
    • If others are interested, please volunteer.
  • Want to understand PMIx cross-version compatibility.
    • Jeff Squyres and Josh talked about PMIx cross-version compatibility, then pulled in Howard and Ralph and discussed it some more.
    • Josh shared a link to a Google Doc with a matrix of data.
    • Amazon has compute resources, but not the time for test development, etc.
    • Trying to get Singularity and Charliecloud to help with some of this testing, as it impacts them the most.
    • Should only need to test the PMIx APIs.
    • The PMIx community is continuing to discuss.
    • VOLUNTEER NEEDED: If you have time to work on MTT, please volunteer (Perl, Python, or whatever you want).

MTT / Jenkins Testing Dev

  • Get a copy of the Perl JSON module and put it on MTT.

When should we branch v4.0?

Oldest PR

Oldest Issue


Status Updates:

Status Update Rotation

  1. Mellanox, Sandia, Intel
  2. LANL, Houston, IBM, Fujitsu
  3. Amazon
  4. Cisco, ORNL, UTK, NVIDIA

