
WeeklyTelcon_20190326

Geoffrey Paulsen edited this page Mar 26, 2019 · 1 revision

Open MPI Weekly Telecon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees (on Web-ex)

  • Geoff Paulsen
  • Jeff Squyres
  • Brian Barrett
  • Edgar Gabriel
  • George
  • Howard Pritchard
  • Josh Hursey
  • Joshua Ladd
  • Noah Evans (Sandia)
  • Ralph Castain
  • Thomas Naughton
  • Todd Kordenbrock

not there today (I keep this for easy cut-n-paste for future notes)

  • Michael Heinz (Intel)
  • Akshay Venkatesh (nVidia)
  • Dan Topa
  • David Bernholdt
  • Mike Heinz (Intel)
  • Jake Hemstad
  • Matthew Dosanjh
  • Xin Zhao
  • Nathan Hjelm
  • Geoffroy Vallee
  • Matias Cabral
  • Aravind Gopalakrishnan (Intel)
  • Dan Topa (LANL)
  • Arm (UTK)
  • Peter Gottesman (Cisco)
  • mohan

Agenda/New Business

  • NEW Ask George about Issue: Overlapping Vector Datatype https://github.com/open-mpi/ompi/issues/5540

    • This is important. George is working on a patch.
    • If you're using complicated data types for real things, it's important.
    • Should it be backported to release branches? Perhaps not, since only one customer has hit it.
    • Is this a blocker for v4.0.1? Or v4.0.2? Not a blocker.
    • George has not had a chance to look at it yet.
  • NEW Jeff will Open PR about StaleBot - https://github.com/apps/stale

    • https://github.com/open-mpi/ompi/pull/6495
    • We have a lot of OLD Issues, but they're so low priority that they won't get done.
      1. real bug, but no one will fix.
      2. bug WAS fixed, but no one noticed so should be closed.
      3. Work was abandoned for whatever reason.
    • We like the idea, but want to host it ourselves rather than rely on github to run it for a variety of reasons.
      • wait a month or so and should have a new person who knows Node.js to host Stalebot possibly with minor changes.
  • NEW Host Ordering fix to v3.0.x, v3.1.x, v4.0.x https://github.com/open-mpi/ompi/issues/6501

    • With --host on the command line, the order in which the hosts were listed was not preserved.
    • This Fix went into master. Do we want to bring it back to release branches?
    • Does this apply to the hostfile as well? - Yes. That seems like a common use-case
    • Is this a backwards compatibility issue? - No, since a specified ordering is a subset of a random ordering.
    • Seems like not too many people noticed, or just lived with this behavior.
    • Everyone on call liked PRing this to release branches, but want to see what Brian and Howard think.
    • There is a PR for v4.0.x that Ralph and Jeff are iterating on. Unexpectedly large. Would be good to do this first.
  • NEW Gilles' openib issue: https://github.com/open-mpi/ompi/pull/6152

    • No one had any thoughts on this.
    • Would like Mellanox to chime in and let us know if it's needed in v4.0.x
  • NEW George did you get any follow up with Season of Documentation?

    • Some. Making man pages better and finding a way to link man pages with examples would be really awesome.
    • In libfabric they make man pages in Markdown, and then in make dist, they convert it to nroff.
    • For user-facing APIs, they use Sphinx - convert Markdown in comments to user-facing HTML man pages.
  • Face to face agenda is very light.

    • One major item is OMPI / ORTE / PRTE split out, and timing is important, probably shouldn't wait until fall
    • What about a one day meeting just for this issue?
      • Long way for people to come for one day.
    • Many items don't need a face-to-face meeting. Could do an all-day Web-ex instead.
    • There is interest in adding back a threading framework so people can use threading packages other than pthreads.
      • Only reason we pulled this framework out was because we were just down
    • Let's cancel the face-to-face.
      • Interested: Brian, Ralph,
      • Interested in One Day Web-ex: Thomas, Josh Hursey, Josh Ladd,
      • Ralph will meet with
  • Agenda for Face to face needs to be updated. Please put items on there.

    • The agenda is very light, and perhaps we should cancel this one.
    • Let's talk about this next week. If the agenda is still light, we may consider canceling this.
  • We should reconsider and talk about if we want to break away ORTE / PRTE.

    • A lot of risk, but a lot of people feel it merits discussion. It is the one critical item on the face-to-face agenda.
    • If we DON'T do it NOW it may never get done (move to PRTE), since Ralph is retiring and has much of both sides of this in his head.
    • ULFM has invested in this
    • MPI_Sessions is tightly coupled.
    • We will discuss again next week.
    • It's on the face to face agenda, but needs to be discussed and decided.
  • Nathan Hjelm's day job will no longer involve Open MPI, so if you want him to review something, please check with him first.

  • Next face to face is April 23-25 at Cisco, San Jose.

  • Noah described a new thread framework

    • Two bits of cleverness: static initializers, and certain functions need to be static inline.
    • An implementation-defined header gets installed at configure time (similar to libevent).
    • Two components: Argobots and pthreads.
    • Currently exclusive (only one component, since it installs a header at configure time).
      • Probably permanently.
    • Need to look at thread local storage.
    • Had to implement TLS on top of pthreads, Argobots has this already.
    • Request completion would be the most sensitive to having oversubscription.
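The shape Noah described can be illustrated with a small sketch. This is not the actual framework code: every `demo_`/`DEMO_` name is hypothetical, and the sketch assumes that the pthreads backend is the one selected at configure time. It shows a statically initializable lock, `static inline` wrappers that compile down to the backend call, and thread-local storage layered on pthread keys.

```c
/* Sketch only: all demo_ names are hypothetical, not Open MPI's API. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* In a real build, configure would install exactly one backend header
 * (pthreads or Argobots). Here the pthreads variant is written inline. */
typedef pthread_mutex_t demo_mutex_t;
#define DEMO_MUTEX_STATIC_INIT PTHREAD_MUTEX_INITIALIZER

/* static inline wrappers: no extra call layer over the backend. */
static inline int demo_mutex_lock(demo_mutex_t *m)   { return pthread_mutex_lock(m); }
static inline int demo_mutex_unlock(demo_mutex_t *m) { return pthread_mutex_unlock(m); }

/* Thread-local storage built on pthread keys. */
typedef pthread_key_t demo_tls_key_t;
static inline int   demo_tls_key_create(demo_tls_key_t *k) { return pthread_key_create(k, NULL); }
static inline int   demo_tls_set(demo_tls_key_t k, void *v) { return pthread_setspecific(k, v); }
static inline void *demo_tls_get(demo_tls_key_t k)          { return pthread_getspecific(k); }

/* Statically initialized lock: usable before any init routine runs. */
static demo_mutex_t   demo_lock  = DEMO_MUTEX_STATIC_INIT;
static long           demo_total = 0;
static demo_tls_key_t demo_key;

static void *demo_worker(void *arg)
{
    long iters = *(long *)arg;
    long local = 0;

    demo_tls_set(demo_key, &local);          /* each thread stores its own counter */
    for (long i = 0; i < iters; i++) {
        long *mine = demo_tls_get(demo_key); /* fetch this thread's counter */
        assert(mine != NULL);
        (*mine)++;
    }
    demo_mutex_lock(&demo_lock);             /* merge into the shared total */
    demo_total += local;
    demo_mutex_unlock(&demo_lock);
    return NULL;
}

/* Runs nthreads workers; returns the combined count, or -1 on error. */
static long demo_run(int nthreads, long iters)
{
    pthread_t tids[16];
    int started = 0;

    if (nthreads < 1 || nthreads > 16) return -1;
    if (demo_tls_key_create(&demo_key) != 0) return -1;
    demo_total = 0;
    for (; started < nthreads; started++) {
        if (pthread_create(&tids[started], NULL, demo_worker, &iters) != 0) break;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
    }
    pthread_key_delete(demo_key);
    return (started == nthreads) ? demo_total : -1;
}
```

Calling `demo_run(4, 1000)` from a `main` function exercises all three pieces. Because the wrappers and the initializer macro live in a single header chosen at build time, a second backend would swap that header rather than add a runtime dispatch, which is consistent with the note that the components are exclusive.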

Minutes

Review v3.0.x Milestones v3.0.3

  • Will ship Friday.

Review v3.1.x Milestones v3.1.0

  • Will ship Friday.

Review v4.0.x Milestones v4.0.1

  • Will tag v4.0.1 and build today.

v5.0.0

  • Schedule: Delaying post Summer ***
  • Discussion of schedule depends on scope discussion
    • Do we want to separate ORTE out for that? It would be a bit past summer.
    • Gilles has a prototype of PRTE replacing ORTE
  • Want to open up release-manager elections.
    • Now that we're delaying, will decide at face2face.
  • A v4.1 release from master is now a possibility
    • If we instead do a v4.1, some things we'd need fixed on master.
  • will discuss more at face to face.

Master

PMIx

  • There will be a new PMIx 3.1.3 which can be included into a future v4.0.x OMPI.
  • Take a look at Gilles' PRTE work. He may have done SOME of that. He should have done it all in the PRTE layer; maybe just some MPI layer work remains.

MTT

  • IBM still has 10% failure rate and build issue. Please fix!!!

New topics

  • MPI Forum - nothing too substantial. MPI_Sessions is getting a lot of traction, with a goal of finishing it by the next meeting. It still needs a reading, then a vote, then two more votes, so MPI Next would be in 2020. Also language bindings and some crazy proposals.
  • Read MPI Forum link here: https://www.mpi-forum.org/

face to face -

  • how do we get more participation, and make MTT more meaningful?

Review Master Pull Requests

  • didn't discuss today.

Oldest PR

Oldest Issue


Status Updates:

Status Update Rotation

  1. Mellanox, Sandia, Intel
  2. LANL, Houston, IBM, Fujitsu
  3. Amazon,
  4. Cisco, ORNL, UTK, NVIDIA

Back to 2018 WeeklyTelcon-2018
