
WeeklyTelcon_20160920


Open MPI Weekly Telcon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees

  • Geoff Paulsen
  • Jeff Squyres
  • Josh Hursey
  • Joshua Ladd
  • Ralph
  • Sylvain Jeaugey
  • Artem Polyakov
  • Brad Benton

Agenda

Review 1.10

Review 2.0.x

  • Wiki

  • Milestones

    • 2.0.2 in preparation
      • Will create branch 2.0.x
      • Looks like PRs are non-controversial, just waiting for reviews.
      • 2.x branch will see the 2.1.0 PRs merged in once the 2.0.x branch is created
    • 2.1.0
      • Was hoping to get the 2.1.0 PRs in before we merge the Git repos.
    • Looked at a prototype of the merged GitHub repo called ompi-all-the-branches
      • Review mechanism is web-only.
      • Blocking on OSHMEM - needs rebasing.
      • Yoda maintenance.
      • Ongoing performance discussion.
      • Most PRs marked as RM approved
      • Discussion on a few other items
    • Blocker 2.0.2 issues
      • Issue 2075
        • Non-issue since SIGSEGV is not forwarded.
      • Issue 2049
        • Ticket updated
      • Issue 2030
        • MTT seems to be the only place to reproduce
        • Might be a debug-build-related issue in the usage of opal_list_remove_item
      • Issue 2028
        • yoda needs to be updated for BTL 3.0
        • 2.1 will not be released until yoda is fixed
        • Propose: Remove yoda from 2.1 and move to UCX
        • Raises the question: Does it make sense to keep OSHMEM in Open MPI if yoda is removed?
      • Issue 1831
    • Blocker 2.1.0 issues
  • OSHMEM - Yoda Maintenance

    • Want to progress both MPI and OSHMEM in the same process; we don't want multiple network stacks.
    • The original argument was to run OSHMEM over the BTLs, so that it could use all of the network stacks (TCP, SM, OpenIB).
      • That was 4 years ago, but things have changed; we don't really have that anymore, we have PMLs and SPMLs.
    • Last week Mellanox proposed moving to UCX.
    • OSHMEM sits on top of the MPI layer, since it uses much of it.
    • Over the last couple of years it has been decoupled from MPI, and now it's sitting off on the side.
    • Now that it's off on the side, no one is interested in maintaining the connection to OPAL support and ORTE. If that's all it's using, there are other projects that share OPAL and ORTE.
    • The only reason to be in the repository is that it's connected at the MPI layer.
    • But when you start OSHMEM, the first thing called is OMPI_MPI_Init (see the init sketch at the end of this section).
    • Maybe it would help to identify exactly what in the MPI layer OSHMEM is using.
    • OPAL<-ORTE<-OMPI<-OSHMEM dependency chain.
    • Maybe it would help to show where that is.
    • OSHRUN (really ORTERUN) calls OMPI_MPI_Init; an MCA plugin infrastructure is built on top of that.
    • Can't just slash pieces away.
    • It takes advantage of PMIx, direct modex, the proc structure, and everything that supports this.
    • According to this PR on master, OSHMEM has the same proc structure as OMPI, but actually adds some more at the end of it.
    • What about the transports? MPI with MXM boils down to libmxm, and so does OSHMEM.
    • Became an issue with BTL 3.0 API change.
    • A number of things over the last year, especially, have divided MPI focus and OSHMEM focus; there have been a number of breaks between MPI / OSHMEM, and release schedule conflicts.
      • Does it make sense to separate the repositories, or to design a way to make it easy to pull changes between the two projects?
    • Right now there is a regression in the code base.
      • Mellanox can't replace Yoda with UCX in October.
      • Mellanox will fix Yoda for this time (for 2.1.0)
      • Could package UCX alongside the other transports and let the market decide.
      • Want to continue the discussion about the importance of OSHMEM being included with the Open MPI project.
    • We need to have an important discussion about the future of MPI / OSHMEM.
  • SPI - http://www.spi-inc.org/

    • Working on getting people to approve this.
    • We'll be on the Oct 12th agenda. Once they formally invite us, we then have 60 days to agree / decline.
    • SPI works solely on a volunteer basis, so it's very inexpensive.
    • End of September for soliciting feedback on using SPI.
    • Open MPI will hold a formal vote after we receive the formal invite (in mid-to-late-December?)
  • New Contribution agreement / Consent agreement / Bylaws.

    • Will need a formal vote by members.
    • End of October for discussion of new contributor agreement / bylaws.
    • After that we'll set a date for voting.
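
  • Aside: OSHMEM init sketch (referenced from the OSHMEM - Yoda Maintenance item above).
    A minimal sketch of why the OPAL<-ORTE<-OMPI<-OSHMEM dependency chain matters: the very first OSHMEM call already drives the OMPI initialization path. It assumes the standard OpenSHMEM 1.2 API (shmem_init, shmem_my_pe, shmem_n_pes, shmem_finalize) and Open MPI's oshcc / oshrun wrappers; the file name and output here are illustrative, not something from the meeting.

        #include <stdio.h>
        #include <shmem.h>

        int main(void)
        {
            /* In Open MPI's OSHMEM, shmem_init() ends up invoking the OMPI MPI
               initialization path, which in turn brings up ORTE and OPAL:
               the OPAL<-ORTE<-OMPI<-OSHMEM dependency chain noted above. */
            shmem_init();

            int me   = shmem_my_pe();   /* this PE's index */
            int npes = shmem_n_pes();   /* total number of PEs */
            printf("Hello from PE %d of %d\n", me, npes);

            shmem_finalize();
            return 0;
        }

    Build and launch with the Open MPI wrappers, e.g. "oshcc hello.c -o hello" and "oshrun -np 2 ./hello"; oshrun is really ORTERUN under the covers, as noted above.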

New Agenda Items:

Review Master MTT testing (https://mtt.open-mpi.org/)

MTT Dev status:

Website migration

Open MPI Developer's Meeting

  • Date of another face-to-face: January or February? Think about it, and discuss next week.

Status Update Rotation

  1. LANL, Houston, IBM
  2. Cisco, ORNL, UTK, NVIDIA
  3. Mellanox, Sandia, Intel

Back to 2016 WeeklyTelcon-2016
