
WeeklyTelcon_20160830

Jeff Squyres edited this page Nov 18, 2016 · 1 revision

Open MPI Weekly Telcon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees

  • Geoff Paulsen
  • Jeff Squyres
  • Brad Benton
  • Edgar Gabriel
  • Geoffroy Vallee
  • Howard
  • Josh Hursey
  • Joshua Ladd
  • Nathan Hjelm
  • Ralph
  • Ryan Grant
  • Sylvain Jeaugey

Agenda

Review 1.10

  • Milestones
  • 1.10.4
    • Only potential blocker is issue with wrapper compiler.
      • mpifort is not libpath-ing rpath lib
      • when you do C builds, add rpath to all dependent libs during build.
      • static builds on 1.10
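
One way to check the rpath symptom above is to take the link line a wrapper prints (for example from `mpifort --showme:link`) and list every `-L` directory that has no matching rpath entry. The following is only a minimal sketch under that assumption: the helper name `missing_rpaths` and the example paths are hypothetical, and it only recognizes the common `-Wl,-rpath` spellings.

```python
import shlex


def missing_rpaths(link_line):
    """Return the -L directories that have no matching rpath entry.

    `link_line` is assumed to be the text a wrapper compiler prints for
    its link flags. Hypothetical helper, not part of Open MPI.
    """
    lib_dirs = []      # directories given with -L, in order of appearance
    linker_args = []   # arguments passed through to the linker via -Wl,
    tokens = shlex.split(link_line)
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "-L" and i + 1 < len(tokens):
            # "-L /dir" form: the directory is the next token
            i += 1
            lib_dirs.append(tokens[i])
        elif tok.startswith("-L") and len(tok) > 2:
            # "-L/dir" form
            lib_dirs.append(tok[2:])
        elif tok.startswith("-Wl,"):
            # "-Wl,a,b" hands "a" and "b" to the linker as separate arguments
            linker_args.extend(tok[4:].split(","))
        i += 1

    rpaths = set()
    expect_dir = False
    for arg in linker_args:
        if expect_dir:
            rpaths.add(arg)
            expect_dir = False
        elif arg in ("-rpath", "--rpath"):
            # the directory follows as the next linker argument
            expect_dir = True
        elif arg.startswith(("-rpath=", "--rpath=")):
            rpaths.add(arg.split("=", 1)[1])

    return [d for d in lib_dirs if d not in rpaths]


if __name__ == "__main__":
    # Hypothetical link line: /opt/dep/lib is searched but never rpath-ed.
    line = ("-I/opt/ompi/include -L/opt/ompi/lib -Wl,-rpath -Wl,/opt/ompi/lib "
            "-L/opt/dep/lib -lmpi_mpifh -lmpi")
    print(missing_rpaths(line))
```

Running the same check on the output of the C and Fortran wrappers from one install would show whether the Fortran wrapper omits rpath entries that the C wrapper adds.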

Review 2.0.x

  • Wiki

  • Milestones

    • No PRs Left.
    • OSHMEM tests are no longer DOA, but are still failing 20% of tests.
      • It's better, so we might just ship 2.0.1 as is and work on fixing these in the next release.
    • Still thinking that we should get 2.0.1 out, and get a fix for this in v2.1.0 (short cycle for that).
    • Comm spawn is still broken: timeout in the OPAL_PMIX_Exchange macro. Fixed in master?
      • Bug in CID creation in PMIx? The fix got rolled into ___, so it has to be backported.
      • This got hit when the code was refactored to make all CID allreduces non-blocking.
      • Symptom is a race condition where keys are becoming the same.
      • Only an issue in dynamics, normal comm creation uses MPI traffic, not PMIx.
      • Was this broken in 2.0.0? Ralph would be surprised, because he watches for this, but isn't certain.
      • Nathan could look at the code and compare new / old to see if he can remember, or get loop_spawn to work again on Cray.
      • If this is a regression, we shouldn't release.
  • Assuming we'll ship soon, go refactor your PRs from 2.1.0

    • Will start merging 2.0.2 PRs in, and then close ompi-release, and then merge the two repos in ompi repo.
  • Timeline for 2.1.0 is very short, because we wanted a small number of fairly low- to medium-risk items that we can get done by end of October. Probably looking at a freeze at end of September. Shooting for a mid-October release.

  • Don't yet have a plan for 2.0.2

    • Going to make a new fork? What do we call that new fork? Is it 2.2 or 3.0? Depends on the backwards compatibility story.
  • coll_sync - slated for 2.0.2, classified as bugfix, but don't dump in at last second before 2.1.0

  • Mellanox needs PMIx 2.0 in 2.1.0

    • PMIx will release a 2.0 that just has shared memory data as an addition,
      • but doesn't have everything else they were targeting for 2.0.0.
      • This should come out Early September.
      • This is the piece that Mellanox and IBM are interested in.
    • Put items requested on the wiki (e.g., PMIx direct modex, OpenSHMEM, stability improvements)
    • What do people want to see for 2.1.0?
    • Finalize the list in Dallas meeting
    • Hopefully target a Sept./Oct. release, not a Supercomputing goal.

Review Master MTT testing (https://mtt.open-mpi.org/)

  • Master has a sea of red, due to OSHMEM issues.
  • mpifort failing to link on 1.10 with static as well.
  • MPI_Comm_spawn failures at Mellanox and maybe IBM. Failing on master a week ago, and now failing on v2.x.
    • Was working a week or two ago.
    • Howard or Ralph will look at when they get some cycles.
    • Sylvain might look at some PMIx commits also on v2.x and see if he can isolate.

MTT Dev status:

  • Ralph made a lot of progress there. Still need to get submission thing working.
    • One
  • Josh started moving MTT server to Amazon cloud server.
    • No progress last week. Just need to test, and work with Jeff on DNS, and schedule a day to do the move.

Website migration

  • Next steps for migration?

  • Jenkins and MTT are all that's left.

  • Got download numbers to Edgar; some interesting data he'll share (devel list?)

  • Non-profit stuff.

    • Cisco is okay with it.
    • Quarterly opportunity to apply is coming up. We fill out a proposal, and they will accept or reject.
      • We're on their agenda (end of September).
    • Should get non-profit prices for GitHub dues (possibly reduced or $0). Unfortunately the bill is coming up soon, so Jeff will ask if we can just pay for a month or two instead of a full year.
  • Contribution agreement. Now that we're on GitHub, we're getting more and more anonymous contributions.

    • Some folks (who haven't yet signed the contributor's agreement) have some IPv6 fixes, and a kind of new feature.
      • First patch is more bugfix, and restores IPv6 functionality.
      • Second patch is more of a non-local feature. Bigger, more technical discussion needed.
    • These are critical for Mellanox (in master). Need to be able to run on IPv6 only systems.
    • If it's a one-off, just remind them to check with company and make sure it's okay, then do git signoff to make sure you understand it.
    • Just put this into the contributors document. Modify this document to explain the process.
  • Other Open Source communities have a big list of things that contributors agree to when they git signoff on a commit. We could do something like that.

    • The Agreement also protects the company that contributes.
    • Changing the rules on contributors members.
    • 2nd issue is that if it's a "big" change for which we'd normally require a contributor agreement, members need to have their legal teams review the change in contributor agreements.
    • Once Jeff writes up alternate language for contributor's agreement, then all members should get them reviewed by their legal departments.
  • C89 -

    • By removing the C99 check, he's defaulting back to GNU89, which isn't even a superset of C89.
    • Gilles' approach is a bit better, but not a good idea.
    • When you have a bad GCC, can fix glibc version BLAH, these are the functions that failed to link.
    • Patches are incomplete, because glibc on system was built with GCC without C89 compiler. It's not C89, it's GNU 89.
      • inlining is different.
      • If you can't use GNU 89, can add an attribute to functions to make things compile.
    • Consensus to drop this, if submitter wants to answer questions asked on list, we'll consider it.

Open MPI Developer's Meeting

  • Date of another face to face. January or February? Think about, and discuss next week.

  • Non-Profit

    • Ralph sent email out to list, please comment either pro/con.

Status Update Rotation

  1. LANL, Houston, IBM
  2. Cisco, ORNL, UTK, NVIDIA
  3. Mellanox, Sandia, Intel

Back to 2016 WeeklyTelcon-2016
