
WeeklyTelcon_20170822

Geoffrey Paulsen edited this page Jan 9, 2018 · 1 revision

Open MPI Weekly Telcon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees

  • Josh Hursey (IBM)
  • Jeff Squyres (Cisco)
  • Geoff Paulsen (IBM)
  • Artem (Mellanox)
  • Brian Barrett
  • Ralph Castain (Intel)
  • Edgar Gabriel
  • Todd Kordenbrock
  • Mohan

Agenda

Review v2.0.x Milestones v2.0.4

  • Nothing new to report.

Review v2.x Milestones v2.1.2

  • v2.1.2
    • Howard put out an RC last week.
    • Big-endian support changes need to go in here; 4105 needs review.
  • PR4059 - NEWS
    • Some discussion about the -xrc issue. Is this a regression from v2.x, or has it always existed?
    • Artem tested:
      • Still an openib issue, only on the v2.x branch. v2.0.x and v3.0.x do NOT reproduce it.
      • Not reproducible does not mean it's not there.
  • Jeff won't put out an RC until he can talk to Howard later this week.

Review v3.0.x Milestones v3.0

  • Last week's RC did NOT go out.
  • How close is Ralph to the next PMIx update? Ready to go with v2.0.1. There is not a lot of change between what's in there now and v2.0.1.
  • Will get PMIx v2.0.1 in today, let MTT run tonight, and do RC4 tomorrow (skipping RC3).
  • What to do with Issue4126?
    • Could be fairly easy to fix. Possibly the lower-level error is not translated into MPI_ERROR_NO_KEY?
    • PR3743 was a recent change.
  • Ralph has 4 PRs for v3.0.x; a few need reviews. Brian will review them when he gets some time.

Review Master Pull Requests

  • It does appear that we have a number of PMI install failures on master.
    • Perhaps these have already been fixed, since the report is dated the 12th. Cisco / Absoft.
      • 'nm_check_prefix' - this test requires an ENV setting to run. There is a directive in Makefile.am to run it.
        • This test is used to report exported symbols that shouldn't be exported.
        • For some environments this check isn't working. Mark (IBM) will look at it again.
    • gds_dstore in PMIx - compile error. Undefined reference to opal_atomic.
  • Something is missing in the hwloc tarball: netloc.
  • libpmix failed to link against libopal.
    • Ralph will look at these.

MTT / Jenkins Testing

MTT Dev status:

  • Discussion on moving the nm_check test from 'make check' to MTT tests.
    • Want this to fail at PR creation time.
    • 'make check' - users run this after they build Open MPI on their systems, so 3rd party libraries can cause it to fail.
    • Prefer this NOT in MTT, but run for each PR. Perhaps a separate Jenkins PR test.
    • Agree that a check like this is good, but MTT is too late.
    • disable_dlopen caused it to fail.
    • Absoft added _C version of MPI symbols.
  • Do we even want users running this?
    • Open MPI users probably run 'make check'.
    • Make it a stand-alone script included in the tarball.
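
For illustration only, a stand-alone symbol check of the kind discussed above might look roughly like the sketch below. It is not the actual nm_check_prefix test: the function names and the prefix allow-list are hypothetical, and it assumes a GNU-style `nm` that prints `address type name` for each defined symbol.

```python
import subprocess

# Hypothetical allow-list; the real check's prefix list is not given in these notes.
ALLOWED_PREFIXES = ("MPI_", "PMPI_", "ompi_", "opal_", "orte_")

# nm type letters treated here as globally visible, defined symbols
# (text, data, BSS, read-only data, common, weak).
EXPORTED_TYPES = set("TDBRCWV")


def unexpected_symbols(nm_output, allowed=ALLOWED_PREFIXES):
    """Return exported symbol names that do not start with an allowed prefix."""
    bad = []
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # e.g. undefined symbols, which have no address column
        _addr, sym_type, name = fields
        if sym_type in EXPORTED_TYPES and not name.startswith(allowed):
            bad.append(name)
    return sorted(bad)


def check_library(path):
    """Run nm on a library and report unexpectedly exported symbols."""
    result = subprocess.run(
        ["nm", "-g", "--defined-only", path],
        check=True, capture_output=True, text=True,
    )
    return unexpected_symbols(result.stdout)
```

A per-PR job could call `check_library()` on each built library and fail when the returned list is non-empty. Cases like the Absoft `_C` symbols noted above would need their own allow-list entries, which this sketch does not attempt to enumerate.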

Jenkins CI

Master

  • Some things missing out of SHMEM - Issue4098
  • The users list is discussing a RoCE issue.
  • PR4121 - fixes the component linking model so that components link against the project-level library.
    • Couldn't figure out something that would work well with automake.
    • Only adding one library to each component.
    • Josh wrote a script to inject these libraries across the whole tree.

Master testing

  • Cisco added a bunch of new MTT platforms, but it's not clear why the v3.0.x tests didn't run in the last 48 hours.

Exceptional topics

  • AWS - 1 year renewal coming up.

    • Thank you Amazon.
    • Brian will take care of renewal.
  • Ompibot email forwarding setup - Thanks Ralph

  • Next face-to-face meeting

    • Jan / Feb
    • Dallas, San Jose, Portland, Albuquerque

Status Updates:

Status Update Rotation

  1. Mellanox, Sandia, Intel
  2. LANL, Houston, IBM, Fujitsu
  3. Amazon
  4. Cisco, ORNL, UTK, NVIDIA

