Marc-Andre Hermanns edited this page Jan 29, 2016 · 2 revisions

Participants

  • Jean-Baptiste Besnard
  • James Custer
  • John DelSignore
  • Marc-Andre Hermanns
  • Nathan Hjelm
  • Michael Knobloch
  • Jeff Squyres

Notes

Problem scenario: Logically concurrent messages

  • Message-matching scenario as one of the drivers for future interfaces
    • Scalasca's scalable analysis reuses the recorded communication paths
    • The algorithm requires that the same send/recv calls match during analysis as during measurement
    • With logically concurrent messages between the same source & destination this cannot be guaranteed, e.g.,
      • Two threads of the same sender process send to two threads of the same receiver process
      • New assertion messages_may_overtake
    • Solution must at least enable the detection and correction of non-deterministic delivery
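
To make the scenario concrete, here is a minimal sketch of why the matching is not reproducible. It is a plain C model, not MPI code: the function names and the thread-id encoding are assumptions made for illustration. All messages are assumed to share source process, destination process, tag, and communicator, so the pairing depends only on the order in which the messages arrive.

```c
#include <assert.h>

#define NMSG 2  /* two sender threads, two receiver threads */

/* Hypothetical model, not MPI code.
 * arrival[i] = id of the sender thread whose message arrives i-th
 * matched[r] = id of the sender thread whose message the r-th posted
 *              receive obtains */
void match_in_arrival_order(const int arrival[NMSG], int matched[NMSG])
{
    for (int r = 0; r < NMSG; ++r) {
        assert(arrival[r] >= 0 && arrival[r] < NMSG);
        matched[r] = arrival[r];
    }
}

/* Returns 1 if two runs paired every receive with the same sender thread,
 * 0 otherwise. */
int same_matching(const int a[NMSG], const int b[NMSG])
{
    for (int r = 0; r < NMSG; ++r) {
        if (a[r] != b[r])
            return 0;
    }
    return 1;
}
```

Feeding the two legal arrival orders {0, 1} and {1, 0} through match_in_arrival_order yields two different pairings, and same_matching reports the difference. That difference is what a tool would need to detect when it reuses recorded communication paths.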

Potential solutions

  • Extending MPI_Status to enable query of some sort of sequence number of a message

    • Will likely break ABI within MPI implementation (i.e., a new version of the same MPI will not be binary-compatible with an earlier version)
    • Would work for receives and non-blocking sends but not for blocking sends
      • What if the measurement tool replaces MPI_Send with a call to PMPI_Isend+Wait via a wrapper?
    • Would a general meta-data query on handles be an interesting solution here?
  • MPI_T Extension for event callbacks

    • Successor of non-standardized PERUSE interface
    • Should blend into existing MPI_T interfaces for PVAR and CVAR query & manipulation
    • Should resemble the look & feel of the PERUSE interface where possible
      • Open MPI would like to move to a standardized interface
      • PERUSE is used by a small but active community, so it cannot be deprecated without replacement
    • Open MPI could work as an initial testbed for a reference implementation
    • Nathan & Marc-Andre want to flesh out an initial interface proposal
    • Semantics need to be clear on where the callback will occur:
      • On the same MPI process but potentially a different thread?
      • Performance tools may need to be able to track these across all threads of a process
        • A callback interface usually also provides a user/tool-level void* argument
  • Piggybacking interface

    • Hard to get right
    • Has been on and off the agenda for many years
    • We need to re-check how high the costs of on-the-fly datatype creation & usage are on current hardware
      • Current hardware already supports IOVs of length 2; length 3 may not be much harder to support
      • Chicken-egg problem: IOV length is unlikely to be increased without a good use case
      • Performance penalty is probably only significant for small and mid-sized messages
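
Whichever mechanism ends up exposing a per-message identity (an extended MPI_Status, an event callback, or piggybacked data), the tool-side use could look roughly like the sketch below. This is an assumption-laden illustration, not a proposed interface. The sequence numbers are hypothetical (MPI offers no such query today), the function names are made up, and each sequence number is assumed to be unique per source/destination pair.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: recorded[r] is the sequence number of the message
 * that receive call r matched during measurement, observed[r] the one it
 * matched during the later analysis run. */

/* Detection: count receive calls whose matched message differs between
 * the two runs. 0 means the matching was reproduced. */
size_t count_mismatches(const unsigned recorded[], const unsigned observed[],
                        size_t n)
{
    size_t mismatches = 0;
    for (size_t r = 0; r < n; ++r) {
        if (recorded[r] != observed[r])
            ++mismatches;
    }
    return mismatches;
}

/* Correction: for each recorded receive r, find the analysis-run receive
 * that actually obtained the same message and store its index in
 * reorder[r]. Returns 0 on success, -1 if a recorded message never showed
 * up. Quadratic search, kept simple on purpose. */
int build_reorder(const unsigned recorded[], const unsigned observed[],
                  size_t n, size_t reorder[])
{
    assert(recorded && observed && reorder);
    for (size_t r = 0; r < n; ++r) {
        size_t j = 0;
        while (j < n && observed[j] != recorded[r])
            ++j;
        if (j == n)
            return -1;
        reorder[r] = j;
    }
    return 0;
}
```

With recorded = {0, 1, 2} and observed = {1, 0, 2}, two receives are flagged as mismatched and the reorder table swaps the first two entries, so data from the analysis run can be routed back to the receive call that originally matched it.
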