Kathryn Mohror edited this page Jun 17, 2016 · 2 revisions

Participants

  • Jean-Baptiste Besnard
  • John DelSignore
  • Marc-Andre Hermanns
  • Kathryn Mohror
  • Bob Moench
  • Anh Vo

Notes

MPIR-2 and being_debugged

  • At the Forum face-to-face meeting, we discussed how the functionality currently implemented with the being_debugged variable will be supported in MPIR-2
  • Since we changed the definition of being_debugged such that it can be defined in all MPI processes and not just the starter process, do we want to follow that model in MPIR-2?
  • The consensus during the Forum discussion was no: we don't want a callback in every process each time an attach or detach occurs
  • Could we keep this symbol-based in the MPI processes, and use an API-based mechanism in the starter process for MPIR-2?
  • What is the difference between the debug_gate variable and the being_debugged variable? Are they just variations of the same thing?
    • No, debug_gate is a synchronization mechanism so that the MPI processes don't run away (and potentially terminate) before the debugger has a chance to attach to them
    • There are other implementations of this besides debug_gate. Some use a barrier, some use exit from execv
  • Are we dictating implementation by defining these variables?
    • Yes
  • Should we change how this is managed so that implementations can choose for themselves what makes sense (a variable, a barrier, or something else)?
    • Yes
    • We should make this part of the dll, and leave details out of the MPIR spec
    • The debugger calls into the dll and indicates which processes it wants to debug; how that is achieved is left to the implementation
    • Does this mean we would need per process initialization and per process dll?
      • No, the choice of mechanism is up to the implementation
  • Need an entry point into the dll that says: these are the processes I want to debug
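
A minimal sketch of what such an entry point could look like, written in C. Everything prefixed with mpir2_ is hypothetical and invented for illustration; no such names were agreed on in this meeting. The point is only that the debugger hands over a list of processes and the implementation decides privately how to hold them (variable, barrier, or something else).

```c
#include <stddef.h>

/* Hypothetical description of one OS process the debugger wants to debug. */
typedef struct {
    const char *host_name;  /* host the process runs on */
    int pid;                /* OS process id on that host */
} mpir2_proc_t;

/* Hypothetical status codes returned by the dll. */
typedef enum {
    MPIR2_OK = 0,
    MPIR2_ERR_ARG = 1
} mpir2_status_t;

/* Stand-in for implementation-private state. A real implementation
 * might set a gate variable, release a barrier, or do something else;
 * this sketch only records how many processes were selected. */
static size_t mpir2_selected_count = 0;

/* Hypothetical dll entry point: "these are the processes I want to debug".
 * How the selected processes are held or released is not visible here. */
mpir2_status_t mpir2_select_processes(const mpir2_proc_t *procs, size_t count)
{
    if (procs == NULL && count > 0) {
        return MPIR2_ERR_ARG;  /* a non-empty selection needs a valid list */
    }
    mpir2_selected_count = count;
    return MPIR2_OK;
}
```

With an interface of this shape, the debugger never touches debug_gate or an equivalent directly, so the spec would not dictate the synchronization mechanism.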

Compatibility of MPIR and MPIR-2

  • At Forum, we decided that if the MPI implementation makes the MPIR-1 symbols available, then it needs to adhere to the MPIR-1 spec
  • If an MPI implementation wants to support only MPIR-2, then it does not make the MPIR-1 symbols available

What is the proctable going to look like in MPIR-2?

  • Currently it is a table of all MPI processes indexed by their rank in COMM_WORLD
  • In MPIR-2 we want to change this to support things we see in existence now, or expect to come in the future
    • MPI implementations where MPI processes are implemented as threads (e.g. MPC)
    • MPI endpoints
    • MPI sessions
    • dynamic processes
    • ?
  • The index being the rank in WORLD needs to go
    • What if WORLD is not defined?
    • What if different WORLDS combine?
    • What if processes come and go?
  • From the debugger point of view, all it needs is a list of OS processes because that's what it attaches to
  • However, users think of MPI processes in terms of ranks, what if they want to attach to one or a subset?
    • Many debugging cases involve a problem always occurring in rank X
  • Does the starter process know where the ranks are?
    • Yes, probably, but ranks only make sense with respect to a communicator
    • Can we always assume that WORLD will exist as we know it today in MPI? Probably not...
  • There is a useful historical assumption that a process's place in the table corresponds to its rank
    • Going to need another interface to get that information
  • We want a more general model that can describe where a process lives in some context
    • Is the context a communicator? A session? An instant in time?
    • WORLD could have multiple MPI processes per OS PID
  • How can a user make sense of what they are debugging?
  • What about a generic labeling mechanism?
    • MPI gives a list of OS processes with some label, e.g. "Ranks 0-7", e.g. "progress thread"
    • How does a user figure out what the labels mean?
    • Well, hopefully they would be intuitive
    • Does the labeling scheme limit scalability?
      • E.g. tools that group processes based on behavior, like STAT
      • If the ranks aren't ints, then how can an aggregation tool scalably represent groups of processes?
  • Thread based MPIs
    • Same PID could appear multiple times in the table
    • Want a mechanism that says "I have this thread. Is it an MPI task or not?"
  • What about a mechanism that tells the mapping?
    • Either: this is a one-to-one mapping (like MPIR-1) or it is not
    • Would that work?
  • This is hard since we are trying to design for moving targets in many cases
    • What will happen with endpoints, sessions, dynamic processes?
  • We need more input from implementors
    • Let's reach out to some folks and see if they are interested in participating
    • Don't want to design something that is incompatible with a particular implementation
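
To make the ideas above concrete, here is a rough sketch in C of a table that combines the labeling idea, the "is this thread an MPI task?" flag, and the one-to-one mapping flag. MPIR_PROCDESC is the MPIR-1 descriptor, shown for contrast. Every mpir2_ name and field is hypothetical and invented for illustration; nothing here was decided in the meeting.

```c
#include <stddef.h>
#include <string.h>

/* MPIR-1 descriptor: entry i of the table describes rank i in COMM_WORLD. */
typedef struct {
    char *host_name;
    char *executable_name;
    int pid;
} MPIR_PROCDESC;

/* Hypothetical MPIR-2 entry: the table index carries no meaning by itself. */
typedef struct {
    const char *host_name;
    int pid;            /* the same pid may appear in several entries */
    long tid;           /* thread id, or -1 if the entry is a whole process */
    const char *label;  /* free-form, e.g. "Ranks 0-7" or "progress thread" */
    int is_mpi_task;    /* nonzero if this entry is an MPI task */
} mpir2_entry_t;

/* Hypothetical table with a flag for the mapping question raised above. */
typedef struct {
    const mpir2_entry_t *entries;
    size_t count;
    int one_to_one;     /* nonzero: entry i is rank i, as in MPIR-1 */
} mpir2_table_t;

/* Count the distinct OS processes (host, pid pairs) the debugger would
 * attach to. Quadratic for brevity; a real tool would sort or hash. */
size_t mpir2_count_os_processes(const mpir2_table_t *t)
{
    size_t distinct = 0;
    for (size_t i = 0; i < t->count; i++) {
        int seen = 0;
        for (size_t j = 0; j < i; j++) {
            if (t->entries[j].pid == t->entries[i].pid &&
                strcmp(t->entries[j].host_name, t->entries[i].host_name) == 0) {
                seen = 1;
                break;
            }
        }
        if (!seen) {
            distinct++;
        }
    }
    return distinct;
}
```

Free-form string labels keep the table general, but as noted above they make it harder for aggregation tools to group processes scalably, so an integer-based identifier might still be needed alongside the label.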