Notes 2016 06 17

Participants

Jean-Baptiste Besnard
John DelSignore
Marc-Andre Hermanns
Kathryn Mohror
Bob Moench
Anh Vo

Notes

MPIR-2 and being_debugged

At the Forum face-to-face meeting, we discussed how the functionality currently implemented with the being_debugged variable will be supported in MPIR-2
Since we changed the definition of being_debugged such that it can be defined in all MPI processes and not just the starter process, do we want to follow that model in MPIR-2?
The consensus during the Forum discussion was no: we don't want a callback for every process each time an attach or detach occurs
Could we keep this symbol-based in the MPI processes, and use an API-based mechanism in the starter process for MPIR-2?
What is the difference between debug_gate variable and being_debugged variable? Are they just variations of the same thing?
- No, debug_gate is a synchronization mechanism so that the MPI processes don't run away (and potentially terminate) before the debugger has a chance to attach to them
- There are other implementations of this besides debug_gate. Some use a barrier, some use exit from execv
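The distinction between the two variables can be sketched in C. The names MPIR_being_debugged and MPIR_debug_gate come from MPIR-1; the helper wait_at_debug_gate, its max_polls bound, and the choice to hold only when a debugger is present are assumptions made for illustration, not anything the spec or a particular MPI prescribes.

```c
/* MPIR-1 symbol names. In a real job the debugger writes these
 * in the target's memory; here they are ordinary globals. */
volatile int MPIR_being_debugged = 0;
volatile int MPIR_debug_gate = 0;

/*
 * Hypothetical helper (not part of MPIR): hold this process until the
 * debugger opens the gate. max_polls bounds the wait so the sketch
 * terminates; a real implementation would wait indefinitely and would
 * sleep or yield between polls. Returns the number of polls spent waiting.
 */
int wait_at_debug_gate(int max_polls)
{
    int polls = 0;

    if (!MPIR_being_debugged)
        return 0;               /* no debugger: do not hold the process */

    while (MPIR_debug_gate == 0 && polls < max_polls)
        polls++;                /* gate still closed */

    return polls;
}
```

Here being_debugged only answers "is a tool present?", while debug_gate is the synchronization point. A barrier or a stop at exec would replace the polling loop without changing the first variable's role.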
Are we dictating implementation by defining these variables?
- Yes
Should we change how we manage this so that implementations can choose for themselves what makes sense (a variable, a barrier, or something else)?
- Yes
- We should make this part of the dll, and leave details out of the MPIR spec
- Debugger calls into dll and indicates which processes it wants to debug, and it happens somehow
- Does this mean we would need per process initialization and per process dll?
  - No, the mechanism is up to the implementation
Need an entry point into the dll that says: these are the processes I want to debug
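One possible shape for such an entry point is sketched below. Nothing here was agreed in the meeting: the names mpir2_debug_select and mpir2_is_selected, the use of OS PIDs to identify processes, and the fixed-size table are all hypothetical.

```c
#include <stddef.h>

#define MPIR2_MAX_SELECTED 64   /* arbitrary bound for the sketch */

static int selected_pids[MPIR2_MAX_SELECTED];
static size_t selected_count = 0;

/*
 * Hypothetical DLL entry point: the debugger passes the OS PIDs it
 * wants to debug. Returns 0 on success, -1 if the list is too long.
 * How the MPI implementation then holds those processes (variable,
 * barrier, ...) is left entirely to the implementation.
 */
int mpir2_debug_select(const int *pids, size_t count)
{
    if (count > MPIR2_MAX_SELECTED)
        return -1;

    for (size_t i = 0; i < count; i++)
        selected_pids[i] = pids[i];
    selected_count = count;
    return 0;
}

/* Hypothetical implementation-side query: 1 if pid was selected, else 0. */
int mpir2_is_selected(int pid)
{
    for (size_t i = 0; i < selected_count; i++)
        if (selected_pids[i] == pid)
            return 1;
    return 0;
}
```

The point of the sketch is that the debugger makes one call into the DLL rather than writing a variable in every MPI process.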

Compatibility of MPIR and MPIR-2

At Forum, we decided that if the MPI implementation makes the MPIR-1 symbols available, then it needs to adhere to the MPIR-1 spec
If an MPI implementation wants to support only MPIR-2, then the MPIR-1 symbols are not made available

What is the proctable going to look like in MPIR-2?

Currently it is a table of all MPI processes indexed by their rank in COMM_WORLD
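For reference, the MPIR-1 table can be sketched as follows. The MPIR_PROCDESC layout and the MPIR_proctable and MPIR_proctable_size names follow MPIR-1 as commonly implemented; the helper pid_of_world_rank is hypothetical and only makes the "index equals rank" assumption explicit.

```c
#include <stddef.h>

/* One MPIR-1 process table entry. */
typedef struct {
    char *host_name;        /* node the process runs on */
    char *executable_name;  /* path of the executable image */
    int   pid;              /* OS process id on that node */
} MPIR_PROCDESC;

/* Filled in by the starter process and read by the debugger. */
MPIR_PROCDESC *MPIR_proctable = NULL;
int MPIR_proctable_size = 0;

/*
 * Hypothetical helper: look up the OS PID for a COMM_WORLD rank.
 * It relies on entry i describing rank i. Returns -1 if the table
 * is missing or the rank is out of range.
 */
int pid_of_world_rank(int rank)
{
    if (MPIR_proctable == NULL || rank < 0 || rank >= MPIR_proctable_size)
        return -1;
    return MPIR_proctable[rank].pid;
}
```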
In MPIR-2 we want to change this to support things we see in existence now, or expect to come in the future
- MPI implementations where MPI processes are implemented as threads (e.g. MPC)
- MPI endpoints
- MPI sessions
- dynamic processes
- ?
The index being the rank in WORLD needs to go
- What if WORLD is not defined?
- What if different WORLDS combine?
- What if processes come and go?
From the debugger point of view, all it needs is a list of OS processes because that's what it attaches to
However, users think of MPI processes in terms of ranks. What if they want to attach to one rank or a subset?
- Many debugging cases involve a problem always occurring in rank X
Does the starter process know where the ranks are?
- Yes, probably, but ranks only make sense with respect to a communicator
- Can we always assume that WORLD will exist as we know it today in MPI? Probably not...
Historically, tools have relied on the useful assumption that a process's place in the table is its rank
- Going to need another interface to get that information
We want a more general model that can describe where a process lives in some context
- Is context a communicator? A session? An instant of time?
- WORLD could have multiple MPI processes per OS PID
How can a user make sense of what they are debugging?
What about a generic labeling mechanism?
- MPI gives a list of OS processes, each with some label, e.g. "Ranks 0-7" or "progress thread"
- How does a user figure out what the labels mean?
- Well, hopefully they would be intuitive
- Does the labeling scheme limit scalability?
  - E.g. tools that group processes based on behavior, like STAT
  - If the ranks aren't ints, then how can an aggregation tool scalably represent groups of processes?
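To make the trade-off concrete, here is a minimal sketch of a labeled table. The labeled_proc type, its fields, and find_pid_for_rank are all hypothetical. Pairing the free-text label with an optional integer rank range is just one way an aggregation tool could keep working with ints; it was not something the group decided.

```c
#include <stddef.h>

/* Hypothetical labeled entry: one OS process plus a human-readable label. */
typedef struct {
    int         pid;         /* OS process the debugger attaches to */
    const char *label;       /* e.g. "Ranks 0-7" or "progress thread" */
    int         first_rank;  /* first rank hosted here, or -1 if none */
    int         last_rank;   /* last rank hosted here, or -1 if none */
} labeled_proc;

/*
 * Find the OS PID hosting a given rank. Entries without ranks
 * (first_rank < 0) are skipped. Returns -1 if no entry matches.
 */
int find_pid_for_rank(const labeled_proc *table, size_t n, int rank)
{
    for (size_t i = 0; i < n; i++) {
        if (table[i].first_rank >= 0 &&
            rank >= table[i].first_rank && rank <= table[i].last_rank)
            return table[i].pid;
    }
    return -1;
}
```

With only the text label, a tool would have to parse strings like "Ranks 0-7" to group processes, which is the scalability concern raised above.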
Thread based MPIs
- Same PID could appear multiple times in the table
- Want a mechanism that says "I have this thread. Is it an MPI task or not?"
What about a mechanism that tells the mapping?
- Either: this is a one-to-one mapping (like MPIR-1) or it is not
- Would that work?
This is hard since we are trying to design for moving targets in many cases
- What will happen with endpoints, sessions, dynamic processes?
We need more input from implementors
- Let's reach out to some folks and see if they are interested in participating
- Don't want to design something that is incompatible with a particular implementation
