Notes 2019 10 31
Marc-Andre Hermanns edited this page Nov 4, 2019 · 4 revisions
- Marc-Andre Hermanns
- Bengisu Elis
- Chris Chambreau
- Joachim Protze
- Josh Cottingham
- Martin Schulz
- In OMPT, the address given to the tool is the return address, i.e. the address execution jumps back to
- It is usually still in a register during the first call outside of the application
- Cheap to get
- For a wrapper this is trivial to get
- From a PMPI to PnMPI
- Access to this address has recently been added to PnMPI
- On x86 architectures you can
- subtract 1 from this address
- feed this to `addr2line`
- get source line info for where the function was called
- cheapest way to obtain such information
- no stack tracing needed
- you can store just the address and resolve on demand (or only once)
- Helps with address-space randomisation
- Can the address help with stack walking?
- Not directly, as it is a pointer to the next instruction
- no stack information
- Frame address of the first frame in the runtime would be interesting for this
- Pointer to the stack frame where the application entered MPI
- Should also be easy for the MPI implementation to obtain
- Should we expose this information through a semi-opaque type like `MPI_Status`?
- Quick access to known parameters
- Allow implementation to provide internal information as well
- Just a single additional argument to the QMPI callbacks (future proof)
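
The return-address idea above can be sketched in C. This is only an illustration under stated assumptions: it relies on the GCC/Clang builtins `__builtin_return_address` and `__builtin_frame_address`, and the names `hypothetical_call_info_t`, `hypothetical_entry`, `lookup_address` and `format_addr2line_command` are made up for this example and are not part of MPI, QMPI or PnMPI. The struct also hints at what a semi-opaque type could look like: known fields are public, the rest is reserved for the implementation. For a position-independent executable, the runtime address would typically have to be translated relative to the load base before handing it to `addr2line`, which this sketch does not do.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical semi-opaque record: known fields are public, the rest is
 * reserved for the implementation. Not part of MPI, QMPI or PnMPI. */
typedef struct {
    const void *return_addr;  /* next instruction in the caller */
    const void *frame_addr;   /* stack frame of the entry function below */
    void *impl_reserved[2];   /* implementation-internal, opaque to tools */
} hypothetical_call_info_t;

/* Stand-in for the first function entered when the application calls into
 * the library. noinline keeps it a real call, so the builtins describe
 * this function and its caller. */
__attribute__((noinline))
hypothetical_call_info_t hypothetical_entry(void)
{
    hypothetical_call_info_t info = {0};
    info.return_addr = __builtin_return_address(0);
    info.frame_addr = __builtin_frame_address(0);
    return info;
}

/* The return address points at the instruction after the call.
 * Subtracting 1 gives an address inside the call instruction itself,
 * which maps to the source line of the call. */
uintptr_t lookup_address(const hypothetical_call_info_t *info)
{
    assert(info != NULL);
    return (uintptr_t)info->return_addr - 1;
}

/* Build an addr2line command line; the address can be stored cheaply and
 * resolved later, on demand. Returns the snprintf result. */
int format_addr2line_command(const hypothetical_call_info_t *info,
                             const char *binary, char *buf, size_t len)
{
    return snprintf(buf, len, "addr2line -e %s 0x%" PRIxPTR,
                    binary, lookup_address(info));
}
```

A tool would call `hypothetical_entry()` once per intercepted call, keep only the returned record, and run the formatted command (or an equivalent in-process lookup) when source information is actually needed.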
- Thread-safety to register and de-register tools at runtime
- Dynamic registration/deregistration at runtime may become problematic
- Global registry would be needed
- needs to be locked every time to look into the table
- runtime overhead not worth the additional flexibility
- In OMPT a tool should just return (in the callback) instead of trying to deregister
- For QMPI, query the table once at the beginning; the data can then live in a local variable
- atomics would not really help
- memory fences prevent hashing and slow down performance
- all threads look at same data structure (no copies possible)
- access across NUMA boundaries (incurs performance hit)
- Maybe less of a problem for MPI, as calls are less frequent?
- This would be good to verify on a broader set of platforms
- What do we want to optimize for?
- OMPT -> optimize for no tool in the chain
- Some additional penalty for adding a tool
- Do static branch prediction (assume code-path without tool to be likely)
- Would probably be favoured by implementations
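
The points above (query the table once, keep the result local, and treat the no-tool path as the likely one) can be sketched as follows. This is a minimal illustration, not an actual QMPI or OMPT interface: `tool_table_t`, `register_tool`, `snapshot_tools`, `intercepted_call` and `counting_tool` are hypothetical names, registration is assumed to happen only during start-up before any other thread exists, and `__builtin_expect` is the GCC/Clang branch hint.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TOOLS 8

/* Hypothetical tool callback: receives the name of the intercepted call. */
typedef void (*tool_callback_t)(const char *function_name);

typedef struct {
    tool_callback_t callbacks[MAX_TOOLS];
    size_t count;
} tool_table_t;

/* Global registry. Assumption: written only during start-up, so no lock
 * is taken here. */
static tool_table_t registry;

/* Register a tool at start-up. Returns 0 on success, -1 if the table is full. */
int register_tool(tool_callback_t cb)
{
    assert(cb != NULL);
    if (registry.count == MAX_TOOLS) {
        return -1;
    }
    registry.callbacks[registry.count++] = cb;
    return 0;
}

/* Query the table once; the caller keeps the copy in a local variable and
 * never touches the shared registry on the hot path. */
tool_table_t snapshot_tools(void)
{
    return registry;
}

/* Hot path: hint that the chain is usually empty. Returns the number of
 * tools that were notified. */
size_t intercepted_call(const tool_table_t *tools, const char *function_name)
{
    if (__builtin_expect(tools->count == 0, 1)) {
        return 0; /* likely path: no tool attached */
    }
    for (size_t i = 0; i < tools->count; ++i) {
        tools->callbacks[i](function_name);
    }
    return tools->count;
}

/* Example tool used for illustration: counts the calls it sees. */
int calls_seen;

void counting_tool(const char *function_name)
{
    (void)function_name;
    ++calls_seen;
}
```

Because each caller works on its own snapshot, a tool that no longer wants events can simply return early from its callback instead of deregistering, which avoids locking the shared table.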
- Static linking/loading
- Always both present or not?
- What about extensions rather than tools?
- Can you make the same static library active at the same time?
- Tool needs to handle this
- Dynamic tool may need
- Dynamic library linked at link time (loaded)
- Dynamic library opened via `dlopen`