prov/lnx: LINKx (lnx) provider #10437

Open: wants to merge 5 commits into main

amirshehataornl
Contributor

This PR introduces the LINKx (lnx) provider.

It currently supports linking only two providers, the first of which is the shm provider.
It has been tested with the SHM+CXI and SHM+RXM provider combinations.
The data structures are designed to support a future multi-rail feature, in which
multiple providers and multiple endpoints can be linked together.

Future work items, which I plan to tackle after the initial PR lands:

  1. Support for all the libfabric APIs. Currently it supports the tagged APIs, with partial support for RMA and atomics.
  2. General support for address caching in lnx.
  3. Enabling HW offload support.
  4. Linking more than two providers.
  5. Multi-rail feature.

Add the FI_PEER capability bit to the CXI provider fi_info

Signed-off-by: Amir Shehata <shehataa@ornl.gov>

On cq_open, check the FI_PEER_IMPORT flag. If it is set, set all internal
CQ operations to return -FI_ENOSYS, with the exception of the read callback.

The read callback is overloaded to act as a progress function: invoking
the read callback progresses the endpoints linked to this CQ.

Keep track of the fid_peer_cq structure passed in.
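
To make the open-time switch concrete, here is a minimal, self-contained sketch in C. It does not use the libfabric headers: `struct sketch_cq`, `struct sketch_cq_ops`, `SKETCH_PEER_IMPORT` and all helper functions are hypothetical stand-ins for the CXI CQ, its ops table and the FI_PEER_IMPORT flag, and `-ENOSYS` stands in for `-FI_ENOSYS`. When the flag is set, every operation except read is unsupported, read only progresses the linked endpoints, and the peer CQ pointer is recorded.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical stand-in for FI_PEER_IMPORT in fi_cq_attr.flags. */
#define SKETCH_PEER_IMPORT (1ULL << 0)
#define SKETCH_MAX_EPS 4

struct sketch_ep {
	int progress_calls; /* number of times this endpoint was progressed */
};

struct sketch_cq;

/* Trimmed-down, hypothetical CQ ops table. */
struct sketch_cq_ops {
	ssize_t (*read)(struct sketch_cq *cq, void *buf, size_t count);
	ssize_t (*readerr)(struct sketch_cq *cq, void *buf, uint64_t flags);
	int (*signal)(struct sketch_cq *cq);
};

struct sketch_cq {
	struct sketch_cq_ops ops;
	void *peer_cq;                         /* owner's peer CQ, if imported */
	struct sketch_ep *eps[SKETCH_MAX_EPS]; /* endpoints linked to this CQ */
	size_t num_eps;
};

/* Internal (non-imported) CQ: this toy queue is always empty. */
static ssize_t internal_read(struct sketch_cq *cq, void *buf, size_t count)
{
	(void)cq; (void)buf; (void)count;
	return -EAGAIN;
}

static ssize_t internal_readerr(struct sketch_cq *cq, void *buf, uint64_t flags)
{
	(void)cq; (void)buf; (void)flags;
	return -EAGAIN;
}

static int internal_signal(struct sketch_cq *cq)
{
	(void)cq;
	return 0;
}

/* Imported CQ: every operation except read is unsupported. */
static ssize_t nosys_readerr(struct sketch_cq *cq, void *buf, uint64_t flags)
{
	(void)cq; (void)buf; (void)flags;
	return -ENOSYS;
}

static int nosys_signal(struct sketch_cq *cq)
{
	(void)cq;
	return -ENOSYS;
}

/* Imported CQ: read only drives progress; completions go to the peer CQ. */
static ssize_t progress_read(struct sketch_cq *cq, void *buf, size_t count)
{
	(void)buf; (void)count;
	for (size_t i = 0; i < cq->num_eps; i++)
		cq->eps[i]->progress_calls++;
	return 0; /* no entries are returned through this CQ */
}

static void sketch_cq_open(struct sketch_cq *cq, uint64_t attr_flags, void *peer_cq)
{
	assert(cq != NULL);
	if (attr_flags & SKETCH_PEER_IMPORT) {
		assert(peer_cq != NULL);
		cq->peer_cq = peer_cq; /* keep track of the peer CQ passed in */
		cq->ops.read = progress_read;
		cq->ops.readerr = nosys_readerr;
		cq->ops.signal = nosys_signal;
	} else {
		cq->peer_cq = NULL;
		cq->ops.read = internal_read;
		cq->ops.readerr = internal_readerr;
		cq->ops.signal = internal_signal;
	}
}
```

Overloading read this way lets an owner that only knows how to "read" a CQ drive the imported provider's progress without any new entry point.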

If the FI_PEER_IMPORT flag is set, set the callbacks in the cxip_cq structure
to the ones that write to the peer_cq; otherwise, set them to the ones that
write to the util_cq.

A provider needs to call a different set of functions to insert
completion events into an imported CQ than into an internal CQ.

This set of callback definitions standardizes a way to assign a different
function to a CQ object, which can then be called to insert into the CQ.

For example:

	struct prov_cq {
		struct util_cq *util_cq;
		struct fid_peer_cq *peer_cq;
		ofi_peer_cq_cb cq_cb;
	};

When a provider opens a CQ it can:

	if (attr->flags & FI_PEER_IMPORT) {
		prov_cq->cq_cb.cq_comp = prov_peer_cq_comp;
	} else {
		prov_cq->cq_cb.cq_comp = prov_cq_comp;
	}
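
A runnable version of this pattern is sketched below. It assumes toy in-memory queues in place of util_cq and fid_peer_cq, and every `sketch_*`, `prov_*` and `queue_push` name is hypothetical. Once the callback is chosen at open time, inserting a completion is a single indirect call with no flag test on the hot path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for FI_PEER_IMPORT. */
#define SKETCH_PEER_IMPORT (1ULL << 0)
#define QUEUE_DEPTH 8

/* Minimal completion entry: the operation context plus flags. */
struct sketch_entry {
	void *op_context;
	uint64_t flags;
};

/* Toy queue standing in for both util_cq and peer CQ storage. */
struct sketch_queue {
	struct sketch_entry entries[QUEUE_DEPTH];
	size_t count;
};

struct sketch_cq;

/* Hypothetical analogue of ofi_peer_cq_cb: one slot per insert operation. */
struct sketch_cq_cb {
	int (*cq_comp)(struct sketch_cq *cq, void *context, uint64_t flags);
};

struct sketch_cq {
	struct sketch_queue util_cq;  /* provider's own CQ */
	struct sketch_queue *peer_cq; /* owner's CQ when imported */
	struct sketch_cq_cb cq_cb;
};

static int queue_push(struct sketch_queue *q, void *context, uint64_t flags)
{
	if (q->count == QUEUE_DEPTH)
		return -1; /* queue full */
	q->entries[q->count].op_context = context;
	q->entries[q->count].flags = flags;
	q->count++;
	return 0;
}

/* Internal CQ: write into the provider's own queue. */
static int prov_cq_comp(struct sketch_cq *cq, void *context, uint64_t flags)
{
	return queue_push(&cq->util_cq, context, flags);
}

/* Imported CQ: write into the owner's queue. A real provider would go
 * through the peer CQ's owner operations instead of a local array. */
static int prov_peer_cq_comp(struct sketch_cq *cq, void *context, uint64_t flags)
{
	return queue_push(cq->peer_cq, context, flags);
}

static void sketch_cq_open(struct sketch_cq *cq, uint64_t attr_flags,
			   struct sketch_queue *peer_cq)
{
	cq->util_cq.count = 0;
	cq->peer_cq = peer_cq;
	if (attr_flags & SKETCH_PEER_IMPORT) {
		assert(peer_cq != NULL);
		cq->cq_cb.cq_comp = prov_peer_cq_comp;
	} else {
		cq->cq_cb.cq_comp = prov_cq_comp;
	}
}
```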

Collect the peer CQ callbacks in one structure for use in CXI.

Signed-off-by: Amir Shehata <shehataa@ornl.gov>

Restructure the code to allow for posting on the owner provider's shared
receive queues.

Do not do a reverse lookup on the AV table to get the fi_addr_t; instead,
register an address matching callback with the owner. The owner can then
call the address matching callback to match an fi_addr_t against the source
address in the received message.

This is more efficient because the peer lookup becomes an O(1) operation,
AV[fi_addr_t], after which the peer's CXI address can be compared directly
with the CXI address in the received message.
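
The sketch below illustrates the address matching callback under simplifying assumptions: fi_addr_t is a plain index into a toy AV, and a CXI-like address is a hypothetical (nic, pid) pair. `sketch_addr_match_fn`, `peer_addr_match` and `owner_match` are illustrative names, not the CXI or lnx API. The owner only sees an opaque callback; the peer resolves the posted fi_addr_t with one array index and compares addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sketch_fi_addr_t;                        /* stand-in for fi_addr_t */
#define SKETCH_ADDR_UNSPEC ((sketch_fi_addr_t)-1)         /* "any source" */

/* Hypothetical CXI-like address: NIC id plus process id. */
struct sketch_addr {
	uint32_t nic;
	uint16_t pid;
};

/* Peer AV: an fi_addr_t is simply an index into this table. */
struct sketch_av {
	struct sketch_addr table[16];
	size_t count;
};

/* Callback type the peer registers with the owner's shared receive queue. */
typedef bool (*sketch_addr_match_fn)(void *ctx, sketch_fi_addr_t posted,
				     const struct sketch_addr *src);

/* Peer side: O(1) lookup AV[fi_addr_t], then compare with the message source. */
static bool peer_addr_match(void *ctx, sketch_fi_addr_t posted,
			    const struct sketch_addr *src)
{
	const struct sketch_av *av = ctx;

	if (posted == SKETCH_ADDR_UNSPEC)
		return true; /* receive was posted for any source */
	if (posted >= av->count)
		return false;
	const struct sketch_addr *peer = &av->table[posted];
	return peer->nic == src->nic && peer->pid == src->pid;
}

/* Owner side: a posted receive and the matching routine that uses the callback. */
struct sketch_posted_recv {
	sketch_fi_addr_t src_addr;
	uint64_t tag;
};

static bool owner_match(const struct sketch_posted_recv *recv, uint64_t msg_tag,
			const struct sketch_addr *msg_src,
			sketch_addr_match_fn match, void *match_ctx)
{
	return recv->tag == msg_tag && match(match_ctx, recv->src_addr, msg_src);
}
```

The owner never needs to understand the peer's address format, which is what lets the same shared receive queue serve providers with different address types.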

Signed-off-by: Amir Shehata <shehataa@ornl.gov>

Handle the case where a message from a peer arrives before the peer is
inserted into the AV.

Implement the callflow to support this scenario.
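
One plausible shape for this call flow is sketched below, assuming unexpected messages are kept with their raw source address and compared against the AV only when a receive is posted. Because the AV entry is consulted at match time, a message that arrived before its sender was inserted still matches once the peer is added. All names (`sketch_rx`, `rx_unexpected`, `av_insert`, `rx_post`) are hypothetical; the actual commit may implement the flow differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sketch_fi_addr_t; /* stand-in for fi_addr_t */
#define SKETCH_MAX_PEERS 8
#define SKETCH_MAX_UNEXP 8

/* Hypothetical CXI-like address. */
struct sketch_addr {
	uint32_t nic;
	uint16_t pid;
};

struct sketch_av {
	struct sketch_addr table[SKETCH_MAX_PEERS];
	size_t count;
};

/* Unexpected message: only the raw source address is known on arrival. */
struct sketch_unexp {
	struct sketch_addr src;
	uint64_t tag;
	bool claimed;
};

struct sketch_rx {
	struct sketch_unexp unexp[SKETCH_MAX_UNEXP];
	size_t num_unexp;
};

/* Message arrives with no posted receive, possibly from an unknown peer. */
static int rx_unexpected(struct sketch_rx *rx, struct sketch_addr src, uint64_t tag)
{
	if (rx->num_unexp == SKETCH_MAX_UNEXP)
		return -1;
	rx->unexp[rx->num_unexp++] =
		(struct sketch_unexp){ .src = src, .tag = tag, .claimed = false };
	return 0;
}

/* Peer insertion happens later and simply appends to the AV. */
static sketch_fi_addr_t av_insert(struct sketch_av *av, struct sketch_addr addr)
{
	assert(av->count < SKETCH_MAX_PEERS);
	av->table[av->count] = addr;
	return av->count++;
}

/* Posting a receive: compare AV[src] with each unexpected message's source. */
static int rx_post(struct sketch_rx *rx, const struct sketch_av *av,
		   sketch_fi_addr_t src, uint64_t tag)
{
	assert(src < av->count);
	const struct sketch_addr *peer = &av->table[src];

	for (size_t i = 0; i < rx->num_unexp; i++) {
		struct sketch_unexp *u = &rx->unexp[i];
		if (!u->claimed && u->tag == tag &&
		    u->src.nic == peer->nic && u->src.pid == peer->pid) {
			u->claimed = true;
			return (int)i; /* index of the matched unexpected message */
		}
	}
	return -1; /* no match: would be queued as a posted receive */
}
```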

Signed-off-by: Amir Shehata <shehataa@ornl.gov>

The LINKx (lnx) provider offers a framework by which multiple providers
can be linked together and presented to the application as a single provider.
This abstracts the details of the underlying traffic providers away from the
application. This iteration of the provider allows linking only two
providers: shm and one other provider, e.g. CXI or RXM. The providers
being linked together must support the peer infrastructure.

To use the lnx provider, the user needs to set:

export FI_LNX_PROV_LINKS="shm+<inter-node provider>"

For example:

export FI_LNX_PROV_LINKS="shm+cxi"
  or
export FI_LNX_PROV_LINKS="shm+tcp;ofi_rxm"
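
As an illustration of the variable's format, here is a hedged C sketch of a parser. It assumes that `+` separates the linked providers and that `;` separates a core provider from its utility provider (as in `tcp;ofi_rxm`); `parse_prov_links`, `validate_links` and `struct sketch_link` are hypothetical and are not lnx's actual parser. `validate_links` enforces the current restriction of exactly two providers with shm first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LINKS 2 /* this iteration links exactly two providers */
#define NAME_LEN 32

/* One linked provider: a core provider plus an optional utility provider. */
struct sketch_link {
	char core[NAME_LEN];
	char util[NAME_LEN]; /* empty string when no utility provider is given */
};

/* Copy [start, start + len) into dst as a NUL-terminated string. */
static int copy_token(char *dst, const char *start, size_t len)
{
	if (len == 0 || len >= NAME_LEN)
		return -1; /* empty or too-long provider name */
	memcpy(dst, start, len);
	dst[len] = '\0';
	return 0;
}

/* Parse e.g. "shm+tcp;ofi_rxm"; returns the number of links or -1 on error. */
static int parse_prov_links(const char *spec, struct sketch_link links[MAX_LINKS])
{
	assert(spec != NULL && links != NULL);
	int n = 0;
	const char *p = spec;

	for (;;) {
		if (n == MAX_LINKS)
			return -1; /* more providers than supported */
		const char *plus = strchr(p, '+');
		size_t len = plus ? (size_t)(plus - p) : strlen(p);
		const char *semi = memchr(p, ';', len); /* core;util separator */
		struct sketch_link *l = &links[n++];

		l->util[0] = '\0';
		if (semi) {
			size_t core_len = (size_t)(semi - p);
			if (copy_token(l->core, p, core_len) ||
			    copy_token(l->util, semi + 1, len - core_len - 1))
				return -1;
		} else if (copy_token(l->core, p, len)) {
			return -1;
		}
		if (!plus)
			return n;
		p = plus + 1;
	}
}

/* Current restriction: exactly two providers, the first of which is shm. */
static int validate_links(const struct sketch_link *links, int n)
{
	return (n == MAX_LINKS && strcmp(links[0].core, "shm") == 0) ? 0 : -1;
}
```

In practice the string would come from `getenv("FI_LNX_PROV_LINKS")`, which returns NULL when the variable is unset, so a caller should check for that before parsing.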

Signed-off-by: Amir Shehata <shehataa@ornl.gov>