Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: Introduce new inject calls that take buffer descriptor #10433

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

j-xiong
Copy link
Contributor

@j-xiong j-xiong commented Oct 2, 2024

Existing inject calls (for msg, tagged, rma, and atomic ops) don't provide a way to supply the memory descriptor of the data buffer. This presents difficulty when FI_HMEM is enabled. Providers either have to detect the HMEM interface the buffer is allocated with, or declare the supported inject_size be zero. Both may result in sub-optimal application performance.

Here a set of new inject calls are introduced. The difference of the new calls vs the corresponding existing calls is the addition of a new parameter "desc", like what have already existed in the send calls. With the new calls, the provider is able to know the iface of the data buffer and handle it properly.

Existing inject calls (for msg, tagged, rma, and atomic ops) don't provide
a way to supply the memory descriptor of the data buffer. This presents
difficulty when FI_HMEM is enabled. Providers either have to detect the
HMEM interface the buffer is allocated with, or declare the supported
inject_size be zero. Both may result in sub-optimal application performance.

Here a set of new inject calls are introduced. The difference of the new
calls vs the corresponding existing calls is the addition of a new
parameter "desc", like what have already existed in the send calls.
With the new calls, the provider is able to know the iface of the
data buffer and handle it properly.

Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
@shefty
Copy link
Member

shefty commented Oct 2, 2024

A purpose of the inject call is that the buffer doesn't have to be registered. If it needs to be registered, the user can go through any of the other send operations. It seems the gap here is to disallow hmem buffers from being passed into the inject call if FI_MR_HMEM is set.

@j-xiong
Copy link
Contributor Author

j-xiong commented Oct 2, 2024

The actual usage of the inject calls can deviate. There are many cases that the buffer is pre-registered anyway (e.g. MPI eager buffer), and the user just wants the "not holding the buffer and no completion" semantics of the inject calls. Although that can be achieved with regular send with FI_INJECT flag and selective completion, but it is more cumbersome that way.

@shefty
Copy link
Member

shefty commented Oct 2, 2024

The normal send call is usable, however. Here, the app has already registered the buffer. But the app wants the provider to copy the GPU memory somewhere, so that when the call returns the app can own the buffer again. I'm struggling to see the benefit of introducing a new call or to follow the application use case. Copying off GPU memory isn't exactly fast, which is what the inject call targets.

@j-xiong
Copy link
Contributor Author

j-xiong commented Oct 3, 2024

The difficulty comes from handling the inject calls over the LNX provider when the underlying providers have different inject size and FI_MR_HMEM settings. For example, the CXI provider doesn't need FI_MR_HMEM mode and can always return positive inject size; but the SHM provider requires FI_MR_HMEM mode and has to return 0 inject size when FI_HMEM is enabled. If the LNX provider takes the conservative way by setting inject size to 0 when FI_HMEM is enabled, the application can't use inject calls at all even if the traffic goes through CXI. If the LNX provider setting the inject size to the value returned by CXI, it needs to handle device buffer manually if the traffic chooses SHM -- either register + copy + inject, or register + send. The registration here can be redundant if the buffer has already been registered by the application. Application can call these new inject functions if the buffer has already been registered, saving the provider from a duplicated registration.

@shefty
Copy link
Member

shefty commented Oct 3, 2024

The problem is the current inject calls are not usable. That needs to be indicated somehow, e.g. by restricting inject to host buffers only by definition, or have separate inject sizes for host vs hmem buffers or updating FI_MR_HMEM definition somehow. That needs to be fixed, otherwise, we're simply introducing magic developer required knowledge as to why one call needs to be used over another.

In this case, new API calls are for convenience. They aren't needed and likely have minimal performance benefit over a common path.

If LNX can return an inject size for transfers that go over SHM, then SHM should be able to support the same inject size directly.

@amirshehataornl
Copy link
Contributor

When you say: "A purpose of the inject call is that the buffer doesn't have to be registered. " What line of reasoning was used to come to that determination?

@shefty
Copy link
Member

shefty commented Oct 3, 2024

Because there's no damn place to provide the memory descriptor. Seems pretty damn obvious.

@amirshehataornl
Copy link
Contributor

Don't understand the tone of the response. It does not answer my question.

The lack for the descriptor in the original APIs might be due to a historical reason? That was my question and your answer says that the reason the descriptor is not there, is because it's not there.

As far as I understand the inject call is used to inject the data in the message in order not to incur the performance penalty of a more complex protocol. With the introduction of HMEM there is now a reason to add the descriptor, because the data to be injected resides in HMEM. The way CXI got around this is to do a lookup on the address and cache the result. At least in the context of CXI the inject path actually adds some significant performance gains.

@aingerson
Copy link
Contributor

aingerson commented Oct 3, 2024

I think what grumpy Sean is saying is that the behavior of the inject (in terms of dropping the completion and returning the buffer immediately) can be replicated with the sendmsg call. So the only additional thing the call is adding is that you don't need to register the buffer, hence the original reasoning behind the call. If we add fi_inject2 with the descriptor, it conflicts with what that call is providing (lack of registration). So what we should be doing is specifying that the inject calls can only be called with host memory (or memory does not have to be registered) and then move the middlewares to using fi_sendmsg() to replicate the inject behavior that provides the optimization for device memory if desired.

@shijin-aws
Copy link
Contributor

shijin-aws commented Oct 3, 2024

I agree with Sean here, I prefer to keep fi_*inject* call without MR descriptor to indicate an MR is not needed & the send buffer is host type (when FI_MR_HMEM is required) so we have a clear logic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants