Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Proposal to base VkDeviceMemory on MTLHeap always #2309

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

aitor-lunarg
Copy link
Collaborator

Incomplete draft since it requires reworking how VkImage and VkBuffer handle the memory, but should be enough to provide an overview of the proposed changes.

@billhollings would like to know your thoughts. Relevant changes are in MVKDeviceMemory.h and MVKDeviceMemory.mm
The idea is to move all memory ownership to VkDeviceMemory. This would also simplify implementing a certain extension. Let me know if there's any issue that comes to mind. I'll also leave some comments with questions, if you could address those, it would be of great help to be able to finish with the changes once we align.

Incomplete draft since it requires reworking how VkImage and VkBuffer
handle the memory, but should be enough to provide an overview of
the proposed changes.
@@ -70,11 +68,11 @@
// Coherent memory does not require flushing by app, so we must flush now
// to support Metal textures that actually reside in non-coherent memory.
if (mvkIsAnyFlagEnabled(_vkMemPropFlags, VK_MEMORY_PROPERTY_HOST_COHERENT_BIT)) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reasoning behind this? Wouldn't we want to map MTLStorageModeShared to VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT|VK_MEMORY_PROPERTY_HOST_COHERENT_BIT and MTLStorageModeManaged to VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT? Later in the flushToDevice call we check that the storage is MTLStorageModeManaged, so wouldn't it be just better to check for that here and assume that any mappable memory is host visible (I believe validation would complain if it's not). Similar situation with map call.

Copy link
Contributor

@billhollings billhollings Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On macOS discrete GPU's, Metal does not always support MTLStorageModeShared for textures. Search the MoltenVK code base for references to MTLStorageModeManaged to see when we ignore VK_MEMORY_PROPERTY_HOST_COHERENT_BIT and force the use of MTLStorageModeManaged.

Metal has relaxed this restriction in later versions (see the MVKPhysicalDeviceMetalFeatures::sharedLinearTextures feature), so we might be able to rework the overall logic a bit, but that should already be taken into consideration wherever we force MTLStorageModeManaged.

This issue might be problematic for an OS that doesn't support sharedLinearTextures. For macOS, that means 10.15.6. Perhaps we can make that our minimum for MoltenVK after this (although I would like to be able to support the 5 year window that Apple seems to cover on Xcode). Or perhaps if sharedLinearTextures is disabled, we could just error out and report that an image can't be bound to a shared-memory device memory when an attempt is made to do that.

Copy link
Contributor

@billhollings billhollings Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To get around the limitation of private-only MTLHeaps on non-Apple GPU's, I'm wondering if we could allocate a temporary managed MTLBuffer when a map call occurs (and remove it on unmap), and do a BLIT from it to the aliased private MTLBuffer on flush?

In this case, we would have to indicate that there is no host-coherent memory options, so that the app would have to flush when it makes changes to the buffer content.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been thinking about the issue for some time now. I'll dump my ideas here so I don't forget about them and we can decide.

First of, let's understand what Vulkan requires in terms of host visible coherent memory (which is our main issue due to non Apple silicon hardware). The spec states in https://registry.khronos.org/vulkan/specs/1.3-extensions/html/vkspec.html#memory-device-properties the following:
There must be at least one memory type with both the VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT and VK_MEMORY_PROPERTY_HOST_COHERENT_BIT bits set in its propertyFlags.
and
For any memory allocated with both the VK_MEMORY_PROPERTY_HOST_COHERENT_BIT and the VK_MEMORY_PROPERTY_DEVICE_COHERENT_BIT_AMD, host or device accesses also perform automatic memory domain transfer operations, such that writes are always automatically available and visible to both host and device memory domains.

So we need to provide a host visible coherent type always and any memory writes either in host or device require no flushing to be visible on both.

The other restriction Vulkan imposes in https://registry.khronos.org/vulkan/specs/1.3-extensions/html/vkspec.html#resources-association is the following:
If buffer is a VkBuffer not created with the VK_BUFFER_CREATE_SPARSE_BINDING_BIT or VK_BUFFER_CREATE_PROTECTED_BIT bits set, or if image is a linear image that was not created with the VK_IMAGE_CREATE_PROTECTED_BIT bit set, then the memoryTypeBits member always contains at least one bit set corresponding to a VkMemoryType with a propertyFlags that has both the VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT bit and the VK_MEMORY_PROPERTY_HOST_COHERENT_BIT bit set. In other words, mappable coherent memory can always be attached to these objects.

So, any buffer and linear texture (except if created with some flags) must have at least a memory type that is host visible and coherent. We can work around this by limiting what we expose through vkGetPhysicalDeviceImageFormatProperties, more on this later.

Now, looking at Metal texel buffers (MTLBuffers anyway) have a limit according to https://developer.apple.com/metal/Metal-Feature-Set-Tables.pdf which is 256MB, and is also limited by MTLDevice.maxBufferLength.

I do believe the way forward can be allocating private-only MTLHeaps, providing a fake memory on map, and flushing it to the MTLHeap. However, there are a few caveats that need addressing due to the previously mentioned points:

  1. Host coherent memory requires that any write/read actually has the same value as the GPU counterpart memory. Why is this an issue? We cannot use map/unmap nor flush/invalidate Vulkan calls to do the flushing CPU memory to GPU memory and vice-versa. The user may map a VkDeviceMemory object, modify its contents by GPU commands, and read values from the mapped memory. I believe this case is not correctly handled in MoltenVK as of now either.
  2. If we decided to use a MTLBuffer we are limited to either 256MB or MTLDevice.maxBufferLength. What happens if the mapped memory is bigger? How do we provide a contiguous memory for the user? Unsure how MoltenVK handles this at the moment.

Potential solutions to point 1 (will use GFXR as baseline since they've already addressed this issue and aligns to what I had in mind):

Potential solutions to point 2 (when I use the limitation I'm refering to either 256MB or MTLDevice.maxBufferLength for this point, the smallest of those two):

  • Expose a separate heap with the limitation as size that will be the one the user is allowed to allocate host visible coherent memory. The big downside to this approach is the potential low memory it would have.
  • Limit linear texture sizes via vkGetPhysicalDeviceImageFormatProperties to not exceed the limitation. However, this does not limit users to allocate a bigger VkDeviceMemory than the limitation.
  • Allocate host memory and provide that to the user. We also maintain a copy of this memory in one or multiple MTLBuffers and do the flush/read to GPU using them. The big downside to this is the amount of memory we are using (3 times for the mapped region: one for the CPU, one for the buffers, one for the heap).

All of this is assuming we cannot create buffers with MTLStorageShared in non Apple silicon devices. I don't have a device with such set up, so I cannot verify this. I'm basing off MoltenVK's approach for those devices. If we had MTLStorageShared then a couple of things could be simplified, since we could just allocate buffers instead of heaps. But I believe at least point 2 would require addressing.

Any thoughts, comments or concerns are welcome!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of this is managing syncing non-Apple GPU memory. I think there will always be situations where someone opens a memory map, leaves it open, and constantly makes changes to it, so we may always have to accept that we will have incomplete solutions.

There must be at least one memory type with both the VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT and VK_MEMORY_PROPERTY_HOST_COHERENT_BIT bits set in its propertyFlags.

It might be possible to modify this within the spec with something like "If the VK_KHR_portability_subset extension is not enabled, ...", allowing us to back-out of certain standard Vulkan requirements, and then modify CTS to not fail when this happens.


MVKSmallVector<MVKBuffer*, 4> _buffers;
MVKSmallVector<MVKImageMemoryBinding*, 4> _imageMemoryBindings;
std::mutex _rezLock;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed this since my understanding is they are used to read/write to the resources that use this memory device objects. Allocated resources now will be placed in heaps and a buffer that spans the whole heap is provided to accomplish this. The only exception are imported resources, these are handled slightly different. The explanation is in code comments, but in short, textures require to be dedicated for export/import by VK_EXT_metal_objects, so we can create buffers when importing to account for host mapping due to Metal not providing anything; and buffers will just not have a heap if they are not backed by it at import.

@@ -44,14 +45,11 @@
return reportError(VK_ERROR_MEMORY_MAP_FAILED, "Memory is already mapped. Call vkUnmapMemory() first.");
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I am not against these checks, it feels like they are redundant since VVL should catch those with VUID-vkMapMemory-memory-00678 and VUID-vkMapMemory-memory-00682

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. These are holdovers from when MoltenVK was not as integrated with the SDK and validation environment. These can all go.

Copy link
Contributor

@billhollings billhollings left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for submitting this. I've been thinking about moving to MTLHeaps only for a while now.

This will also help with Metal Argument resource usage efficiencies. But I'll take care of that later once this is pulled in.

This makes MVKConfiguration::useMTLHeap obsolete, so that needs too be handled and documented. But again, I can take care of that too.


/** Resturn the Metal cache mode from the Metal resource options */
MTLCPUCacheMode mvkMTLCPUCacheMode(MTLResourceOptions options);

Copy link
Contributor

@billhollings billhollings Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are for internal use and don't need to be in mvk_datatypes.h. I suggest you move them (along with mvkMTLResourceOptions()) to a support functions section at the bottom of MVKDeviceMemory.h/mm. See other support function areas at the bottom of other files for reference.

The ancient original intention of mvk_datatypes.h was to provide apps access to convenience functions for mapping Vulkan to Metal types. TBH...I'm sure no-one ever uses it, and eventually I'd like to remove it and just use internal non-public files instead.

Also Returns is misspelled in the comment.


MVK_PUBLIC_SYMBOL MTLCPUCacheMode mvkMTLCPUCacheMode(MTLResourceOptions options) {
return static_cast<MTLCPUCacheMode>((options & MTLResourceCPUCacheModeMask) >> MTLResourceCPUCacheModeShift);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, I suggest you just add these as support functions at the bottom of MVKDeviceMemory.h/mm.

@@ -44,14 +45,11 @@
return reportError(VK_ERROR_MEMORY_MAP_FAILED, "Memory is already mapped. Call vkUnmapMemory() first.");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. These are holdovers from when MoltenVK was not as integrated with the SDK and validation environment. These can all go.

enum class DedicatedResourceType : uint8_t {
NONE = 0,
BUFFER,
IMAGE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit. MoltenVK traditional style is to avoid all-caps for names (except in some old macro definitions). So it would be more consistent to call these None, Buffer, and Image.

// Having no buffer and texture being host accessible means we allocated memory for the mapping
if (!_mtlBuffer && mvkIsAnyFlagEnabled(_vkMemPropFlags, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT)) {
free(_map);
_map = nullptr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is _map set if there is no _mtlBuffer?

And if we're going to set it to nullptr, we should move that part below, so it's also set to nullptr for the case where we have a _mtlBuffer.

}

fail_alloc:
setConfigurationResult(reportError(VK_ERROR_OUT_OF_DEVICE_MEMORY, "vkAllocateMemory(): Could not allocate VkDeviceMemory of size %llu bytes.", _size));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't use goto. Package this in a reportAllocFail() function, and then just call return reportAllocFail(); in the goto calls above.

Also, to cover someone trying to use this with ancient OS versions, can you add a check in that function for getMetalFeatures().placementHeaps, and if it's false, report a slightly different message indicating that the app must be running on macOS 10.15 or iOS/tvOS 13.0. Those are also our current minimum supported versions anyway.

@billhollings billhollings changed the title WIP: VkDeviceMemory changes proposal WIP: Proposal to base VkDeviceMemory on MTLHeap always Aug 15, 2024
if (mvkIsAnyFlagEnabled(_vkMemPropFlags, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT)) {
_mtlBuffer = [_mtlHeap newBufferWithLength:_size options:_options];
if (!_mtlBuffer) goto fail_alloc;
[_mtlBuffer makeAliasable];
Copy link
Contributor

@billhollings billhollings Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uggh. The Apple docs for makeAliasable indicate:

Important
This method is only valid for heap-allocated resources using the MTLHeapTypeAutomatic allocator.

Do we need to use placement heaps, or can we rework everything for automatic heaps? I guess the requirement of setting offsets on resources in a device memory makes it impossible not to use placement heaps.

Ah! Actually, from the Apple docs, it looks like with placement heaps, any overlapping areas are automatically aliased, so maybe we just need to avoid the use of makeAliasable.

Also, we should check the maximum size of a MTLBuffer on the device. It may not be able to span the entire MTLHeap.

Copy link
Collaborator

@cdavis5e cdavis5e left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had initially wanted to enable MTLHeap by default when I first added support, but I discovered that rendering to MTLHeap-backed textures was broken on AMD hardware. Dunno if they've fixed that yet.

I actually think that MTLHeaps shouldn't be used for dedicated allocations. IIUC part of the point of VK_KHR_dedicated_allocation is to avoid the overhead of a general memory allocation. Note that MTLHeap-backed resources have higher overhead than non-heap resources.

I also think--and this might be more controversial--that MTLHeap shouldn't be used for coherent, host-visible memory, precisely because MTLHeaps cannot be created in Shared memory on some GPUs. I was disappointed to learn that Apple didn't support Shared MTLHeaps and wouldn't support them, because I wanted to use MTLHeap for everything non-dedicated.

@K0bin
Copy link

K0bin commented Aug 30, 2024

Doesn't always using MTLHeap for VkDeviceMemory also require a rework to do explicit barriers? Metal tracks the underlying heap of each resource, so it would lead to a lot of false sharing. Apple advises against using automatic tracking with MTLHeap in a ton of WWDC videos.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants