WIP: Proposal to base VkDeviceMemory on MTLHeap always #2309

aitor-lunarg · 2024-08-15T01:07:03Z

Incomplete draft since it requires reworking how VkImage and VkBuffer handle the memory, but should be enough to provide an overview of the proposed changes.

@billhollings would like to know your thoughts. Relevant changes are in MVKDeviceMemory.h and MVKDeviceMemory.mm
The idea is to move all memory ownership to VkDeviceMemory. This would also simplify implementing a certain extension. Let me know if any issue comes to mind. I'll also leave some comments with questions; if you could address those, it would help me finish the changes once we align.

aitor-lunarg · 2024-08-15T01:13:55Z

MoltenVK/MoltenVK/GPUObjects/MVKDeviceMemory.mm

@@ -70,11 +68,11 @@
 	// Coherent memory does not require flushing by app, so we must flush now
 	// to support Metal textures that actually reside in non-coherent memory.
 	if (mvkIsAnyFlagEnabled(_vkMemPropFlags, VK_MEMORY_PROPERTY_HOST_COHERENT_BIT)) {


What's the reasoning behind this? Wouldn't we want to map MTLStorageModeShared to VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT|VK_MEMORY_PROPERTY_HOST_COHERENT_BIT and MTLStorageModeManaged to VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT? Later, in the flushToDevice call, we check that the storage is MTLStorageModeManaged, so wouldn't it be better to check for that here and assume that any mappable memory is host visible (I believe validation would complain if it's not)? The map call is in a similar situation.
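The mapping proposed in this question can be sketched as follows. This is only an illustration, not MoltenVK code: the StorageMode enum stands in for the Metal storage modes, the constants mirror the VkMemoryPropertyFlagBits values from the Vulkan headers, and the Private case is an added assumption that the question does not cover.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for MTLStorageModeShared / Managed / Private.
enum class StorageMode { Shared, Managed, Private };

// Values mirror VkMemoryPropertyFlagBits in the Vulkan headers.
constexpr uint32_t kDeviceLocal  = 0x1; // VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT
constexpr uint32_t kHostVisible  = 0x2; // VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT
constexpr uint32_t kHostCoherent = 0x4; // VK_MEMORY_PROPERTY_HOST_COHERENT_BIT

// Shared memory is visible and coherent; Managed memory is visible only.
// The Private mapping is an assumption added for completeness.
constexpr uint32_t memoryPropertyFlagsFor(StorageMode mode) {
    switch (mode) {
        case StorageMode::Shared:  return kHostVisible | kHostCoherent;
        case StorageMode::Managed: return kHostVisible;
        case StorageMode::Private: return kDeviceLocal;
    }
    return 0;
}

// Under this mapping, only Managed storage needs an explicit flush.
constexpr bool needsExplicitFlush(StorageMode mode) {
    return mode == StorageMode::Managed;
}
```

With such a mapping, the flush path would branch on the storage mode rather than on the coherent bit, which is what the question suggests.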

On macOS discrete GPUs, Metal does not always support MTLStorageModeShared for textures. Search the MoltenVK code base for references to MTLStorageModeManaged to see when we ignore VK_MEMORY_PROPERTY_HOST_COHERENT_BIT and force the use of MTLStorageModeManaged.

Metal has relaxed this restriction in later versions (see the MVKPhysicalDeviceMetalFeatures::sharedLinearTextures feature), so we might be able to rework the overall logic a bit, but that should already be taken into consideration wherever we force MTLStorageModeManaged.

This issue might be problematic for an OS that doesn't support sharedLinearTextures. For macOS, that means 10.15.6. Perhaps we can make that our minimum for MoltenVK after this (although I would like to be able to support the 5 year window that Apple seems to cover on Xcode). Or perhaps if sharedLinearTextures is disabled, we could just error out and report that an image can't be bound to a shared-memory device memory when an attempt is made to do that.

To get around the limitation of private-only MTLHeaps on non-Apple GPUs, I'm wondering if we could allocate a temporary managed MTLBuffer when a map call occurs (and remove it on unmap), and do a BLIT from it to the aliased private MTLBuffer on flush?

In this case, we would have to indicate that there are no host-coherent memory options, so that the app would have to flush when it makes changes to the buffer content.
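A host-only simulation of that idea, under stated assumptions: the StagedMemory class and its members are hypothetical, a std::vector stands in for the private heap memory, another for the temporary managed MTLBuffer, and a memcpy stands in for the BLIT. It shows why the memory could not be advertised as host-coherent: writes only reach the device copy on flush.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical simulation of "temporary staging buffer on map, blit on flush".
class StagedMemory {
public:
    explicit StagedMemory(size_t size) : _private(size, 0) {}

    // Create the temporary staging copy and hand it to the app.
    void* map(size_t offset, size_t size) {
        assert(offset + size <= _private.size());
        _mapOffset = offset;
        auto first = _private.begin() + static_cast<std::ptrdiff_t>(offset);
        _staging.assign(first, first + static_cast<std::ptrdiff_t>(size));
        return _staging.data();
    }

    // Stand-in for the BLIT from the staging buffer into private memory.
    void flush() {
        if (!_staging.empty()) {
            std::memcpy(_private.data() + _mapOffset, _staging.data(), _staging.size());
        }
    }

    // Drop the temporary staging copy.
    void unmap() {
        _staging.clear();
        _staging.shrink_to_fit();
    }

    // What the "device" currently sees at byte index i.
    uint8_t deviceByte(size_t i) const { return _private[i]; }

private:
    std::vector<uint8_t> _private;  // stands in for private MTLHeap memory
    std::vector<uint8_t> _staging;  // stands in for the temporary managed MTLBuffer
    size_t _mapOffset = 0;
};
```

A write through the mapped pointer is invisible to deviceByte() until flush() runs, which is exactly the non-coherent behavior the app would have to be told about.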

I've been thinking about the issue for some time now. I'll dump my ideas here so I don't forget about them and we can decide.

First off, let's understand what Vulkan requires in terms of host visible coherent memory (which is our main issue due to non-Apple-silicon hardware). The spec states the following in https://registry.khronos.org/vulkan/specs/1.3-extensions/html/vkspec.html#memory-device-properties:
There must be at least one memory type with both the VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT and VK_MEMORY_PROPERTY_HOST_COHERENT_BIT bits set in its propertyFlags.
and
For any memory allocated with both the VK_MEMORY_PROPERTY_HOST_COHERENT_BIT and the VK_MEMORY_PROPERTY_DEVICE_COHERENT_BIT_AMD, host or device accesses also perform automatic memory domain transfer operations, such that writes are always automatically available and visible to both host and device memory domains.

So we need to provide a host visible coherent type always and any memory writes either in host or device require no flushing to be visible on both.
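As a minimal sketch of the first requirement, the check below scans a list of per-type property flags for one that has both bits set. The function name is hypothetical, and the constants mirror the VkMemoryPropertyFlagBits values rather than including the Vulkan headers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint32_t kHostVisible  = 0x2; // VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT
constexpr uint32_t kHostCoherent = 0x4; // VK_MEMORY_PROPERTY_HOST_COHERENT_BIT

// Returns true if at least one memory type is both host visible and host coherent.
// Hypothetical helper: takes the propertyFlags of each exposed memory type.
bool hasHostVisibleCoherentType(const std::vector<uint32_t>& propertyFlagsPerType) {
    constexpr uint32_t required = kHostVisible | kHostCoherent;
    for (uint32_t flags : propertyFlagsPerType) {
        if ((flags & required) == required) return true;
    }
    return false;
}
```

A device exposing only device-local and host-visible (non-coherent) types would fail this check, which is the situation the staging-buffer idea above would create.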

The other restriction Vulkan imposes in https://registry.khronos.org/vulkan/specs/1.3-extensions/html/vkspec.html#resources-association is the following:
If buffer is a VkBuffer not created with the VK_BUFFER_CREATE_SPARSE_BINDING_BIT or VK_BUFFER_CREATE_PROTECTED_BIT bits set, or if image is a linear image that was not created with the VK_IMAGE_CREATE_PROTECTED_BIT bit set, then the memoryTypeBits member always contains at least one bit set corresponding to a VkMemoryType with a propertyFlags that has both the VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT bit and the VK_MEMORY_PROPERTY_HOST_COHERENT_BIT bit set. In other words, mappable coherent memory can always be attached to these objects.

So, any buffer and any linear texture (unless created with certain flags) must have at least one memory type that is host visible and coherent. We can work around this by limiting what we expose through vkGetPhysicalDeviceImageFormatProperties; more on this later.

Now, looking at Metal: texel buffers (MTLBuffers, anyway) have a size limit, which according to https://developer.apple.com/metal/Metal-Feature-Set-Tables.pdf is 256MB, and they are also limited by MTLDevice.maxBufferLength.

I do believe the way forward can be allocating private-only MTLHeaps, providing a fake memory on map, and flushing it to the MTLHeap. However, there are a few caveats that need addressing due to the previously mentioned points:

1. Host coherent memory requires that any read or write sees the same value as the GPU counterpart memory. Why is this an issue? We cannot rely on the map/unmap or flush/invalidate Vulkan calls to flush CPU memory to GPU memory and vice versa. The user may map a VkDeviceMemory object, modify its contents with GPU commands, and read values from the mapped memory. I believe this case is not correctly handled in MoltenVK as of now either.

2. If we decide to use an MTLBuffer, we are limited to either 256MB or MTLDevice.maxBufferLength. What happens if the mapped memory is bigger? How do we provide contiguous memory for the user? I'm unsure how MoltenVK handles this at the moment.

Potential solutions to point 1 (I will use GFXR as a baseline, since they've already addressed this issue and it aligns with what I had in mind):

https://github.com/LunarG/gfxreconstruct/blob/dev/USAGE_desktop_Vulkan.md#3-page_guard. In short, the idea is to trap the memory we provide to the user by changing the protection level and install a signal handler to catch accesses to that memory. Then we do our thing and flush/read to/from GPU in the handler.

https://github.com/LunarG/gfxreconstruct/blob/dev/USAGE_desktop_Vulkan.md#4-userfaultfd. Unsure if this one is doable in macOS, I believe it should be, but need to put more time into it to correctly understand the viability.
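The page-guard idea can be sketched with plain POSIX calls. This is not GFXR's or MoltenVK's implementation; all names are hypothetical, only a single region is tracked, and only writes are trapped (trapping reads too would start the region at PROT_NONE). The handler marks the region dirty and lifts the protection so the faulting write can be retried; a real implementation would track individual pages and sync with the GPU copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical single guarded region handed to the app as "mapped memory".
static void* gBase = nullptr;
static size_t gSize = 0;
static volatile sig_atomic_t gDirty = 0;

// Fault handler: if the fault is inside our region, record it and unprotect.
static void onFault(int sig, siginfo_t* info, void*) {
    auto addr = reinterpret_cast<uintptr_t>(info->si_addr);
    auto base = reinterpret_cast<uintptr_t>(gBase);
    if (gBase && addr >= base && addr < base + gSize) {
        mprotect(gBase, gSize, PROT_READ | PROT_WRITE);
        gDirty = 1;
        return;  // the faulting write is retried and now succeeds
    }
    signal(sig, SIG_DFL);  // not our memory: fall back to the default action
    raise(sig);
}

// Allocate a read-only region and install the handler. Returns false on failure.
bool guardInstall(size_t size) {
    size_t page = static_cast<size_t>(sysconf(_SC_PAGESIZE));
    size_t rounded = (size + page - 1) / page * page;
    void* p = mmap(nullptr, rounded, PROT_READ, MAP_PRIVATE | MAP_ANON, -1, 0);
    if (p == MAP_FAILED) return false;
    gBase = p;
    gSize = rounded;
    gDirty = 0;
    struct sigaction sa = {};
    sa.sa_sigaction = onFault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    // Some platforms report protection faults as SIGBUS, so hook both.
    return sigaction(SIGSEGV, &sa, nullptr) == 0 &&
           sigaction(SIGBUS, &sa, nullptr) == 0;
}

volatile unsigned char* guardBytes() { return static_cast<volatile unsigned char*>(gBase); }
bool guardIsDirty() { return gDirty != 0; }
```

After guardInstall(), the first write through guardBytes() triggers the handler once; subsequent writes proceed at full speed until the region is protected again, for example after the dirty contents have been copied to the GPU.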

Potential solutions to point 2 (when I say "the limitation" in this point, I'm referring to the smaller of 256MB and MTLDevice.maxBufferLength):

Expose a separate heap, sized to the limitation, as the only heap from which the user is allowed to allocate host visible coherent memory. The big downside to this approach is how little memory that heap might offer.

Limit linear texture sizes via vkGetPhysicalDeviceImageFormatProperties so they do not exceed the limitation. However, this does not prevent users from allocating a VkDeviceMemory bigger than the limitation.

Allocate host memory and provide that to the user. We also maintain a copy of this memory in one or multiple MTLBuffers and do the flush/read to GPU using them. The big downside to this is the amount of memory we are using (3 times for the mapped region: one for the CPU, one for the buffers, one for the heap).
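For the last option, the split of one mapped region across several buffers could look like the sketch below. The Chunk type and splitIntoChunks() are hypothetical, and the 256MB constant is the figure quoted above rather than a value queried from the device.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One buffer-sized piece of the mapped region (hypothetical type).
struct Chunk {
    uint64_t offset;
    uint64_t length;
};

// Split a mapped region into pieces no larger than "the limitation":
// the smaller of 256MB and the device's maximum buffer length.
std::vector<Chunk> splitIntoChunks(uint64_t mappedSize, uint64_t maxBufferLength) {
    assert(maxBufferLength > 0);
    const uint64_t limit = std::min<uint64_t>(maxBufferLength, 256ull << 20);
    std::vector<Chunk> chunks;
    for (uint64_t offset = 0; offset < mappedSize; offset += limit) {
        chunks.push_back({offset, std::min(limit, mappedSize - offset)});
    }
    return chunks;
}
```

A 600MB mapping on a device with a 1GB maximum buffer length would yield three chunks of 256MB, 256MB and 88MB; the host allocation stays contiguous while each chunk is flushed through its own buffer.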

All of this assumes we cannot create buffers with MTLStorageModeShared on non-Apple-silicon devices. I don't have a device with such a setup, so I cannot verify this. I'm basing this on MoltenVK's approach for those devices. If we had MTLStorageModeShared, then a couple of things could be simplified, since we could just allocate buffers instead of heaps. But I believe at least point 2 would still need addressing.

Any thoughts, comments or concerns are welcome!

Most of this is managing syncing non-Apple GPU memory. I think there will always be situations where someone opens a memory map, leaves it open, and constantly makes changes to it, so we may always have to accept that we will have incomplete solutions.

There must be at least one memory type with both the VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT and VK_MEMORY_PROPERTY_HOST_COHERENT_BIT bits set in its propertyFlags.

It might be possible to modify this within the spec with something like "If the VK_KHR_portability_subset extension is not enabled, ...", allowing us to back-out of certain standard Vulkan requirements, and then modify CTS to not fail when this happens.

aitor-lunarg · 2024-08-15T01:18:39Z

MoltenVK/MoltenVK/GPUObjects/MVKDeviceMemory.h

-
-	MVKSmallVector<MVKBuffer*, 4> _buffers;
-	MVKSmallVector<MVKImageMemoryBinding*, 4> _imageMemoryBindings;
-	std::mutex _rezLock;


I've removed these since my understanding is that they are used to read/write the resources that use this device memory object. Allocated resources will now be placed in heaps, and a buffer that spans the whole heap is provided to accomplish this. The only exception is imported resources, which are handled slightly differently. The explanation is in code comments, but in short: textures are required to be dedicated for export/import by VK_EXT_metal_objects, so we can create buffers when importing to account for host mapping, since Metal does not provide anything; and buffers will simply not have a heap if they are not backed by one at import.

aitor-lunarg · 2024-08-15T01:25:39Z

MoltenVK/MoltenVK/GPUObjects/MVKDeviceMemory.mm

@@ -44,14 +45,11 @@
 		return reportError(VK_ERROR_MEMORY_MAP_FAILED, "Memory is already mapped. Call vkUnmapMemory() first.");
 	}


While I am not against these checks, it feels like they are redundant since VVL should catch those with VUID-vkMapMemory-memory-00678 and VUID-vkMapMemory-memory-00682

I agree. These are holdovers from when MoltenVK was not as integrated with the SDK and validation environment. These can all go.

billhollings

Thanks for submitting this. I've been thinking about moving to MTLHeaps only for a while now.

This will also help with Metal Argument resource usage efficiencies. But I'll take care of that later once this is pulled in.

This makes MVKConfiguration::useMTLHeap obsolete, so that needs to be handled and documented. But again, I can take care of that too.

billhollings · 2024-08-15T16:51:47Z

MoltenVK/MoltenVK/API/mvk_datatypes.h

+
+/** Resturn the Metal cache mode from the Metal resource options */
+MTLCPUCacheMode mvkMTLCPUCacheMode(MTLResourceOptions options);
+


These are for internal use and don't need to be in mvk_datatypes.h. I suggest you move them (along with mvkMTLResourceOptions()) to a support functions section at the bottom of MVKDeviceMemory.h/mm. See other support function areas at the bottom of other files for reference.

The ancient original intention of mvk_datatypes.h was to provide apps access to convenience functions for mapping Vulkan to Metal types. TBH...I'm sure no-one ever uses it, and eventually I'd like to remove it and just use internal non-public files instead.

Also Returns is misspelled in the comment.

billhollings · 2024-08-15T16:55:48Z

MoltenVK/MoltenVK/Vulkan/mvk_datatypes.mm

+
+MVK_PUBLIC_SYMBOL MTLCPUCacheMode mvkMTLCPUCacheMode(MTLResourceOptions options) {
+	return static_cast<MTLCPUCacheMode>((options & MTLResourceCPUCacheModeMask) >> MTLResourceCPUCacheModeShift);
+}


As above, I suggest you just add these as support functions at the bottom of MVKDeviceMemory.h/mm.

billhollings · 2024-08-15T18:01:40Z

MoltenVK/MoltenVK/GPUObjects/MVKDeviceMemory.h

+	enum class DedicatedResourceType : uint8_t {
+		NONE = 0,
+		BUFFER,
+		IMAGE


Minor nit. MoltenVK traditional style is to avoid all-caps for names (except in some old macro definitions). So it would be more consistent to call these None, Buffer, and Image.

billhollings · 2024-08-15T18:10:44Z

MoltenVK/MoltenVK/GPUObjects/MVKDeviceMemory.mm

+		// Having no buffer and texture being host accessible means we allocated memory for the mapping
+		if (!_mtlBuffer && mvkIsAnyFlagEnabled(_vkMemPropFlags, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT)) {
+			free(_map);
+			_map = nullptr;


Where is _map set if there is no _mtlBuffer?

And if we're going to set it to nullptr, we should move that part below, so it's also set to nullptr for the case where we have a _mtlBuffer.
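The second point can be illustrated with a small standalone sketch. The MappingState struct and releaseMapping() are hypothetical stand-ins for the members in the diff; the sketch does not answer where _map is set, it only shows the pointer being cleared on both paths.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-in for the relevant MVKDeviceMemory members.
struct MappingState {
    void* map = nullptr;       // stands in for _map
    bool hasMTLBuffer = false; // stands in for (_mtlBuffer != nil)
    bool hostVisible = false;  // stands in for the HOST_VISIBLE_BIT check
};

// Free the mapping only when it was host-allocated, but always clear the pointer.
void releaseMapping(MappingState& state) {
    if (!state.hasMTLBuffer && state.hostVisible) {
        std::free(state.map);  // memory we allocated ourselves for the mapping
    }
    state.map = nullptr;       // cleared for the buffer-backed case as well
}
```

Clearing the pointer unconditionally avoids leaving a dangling address behind when the mapping came from the buffer's contents instead of a host allocation.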

billhollings · 2024-08-15T18:25:30Z

MoltenVK/MoltenVK/GPUObjects/MVKDeviceMemory.mm

 	}
+
+fail_alloc:
+	setConfigurationResult(reportError(VK_ERROR_OUT_OF_DEVICE_MEMORY, "vkAllocateMemory(): Could not allocate VkDeviceMemory of size %llu bytes.", _size));


Please don't use goto. Package this in a reportAllocFail() function, and then just call return reportAllocFail(); in the goto calls above.

Also, to cover someone trying to use this with ancient OS versions, can you add a check in that function for getMetalFeatures().placementHeaps, and if it's false, report a slightly different message indicating that the app must be running on macOS 10.15 or iOS/tvOS 13.0. Those are also our current minimum supported versions anyway.
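A rough sketch of the suggested shape, outside of MoltenVK: Result, AllocContext and the message wording are hypothetical stand-ins for VkResult, the device memory object and reportError(). Each former goto site becomes an early return through one helper, and the helper picks the message based on a placementHeaps flag.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for VkResult.
enum class Result { Success, OutOfDeviceMemory };

// Hypothetical stand-in for the state reportAllocFail() would read.
struct AllocContext {
    bool placementHeaps;      // stands in for getMetalFeatures().placementHeaps
    uint64_t size;            // requested allocation size in bytes
    std::string lastMessage;  // stands in for the reported error text
};

// Single place that formats the failure, replacing the fail_alloc label.
Result reportAllocFail(AllocContext& ctx) {
    if (!ctx.placementHeaps) {
        ctx.lastMessage = "vkAllocateMemory(): Placement heaps are unavailable. "
                          "The app must be running on macOS 10.15 or iOS/tvOS 13.0.";
    } else {
        ctx.lastMessage = "vkAllocateMemory(): Could not allocate VkDeviceMemory of size " +
                          std::to_string(ctx.size) + " bytes.";
    }
    return Result::OutOfDeviceMemory;
}

// Each allocation step returns through the helper instead of jumping to a label.
Result allocate(AllocContext& ctx, bool heapCreated, bool bufferCreated) {
    if (!heapCreated) return reportAllocFail(ctx);
    if (!bufferCreated) return reportAllocFail(ctx);
    return Result::Success;
}
```

Keeping the message selection inside the helper means the call sites stay identical regardless of why the allocation failed.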

billhollings · 2024-08-15T18:55:26Z

MoltenVK/MoltenVK/GPUObjects/MVKDeviceMemory.mm

+	if (mvkIsAnyFlagEnabled(_vkMemPropFlags, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT)) {
+		_mtlBuffer = [_mtlHeap newBufferWithLength:_size options:_options];
+		if (!_mtlBuffer) goto fail_alloc;
+		[_mtlBuffer makeAliasable];


Uggh. The Apple docs for makeAliasable indicate:

Important
This method is only valid for heap-allocated resources using the MTLHeapTypeAutomatic allocator.

Do we need to use placement heaps, or can we rework everything for automatic heaps? I guess the requirement of setting offsets on resources in a device memory makes it impossible not to use placement heaps.

Ah! Actually, from the Apple docs, it looks like with placement heaps, any overlapping areas are automatically aliased, so maybe we just need to avoid the use of makeAliasable.
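If that reading of the Apple docs holds, whether two resources alias in a placement heap reduces to whether their byte ranges overlap. A minimal sketch, with a hypothetical Placement type holding each resource's offset and size within the heap:

```cpp
#include <cassert>
#include <cstdint>

// Offset and size of a resource placed in a heap (hypothetical type).
struct Placement {
    uint64_t offset;
    uint64_t size;
};

// Two placements alias when their half-open byte ranges intersect.
constexpr bool aliases(const Placement& a, const Placement& b) {
    return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}
```

Under this model, a buffer spanning the whole heap overlaps every other resource placed in it, so it would alias all of them without any makeAliasable call.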

Also, we should check the maximum size of a MTLBuffer on the device. It may not be able to span the entire MTLHeap.

cdavis5e

I had initially wanted to enable MTLHeap by default when I first added support, but I discovered that rendering to MTLHeap-backed textures was broken on AMD hardware. Dunno if they've fixed that yet.

I actually think that MTLHeaps shouldn't be used for dedicated allocations. IIUC part of the point of VK_KHR_dedicated_allocation is to avoid the overhead of a general memory allocation. Note that MTLHeap-backed resources have higher overhead than non-heap resources.

I also think--and this might be more controversial--that MTLHeap shouldn't be used for coherent, host-visible memory, precisely because MTLHeaps cannot be created in Shared memory on some GPUs. I was disappointed to learn that Apple didn't support Shared MTLHeaps and wouldn't support them, because I wanted to use MTLHeap for everything non-dedicated.

K0bin · 2024-08-30T11:10:57Z

Doesn't always using MTLHeap for VkDeviceMemory also require a rework to do explicit barriers? Metal tracks the underlying heap of each resource, so it would lead to a lot of false sharing. Apple advises against using automatic tracking with MTLHeap in a ton of WWDC videos.

WIP: VkDeviceMemory changes proposal

f6a469d

billhollings requested a review from cdavis5e August 15, 2024 17:00

billhollings requested changes Aug 15, 2024

billhollings changed the title ~~WIP: VkDeviceMemory changes proposal~~ WIP: Proposal to base VkDeviceMemory on MTLHeap always Aug 15, 2024

billhollings requested changes Aug 15, 2024

cdavis5e reviewed Aug 30, 2024

ncesario-lunarg mentioned this pull request Sep 12, 2024

[WIP] Add VK_EXT_image_2d_view_of_3d support #2332

Open
