Introducing new memory types to CacheLib #102
-
Hi, thank you for putting together a detailed design. This seems very interesting! We have a few questions regarding some details, and we would also like to know what your plans are.

Question on details:

Question on next steps: Have you run any benchmarks with this design? How does it perform compared to standard CacheLib with local + far memory? (How does it compare to using NvmCache directly on far memory?)
-
I am trying to build a three-layer DRAM + PMem + NVMe architecture with your latest version.
It says the current implementation doesn't support multiple memory tiers. Can you point out whether there is any misunderstanding in my config?
-
What are you hoping to accomplish with the benchmarking with respect to the code changes? Do you envision these PRs eventually being merged into the main cachelib branch? This will play a role in how we review the design: design requirements for merging into the main branch would need more consideration than a design review to evaluate a prototype.

Some design-specific feedback:

File-backed memory support in the shared memory manager and the configuration changes sound good from a design point of view. I have a related question about the Item layout, though: wouldn't it be better if Items in far memory that are byte-addressable had their headers separated from the item data layout?

With 64 bits, it is no longer a "compressed" pointer, and I think the only benefit we get is pointer fix-up for cache persistence mode. It might be worth seeing whether the memory mode can be tracked at the slab level instead.

Using the wait-context sounds like a good idea. Does this ensure that an item cannot be mutated while it is in the process of a move between tiers? Cachebench has a consistency-testing mode which can also help flush out concurrency bugs; have you tried that? (@therealgymmy pointed to some instructions on how to do this in an earlier comment.)
-
Update from the Intel team:
-
We are a software team at Intel working to implement support for different memory types in CacheLib, to enable performance experiments and optimizations for use cases that could benefit from memory types beyond the currently supported DRAM and flash/NVMe.
The vast majority of contemporary servers present a homogeneous memory abstraction. As far as user-space software is concerned, all main system memory is the same. That memory is usually DRAM, with latencies in the neighborhood of ~50-100ns. But the server landscape is changing rapidly towards more heterogeneous systems in pursuit of ever-greater efficiency.
Heterogeneous computing is already commonplace, with CPUs often working in tandem with various accelerators (GPUs, TPUs, ...). Memory technology is following a similar trend: persistent memory is already broadly available on the market, and innovative cache-coherent interconnects (e.g., CXL) are not far behind. Those types of auxiliary memory, sometimes referred to as far memory, typically have higher latency than DRAM but come with other benefits, be it higher bandwidth (HBM); persistence, capacity, and affordability (PMem); or better manageability and other characteristics (CXL.mem).
Heterogeneous memory, just like heterogeneous compute, requires software to understand how to use it efficiently. Primarily, software needs to make data placement decisions on which memory to use for what purpose. Very frequently accessed locations, such as metadata, are typically best placed on the lowest latency memory. In contrast, less often accessed data should be placed in more cost-efficient tiers. Data placement decisions can usually be made effectively in the kernel, provided that user-space applications can guarantee data locality at a page level. This is conceptually similar to how caching on a block device works, but software retains the ability to immediately and directly access individual memory locations (cache lines).
With this publication, we would like to get early feedback from the CacheLib community for our proposed design. Our goal is to create a solution that is easy to use and experiment with, applicable to common usage scenarios, and backward compatible.
At a high level, we suggest allowing cached items to migrate from the fastest memory (e.g., DRAM) to a slower memory as they get warm, and then to even slower memory. The current implementation of CacheLib can already be configured to use NVMe as an additional memory layer, which keeps warm items on NVMe instead of evicting them from the cache.
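For reference, a hybrid DRAM + NVMe cache is set up roughly as follows; this is a minimal sketch based on CacheLib's documented NvmCache/Navy configuration, with placeholder paths and sizes:

```cpp
#include "cachelib/allocator/CacheAllocator.h"

using Cache = facebook::cachelib::LruAllocator;

Cache::Config config;
config.setCacheSize(1024 * 1024 * 1024); // 1 GB DRAM layer

// Back the DRAM cache with flash via the Navy engine, so warm items
// spill to NVMe instead of being evicted outright. The file path and
// 10 GB size below are placeholders.
Cache::NvmCacheConfig nvmConfig;
nvmConfig.navyConfig.setSimpleFile("/mnt/nvme/cachefile",
                                   10ULL * 1024 * 1024 * 1024);
config.enableNvmCache(nvmConfig);

Cache cache(config);
```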
We are working on generalizing this approach to other modern types of memory, for example persistent memory (PMem), so that CacheLib users can take advantage of faster and larger memory tiers.
We made the following modifications to the code:
The idea behind these PRs is to introduce support for different memory types by extending the ShmManager implementation. The original design only supports POSIX and SysV shared memory segments (which use shm_open and shmget, respectively, for memory allocation). In the PRs mentioned above, we introduce a third type of segment: FileShmSegment. It provides access to file-backed memory, which can be used to expose PMem or CXL memory to CacheLib.
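The mechanism boils down to mapping a file (e.g., on a DAX-mounted PMem filesystem) into the process address space and handing the mapping to the allocator, just as with a POSIX/SysV segment. A minimal sketch of the underlying idea, not the actual FileShmSegment code:

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#include <cstddef>
#include <stdexcept>

// Map a file-backed segment into memory. On a DAX filesystem this gives
// the process direct, byte-addressable access to PMem (or to CXL memory
// exposed the same way).
void* mapFileBackedSegment(const char* path, std::size_t size) {
  int fd = ::open(path, O_RDWR | O_CREAT, 0644);
  if (fd < 0) {
    throw std::runtime_error("failed to open segment file");
  }
  // Size the file to the requested segment size.
  if (::ftruncate(fd, static_cast<off_t>(size)) != 0) {
    ::close(fd);
    throw std::runtime_error("failed to size segment file");
  }
  void* addr = ::mmap(nullptr, size, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
  ::close(fd); // the mapping keeps the file accessible
  if (addr == MAP_FAILED) {
    throw std::runtime_error("mmap failed");
  }
  return addr;
}
```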
The configuration API was extended so that a user can set up a heterogeneous-memory cache. Specifically, we added a new configureMemoryTiers method to CacheAllocatorConfig. This method accepts a vector of MemoryTierCacheConfig structures that describe the type of memory of each tier (via a path to a memory-mapped file) and its size or size ratio. The example below shows a configuration that uses two tiers: the top tier is created using shared memory (POSIX or SysV), and the bottom tier is built on top of a file given by its path; the bottom tier is twice as large as the top one.
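A minimal sketch of such a configuration; the fromShm()/fromFile() factories and the setRatio() setter are our working names and may change (only configureMemoryTiers and MemoryTierCacheConfig are fixed above):

```cpp
LruAllocator::Config config;
config
    .setCacheSize(48UL * 1024 * 1024) // total size across all tiers
    .configureMemoryTiers({
        // Top tier: regular shared memory (POSIX or SysV), relative size 1.
        MemoryTierCacheConfig::fromShm().setRatio(1),
        // Bottom tier: file-backed memory (e.g., PMem exposed via a DAX
        // file), twice as large as the top tier.
        MemoryTierCacheConfig::fromFile("/mnt/pmem0/cachelib-tier1")
            .setRatio(2),
    });
```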
We converted MemoryAllocator and the MMContainers to arrays to support multiple memory tiers: each memory tier has its own MemoryAllocator with its own memory pools, and its own MMContainers. The AccessContainer was left unmodified: the same data structure still indexes all items. There were also no changes (except for item promotion) on the find path: the user still gets direct access to the item's memory, and no copies are made.
Memory pool IDs match across tiers: if an element from the memory pool with ID=1 is evicted from tier 0 to tier 1, it ends up in the memory pool with ID=1 in tier 1. The relative sizes of the memory pools are the same in each tier.
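Conceptually, the per-tier state looks like the sketch below (stand-in types, not the actual class layout): allocators and eviction containers become arrays indexed by tier, while a single access container spans all tiers.

```cpp
#include <memory>
#include <vector>

// Empty stand-ins for the real CacheLib types, for illustration only.
class MemoryAllocator {};
class MMContainer {};
class AccessContainer {};

struct TieredCacheState {
  // One MemoryAllocator per tier. Pools are created with matching
  // PoolIds and the same relative sizes on every tier, so an item
  // evicted from pool 1 on tier 0 lands in pool 1 on tier 1.
  std::vector<std::unique_ptr<MemoryAllocator>> allocators_;
  // Eviction (MM) containers, indexed by tier, then pool.
  std::vector<std::vector<std::unique_ptr<MMContainer>>> mmContainers_;
  // A single AccessContainer still indexes items across all tiers.
  std::unique_ptr<AccessContainer> accessContainer_;
};
```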
Since each memory tier is an independent piece of memory, we also had to modify CompressedPtr so that it records which tier a pointer belongs to. This increased the size of CompressedPtr from 32 to 64 bits (we use the topmost 32 bits to store the tier ID). We believe this approach can be improved, and it is open for discussion.
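The packing itself is straightforward; a sketch of the idea (the class name and field layout here are illustrative, and the real CompressedPtr also encodes slab and allocation indices):

```cpp
#include <cstdint>

// 64-bit compressed pointer: tier ID in the topmost 32 bits, the
// original 32-bit compressed value in the bottom.
class TieredCompressedPtr { // hypothetical name
 public:
  TieredCompressedPtr(uint32_t tierId, uint32_t compressed)
      : raw_((static_cast<uint64_t>(tierId) << 32) | compressed) {}

  uint32_t tierId() const { return static_cast<uint32_t>(raw_ >> 32); }
  uint32_t compressed() const { return static_cast<uint32_t>(raw_); }

 private:
  uint64_t raw_;
};
```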
There are several incomplete work items: 1) we have not yet implemented proper serialization support for multiple tiers in the current version; 2) some operations and methods for gathering statistics still need to be extended to work with multiple tiers (e.g., the currentTier() method).
Eviction:
If an item is a candidate for eviction (according to the eviction policy), we first try to evict it to the next memory tier, if one is configured. If no next memory tier is configured, or eviction to it fails, we try to evict the item to NVMe storage.
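In pseudocode terms, the flow looks like this; all names are hypothetical stand-ins rather than the actual CacheLib code:

```cpp
#include <cstddef>

struct Item {};
using TierId = std::size_t;

// Stand-ins for the real move/flush operations.
bool evictToNextTier(Item&, TierId) { return true; } // inter-tier move
bool evictToNvm(Item&) { return true; }              // NVMe write

bool tryEvict(Item& candidate, TierId tier, std::size_t numTiers,
              bool nvmEnabled) {
  // First try to demote the item to the next (slower) memory tier.
  if (tier + 1 < numTiers && evictToNextTier(candidate, tier + 1)) {
    return true;
  }
  // No next tier, or the move failed: fall back to NVMe if configured.
  if (nvmEnabled && evictToNvm(candidate)) {
    return true;
  }
  // Otherwise the item is simply evicted from the cache.
  return false;
}
```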
Our design was done with concurrency in mind: an Item that is being moved between memory tiers can still be accessed by concurrent threads. Consider an eviction operation racing with a find operation on the same Item: the high-level idea is to use a Wait Context and make lookup threads wait on the Item handle while the move operation is in progress.
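Conceptually this behaves like a promise/future pair; the sketch below uses std::promise/std::future as a stand-in for CacheLib's wait context, not the actual implementation:

```cpp
#include <future>

struct Item {};

// The thread performing the inter-tier move owns the promise; racing
// find() callers share the future and block only if they dereference
// the handle before the move finishes.
struct MoveContext {
  std::promise<Item*> done; // fulfilled once the item lands on the new tier
};

// Mover thread: copy the item, update the access container, then wake
// every waiting reader with the item's new address.
void completeMove(MoveContext& ctx, Item* newLocation) {
  ctx.done.set_value(newLocation);
}

// Reader thread: a find() racing with the move gets a future instead of
// a raw pointer and waits for the new location.
Item* waitForItem(std::shared_future<Item*> pending) {
  return pending.get(); // blocks until completeMove() runs
}
```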
Promotion:
The main difference between memory tiering and the hybrid cache (DRAM + NVMe storage) is that all memory tiers allow direct access, without any requirement to promote an Item to the top-most tier on access. For promotion, we plan to introduce a promotion-policy abstraction: on each find request, the policy will decide whether to promote the Item to the top tier or to return a handle to it in its current tier.
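A sketch of what such a promotion-policy hook could look like; the interface name and signature are guesses, not a committed API:

```cpp
#include <cstddef>

struct Item {};
using TierId = std::size_t;

class PromotionPolicy { // hypothetical interface
 public:
  virtual ~PromotionPolicy() = default;
  // Called on a find() hit below the top tier. Returning true promotes
  // the item to the top tier; returning false hands back a handle to
  // the item in its current tier (direct access, no copy).
  virtual bool shouldPromote(const Item& item, TierId currentTier) = 0;
};

// Example policy: promote only items found below a configured tier.
class PromoteBelowTier : public PromotionPolicy {
 public:
  explicit PromoteBelowTier(TierId threshold) : threshold_(threshold) {}
  bool shouldPromote(const Item&, TierId currentTier) override {
    return currentTier > threshold_;
  }

 private:
  TierId threshold_;
};
```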
Let us know what you think.