WIP: BatchIt (#677)
* Rename dealloc_local_object_slower to _meta

Unlike its brethren, `dealloc_local_object` and
`dealloc_local_object_slow`, the `dealloc_local_object_slower` method
does not take a pointer to free space.  Make this slightly more apparent
by renaming it and adding some commentary to both definition and call
site.

* corealloc: get meta in dealloc_local_object

Make both _fast() and _slow() arms take the meta as an argument; _meta()
already did.
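
Taken together, the three arms of local deallocation now line up as follows
(signatures abridged from the corealloc.h diff below; note that only the
`_meta` arm takes no pointer to the freed object):

    // Fast path: push the object onto the slab's free queue.
    static bool dealloc_local_object_fast(
      CapPtr<void, capptr::bounds::Alloc> p,
      const PagemapEntry& entry,
      BackendSlabMetadata* meta,
      LocalEntropy& entropy);

    // Slow path: still takes the pointer; large objects are handled here.
    void dealloc_local_object_slow(
      capptr::Alloc<void> p,
      const PagemapEntry& entry,
      BackendSlabMetadata* meta);

    // Metadata path: the object is already on the slab's free queue, so
    // only the slab's accounting (and lifecycle) remains to be updated.
    void dealloc_local_object_meta(
      const PagemapEntry& entry, BackendSlabMetadata* meta);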

* Introduce RemoteMessage structure

Plumb its use around remoteallocator and remotecache

* NFC: Plumb metadata to remotecache dealloc

* Initial steps in batched remote messages

This prepares the recipient to process a batched message.

* Initial dealloc-side batching machinery

Exercise the recipient machinery by having senders collect adjacent frees
to the same slab into a batch.

* Match free batch keying to slab freelist keying
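
A self-contained sketch of the sender side (names hypothetical; the real
code chains the objects into an intrusive ring rather than a vector):

    #include <cstdint>
    #include <vector>

    // Collect adjacent frees aimed at the same slab into one batch, keyed
    // the same way the slab's free list is keyed: by the address of the
    // slab's metadata.
    struct DeallocBatch
    {
      uintptr_t key = 0;
      std::vector<void*> objects;

      void flush()
      {
        // ... send one message carrying every object collected so far ...
        objects.clear();
      }

      void dealloc(uintptr_t slab_meta, void* p)
      {
        if (slab_meta != key) // a free for a different slab closes the batch
        {
          flush();
          key = slab_meta;
        }
        objects.push_back(p);
      }
    };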

* freelist: add append_segment

* SlabMetadata: machinery for returning multiple objects

This might involve multiple transitions in the slab lifecycle state machine
(I think at most two, at the moment).  To that end, return indicators to the
caller: whether the slow path must be taken, and how many objects of the
original set have not yet been counted as returned.
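
A toy model of that contract (the counting scheme is illustrative; the real
interface returns a small stepper that the caller consumes via `step<>()`,
as seen in the corealloc.h diff below):

    #include <cstddef>

    struct SlabMetaModel
    {
      size_t used;     // objects currently allocated out of this slab
      size_t capacity; // total objects the slab holds

      // Count n returned objects; the result is how many slow-path
      // transitions the caller still owes (at most two: "full -> has
      // space again" and "in use -> entirely free").
      int return_objects(size_t n)
      {
        int slow_steps = 0;
        if (used == capacity) // slab was parked as full: needs waking
          slow_steps++;
        used -= n;
        if (used == 0)        // slab is now empty: it can be recycled
          slow_steps++;
        return slow_steps;
      }
    };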

* corealloc: operate ring-at-a-time on remote queues

* RemoteCache associative cache of rings

* RemoteCache: N-set caching
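
A hypothetical model of the resulting structure (the hash and the eviction
policy are stand-ins; the constants correspond to the associativity and
set-bits knobs introduced below):

    #include <array>
    #include <cstddef>
    #include <cstdint>

    constexpr size_t SET_BITS = 3; // log2(number of sets)
    constexpr size_t ASSOC = 2;    // rings (ways) per set
    constexpr size_t SETS = size_t{1} << SET_BITS;

    struct Ring
    {
      uintptr_t slab = 0; // slab metadata address this ring collects for
    };

    struct RingCache
    {
      std::array<std::array<Ring, ASSOC>, SETS> sets{};

      Ring& find(uintptr_t slab_meta)
      {
        auto& set = sets[(slab_meta >> 6) & (SETS - 1)];
        for (auto& way : set)
          if (way.slab == slab_meta)
            return way;        // hit: keep appending to this ring
        Ring& victim = set[0]; // miss: evict a way (policy elided) ...
        // ... after flushing the victim's ring to its owning allocator
        victim.slab = slab_meta;
        return victim;
      }
    };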

* Initial CHERI support for free rings

* Matt's fix for slow-path codegen

* Try: remotecache: don't store allocator IDs

We can, as Matt so kindly reminds me, go get them from the pagemap.  Since we
need this value only when closing a ring, the read from over there is probably
not very onerous.  (We could also get the slab pointer from an object in the
ring, but we need that whenever inserting into the cache, so it's probably more
sensible to store that locally?)

* Make BatchIt optional

Move the ring set bits and associativity knobs to allocconfig and expose them
via CMake.  If the associativity is zero, use non-batched implementations of
the `RemoteMessage` and `RemoteDeallocCacheBatching` classes.

By default, kick BatchIt on when we have enough room in the minimum allocation
size to do it.  Exactly how much space is enough is a function of which
mitigations we have enabled and whether or not we are compiling with C++20.

This commit reverts the change to `MIN_ALLOC_SIZE` made in "Introduce
RemoteMessage structure", now that we have multiple types, and sizes, of
remote messages to choose from.
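
For example (invocations illustrative), the knobs can be set at configure
time:

    cmake -DSNMALLOC_DEALLOC_BATCH_RING_ASSOC=0 ..  # disable BatchIt
    cmake -DSNMALLOC_DEALLOC_BATCH_RING_ASSOC=4 \
          -DSNMALLOC_DEALLOC_BATCH_RING_SET_BITS=2 ..  # 4 ways x 4 sets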

* RemoteDeallocCacheBatching: store metas as address

There's no need for a full pointer here; it would just make the structure
larger on CHERI.

* NFC: plumb entropy from LocalAlloc to BatchIt

* BatchIt random eviction

In order not to thwart `mitigations(random_preserve)` too much when it is on
in combination with BatchIt, roll the dice every time we append to a batch to
decide whether to stochastically evict this batch.  Increasing the number of
batches gives the recipient allocator more opportunity to randomly stripe
batches across the two `freelist::Builder` segments associated with each
slab.
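
The shape of that dice roll, as a sketch (the rate and the name are made
up; snmalloc draws the bits from its `LocalEntropy`):

    #include <cstdint>

    // True roughly 1/16 of the time: close (evict) the batch early, so the
    // recipient sees more, smaller rings to shuffle into its two segments.
    inline bool stochastic_evict(uint64_t entropy_bits)
    {
      return (entropy_bits & 0xF) == 0;
    }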

---------

Co-authored-by: Nathaniel Wesley Filardo <nfilardo@microsoft.com>
Co-authored-by: Matthew Parkinson <mattpark@microsoft.com>
3 people authored Sep 23, 2024
1 parent 416fd39 commit fb776da
Showing 9 changed files with 755 additions and 51 deletions.
5 changes: 5 additions & 0 deletions CMakeLists.txt
@@ -65,6 +65,9 @@ endif()
set(SNMALLOC_MIN_ALLOC_SIZE "" CACHE STRING "Minimum allocation bytes (power of 2)")
set(SNMALLOC_MIN_ALLOC_STEP_SIZE "" CACHE STRING "Minimum allocation step (power of 2)")

set(SNMALLOC_DEALLOC_BATCH_RING_ASSOC "" CACHE STRING "Associativity of deallocation batch cache; 0 to disable")
set(SNMALLOC_DEALLOC_BATCH_RING_SET_BITS "" CACHE STRING "Logarithm of number of deallocation batch cache associativity sets")

if(MSVC AND SNMALLOC_STATIC_LIBRARY AND (SNMALLOC_STATIC_LIBRARY_PREFIX STREQUAL ""))
message(FATAL_ERROR "Empty static library prefix not supported on MSVC")
endif()
@@ -251,6 +254,8 @@ if (SNMALLOC_NO_REALLOCARR)
endif()
add_as_define_value(SNMALLOC_MIN_ALLOC_SIZE)
add_as_define_value(SNMALLOC_MIN_ALLOC_STEP_SIZE)
add_as_define_value(SNMALLOC_DEALLOC_BATCH_RING_ASSOC)
add_as_define_value(SNMALLOC_DEALLOC_BATCH_RING_SET_BITS)

target_compile_definitions(snmalloc INTERFACE $<$<BOOL:CONST_QUALIFIED_MALLOC_USABLE_SIZE>:MALLOC_USABLE_SIZE_QUALIFIER=const>)

8 changes: 8 additions & 0 deletions src/snmalloc/backend/backend.h
@@ -188,6 +188,14 @@ namespace snmalloc
local_state.get_object_range()->dealloc_range(arena, size);
}

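/**
 * Re-derive a pointer covering the whole object: amplify `a` through the
 * Authmap, bound the result to `objsize`, and convert it back to a
 * user-facing allocation pointer. (On non-CHERI targets these steps are
 * effectively no-ops.)
 */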
SNMALLOC_FAST_PATH static capptr::Alloc<void>
capptr_rederive_alloc(capptr::Alloc<void> a, size_t objsize)
{
return capptr_to_user_address_control(
Aal::capptr_bound<void, capptr::bounds::AllocFull>(
Authmap::amplify(a), objsize));
}

template<bool potentially_out_of_range = false>
SNMALLOC_FAST_PATH static const PagemapEntry& get_metaentry(address_t p)
{
39 changes: 39 additions & 0 deletions src/snmalloc/ds/allocconfig.h
@@ -120,6 +120,45 @@ namespace snmalloc
static constexpr size_t REMOTE_SLOTS = 1 << REMOTE_SLOT_BITS;
static constexpr size_t REMOTE_MASK = REMOTE_SLOTS - 1;

#if defined(SNMALLOC_DEALLOC_BATCH_RING_ASSOC)
static constexpr size_t DEALLOC_BATCH_RING_ASSOC =
SNMALLOC_DEALLOC_BATCH_RING_ASSOC;
#else
# if defined(__has_cpp_attribute)
# if ( \
__has_cpp_attribute(msvc::no_unique_address) && \
(__cplusplus >= 201803L || _MSVC_LANG >= 201803L)) || \
__has_cpp_attribute(no_unique_address)
// For C++20 or later, we do have [[no_unique_address]] and so can also do
// batching if we aren't turning on the backward-pointer mitigations
static constexpr size_t DEALLOC_BATCH_MIN_ALLOC_WORDS =
mitigations(freelist_backward_edge) ? 4 : 2;
# else
// For C++17, we don't have [[no_unique_address]] and so we always end up
// needing all four pointers' worth of space (because BatchedRemoteMessage has
// two freelist::Object::T<> links within, each of which will have two fields
// and will be padded to two pointers).
static constexpr size_t DEALLOC_BATCH_MIN_ALLOC_WORDS = 4;
# endif
# else
// If we don't even have the feature test macro, we're C++17 or earlier.
static constexpr size_t DEALLOC_BATCH_MIN_ALLOC_WORDS = 4;
# endif

static constexpr size_t DEALLOC_BATCH_RING_ASSOC =
(MIN_ALLOC_SIZE >= (DEALLOC_BATCH_MIN_ALLOC_WORDS * sizeof(void*))) ? 2 : 0;
#endif

#if defined(SNMALLOC_DEALLOC_BATCH_RING_SET_BITS)
static constexpr size_t DEALLOC_BATCH_RING_SET_BITS =
SNMALLOC_DEALLOC_BATCH_RING_SET_BITS;
#else
static constexpr size_t DEALLOC_BATCH_RING_SET_BITS = 3;
#endif

static constexpr size_t DEALLOC_BATCH_RINGS =
DEALLOC_BATCH_RING_ASSOC * bits::one_at_bit(DEALLOC_BATCH_RING_SET_BITS);
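// e.g. associativity 2 with 3 set bits gives 2 * (1 << 3) = 16 rings;
// an associativity of 0 disables batching entirely.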

static_assert(
INTERMEDIATE_BITS < MIN_ALLOC_STEP_BITS,
"INTERMEDIATE_BITS must be less than MIN_ALLOC_STEP_BITS");
135 changes: 102 additions & 33 deletions src/snmalloc/mem/corealloc.h
@@ -380,9 +380,15 @@ namespace snmalloc
}

/**
* Very slow path for deallocating an object locally.
* Very slow path for object deallocation.
*
* The object has already been returned to the slab, so all that is left to
* do is update its metadata and, if that pushes us into having too many
* unused slabs in this size class, return some.
*
* Also while here, check the time.
*/
SNMALLOC_SLOW_PATH void dealloc_local_object_slower(
SNMALLOC_SLOW_PATH void dealloc_local_object_meta(
const PagemapEntry& entry, BackendSlabMetadata* meta)
{
smallsizeclass_t sizeclass = entry.get_sizeclass().as_small();
@@ -427,14 +433,17 @@
* This is either waking up a slab that was not actively being used
* by this thread, or handling the final deallocation onto a slab,
* so it can be reused by other threads.
*
* Live large objects look like slabs that need attention when they become
* free; that attention is also given here.
*/
SNMALLOC_SLOW_PATH void
dealloc_local_object_slow(capptr::Alloc<void> p, const PagemapEntry& entry)
SNMALLOC_SLOW_PATH void dealloc_local_object_slow(
capptr::Alloc<void> p,
const PagemapEntry& entry,
BackendSlabMetadata* meta)
{
// TODO: Handle message queue on this path?

auto* meta = entry.get_slab_metadata();

if (meta->is_large())
{
// Handle large deallocation here.
@@ -460,7 +469,8 @@
return;
}

dealloc_local_object_slower(entry, meta);
// Not a large object; update slab metadata
dealloc_local_object_meta(entry, meta);
}

/**
@@ -503,13 +513,11 @@
SNMALLOC_FAST_PATH_LAMBDA {
return capptr_domesticate<Config>(local_state, p);
};
auto cb = [this,
&need_post](freelist::HeadPtr msg) SNMALLOC_FAST_PATH_LAMBDA {
auto cb = [this, domesticate, &need_post](
capptr::Alloc<RemoteMessage> msg) SNMALLOC_FAST_PATH_LAMBDA {
auto& entry =
Config::Backend::template get_metaentry(snmalloc::address_cast(msg));

handle_dealloc_remote(entry, msg.as_void(), need_post);

handle_dealloc_remote(entry, msg, need_post, domesticate);
return true;
};

@@ -548,32 +556,56 @@
*
* need_post will be set to true, if capacity is exceeded.
*/
template<typename Domesticator_queue>
void handle_dealloc_remote(
const PagemapEntry& entry,
CapPtr<void, capptr::bounds::Alloc> p,
bool& need_post)
capptr::Alloc<RemoteMessage> msg,
bool& need_post,
Domesticator_queue domesticate)
{
// TODO this needs to not double count stats
// TODO this needs to not double revoke if using MTE
// TODO thread capabilities?

if (SNMALLOC_LIKELY(entry.get_remote() == public_state()))
{
dealloc_local_object(p, entry);
auto meta = entry.get_slab_metadata();

auto unreturned =
dealloc_local_objects_fast(msg, entry, meta, entropy, domesticate);

/*
* dealloc_local_objects_fast has updated the free list but not updated
* the slab metadata; it falls to us to do so. It is UNLIKELY that we
* will need to take further steps, but we might.
*/
if (SNMALLOC_UNLIKELY(unreturned.template step<true>()))
{
dealloc_local_object_slow(msg.as_void(), entry, meta);

while (SNMALLOC_UNLIKELY(unreturned.template step<false>()))
{
dealloc_local_object_meta(entry, meta);
}
}

return;
}
else

auto nelem = RemoteMessage::template ring_size<Config>(
msg,
freelist::Object::key_root,
entry.get_slab_metadata()->as_key_tweak(),
domesticate);
if (
!need_post &&
!attached_cache->remote_dealloc_cache.reserve_space(entry, nelem))
{
if (
!need_post &&
!attached_cache->remote_dealloc_cache.reserve_space(entry))
{
need_post = true;
}
attached_cache->remote_dealloc_cache
.template dealloc<sizeof(CoreAllocator)>(
entry.get_remote()->trunc_id(), p.as_void());
need_post = true;
}
attached_cache->remote_dealloc_cache
.template forward<sizeof(CoreAllocator)>(
entry.get_remote()->trunc_id(), msg);
}

/**
@@ -698,10 +730,12 @@
CapPtr<void, capptr::bounds::Alloc> p,
const typename Config::PagemapEntry& entry)
{
if (SNMALLOC_LIKELY(dealloc_local_object_fast(entry, p, entropy)))
auto meta = entry.get_slab_metadata();

if (SNMALLOC_LIKELY(dealloc_local_object_fast(p, entry, meta, entropy)))
return;

dealloc_local_object_slow(p, entry);
dealloc_local_object_slow(p, entry, meta);
}

SNMALLOC_FAST_PATH void
@@ -714,12 +748,11 @@
}

SNMALLOC_FAST_PATH static bool dealloc_local_object_fast(
const PagemapEntry& entry,
CapPtr<void, capptr::bounds::Alloc> p,
const PagemapEntry& entry,
BackendSlabMetadata* meta,
LocalEntropy& entropy)
{
auto meta = entry.get_slab_metadata();

SNMALLOC_ASSERT(!meta->is_unused());

snmalloc_check_client(
@@ -736,6 +769,42 @@
return SNMALLOC_LIKELY(!meta->return_object());
}

template<typename Domesticator>
SNMALLOC_FAST_PATH static auto dealloc_local_objects_fast(
capptr::Alloc<RemoteMessage> msg,
const PagemapEntry& entry,
BackendSlabMetadata* meta,
LocalEntropy& entropy,
Domesticator domesticate)
{
SNMALLOC_ASSERT(!meta->is_unused());

snmalloc_check_client(
mitigations(sanity_checks),
is_start_of_object(entry.get_sizeclass(), address_cast(msg)),
"Not deallocating start of an object");

size_t objsize = sizeclass_full_to_size(entry.get_sizeclass());

auto [curr, length] = RemoteMessage::template open_free_ring<Config>(
msg,
objsize,
freelist::Object::key_root,
meta->as_key_tweak(),
domesticate);

// Update the head and the next pointer in the free list.
meta->free_queue.append_segment(
curr,
msg.template as_reinterpret<freelist::Object::T<>>(),
length,
freelist::Object::key_root,
meta->as_key_tweak(),
entropy);

return meta->return_objects(length);
}

template<ZeroMem zero_mem>
SNMALLOC_SLOW_PATH capptr::Alloc<void>
small_alloc(smallsizeclass_t sizeclass, freelist::Iter<>& fast_free_list)
@@ -871,11 +940,11 @@

if (destroy_queue)
{
auto cb = [this](capptr::Alloc<void> p) {
auto cb = [this, domesticate](capptr::Alloc<RemoteMessage> m) {
bool need_post = true; // Always going to post, so ignore.
const PagemapEntry& entry =
Config::Backend::get_metaentry(snmalloc::address_cast(p));
handle_dealloc_remote(entry, p.as_void(), need_post);
Config::Backend::get_metaentry(snmalloc::address_cast(m));
handle_dealloc_remote(entry, m, need_post, domesticate);
};

message_queue().destroy_and_iterate(domesticate, cb);
32 changes: 32 additions & 0 deletions src/snmalloc/mem/freelist.h
@@ -40,6 +40,8 @@

namespace snmalloc
{
class BatchedRemoteMessage;

static constexpr address_t NO_KEY_TWEAK = 0;

/**
@@ -139,6 +141,8 @@ namespace snmalloc

friend class Object;

friend class ::snmalloc::BatchedRemoteMessage;

class Empty
{
public:
@@ -916,6 +920,34 @@
return {first, last};
}

/**
* Put back an extracted segment from a builder using the same key.
*
* The caller must tell us how many elements are involved.
*/
void append_segment(
Object::BHeadPtr<BView, BQueue> first,
Object::BHeadPtr<BView, BQueue> last,
uint16_t size,
const FreeListKey& key,
address_t key_tweak,
LocalEntropy& entropy)
{
uint32_t index;
if constexpr (RANDOM)
index = entropy.next_bit();
else
index = 0;

if constexpr (TRACK_LENGTH)
length[index] += size;
else
UNUSED(size);

Object::store_next(cast_end(index), first, key, key_tweak);
set_end(index, &(last->next_object));
}

template<typename Domesticator>
SNMALLOC_FAST_PATH void validate(
const FreeListKey& key, address_t key_tweak, Domesticator domesticate)
14 changes: 10 additions & 4 deletions src/snmalloc/mem/localalloc.h
@@ -286,7 +286,7 @@ namespace snmalloc
address_cast(entry.get_slab_metadata()));
#endif
local_cache.remote_dealloc_cache.template dealloc<sizeof(CoreAlloc)>(
entry.get_remote()->trunc_id(), p);
entry.get_slab_metadata(), p, &local_cache.entropy);
post_remote_cache();
return;
}
@@ -658,6 +658,12 @@ namespace snmalloc
return;
}

dealloc_remote(entry, p_tame);
}

SNMALLOC_SLOW_PATH void
dealloc_remote(const PagemapEntry& entry, capptr::Alloc<void> p_tame)
{
RemoteAllocator* remote = entry.get_remote();
if (SNMALLOC_LIKELY(remote != nullptr))
{
@@ -673,12 +679,12 @@
if (local_cache.remote_dealloc_cache.reserve_space(entry))
{
local_cache.remote_dealloc_cache.template dealloc<sizeof(CoreAlloc)>(
remote->trunc_id(), p_tame);
entry.get_slab_metadata(), p_tame, &local_cache.entropy);
# ifdef SNMALLOC_TRACING
message<1024>(
"Remote dealloc fast {} ({}, {})",
p_raw,
alloc_size(p_raw),
address_cast(p_tame),
alloc_size(p_tame.unsafe_ptr()),
address_cast(entry.get_slab_metadata()));
# endif
return;