Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(allocator): add Vec::into_boxed_slice #6195

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

DonIsaac
Copy link
Collaborator

@DonIsaac DonIsaac commented Sep 30, 2024

Note that this PR does not implement the inverse operation (Box::to_vec for [T]).

Copy link

graphite-app bot commented Sep 30, 2024

Your org has enabled the Graphite merge queue for merging into main

Add the label “0-merge” to the PR and Graphite will automatically add it to the merge queue when it’s ready to merge. Or use the label “hotfix” to add to the merge queue as a hot fix.

You must have a Graphite account and log in to Graphite in order to use the merge queue. Sign up using this link.

Copy link
Collaborator Author

DonIsaac commented Sep 30, 2024

Copy link

codspeed-hq bot commented Sep 30, 2024

CodSpeed Performance Report

Merging #6195 will not alter performance

Comparing don/09-30-feat_allocator_add_vec_into_boxed_slice_ (e14701c) with main (5db9b30)

Summary

✅ 29 untouched benchmarks

@overlookmotel overlookmotel changed the base branch from don/09-30-perf_allocator_use_lower_bound_of_size_hint_when_creating_vecs_from_an_iterator to graphite-base/6195 October 1, 2024 00:07
@overlookmotel overlookmotel force-pushed the don/09-30-feat_allocator_add_vec_into_boxed_slice_ branch from 1156adc to 8324e85 Compare October 1, 2024 00:11
@overlookmotel overlookmotel changed the base branch from graphite-base/6195 to main October 1, 2024 00:12
@overlookmotel overlookmotel force-pushed the don/09-30-feat_allocator_add_vec_into_boxed_slice_ branch from 8324e85 to e14701c Compare October 1, 2024 00:12
Comment on lines +124 to +130
pub fn into_boxed_slice(self) -> Box<'alloc, [T]> {
let b = self.0.into_boxed_slice();
let (ptr, _) = allocator_api2::boxed::Box::into_non_null(b);
// SAFETY: this is effectively a transmute from allocator_api2's box to
// our own representation. Ptr is non-null and points to initialized, well-aligned memory.
unsafe { Box::from_non_null(ptr) }
}
Copy link
Collaborator

@overlookmotel overlookmotel Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. I'm not sure about this.

I'm not that familiar with allocator_api2's implementation of into_boxed_slice, but it sounds like it can reallocate (copy all the data), which we definitely don't want.

That makes sense for allocator_api2 as it's generic over any kind of allocator, but in our case we're using a bump allocator which cannot recover excess capacity, so it's pointless to ever copy the data.

I don't know if bumpalo::Bump would automatically ignore the request to reallocate or not. Would you like to look into that?

Alternatively, we could get the pointer and length of the Vec and construct a NonNull<[T]> manually. That'd definitely avoid memory copies. (Note: would need to wrap the Vec in a ManuallyDrop first, as allocator_api2's Vec is Drop)

This all requires quite a bit of care. I'm not completely up to speed on the safety constraints. Sometimes there are non-obvious subtleties to avoiding UB, and I usually end up having to spend a few hours boning up on arcane details before I'm sure (well, at least reasonably sure) that I've not committed some UB crime!

Do you have a concrete use case for wanting a boxed slice to data in the arena?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, we're planning to replace this Vec implementation with a hand-written one with length and capacity fields as u32s, to bring size of Vec down from 32 bytes to 24 bytes (oxc-project/backlog#18). Dunqing was going to look at doing that fairly soon.

So at that point, we wouldn't be able to use allocator_api2's into_boxed_slice anyway.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/zakarumych/allocator-api2/blob/f937b9650e10137f782698bd4eb4f8b36f1817a7/src/stable/vec/mod.rs#L1046

I don't think it creates a new allocation. I want this because it cuts the size of a dynamically sized vec in half (32 -> 16 bytes) at the cost of not being re-sizeable. I'm working on a prototype where this is useful.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docs for into_boxed_slice specifically say "If the vector has excess capacity, its items will be moved into a newly-allocated buffer with exactly the right capacity." i.e. memory copy.

into_boxed_slice calls shrink_to_fit, and shrink_to_fit can cause a memory copy. Or maybe it doesn't if Bumpalo chooses not to, but to determine that conclusively would require tracing the calls all the way down to Bump::shrink.

At the very least it invokes quite a lot of complex machinery, which may not get elided.

Ideally we'd do this instead:

pub fn into_boxed_slice(self) -> Box<'alloc, [T]> {
    let (ptr, len, _) = self.0.into_raw_parts();
    let ptr = unsafe { NonNull::new_unchecked(ptr) };
    let ptr = NonNull::slice_from_raw_parts(ptr, len);
    unsafe { Box::from_non_null(ptr) }
}

But I don't think we can, because Vec::into_raw_parts does not guarantee the pointer it returns is non-null. In practice it is, because RawVec contains a NonNull, but that's an internal implementation detail of Vec, and no guarantee it won't change.

Copy link
Collaborator

@overlookmotel overlookmotel Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this works:

pub fn into_boxed_slice(self) -> Box<'alloc, [T]> {
    let slice = self.0.leak();
    let ptr = NonNull::from(slice);
    // SAFETY: `ptr` points to a valid slice `[T]`.
    // Lifetime of returned `Box<'alloc, [T]>` is same as lifetime of consumed `Vec<'alloc, T>`,
    // so data in the `Box` must be valid for its lifetime.
    unsafe { Box::from_non_null(ptr) }
}

We can use Vec::leak because in our use case, the types stored in arena are all non-Drop.

Then you can remove the comment about "If the vector has excess capacity, its items will be moved into a newly-allocated buffer with exactly the right capacity."

Comment on lines +82 to +84
/// # SAFETY
/// It is only safe to create a `Box` this way with pointers taken out of
/// another `Box` in the same arena.
Copy link
Collaborator

@overlookmotel overlookmotel Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment should be updated. It's not true that the pointer has to be from another Box. If that was the case, then Vec::into_boxed_slice which you're adding here would violate the contract, as it's not a Box!

The requirement is that pointer needs to point to data in same arena (could be from a Vec, a String, anything else).

Actually, not even that is quite right. Intended usage is that it be data from the same arena, but the safety constraint I think is a bit looser - it can point to any data as long as that data outlives 'alloc.

There's also a safety requirement about no aliasing.

And the NonNull<T> must have been created from a mutable pointer *mut T, not a *const T, or that would be UB too.

This may all seem like nitpicking, but in my view it's vitally important that documentation for unsafe methods is accurate about what the contract is, otherwise it's impossible to verify the method is being used correctly.

@DonIsaac
Copy link
Collaborator Author

DonIsaac commented Oct 1, 2024

@overlookmotel does it make sense to just abandon this PR?

@overlookmotel
Copy link
Collaborator

No, I think we can do it. I just think better to use this version:

pub fn into_boxed_slice(self) -> Box<'alloc, [T]> {
    let slice = self.0.leak();
    let ptr = NonNull::from(slice);
    // SAFETY: `ptr` points to a valid slice `[T]`.
    // Lifetime of returned `Box<'alloc, [T]>` is same as lifetime of consumed `Vec<'alloc, T>`,
    // so data in the `Box` will be valid for its lifetime.
    unsafe { Box::from_non_null(ptr) }
}

It's safe, and won't copy memory.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

allocator: Vec::into_boxed_slice does not return our Box
2 participants