-
-
Notifications
You must be signed in to change notification settings - Fork 400
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(allocator): add Vec::into_boxed_slice
#6195
base: main
Are you sure you want to change the base?
Conversation
Your org has enabled the Graphite merge queue for merging into mainAdd the label “0-merge” to the PR and Graphite will automatically add it to the merge queue when it’s ready to merge. Or use the label “hotfix” to add to the merge queue as a hot fix. You must have a Graphite account and log in to Graphite in order to use the merge queue. Sign up using this link. |
This stack of pull requests is managed by Graphite. Learn more about stacking. |
CodSpeed Performance ReportMerging #6195 will not alter performanceComparing Summary
|
0af64c6
to
5db9b30
Compare
1156adc
to
8324e85
Compare
8324e85
to
e14701c
Compare
pub fn into_boxed_slice(self) -> Box<'alloc, [T]> { | ||
let b = self.0.into_boxed_slice(); | ||
let (ptr, _) = allocator_api2::boxed::Box::into_non_null(b); | ||
// SAFETY: this is effectively a transmute from allocator_api2's box to | ||
// our own representation. Ptr is non-null and points to initialized, well-aligned memory. | ||
unsafe { Box::from_non_null(ptr) } | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. I'm not sure about this.
I'm not that familiar with allocator_api2
's implementation of into_boxed_slice
, but it sounds like it can reallocate (copy all the data), which we definitely don't want.
That makes sense for allocator_api2
as it's generic over any kind of allocator, but in our case we're using a bump allocator which cannot recover excess capacity, so it's pointless to ever copy the data.
I don't know if bumpalo::Bump
would automatically ignore the request to reallocate or not. Would you like to look into that?
Alternatively, we could get the pointer and length of the Vec
and construct a NonNull<[T]>
manually. That'd definitely avoid memory copies. (Note: would need to wrap the Vec
in a ManuallyDrop
first, as allocator_api2
's Vec
is Drop
)
This all requires quite a bit of care. I'm not completely up to speed on the safety constraints. Sometimes there are non-obvious subtleties to avoiding UB, and I usually end up having to spend a few hours boning up on arcane details before I'm sure (well, at least reasonably sure) that I've not committed some UB crime!
Do you have a concrete use case for wanting a boxed slice to data in the arena?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, we're planning to replace this Vec
implementation with a hand-written one with length and capacity fields as u32
s, to bring size of Vec
down from 32 bytes to 24 bytes (oxc-project/backlog#18). Dunqing was going to look at doing that fairly soon.
So at that point, we wouldn't be able to use allocator_api2
's into_boxed_slice
anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it creates a new allocation. I want this because it cuts the size of a dynamically sized vec in half (32 -> 16 bytes) at the cost of not being re-sizeable. I'm working on a prototype where this is useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docs for into_boxed_slice
specifically say "If the vector has excess capacity, its items will be moved into a newly-allocated buffer with exactly the right capacity." i.e. memory copy.
into_boxed_slice
calls shrink_to_fit
, and shrink_to_fit
can cause a memory copy. Or maybe it doesn't if Bumpalo chooses not to, but to determine that conclusively would require tracing the calls all the way down to Bump::shrink
.
At the very least it invokes quite a lot of complex machinery, which may not get elided.
Ideally we'd do this instead:
pub fn into_boxed_slice(self) -> Box<'alloc, [T]> {
let (ptr, len, _) = self.0.into_raw_parts();
let ptr = unsafe { NonNull::new_unchecked(ptr) };
let ptr = NonNull::slice_from_raw_parts(ptr, len);
unsafe { Box::from_non_null(ptr) }
}
But I don't think we can, because Vec::into_raw_parts
does not guarantee the pointer it returns is non-null. In practice it is, because RawVec
contains a NonNull
, but that's an internal implementation detail of Vec
, and no guarantee it won't change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this works:
pub fn into_boxed_slice(self) -> Box<'alloc, [T]> {
let slice = self.0.leak();
let ptr = NonNull::from(slice);
// SAFETY: `ptr` points to a valid slice `[T]`.
// Lifetime of returned `Box<'alloc, [T]>` is same as lifetime of consumed `Vec<'alloc, T>`,
// so data in the `Box` must be valid for its lifetime.
unsafe { Box::from_non_null(ptr) }
}
We can use Vec::leak
because in our use case, the types stored in arena are all non-Drop
.
Then you can remove the comment about "If the vector has excess capacity, its items will be moved into a newly-allocated buffer with exactly the right capacity."
/// # SAFETY | ||
/// It is only safe to create a `Box` this way with pointers taken out of | ||
/// another `Box` in the same arena. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment should be updated. It's not true that the pointer has to be from another Box
. If that was the case, then Vec::into_boxed_slice
which you're adding here would violate the contract, as it's not a Box
!
The requirement is that pointer needs to point to data in same arena (could be from a Vec
, a String
, anything else).
Actually, not even that is quite right. Intended usage is that it be data from the same arena, but the safety constraint I think is a bit looser - it can point to any data as long as that data outlives 'alloc
.
There's also a safety requirement about no aliasing.
And the NonNull<T>
must have been created from a mutable pointer *mut T
, not a *const T
, or that would be UB too.
This may all seem like nitpicking, but in my view it's vitally important that documentation for unsafe methods is accurate about what the contract is, otherwise it's impossible to verify the method is being used correctly.
@overlookmotel does it make sense to just abandon this PR? |
No, I think we can do it. I just think better to use this version: pub fn into_boxed_slice(self) -> Box<'alloc, [T]> {
let slice = self.0.leak();
let ptr = NonNull::from(slice);
// SAFETY: `ptr` points to a valid slice `[T]`.
// Lifetime of returned `Box<'alloc, [T]>` is same as lifetime of consumed `Vec<'alloc, T>`,
// so data in the `Box` will be valid for its lifetime.
unsafe { Box::from_non_null(ptr) }
} It's safe, and won't copy memory. |
Note that this PR does not implement the inverse operation (
Box::to_vec
for[T]
).