-
Notifications
You must be signed in to change notification settings - Fork 385
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce new Chunk iteration APIs #8553
Conversation
teh-cmc
commented
Dec 20, 2024
•
edited
Loading
edited
- Way more ergonomic
- Same performance
- Less code
- Support for structs!
- Basis for new upcoming codegen'd deserialization APIs (...at some point)
Web viewer built successfully. If applicable, you should also test it:
Note: This comment is updated whenever you push a commit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
noice
let Some(inner_list_array) = array.as_any().downcast_ref::<Arrow2ListArray<i32>>() else { | ||
if cfg!(debug_assertions) { | ||
panic!("downcast failed for {component_name}, data discarded"); | ||
} else { | ||
re_log::error_once!("downcast failed for {component_name}, data discarded"); | ||
} | ||
return Either::Left(std::iter::empty()); | ||
}; | ||
|
||
let inner_offsets = inner_list_array.offsets(); | ||
let inner_lengths = inner_list_array.offsets().lengths().collect_vec(); | ||
|
||
let Some(fixed_size_list_array) = inner_list_array | ||
.values() | ||
.as_any() | ||
.downcast_ref::<Arrow2FixedSizeListArray>() | ||
else { | ||
if cfg!(debug_assertions) { | ||
panic!("downcast failed for {component_name}, data discarded"); | ||
} else { | ||
re_log::error_once!("downcast failed for {component_name}, data discarded"); | ||
} | ||
return Either::Left(std::iter::empty()); | ||
}; | ||
|
||
let Some(values) = fixed_size_list_array | ||
.values() | ||
.as_any() | ||
.downcast_ref::<Arrow2PrimitiveArray<T>>() | ||
else { | ||
if cfg!(debug_assertions) { | ||
panic!("downcast failed for {component_name}, data discarded"); | ||
} else { | ||
re_log::error_once!("downcast failed for {component_name}, data discarded"); | ||
} | ||
return Either::Left(std::iter::empty()); | ||
}; | ||
|
||
let size = fixed_size_list_array.size(); | ||
let values = values.values(); | ||
|
||
// NOTE: No need for validity checks here, `iter_offsets` already takes care of that. | ||
Either::Right(component_offsets.map(move |(idx, len)| { | ||
let inner_offsets = &inner_offsets.as_slice()[idx..idx + len]; | ||
let inner_lengths = &inner_lengths.as_slice()[idx..idx + len]; | ||
izip!(inner_offsets, inner_lengths) | ||
.map(|(&idx, &len)| { | ||
let idx = idx as usize; | ||
bytemuck::cast_slice(&values[idx * size..idx * size + len * size]) | ||
}) | ||
.collect_vec() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
everything except collect_vec
is not dependent on N
-> we likely can save a lot of template instantiations by putting this into a second level utility method that only depends on T
but not on N
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I'm ready to add even more layers and complex return types to all of this (we need the intermediary values too...) for something I don't really have a way to measure right now :/ Also this is private, no the number of instantiations is fixed, at least.