fix(python,rust): Make read_json accept empty list input and return empty df #18594

Closed
wants to merge 54 commits into from
Commits
b07dae2
make_read_json_accept_empty_list
deanm0000 Sep 6, 2024
9cd24cb
lint
deanm0000 Sep 6, 2024
1e19ce0
fmt
deanm0000 Sep 6, 2024
3ba820b
clippy
deanm0000 Sep 6, 2024
106e239
refactor(rust): Fix nan-ignoring max/min in new-streaming (#18593)
orlp Sep 6, 2024
4516399
print_delta_version
deanm0000 Sep 6, 2024
fcdae15
rd2
deanm0000 Sep 6, 2024
4f7904b
again
deanm0000 Sep 6, 2024
ed79a06
wee
deanm0000 Sep 6, 2024
17cf78f
docs: Fix multiprocessing docs regarding fork method check (#18563)
alonme Sep 7, 2024
9a256a3
test(python): Fix delta test merge (#18601)
ion-elgreco Sep 7, 2024
80769d2
feat: Add IEJoin algorithm for non-equi joins and support Full non-ec…
adamreeve Sep 7, 2024
de344d6
perf: Add upfront partitioning in `ColumnChunkMetadata` (#18584)
nameexhaustion Sep 7, 2024
5d7f41e
fix: Allow for date/datetime subclasses (e.g. pd.Timestamp, FreezeGun…
MarcoGorelli Sep 7, 2024
a0d28fd
fix(rust): Functions for streaming require `streaming` feature (#18602)
eitsupi Sep 7, 2024
aa0b1ed
test(python): Use `streaming` argument in `test_parquet_slice_pushdow…
megaserg Sep 7, 2024
ac4b114
feat(python): Support shortcut eval of common boolean filters in SQL …
alexander-beedie Sep 7, 2024
6037ca5
fix: Fix group first value after group-by slice (#18603)
ritchie46 Sep 7, 2024
98788b2
ci: Fix Python docs build (#18605)
MarcoGorelli Sep 8, 2024
04b9e82
refactor: Raise on suffixed predicate in join_where (#18607)
ritchie46 Sep 8, 2024
6076421
refactor: Check number of binary comparisons in join_where predicates…
ritchie46 Sep 8, 2024
18d3073
perf: Remove cloning of `ColumnChunkMetadata` (#18615)
nameexhaustion Sep 9, 2024
d3a14de
fix(rust): Indicative error in `list.gather` when wrong indices type …
barak1412 Sep 9, 2024
aa3b2c3
test(python): Add benchmark tests for join_where with inequalities (#…
adamreeve Sep 9, 2024
ac2456c
feat: Add support for `IO[bytes]` and `bytes` in `scan_{...}` functio…
coastalwhite Sep 9, 2024
0fe5914
refactor(rust): One simplify expression module and keep utility local…
ritchie46 Sep 9, 2024
1a1796a
perf: Don't traverse file list twice for extension validation (#18620)
nameexhaustion Sep 9, 2024
253057a
refactor(rust): Remove extra schema traits (#18616)
nameexhaustion Sep 9, 2024
433d6c0
fix(python): Raise if single argument form in `replace`/`replace_stri…
henryharbeck Sep 9, 2024
4d176e0
refactor(rust): Fix unimplemented panics to give todo!s for AUTO_NEW_…
orlp Sep 9, 2024
72d861e
fix: Properly slice validity mask on pl.Object series (#18631)
orlp Sep 9, 2024
76a340b
fix: Use Buffer<T> in ObjectSeries, fixes variety of offset bugs (#18…
orlp Sep 9, 2024
45c8e96
refactor: Change join_where semantics (#18640)
ritchie46 Sep 10, 2024
8ebd739
refactor(rust): Rename `MetaData` -> `Metadata` (#18644)
nameexhaustion Sep 10, 2024
38b376c
fix: Enable "polars-json/timezones" feature from "polars-io" (#18635)
philss Sep 10, 2024
5ccb238
fix: Scanning hive partitioned files where hive columns are partially…
nameexhaustion Sep 10, 2024
832aa53
refactor(rust): Scan from BytesIO in new-streaming parquet source (#1…
nameexhaustion Sep 10, 2024
1ee6a82
chore(rust): Feature gate iejoin (#18646)
ritchie46 Sep 10, 2024
5658e65
chore: Check predicates in join_where (#18648)
ritchie46 Sep 10, 2024
fe04390
refactor(rust): Split `parquet_source.rs` in new-streaming (#18649)
nameexhaustion Sep 10, 2024
760ab20
chore: Don't raise on multiple same names in ie_join (#18658)
ritchie46 Sep 10, 2024
655c781
refactor(rust): Rename `MemSlice::from_slice` -> `MemSlice::from_stat…
nameexhaustion Sep 10, 2024
2e92f0a
refactor: Fix a bunch of tests for new-streaming (#18659)
orlp Sep 10, 2024
ad9a1d8
refactor(rust): Remove duplicate byte range calc from new parquet sou…
nameexhaustion Sep 10, 2024
5c4e7e9
refactor(rust): Re-use already decoded metadata for first path (new-p…
nameexhaustion Sep 10, 2024
16d8c83
make_read_json_accept_empty_list
deanm0000 Sep 6, 2024
1e23e5e
lint
deanm0000 Sep 6, 2024
f0dcd24
fmt
deanm0000 Sep 6, 2024
cd04dee
clippy
deanm0000 Sep 6, 2024
03d9eba
print_delta_version
deanm0000 Sep 6, 2024
a92121d
rd2
deanm0000 Sep 6, 2024
270fa93
again
deanm0000 Sep 6, 2024
0adb6de
wee
deanm0000 Sep 6, 2024
a90705e
Merge branch 'empty_json' of https://github.com/deanm0000/polars into…
deanm0000 Sep 10, 2024
4 changes: 4 additions & 0 deletions Cargo.lock

2 changes: 1 addition & 1 deletion Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@ FILTER_PIP_WARNINGS=| grep -v "don't match your environment"; test $${PIPESTATUS
requirements: .venv ## Install/refresh Python project requirements
@unset CONDA_PREFIX \
&& $(VENV_BIN)/python -m pip install --upgrade uv \
&& $(VENV_BIN)/uv pip install --upgrade --compile-bytecode \
&& $(VENV_BIN)/uv pip install --upgrade --compile-bytecode --no-build \
-r py-polars/requirements-dev.txt \
-r py-polars/requirements-lint.txt \
-r py-polars/docs/requirements-docs.txt \
Expand Down
14 changes: 2 additions & 12 deletions crates/polars-core/src/chunked_array/from_iterator.rs
Expand Up @@ -2,7 +2,7 @@
use std::borrow::{Borrow, Cow};

#[cfg(feature = "object")]
use arrow::bitmap::{Bitmap, MutableBitmap};
use arrow::bitmap::MutableBitmap;

use crate::chunked_array::builder::{get_list_builder, AnonymousOwnedListBuilder};
#[cfg(feature = "object")]
Expand Down Expand Up @@ -268,17 +268,7 @@ impl<T: PolarsObject> FromIterator<Option<T>> for ObjectChunked<T> {
})
.collect();

let null_bit_buffer: Option<Bitmap> = null_mask_builder.into();
let null_bitmap = null_bit_buffer;

let len = values.len();

let arr = Box::new(ObjectArray {
values: Arc::new(values),
null_bitmap,
offset: 0,
len,
});
let arr = Box::new(ObjectArray::from(values).with_validity(null_mask_builder.into()));
ChunkedArray::new_with_compute_len(
Arc::new(Field::new(PlSmallStr::EMPTY, get_object_type::<T>())),
vec![arr],
Expand Down
2 changes: 1 addition & 1 deletion crates/polars-core/src/chunked_array/iterator/mod.rs
Expand Up @@ -432,7 +432,7 @@ impl<T: PolarsObject> ObjectChunked<T> {
// we know that we only iterate over length == self.len()
unsafe {
self.downcast_iter()
.flat_map(|arr| arr.values().iter())
.flat_map(|arr| arr.values_iter())
.trust_my_length(self.len())
}
}
Expand Down
15 changes: 13 additions & 2 deletions crates/polars-core/src/chunked_array/mod.rs
Expand Up @@ -690,6 +690,14 @@ where
}
}

#[inline]
pub fn first(&self) -> Option<T::Physical<'_>> {
unsafe {
let arr = self.downcast_get_unchecked(0);
arr.get_unchecked(0)
}
}

#[inline]
pub fn last(&self) -> Option<T::Physical<'_>> {
unsafe {
Expand Down Expand Up @@ -950,9 +958,12 @@ pub(crate) fn to_array<T: PolarsNumericType>(

impl<T: PolarsDataType> Default for ChunkedArray<T> {
fn default() -> Self {
let dtype = T::get_dtype();
let arrow_dtype = dtype.to_physical().to_arrow(CompatLevel::newest());
ChunkedArray {
field: Arc::new(Field::new(PlSmallStr::EMPTY, DataType::Null)),
chunks: Default::default(),
field: Arc::new(Field::new(PlSmallStr::EMPTY, dtype)),
// Invariant: always has 1 chunk.
chunks: vec![new_empty_array(arrow_dtype)],
md: Arc::new(IMMetadata::default()),
length: 0,
null_count: 0,
Expand Down
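The `Default` change above replaces an empty chunk list with a single empty chunk, so code that reads chunk 0 always has something to read. A minimal sketch of that invariant, using a hypothetical `Chunked<T>` stand-in built on `Vec<Vec<T>>` rather than the real `ChunkedArray` (the names `Chunked`, `first_chunk`, and `push_chunk` are assumptions for illustration only):

```rust
/// Simplified, hypothetical stand-in for a chunked column.
/// This is NOT the real `ChunkedArray`; it only models the chunk list.
#[derive(Debug)]
struct Chunked<T> {
    chunks: Vec<Vec<T>>,
    length: usize,
}

impl<T> Default for Chunked<T> {
    fn default() -> Self {
        // Mirrors the invariant stated in the diff: always one chunk, even when empty.
        Chunked {
            chunks: vec![Vec::new()],
            length: 0,
        }
    }
}

impl<T> Chunked<T> {
    /// Indexing chunk 0 cannot panic because `default` never produces zero chunks.
    fn first_chunk(&self) -> &[T] {
        &self.chunks[0]
    }

    /// Bounds-checked lookup of the first element across all chunks.
    fn first(&self) -> Option<&T> {
        self.chunks.iter().flatten().next()
    }

    /// Append a chunk and keep the cached length in sync.
    fn push_chunk(&mut self, chunk: Vec<T>) {
        self.length += chunk.len();
        self.chunks.push(chunk);
    }
}
```

Unlike the `first` method added in the diff, which uses unchecked access, this sketch returns `None` for an empty column through ordinary bounds-checked iteration.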
18 changes: 6 additions & 12 deletions crates/polars-core/src/chunked_array/object/builder.rs
Expand Up @@ -61,10 +61,8 @@ where
.unwrap_or(0) as IdxSize;

let arr = Box::new(ObjectArray {
values: Arc::new(self.values),
null_bitmap,
offset: 0,
len,
values: self.values.into(),
validity: null_bitmap,
});

self.field.dtype = get_object_type::<T>();
Expand Down Expand Up @@ -140,10 +138,8 @@ where
let field = Arc::new(Field::new(name, DataType::Object(T::type_name(), None)));
let len = v.len();
let arr = Box::new(ObjectArray {
values: Arc::new(v),
null_bitmap: None,
offset: 0,
len,
values: v.into(),
validity: None,
});

unsafe { ObjectChunked::new_with_dims(field, vec![arr], len as IdxSize, 0) }
Expand All @@ -154,10 +150,8 @@ where
let len = v.len();
let null_count = validity.unset_bits();
let arr = Box::new(ObjectArray {
values: Arc::new(v),
null_bitmap: Some(validity),
offset: 0,
len,
values: v.into(),
validity: Some(validity),
});

unsafe {
Expand Down
65 changes: 33 additions & 32 deletions crates/polars-core/src/chunked_array/object/mod.rs
Expand Up @@ -4,6 +4,7 @@ use std::hash::Hash;

use arrow::bitmap::utils::{BitmapIter, ZipValidity};
use arrow::bitmap::{Bitmap, MutableBitmap};
use arrow::buffer::Buffer;
use polars_utils::total_ord::TotalHash;

use crate::prelude::*;
Expand All @@ -22,10 +23,8 @@ pub struct ObjectArray<T>
where
T: PolarsObject,
{
pub(crate) values: Arc<Vec<T>>,
pub(crate) null_bitmap: Option<Bitmap>,
pub(crate) offset: usize,
pub(crate) len: usize,
values: Buffer<T>,
validity: Option<Bitmap>,
}

/// Trimmed down object safe polars object
Expand Down Expand Up @@ -80,23 +79,18 @@ impl<T> ObjectArray<T>
where
T: PolarsObject,
{
/// Get a reference to the underlying data
pub fn values(&self) -> &Arc<Vec<T>> {
&self.values
}

pub fn values_iter(&self) -> ObjectValueIter<'_, T> {
self.values.iter()
}

/// Returns an iterator of `Option<&T>` over every element of this array.
pub fn iter(&self) -> ZipValidity<&T, ObjectValueIter<'_, T>, BitmapIter> {
ZipValidity::new_with_validity(self.values_iter(), self.null_bitmap.as_ref())
ZipValidity::new_with_validity(self.values_iter(), self.validity.as_ref())
}

/// Get a value at a certain index location
pub fn value(&self, index: usize) -> &T {
&self.values[self.offset + index]
&self.values[index]
}

pub fn get(&self, index: usize) -> Option<&T> {
Expand All @@ -123,7 +117,7 @@ where
/// No bounds checks
#[inline]
pub unsafe fn is_valid_unchecked(&self, i: usize) -> bool {
if let Some(b) = &self.null_bitmap {
if let Some(b) = &self.validity {
b.get_bit_unchecked(i)
} else {
true
Expand Down Expand Up @@ -157,7 +151,7 @@ where
if matches!(&validity, Some(bitmap) if bitmap.len() != self.len()) {
panic!("validity must be equal to the array's length")
}
self.null_bitmap = validity;
self.validity = validity;
}
}

Expand All @@ -182,10 +176,12 @@ where
}

unsafe fn slice_unchecked(&mut self, offset: usize, length: usize) {
let len = std::cmp::min(self.len - offset, length);

self.len = len;
self.offset = offset;
self.validity = self
.validity
.take()
.map(|bitmap| bitmap.sliced_unchecked(offset, length))
.filter(|bitmap| bitmap.unset_bits() > 0);
self.values.slice_unchecked(offset, length);
}

fn split_at_boxed(&self, offset: usize) -> (Box<dyn Array>, Box<dyn Array>) {
Expand All @@ -199,11 +195,11 @@ where
}

fn len(&self) -> usize {
self.len
self.values.len()
}

fn validity(&self) -> Option<&Bitmap> {
self.null_bitmap.as_ref()
self.validity.as_ref()
}

fn with_validity(&self, validity: Option<Bitmap>) -> Box<dyn Array> {
Expand All @@ -219,7 +215,7 @@ where
}

fn null_count(&self) -> usize {
match &self.null_bitmap {
match &self.validity {
None => 0,
Some(validity) => validity.unset_bits(),
}
Expand All @@ -232,18 +228,16 @@ impl<T: PolarsObject> Splitable for ObjectArray<T> {
}

unsafe fn _split_at_unchecked(&self, offset: usize) -> (Self, Self) {
let (left_values, right_values) = unsafe { self.values.split_at_unchecked(offset) };
let (left_validity, right_validity) = unsafe { self.validity.split_at_unchecked(offset) };
(
Self {
values: self.values.clone(),
null_bitmap: self.null_bitmap.clone(),
len: offset,
offset: self.offset,
values: left_values,
validity: left_validity,
},
Self {
values: self.values.clone(),
null_bitmap: self.null_bitmap.clone(),
len: self.len() - offset,
offset: self.offset + offset,
values: right_values,
validity: right_validity,
},
)
}
Expand Down Expand Up @@ -273,10 +267,8 @@ impl<T: PolarsObject> StaticArray for ObjectArray<T> {

fn full_null(length: usize, _dtype: ArrowDataType) -> Self {
ObjectArray {
values: Arc::new(vec![T::default(); length]),
null_bitmap: Some(Bitmap::new_with_value(false, length)),
offset: 0,
len: length,
values: vec![T::default(); length].into(),
validity: Some(Bitmap::new_with_value(false, length)),
}
}
}
Expand Down Expand Up @@ -324,3 +316,12 @@ where
}
}
}

impl<T: PolarsObject> From<Vec<T>> for ObjectArray<T> {
fn from(values: Vec<T>) -> Self {
Self {
values: values.into(),
validity: None,
}
}
}
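The changes to `ObjectArray` drop the separate `offset` and `len` fields and let the buffer track its own window, with the validity mask sliced alongside the values. The removed `slice_unchecked` assigned `self.offset = offset` instead of accumulating it and left the mask untouched. A rough sketch of the new approach, assuming a hypothetical `SharedBuf<T>` built on `Arc<[T]>` in place of arrow's `Buffer<T>` and a `Vec<bool>`-backed mask in place of `Bitmap` (all type and method names here are illustrative, not the library's API):

```rust
use std::sync::Arc;

/// Hypothetical shared buffer: a cheap view into shared storage.
#[derive(Clone, Debug)]
struct SharedBuf<T> {
    data: Arc<[T]>,
    offset: usize,
    len: usize,
}

impl<T> SharedBuf<T> {
    fn from_vec(v: Vec<T>) -> Self {
        let len = v.len();
        SharedBuf { data: v.into(), offset: 0, len }
    }

    /// Narrow the view in place; offsets accumulate across repeated slices.
    fn slice(&mut self, offset: usize, length: usize) {
        assert!(offset + length <= self.len, "slice out of bounds");
        self.offset += offset;
        self.len = length;
    }

    fn as_slice(&self) -> &[T] {
        &self.data[self.offset..self.offset + self.len]
    }
}

/// Hypothetical object array: values plus an optional validity mask.
struct ObjArray<T> {
    values: SharedBuf<T>,
    validity: Option<SharedBuf<bool>>,
}

impl<T> ObjArray<T> {
    fn slice(&mut self, offset: usize, length: usize) {
        // Slice the mask together with the values so the two cannot drift apart.
        if let Some(mask) = self.validity.as_mut() {
            mask.slice(offset, length);
        }
        // Drop the mask when the remaining window holds no nulls.
        if matches!(&self.validity, Some(mask) if mask.as_slice().iter().all(|b| *b)) {
            self.validity = None;
        }
        self.values.slice(offset, length);
    }

    fn len(&self) -> usize {
        self.values.len
    }

    /// Index relative to the current window; no manual offset arithmetic.
    fn get(&self, i: usize) -> Option<&T> {
        let valid = self.validity.as_ref().map_or(true, |m| m.as_slice()[i]);
        if valid {
            Some(&self.values.as_slice()[i])
        } else {
            None
        }
    }
}
```

Because the window lives inside the buffer, slicing twice composes correctly and `len` always agrees with the values that are actually visible.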
8 changes: 8 additions & 0 deletions crates/polars-core/src/datatypes/any_value.rs
Expand Up @@ -499,6 +499,14 @@ impl<'a> AnyValue<'a> {
)
}

pub fn is_nan(&self) -> bool {
match self {
AnyValue::Float32(f) => f.is_nan(),
AnyValue::Float64(f) => f.is_nan(),
_ => false,
}
}

pub fn is_null(&self) -> bool {
matches!(self, AnyValue::Null)
}
Expand Down
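The `is_nan` helper added above returns `true` only for float variants holding NaN and `false` for everything else, including nulls. A small sketch of the same pattern on a hypothetical, cut-down `Value` enum (not the real `AnyValue`), together with an illustrative `max_ignore_nan` helper that is an assumption of this example and does not appear in the diff:

```rust
/// Hypothetical, reduced value enum used only for this illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Value {
    Null,
    Int64(i64),
    Float32(f32),
    Float64(f64),
}

impl Value {
    /// Same shape as the method in the diff: only floats can be NaN.
    fn is_nan(&self) -> bool {
        match self {
            Value::Float32(f) => f.is_nan(),
            Value::Float64(f) => f.is_nan(),
            _ => false,
        }
    }

    fn is_null(&self) -> bool {
        matches!(self, Value::Null)
    }
}

/// Illustrative helper (an assumption): maximum of the `Float64` entries, skipping NaN.
fn max_ignore_nan(values: &[Value]) -> Option<f64> {
    values
        .iter()
        .filter(|v| !v.is_nan())
        .filter_map(|v| match v {
            Value::Float64(f) => Some(*f),
            _ => None,
        })
        .fold(None, |acc: Option<f64>, x| Some(acc.map_or(x, |m| m.max(x))))
}
```

Keeping `is_nan` separate from `is_null` lets callers treat "missing" and "not a number" as distinct conditions.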
13 changes: 1 addition & 12 deletions crates/polars-core/src/datatypes/static_array_collect.rs
@@ -1,7 +1,4 @@
use std::sync::Arc;

use arrow::array::ArrayFromIter;
use arrow::bitmap::Bitmap;

use crate::chunked_array::object::{ObjectArray, PolarsObject};

Expand Down Expand Up @@ -41,14 +38,6 @@ impl<'a, T: PolarsObject> ArrayFromIter<Option<&'a T>> for ObjectArray<T> {
})
.collect::<Result<Vec<T>, E>>()?;

let null_bit_buffer: Option<Bitmap> = null_mask_builder.into();
let null_bitmap = null_bit_buffer;
let len = values.len();
Ok(ObjectArray {
values: Arc::new(values),
null_bitmap,
offset: 0,
len,
})
Ok(ObjectArray::from(values).with_validity(null_mask_builder.into()))
}
}
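Both `FromIterator` and `ArrayFromIter` now end with `ObjectArray::from(values).with_validity(...)`, that is, collect the values first and attach the null mask afterwards. The collection step itself is only partly visible in the hunks, so the following is a sketch under assumptions: a hypothetical free function `collect_with_validity` that fills null slots with `T::default()` (as `full_null` does in the diff) and returns no mask when every slot is valid.

```rust
/// Hypothetical helper: split an iterator of `Option<T>` into values and a validity mask.
/// Null slots get a placeholder `T::default()`; the mask records which slots are real.
fn collect_with_validity<T, I>(iter: I) -> (Vec<T>, Option<Vec<bool>>)
where
    T: Default,
    I: IntoIterator<Item = Option<T>>,
{
    let mut mask = Vec::new();
    let values: Vec<T> = iter
        .into_iter()
        .map(|opt| match opt {
            Some(v) => {
                mask.push(true);
                v
            }
            None => {
                mask.push(false);
                T::default()
            }
        })
        .collect();

    // Assumption of this sketch: an all-valid mask carries no information, so omit it.
    let validity = if mask.iter().all(|&b| b) { None } else { Some(mask) };
    (values, validity)
}
```

Returning the pair lets the caller build the array from the values and then attach the mask in one place, which is the shape the refactored code takes.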
5 changes: 5 additions & 0 deletions crates/polars-core/src/scalar/mod.rs
Expand Up @@ -22,6 +22,11 @@ impl Scalar {
self.value.is_null()
}

#[inline(always)]
pub fn is_nan(&self) -> bool {
self.value.is_nan()
}

#[inline(always)]
pub fn value(&self) -> &AnyValue<'static> {
&self.value
Expand Down