- feature
- [python] add parquet compression
- [python] implicitly download raw files from the web in `read_parquet`, `read_csv`.
- [python | rust] methods for local peak finding in numerical series
- [python | rust] faster query optimization due to local memory arenas.
- [rust] reduce default compile time by enabling fewer features by default.
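To illustrate what "local peak finding in numerical series" refers to, here is a minimal plain-Python sketch. It does not use the library's API; the function name `local_maxima` and the choice to treat endpoints and plateaus as non-peaks are assumptions made only for this example.

```python
from typing import List, Sequence


def local_maxima(values: Sequence[float]) -> List[bool]:
    """Return a mask that is True where a value is strictly greater than both neighbours.

    Hypothetical helper for illustration; not the library's implementation.
    """
    n = len(values)
    mask = [False] * n
    # Endpoints have only one neighbour, so this sketch never marks them.
    for i in range(1, n - 1):
        mask[i] = values[i - 1] < values[i] > values[i + 1]
    return mask


series = [1, 3, 2, 5, 5, 4, 7, 6]
peaks = local_maxima(series)
# Keep only the values at peak positions.
peak_values = [v for v, is_peak in zip(series, peaks) if is_peak]
print(peak_values)
```

Returning a boolean mask rather than the indices keeps the result the same length as the input, so it can be used directly as a filter on the series.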
- bug fix
- [python] support file buffers for reading and writing csv and parquet
- [python | rust] fix csv-parser: allow new-line character in a string field
- [python | rust] don't push predicates past shift | sort operations, to maintain correctness.
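The predicate-pushdown fix is easier to see with a small example. The sketch below uses plain Python lists rather than the library's query engine, covers only the shift case, and the helper `shift` and the predicate `keep` are hypothetical names introduced for illustration. It shows that filtering before a shift gives a different result than filtering after it, which is why the optimizer must not move the filter below the shift.

```python
from typing import List, Optional


def shift(values: List[int], periods: int = 1) -> List[Optional[int]]:
    """Shift values forward by `periods`, filling vacated slots with None.

    Hypothetical helper; assumes 0 <= periods <= len(values).
    """
    return [None] * periods + values[: len(values) - periods]


def keep(v: int) -> bool:
    """Example predicate: keep rows where the original value is greater than 1."""
    return v > 1


x = [1, 5, 2, 8]

# Plan as written: shift the column first, then filter rows on the original values.
shifted = shift(x)
as_written = [s for v, s in zip(x, shifted) if keep(v)]

# Invalid rewrite: the filter is pushed below the shift, so the shift
# now sees fewer rows and pairs each value with a different neighbour.
pushed_down = shift([v for v in x if keep(v)])

print(as_written)
print(pushed_down)
```

Because a shift depends on row positions, removing rows before it changes which value lands in each slot, so the two plans are not equivalent.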
- performance
- [python | rust] use mimalloc global allocator
- [python | rust] undo performance regression on large number of threads
- bug fix
- [python | rust] fix accidental over-allocation in csv-parser
- [python] support agg (dictionary aggregation) for downsample
- performance
- [python | rust] categorical type groupby keys (use size hint)
- [python | rust] remove indirection layer in vector hasher
- [python | rust] improve performance of null array creation
- bug fix
- [python] implement set_with_mask for Boolean type
- [python | rust] don't panic (instead return null) in dataframe aggregation `std` and `var`
- other
- [rust] internal refactors
- bug fix
- [python] fix various pyarrow related bugs
- feature
- [python] render html tables
- performance
- [python] default to pyarrow for parquet reading
- [python | rust] use u32 instead of usize in groupby and join to increase cache coherence and reduce memory pressure.
- Lots of bug fixes
- CSV Read IO
- Parallel csv reader
- Sample DataFrames/Series
- Performance increase in take kernel
- Performance increase in ChunkedArray builders
- Join operation on multiple columns.
- ~3.5x performance increase in groupby operations (measured on db-benchmark), due to embarrassingly parallel grouping and better branch prediction (tight loops).
- Performance increase on join operation due to better branch prediction.
- Categorical datatype and global string cache (BETA).
- Lazy
- Lots of bug fixes in optimizer.
- Parallel execution of Physical plan
- Partition window function
- More simplify expression optimizations.
- Caching
- Alpha release of Aggregate pushdown optimization.
- Start of general Object type in ChunkedArray/DataFrames/Series