
Dask and delta-rs integration

rajagurunath released this on 14 Oct 07:21
68dce7f

This release provides a wrapper around delta-rs, the Rust implementation of Delta Lake, and uses Dask to read the underlying Parquet files in parallel.
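The parallel reading mentioned above can be sketched with only the standard library. This is an illustration, not the package's code: `concurrent.futures.ThreadPoolExecutor` stands in for Dask's scheduler, and `read_files_in_parallel`, `read_one`, and `fake_storage` are hypothetical names. Because each Parquet file can be read independently, every path becomes its own task, which is the same per-file shape a Dask-based reader can exploit by mapping each file to one partition.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

Row = TypeVar("Row")


def read_files_in_parallel(
    paths: Sequence[str],
    read_one: Callable[[str], List[Row]],
    max_workers: int = 4,
) -> List[Row]:
    """Read each file as an independent task and concatenate results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the order of `paths`, not completion order.
        chunks = list(pool.map(read_one, paths))
    rows: List[Row] = []
    for chunk in chunks:
        rows.extend(chunk)
    return rows


# Hypothetical in-memory "files" so the sketch runs without any storage backend.
fake_storage = {
    "part-0.parquet": [{"id": 1}, {"id": 2}],
    "part-1.parquet": [{"id": 3}],
}

print(read_files_in_parallel(list(fake_storage), fake_storage.__getitem__))
# [{'id': 1}, {'id': 2}, {'id': 3}]
```

In a Dask version, each `read_one(path)` call would instead become a lazy task (for example via `dask.delayed`), and the pieces would be assembled into a Dask DataFrame rather than a Python list.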

Features:

  1. Reads the Parquet files referenced by the Delta transaction log in parallel using Dask
  2. Supports the s3, azurefs, and gcsfs filesystems
  3. Supports several Delta features:
    • Time Travel
    • Schema evolution
    • Parquet filters
      • row filter
      • partition filter
  4. Queries Delta commit info (history)
  5. Vacuums old, unused Parquet files
  6. Loads a specific version of the data by datetime
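Reading files based on the Delta log and time travel by version both come down to replaying the transaction log. The sketch below assumes the commit files (`_delta_log/00000000000000000000.json` and so on) have already been fetched into a dict mapping version to raw text; each file is newline-delimited JSON with one action per line. It replays `add` and `remove` actions up to the requested version, then applies an equality-only partition filter. It ignores checkpoint files and row-level filters, and `active_files`, `parse_commit`, and the sample log are hypothetical.

```python
import json
from typing import Dict, List, Optional


def parse_commit(text: str) -> List[dict]:
    """Parse one commit file: newline-delimited JSON, one action per line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def active_files(
    commits: Dict[int, str],
    version: Optional[int] = None,
    partition_filter: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Replay add/remove actions up to `version` and return the live file paths.

    `commits` maps version number -> raw JSON-lines text of that commit.
    Checkpoints are ignored, so the log must be replayable from version 0.
    """
    if version is None:
        version = max(commits)  # default to the latest version
    live: Dict[str, Dict[str, Optional[str]]] = {}
    for v in sorted(commits):
        if v > version:
            break  # time travel: stop replaying after the requested version
        for action in parse_commit(commits[v]):
            if "add" in action:
                add = action["add"]
                live[add["path"]] = add.get("partitionValues", {})
            elif "remove" in action:
                live.pop(action["remove"]["path"], None)
            # Other actions (protocol, metaData, commitInfo) do not change the file set.
    paths = sorted(live)
    if partition_filter:
        # Equality-only partition pruning: every requested value must match.
        paths = [
            p for p in paths
            if all(live[p].get(k) == val for k, val in partition_filter.items())
        ]
    return paths


# Hypothetical two-commit log for a table partitioned by "year".
log = {
    0: '{"add": {"path": "year=2021/a.parquet", "partitionValues": {"year": "2021"}}}\n'
       '{"add": {"path": "year=2022/b.parquet", "partitionValues": {"year": "2022"}}}',
    1: '{"remove": {"path": "year=2021/a.parquet"}}\n'
       '{"add": {"path": "year=2021/c.parquet", "partitionValues": {"year": "2021"}}}',
}

print(active_files(log, version=0))  # ['year=2021/a.parquet', 'year=2022/b.parquet']
print(active_files(log))  # ['year=2021/c.parquet', 'year=2022/b.parquet']
print(active_files(log, partition_filter={"year": "2022"}))  # ['year=2022/b.parquet']
```

The resulting path list is what a parallel reader would then hand out, one file per task.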
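Commit history and datetime-based loading can both be driven by the `commitInfo` action each commit carries. This sketch assumes those entries are already parsed into a dict mapping version to `commitInfo` fields, with `timestamp` in milliseconds since the epoch and increasing with version; it picks the latest version committed at or before the requested datetime. Real implementations may rely on commit-file modification times instead, and `history`, `version_as_of`, and the sample data are hypothetical.

```python
import bisect
from datetime import datetime, timezone
from typing import Dict, List


def history(commit_infos: Dict[int, dict]) -> List[dict]:
    """List commit info newest-first, tagging each entry with its version."""
    return [{"version": v, **commit_infos[v]} for v in sorted(commit_infos, reverse=True)]


def version_as_of(commit_infos: Dict[int, dict], when: datetime) -> int:
    """Return the latest version whose commit timestamp is at or before `when`.

    Assumes `timestamp` is in ms since the epoch and increases with version.
    `when` should be timezone-aware; naive datetimes are read as local time.
    """
    versions = sorted(commit_infos)
    stamps = [commit_infos[v]["timestamp"] for v in versions]
    target_ms = int(when.timestamp() * 1000)
    # bisect_right counts the commits with timestamp <= target_ms.
    i = bisect.bisect_right(stamps, target_ms)
    if i == 0:
        raise ValueError("requested datetime is earlier than the first commit")
    return versions[i - 1]


# Hypothetical commitInfo entries for two commits.
infos = {
    0: {"timestamp": 1672531200000, "operation": "WRITE"},   # 2023-01-01 00:00 UTC
    1: {"timestamp": 1672617600000, "operation": "DELETE"},  # 2023-01-02 00:00 UTC
}

print(version_as_of(infos, datetime(2023, 1, 1, 12, tzinfo=timezone.utc)))  # 0
print([h["operation"] for h in history(infos)])  # ['DELETE', 'WRITE']
```

The version returned by `version_as_of` can then be fed to a log replay, so loading by datetime reduces to loading by version.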
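Vacuuming follows the rule Delta Lake describes: a data file can be deleted once the current snapshot no longer references it and it is older than the retention window (7 days by default). The sketch below only computes the candidate list, similar to a dry run. `files_to_vacuum` and its inputs are hypothetical, and it assumes `stored_files` comes from a listing of the table's data files that excludes `_delta_log`.

```python
from typing import Dict, List, Set

DEFAULT_RETENTION_HOURS = 168  # 7 days, Delta Lake's default retention


def files_to_vacuum(
    stored_files: Dict[str, int],
    live_files: Set[str],
    now_ms: int,
    retention_hours: int = DEFAULT_RETENTION_HOURS,
) -> List[str]:
    """Return data files that are unreferenced and older than the retention window.

    `stored_files` maps path -> last-modified time in ms since the epoch;
    `live_files` is the set of paths in the current snapshot. Nothing is
    deleted here; the caller decides whether to act on the list (a dry run).
    """
    cutoff_ms = now_ms - retention_hours * 3600 * 1000
    return sorted(
        path
        for path, mtime_ms in stored_files.items()
        if path not in live_files and mtime_ms < cutoff_ms
    )


HOUR_MS = 3600 * 1000
now = 1000 * HOUR_MS  # hypothetical "current time"
stored = {
    "a.parquet": now - 200 * HOUR_MS,  # unreferenced and past retention
    "b.parquet": now - 200 * HOUR_MS,  # still live, so kept
    "c.parquet": now - 1 * HOUR_MS,    # unreferenced but too recent
}
live = {"b.parquet"}

print(files_to_vacuum(stored, live, now))  # ['a.parquet']
```

Keeping the retention check separate from the "unreferenced" check is what protects readers that may still be time-traveling to recent versions.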