0.132.0

@jqnatividad jqnatividad released this 21 Aug 10:34
· 1523 commits to master since this release

Highlights

With this release, we finally finish the stats caching refactor started in 0.131.0, replacing the binary-encoded stats cache with a simpler JSONL cache. The stats cache stores the statistical metadata that several key commands use to run smarter and faster. Per the benchmarks:

  • frequency is 6x faster (frequency_index_stats_mode_auto).
    Not only is it faster, it no longer needs to build a hashmap for columns whose values are ALL unique (e.g. ID columns). In practice, this lets it handle "real-world" datasets of any size. The one exception is a CSV in which every column consists entirely of unique values; in that case, the entire CSV still has to fit into memory.
  • tojsonl is 2.67x faster (tojsonl_index)
  • schema is two orders of magnitude (100x) faster!!! (schema_index)
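
To illustrate the idea behind the all-unique shortcut in frequency, here is a minimal Python sketch. It is not qsv's actual implementation. It assumes a JSONL stats dump in which each line is a JSON object with `field` and `cardinality` keys (both key names are assumptions for this example), and that the row count is already known. When a column's cardinality equals the row count, every value occurs exactly once, so no per-value hashmap has to be built for that column.

```python
import json
from collections import Counter


def load_cardinalities(jsonl_text):
    """Parse a JSONL stats dump into {column name: cardinality}.

    The `field` and `cardinality` keys are assumptions made for this sketch.
    """
    cards = {}
    for line in jsonl_text.splitlines():
        if line.strip():
            rec = json.loads(line)
            cards[rec["field"]] = rec["cardinality"]
    return cards


def frequency(rows, header, cardinalities, row_count):
    """Count values per column, skipping columns known to be all-unique."""
    # Only build a counter (hashmap) for columns that can contain repeats.
    counted = [
        i for i, name in enumerate(header)
        if cardinalities.get(name) != row_count
    ]
    tables = {header[i]: Counter() for i in counted}
    for row in rows:
        for i in counted:
            tables[header[i]][row[i]] += 1
    # All-unique columns get a marker instead of a table of stored values.
    # The marker string is a placeholder chosen for this sketch.
    for name in header:
        if name not in tables:
            tables[name] = "<ALL_UNIQUE>"
    return tables


# Hypothetical stats dump and data, made up for the example.
stats_jsonl = (
    '{"field": "id", "cardinality": 4}\n'
    '{"field": "color", "cardinality": 2}\n'
)
header = ["id", "color"]
rows = [["1", "red"], ["2", "blue"], ["3", "red"], ["4", "red"]]

result = frequency(rows, header, load_cardinalities(stats_jsonl), row_count=4)
print(result)
```

In this sketch the `id` column never allocates a counter, so memory use is driven only by the columns that actually repeat values.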

The stats cache also provides the foundation for even more "smart" features and commands in the future. As a side benefit, it adds a way to produce stats in JSONL format that can be used for other purposes beyond qsv.

The search, searchset, and replace commands now also have a --literal option, which lets you search for and replace strings that contain regex special/reserved characters without having to escape them. This is especially useful with URL columns, which often contain characters like ?, :, - and .
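
The difference between regex and literal matching can be sketched in Python. This only illustrates the concept using Python's standard `re` module; it is not qsv's implementation, and the `find` helper and sample text are made up for the example.

```python
import re


def find(pattern, text, literal=False):
    """Return all matches of pattern in text.

    With literal=True, regex metacharacters in the pattern are escaped
    so that they only match themselves.
    """
    if literal:
        pattern = re.escape(pattern)
    return re.findall(pattern, text)


text = "https://example.com/a?b=1 and https://exampleXcom/a"

# As a regex, "." matches any character, so both hosts are found.
regex_hits = find("example.com", text)

# As a literal, "." matches only a dot, so just the first host is found.
literal_hits = find("example.com", text, literal=True)

print(regex_hits)
print(literal_hits)
```

Without a literal mode, the pattern would have to be written with each special character escaped by hand, which is easy to get wrong for URLs.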


Added

  • search, searchset & replace: add --literal option #2060 & 7196053
  • slice: added usage text examples 04afaa3
  • publish: added workflow to build "portable" binaries with CPU features disabled
  • contrib(completions): add --literal for search and searchset by @rzmk in #2061
  • contrib(completions): add --literal completion to replace by @rzmk in #2062
  • add more polars metadata in --version info #2073
  • docs: added more info to SECURITY.md 609d4df
  • docs: expanded Goals/Non-Goals 54998e3
  • docs: added Installation "Option 0" quick start bf5bf82
  • added search --literal benchmark

Changed

  • stats, schema, frequency & tojsonl: stats caching refactor, replacing binary encoded stats cache with a simpler JSONL cache #2055
  • rename stats --stats-json option to stats --stats-jsonl #2063
  • changed "broken pipe" error to a warning 7353275
  • docs: update multithreading and caching sections of PERFORMANCE.md 5e6bc45
  • deps: switch to our qsv-optimized fork of csv crate 3fc1e82
  • deps: bump polars from 0.41.3 to 0.42.0 #2051
  • build(deps): bump actix-web from 4.8.0 to 4.9.0 by @dependabot in #2041
  • build(deps): bump flate2 from 1.0.31 to 1.0.32 by @dependabot in #2071
  • build(deps): bump indexmap from 2.3.0 to 2.4.0 by @dependabot in #2049
  • build(deps): bump reqwest from 0.12.6 to 0.12.7 by @dependabot in #2070
  • build(deps): bump rust_decimal from 1.35.0 to 1.36.0 by @dependabot in #2068
  • build(deps): bump serde from 1.0.205 to 1.0.206 by @dependabot in #2043
  • build(deps): bump serde from 1.0.206 to 1.0.207 by @dependabot in #2047
  • build(deps): bump serde from 1.0.207 to 1.0.208 by @dependabot in #2054
  • build(deps): bump serde_json from 1.0.122 to 1.0.124 by @dependabot in #2045
  • build(deps): bump serde_json from 1.0.124 to 1.0.125 by @dependabot in #2052
  • apply select clippy lint suggestions
  • updated several indirect dependencies
  • made various usage text improvements

Fixed

  • stats: fix --output delimiter inferencing based on file extension #2065
  • make process_input helper handle stdin better #2058
  • docs: fix completions for --stats-jsonl and qsv pro installation text update by @rzmk in #2072
  • docs: added Note about why luau feature is disabled in musl binaries ffa2bc5 & 27d0f8e

Removed

  • Removed bincode dependency now that we're using JSONL stats cache #2055 babd92b

Full Changelog: 0.131.1...0.132.0