0.127.0
Enhanced Frequency Analysis
This is a quick release adding several enhancements to the frequency
command for more detailed frequency analysis. The frequency
command now includes a percentage column, calculates "Other"
values, and supports limiting unique counts and negative limits.
These options provide additional context for Datapusher+, qsv-pro and describegpt,
so their metadata inferences are more accurate and comprehensive.
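The limit behavior mentioned above can be sketched in Python. This is a minimal illustration under stated assumptions, not qsv's implementation (qsv itself is written in Rust): apply_limit is a hypothetical helper, its default of 10 mirrors the default --limit setting, and both the negative-limit rule (keep only values that occur at least |limit| times) and the alphabetical tie-breaking are assumptions.

```python
from collections import Counter


def apply_limit(counts, limit=10):
    """Select rows from a frequency table according to `limit`.

    Assumed semantics (for illustration only, not taken from qsv's source):
      limit > 0  -> keep the `limit` most frequent values
      limit == 0 -> no limit, keep every value
      limit < 0  -> keep only values whose count is >= abs(limit)
    """
    # Most frequent first; ties broken alphabetically (also an assumption).
    rows = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    if limit > 0:
        return rows[:limit]
    if limit < 0:
        return [(value, count) for value, count in rows if count >= -limit]
    return rows


counts = Counter(["NY", "NY", "NY", "NJ", "NJ", "CA", "CA", "TX"])
print(apply_limit(counts, 1))   # [('NY', 3)]
print(apply_limit(counts, -2))  # [('NY', 3), ('CA', 2), ('NJ', 2)]
print(apply_limit(counts, 0))   # [('NY', 3), ('CA', 2), ('NJ', 2), ('TX', 1)]
```

A negative limit acts as a minimum-count threshold rather than a row cap, which is useful when the number of "interesting" values is not known in advance.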
Previously, for a 775-row CSV file containing one column named state
with entries for all 50 states, frequency
only showed the top 10 values[1]:
qsv frequency freq_state_example.csv | qsv table
field  value  count
state  NY     100
state  NJ     70
state  CA     60
state  MA     55
state  FL     45
state  TX     43
state  NM     40
state  AZ     39
state  NV     38
state  MI     35
Now there's a new percentage
column and an "Other"
values calculation, both of which have configurable options:
qsv frequency freq_state_example.csv | qsv table
field  value       count  percentage
state  NY          100    12.90323
state  NJ          70     9.03226
state  CA          60     7.74194
state  MA          55     7.09677
state  FL          45     5.80645
state  TX          43     5.54839
state  NM          40     5.16129
state  AZ          39     5.03226
state  NV          38     4.90323
state  MI          35     4.51613
state  Other (40)  250    32.25806
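Each percentage is the value's count divided by the total row count: NY is 100 / 775 ≈ 12.90323%, and the "Other (40)" row lumps together the 40 states outside the top 10, whose 250 rows give 250 / 775 ≈ 32.25806%. A minimal Python sketch of that calculation follows; frequency_table is a hypothetical helper, rounding to 5 decimal places simply matches the output shown, and the tie-breaking order is an assumption rather than qsv's documented behavior.

```python
from collections import Counter


def frequency_table(values, limit=10, decimals=5):
    """Build (value, count, percentage) rows for the `limit` most frequent
    values, followed by one "Other (N)" row that aggregates the N remaining
    unique values. Percentages are relative to the total number of values.
    """
    counts = Counter(values)
    total = sum(counts.values())
    # Most frequent first; ties broken alphabetically (an assumption).
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    top, rest = ranked[:limit], ranked[limit:]

    def pct(count):
        return round(100 * count / total, decimals)

    rows = [(value, count, pct(count)) for value, count in top]
    if rest:
        other = sum(count for _, count in rest)
        rows.append((f"Other ({len(rest)})", other, pct(other)))
    return rows


values = ["NY"] * 4 + ["NJ"] * 3 + ["CA"] * 2 + ["TX"]
for row in frequency_table(values, limit=2):
    print(row)
# ('NY', 4, 40.0)
# ('NJ', 3, 30.0)
# ('Other (2)', 3, 30.0)
```

Because the "Other" row absorbs everything past the limit, the percentages always sum to roughly 100, so a truncated table still conveys how much of the column it covers.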
This release is also out of cycle to address a big performance regression in the excel
command caused by unnecessary formula info retrieval for the --error-format
option introduced in 0.126.0. This has been fixed, and the excel
command is now back to its speedy self.
Added
- frequency: added percentage column; "Other" values calculation, implementing #1774 #1775
- benchmarks: added new frequency and excel benchmarks b83ad3a
Changed
- contrib(bashly): update completions.bash for qsv v0.126.0 by @rzmk in #1771
- build(deps): bump mimalloc from 0.1.39 to 0.1.41 by @dependabot in #1772
- build(deps): bump qsv-stats from 0.14.0 to 0.15.0 by @dependabot in #1773
- updated several indirect dependencies
- applied select clippy recommendations
Fixed
- excel: fixed performance regression because qsv was unnecessarily getting formula info (an expensive operation) for the --error-format option even when not required 772af34
- renamed 0.126.0 sqlp_vs_duckdb benchmark results so they're next to each other for easy direct comparison 7bcd59e.
  Per the benchmarks, sqlp is 2.87 times faster than duckdb v0.10.2 for a simple aggregation (0.066 secs vs 0.19 secs), and 1.42 times faster for an "expensive" aggregation (0.143 secs vs 0.203 secs).
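The speedup figures are DuckDB's time divided by sqlp's time. The short Python check here recomputes them from the timings quoted above (hard-coded; speedup is a hypothetical helper). Because those timings are rounded, the recomputed simple-aggregation ratio comes out near 2.88 rather than exactly 2.87; the underlying benchmark results presumably carry more precision.

```python
# Rounded timings (in seconds) as quoted in the release notes.
TIMINGS = {
    "simple aggregation": {"sqlp": 0.066, "duckdb": 0.19},
    "expensive aggregation": {"sqlp": 0.143, "duckdb": 0.203},
}


def speedup(baseline_secs, candidate_secs):
    """Return how many times faster the candidate ran than the baseline."""
    return baseline_secs / candidate_secs


for name, t in TIMINGS.items():
    ratio = speedup(t["duckdb"], t["sqlp"])
    print(f"{name}: sqlp is {ratio:.2f}x faster than duckdb")
```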
Full Changelog: 0.126.0...0.127.0
[1] With its default --limit setting of 10, frequency only shows the top 10 unique values in the column, sorted by number of occurrences.