fst v0.8.2

@MarcusKlik MarcusKlik released this 13 Dec 23:43
· 302 commits to master since this release

fst package v0.8.2

New features

  • Package fst now supports multi-threading using OpenMP. Compression, decompression and disk IO have been largely parallelized for (much) improved performance.

  • Many new column types are now supported by the fst format (where appropriate, both the double and integer variants are supported):

    • raw
    • DateTime
    • integer64
    • nanotime
    • POSIXct
    • ordered factors
    • difftime

    Thanks @arunsrinivasan, @derekholmes, @phillc73, @HughParsonage, @statquant, @eddelbuettel, @eipi10, and @verajosemanuel for feature requests and helpful discussions.

  • Multi-threaded LZ4 and ZSTD compression using methods compress_fst and decompress_fst. These methods provide a direct API to the LZ4 and ZSTD compressors at speeds of multiple GB/s. A specific block format is used to facilitate parallel processing. For additional stability, hashes can be calculated if required.

  • Method hash_fst provides an extremely fast multi-threaded 64-bit hashing algorithm based on xxHash. Speeds up to the memory bandwidth can be achieved.

  • Faster conversion to data.table in read_fst. Thanks @dselivanov.

  • Package data.table is now an optional dependency. Thanks @jimhester. Note that in the near future, a dependency on data.table will probably be introduced again, as fst will get a data.table-like interface.

  • The fst format has a magic number to be able to identify a fst file without actually opening the file or requiring the fstlib library. Thanks @davidanthoff.

  • For development versions, the build number is now shown when fst is loaded. Thanks @skanskan.

  • Character encodings are preserved for character and factor columns. Thanks @carioca67 and @adrianadermon.

  • Naming of fst methods is now consistent. Thanks @jimhester and @Brinkhuis.

  • The core C++ code with the API to read and write fst files, and to use compression and hashing, now lives in a separate library called fstlib. Although not visible to the user, this is a major development that allows fst to be implemented for languages other than R (with comparable performance).
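
The block format mentioned for compress_fst and decompress_fst can be illustrated with a small sketch. It is written in Python and is not fst's actual file layout or API: the block size, the function names compress_blocks and decompress_blocks, the use of zlib in place of LZ4/ZSTD, and the use of an 8-byte BLAKE2b digest in place of xxHash are all assumptions made for illustration. The idea it shows is only that independent blocks can be compressed and decompressed in parallel, and that optional per-block hashes allow corruption to be detected before decompression.

```python
import hashlib
import zlib
from concurrent.futures import ThreadPoolExecutor

# Hypothetical block size, chosen for illustration (not fst's actual value).
BLOCK_SIZE = 16 * 1024


def block_hash(block: bytes) -> bytes:
    # 64-bit digest; BLAKE2b is a stand-in for xxHash here.
    return hashlib.blake2b(block, digest_size=8).digest()


def compress_blocks(data: bytes, block_size: int = BLOCK_SIZE, with_hash: bool = False):
    """Split data into fixed-size blocks and compress each one independently."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    # Blocks do not depend on each other, so the work can be spread over threads.
    with ThreadPoolExecutor() as pool:
        compressed = list(pool.map(zlib.compress, blocks))
    hashes = [block_hash(c) for c in compressed] if with_hash else None
    return compressed, hashes


def decompress_blocks(compressed, hashes=None) -> bytes:
    """Verify the optional hashes, then decompress all blocks and join them."""
    if hashes is not None:
        for index, (block, expected) in enumerate(zip(compressed, hashes)):
            if block_hash(block) != expected:
                raise ValueError(f"hash mismatch in block {index}")
    with ThreadPoolExecutor() as pool:
        # pool.map preserves input order, so the blocks are joined correctly.
        return b"".join(pool.map(zlib.decompress, compressed))
```

A round trip would call compress_blocks(data, with_hash=True) and pass both returned values to decompress_blocks. Hashing is optional in this sketch as well, since it costs extra time on both sides.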

Bugs solved

  • Tilde-expansion in write_fst not correctly processed. Thanks @HughParsonage, @PoGibas.

  • Writing more than INT_MAX rows crashes fst. Thanks @wei-wu-nyc.

  • Incorrect fst file is created when an empty data.table is saved. Thanks @wei-wu-nyc.

  • Error/crash when saving factor column with 0 factor levels. Thanks @martinblostein.

  • No warning was given when disk runs out of space during a fstwrite operation.

  • Stack imbalance warnings under certain conditions. Thanks @ryankennedyio.

Benchmarks

Thanks to @mattdowle, @st-pasha, @phillc73 for valuable discussions on fst benchmarks and how to accurately perform (and present) them.

Additional credits