fst v0.8.2
New features
- Package `fst` has support for multi-threading using OpenMP. Compression, decompression and disk IO have been largely parallelized for (much) improved performance.
- Many new column types are now supported by the `fst` format (where appropriate, both the `double` and `integer` variants are supported):
  - `raw`
  - `DateTime`
  - `integer64`
  - `nanotime`
  - `POSIXct`
  - ordered factors
  - `difftime`

  Thanks @arunsrinivasan, @derekholmes, @phillc73, @HughParsonage, @statquant, @eddelbuettel, @eipi10, and @verajosemanuel for feature requests and helpful discussions.
- Multi-threaded `LZ4` and `ZSTD` compression using methods `compress_fst` and `decompress_fst`. These methods provide a direct API to the `LZ4` and `ZSTD` compressors at speeds of multiple GB/s. A specific block format is used to facilitate parallel processing. For additional stability, hashes can be calculated if required.
- Method `hash_fst` provides an extremely fast multi-threaded 64-bit hashing algorithm based on `xxHash`. Speeds up to the memory bandwidth can be achieved.
- Faster conversion to `data.table` in `read_fst`. Thanks @dselivanov.
- Package `data.table` is now an optional dependency. Thanks @jimhester. Note that in the near future, a dependency on `data.table` will probably be introduced again, as `fst` will get a `data.table`-like interface.
- The `fst` format has a magic number to be able to identify a `fst` file without actually opening the file or requiring the `fstlib` library. Thanks @davidanthoff.
- For development versions, the build number is now shown when `fst` is loaded. Thanks @skanskan.
- Character encodings are preserved for character and factor columns. Thanks @carioca67 and @adrianadermon.
- Naming of `fst` methods is now consistent. Thanks @jimhester and @Brinkhuis.
- The core C++ code with the API to read and write `fst` files, and to use compression and hashing, now lives in a separate library called `fstlib`. Although not visible to the user, this is a major development allowing `fst` to be implemented for languages other than `R` (with comparable performance).
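
The compression and hashing methods listed above can be tried on any raw vector. The following is a minimal sketch, assuming the `fst` package is installed; the data frame, the compression level of 50 and the variable names are illustrative choices, not taken from the release notes.

```r
library(fst)

# Illustrative data with two of the newly supported column types
df <- data.frame(
  id    = 1:1000,
  stamp = as.POSIXct("2018-01-01", tz = "UTC") + 1:1000,   # POSIXct
  grade = factor(rep(c("low", "high"), 500),
                 levels = c("low", "high"), ordered = TRUE) # ordered factor
)

# Write to a temporary fst file and read it back
path <- tempfile(fileext = ".fst")
write_fst(df, path, compress = 50)
back <- read_fst(path)

# Conversion to data.table only works if the optional package is installed
if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- read_fst(path, as.data.table = TRUE)
}

# Direct use of the compressors on a raw vector
raw_vec    <- serialize(df, NULL)
compressed <- compress_fst(raw_vec, compressor = "ZSTD",
                           compression = 50, hash = TRUE)
restored   <- decompress_fst(compressed)
stopifnot(identical(raw_vec, restored))

# xxHash-based hash of the same raw vector
hash_value <- hash_fst(raw_vec)
```

Setting `hash = TRUE` asks `compress_fst` to store hashes with the compressed blocks so that `decompress_fst` can check the data, at some cost in speed.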
Bugs solved
- Tilde-expansion in `write_fst` not correctly processed. Thanks @HughParsonage, @PoGibas.
- Writing more than INT_MAX rows crashes `fst`. Thanks @wei-wu-nyc.
- Incorrect `fst` file is created when an empty `data.table` is saved. Thanks @wei-wu-nyc.
- Error/crash when saving factor column with 0 factor levels. Thanks @martinblostein.
- No warning was given when disk runs out of space during a `fstwrite` operation.
- Stack imbalance warnings under certain conditions. Thanks @ryankennedyio.
Benchmarks
Thanks to @mattdowle, @st-pasha and @phillc73 for valuable discussions on `fst` benchmarks and how to accurately perform (and present) them.
Additional credits
- Special thanks to @arunsrinivasan for a lot of valuable discussions on the future direction of the `fst` package. I hope `fst` may continue to benefit from your experience!
- Thanks to @treysp, @wei-wu-nyc, @khsu15, @PMassicotte, @xiaodaigh, @renkun-ken, @statquant, @tgolden23, @carioca67, @jzzcutler and @MehranMoghtadai for reporting and discussing various bugs, inconsistencies, instabilities or installation problems.
- And thanks to @mperone, @kendonB, @xiaodaigh, @derekholmes, @pmakai, @1beb, @BenoitLondon, @skanskan, @petermuller71, @nextpagesoft, @cawthm, @jeroenjanssens, @dselivanov, @Fpadt and @kbroman for helpful (online) discussions and (feature) requests. All the community feedback is much appreciated and tremendously helps to improve the stability and usability of `fst`! (If I missed anyone, I apologize in advance. Please let me know and I will fix this document ASAP.)