R crashes while reading an fst file #271

sanjmeh · 2022-11-07T12:03:59Z

A simple fst read can send R crashing down, if the file is corrupted !

How could a data file be so bad that it sends R crashing? Perhaps the fst read function has some aggressive memory management that interferes with the OS.

To replicate, just executing a simple

fst(filename)

And you will get:

<fst file>
323140 rows, 4 columns (1204011660.fst)

And then a series of error messages, followed by R crashing.

[2706278:2706278:20221107,172349.750548:ERROR process_memory_range.cc:86] read out of range
[2706278:2706278:20221107,172349.750641:ERROR elf_image_reader.cc:558] missing nul-terminator
[2706278:2706278:20221107,172349.750779:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.754375:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.754446:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.754496:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.754544:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.754599:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.754816:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.755118:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.755175:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.755228:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.755292:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.755729:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.755814:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.755867:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.755921:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.755983:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.756097:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.756154:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.756204:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.756255:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.756320:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.756367:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.756419:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.756469:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.756521:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.756573:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.756619:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.756669:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.756716:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.756769:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.756819:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.756873:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.756923:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.756976:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.757028:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.757079:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.757193:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.757244:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.757325:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.757375:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.757425:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.757472:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.757521:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.757578:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.757630:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.757683:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.757733:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.757785:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.757837:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.757893:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.758081:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.758130:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.758180:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.758228:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.758311:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.758359:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.758401:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.758456:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.758506:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.758557:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.758610:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.758658:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.758721:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.758765:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.758840:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.758917:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.758996:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.759050:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.759100:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.759149:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.759200:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.759249:ERROR elf_dynamic_array_reader.h:61] tag not found
[2706278:2706278:20221107,172349.760165:ERROR file_io_posix.cc:140] open /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq: No such file or directory (2)
[2706278:2706278:20221107,172349.760187:ERROR file_io_posix.cc:140] open /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq: No such file or directory (2)
[2706278:2706280:20221107,172349.766128:ERROR directory_reader_posix.cc:42] opendir: No such file or directory (2)

I have uploaded the offending file here.
https://drive.google.com/file/d/1hYJLAcqct_5JxTNNXN1c-qKH9bWFhgmO/view?usp=sharing

The text was updated successfully, but these errors were encountered:

eddelbuettel · 2022-11-15T19:17:15Z

Can you please try to turn this into self-contained reproducible example with a script creating a file which subsequently crashes R?

No sane person will read a random binary file off the internet.

sanjmeh · 2022-11-16T06:55:21Z

@eddelbuettel : thanks for looking at this.
Generating the corrupted file with a script looks very difficult because currently there are thousands of fst files created/ overwritten through a crontab scheduler that runs every minute (IOT data keeps coming from thousands of vehicles and we store their tracking, and fuel level data in fst files). A corruption happens in around 1 in thousand writes, and we donot (yet) know how that corruption happens. It is a random event. I suspected the multi core read/write of fst was creating impossible memory allocations but that was just a hunch. I really donot know how to recreate the corruptions using a script.

To prevent multicore I have also added the folowing two lines, as recommended in one of the github issue threads.

fst::threads_fst(nr_of_threads = 1)
fst::threads_fst(reset_after_fork = F)

But I still regularly get the corruptions and the resultant crashes.

MarcusKlik · 2022-11-16T10:01:35Z

Hi @sanjmeh, thanks for reporting. And I will definitely adhere to @eddelbuettel's warning to not try to load your binary file :-)

In the fst format, all meta-data is hashed. So if this data becomes corrupted for some reason, it's extremely unlikely that the file will read without throwing a (friendly) error. Obviously, a maleficent agent could alter the metadata and the stored hashes to overcome this problem and mess up a file read.

The metadata determines how much memory is allocated for storing the result table. However, the actual column data is decompressed from data blocks in the file using zstd or lz4 decompression. In rare cases, malformed data blocks can cause a crash in those libraries during this decompression phase.

To remedy, we could use safe versions of the lz4 and zstd decompression functions, but these will destroy the performance.

Alternatively, fst could provide an option to hash the datablock as well (something like write_fst(x, path, hash_data = TRUE)). For these hashed files, reads could be done using read_fst(path, check_hashes = TRUE)) for example.

This will have a smaller impact on performance and could be used for files read from internet or other suspicious sources (and would need to be done only once after downloading).

sanjmeh · 2022-11-16T15:27:53Z

Thank you @MarcusKlik and welcome back to your own repository. That was indeed a long break and I was afraid if you would be back soon.
Now on your suggested path:

Alternatively, fst could provide an option to hash the datablock as well (something like write_fst(x, path, hash_data = > TRUE)). For these hashed files, reads could be done using read_fst(path, check_hashes = TRUE)) for example.

I donot see the hash_data argument in write.fst()..
I presume you are proposing this functionality and it is not existing in the current version - the feature to hash data.

Meanwhile I will test the first alternative:

To remedy, we could use safe versions of the lz4 and zstd decompression functions, but these will destroy the performance.

If you may please specify how to try the safe options, it will be helpful, as I cannot locate the arguments till now.

By the way can I request you to have a look at the fst file I attached and not treat it as any random binary file from the internet. I am here to claim that it is originating from my system, and not from the internet :-)

eddelbuettel · 2022-11-16T15:31:16Z

@sanjmeh As another open-source volunteer I am am a little surprised by your tone. We give you our labor for free.

sanjmeh · 2022-11-16T18:05:27Z

Oh my! my intention is not at all to offend you guys. You are doing a fantastic job in the open source community of R, and so would never want to turn you away. I hope I am making the fst package more popular by asking to make it more robust. Let me know what was hurtful. thanks.

MarcusKlik · 2022-11-17T10:37:05Z

Yes, unfortunately time is a scarce resource that can only be spent once (except for @eddelbuettel, my theory is that Dirk is somehow able to clone himself into identical copies that can do work in parallel, proof pending...) :-)

About your file @sanjmeh, I will scan the metadata from a container and take a look where things go wrong.

sanjmeh · 2022-12-27T09:03:25Z

Hi Marcus, any progress on the bug?

jfdesomzee · 2023-07-14T08:39:01Z

Hello,

I'm suffering from this bug too. Never had an issue before it appears when multiple machines started to write files on the shared drive.
Is there a way to test the file before trying to load it? Whenever I read a corrupted file I R crashes if I could get an error instead my problems would be solved.
fst rocks I want to keep using it. Please help. And thank you for the good job

AntonWijbenga · 2024-02-08T08:54:55Z

I have previously encoutered the error as well and today again. I suspect the .fst file becomes corrupt during a 'forced' system reboot on a Windows machine (which is a secondary solution on premise, primary/production is running in the cloud on Ubuntu).

I can read the metadata of the .fst file fine, but reading the whole file causes R to crash. I would be great if somehow this just results in an error instead of crashing R. I'm happy to provide the .fst file if needed for testing.

Otherwise the fst package is great and so far I haven't encountered a better alternative (except for maybe parquet because of cross languate (i.e. Python) support).

jfdesomzee · 2024-02-09T08:41:27Z

I switched from fst to qs. About the same perfomances, a bit faster. Only you need to read the whole file you cannot query rows or columns. But you can store any R object and store attributes.

sanjmeh · 2024-02-09T17:29:39Z

I switched from fst to qs. About the same perfomances, a bit faster. Only you need to read the whole file you cannot query rows or columns. But you can store any R object and store attributes.

And what is its advantage over RDS files?

eddelbuettel · 2024-02-09T17:38:48Z

@sanjmeh Start here: https://github.com/traversc/qs

qs and fst are both very good and improve over rds files which themselves are good and portable across R installations.

AntonWijbenga · 2024-02-12T08:09:11Z

Thank you for the tip. However, the ability to read only certain rows or columns is one of the main reasons I use the fst package.

I have matrices with measurements for each minute for a certain number of sensors. As a result I have matrices that are 1.440 (number of minutes in a day) x 18.000 or 80.000 (depending on the sensor type). Using these daily matrices and their pivoted clones, I can very quickly read just one minute of a specific day (the date is the filename, minute the n.th column) or read the 24 hour series of a sensor (again the date is the filename and the column name the ID of the sensor).

Reading such a column (or a set of hem) only takes a few milliseconds. Reading an entire year of a couple of sensor data (using their ID's) is done in a couple of seconds. It is very quick to create certain aggregates (over time) that way.

The same is true for reading several minute data for all sensors. For example, you can very quickly calculate a typical (average) value for a tuesday 11:00 based on a set of previous tuesdays (also 11:00).

The entire dataset is historically available from 2018 and is still updated every minute. It is about 500GB (compressed) and stored on SSD based storage (FSx for Lustre at AWS). Results are presented through a dashboard.

For these purposes it is simply way too slow to read the whole matrix every time. With the solution above, I can read in the 'sensor' dimension and 'time' dimension very quickly no matter if it is about recent or older data (no caching needed). I have also tested databases, but they are either too slow or too costly.

sanjmeh · 2024-02-12T11:11:03Z

I have matrices with measurements for each minute for a certain number of sensors. As a result I have matrices that are 1.440 (number of minutes in a day) x 18.000 or 80.000 (depending on the sensor type). Using these daily matrices and their pivoted clones, I can very quickly read just one minute of a specific day (the date is the filename, minute the n.th column) or read the 24 hour series of a sensor (again the date is the filename and the column name the ID of the sensor).

I have exactly the same application and we also started with fst package for exactly this reason. But I now have moved to mariadb due to this occassional corruption of the fst file. We use RDS for data upto 100 MB and move the data to RDBMS with primary index as time stamp so can quickly query a specific time range.

sanjmeh mentioned this issue Nov 13, 2022

Bug Report: In R: fst crash in both saving and reading very large files. (500M+ rows and 50+ columns, 100+GB) #46

Closed

MarcusKlik self-assigned this Nov 16, 2022

MarcusKlik added the bug label Nov 16, 2022

MarcusKlik added this to the Candidate milestone Nov 16, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

R crashes while reading an fst file #271

R crashes while reading an fst file #271

sanjmeh commented Nov 7, 2022 •

edited

Loading

eddelbuettel commented Nov 15, 2022 •

edited

Loading

sanjmeh commented Nov 16, 2022

MarcusKlik commented Nov 16, 2022

sanjmeh commented Nov 16, 2022

eddelbuettel commented Nov 16, 2022

sanjmeh commented Nov 16, 2022

MarcusKlik commented Nov 17, 2022

sanjmeh commented Dec 27, 2022 •

edited

Loading

jfdesomzee commented Jul 14, 2023

AntonWijbenga commented Feb 8, 2024

jfdesomzee commented Feb 9, 2024

sanjmeh commented Feb 9, 2024

eddelbuettel commented Feb 9, 2024

AntonWijbenga commented Feb 12, 2024 •

edited

Loading

sanjmeh commented Feb 12, 2024

R crashes while reading an fst file #271

R crashes while reading an fst file #271

Comments

sanjmeh commented Nov 7, 2022 • edited Loading

eddelbuettel commented Nov 15, 2022 • edited Loading

sanjmeh commented Nov 16, 2022

MarcusKlik commented Nov 16, 2022

sanjmeh commented Nov 16, 2022

eddelbuettel commented Nov 16, 2022

sanjmeh commented Nov 16, 2022

MarcusKlik commented Nov 17, 2022

sanjmeh commented Dec 27, 2022 • edited Loading

jfdesomzee commented Jul 14, 2023

AntonWijbenga commented Feb 8, 2024

jfdesomzee commented Feb 9, 2024

sanjmeh commented Feb 9, 2024

eddelbuettel commented Feb 9, 2024

AntonWijbenga commented Feb 12, 2024 • edited Loading

sanjmeh commented Feb 12, 2024

sanjmeh commented Nov 7, 2022 •

edited

Loading

eddelbuettel commented Nov 15, 2022 •

edited

Loading

sanjmeh commented Dec 27, 2022 •

edited

Loading

AntonWijbenga commented Feb 12, 2024 •

edited

Loading