
Data loss when using compression (ros2 bag record) #978

Closed
chrmel opened this issue Mar 25, 2022 · 10 comments
Labels
bug Something isn't working

Comments

@chrmel

chrmel commented Mar 25, 2022

Description

When recording large bags with ros2 bag record --max-bag-size=2000000000 --compression-mode file --compression-format zstd, topics get lost during compression.

Expected Behavior

When a new bag is opened and the old one gets compressed, I would expect the new bag to contain all published topics (including those published while the compression is running).

Actual Behavior

While the just-closed bag (split due to max-bag-size) is being compressed, no topics are recorded in the new bag.

To Reproduce

  1. Start a system producing a lot of data (in my case a camera (1280x720 @ 10 fps) and motion data from a Bluetooth device)
  2. Record raw image topics and motion data with: ros2 bag record --max-bag-size=2000000000 --compression-mode file --compression-format zstd
  3. Replay the recorded bag

System

  • OS: Ubuntu 18.04
  • ROS 2 Distro: Foxy (built from source)
  • Version: ros2

Additional Information

When recording bags without compression this issue does not occur.

Suspicion

Is it possible that the compression of the bag runs in a new thread rather than a separate process? As you probably know, in Python a thread cannot be scheduled onto a different CPU core; only a process can. If recording and compressing are two threads running in the same process, could the compression thread block the recording thread by using all the resources of that one CPU core?
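The suspicion above can be illustrated with a small sketch. In CPython, two CPU-bound threads share the interpreter lock (GIL), so a heavy "compression" thread slows down a "recording" thread even on a multi-core machine. Note that, as clarified later in this thread, rosbag2's recording logic is actually C++, so the GIL itself does not apply there; this only demonstrates the general starvation concern the reporter raises.

```python
# Minimal sketch of thread starvation under the GIL: a busy
# "compressor" thread reduces how often a "recorder" thread runs.
import threading
import time

def count_iterations(stop_event, result):
    """Simulate a recorder loop: count how often it gets to run."""
    n = 0
    while not stop_event.is_set():
        n += 1
    result.append(n)

def busy_compress(stop_event):
    """Simulate a CPU-bound compressor competing for the GIL."""
    while not stop_event.is_set():
        sum(i * i for i in range(1000))  # pure-Python busy work

def measure(with_compressor, duration=0.5):
    """Run the recorder for `duration` seconds, optionally alongside
    the compressor, and return the recorder's iteration count."""
    stop = threading.Event()
    result = []
    threads = [threading.Thread(target=count_iterations, args=(stop, result))]
    if with_compressor:
        threads.append(threading.Thread(target=busy_compress, args=(stop,)))
    for t in threads:
        t.start()
    time.sleep(duration)
    stop.set()
    for t in threads:
        t.join()
    return result[0]

alone = measure(with_compressor=False)
contended = measure(with_compressor=True)
print(f"recorder iterations alone:     {alone}")
print(f"recorder iterations contended: {contended}")
```

On a typical machine the contended count drops noticeably, which is the behavior the reporter suspects (regardless of whether the real cause here is the GIL or plain CPU contention in C++).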

Thank you!

Possibly related to #973

@chrmel added the bug label Mar 25, 2022
@emersonknapp
Collaborator

For context, the threading logic all happens in a C++ layer - the Python CLI is only a thin wrapper around calling the C++ core.

You mention you're building from source; are you using the foxy branch, or the foxy-future branch? I might recommend foxy-future as it has many performance improvements and bugfixes that could not be released officially into Foxy due to API breakage.

@MichaelOrlov
Contributor

This may be related to #936, #866 and #647.

@chrmel
Author

chrmel commented Mar 30, 2022

> For context, the threading logic all happens in a C++ layer - the Python CLI is only a thin wrapper around calling the C++ core.

Thank you for the clarification about the Python CLI wrapping the C++ logic. I did not know how this worked.

> You mention you're building from source, are you using the foxy branch, or the foxy-future branch? I might recommend foxy-future as it has many performance improvements and bugfixes that could not be released officially into Foxy due to API breakage

Actually, I am not quite sure which branch. My work is largely based on the ros.foxy.Dockerfile from dusty-nv/jetson-containers, which uses rosinstall_generator --deps --rosdistro foxy ros_base ... to fetch the repos.

Update: I am in fact using the foxy branch. I tried building the foxy-future branch, but rosinstall_generator only allows the base branch names (foxy, galactic, ...).

@amacneil
Contributor

amacneil commented May 1, 2022

This is a very concerning bug - data loss is a worst-case scenario for a data recording tool. Has anyone tried to repro/investigate it in galactic/humble/rolling?

@clalancette
Contributor

> This is a very concerning bug - data loss is a worst case scenario for a data recording tool. Has anyone tried to repro/investigate it in galactic/humble/rolling?

I haven't investigated it myself, but looking at the Dockerfile in use, the original reporter is likely using the foxy branch of this repository. That branch has known performance and dataloss issues, which is why we recommend the foxy-future branch there. It would be interesting to see if the original problem can be reproduced with the foxy-future branch, which is much closer to what is in Galactic.

@amacneil
Contributor

amacneil commented May 3, 2022

Good point.

@chrmel have you tested this in Galactic?

@chrmel
Author

chrmel commented May 3, 2022

@amacneil I tried building from source with the galactic branch but it did not succeed. I can try testing it with the pre-built packages.

@chrmel
Author

chrmel commented May 3, 2022

@amacneil, @clalancette, @emersonknapp

Ok, I tested my setup with the pre-built Debian packages for the foxy and galactic distros.

Testing with sample data

Every step in the data represents the time a new split bag file is created.
(screenshot: sample data with a step at each bag split)

foxy

Recording data with ros2 bag record --max-bag-size=500000000 --compression-mode file --compression-format zstd /image_raw /image_raw/compressed /motion.

Same issue as described when built from source. Every time a new split bag file starts there is a data gap.

Play the bag with ros2 bag play --topics=/motion --rate=1.0 rosbag2_foxy/
(screenshot: /motion playback with a data gap at each split)

galactic and rolling

Recording data with ros2 bag record --max-bag-size=500000000 /image_raw /image_raw/compressed /motion.

I was not able to reproduce the problem because a different problem occurred.
It seems that split bag files cannot be played properly in galactic (with or without compression). When playing the bags, only the last file of the sequence of split files is played properly; preceding files seem to be either skipped or their data squashed at the beginning of the last split bag file.

Play the bag with ros2 bag play --read-ahead-queue-size 50000 --topics=/motion --rate=1.0 rosbag2/. Only the data from the last split bag file is played (you can see the last distinctive ripple in the data).
(screenshot: /motion playback showing only the final split file's data)

@MichaelOrlov
Contributor

The issue of only the last bag of a split being played was found before and is described in #966.

@MichaelOrlov
Contributor

Notes:

  1. With SQLite3 there is a current design limitation and suboptimal performance when --max-bag-size differs from 0.
    See Improving performance of should_split_bagfile() #647 (comment). It is recommended to either use the MCAP file format or not use the --max-bag-size parameter with the SQLite3 backend for better performance.
  2. Yes, there is a known issue that compression threads may consume all CPU resources, leaving the recording threads starved, which leads to messages being lost. The solution will be done in Add option to set compression threads priority #1457. However, there is no CLI parameter yet for the compression_thread_priority option; it will only be available via node parameters for the composable node. A follow-up PR adding a CLI parameter is welcome.

  • Closing this issue as stale and since workarounds already exist.
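For readers looking for the node-parameter workaround, a parameter file along these lines could set the compression thread priority. This is a hedged sketch: only the parameter name compression_thread_priority comes from the note above; the node name and the exact value semantics are assumptions, so check the rosbag2 documentation for your distro.

```yaml
# Hypothetical parameter file for the composable recorder node.
# The node name "recorder" is a placeholder; verify it for your setup.
recorder:
  ros__parameters:
    # Lower the compression threads' scheduling priority so they
    # cannot starve the recording threads (on Linux, a positive
    # value in the nice-value sense means lower priority).
    compression_thread_priority: 10
```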
