use parent process to write db and child to detect deadlock instead o… #1
When trying to run this in cactus, I keep getting deadlocks whenever it tries to write a snapshot.
The problem seems to be the function TimedDB::dump_snapshot_atomic calling fork() (after checking the db type; StashDB in this case is "forkable"). The child process goes on to write the snapshot, and the parent hangs around waiting for it to complete. If the child crashes or hangs, the parent logs an error and returns false.

But I think there's a bug somewhere where the parent thread isn't locking everything it needs to, in which case fork's results are undefined. So when the child goes to lock a mutex, it waits forever. Rather than spend all year looking for the root problem, I'm just flipping the roles here: have the parent write the db and the child hang around waiting for a deadlock. This seems to fix the issue I'm seeing.

Side effects will be
I think that all this is fine for the purposes of cactus, but I will try it on a big example before merging this to master here and in cactus.
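
For reviewers who want the role flip spelled out, here is a minimal, self-contained sketch of the pattern. It is not the code in this PR: `write_with_watchdog`, `WatchdogResult`, the pipe-based completion signal, and the choice to have the child only print a message on timeout are all assumptions made for illustration. The parent does the write itself (so it never depends on mutex state copied across fork()), and the forked child only calls async-signal-safe functions while it waits.

```cpp
#include <poll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <functional>

// Hypothetical result type: what the parent learned after the write.
struct WatchdogResult {
  bool write_ok;   // return value of the write callback
  bool timed_out;  // true if the child gave up waiting before the write ended
};

// Hypothetical helper: the parent runs write_fn, the forked child is a watchdog.
WatchdogResult write_with_watchdog(const std::function<bool()>& write_fn,
                                   int timeout_ms) {
  int fds[2];
  if (pipe(fds) != 0) return {false, false};

  pid_t pid = fork();
  if (pid < 0) {
    close(fds[0]);
    close(fds[1]);
    return {false, false};
  }

  if (pid == 0) {
    // Child: takes no locks and allocates nothing. It waits until the parent
    // closes the write end of the pipe, or until the timeout expires.
    close(fds[1]);
    struct pollfd pfd;
    pfd.fd = fds[0];
    pfd.events = POLLIN;
    pfd.revents = 0;
    int rv = poll(&pfd, 1, timeout_ms);
    if (rv == 0) {
      const char msg[] = "watchdog: write did not finish in time\n";
      ssize_t ignored = write(STDERR_FILENO, msg, sizeof(msg) - 1);
      (void)ignored;
      _exit(1);  // exit code 1 means "possible deadlock"
    }
    _exit(0);
  }

  // Parent: do the real work with all of its threads and locks intact.
  close(fds[0]);
  bool ok = write_fn();
  close(fds[1]);  // last write end closed: wakes the child's poll()

  int status = 0;
  waitpid(pid, &status, 0);
  bool timed_out = WIFEXITED(status) && WEXITSTATUS(status) == 1;
  return {ok, timed_out};
}
```

Note that if the parent really does deadlock inside `write_fn`, this function never returns; the only signal is the child's message on stderr. What the watchdog should do beyond reporting is a separate design decision that this sketch leaves open.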