Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix crash when releasing all blocks #181

Closed

Conversation

moroten
Copy link
Contributor

@moroten moroten commented Nov 9, 2023

If data corruption was found in the newest block, all blocks would be released and trigger a panic when no block was allocated for writing. See the stack trace below where lbm.newBlocks=0 and i=0 leads to lbm.allocationAttemptsRemaining = 1 << -1
which panics.

2023/11/08 14:33:20 rpc error: code = Internal desc = Releasing 1 blocks due to a data integrity error
panic: runtime error: negative shift amount
goroutine 15748 [running]:
github.com/buildbarn/bb-storage/pkg/blobstore/local.(*OldCurrentNewLocationBlobMap).startAllocatingFromBlock(...)
	pkg/blobstore/local/old_current_new_location_blob_map.go:259
github.com/buildbarn/bb-storage/pkg/blobstore/local.(*OldCurrentNewLocationBlobMap).findBlockWithSpace(0xc00025e140, 0x232)
	pkg/blobstore/local/old_current_new_location_blob_map.go:302 +0x66a
github.com/buildbarn/bb-storage/pkg/blobstore/local.(*OldCurrentNewLocationBlobMap).Put(0xc00025e140, 0x232)
	pkg/blobstore/local/old_current_new_location_blob_map.go:363 +0x27

If data corruption was found in the newest block, all blocks would be
released and trigger a panic when no block was allocated for writing.
See the stack trace below where lbm.newBlocks=0 and i=0 leads to
lbm.allocationAttemptsRemaining = 1 << -1
which panics.

2023/11/08 14:33:20 rpc error: code = Internal desc = Releasing 1 blocks due to a data integrity error
panic: runtime error: negative shift amount
goroutine 15748 [running]:
github.com/buildbarn/bb-storage/pkg/blobstore/local.(*OldCurrentNewLocationBlobMap).startAllocatingFromBlock(...)
	pkg/blobstore/local/old_current_new_location_blob_map.go:259
github.com/buildbarn/bb-storage/pkg/blobstore/local.(*OldCurrentNewLocationBlobMap).findBlockWithSpace(0xc00025e140, 0x232)
	pkg/blobstore/local/old_current_new_location_blob_map.go:302 +0x66a
github.com/buildbarn/bb-storage/pkg/blobstore/local.(*OldCurrentNewLocationBlobMap).Put(0xc00025e140, 0x232)
	pkg/blobstore/local/old_current_new_location_blob_map.go:363 +0x27
@EdSchouten
Copy link
Member

Thanks for the fix, @moroten. Really appreciated! The test case you provided is great.

With regards to the change to the implementation, I took a slightly different route. The problem with your change is that if the call to PushBack() on line 311 fails, we never end up calling startAllocatingFromBlock(). This leaves OldCurrentNewLocationBlobMap in an inconsistent state. We need to account for the case where OldCurrentNewLocationBlobMap needs to be left behind in a state where the underlying BlockList has zero blocks.

We already account for this in NewOldCurrentNewLocationBlobMap() by setting lbm.allocationBlockIndex = -1 and lbm.allocationAttemptsRemaining = 0. This effectively delays the call to startAllocatingFromBlock(0) to the very bottom of findBlockWithSpace(). I have factored this out into a new helper method resetAllocationBlockIndex(), which we now call in all places where we need to rewind.

@moroten
Copy link
Contributor Author

moroten commented Nov 11, 2023

Setting lbm.allocationAttemptsRemaining = 0 is a good trigger for the reset. Nice!

@moroten moroten deleted the fix-block-all-release-panic branch November 11, 2023 22:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants