Fixes for pre-init recovery and handling of gaps in mem tables due to snapshotting #456

Merged: 1 commit, Jul 19, 2024

@kjnilsson kjnilsson commented Jul 18, 2024

The first thing a Ra system does when it starts is run a pre-init
phase for each registered Ra server, mostly to recover the
ra_log_snapshot_state table. This appears to have been broken
for a long time, so the ra_log_snapshot_state table has not been
populated ahead of WAL / segment writer recovery. This was fine, as
this step was just an optimisation and never affected the workings of
the Ra infrastructure.

However, since 5b7a265 this table is needed
in order to avoid the segment writer crashing when it detects
a gap (caused by the WAL dropping entries lower than the current
snapshot).

This commit mostly fixes the pre-init process but also addresses
a potential race condition that could still cause the segment
writer to crash for the same reason.

See: rabbitmq/rabbitmq-server#11712
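
To illustrate the failure mode described above, here is a minimal Python sketch (not Ra's actual Erlang code). The helper `entries_to_flush`, its parameters, and the dict-based mem table are all hypothetical. It assumes that indexes at or below the snapshot index may legitimately be missing from the mem table and can be skipped, while a missing index above the snapshot is a real gap.

```python
def entries_to_flush(mem_table, first_idx, last_idx, snapshot_idx):
    """Return the (index, entry) pairs a segment writer would flush.

    Hypothetical helper: indexes at or below snapshot_idx may be absent
    from the mem table (the WAL dropped them), so they are skipped
    instead of being treated as a gap.
    """
    # Start after the snapshot if the requested range begins below it.
    start = max(first_idx, snapshot_idx + 1)
    flushed = []
    for idx in range(start, last_idx + 1):
        if idx not in mem_table:
            # A missing index above the snapshot is a genuine gap.
            raise KeyError(f"missing index {idx} above snapshot {snapshot_idx}")
        flushed.append((idx, mem_table[idx]))
    return flushed


# Indexes 1..3 were dropped by the WAL because a snapshot exists at index 3.
mem_table = {4: "d", 5: "e", 6: "f"}

# With the snapshot index known, the range 1..6 can be flushed without error.
print(entries_to_flush(mem_table, 1, 6, snapshot_idx=3))
```

If the snapshot index has not been recovered before the segment writer runs (modelled here by passing `snapshot_idx=-1`), the same call raises `KeyError` at index 1, which loosely corresponds to the crash this PR is meant to prevent.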

@kjnilsson kjnilsson changed the title from "fixes" to "fixed to pre-init recovery and handling of gaps in mem tables due to snapshotting" Jul 18, 2024
@kjnilsson kjnilsson changed the title from "fixed to pre-init recovery and handling of gaps in mem tables due to snapshotting" to "Fixes for pre-init recovery and handling of gaps in mem tables due to snapshotting" Jul 18, 2024
@the-mikedavis

tiny typo in the commit message & description: apperas => appears

The first thing a Ra system does when it starts is run a pre-init
phase for each registered Ra server, mostly to recover the
ra_log_snapshot_state table. This appears to have been broken
for a long time, so the ra_log_snapshot_state table has not been
populated ahead of WAL / segment writer recovery. This was fine, as
this step was just an optimisation and never affected the workings of
the Ra infrastructure.

However, since 5b7a265 this table is needed
in order to avoid the segment writer crashing when it detects
a gap (caused by the WAL dropping entries lower than the current
snapshot).

This commit mostly fixes the pre-init process but also addresses
a potential race condition that could still cause the segment
writer to crash for the same reason.

maybe de-flake
@kjnilsson kjnilsson marked this pull request as ready for review July 19, 2024 10:03
@kjnilsson

tiny typo in the commit message & description: apperas => appears

fixed

@kjnilsson kjnilsson merged commit 94cb3d2 into main Jul 19, 2024
9 checks passed
@michaelklishin michaelklishin added this to the 2.13.2 milestone Jul 19, 2024