Hi,

Unfortunately I'm still having big issues on every resync, which (almost always) leads to corruption in the filesystem. Is there a way to mark a volume as needing an e2fsck before its next mount?

The issue is that the volume is always in use, so I can't do it manually. In this specific case it's in use by cloudnative-pg, which does not use Deployments, so I can't even force it down to 0 replicas to free it up.

Is there a trick to run it automatically, or maybe just to prevent the PVC from being mounted to give me a chance to do it manually?
Thanks
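
For reference, the kind of thing I'm imagining (a rough, untested sketch; the device path and cluster name below are placeholders). ext4 keeps a mount count in the superblock, and if the node plugin / kubelet runs a preen-mode `fsck -a` before mounting (as the common SafeFormatAndMount path does), pushing the count past the maximum should force a full check on the next attach:

```sh
# Placeholder DRBD device path -- find the real one via `drbdadm status`.
# Setting max-mount-count to 1 with the current count above it marks the
# filesystem so the next preen-mode fsck does a full check instead of skipping.
tune2fs -c 1 -C 2 /dev/drbd1000
```

And to actually free the volume despite there being no Deployment to scale down: if I read the cloudnative-pg docs right, declarative hibernation deletes the pods but keeps the PVCs, which would leave a window for a manual e2fsck:

```sh
# Cluster name is a placeholder.
kubectl annotate cluster my-pg cnpg.io/hibernation=on
# ... run e2fsck against the now-unmounted volume on its node ...
kubectl annotate cluster my-pg --overwrite cnpg.io/hibernation=off
```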
My underlying problem is still #579, for which I have no clue. I physically replaced the node that was crashing, so now all three of them do stay up all the time, but they blink in and out constantly when a resync is needed. There's no packet loss, but the response times are all over the place, sometimes with pauses of a few seconds on some of the nodes.
The actual sync still looks fine to me once it's going, even if all the volumes are syncing at the same time. The etcd leader keeps changing during the issue, so it's clear the whole node(s) freeze for long enough to trigger an election.
The issue appears during the bitmap calculation (or maybe another step around then that I can't see), with other volumes randomly going through states like Disconnected, Unconnected, broken pipe, and then retrying. I usually have to disconnect all the volumes myself and connect them one by one. Once a volume is actually syncing I can move on to the next one; the sync itself is stable and works fine.
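The dance I end up doing for each volume is roughly this (a sketch; the resource name is a placeholder, LINSTOR-created resources are usually named pvc-&lt;uuid&gt;):

```sh
RES=pvc-0123  # placeholder resource name

# Tear the flapping connections down first...
drbdadm disconnect "$RES"

# ...then bring the resource back and wait until the resync is actually
# running (SyncSource/SyncTarget in the status) before touching the next one.
drbdadm connect "$RES"
watch drbdadm status "$RES"
```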
All the while I'm fighting the operator, which doesn't like having disconnected nodes; every time it tries to reconnect them all at the same time, the whole thing explodes again, so I have to disconnect and reconnect them one by one, quickly enough to be done before the operator notices.
Because of all those disconnects, quorum keeps getting lost, and I often end up in weird states where the nodes don't agree on who is UpToDate / Inconsistent. Disconnecting and reconnecting them one by one sometimes clears it, but sometimes I have to manually pick one. The ext4 filesystem on the volumes almost always ends up with some errors, which are easy enough to fix while the pods can't be scheduled, but a lot trickier once they're running.
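When they really can't agree, picking a survivor by hand looks roughly like this (again a sketch with a placeholder resource name; `--discard-my-data` throws away that node's changes, so choose carefully):

```sh
# On each node whose data should be discarded (NOT the one picked as good):
drbdadm disconnect pvc-0123
drbdadm connect --discard-my-data pvc-0123

# On the node picked as UpToDate, just reconnect:
drbdadm connect pvc-0123
```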