bees seemingly cannot catch up with snapper snapshot creation #268
It isn't reliable. Snapshot A to B, snapshot B to C, then delete subvol B, and now we can't find the correct relationship between A and C; however, C will still have a parent, just the wrong one.
That's precisely what mode 3 does: it starts from the newest subvols and moves backward to older ones.
Scan mode 1 (equal relative progress on all subvols) might be quantitatively better, but not qualitatively. Subvol-based scanning can't keep up with snapshots at scale. The existing (and still technically experimental) scan modes are all bad in this case. There is nothing that can be done to the subvol scanners that gets them anywhere close to the performance of an extent-based scanner in the presence of new snapshots. An extent-based scanner avoids this problem because it doesn't store any information about subvols. When a new snapshot appears or disappears, the extent scanner can adapt on the fly without repeating or wasting any work. It doesn't have to start over from the lowest …
Thanks for the explanation, I'll try that. And thanks for all the work that goes into bees. Every little step counts. :-)
Okay, I stopped bees, opened beescrawl.dat, and ran :%s/min_transid \(\d\+\) max_transid \(\d\+\)/min_transid \2 max_transid \2/g (after reading your reply again, this is probably not exactly what you wanted me to do, because I should've used …).

It seems to be much calmer now. But it looks like it still scans the snapshots, except now I can clearly see they are scanned in reverse order. Is this probably scanning the last transid because it compares …?

At least it looks like it walks each snapshot in reverse order once, complaining about a few toxic extents, then once in a while steps back to the current snapshots and works its way to the older ones again. In this loop, it logs the same files over and over again (relative to the snapshot root). I would have expected that with … Maybe …

Another observation: running …

According to the logs, bees has worked about half way to the oldest snapshot now. I'll watch it to see how it behaves when it reaches the oldest snapshots.
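For anyone else landing here: the same edit can be done non-interactively while bees is stopped. This is only a sketch of the manual workaround described above (back up beescrawl.dat first), not something bees does on its own:

```sh
# Copy each line's max_transid value into min_transid, so bees treats the
# already-snapshotted history as scanned (same effect as the vim command above).
sed -E -i.bak 's/min_transid ([0-9]+) max_transid ([0-9]+)/min_transid \2 max_transid \2/g' beescrawl.dat
```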
Looks like it completed a cycle. It then started (according to the toxic-extent logs) to read huge amounts of data from my rw subvols (I installed some updates and downloaded some stuff), and now it repeats the initial cycle, scanning all the snapshots from youngest to oldest again and logging toxic extents again. At least it now does this at a much higher pace. Still, this looks like an endless loop, or a feedback loop.
There will be some feedback as splitting an extent in one subvol modifies all subvols that reference it. Hopefully it settles down, though it might take a while to get there.
They would still have to be scanned. In the absolute minimum case, bees will search them, find no new extents, and update the … Any new extents would mostly be due to bees cleaning up duplicate extents it created when splitting an extent somewhere.
That's part of what scan mode 3 does: it scans subvols that were most recently completed, so that new data is deduped immediately. This helps a little when newly created files tend to be copies of other slightly less newly created files. When it has caught up with new data, it moves back to old subvols with older data, until some more new data appears. So it's generally LIFO. The other scan modes are generally FIFO: even if the new data deduplicates well, it won't touch new data at all until everything else is fully scanned.
Okay, then extent splitting probably explains what I am seeing. Also, it now reads at varying throughput from 5 MB/s to 800 MB/s, so it actually looks like it is making progress without stalling the system all the time. Fixing up the …
Maybe degrade these (and similar) from warning to notice (or info), because there's an active workaround in bees?

BEESLOGWARN("WORKAROUND: abandoned toxic match for hash " << hash << " addr " << found_addr << " matching bbd " << bbd);
BEESLOGWARN("WORKAROUND: discovered toxic match at found_addr " << found_addr << " matching bbd " << bbd);
@Zygo, thank you. All looks good now. Looks like all the split extents have been cleaned up.
I ran into this and paused timed snapshots; the scan isn't done yet. I'm a little new to C++, but I could probably hack on this a little if it's possible to detect when a tree has already been scanned... or something. Edit: …
I ran into this again because my system froze, and after a reset, xfs cleared the contents of a partially written beescrawl.dat.

Not only is this especially harmful for performance because bees encounters toxic extents over and over again, it also vastly multiplies the effect of #260, which hurts performance even more. While bees does its job, my system currently goes down to less than 2 GB cached with 18-24 GB completely free memory, and 2-4 GB pushed to swap instead. Every now and then, cache rapidly increases to 18 GB but then suddenly falls again with swap activity. While this happens, the system feels like it's running from swap (and it probably is).

@Zygo Can we do something about losing beescrawl.dat?

Also, we should look into #260 and find out why btrfs is causing such high memory fragmentation. I feel like this has become worse and worse with each LTS kernel version. It's probably less an issue within bees and more a side effect of btrfs doing some very unfavorable memory management in the kernel. I'll post there, too.

Edit: Due to #260, catching up with snapshots after such an incident is slowed down a lot...
Which kernel version are you using? One of my systems had low cached memory in the past, but it's much better with 6.8.2.
@kakra No. I didn't even relate this to bees. It's what I observed but was confused by for a long time. It didn't happen with older kernels (but I don't have the logs to determine which; the cached-memory drop began around last November, the usage graph flattened out this January, and it ended with a reboot to 6.8.2).
For many years it was the opposite: create/fsync/rename would trigger bugs in btrfs (from hangs to metadata corruption), but was harmless on ext4 (see the …). Those bugs have recently been fixed in btrfs (since 5.16), so it might be worth putting the …
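For context, here is a minimal sketch of the create/fsync/rename pattern being discussed (generic POSIX code, not bees' actual implementation; the function and file names are made up). The new contents are written to a temporary file and fsync'd before the rename, so a crash leaves either the old file or the complete new one, never a truncated one:

```c++
#include <cstdio>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// Write `data` to `path` using create/fsync/rename so that a crash mid-write
// never leaves a truncated or empty state file behind.
bool write_file_atomically(const std::string &path, const std::string &data)
{
    const std::string tmp = path + ".tmp";   // hypothetical temp-file name

    int fd = open(tmp.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;

    // Write the whole buffer, retrying on short writes.
    size_t off = 0;
    while (off < data.size()) {
        ssize_t n = write(fd, data.data() + off, data.size() - off);
        if (n < 0) { close(fd); unlink(tmp.c_str()); return false; }
        off += static_cast<size_t>(n);
    }

    // Persist the new contents before making them visible under `path`.
    if (fsync(fd) != 0) { close(fd); unlink(tmp.c_str()); return false; }
    close(fd);

    // Atomically replace the old file with the fully written one.
    // (fsync of the containing directory omitted for brevity.)
    if (rename(tmp.c_str(), path.c_str()) != 0) { unlink(tmp.c_str()); return false; }
    return true;
}
```

Whether the fsync step is still needed on btrfs after 5.16 is exactly the question raised above.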
A lot of software uses that pattern without issues. Maybe it has been a problem only when running concurrently with all the btrfs lookups that bees does? Clearly, btrfs does not need it. So maybe enable it only if the statfs results for the btrfs root (as used by bees) and for those data files don't match?
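A minimal sketch of that check, under the assumption that comparing statfs(2) results is enough (only f_type is compared against BTRFS_SUPER_MAGIC here; comparing f_fsid would additionally distinguish two different btrfs filesystems). The helper name and paths are illustrative, not bees API:

```c++
#include <sys/vfs.h>      // statfs()
#include <linux/magic.h>  // BTRFS_SUPER_MAGIC

// Hypothetical helper: return true if the state file (e.g. beescrawl.dat in
// $BEESHOME) does not live on the same kind of filesystem as the scanned
// btrfs root, in which case the extra fsync before rename would stay enabled.
bool fsync_needed_for_state_file(const char *btrfs_root, const char *state_file)
{
    struct statfs root_fs{}, state_fs{};
    if (statfs(btrfs_root, &root_fs) != 0) return true;   // be conservative on error
    if (statfs(state_file, &state_fs) != 0) return true;

    const bool both_btrfs = root_fs.f_type == BTRFS_SUPER_MAGIC &&
                            state_fs.f_type == BTRFS_SUPER_MAGIC;
    return !both_btrfs;   // mismatch -> keep the fsync workaround
}
```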
Can bees handle snapper timeline creation events as a special case?
Are you still working on the extent-based scanner? I'm asking because I can't see any related branches or commits in this repo. Do you just not push major changes like this to the repo until they are ready?
I wanted people to find/see the workaround I used: #268 (comment)
Hey @Zygo
Here's the situation:
I've been seeing high IO from bees for at least the last 14 days. Looking at the system journal, it spools a LOT of messages; this rotates my systemd journals roughly every hour.
I then inspected beescrawl.dat and it looks like bees cannot catch up with all the snapper snapshots. It has a lot of snapshots still at zero progress:
Also, the new snapshots don't exactly look like they get any reasonable progress:
I'm not exactly sure what each of the numbers means, but they are generally low on the left side.
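For reference, a crawl line in beescrawl.dat typically looks like the line below (values invented for illustration; the field layout is assumed from current bees versions). min_transid and max_transid bound the btrfs transaction range of the current crawl pass, and objectid/offset are the position within the subvol, so a line with zeros everywhere is a subvol bees has not started on:

```
root 257 objectid 0 offset 0 min_transid 0 max_transid 0 started 1712345678 start_ts 2024-04-05-20-01-18
```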
bees uses scan mode 3 and a loadavg limit of 5.
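For completeness, that configuration roughly corresponds to an invocation like the following (option names assumed from current bees releases; the mount point is illustrative, and distributions usually wrap this in the beesd script):

```sh
# --scan-mode 3 selects the newest-first subvol scan order discussed above,
# --loadavg-target 5 stops starting new worker threads while loadavg exceeds 5.
bees --scan-mode 3 --loadavg-target 5 /path/to/btrfs/root
```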
Also, almost every line in the journal complains about toxic extents, which means IO has a lot of lag spikes. There are thousands upon thousands of such messages per hour. The system runs anything but smooth. Even systemd-journal sometimes restarts due to watchdog timeouts. I see the system stalling for 1-2 minutes multiple times per day, with messages in dmesg about btrfs transactions blocking for more than 120 seconds (btrfs metadata is on two dedicated NVMe drives in mraid1 mode, btrfs data spans multiple spinning-rust drives in raid1 mode with bcache).
This was different before the freeze when beescrawl.dat was still in good shape.
I believe one of the problems is snapper duplicating all the toxic extents, and because bees is still catching up, it reads them over and over again.
Could bees be improved to better catch up in those situations? Maybe by treating child/parent snapshot generations as already scanned (or something similar)? Or prefer scanning newer snapshots before older ones so those older ones would eventually rotate out of snapper? Maybe a different scan mode could work better?
I'd rather not disable snapper. It has proven quite useful in the past.
I could remove the loadavg limiter but that would probably only make the situation worse for actually using the system.
I could defrag the files that have toxic extents, but only in the rw subvols. This would not stop bees from still reading the ro snapshots. Also, it would probably reduce the deduplication ratio a lot, because old hashes have already rotated out of the hash file, and I'm seeing this on thousands of big files (the system has a rather big Steam library).
In the past, without any pre-existing snapper snapshots, bees would usually catch up within a few days.
beesstats.txt
beescrawl.txt