Used size increased by ~400GB while defragmenting #266
Update:
I'm not sure the DB overfilling is really that big an issue. In the end, it's fine to push out older hashes and keep the hashes for big blocks, and you don't want too many shared extents per hash anyway. You probably don't want to keep hashes for small blocks either, because that's like spending 99% of the time for 1% of the space savings. Also, the problem with multiple threads is mostly lock contention in btrfs. I'm not sure whether bees does any seek optimization by reordering queued jobs, so seeking may be an issue, too. What you observe with space is documented bees behavior, especially if you are coming from other dedupe programs: before any space is freed, used space grows or free space stops growing, and this continues until bees has deduplicated every reference to an extent, including the last snapshot that shares it. Only then can the extent actually be freed.
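The plateau described in the reply comes down to reference counting: an extent's space is only released once nothing references it anymore, so deduplicating a file that is also pinned by snapshots frees nothing until the last snapshot reference has been moved as well. Below is a toy Python model of that idea. The `ToyFS` and `Extent` classes, the extent names, and the file paths are all hypothetical and made up for illustration; this is not how bees or btrfs are implemented, and it does not model any temporary growth in used space.

```python
from dataclasses import dataclass, field


@dataclass
class Extent:
    """A toy extent: a size plus the set of file paths that reference it."""
    size_gib: int
    refs: set = field(default_factory=set)


class ToyFS:
    """Hypothetical, heavily simplified model of extent reference counting.

    Not btrfs or bees code: an extent's space counts as used until its
    last reference is removed.
    """

    def __init__(self) -> None:
        self.extents: dict[str, Extent] = {}

    def used(self) -> int:
        # Used space is the sum of all extents that still exist.
        return sum(e.size_gib for e in self.extents.values())

    def write(self, name: str, size_gib: int, *owners: str) -> None:
        # Create an extent referenced by the given file paths.
        self.extents[name] = Extent(size_gib, set(owners))

    def repoint(self, owner: str, old: str, new: str) -> None:
        # Dedupe one reference: `owner` now shares `new` instead of `old`.
        src = self.extents[old]
        src.refs.discard(owner)
        self.extents[new].refs.add(owner)
        if not src.refs:
            # Only when the final reference is gone is the space freed.
            del self.extents[old]


fs = ToyFS()
# Two identical 10 GiB extents; B is also pinned by two snapshots.
fs.write("A", 10, "file1", "snap1/file1")
fs.write("B", 10, "file2", "snap1/file2", "snap2/file2")

history = [fs.used()]
for owner in ["file2", "snap1/file2", "snap2/file2"]:
    fs.repoint(owner, "B", "A")
    history.append(fs.used())

print(history)  # [20, 20, 20, 10]
```

Used space stays at 20 GiB while the live file and the first snapshot are deduplicated, and only drops to 10 GiB when the last snapshot reference to extent B is moved.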
I had half a terabyte free on my 4TB HDD and wanted to dedupe it to increase the available space. After running bees for over 36 hours,
btrfs filesystem usage -h /hdd
reports Free (estimated): 161.13GiB. Bees is still buzzing along, and my free space stopped shrinking around this point. I should also mention that the 4GB hash table started overfilling, and I had to restart beesd with an 8GB DB size in the config. Another thing is that bees seems to spam this message, sometimes for 15 seconds straight:
2023-09-05 02:11:33 513194.513219<7> crawl_5_680152: exception (ignored): exception type std::runtime_error: FIXME: too many duplicate candidates, bailing out here
Is this bad?