
Ignoring read errors in alloc_cluster() can cause an "almost infinite" loop with broken SD cards #15

Open
FreddieChopin opened this issue Apr 14, 2023 · 0 comments · May be fixed by #16


@FreddieChopin
Contributor

In our application we use your ufat library (via my RTOS). In one of the devices the SD card got damaged: it can still correctly read (and possibly write) roughly the first 600-700 blocks, but reads (and possibly writes) of any further blocks cause the SD card to fail with a timeout. That is the behaviour observed on the MCU; on a PC the card simply fails to mount, and trying to dump it yields only the first ~700 blocks (about 350 kB of data), after which every block read throws an I/O error.

The problem seems to be that if the FAT file system mounts properly (and it does, because the initial part of the card is fine), then attempting an operation - in our case creating a folder - results in a practically infinite loop, because read errors are ignored in alloc_cluster(), exactly here:

ufat/ufat.c, line 652 (commit e2a7466):

    if (!ufat_read_fat(uf, idx, &c) && c == UFAT_CLUSTER_FREE) {

This loop iterates over all existing clusters (for our 16 GB card - about 2 million...) looking for a free one, ignoring read errors along the way. Since the card's failure mode is a timeout, each read attempt takes about 100 ms, so checking every cluster means the loop takes "forever" to finish.

Is ignoring read errors here intentional (e.g. to skip single broken clusters without failing the whole top-level operation) or just an omission? I'm not sure how to understand that part of the code, so I'm starting with a question before potentially providing a fix.

FreddieChopin added a commit to FreddieChopin/ufat that referenced this issue Apr 25, 2023
alloc_cluster() just ignores read errors, trying the next cluster until it
either succeeds (finds an empty one) or runs out of clusters (after
checking all of them). A large volume may have quite a lot of clusters -
e.g. a 16 GB SD card with a standard format has about 2 million clusters.
When a "persistent" read error happens during alloc_cluster() (after
a successful mount operation) - for example the volume is physically
disconnected at a very inconvenient moment, or the volume is/gets damaged
and all further reads fail - this loop becomes practically
infinite. In one application we found a damaged SD card for which reads
of the first 600-700 blocks work perfectly fine, but any read beyond that
results in the SDIO interface timing out (the card does not switch to the
expected state within the specified time). As the timeout for the operation
is ~100 ms, the function would loop for over 2 days. The same card
just fails to work in a PC, where any read beyond the first ~350 kB (which
is about 700 blocks) fails with an I/O error.

Fix this by returning from alloc_cluster() with an error when any read
operation fails.

Fixes dlbeer#15
FreddieChopin pushed a commit to DISTORTEC/distortos that referenced this issue Apr 25, 2023
Updated version fixes a potential "infinite" loop when dealing with
damaged devices (dlbeer/ufat#15) and also has
some other fixes (dlbeer/ufat#12 &
dlbeer/ufat#13).