-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Null terminated conversion #9
Comments
Thanks for the bug report! First, regarding the overflow situation: in theory, a directory entry could be made up completely of 3-byte characters, so twice the length of the input might not be enough anyway 🤔 Regarding the portion after the null terminator, it contains undefined characters ( |
Yes, you're absolutely right that my output buffer should be at least three times the input! Maybe something to be added to the method documentation since it's not common knowledge and a potential source of buffer overruns problems, which was my case. |
I've created a PR which updates the documentation of As for providing a convenience function which converts an array of |
When using the decode method, I'm passing a UCS-2 encoded buffer that contains a 0x0000 (u16) indicating the end of the string. The buffer is fixed size, but whatever happens after the 0x0000 is undefined, and is normally 0xFFFF. This results in the 0xFFFF getting decoded into a 3 bytes character, which eventually leads to an overflow in the output buffer (output buffer is an u8 array of twice the length of the input. For context, this scenario happens when reading the directory entries in a FAT16 filesystem.
To mitigate this I've had to write code to check if there's a null byte in the input buffer and if so, pass to decode only the slice that doesn't contain it. I'm not entirely unhappy with this solution but it took me many many hours of debugging and eventually stepping into the library code to find out what the problem was, so I thought it'd be nice for the library to either take this into account or provide another method that does. Something like decode_to_end_of_string or something like that. The result count of bytes would be the numbers of bytes converted till the null byte. Actually, I would do this the default and have the option of go past the null bites to decode the buffer in full, but I don't know enough about UCS-2 or UTF-8 to know which one should be the default behaviour.
I'm happy to write a pull request moving the code I wrote and a test if that helps.
Thanks for your crate!
The text was updated successfully, but these errors were encountered: