Improved handling of Authors and Series with names containing non-ASCII characters #3414

thatguy7 · 2024-09-13T16:36:45Z

This PR fixes: #2541

In summary: SQLite does not support the LOWER/UPPER function (unless it is compiled to do so, which it usually is not).

When checking if a Series/Author is already in the DB, these functions are used. Any name containing a capital non-ASCII character would always test as false and thus a new entry in the DB was created every time.

This checks is the name contains a non-ASCII character. If it does, the lower function is not used and the issue is avoided.

Should we not use the lower function at all to avoid this issue?

…match only for ASCII names

nichwall · 2024-09-13T16:52:18Z

Maybe we can use the asciionlytolowercase function? Not sure if it's still in the code base.

b447cf5

thatguy7 · 2024-09-13T17:34:37Z

The function is still in the code base and solves my issue just as well.

In your comment on my issue I set out to fix, you mentioned #2678.

It seems like the goal is to get SQLite to properly handle non-ASCII characters. If done, asciiOnlyToLowercase will again not match what SQLite's lower function does. This fix will later become the cause of the issue it fixes now.

Thus again, the question: Should we here perhaps not try to do anything fancy and just match char for char. Then it's up to the user to properly set the value of the tag.

As it is, If I mistype the author to e.g. "sTephen King", all subsequent books by the author will be tagged with the typo, and there is no way to fix it. If I edit the name to the correct spelling, it will still use the first one it saw.

advplyr · 2024-09-13T17:55:09Z

As it is, If I mistype the author to e.g. "sTephen King", all subsequent books by the author will be tagged with the typo, and there is no way to fix it. If I edit the name to the correct spelling, it will still use the first one it saw.

It should be fixable by editing the author on the authors page. That issue would be there regardless of if we get proper unicode support.

Before Sqlite it used to also strip punctuation. I believe there are some issues currently open to go even further in trying to match with existing authors by removing middle initials, stripping accents, checking against nicknames, and removing prefixes/suffixes.

I'm not sure what the best solution is but this is the right temporary fix I think

advplyr · 2024-09-13T17:58:59Z

Thanks!

thatguy7 · 2024-10-06T08:19:49Z

PR #3468 already breaks this temporary fix.

I checked with the most recent state of master, and the fix now causes the issue it was set out to fix.

@advplyr @nichwall, what is the protocol to undo the changes of this PR?

nichwall · 2024-10-06T13:29:49Z

Probably open a new PR with whatever fix is needed. Having an issue to go with the PR is usually a good idea, but if it's just a PR to fix issues from other PRs you can just open the PR and include details about what was broken.

Are you meaning just get rid of asciionlytolowercase?

thatguy7 · 2024-10-06T14:43:48Z

Are you meaning just get rid of asciionlytolowercase?

Exactly. With #3468, the behaviour of SQLite's lower function was modified to behave more (or perhaps exactly) like String.toLowerCase(). So asciionlytolowercase now breaks things.

when checking if series/author is alread in DB, use case insensitive …

def34a8

…match only for ASCII names

use asciiOnlyToLowerCase to match lower function behaviour of SQLite

0af29a3

advplyr merged commit 16ba6b5 into advplyr:master Sep 13, 2024
5 checks passed

thatguy7 mentioned this pull request Oct 8, 2024

retire unicode handling workaround for Author and Series title #3491

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improved handling of Authors and Series with names containing non-ASCII characters #3414

Improved handling of Authors and Series with names containing non-ASCII characters #3414

thatguy7 commented Sep 13, 2024

nichwall commented Sep 13, 2024

thatguy7 commented Sep 13, 2024

advplyr commented Sep 13, 2024

advplyr commented Sep 13, 2024

thatguy7 commented Oct 6, 2024

nichwall commented Oct 6, 2024

thatguy7 commented Oct 6, 2024

Improved handling of Authors and Series with names containing non-ASCII characters #3414

Improved handling of Authors and Series with names containing non-ASCII characters #3414

Conversation

thatguy7 commented Sep 13, 2024

nichwall commented Sep 13, 2024

thatguy7 commented Sep 13, 2024

advplyr commented Sep 13, 2024

advplyr commented Sep 13, 2024

thatguy7 commented Oct 6, 2024

nichwall commented Oct 6, 2024

thatguy7 commented Oct 6, 2024