Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved handling of Authors and Series with names containing non-ASCII characters #3414

Merged
merged 2 commits into from
Sep 13, 2024

Conversation

thatguy7
Copy link
Contributor

This PR fixes: #2541

In summary: SQLite does not support the LOWER/UPPER function (unless it is compiled to do so, which it usually is not).

When checking if a Series/Author is already in the DB, these functions are used. Any name containing a capital non-ASCII character would always test as false and thus a new entry in the DB was created every time.

This checks is the name contains a non-ASCII character. If it does, the lower function is not used and the issue is avoided.

Should we not use the lower function at all to avoid this issue?

@nichwall
Copy link
Contributor

Maybe we can use the asciionlytolowercase function? Not sure if it's still in the code base.

b447cf5

@thatguy7
Copy link
Contributor Author

The function is still in the code base and solves my issue just as well.

In your comment on my issue I set out to fix, you mentioned #2678.

It seems like the goal is to get SQLite to properly handle non-ASCII characters. If done, asciiOnlyToLowercase will again not match what SQLite's lower function does. This fix will later become the cause of the issue it fixes now.

Thus again, the question: Should we here perhaps not try to do anything fancy and just match char for char. Then it's up to the user to properly set the value of the tag.

As it is, If I mistype the author to e.g. "sTephen King", all subsequent books by the author will be tagged with the typo, and there is no way to fix it. If I edit the name to the correct spelling, it will still use the first one it saw.

@advplyr
Copy link
Owner

advplyr commented Sep 13, 2024

As it is, If I mistype the author to e.g. "sTephen King", all subsequent books by the author will be tagged with the typo, and there is no way to fix it. If I edit the name to the correct spelling, it will still use the first one it saw.

It should be fixable by editing the author on the authors page. That issue would be there regardless of if we get proper unicode support.

Before Sqlite it used to also strip punctuation. I believe there are some issues currently open to go even further in trying to match with existing authors by removing middle initials, stripping accents, checking against nicknames, and removing prefixes/suffixes.

I'm not sure what the best solution is but this is the right temporary fix I think

@advplyr
Copy link
Owner

advplyr commented Sep 13, 2024

Thanks!

@advplyr advplyr merged commit 16ba6b5 into advplyr:master Sep 13, 2024
5 checks passed
@thatguy7
Copy link
Contributor Author

thatguy7 commented Oct 6, 2024

PR #3468 already breaks this temporary fix.

I checked with the most recent state of master, and the fix now causes the issue it was set out to fix.

@advplyr @nichwall, what is the protocol to undo the changes of this PR?

@nichwall
Copy link
Contributor

nichwall commented Oct 6, 2024

Probably open a new PR with whatever fix is needed. Having an issue to go with the PR is usually a good idea, but if it's just a PR to fix issues from other PRs you can just open the PR and include details about what was broken.

Are you meaning just get rid of asciionlytolowercase?

@thatguy7
Copy link
Contributor Author

thatguy7 commented Oct 6, 2024

Are you meaning just get rid of asciionlytolowercase?

Exactly. With #3468, the behaviour of SQLite's lower function was modified to behave more (or perhaps exactly) like String.toLowerCase(). So asciionlytolowercase now breaks things.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: Author/Series with non-ASCII names are listed once for every book when updated
3 participants