
Support for non-ASCII Characters in fluent_text Validator #47

Closed
khashashin opened this issue Sep 18, 2023 · 2 comments

Comments


khashashin commented Sep 18, 2023

Hello,

I have been using the ckanext-fluent extension to support multilingual inputs in my CKAN instance. While using it, I encountered an issue where non-ASCII characters (like "ä", "ö", "ü", etc.) are stored as Unicode-escaped strings in the database. This happens because the `json.dumps` call in the fluent_text validator encodes these characters as Unicode escape sequences.

For instance, a text like:

Stromtarif Tarifanteil KEV Standardprodukt gemäss ElCom pro Kategorie

is being stored in the database as:

Stromtarif Tarifanteil KEV Standardprodukt gem\u00e4ss ElCom pro Kategorie

Currently, the relevant part of the code in the fluent_text validator looks like this:

data[key] = json.dumps(value)

and

data[key] = json.dumps(output)

This not only affects how the data is stored but also adversely impacts search in CKAN, because the Solr search engine fails to match these Unicode-escaped sequences against the actual characters in search queries.

To resolve this, I propose updating the above lines to:

data[key] = json.dumps(value, ensure_ascii=False)

and

data[key] = json.dumps(output, ensure_ascii=False)

This modification will ensure that non-ASCII characters are stored as they are, without being converted to their Unicode escape sequences, thus preserving the original characters and facilitating accurate search results.
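The difference can be demonstrated in isolation. This is a minimal sketch of `json.dumps` behavior on a sample fluent-style language dict (the dict itself is illustrative, not taken from the extension's code):

```python
import json

# A fluent-style multilingual value containing a non-ASCII character.
value = {"de": "Stromtarif gemäss ElCom"}

# Default behavior: non-ASCII characters become \uXXXX escape sequences.
escaped = json.dumps(value)
print(escaped)  # {"de": "Stromtarif gem\u00e4ss ElCom"}

# With ensure_ascii=False the original characters are preserved.
preserved = json.dumps(value, ensure_ascii=False)
print(preserved)  # {"de": "Stromtarif gemäss ElCom"}

# Both forms decode back to the same dict, so round-tripping is unaffected.
assert json.loads(escaped) == json.loads(preserved) == value
```

Note that both outputs remain valid JSON and parse identically; `ensure_ascii=False` only changes the stored representation, which is what Solr matches against.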

Moreover, I noticed that other extensions use a validator called "unicode_safe" to handle non-ASCII characters gracefully. I tried it, but the fluent_text validator does not seem to recognize it. It would therefore be very helpful if fluent_text could be updated to integrate or recognize the "unicode_safe" validator so that non-ASCII characters are handled properly.

I look forward to hearing your thoughts on this and would greatly appreciate any guidance or support in this regard.

Thank you.

@wardi
Contributor

wardi commented Sep 18, 2023

Sure I'd accept a PR that makes these changes.

@khashashin
Contributor Author

Fixed in #50.
