Resolve use of apostrophe, single quote, and other similar letter-likes #47

Closed
kontur opened this issue Apr 9, 2021 · 2 comments

kontur commented Apr 9, 2021

I think one issue throughout the data is that different sources and data inputs have handled orthographies with an "apostrophe"-looking character inconsistently.

The data might have:

  • ' (U+0027) Apostrophe
  • ’ (U+2019) Right single quotation mark
  • ʼ (U+02BC) Modifier letter apostrophe

...and probably more (even combining marks).

While there might be proper canonical information for some orthographies, it seems to me that the choice is most likely arbitrary, depending on the source and on how the data was input, and should probably be canonicalized or disambiguated in some way. E.g. we might want to unify how such characters are input in the data, and we might want to handle this character so that any of several "possible alternatives" satisfies a language check.
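To make the two options concrete, here is a minimal Python sketch of both: unifying the input to one character, and letting any look-alike satisfy a check. All names (`APOSTROPHE_LIKE`, `CANONICAL`, `unify`, `supports`) are hypothetical and not taken from the project, and the choice of U+02BC as the canonical form is an assumption made only for illustration.

```python
from typing import Set

# Hypothetical set of interchangeable "apostrophe"-looking characters,
# limited to the three listed above.
APOSTROPHE_LIKE = {
    "\u0027",  # ' APOSTROPHE
    "\u2019",  # ’ RIGHT SINGLE QUOTATION MARK
    "\u02BC",  # ʼ MODIFIER LETTER APOSTROPHE
}

# Assumed canonical form; which one is correct depends on the orthography.
CANONICAL = "\u02BC"

_TABLE = {ord(ch): CANONICAL for ch in APOSTROPHE_LIKE}


def unify(text: str) -> str:
    """Option 1: map every look-alike in the data to one canonical character."""
    return text.translate(_TABLE)


def supports(required: Set[str], available: Set[str]) -> bool:
    """Option 2: a check where any look-alike satisfies a required look-alike."""
    for ch in required:
        if ch in APOSTROPHE_LIKE:
            # Any one of the alternatives is accepted.
            if not (APOSTROPHE_LIKE & available):
                return False
        elif ch not in available:
            return False
    return True
```

With this sketch, `supports({"a", "\u02BC"}, {"a", "'"})` passes because the plain apostrophe stands in for the modifier letter, while the same check against `{"a"}` fails.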

It is also an open question whether these should be treated as characters, whether they would make a good case for required punctuation, or what their role actually is.

Ping @MrBrezina

@kontur added the "enhancement" and "needs more information" labels Apr 9, 2021

meehkal commented Apr 9, 2021

It is also an open question whether these should be treated as characters, whether they would make a good case for required punctuation, or what their role actually is.

This depends on the language. In Kildin Saami (single apostrophe) or Nenets (double apostrophe) they should be regarded as letter characters, because they have sound equivalents.
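For reference, Unicode's general categories already draw a letter versus punctuation line between the three candidates from the original post. The snippet below is only a sketch using Python's standard `unicodedata` module; the helper names `describe` and `is_letter` are made up for illustration, and a category alone does not settle what a given orthography needs.

```python
import unicodedata

# The three candidates listed in the original post.
CANDIDATES = ["\u0027", "\u2019", "\u02BC"]


def describe(ch):
    """Return (code point, Unicode name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


def is_letter(ch):
    """True if the general category is one of the letter categories (L*)."""
    return unicodedata.category(ch).startswith("L")


for ch in CANDIDATES:
    print(*describe(ch), "letter" if is_letter(ch) else "not a letter")
```

Only U+02BC falls in a letter category (`Lm`); U+0027 and U+2019 are punctuation (`Po` and `Pf`).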

@kontur mentioned this issue Sep 6, 2021

kontur commented Jun 17, 2022

Closing this, in favor of #82

@kontur closed this as completed Jun 17, 2022