Add Tlingit (iso 639-3 tli) #147
Covers both current Tlingit orthographies.
Thanks @jcrippen for this very good PR, great to have linguists contribute data directly 👍 Before merging, a few quick questions about the dual orthography. Looking at both orthographies being Unless truly both are equally de facto standard, I would prefer to a) not have both
Please also add yourself to the |
This is great! According to Crippen 2019:839-840
@jcrippen Does that mean the Revised Popular orthography is the primary orthography? It would be useful to indicate that the first orthography is the Revised Popular and the second is Leer’s orthography, if I’m not mistaken.

I’m not sure the following note is useful or necessary, as it applies to every decomposable character in whichever orthography it is used. While it surely is a font, font shaper, or input related issue occurring in this orthography, it is not specific to these character sequences.

@kontur maybe hyperglot should automatically have a general note for when graphemes can be either composed or decomposed, instead of adding such a note to every single language where this does or can occur.
Just wondering about Y̱ y̱ and Ɏ ɏ in the auxiliary, it seems that underlined letters were used for letters with underscore below (here U+0331 macron below), hence the underscore sometimes crossing the descender of g as noted. But in that case, if Ɏ ɏ are taken to represent some generic "y with stroke" here with stroke through descender, given Ǥ ǥ is currently a generic "g with stroke" often with a stroke through descender, then should either Ǥ ǥ be included in the auxiliary like Ɏ ɏ, or, the opposite, Ɏ ɏ be kept out of the auxiliary like Ǥ ǥ? It also seems the uppercase is inadequate for both letters Y and G as the stroke only crossed the descenders of the lowercase. |
I think as
So this is less specific to the language and more specific to the font and how stringently the user wants to test it. All combining marks are implicitly required if a base + mark combination does not exist as encoded character, meaning the mark (and the base, of course) are always required to form the composite. |
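As a rough illustration of the point about implicitly required marks, the sketch below uses Python's standard `unicodedata` module to ask whether a base + mark sequence has a precomposed character. The helper name `has_precomposed` is made up for this example and is not part of Hyperglot.

```python
import unicodedata


def has_precomposed(sequence: str) -> bool:
    """True if NFC composes the whole sequence into a single code point."""
    return len(unicodedata.normalize("NFC", sequence)) == 1


MACRON_BELOW = "\u0331"  # COMBINING MACRON BELOW

# Bases of the underlined letters discussed in this thread.
for base in "KkXxGg":
    seq = base + MACRON_BELOW
    kind = "precomposed exists" if has_precomposed(seq) else "mark required"
    print(f"{base} + U+0331: {kind}")
```

Only the K and k sequences compose (to U+1E34 and U+1E35); for the X and G sequences the combining mark itself has to be present in the font.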
I misremembered seeing this only in Haida but it is in Tlingit: Keri Edwards, Dictionary of Tlingit, Juneau: Sealaska Heritage Institute, 2009 uses ɢ̱ (or what looks like it) for the lowercase of G̱ instead of g̱. Is that too rare to be added to the auxiliary or mentioned in the note?
That’s me.
Even though no community officially uses the Leer orthography, it’s still in use by individuals in various contexts. I see the Leer orthography in use on Facebook, in community posters, and on business cards, for example. I’ve even seen them mixed together in a few cases. There are presumably a bunch of social and personal reasons for using one or the other.

Technically, supporting the Leer orthography is easier than the RP orthography because the diacritics are more common. All of ◌́ ◌̀ ◌̂ ◌̈ ◌̃ are used in western European orthographies and ◌̨ is used for Polish and Lithuanian. So if a font is intended to support these then it also supports the Leer orthography for free. The real problem in my experience is figuring out whether a font supports the RP orthography, particularly <◌̱> U+0331 and <Ḵḵ> U+1E34 and U+1E35. If it supports these then it most likely supports the ◌́ ◌̀ ◌̂ ◌̈ ◌̃ ◌̨ diacritics as well. Maybe the best solution is
Then for language detection a font will need to minimally support <ḴḵX̱x̱G̱g̱>. If it also happens to support ◌̨ then the font can handle both orthographies.
That’s a good point, I’ll add a
I agree, and I’ll take out that point. I put it in because I’ve had exactly the problem of many fonts having <◌̱> U+0331 so that <X̱x̱G̱g̱> work fine but then lacking <Ḵḵ> U+1E34 and U+1E35 so that text mixing them (which is Unicode NFC compliant!) is broken. (The reverse where a font only has <Ḵḵ> and no <◌̱> U+0331 is far more common because apparently font designers often think in Unicode code blocks.) I will remove the
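The two failure modes described here can be sketched in a few lines of Python. This is a deliberately strict check that assumes text is stored in NFC and that the renderer performs no fallback between composed and decomposed forms; real shaping engines may be more forgiving. The function `renders_nfc` and the three character sets are hypothetical and only stand in for real font data.

```python
import unicodedata


def renders_nfc(text: str, cmap: set[int]) -> bool:
    """True if every code point of the NFC-normalized text is in cmap."""
    nfc = unicodedata.normalize("NFC", text)
    return all(ord(ch) in cmap for ch in nfc)


# Hypothetical character sets, for illustration only.
base_letters = {ord(c) for c in "KkXxGg"}
font_mark_only = base_letters | {0x0331}        # has U+0331, lacks U+1E34/U+1E35
font_k_only = base_letters | {0x1E34, 0x1E35}   # has the precomposed K pair, lacks U+0331
font_both = font_mark_only | font_k_only

# The six underlined letters, typed as base + U+0331.
sample = "K\u0331k\u0331X\u0331x\u0331G\u0331g\u0331"

for name, cmap in [
    ("mark only", font_mark_only),
    ("precomposed K only", font_k_only),
    ("both", font_both),
]:
    print(f"{name}: {renders_nfc(sample, cmap)}")
```

Under these assumptions only the set containing both U+0331 and the precomposed pair covers the normalized sample, which is why a detection test needs to look at all six letters rather than at one code point.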
Upon reflection I think that both <Y̱y̱> and <Ɏɏ> should be excluded. They are limited to only a few documents and the <Ÿÿ> has completely replaced them in all cases. I’ve in fact seen both <Y̱y̱> and <Ɏɏ> replaced with <Ÿÿ> in digital text versions of older documents so there is already a precedent for this substitution. People who need them probably already need other specialized font stuff. Having removed <Ɏɏ> there’s no need to be concerned with <Ǥǥ> which to my knowledge has not been used for Tlingit. Also, these both decompose to base + <◌̵> U+0335 COMBINING SHORT STROKE OVERLAY rather than to <◌̱> U+0331 COMBINING MACRON BELOW which is the decomposition of <Ḵḵ>. So they’re not logically a part of the <ḴḵX̱x̱G̱g̱> system and this is another point against including them.
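One caveat worth checking: as far as the Unicode Character Database is concerned, the stroke letters have no canonical decomposition at all, so the U+0335 analysis above describes how the letters are built visually rather than how normalization treats them. Either way they fall outside the U+0331 system. A quick look with Python's standard `unicodedata` module:

```python
import unicodedata

# K WITH LINE BELOW, Y WITH STROKE, G WITH STROKE
for ch in ("\u1E34", "\u024E", "\u01E4"):
    name = unicodedata.name(ch)
    # decomposition() returns an empty string when no mapping is defined
    mapping = unicodedata.decomposition(ch) or "(none)"
    print(f"U+{ord(ch):04X} {name}: {mapping}")
```

Only U+1E34 reports a mapping (to `004B 0331`); the two stroke letters report none.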
As I recall, Edwards used <ɢ̱> instead of <g̱> specifically because the available fonts did not support <g̱>. Nobody has done this since, so it’s best considered a stylistic variant rather than a distinct character. I recall seeing <ḡ> U+1E21 used for similar reasons (paired with <G̱> not <Ḡ> U+1E20). All this trouble because it was easy to type

Oh, I forgot to include the combinations of <◌̨> and <◌́> etc. I’ll add those to the

One last thought: the website implies a way to make per-character design requirements or notes. Is this possible or planned? Because most of the
See rosettatype#147 for discussion. Drop duplicates of non-precomposed <ḴḵḺḻṈṉ>. Clarifications to `design_requirements`. Add `note` for each orthography. Add combinations of ogonek and diacritics to `auxiliary` for Leer orthography. Change Leer orthography to `status: secondary` but keep `preferred_as_group: true`.
Add additional `design_requirement` point for U+0331 COMBINING MACRON BELOW. This applies to all uses of U+0331, not just those with base <g>.
Naturally I forgot to run
There’s no indication of what is wrong. But on a hunch I removed the |
The : makes the yaml parser choke.
Oh never mind, the problem is <:> in the contents of each |
This looks good to me 👍 As discussed here and in #149, Hyperglot doesn't distinguish between composed and decomposed forms in the data, but does so when checking a font for language support. As I see it, supporting either or both forms is a concern of the language detection parameters, not of encoding orthography data. In fact, where a composed form is possible we save it, implicitly noting that it exists; what would be significant is a composed form not being encoded in Unicode.

The note formatting works like this. You can also wrap values in single or double quotes in YAML to get around the colon issue; an unquoted colon is otherwise interpreted as YAML syntax.
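For anyone generating such entries from a script, here is a minimal sketch of the quoting workaround. It relies on the fact that a JSON string literal is, in practice, also a valid YAML double-quoted scalar, so the standard `json` module can do the escaping. The helper `yaml_quoted` and the sample note text are invented for this example and do not come from the Hyperglot data.

```python
import json


def yaml_quoted(value: str) -> str:
    """Return value as a double-quoted scalar that is safe to paste into YAML."""
    # ensure_ascii=False keeps non-ASCII letters readable instead of \u escapes.
    return json.dumps(value, ensure_ascii=False)


# Hypothetical note text containing ": ", which an unquoted value cannot hold.
note = "Two orthographies: Revised Popular and Leer"
print("note: " + yaml_quoted(note))
```

This prints `note: "Two orthographies: Revised Popular and Leer"`, which a YAML parser reads as a single string value.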
@jcrippen thank you. It looks good to me too.
Super! Thanks @jcrippen — the new languages will be added to the next release 👍 |