Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Glyphs3 RTL kerning #778

Closed
wants to merge 56 commits into from
Closed

Support Glyphs3 RTL kerning #778

wants to merge 56 commits into from

Conversation

yanone
Copy link
Contributor

@yanone yanone commented Feb 21, 2022

Glyphs 3 kerning is sorted into the new kerningLTR and kerningRTL font data attributes instead of the previous plain kerning. The kerning for RTL is now stored in opposite attributes, contradicting the first/second logic of UFO and G2. @MMK_L_* becomes @MMK_R_* (and vv), and the visually right glyph side’s kerning class is now stored in kernRight as opposed to the previous kernLeft (=first/incoming side).

This means that correctly sorting glyphs into first and second UFO groups now involves checking a glyph’s script and whether or not it’s a number (that are always LTR even for RTL scripts).

This in turn required a fix for the glyph data generator that previously didn't generate the script for unencoded derivative glyphs such as hah-ar.init.swsh. The algorithm then cycles through the glyph name variations by cutting off dot-separated suffices from the right until glyph data could be found (hah-ar.init.swsh->hah-ar.init), and adds only the script value, as the other values were found to be already correctly generated.

@yanone
Copy link
Contributor Author

yanone commented Feb 22, 2022

One sec, I’m gonna rewrite the script construction for the glyph data to be more aligned with the methods already in place for constructing the production name and categories.

@yanone
Copy link
Contributor Author

yanone commented Feb 22, 2022

Okay, I’m done. The script construction is now analog to the category construction, and also covers ligatures as requested in the #TODO comment.
The failure in the Windows tests I believe is unrelated to this PR.

@yanone
Copy link
Contributor Author

yanone commented Feb 22, 2022

It seems wasteful to do the info lookup for each property. Why not add one getGlyphInfo(name, data) method and then get all needed info from it?

I didn’t write the initial state of this method, but what you’re describing is already happening, I believe, in the initial attributes = _lookup_attributes(glyph_name, data).

The subsequent methods construct the missing attributes in case a glyph couldn’t be found in the data by its name or unicode, which is quite the norm for my Arabic use cases (hah-ar.init.swsh)

@schriftgestalt
Copy link
Collaborator

My comment was more general, not about your changes. I should make a separate issue.

if glyph.parent.format_version > 2: # Glyphs 3
glyph_data = glyphdata.get_glyph(glyph.name)
if (
glyph_data.script in ("arabic", "hebrew")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why only "arabic" and "hebrew"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the two RTL scripts to my knowledge. This point is arguably the critical one, also about the numbers. I chose those to my best knowledge. Some algorithm needs to be in place to sort Glyphs’ kerning classes into the correct first/second pairs, and I didn’t know any better.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add the writing direction to the glyphdata.xml but there might be a way to query unicode to get the writing direction of each code. Then we don’t need to check for script == arabic and category != Number.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can get script direction from a function in ufo2ft.utils.

(I’m on the road this week so won’t be able to review.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add the writing direction to the glyphdata.xml but there might be a way to query unicode to get the writing direction of each code. Then we don’t need to check for script == arabic and category != Number.

Yes, I could try this approach. @anthrotype Seems like a good idea?

@khaledhosny
Copy link
Collaborator

This is getting dangerously convoluted. The designer manually determines which kerning is RTL and which is LTR, then glyphsLib turns this into a single kerning table, then ufo2ft’s kern feature writer will try to deduce which kerning is RTL and which is LTR.

@khaledhosny
Copy link
Collaborator

Also, some of the work being done here conflicts with #771.

if direction == "LTR":
match = re.match(r"@MMK_L_(.+)", left)
elif direction == "RTL":
match = re.match(r"@MMK_R_(.+)", left)
Copy link
Member

@anthrotype anthrotype Feb 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait.. does Glyphs 3 really call the class on the left-hand side @MMK_R_ and the one on the right-hand side @MMK_L_ for RTL kerning?!

Copy link
Contributor Author

@yanone yanone Feb 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ll try to answer: The kerning pairs are still defined in logical order. So the code logic doesn’t change here. First stays first, second stays second. Additionally, for the new RTL-specific kerning table, Georg has switched the L and R to now be visual rather than logical, (as he did with kernLeft and kernRight classes) which I believe shouldn't have happened. We should have silently accepted that L and R were never appropriate names but are used to describe 1st and 2nd.

FYI, you can check the two added RTL_kerning_v2.glyphs and RTL_kerning_v3.glyphs files for reference, in particular the kerning pair reh-ar,hah-ar.init.swsh (in logical order) which is defined in both.

@yanone
Copy link
Contributor Author

yanone commented Feb 22, 2022

This is getting dangerously convoluted. The designer manually determines which kerning is RTL and which is LTR, then glyphsLib turns this into a single kerning table, then ufo2ft’s kern feature writer will try to deduce which kerning is RTL and which is LTR.

We’re hoping for a remedy with UFO 4 to support separate kerning tables for each writing direction, if I understand correctly? Today, G3 sources don’t produce any Arabic kerning at all, so there is an urgency.

Also, some of the work being done here conflicts with #771.

They way I see it, it doesn't conflict with my issue logically, so I could easily wait it out and apply my changes to the updated code after #771. Is that PR under active development?

@anthrotype
Copy link
Member

I actually agree with Khaled. This is getting too complicated and we need to step back a little. Who decides what pairs get placed in the kerningLTR and kerningRTL? If it's the user, then this classification needs to be carried across to ufo2ft somehow, we can't rely on ufo2ft splitting it again, with the risk of applying a slightly different split.

Second question: in the kerningRTL dictionary, is the order logical (like in OpenType's and UFO's first and second glyph depending on a script writing direction) or is it a visual order, like literally "left" and "right" (as in the proposed but never implemented hkerning.plist in UFO4)?

@schriftgestalt
Copy link
Collaborator

I have finished the GlyphsInfo stuff. Please have a look at the GlyphsInfo3 branch and pull request.

@simoncozens
Copy link
Collaborator

I've said this privately to Yan, but I think it's worth talking about here as well:

I have some misgivings about the direction of this PR, but I’m aware that some of that may be to do with the fact that I have some misgivings about the whole concept of a separate RTL kerning dictionary. So it’s hard for me to separate my feelings about the PR implementation from my feelings about what it’s implementing.

I think one of the reasons why you're not getting traction on this PR (at least from me), is that conceptually I can't get my head around it. Splitting kerning by direction at the source level doesn't make sense to me. Most glyphs have inherent directionality so you don't need to store this separately. For those which do not, like kerning two common script glyphs together, then what? Two numbers will be displayed in logical order regardless of direction, so that doesn't need a separate RTL dictionary. So really the only case this deals with is punctuation against punctuation. And then why split by direction? Perhaps your Hebrew is more tightly spaced than Arabic - then what? I'm sure there are reasons why separate LTR and RTL kerning is a thing. But to be honest, none of them convince me. So it's not something I really want to be spending my time thinking about anyway.

And when I do think about it, in terms of the implementation, I feel like this could be much, much simpler. glyphsLib is part of a UFO-based compilation approach where we convert glyphs files to UFO for compilation. There is only one kerning dictionary in UFO. So no matter what we do, everything is going to end up in a single kerning dictionary anyway, at which point it is the responsibility of ufo2ft to turn the contents of that dictionary into GPOS rules. ufo2ft already does the clever stuff about working out whether a kern is LTR or RTL.

So my conclusion is that glyphsLib does not need to do that - it just needs to gather all the kerns wherever they live and put them into a UFO kerning file. And now we have a situation where this is all based on a GlyphsInfo file? That makes even less sense to me.

@schriftgestalt
Copy link
Collaborator

It is not Jan’s fault that their file structure is how it is.

In Glyphs, there are different aspects about the kerning.

  • It needs to be able to produce proper GPOS.
  • It needs to display the data in the UI.
  • should the pairs be stored in typing or visual order. Same goes for classes.

Splitting it into two kerning dicts made the handling of the UI much easier and fixed some edge cases that were in Glyphs 2.

I’m happy to discuss this in detail and if you can explain how to handle it properly, I can change it for the next version.

@simoncozens
Copy link
Collaborator

I think the way to handle it properly is to combine kerningLTR and kerningRTL into one dictionary and then let ufo2ft do its thing. But I haven't tested this.

@yanone
Copy link
Contributor Author

yanone commented Dec 2, 2022

I'm eager to try.

_glyph_kerning_attr() (https://github.com/googlefonts/glyphsLib/blob/support_glyphs3_rtl_kerning-3/Lib/glyphsLib/builder/groups.py#L141, this is my latest branch, not this one) uses the glyph direction to determine the group. You're saying, the group and order should be determined strictly by the origin dict of the pair (LRT or RTL)? I'm not sure where to begin here.

This would work, but will break round-tripping, as for UFO to Glyphs conversion we wouldn't know in which dict to move the pairs then.

@yanone
Copy link
Contributor Author

yanone commented Dec 2, 2022

So I think it's really the group where the problem lies here.

The kerning dict conversion in kerning.py probably won't need changing.
But the two separate kerning dicts are only half of the story.

Since Georg also decided to switch the kerning groups from leftKerningGroup to rightKerningGroup and vv for RTL, we need to know the writing direction of each glyph for groups.py, which is how we ended up here in the first place.

@yanone
Copy link
Contributor Author

yanone commented Dec 2, 2022

The only solution to this that I can see without Georg’s updated glyph data is to decide the group sides based on whether or not a glyph can be found as part of the RTL kerning, so walk the dict for each group decision (or compile that once for speed)

@yanone
Copy link
Contributor Author

yanone commented Dec 2, 2022

... which won’t work for bidirectional glyphs. They'll basically always end up with the wrong side groups as soon as they are referenced anywhere in the RTL dict.

@simoncozens
Copy link
Collaborator

simoncozens commented Dec 2, 2022

ufo2ft already does the clever stuff about working out whether a kern is LTR or RTL.

You talk about needing to know the writing direction for each glyph, but ufo2ft already works this out: take the writing direction for encoded glyphs from Unicode and extend it via GSUB closure to unencoded glyphs. We don't need to use GlyphsInfo for it.

@anthrotype
Copy link
Member

So I think it's really the group where the problem lies here... Since Georg also decided to switch the kerning groups from leftKerningGroup to rightKerningGroup and vv for RTL, we need to know the writing direction of each glyph for groups.py, which is how we ended up here in the first place.

Yes, Yanone is right, this is the crux of the problem. It's not sufficient to simply merge the two kerning dicts. We need to find a way to deal with the groups as well.

@simoncozens
Copy link
Collaborator

The groups are going to be split by the kern splitter, though.

@anthrotype
Copy link
Member

the problem is _glyph_kerning_attr method, which determines what is to be put in public.kern1 and public.kern2 kerning groups, used here:

for glyph in self.font.glyphs.values():
for side in 1, 2:
if (glyph.name, side) not in recovered:
attr = _glyph_kerning_attr(glyph, side)
group = getattr(glyph, attr)
if group:
group = f"public.kern{side}.{group}"
groups[group].append(glyph.name)

This returs "leftKerningGroup" or "rightKerningGroup" depending on the UFO group's logical position (1 or 2). For Glyphs-2 kerning, this would always be "rightKerningGroup" for public.kern1 and "leftKerningGroup" for public.kern2.
Now, since Glyphs3, for RTL kerning, these have been swapped: i.e. "leftKerningGroup" for public.kern1 and "rightKerningGroup" for public.kern2.

@anthrotype
Copy link
Member

will break round-tripping, as for UFO to Glyphs conversion we wouldn't know in which dict to move the pairs then

we could work around this by storing stuff in the lib, but at this stage is more urgent to be able to compile these fonts, we can add support for roundtripping at a later stage

@yanone
Copy link
Contributor Author

yanone commented Dec 2, 2022

You have no idea how delighted I am that someone utters the word urgency in this context other than me. It is indeed the case that this whole disaster is currently leading to Arabic and Hebrew typography being slowed down and disadvantaged. I know of several projects by several studios that are on hold since a while now, they're just not speaking out about it even though I asked them to.

@simoncozens
Copy link
Collaborator

simoncozens commented Dec 2, 2022

So how I would attempt to approach this would be to put everything into two groups, e.g. public.kern1.A and public.kern2.RTL.A if it's a left glyph, or public.kern2.A and public.kern1.RTL.A if it's a right glyph, and when merging the two dictionaries together, rewrite the groups depending on whether you're processing kerningLTR or kerningRTL.

@yanone
Copy link
Contributor Author

yanone commented Dec 2, 2022

Deriving the writing direction of a glyph from Unicode will have to go through glyphdata.py to make it work for unencoded derivative glyphs, which will prove again why GlyphData.xml was a mistake in the first place rather than using Unicode on-the-fly.

@anthrotype
Copy link
Member

anthrotype commented Dec 2, 2022

hm Simon is onto something, keep going!

@yanone
Copy link
Contributor Author

yanone commented Dec 2, 2022

So basically here?

@yanone
Copy link
Contributor Author

yanone commented Dec 2, 2022

So how I would attempt to approach this would be to put everything into two groups, e.g. public.kern1.A and public.kern2.RTL.A if it's a left glyph, or public.kern2.A and public.kern1.RTL.A if it's a right glyph, and when merging the two dictionaries together, rewrite the groups depending on whether you're processing kerningLTR or kerningRTL.

I understand the first part, but I don't understand where the merging is happening that Simon is speaking of.

@simoncozens
Copy link
Collaborator

OK, I'm going to avoid using real examples because if I do, I'll get horribly confused with MMK_L_ and MMK_R_ and muck it up, and that won't help explain anything.

So the first step is to store each glyph into @MMK_some_kerning_group and also into @MMK_some_other_kerning_group_RTL.

After we've built all those groups, what we currently do is:

  • iterate through each pair defined in kerning or kerningLTR
  • for each pair, if it contains a group, transform the groups into @public.kernX.whatever where X depends on whether it was left or right
  • then emit the kern pair into the UFO kerning.

We keep doing that. But then next, we also do the same thing but for kerningRTL:

  • for each pair, if it contains a group, transform the groups into @public.kernX.RTL.whatever where X depends on whether it was right or left. (i.e. switching it round here)
  • then emit the kern pair into the UFO kerning.

That will work for all cases except for situations where the same glyph pair has both LTR and RTL kerning, with different values in each table. But we were never going to get that right anyway.

@anthrotype
Copy link
Member

wow.. and the result of this very complicated process is basically to undo Glyphs3's LTR vs RTL kerning split such that ufo2ft can later on redo that in its own way (ie. using cmap+GSUB closure to identify script and pair direction).

@schriftgestalt
Copy link
Collaborator

That will work for all cases except for situations where the same glyph pair has both LTR and RTL kerning, with different values in each table. But we were never going to get that right anyway.

That is also one reason why I did the split.

anthrotype added a commit to anthrotype/GlyphsSDK that referenced this pull request Dec 6, 2022
if I look at googlefonts/glyphsLib#778, I see the name of the Right-to-left kerning dictionary in the .glyphs plist file uses the correct spelling `kerningRTL`, so the incorrect misspelled `kerningRLT` must have been fixed already and a mention of it is no longer needed in this .md doc?

/cc @schriftgestalt @yanone
@yanone
Copy link
Contributor Author

yanone commented Dec 8, 2022

Gentlemen, I’ve implemented Simon’s idea and am happy to announce that I think it works. I made it into a separate PR for clarity: #838

@yanone
Copy link
Contributor Author

yanone commented Mar 3, 2023

This is the first of the three related PRs and has now been fixed in #865, so closing this too.

@yanone yanone closed this Mar 3, 2023
@yanone yanone deleted the support_glyphs3_rtl_kerning branch March 3, 2023 16:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants