Split kerning by script, not by direction (second attempt) #667

madig · 2022-10-11T13:02:24Z

This pulls the commits for #636 and some follow-ups aside, to unblock releases from main until the code is more thoroughly tested.

Relevant issues:

Kern splitter: Different number of kern features per master #642
RTL kerning missing from font #658 (maybe)
Compile variable features #635 (indirectly, because it stumbles over the same issue of what to do when not all sources have the same kerning information and you try to compile a varaible font)

Todo:

Track down the issue in RTL kerning missing from font #658
Do a bit of before-after compilation of single UFOs to compare their kerning and see if splitting by script introduces differences when it shouldn't.
Figure out what would need to be done to make variable fonts compile and behave correctly when not all kerning pairs are defined across all sources. The issue already existed before, but since we were splitting by direction, which is a much coarser property, it rarely was problematic. separate issue

Closes #658

madig · 2022-10-17T10:24:57Z

While investigating #658, I stumbled over the following:

The splitter stuffs foreign script kern pairs into lookups. E.g. lookup_Latn may suddenly contain both pos [Y ...] [A ...] and pos [Upsilon ...] [A ...] because the original pos @kern1.Y @kern2.A that contained their union was split up into the same lookup. I'd assume they should land in different ones?
The lookup statements are unsorted, potentially triggering a bug in https://github.com/fonttools/fonttools/blob/64fd837ca1c2669f7e0e9d41f47b376e3fa153c3/Lib/fontTools/otlLib/builder.py#L1252-L1277, where a subtable break is inserted before a value for a class pair is declared but after the classdefs are associated with the subtable already, leading to a kerning value of zero entry being inserted in the earlier subtable (because https://github.com/fonttools/fonttools/blob/64fd837ca1c2669f7e0e9d41f47b376e3fa153c3/Lib/fontTools/otlLib/builder.py#L2131 is None, None), while the next one has the correct one, but is ignored by shapers. Additionally, sorting the statements manually shaved 10 KB off the font size on my test font source. Reverting to fontTools 4.37.1 (before Promote pair-pos value records fonttools/fonttools#2776) does not help.
[otlLib] ClassPairPosSubtableBuilder logic is wrong fonttools/fonttools#2793 says there may be bigger problems lurking?!

madig · 2022-10-17T19:39:03Z

So, in the test font provided by yanone, the feature file does not declare languagesystem grek dflt; explicitly in the features.fea file, but the kern writer uses these declarations to sort kern pairs into script buckets. Greek glyphs are then considered script-less, which is wrong, and they get dumped into random other lookups. Uh. Should the writer even do this when kerning typically touches more scripts that GSUB? I'd derive the scripts in use purely from the glyph set? @simoncozens

madig · 2022-10-18T09:27:48Z

Side-note: the code in

ufo2ft/Lib/ufo2ft/featureWriters/kernFeatureWriter.py

Line 369 in 58d8b2b

for flags, pairs in sorted(pairsByFlags.items()):

sorts the kerning pairs by type, but this is later undone in

ufo2ft/Lib/ufo2ft/featureWriters/kernFeatureWriter.py

Line 501 in 58d8b2b

def _makeSplitScriptKernLookups(

when splitting a pair can result in different types of kerning pairs. The sorting can probably be moved down to the split kern emitter.

madig · 2022-10-18T10:43:40Z

From call:

Split kerning pairs by script into buckets
Use fontTools.misc.classifyTools within each bucket to make disjoint kern1 classes so that coverage per PairPos subtable is unique (to avoid potential fontTools otlLib subtable break bug).
1. Collect all kern1 classes from the bucket, pass them to classifyTools
2. Get back a mapping of {glyph: {disjoint set for that glyph}}
3. For each "pos kern1 kern2 value;" in the bucket:
  1. smaller_kern1s = group_by(kern1, glyph => mapping[glyph])
  2. emit ["pos smaller_kern1 kern2 value;" for each smaller_kern1 in smaller_kern1s]
Sort the whole bucket to have:
- all the enum pos and glyph-to-glyph pos first, then the class-to-class pos second;
- the "pos small_kern1" with the same kern1 next to each other

madig · 2022-10-18T10:47:14Z

Also, the PairPos coverage must be unique inside a lookup, otherwise things are unreachable. That's an invariant to follow.

madig · 2022-10-18T11:11:53Z

So, regarding #667 (comment), we either:

Tell the designer to add a Greek languagesystem, or
Insert the lookup under DFLT (if exists, otherwise error?), or
Just use the script in the lookup even when the languagesystem is not declared.

Maybe I'll try the latter and see what happens in various shaping engines.

anthrotype · 2022-10-18T12:54:56Z

the feature file does not declare languagesystem grek dflt; explicitly in the features.fea file, but the kern writer uses these declarations to sort kern pairs into script buckets. Greek glyphs are then considered script-less, which is wrong, and they get dumped into random other lookups.

This thing of using the globally declared default languagesystems pre-dates the split-kerning-by-scripts PR, but maybe doesn't make sense any more and should be revised.

In the previous code in which we'd only split lookups between LTR/RTL, the default languagesystems were parsed and grouped into buckets depending on whether they were associated with "kern" or "dist", and on the script's horizontal writing direction (LTR/RTL), stored in self.context.scriptGroups dict of dicts (see _groupScriptsByTagAndDirection)

Then, if there was no globally declared script associated with "dist" feature, none would be generated.

Then, kerning pairs would be split into three lookups: LTR, RTL and a kern_dflt for pairs whose glyphs' script/bidi properties don't fit either LTR or RTL (e.g. punctuation, numbers, etc.). Finally, when deciding under which script/language/feature (kern vs dist) to register each of these lookups, again these script groupings derived from the global default languagesystems would be used (e.g. see _registerKernLookups). E.g. if kerning.plist contains pairs between LTR glyphs but default languagesystems only declares RTL scripts and no LTR ones, then the kern_ltr lookup would be registered under DFLT/dflt.

What we could have done maybe was to register such kern_ltr, not under DFLT/dflt, but under all the LTR scripts that are actually used therein e.g. by adding script grek; and such statements, even if they are not globally declared in the default languagesystems (which we know it's ok). I'm honestly not sure and can't remember why we didn't actually do that...
Maybe we didn't feel like we could safely infer stuff and preferred to leave it to the font developer to decide about.
Or maybe it was just simpler and OK enough that way.

Even so, imposing the font developer to write explicit default languagesystems to control the behavior of the kern feature writer may not always be desirable, as those apply indiscriminately to both GPOS and GSUB. In the example brought forward by Nikolaus, the GSUB features work fine without Greek being declared among the default languagesystems, but GPOS kerning may need those otherwise ufo2ft places greek kerning under DFLT/dflt; but if greek were added to default languagesystems then the meaning of all the pos/sub rules elsewhere that are un-marked by explicit script/language statements would suddenly also be registered for Greek too - though i guess they'd be no-op since no greek glyphs are probably used in them...

I'm warming up to the idea that for the split-by-script kern writer, we should just ignore the global languagesystem and use whatever scripts that we find among the kerned glyphs.

Thoughts @simoncozens @khaledhosny @madig ?

simoncozens · 2022-10-18T13:16:17Z

That sounds good to me, the only issue being that we still need to look at the languagesystem to be aware of what shaper the font goes through, so that we can put kerns in kern or dist appropriately. So we still end up looking at the languagesystems...

anthrotype · 2022-10-18T14:08:16Z

so that we can put kerns in kern or dist appropriately

once we know what scripts are used in kerning pairs, and have split these accordingly, we also know whether there are dist-enabled scripts or not

madig · 2022-10-18T14:41:04Z

It seems that the new kern writer should be reworked a bit to adjust for our new assumptions. I'll try and draw up an algorithm to better wrap my mind around it.

Also, blindly using unicodedata.script_extension instead of

ufo2ft/Lib/ufo2ft/featureWriters/kernFeatureWriter.py

Lines 445 to 454 in 58d8b2b

    
           def knownScriptsPerCodepoint(self, uv): 
        
               if not self.context.knownScripts: 
        
                   # If there are no languagesystems, consider everything common; 
        
                   # it'll all end in DFLT/dflt anyway 
        
                   return COMMON_SCRIPT 
        
               return [ 
        
                   x 
        
                   for x in unicodedata.script_extension(chr(uv)) 
        
                   if x in self.context.knownScripts or x in DFLT_SCRIPTS 
        
               ]

gives me additional lookup kern_Thaa and lookup kern_Yezi blocks for Arabic numerals :) I suppose they don't hurt, but are they useful?

madig · 2022-10-18T15:10:28Z

@anthrotype what's the purpose of dealing with bases and marks in the kern writer?

anthrotype · 2022-10-18T15:28:53Z

I am not sure we want to use script_extension.. that may add unwanted/unccessary script records. How about we extend the global default languagesystems with the set of scripts (not script_extensions!) actually used by the kerning pairs. You make a single kern_Arab lookup and also register the same under Thana or Yezi or whatever "extension" scripts
but only iff those are defined in the global languagesystems..

anthrotype · 2022-10-18T15:31:32Z

aargh this is getting too complicated now, too much guessing. Maybe using the default languagesystems as we were/are is not so bad after all, at least it's explicit and leaves the responsibility to the font developer. 🤔

madig · 2022-10-18T17:15:16Z

I think we should follow the guessing/inference approach until we hit an actual roadblock. I'll look into using the script rather than script_extension.

Also, I found another problem: enum pos em-cy [oslash.BRACKET.150 oslashacute.BRACKET.150] -10;. I think ufo2ft does not currently access Designspace rules for script discovery and the rules are tacked onto the binary not through features, but by writing into the final binary later in varLib. This particular line was in the feature output of a single UFO, so I think there's little we can do there. We probably need to pass on the info somehow at least when compiling from a Designspace, to avoid merging weirdness later?

madig · 2022-10-18T17:21:24Z

Note to self: look at trying to preserve the kerning pair type (glyph to glyph, class to class, ...) when splitting pairs, by emitting eg pos [a] [b c] instead of enum pos a [b c].

madig · 2022-10-18T17:58:25Z

You make a single kern_Arab lookup and also register the same under Thana or Yezi or whatever "extension" scripts

Maybe one could also classifyTools these specific multi-script pairs, group them into their own lookup and insert references in feature kern for each script 🤔

madig · 2022-10-18T19:01:14Z

Hm, if some codepoints are used in multiple scripts (e.g. Arabic numerals in Arabic and Thaana), is just storing them for Arabic gonna work? Are shapers going to run thaa and arab scripts? I think I'll keep the script_extensions for now and figure something out later.

madig · 2022-10-19T10:02:07Z

More thinking... since the BRACKET glyphs aren't reachable in the UFO, should the kern writer actually do a reachability analysis (look for Unicoded glyphs plus scour GSUB; additionally scan Designspace rules) and ignore glyphs that aren't reachable?

anthrotype · 2022-10-19T10:07:55Z

feature writers work on ufos, they don't know about designspace..

madig · 2022-10-19T10:32:54Z

Hm. So, how do we pass in higher level information? Add new attribute additionalGlyphSubstitutions: dict[GlyphName, GlyphName to BaseFeatureCompiler, mirroring Designspace rules? That may not be enough though, because feature writers preceeding the kern writer could add substitutions and thereby influence script discovery. Or, well, such modifications work on the feature file, so compileGSUB would include them. So this may work?

anthrotype · 2022-10-19T10:36:14Z

it might work but feels iffy

anthrotype · 2022-10-19T10:40:47Z

i'm thinking we should provide a way for users to override ufo2ft's inference of unicode properties such as script/bidi category based on cmap+GSUB for specific glyphs, we might need that anyway to support Glyphs.app's RTL kerning #658 (comment)

madig · 2022-10-19T11:02:28Z

So, what do we want to be able to pass in? A glyphUnicodeMapping: dict[str, int] | None where the latter would trigger inference and the former is considered complete?

madig · 2022-10-19T12:19:49Z

Access to the Designspace is solved with a new VariableFeatureCompiler in https://github.com/googlefonts/ufo2ft/pull/635/files#diff-7d78e1bc9030aa197442bd5c7eca11381272a8c20f3467bbd7bea27d12bc17eeR355-R393. Having a separate glyphUnicodeMapping might still be useful.

anthrotype · 2022-10-19T12:25:00Z

Having a separate glyphUnicodeMapping might still be useful.

ok let's do that in a different pr though

madig · 2022-10-19T22:57:40Z

I believe my WIP does what we need, all class-to-class pairings now seem to have a disjoint kern1 classes I believe, from looking at output for yanone's private real world font right here.

I did spot two things though in an excerpt, one weird (two kern2 classes for the same kern1 parent that were almost identical except for one additional glyph???) and the other a possible size optimization opportunity. Consider the following class to class pairings:

[A B C] [a b] -10
[A B C] [c d] -10
[A B C] [e f] -20

can we collapse them into the following?

[A B C] [a b c d] -10
[A B C] [e f] -20

Or is this an optimization that only the otlLib subtable builder could do because kern2s have to be disjoint per subtable?

Lib/ufo2ft/featureWriters/kernFeatureWriter.py

bidi types may not be treated right, see test_kern_split_and_mix_common -> nkoo [skip ci]

[skip ci]

madig mentioned this pull request Oct 11, 2022

Revert "Split kerning by script, not by direction (#636)" #666

Merged

Base automatically changed from revert-kern-splitter to main October 14, 2022 10:28

madig force-pushed the kern-splitter-v2 branch from 5cc1dd7 to d8638dd Compare October 18, 2022 15:09

madig commented Oct 21, 2022

View reviewed changes

Lib/ufo2ft/featureWriters/kernFeatureWriter.py Outdated Show resolved Hide resolved

madig added 25 commits November 11, 2022 12:50

Remove redundant context variable

7d0ec88

Mark _splitBaseAndMarkPairs a static method

a8adf6c

Translate test into integration test

7d13cb6

Remove unused import

a1b7aa7

Mark _splitBaseAndMarkPairs a static method

eee7a76

Write scripts and bidiTypes nito pairs up front

bbf002d

Turn test into integration tests

88ff076

Remove now unused imports

20419a8

Remove redudant "or"

38c3303

WIP: try to simplify kerningpair

4015ba0

bidi types may not be treated right, see test_kern_split_and_mix_common -> nkoo [skip ci]

More WIP

0962d45

[skip ci]

WIP: add test for bidis

f483564

[skip ci]

WIP: add assert

9661866

[skip ci]

Split and filter BiDi types correctly

3097ff5

Restore test to original

a314b7d

Change logger messages

b854e77

add comment

a0f7068

[skip ci]

WIP

2ae3bad

[skip ci]

Simplify KerningPair by using tuples

6ac95c8

Avoid sorting because we already did

9ef27cd

Move splitting and PairPos making into standalone functions

f2826b0

Make KerningPair a dataclass

4459404

Remove stale comment

e44b0f3

[skip ci]

Remove bidiTypes from KerningPair

3a1752d

Remove scripts from KerningPair

2575bc6

madig mentioned this pull request Nov 14, 2022

Split kerning by script, not by direction (third attempt) #679

Merged

madig marked this pull request as draft November 15, 2022 11:27

Remove redundant set

d07125e

madig closed this Feb 20, 2023

khaledhosny deleted the kern-splitter-v2 branch March 8, 2023 21:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Split kerning by script, not by direction (second attempt) #667

Split kerning by script, not by direction (second attempt) #667

madig commented Oct 11, 2022 •

edited

Loading

madig commented Oct 17, 2022

madig commented Oct 17, 2022

madig commented Oct 18, 2022

madig commented Oct 18, 2022 •

edited by belluzj

Loading

madig commented Oct 18, 2022

madig commented Oct 18, 2022 •

edited

Loading

anthrotype commented Oct 18, 2022 •

edited

Loading

simoncozens commented Oct 18, 2022

anthrotype commented Oct 18, 2022

madig commented Oct 18, 2022

madig commented Oct 18, 2022

anthrotype commented Oct 18, 2022

anthrotype commented Oct 18, 2022

madig commented Oct 18, 2022 •

edited

Loading

madig commented Oct 18, 2022

madig commented Oct 18, 2022

madig commented Oct 18, 2022

madig commented Oct 19, 2022

anthrotype commented Oct 19, 2022

madig commented Oct 19, 2022 •

edited

Loading

anthrotype commented Oct 19, 2022 •

edited

Loading

anthrotype commented Oct 19, 2022

madig commented Oct 19, 2022 •

edited

Loading

madig commented Oct 19, 2022 •

edited

Loading

anthrotype commented Oct 19, 2022

madig commented Oct 19, 2022

Split kerning by script, not by direction (second attempt) #667

Split kerning by script, not by direction (second attempt) #667

Conversation

madig commented Oct 11, 2022 • edited Loading

madig commented Oct 17, 2022

madig commented Oct 17, 2022

madig commented Oct 18, 2022

madig commented Oct 18, 2022 • edited by belluzj Loading

madig commented Oct 18, 2022

madig commented Oct 18, 2022 • edited Loading

anthrotype commented Oct 18, 2022 • edited Loading

simoncozens commented Oct 18, 2022

anthrotype commented Oct 18, 2022

madig commented Oct 18, 2022

madig commented Oct 18, 2022

anthrotype commented Oct 18, 2022

anthrotype commented Oct 18, 2022

madig commented Oct 18, 2022 • edited Loading

madig commented Oct 18, 2022

madig commented Oct 18, 2022

madig commented Oct 18, 2022

madig commented Oct 19, 2022

anthrotype commented Oct 19, 2022

madig commented Oct 19, 2022 • edited Loading

anthrotype commented Oct 19, 2022 • edited Loading

anthrotype commented Oct 19, 2022

madig commented Oct 19, 2022 • edited Loading

madig commented Oct 19, 2022 • edited Loading

anthrotype commented Oct 19, 2022

madig commented Oct 19, 2022

madig commented Oct 11, 2022 •

edited

Loading

madig commented Oct 18, 2022 •

edited by belluzj

Loading

madig commented Oct 18, 2022 •

edited

Loading

anthrotype commented Oct 18, 2022 •

edited

Loading

madig commented Oct 18, 2022 •

edited

Loading

madig commented Oct 19, 2022 •

edited

Loading

anthrotype commented Oct 19, 2022 •

edited

Loading

madig commented Oct 19, 2022 •

edited

Loading

madig commented Oct 19, 2022 •

edited

Loading