
GPOS/feature writing: an overall design #682

Closed

cmyr opened this issue Jan 24, 2024 · 3 comments

cmyr (Member) commented Jan 24, 2024

So after much dawdling I'm now trying to look seriously at what will be involved in actually matching the GPOS table generated by fontmake, for Oswald.

The current implementation was a useful first pass, but it is crude, and does not do a number of things that the python implementation does:

  • it doesn't look for glyph classes defined in either the AST or elsewhere (the ufo lib, possibly somewhere else in glyphs.app?)
  • it doesn't handle 'magic comments' for inserting the generated lookups in the correct position (Intelligently merge lookups #523)
  • it doesn't attempt to split kern rules into separate lookups based on the relevant language systems (and writing direction?) (GPOS: feature writers should split lookups based on language system #619)
  • this last step also involves determining, for each glyph, the scripts it is used in; this in turn means we likely need to know what the GSUB rules are, so that we can figure out which glyphs are reachable from which other glyphs (I'm not sure exactly about this, but it looks like kernFeatureWriter.py does something like this)

In short, there is a bunch of stuff left to do, and making this work will require some architectural changes. Currently we compile kerning in two steps: one job takes the raw IR kerning and generates lookup builders, and a separate job compiles the user FEA and then inserts the builders via a callback.

The main issue is that in order to correctly generate the builders we now need access to the FEA, so that we can inspect which language systems are defined and whether or not there is an explicit GDEF block.

But we still want to be able to start the kerning work as soon as possible, and not have to wait for the compilation of the user FEA to complete before it can begin. So what I'm now imagining is that we take more control over the compilation of FEA, and in particular split the parsing step out from the validation/compilation step.

Concretely, something like this:

  • as soon as possible after a run starts, we start a ParseFea task. This task produces an AST.
  • when this task finishes, we can then start the Kerning & Mark tasks, which now need access to the AST
  • we would also like to start the main feature compilation task at this point
  • when all of these things are done we can then merge it all together and dump the table (a rough sketch of this flow is below).
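
To make the ordering concrete, here is a minimal sketch of that task graph. The task and type names (ParseFea, FeaAst, LookupBuilders, and so on) are hypothetical placeholders, not the actual fontc or fea-rs API:

```rust
// Hypothetical sketch of the proposed task ordering; all names are
// illustrative placeholders, not the real fontc/fea-rs types.
struct FeaAst; // result of the ParseFea task (parsed, not yet validated)
struct LookupBuilders; // kern/mark lookups generated from the IR
struct CompiledFea; // lookups/features compiled from the user FEA

fn parse_fea(fea_source: &str) -> FeaAst {
    // parse only: no validation or compilation yet
    let _ = fea_source;
    FeaAst
}

fn build_kerning_and_marks(_ast: &FeaAst) -> LookupBuilders {
    // needs the AST to see languagesystem statements, glyph classes, etc.
    LookupBuilders
}

fn compile_user_fea(_ast: &FeaAst) -> CompiledFea {
    // validation + compilation of the hand-written FEA
    CompiledFea
}

fn merge_and_dump(_compiled: CompiledFea, _generated: LookupBuilders) {
    // insert the generated lookups at the recorded positions, then
    // write out the final GPOS table
}

fn main() {
    let ast = parse_fea("languagesystem DFLT dflt;");
    // once the AST exists, these two tasks could run concurrently
    let generated = build_kerning_and_marks(&ast);
    let compiled = compile_user_fea(&ast);
    merge_and_dump(compiled, generated);
}
```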

Most of this should be easy enough; in particular fea-rs already has a clear separation between the parsing and compilation passes, and it is straightforward to run them separately.

The one thing that is going to be a bit annoying is the merging logic, and I'll need to take a closer look at the python implementation to figure out how best we want to implement this. My hope is that it is as simple as recording the current lookup ID when we encounter a magic comment, and then inserting the generated lookups for the relevant feature at that point in the lookup list, but it is possible that this is more complicated.
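
As a rough illustration of that hoped-for merge step, here is a minimal sketch under the assumption that we have recorded, per feature, the lookup index at which its magic comment appeared; the data structures are made up for the example and are not fontc's:

```rust
use std::collections::HashMap;

#[derive(Clone, Debug)]
struct Lookup(String); // stand-in for a compiled GPOS lookup

/// Insert the feature-writer-generated lookups into the lookup list built
/// from the hand-written FEA, at the indices recorded when the magic
/// comments were encountered.
fn merge_lookups(
    mut compiled: Vec<Lookup>,
    // (feature tag, lookup index recorded at its magic comment)
    mut insert_points: Vec<(String, usize)>,
    // feature tag -> lookups produced by the kern/mark feature writers
    generated: &HashMap<String, Vec<Lookup>>,
) -> Vec<Lookup> {
    // insert from the highest index downward so earlier indices stay valid
    insert_points.sort_by_key(|(_, idx)| std::cmp::Reverse(*idx));
    for (tag, idx) in insert_points {
        if let Some(lookups) = generated.get(&tag) {
            for (i, lookup) in lookups.iter().enumerate() {
                compiled.insert(idx + i, lookup.clone());
            }
        }
    }
    compiled
}
```

Features without a magic comment would presumably fall back to being appended at the end of the list, as the current implementation does.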

anthrotype (Member) commented Jan 25, 2024

it doesn't look for glyph classes defined in either the AST or elsewhere (the ufo lib, possibly somewhere else in glyphs.app?)

Well, for kerning, we do already look at the kerning groups as defined in UFO groups.plist or Glyphs.app's kernRight/kernLeft glyph properties. For mark, the classes are defined implicitly by multiple mark glyphs having the same named mark anchor. The only other place where (kern or mark) glyph classes may be defined is inside features.fea itself: if there are already some hand-written kern or mark/mkmk features present in the FEA and ufo2ft is inserting its own into the same file, it wants to avoid defining the same classes twice, or with a name that clashes with existing ones. But in fontc we do not generate FEA and then compile it; we build the tables directly with fea-rs, so we should be able to ignore this situation, and don't need to look anywhere else for glyph class definitions.
If you are referring to the GDEF GlyphClassDefs statement (base, mark, ligature, ligature component), we can get away with not parsing that from the FEA, because both input sources have a built-in way to define those glyph classifications (UFO has 'public.openTypeCategories' lib key, Glyphs.app has the global GlyphData.xml database optionally overridden by per-glyph properties), and hand-writing those in the FEA (while possible in the current workflow) can be considered 'legacy'.
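
To illustrate why the FEA GDEF block isn't strictly needed, here is a small sketch (not fontc's actual code) of deriving GDEF glyph classes from the UFO public.openTypeCategories lib key; the numeric class values (1 = base, 2 = ligature, 3 = mark, 4 = component) are the ones defined by the OpenType GDEF spec:

```rust
use std::collections::HashMap;

/// GDEF GlyphClassDef values, as defined by the OpenType spec.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
enum GdefClass {
    Base = 1,
    Ligature = 2,
    Mark = 3,
    Component = 4,
}

/// Map the UFO `public.openTypeCategories` values ("base", "ligature",
/// "mark", "component"; "unassigned" and unknown values are skipped) to
/// GDEF classes. A sketch only; the Glyphs.app data sources differ.
fn gdef_classes(categories: &HashMap<String, String>) -> HashMap<String, GdefClass> {
    categories
        .iter()
        .filter_map(|(glyph, category)| {
            let class = match category.as_str() {
                "base" => GdefClass::Base,
                "ligature" => GdefClass::Ligature,
                "mark" => GdefClass::Mark,
                "component" => GdefClass::Component,
                _ => return None,
            };
            Some((glyph.clone(), class))
        })
        .collect()
}
```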

it doesn't handle 'magic comments' for inserting the generated lookups in the correct position

yeah, that is something that we would like to support. I believe right now we append the generated lookups at the end of the lookup list as built from the hand-written FEA, right?

it doesn't attempt to split kern rules into separate lookups based on the relevant language systems (and writing direction?)

for correctness' sake, only splitting by script writing direction is strictly required (and as Jany pointed out in the other thread, there are arguments for not wanting to split lookups by script). A cmap/GSUB closure (or something equivalent) is needed to classify glyphs into RTL and LTR buckets.
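
A very rough sketch of that classification, assuming a simplified view of GSUB as "glyph can be substituted by glyph" edges and a helper that maps a script to its writing direction; neither reflects the actual fontc data structures:

```rust
use std::collections::{HashMap, HashSet, VecDeque};

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum Direction {
    LeftToRight,
    RightToLeft,
}

/// Placeholder: a real implementation would derive this from Unicode
/// script data rather than a hard-coded list.
fn script_direction(script: &str) -> Direction {
    match script {
        "Arabic" | "Hebrew" | "Syriac" | "Thaana" => Direction::RightToLeft,
        _ => Direction::LeftToRight,
    }
}

/// Seed each glyph with the scripts of the codepoints that map to it (via
/// cmap), propagate those scripts through the GSUB substitution edges so
/// that glyphs reachable only via substitution inherit them, and finally
/// bucket every glyph by the writing directions of its scripts.
fn direction_buckets(
    cmap_scripts: &HashMap<String, HashSet<String>>,
    gsub_edges: &HashMap<String, Vec<String>>, // glyph -> substitution targets
) -> HashMap<String, HashSet<Direction>> {
    let mut scripts = cmap_scripts.clone();
    let mut queue: VecDeque<String> = scripts.keys().cloned().collect();
    while let Some(glyph) = queue.pop_front() {
        let from = scripts.get(&glyph).cloned().unwrap_or_default();
        for target in gsub_edges.get(&glyph).into_iter().flatten() {
            let entry = scripts.entry(target.clone()).or_default();
            let before = entry.len();
            entry.extend(from.iter().cloned());
            if entry.len() != before {
                queue.push_back(target.clone());
            }
        }
    }
    scripts
        .into_iter()
        .map(|(glyph, glyph_scripts)| {
            let dirs = glyph_scripts.iter().map(|s| script_direction(s)).collect();
            (glyph, dirs)
        })
        .collect()
}
```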

so that we can figure out which glyphs are reachable from which other glyphs

exactly that.

we now need access to the FEA, in order to inspect what language systems are defined

We do that to know which scripts the font was meant to support. We first look at the cmap, gathering the scripts associated with the Unicode codepoints defined there (via the Unicode script_extension property), but some characters are associated with more than one Unicode script, so we complement that by looking at the scripts explicitly defined in the FEA languagesystem statements. We also use that info to know under which languages we need to register our kern lookups.
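
A simplified sketch of that gathering step; the script_extensions helper is a stand-in for a Unicode properties lookup (not a specific crate's API), and the conversion between Unicode script names and OpenType script tags is glossed over:

```rust
use std::collections::{BTreeSet, HashMap};

/// Placeholder for the Unicode Script_Extensions lookup; a real
/// implementation would use a Unicode properties library.
fn script_extensions(cp: char) -> Vec<&'static str> {
    match cp {
        'a'..='z' | 'A'..='Z' => vec!["Latin"],
        // e.g. the Arabic comma is shared by several scripts
        '،' => vec!["Arabic", "Syriac", "Thaana"],
        _ => vec!["Common"],
    }
}

/// Union of the scripts implied by the cmap codepoints and the scripts
/// explicitly declared via FEA `languagesystem` statements.
fn supported_scripts(
    cmap: &HashMap<char, String>, // codepoint -> glyph name
    fea_languagesystem_scripts: &[String],
) -> BTreeSet<String> {
    let mut scripts = BTreeSet::new();
    for cp in cmap.keys() {
        for script in script_extensions(*cp) {
            // Common/Inherited don't tell us anything about intended support
            if script != "Common" && script != "Inherited" {
                scripts.insert(script.to_string());
            }
        }
    }
    scripts.extend(fea_languagesystem_scripts.iter().cloned());
    scripts
}
```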

whether or not there is an explicit GDEF block

as I said above, we can probably get away without that, since that specific piece of info can be defined elsewhere outside the FEA (and it should be easy to convert older projects).

we need to split out the parsing step from the validation/compilation step

yes, that sounds like a sensible thing to do.

my hope is that it is as simple as recording the current lookup ID when we encounter a magic comment, and then inserting the generated lookups for the relevant feature at that point in the lookup list

I think so

cmyr (Member, Author) commented Jan 25, 2024

it doesn't look for glyph classes defined in either the AST or elsewhere (the ufo lib, possibly somewhere else in glyphs.app?)

Well, for kerning, we do already look at the kerning groups as defined in UFO groups.plist or Glyphs.app's kernRight/kernLeft properties of glyphs. […] If you are referring to the GDEF GlyphClassDefs statement (base, mark, ligature, ligature component), we can get away with not parsing that from the FEA […]

yes, I was looking at https://github.com/googlefonts/ufo2ft/blob/cea60d71dfcf0b1c0fa4e133ec4231ba06fe0da0/Lib/ufo2ft/featureWriters/baseFeatureWriter.py#L379, which checks both the FEA and public.openTypeCategories. If we can skip looking at the FEA here, that would be a win!

yeah, that is something that we would like to support. I believe right now we append the generated lookups at the end of the lookup list as built from the hand-written FEA, right?

correct

rsheeter (Contributor) commented
We're actively writing features, so I think this should be called done.
