Split kerning by script #731

cmyr · 2024-03-12T22:11:10Z

This patch is imperfect, but I think it is worth getting something committed that we can iterate from.

- Modulo a very small number of differences, this matches the output of fontmake on Oswald.
- This does not correctly set the lookup flags (or mark filtering set); I'll do that as a followup.
- This does not handle the 'dist' feature.
- The testing here is anemic. I have at least figured out how to write tests against this, and I see that there are some ideas at https://github.com/googlefonts/ufo2ft/blob/cea60d71dfcf0b1c0fa4e133ec4231ba06fe0da0/tests/featureWriters/kernFeatureWriter_test.py that look worth porting, but I think it makes sense to focus on this as I continue to refine.

- a method to iterate the scripts + extensions for a codepoint
- ability to classify glyphs into groups based on bidi class and/or script

The opentype script tags have a different format from the short names used by unicode.
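
For illustration, here is a minimal sketch of what such a mapping might look like. It is not the code in this patch: the function name `unicode_script_to_ot_tag` is hypothetical, and the exception list only covers a handful of cases from the OpenType script tag registry (it ignores, for example, the second-generation Indic tags such as `dev2`). The default rule assumed here is "lowercase the Unicode short name and pad with spaces to four bytes".

```rust
/// Map a Unicode script short name (e.g. "Latn") to an OpenType script tag
/// (e.g. "latn"). Hypothetical helper, for illustration only.
fn unicode_script_to_ot_tag(short_name: &str) -> [u8; 4] {
    match short_name {
        // Cases where the OpenType tag is not simply the lowercased name.
        "Hira" => *b"kana",
        "Laoo" => *b"lao ",
        "Yiii" => *b"yi  ",
        "Nkoo" => *b"nko ",
        "Vaii" => *b"vai ",
        "Zmth" => *b"math",
        // Common, Inherited and Unknown have no script tag of their own;
        // this sketch maps them to the default script.
        "Zyyy" | "Zinh" | "Zzzz" => *b"DFLT",
        // Assumed default rule: lowercase, space-padded to four bytes.
        other => {
            let mut tag = [b' '; 4];
            for (slot, byte) in tag.iter_mut().zip(other.bytes()) {
                *slot = byte.to_ascii_lowercase();
            }
            tag
        }
    }
}
```

With this sketch, `unicode_script_to_ot_tag("Latn")` yields `latn` while `unicode_script_to_ot_tag("Nkoo")` yields `nko ` (with a trailing space), which is the kind of mismatch the commit message refers to.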

anthrotype

thanks Colin this is big! I've only started looking at this today, will continue review in the next few days

anthrotype · 2024-03-13T10:33:26Z

Cargo.toml

@@ -52,3 +51,6 @@ members = [
    "layout-normalizer",
 ]

+[patch.crates-io]
+icu_properties = { version = "1.4", git = "https://github.com/unicode-org/icu4x.git", rev = "728eb44" }


they don't publish on crates.io?

they do, but I had to fix a few things to get this to work, and those things aren't published yet. See unicode-org/icu4x#4681

https://docs.rs/icu_properties/latest/icu_properties/, maybe there's an unreleased feature we need? If so we should get them to publish.

see my comment above: I had to PR a few things to icu_properties. They have a release planned, but it's a ways away.

Let's add a comment pointing at unicode-org/icu4x#4681 and noting that we should be able to drop this as of their 1.5 release.

fontbe/src/features/properties.rs

This is very closely based on the code in the KernFeatureWriter in ufo2ft. With this patch we very nearly match fontmake's output for kerning in Oswald, with a few lingering differences. It feels like there is room for polish here, but I also think it's worth checkpointing here, for further iteration.

madig · 2024-03-14T13:57:43Z

Before looking at this, does this PR incorporate the ideas in #619 (comment)?

anthrotype · 2024-03-14T14:06:17Z

> Before looking at this, does this PR incorporate the ideas in #619 (comment)?

no, I don't think so. This is generating multiple per-script lookups (like ufo2ft currently does), not subtables as Jany proposed there. I think Colin wants to just match fontmake output for now.

anthrotype · 2024-03-14T14:12:37Z

@madig also note that Colin's PR does implement the feature which Khaled added recently to ufo2ft whereby kerning lookups get merged into one when cross-script kerning is present, which I think addresses your concerns about InDesign's dumb composer.
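
To make the merging idea concrete, here is a rough sketch of one way it could work. This is an assumption about the general approach, not the code in this PR or in ufo2ft: each kerning pair is reduced to the set of scripts it touches, and scripts that ever appear together in one pair are grouped so they can share a lookup. The function name `merge_scripts` is hypothetical.

```rust
use std::collections::BTreeSet;

/// Group scripts so that any two scripts touched by the same kerning pair
/// land in the same group (and so, in this sketch, in the same lookup).
/// `pair_scripts` holds, for each kerning pair, the scripts of both sides.
fn merge_scripts<'a>(pair_scripts: &[BTreeSet<&'a str>]) -> Vec<BTreeSet<&'a str>> {
    // Invariant: the groups are always mutually disjoint.
    let mut groups: Vec<BTreeSet<&'a str>> = Vec::new();
    for scripts in pair_scripts {
        if scripts.is_empty() {
            continue;
        }
        let mut merged = scripts.clone();
        let mut remaining = Vec::new();
        for group in groups.drain(..) {
            if group.is_disjoint(&merged) {
                remaining.push(group);
            } else {
                // This pair links the group to `merged`: absorb it.
                merged.extend(group);
            }
        }
        remaining.push(merged);
        groups = remaining;
    }
    groups.sort();
    groups
}
```

Given pairs touching `{latn}`, `{arab}` and `{latn, grek}`, this produces two groups, `{arab}` and `{grek, latn}`: the cross-script Latin/Greek pair pulls both scripts into one lookup, while Arabic stays separate.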

cmyr · 2024-03-14T17:47:20Z

okay so there's one remaining issue here afaik, which rod posted in #733:

#733 (comment)

dfrg · 2024-03-14T17:59:23Z

fontbe/src/features/properties.rs

+pub const COMMON_SCRIPT: UnicodeShortName =
+    unsafe { UnicodeShortName::from_bytes_unchecked(*b"Zyyy") };
+pub const INHERITED_SCRIPT: UnicodeShortName =
+    unsafe { UnicodeShortName::from_bytes_unchecked(*b"Zinh") };
+
+static SCRIPT_DATA: ScriptWithExtensionsBorrowed<'static> =
+    icu_properties::script::script_with_extensions();
+
+/// The type used by icu4x for script names
+pub type UnicodeShortName = tinystr::TinyAsciiStr<4>;


I think we should just define a ScriptTag([u8; 4]) type (or just reuse Tag?) and bypass TinyAsciiStr. At least for shaping, we definitely don't want to deal with this intermediate type and I don't think it adds any value.

I'd add a const new([u8; 4]) -> Self constructor for this and a new_checked([u8; 4]) -> Option<Self> constructor that tries to round trip through the name_to_enum_mapper.

This also lets us avoid the unsafe code.

thinking on this a bit more, I don't think the checked constructor is even necessary but fn icu_script(self) -> Option<icu_properties::Script> might be useful and provides the same validation
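
A minimal sketch of what that suggestion might look like, using only the standard library. The type and method names follow the comment above but are otherwise hypothetical; the `icu_script` conversion is left out because it would depend on `icu_properties`.

```rust
/// A four-byte script identifier such as `Zyyy`.
/// Hypothetical type following the review suggestion; not part of this patch.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct ScriptTag([u8; 4]);

impl ScriptTag {
    pub const COMMON: ScriptTag = ScriptTag::new(*b"Zyyy");
    pub const INHERITED: ScriptTag = ScriptTag::new(*b"Zinh");

    /// Const constructor. It performs no validation, so the constants
    /// above can be defined without any `unsafe`.
    pub const fn new(raw: [u8; 4]) -> Self {
        ScriptTag(raw)
    }

    /// The raw bytes of the tag.
    pub const fn to_bytes(self) -> [u8; 4] {
        self.0
    }

    /// The tag as a string, if the bytes are valid UTF-8.
    pub fn as_str(&self) -> Option<&str> {
        std::str::from_utf8(&self.0).ok()
    }
}
```

The tradeoff discussed in the thread still applies: a type like this avoids `unsafe` and the `tinystr` dependency, but values coming back from `icu_properties` would need a conversion step.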

the reason to prefer this type is that it is used by icu_properties, so we can do things like query the properties of a unicode value, which returns Scripts, and then convert those to the unicode names via that crate.

oof, I just wrote something similar a couple weeks ago: https://github.com/dfrg/fount/blob/731e374eb963d30abb4d0f938bb515cc3de0a823/fontique/src/script.rs 🤷

looking at yours, is the unicode-script crate necessary? I think that information is available via icu_properties?

in any case my vote right now is that we punt on this and figure out a more holistic approach when we get to shaping.

yeah, happy to punt on this for now

I added optional unicode-script support because there are potential users that already depend on that library for scripts.

anthrotype · 2024-03-15T11:13:00Z

fontbe/src/features/kern.rs

+                                (&side1_marks, &side2_marks),
+                            ] {
+                                if !side1.is_empty() && !side2.is_empty() {
+                                    base_pairs.push(Cow::Owned(PairPosEntry::Class(


@cmyr If I followed the logic correctly, shouldn't this base_pairs actually be mark_pairs?

good catch, I have tests for this in #733 but apparently not handling the class case, will fix that now :)
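
For context, a simplified sketch of the split being discussed, based on my understanding of the ufo2ft approach: base-to-base kerning goes into one list, and any pair involving a mark goes into another. The names here (`split_base_and_mark`, `Class`, `ClassPair`) are hypothetical and the real code works with `PairPosEntry` values, not plain sets of glyph names.

```rust
use std::collections::BTreeSet;

type Class<'a> = BTreeSet<&'a str>;
type ClassPair<'a> = (Class<'a>, Class<'a>);

/// Split one class pair into (base_pairs, mark_pairs).
fn split_base_and_mark<'a>(
    side1: &Class<'a>,
    side2: &Class<'a>,
    marks: &Class<'a>,
) -> (Vec<ClassPair<'a>>, Vec<ClassPair<'a>>) {
    // Partition each side into its mark glyphs and its base glyphs.
    let (side1_marks, side1_bases): ClassPair<'a> =
        side1.iter().copied().partition(|g| marks.contains(g));
    let (side2_marks, side2_bases): ClassPair<'a> =
        side2.iter().copied().partition(|g| marks.contains(g));

    let mut base_pairs = Vec::new();
    let mut mark_pairs = Vec::new();

    // Only base-to-base kerning belongs in the base list.
    if !side1_bases.is_empty() && !side2_bases.is_empty() {
        base_pairs.push((side1_bases.clone(), side2_bases.clone()));
    }
    // Every combination that involves a mark goes to the mark list;
    // pushing these onto `base_pairs` is the bug pointed out above.
    for &(a, b) in &[
        (&side1_bases, &side2_marks),
        (&side1_marks, &side2_bases),
        (&side1_marks, &side2_marks),
    ] {
        if !a.is_empty() && !b.is_empty() {
            mark_pairs.push((a.clone(), b.clone()));
        }
    }
    (base_pairs, mark_pairs)
}
```

A class pair `({A, acutecomb}, {V})` with `acutecomb` as the only mark would split into one base pair `({A}, {V})` and one mark pair `({acutecomb}, {V})`.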

anthrotype

belated LGTM! you did an excellent job porting over to Rust the complicated logic of ufo2ft kernFeatureWriter 💯

Left just a little comment above

cmyr added 2 commits March 13, 2024 13:49

[fontbe] Add properties module and initial helpers

68e49e5

- a method to iterate the scripts + extensions for a codepoint
- ability to classify glyphs into groups based on bidi class and/or script

[fontbe] Implement Ot script -> Unicode script mapping

e1c5234

The opentype script tags have a different format from the short names used by unicode.

cmyr force-pushed the split-kern-by-script branch from ba3e461 to da37594 Compare March 13, 2024 17:51

anthrotype reviewed Mar 13, 2024


cmyr mentioned this pull request Mar 13, 2024

set lookupflag and mark filtering set for kerning lookups #733

Merged

cmyr added 3 commits March 13, 2024 17:10

[kerning] Add a simple test case for split-by-script

ce7122e

[kerning] Support math script

d4160af

cmyr force-pushed the split-kern-by-script branch from da37594 to d4160af Compare March 13, 2024 21:12

cmyr force-pushed the split-kern-by-script branch from af09f35 to b687af6 Compare March 14, 2024 16:38

[kerning] cleanup and code review

e1e743d

cmyr force-pushed the split-kern-by-script branch from b687af6 to e1e743d Compare March 14, 2024 17:01

dfrg reviewed Mar 14, 2024


cmyr added this pull request to the merge queue Mar 14, 2024

Merged via the queue into main with commit 7de98d2 Mar 14, 2024
10 checks passed

cmyr deleted the split-kern-by-script branch March 14, 2024 19:22

anthrotype reviewed Mar 15, 2024


cmyr mentioned this pull request Mar 18, 2024

GPOS: feature writers should split lookups based on language system #619

Closed
