Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split kerning by script #731

Merged
merged 6 commits into from
Mar 14, 2024
Merged

Split kerning by script #731

merged 6 commits into from
Mar 14, 2024

Conversation

cmyr
Copy link
Member

@cmyr cmyr commented Mar 12, 2024

This patch is imperfect, but I think it is worth getting something committed that we can iterate from.

- a method to iterate the scripts + extensions for a codepoint
- ability to classifty glyphs into groups based on bidi class or &
  script
The opentype script tags have a different format from the short names
used by unicode.
Copy link
Member

@anthrotype anthrotype left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks Colin this is big! I've only started looking at this today, will continue review in the next few days

@@ -52,3 +51,6 @@ members = [
"layout-normalizer",
]

[patch.crates-io]
icu_properties = { version = "1.4", git = "https://github.com/unicode-org/icu4x.git", rev = "728eb44" }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they don't publish on crates.io?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they do, but I had to fix a few things to get this to work, and those things aren't published yet. See unicode-org/icu4x#4681

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://docs.rs/icu_properties/latest/icu_properties/, maybe there's an unreleased feature we need? If so we should get them to publish.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my comment above: I had to PR a few things to icu_properties. They have a release planned, but it's a ways away.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a comment pointing at unicode-org/icu4x#4681 and noting that we should be able to drop this as of their 1.5 release.

fontbe/src/features/properties.rs Outdated Show resolved Hide resolved
fontbe/src/features/properties.rs Outdated Show resolved Hide resolved
This is very closely based on the code in the KernFeatureWriter, in
ufo2ft. With this patch we very nearly match fontmake's output for
kerning in oswald, with a few lingering differences.

It feels like there is room for polish here, but I also think it's worth
checkpointing here, for further iteration.
@madig
Copy link
Contributor

madig commented Mar 14, 2024

Before looking at this, does this PR incorporate the ideas in #619 (comment)?

@anthrotype
Copy link
Member

anthrotype commented Mar 14, 2024

Before looking at this, does this PR incorporate the ideas in #619 (comment)?

no, I don't think so. This is generating multiple per-script lookups (like ufo2ft currently does), not subtables as Jany proposed there. I think Colin wants to just match fontmake output for now.

@anthrotype
Copy link
Member

@madig also note that Colin's PR does implements the feature which Khaled added recently to ufo2ft whereby kerning lookups get merged into one when cross-script kerning is present, which I think addresses your concerns about Indesign dumb composer.

@cmyr
Copy link
Member Author

cmyr commented Mar 14, 2024

okay so there's one remaining issue here afaik, which rod posted in #733:

#733 (comment)

Comment on lines +22 to +31
pub const COMMON_SCRIPT: UnicodeShortName =
unsafe { UnicodeShortName::from_bytes_unchecked(*b"Zyyy") };
pub const INHERITED_SCRIPT: UnicodeShortName =
unsafe { UnicodeShortName::from_bytes_unchecked(*b"Zinh") };

static SCRIPT_DATA: ScriptWithExtensionsBorrowed<'static> =
icu_properties::script::script_with_extensions();

/// The type used by icu4x for script names
pub type UnicodeShortName = tinystr::TinyAsciiStr<4>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should just define a ScriptTag([u8; 4]) type (or just reuse Tag?) and bypass TinyAsciiStr. At least for shaping, we definitely don't want to deal with this intermediate type and I don't think it adds any value.

I'd add a const new([u8; 4]) -> Self constructor for this and a new_checked([u8; 4]) -> Option<Self> constructor that tries to round trip through the name_to_enum_mapper.

This also lets us avoid the unsafe code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thinking on this a bit more, I don't think the checked constructor is even necessary but fn icu_script(self) -> Option<icu_properties::Script> might be useful and provides the same validation

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the reason to prefer this type is because it is used by icu_propeties, so that we can do things like query the properties of a unicode value, which returns Scripts, and then we can convert those to the unicode names via that crate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking at yours, is the unicode-script crate necessary? I think that information is available via icu_properties?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in any case my vote right now is that we punt on this and can figure out a more holistic approach when we get to shaping?.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, happy to punt on this for now

I added optional unicode-script support because there are potential users that already depend on that library for scripts.

@cmyr cmyr added this pull request to the merge queue Mar 14, 2024
Merged via the queue into main with commit 7de98d2 Mar 14, 2024
10 checks passed
@cmyr cmyr deleted the split-kern-by-script branch March 14, 2024 19:22
(&side1_marks, &side2_marks),
] {
if !side1.is_empty() && !side2.is_empty() {
base_pairs.push(Cow::Owned(PairPosEntry::Class(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cmyr If I followed the logic correctly, shouldn't this base_pairs actually be mark_pairs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, I have tests for this in #733 but apparently not handling the class case, will fix that now :)

Copy link
Member

@anthrotype anthrotype left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

belated LGTM! you did an excellent job porting over to Rust the complicated logic of ufo2ft kernFeatureWriter 💯

Left just a little comment above

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants