-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Split kerning by script #731
Conversation
- a method to iterate the scripts + extensions for a codepoint - ability to classifty glyphs into groups based on bidi class or & script
The opentype script tags have a different format from the short names used by unicode.
ba3e461
to
da37594
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks Colin this is big! I've only started looking at this today, will continue review in the next few days
@@ -52,3 +51,6 @@ members = [ | |||
"layout-normalizer", | |||
] | |||
|
|||
[patch.crates-io] | |||
icu_properties = { version = "1.4", git = "https://github.com/unicode-org/icu4x.git", rev = "728eb44" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they don't publish on crates.io?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they do, but I had to fix a few things to get this to work, and those things aren't published yet. See unicode-org/icu4x#4681
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://docs.rs/icu_properties/latest/icu_properties/, maybe there's an unreleased feature we need? If so we should get them to publish.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see my comment above: I had to PR a few things to icu_properties. They have a release planned, but it's a ways away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a comment pointing at unicode-org/icu4x#4681 and noting that we should be able to drop this as of their 1.5 release.
This is very closely based on the code in the KernFeatureWriter, in ufo2ft. With this patch we very nearly match fontmake's output for kerning in oswald, with a few lingering differences. It feels like there is room for polish here, but I also think it's worth checkpointing here, for further iteration.
da37594
to
d4160af
Compare
Before looking at this, does this PR incorporate the ideas in #619 (comment)? |
no, I don't think so. This is generating multiple per-script lookups (like ufo2ft currently does), not subtables as Jany proposed there. I think Colin wants to just match fontmake output for now. |
@madig also note that Colin's PR does implements the feature which Khaled added recently to ufo2ft whereby kerning lookups get merged into one when cross-script kerning is present, which I think addresses your concerns about Indesign dumb composer. |
af09f35
to
b687af6
Compare
b687af6
to
e1e743d
Compare
okay so there's one remaining issue here afaik, which rod posted in #733: |
pub const COMMON_SCRIPT: UnicodeShortName = | ||
unsafe { UnicodeShortName::from_bytes_unchecked(*b"Zyyy") }; | ||
pub const INHERITED_SCRIPT: UnicodeShortName = | ||
unsafe { UnicodeShortName::from_bytes_unchecked(*b"Zinh") }; | ||
|
||
static SCRIPT_DATA: ScriptWithExtensionsBorrowed<'static> = | ||
icu_properties::script::script_with_extensions(); | ||
|
||
/// The type used by icu4x for script names | ||
pub type UnicodeShortName = tinystr::TinyAsciiStr<4>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should just define a ScriptTag([u8; 4])
type (or just reuse Tag
?) and bypass TinyAsciiStr
. At least for shaping, we definitely don't want to deal with this intermediate type and I don't think it adds any value.
I'd add a const new([u8; 4]) -> Self
constructor for this and a new_checked([u8; 4]) -> Option<Self>
constructor that tries to round trip through the name_to_enum_mapper
.
This also lets us avoid the unsafe code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thinking on this a bit more, I don't think the checked constructor is even necessary but fn icu_script(self) -> Option<icu_properties::Script>
might be useful and provides the same validation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the reason to prefer this type is because it is used by icu_propeties
, so that we can do things like query the properties of a unicode value, which returns Script
s, and then we can convert those to the unicode names via that crate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oof, I just wrote something similar a couple weeks ago: https://github.com/dfrg/fount/blob/731e374eb963d30abb4d0f938bb515cc3de0a823/fontique/src/script.rs 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking at yours, is the unicode-script
crate necessary? I think that information is available via icu_properties
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in any case my vote right now is that we punt on this and can figure out a more holistic approach when we get to shaping?.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, happy to punt on this for now
I added optional unicode-script support because there are potential users that already depend on that library for scripts.
(&side1_marks, &side2_marks), | ||
] { | ||
if !side1.is_empty() && !side2.is_empty() { | ||
base_pairs.push(Cow::Owned(PairPosEntry::Class( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cmyr If I followed the logic correctly, shouldn't this base_pairs
actually be mark_pairs
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, I have tests for this in #733 but apparently not handling the class case, will fix that now :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
belated LGTM! you did an excellent job porting over to Rust the complicated logic of ufo2ft kernFeatureWriter 💯
Left just a little comment above
This patch is imperfect, but I think it is worth getting something committed that we can iterate from.