[read-fonts] tt bytecode decoder #773

dfrg · 2024-02-07T09:17:45Z

Adds an opcode definition, instruction representation and decoder for TrueType bytecode.

Also renames Args -> InlineOperands and moves the type from skrifa to read-fonts.

The bulk of this is the 256 variant Opcode enum + some additional tables of the same size that capture information for each opcode. The actual decoding logic is about 30 lines of code.

Adds an opcode definition, instruction representation and decoder for TrueType bytecode. Also renames Args -> InlineOperands and moves the type from skrifa to read-fonts.

Based on #773, uses the decoder in that PR to handle dispatch to the currently implemented instructions. No tests in this one. Will add in a later PR when more functionality has landed.

cmyr

Looks great, a few little comments inline, mostly about all the fun places this could be unsafe 😈

cmyr · 2024-02-07T21:14:48Z

read-fonts/src/tables/truetype.rs

@@ -0,0 +1,3 @@
+//! TrueType (glyf) common code.


Is this a table? I might make it a submodule of glyf if that is more logical, we do this for a few other tables.

I waffled on this because technically bytecode applies to glyf, fpgm and prep so it is more of a common thing. But I think putting it under glyf makes sense and removes the otherwise empty truetype module.

cmyr · 2024-02-07T21:15:18Z

read-fonts/src/tables/truetype/bytecode/decode.rs

+/// An error returned by [`Decoder::decode`] if the end of the bytecode
+/// stream is reached unexpectedly.
+#[derive(Copy, Clone, Debug)]
+pub struct DecodeError(());


Suggested change

pub struct DecodeError(());

pub struct DecodeError;

cmyr · 2024-02-07T21:16:10Z

read-fonts/src/tables/truetype/bytecode/decode.rs

+    /// The bytecode for the program.
+    pub bytecode: &'a [u8],
+    /// The "program counter" or current offset into the bytecode.
+    pub pc: usize,


is pub necessary here?

The control flow instructions will need to read/write pc. Calls need to replace bytecode if the target function is in a different program. I could probably add methods for this but that just seems like extra complexity for what essentially becomes full access anyway 🤷🏻‍♂️

cmyr · 2024-02-07T21:19:47Z

read-fonts/src/tables/truetype/bytecode/decode.rs

+        // <https://gitlab.freedesktop.org/freetype/freetype/-/blob/57617782464411201ce7bbc93b086c1b4d7d84a5/src/truetype/ttinterp.c#L7046>
+        if opcode_len < 0 {
+            let inline_count = *self.bytecode.get(self.pc + 1).ok_or(DecodeError(()))?;
+            opcode_len = -opcode_len * inline_count as i32 + 2;


i would be slightly less confused by opcode_len.abs() but nbd

Good call! This was initially 2 - opcode_len * next_byte which was even more confusing.

cmyr · 2024-02-07T21:30:27Z

read-fonts/src/tables/truetype/bytecode/opcode.rs

+    /// There is a 1:1 mapping between bytes and opcodes.
+    #[inline]
+    pub fn from_byte(byte: u8) -> Self {
+        OPCODE_FROM_BYTE[byte as usize]


given that each possible u8 value is a valid discriminant for our enum, we could simplify this, at the cost of scaring rod, and just write

unsafe { std:mem::transmute(byte) }

I believe we would also need to mark the enum as #[repr(transparent)].

Depending on how smart the compiler is being, this could also end up eliding a bounds check on each call, which could be meaningful.

Alternatively we could also skip that possible bounds check with unsafe { OPCODE_FROM_BYTE.get_unchecked(&byte) }.

I agree this one isn't so scary since clearly all values are valid. Despite that I suggest we start safe and only add scary things once a perf profile shows it's valuable.

I’m fairly certain llvm will elide the bounds checks because it knows the size of the array and the range of the index value.

That leaves the memory read. The optimizer certainly has enough information to avoid this as well but I’d be surprised if it does. I’ll test this later. Even if not, this array will be hot in L1 so I don’t see it being a perf factor.

I plan to do some benchmarking and profiling on this at some point so we can revisit then.

cmyr · 2024-02-07T21:32:29Z

read-fonts/src/tables/truetype/bytecode/opcode.rs

+    }
+
+    pub(super) fn len(self) -> i32 {
+        OPCODE_LENGTHS[self as usize] as i32


worth profiling but if we're really concerned about perf and this is called hot this would be a justifiable place to use get_unchecked and skip a bounds check.

dfrg · 2024-02-08T02:03:38Z

Thanks for the reviews. I’m going to merge this and make the requested changes in #774 since it’s closely related and will avoid rebase nonsense.

- move bytecode to glyf module - replace negation with abs() - make DecodeError a unit-like struct In addition, replace byte to Opcode table with a match expression. This codegens the same as transmute!

Based on #773, uses the decoder in that PR to handle dispatch to the currently implemented instructions. No tests in this one. Will add in a later PR when more functionality has landed. * review feedback - add limit for total number of instructions executed (same as FT) - don't return count from Engine::run() * review changes from #773 - move bytecode to glyf module - replace negation with abs() - make DecodeError a unit-like struct In addition, replace byte to Opcode table with a match expression.

[read-fonts] tt bytecode decoder

9839450

Adds an opcode definition, instruction representation and decoder for TrueType bytecode. Also renames Args -> InlineOperands and moves the type from skrifa to read-fonts.

dfrg mentioned this pull request Feb 7, 2024

[skrifa] tthint: instruction dispatch #774

Merged

cmyr approved these changes Feb 7, 2024

View reviewed changes

rsheeter approved these changes Feb 7, 2024

View reviewed changes

dfrg merged commit 60298ad into main Feb 8, 2024
9 checks passed

dfrg deleted the tthint-bytecode branch February 8, 2024 02:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[read-fonts] tt bytecode decoder #773

[read-fonts] tt bytecode decoder #773

dfrg commented Feb 7, 2024

cmyr left a comment

cmyr Feb 7, 2024

dfrg Feb 8, 2024

cmyr Feb 7, 2024

cmyr Feb 7, 2024

dfrg Feb 8, 2024

cmyr Feb 7, 2024

dfrg Feb 8, 2024

cmyr Feb 7, 2024

rsheeter Feb 7, 2024

dfrg Feb 8, 2024

cmyr Feb 7, 2024

dfrg commented Feb 8, 2024

[read-fonts] tt bytecode decoder #773

[read-fonts] tt bytecode decoder #773

Conversation

dfrg commented Feb 7, 2024

cmyr left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dfrg commented Feb 8, 2024