-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Try out generating code instead of bincode for glyphsLib data to avoid runtime deserialization #1066
base: main
Are you sure you want to change the base?
Conversation
I wonder if we could just start a job to load the data earlier in execution? It is needed fairly early, but if we triggered the load in another thread immediately on launch it might be ready by the time we need it.. |
I have reservations about this being more complex and more expensive in that we burn cycles pointlessly every run. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitely agree that we don't want to be spending 10% of our runtime on this, but I wonder if there isn't a slightly easier way to do this (as you mentioned elsewhere) by just directly generating a &'static [GlyphInfo]
; we would then use Cow<'static, [GlyphInfo]>
in the GlyphData
struct so that we were using the static array in the general case, but then it should 'just work'?
basically we can quite easily just swap out the bincode step without needing to make many other changes... I'm now nerdsniped enough I'm going to take a quick look, I think it should be a very small patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, so I need to stop ratholing on this for the time being; I had an approach that seems sensible and overall simpler but I'm running into some very funny business where my generated code is hanging fontc?
my main concern with this patch is that it is making a big chunk of changes that seem unrelated to our primary concern, which is to stop parsing an input file in order to generate our initial data. It may be that these other changes make sense, but it makes reasoning about the patch significantly more difficult.
Currently we have a fn load_bundled_data
, which returns Vec<GlyphInfo>
. What if we keep this exact same signature, but with load_bundled_data
just being a codegen'd vec![]
macro of GlyphInfo
structs?
to make this easier (and simply reduce the size in MB of the enormous file we generate) I would add a convenience constructor to GlyphInfo
looking like,
/// Create a new `GlyphInfo` instance from generated code.
///
/// The types and the name of this function are chosen to minimize filesize
/// of the generated code.
#[doc(hidden)]
pub fn bake(
name: &'static str,
category: Category,
subcategory: Subcategory,
unicode: Option<u32>,
production: &'static str,
alt_names: &'static [&'static str],
) -> Self {
let production = if production.is_empty() {
None
} else {
Some(SmolStr::new_static(production))
};
Self {
name: SmolStr::new_static(name),
category,
subcategory,
unicode,
production,
alt_names: alt_names
.into_iter()
.map(|s| SmolStr::new_static(*s))
.collect::<Vec<_>>(),
}
}
and then I would generate a bunch of invocations of this constructor.
This approach isn't perfect; we will be doing some work in this fn, but hopefully it will be significantly less? The existing use of OnceCell means it will only ever be invoked once.
It may be that this solution is inadequate, but it seems to me like it would be worth trying this first, and then if we aren't satisfied we can consider something more invasive?
That's fair, let me think a bit about how to factor into smaller changes after meeting season ends. |
As noted in #774 (comment), loading the glyph data takes a noticeable amount of time, north of 10% on Linux if samply is to be believed. This PR explores generating const arrays instead of bincode we have to deserialize at runtime.
Impact on my Linux box:
The key change is that
update.py
now emits const arrays in Rust code instead of build.rs producing bincode that we load at runtime. You probably don't want to openglyphs-reader/src/glyphdata.rs
in a browser diff view, it's a bit long. Maybe there is a better way?