Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallel comemo & optimizations #5

Merged
merged 28 commits into from
Dec 15, 2023
Merged

Parallel comemo & optimizations #5

merged 28 commits into from
Dec 15, 2023

Conversation

Dherse
Copy link
Contributor

@Dherse Dherse commented Dec 8, 2023

This PR introduces the following changes:

  • Now works in a multithreaded contexts: all parallel calls to memoized functions race to completion, could add a system for waiting on already "launched" functions using a OnceCell
  • Now works with local caches, that means that all functions now have a static cache defined locally, this involves the following advantages:
    • Caches are smaller, therefore lookups may be faster
    • Caches are typed: removes the Box<dyn Any> that was used before
    • We don't need to hash the TypeId when doing cache lookups
  • Added ImmutableConstraint which involves the following changes:
    • In the event that a #[track]ed struct only has immutable methods, we can just use a HashMap of visited functions which is faster for lookups, also has some niceties to save some work during validation, etc.
  • Optimized Constraint:
    • Now uses a Vec<Call<T>> for storing function calls in order
    • Now uses HashMap<u128, usize> for storing immutable calls since the last mutable call, this allows much faster lookups in the event of a mixed mutable-immutables tracked struct.
  • Made tests run sequentially, otherwise some test would randomly fail due to test_evict running in parallel and evicting the cache of other tests
  • last_was_hit is now behind a feature flag: we don't need it in actual applications and it saves some time, it's thread-local
  • This purposefully does not use DashMap to avoid a bug in Safari. The problem with making two implementations is that Constraint, ImmutableConstraint, and ACCELERATOR would all change leading to large code duplication, but if @laurmaedje wants it implemented this may, I think it would show better performance on "native" targets.
  • Eviction is now done on a per-cache level and caches are registered during lazy evaluation of the caches.
  • Caches now use a hashbrown::HashMap as they can be slightly faster and we already have this dependency in Typst from indexmap.
  • Regression: memoized functions can no longer be generic, I think this could be alleviated using a special memoized_generic that still uses a Box<dyn Any> under the hood. However, generic memoized functions are not part of Typst and therefore I am unsure whether the burden of maintaining two implementations is worth it.
  • Regression: when doing memoized and tracked functions, you can no longer refer to the current type as Self, it has to be the full name of the type. This is very minor and has little to no impact on functionality.
  • IDs are now atomic, they are using the SeqCst ordering, but I wonder whether they could be made with a less restrictive ordering but I am not knowledgeable enough for this.

Overall, and in spite of locking, these changes bring gigantic performance gains in incremental in Typst, in some cases doubling incremental performance.

@Dherse
Copy link
Contributor Author

Dherse commented Dec 9, 2023

So, after much discussion on the Discord, here are some more changes:

  • Accelerators are sharded with one accelerator but alive Tracked (i.e Tracked creates since the last evict call), all supposed "dead" Tracked instances will just not have an accelerator (no panic, no error, just slightly slower)
  • Fixed a bug where there could be an issue with Send + Sync missing on a tracked value
  • Reverted some changes to the original API since they were not needed.
  • Changed atomic ordering to use Acquire and Release ordering for load and store respectively which should potentially help performance.

Overall, this provides a nice performance bump for Typst, being about 1.5x faster in most tests than before these additional changes. Note that this was measured on a currently ongoing rework of Typst to be parallelized so take those numbers with a grain of salt.

@Enter-tainer
Copy link

IDs are now atomic, they are using the SeqCst ordering, but I wonder whether they could be made with a less restrictive ordering but I am not knowledgeable enough for this.

It seems that they are using Acquire/AcqRel/... now. Do they give a big performance gain?

@Dherse
Copy link
Contributor Author

Dherse commented Dec 12, 2023

IDs are now atomic, they are using the SeqCst ordering, but I wonder whether they could be made with a less restrictive ordering but I am not knowledgeable enough for this.

It seems that they are using Acquire/AcqRel/... now. Do they give a big performance gain?

Not really but it doesn't have any negative impact either, do you have any idea on which would be better ?

@Enter-tainer
Copy link

i prefer seqcst because it is very hard to get the correct memory order. if it is wrong, it will be very hard to reproduce and diagnose the bug.

Cargo.toml Outdated
@@ -10,6 +10,16 @@ license = "MIT OR Apache-2.0"
categories = ["caching"]
keywords = ["incremental", "memoization", "tracked", "constraints"]

[features]
default = [ "last_was_hit" ]
last_was_hit = []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a more generic debugging/testing feature would be nicer. It should probably also not be enabled by default.

Cargo.toml Outdated Show resolved Hide resolved
macros/src/track.rs Outdated Show resolved Hide resolved
macros/src/track.rs Outdated Show resolved Hide resolved
src/track.rs Outdated Show resolved Hide resolved
src/cache.rs Outdated Show resolved Hide resolved
src/cache.rs Outdated Show resolved Hide resolved
src/cache.rs Outdated Show resolved Hide resolved
src/cache.rs Outdated Show resolved Hide resolved
src/cache.rs Outdated Show resolved Hide resolved
@laurmaedje laurmaedje merged commit ddb3773 into typst:main Dec 15, 2023
1 check passed
@laurmaedje
Copy link
Member

Thank you for the great work here!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants