Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serokell: [Milestone-1] New PersistentOrderedMap.mo lib #654

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

GoPavel
Copy link

@GoPavel GoPavel commented Sep 5, 2024

This is an MR for the 1st Milestone of Serokell's grant about improving Motoko's base library.

Within it, we are introducing a new functional implementation of the map data structure based on the current RBTree.mo.

Content of the MR:

  • extract the implementation from the RBTree.mo
  • change API to keep it convenient to store the map in the stable variable
  • add missing functionality
    • map
    • foldLeft, foldRight
    • mapFilter
    • keys, vals
  • optimize tree representation via elimination of option type ?V -> V
  • unit tests
  • property-based tests
  • documentation
    • check typos again
    • add examples

@sa-github-api
Copy link

Dear @GoPavel,

In order to potentially merge your code in this open-source repository and therefore proceed with your contribution, we need to have your approval on DFINITY's CLA.

If you decide to agree with it, please visit this issue and read the instructions there. Once you have signed it, re-trigger the workflow on this PR to see if your code can be merged.

— The DFINITY Foundation

@GoPavel GoPavel changed the title [Draft] Serokell: New PersistentMap.mo lib [Draft] Serokell: New PersistentOrderedMap.mo lib Sep 30, 2024
@GoPavel GoPavel changed the title [Draft] Serokell: New PersistentOrderedMap.mo lib Serokell: New PersistentOrderedMap.mo lib Oct 2, 2024
src/PersistentOrderedMap.mo Outdated Show resolved Hide resolved
Copy link
Contributor

@crusso crusso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some interim comments (more to come, just submitting because I expect GH might lost my review soon)

src/PersistentOrderedMap.mo Outdated Show resolved Hide resolved
src/PersistentOrderedMap.mo Outdated Show resolved Hide resolved
/// with `Map` to maintain invariant that comparator did not changed.
///
/// `MapOps` contains methods that require `compare` internally:
/// operations that may reshape a `Map` or should find something.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// operations that may reshape a `Map` or should find something.
/// operations that update, traverse or search a map.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think traverse doesn't require comp (since we keep tree structure) so it's outside of the MapOps.
So I think it should be

Suggested change
/// operations that may reshape a `Map` or should find something.
/// operations that update or search a map.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that's a UX question - should we put all operations in MapOps, even if they don't require the comparison, just for uniformity? I guess we could also do that later easily enough. Something to think about.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we put all operations in MapOps

I also thought about this. We were going to measure its overhead on a benchmark and then decide. If there is no difference I would also put it there.

However for now some modules like RBTree or TrieMap provide some functionality as a module-function, not as a member-function e.g. map or iter. So we thought that the current solution is not very inconsistent.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We did performance experiments:

  1. Comparing inlined internal functions and the current version (all calls through MapOps): [DMS-68] Inline internal functions into MapOps object serokell/motoko-base#21
  2. Comparing calling methods through MapOps and through the Internal module passing comp manually: [DMS-68] Compare function calls from MapOps with direct function call serokell/motoko-base#20

The 1st experiment shows that the inlined version has more or less the same performance (<1% difference).
The 2nd one shows that a static function call is faster than a similar one from MapOps on 1-4% percent.

Thus we can say that static calls are less convenient but faster, so probably it makes sense to keep the option ?
We can add the rest of the functions into MapOps and make the Internal module public.
OR keep the current version as compromised

WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I guess we decide to put all operations in MapOps and leave Internal private for now at a 1-4% perf hit.
Does the perf hit disappear entirely if we inline the Internal functions in MapOps, or am I misunderstanding?

src/PersistentOrderedMap.mo Outdated Show resolved Hide resolved
Copy link
Contributor

@crusso crusso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some more comments...

src/PersistentOrderedMap.mo Show resolved Hide resolved
src/PersistentOrderedMap.mo Outdated Show resolved Hide resolved
src/PersistentOrderedMap.mo Outdated Show resolved Hide resolved
@GoPavel GoPavel changed the title Serokell: New PersistentOrderedMap.mo lib Serokell: [Milestone-1] New PersistentOrderedMap.mo lib Oct 14, 2024
s-and-witch and others added 10 commits October 21, 2024 17:31
No changes in logic so far, just simple refactoring
Add `MapOps` class with the following signature:

  public class MapOps<K>(compare : (K,K) -> O.Order) {

    public func put<V>(rbMap : Map<K, V>, key : K, value : V) : Map<K, V>

    public func fromIter<V>(i : I.Iter<(K,V)>) : Map<K, V>

    public func replace<V>(rbMap : Map<K, V>, key : K, value : V) : (Map<K,V>, ?V)

    public func mapFilter<V1, V2>(f : (K, V1) -> ?V2, rbMap : Map<K, V1>) : Map<K, V2>

    public func get<V>(key : K, rbMap : Map<K, V>) : ?V

    public func delete<V>(rbMap : Map<K, V>, key : K) : Map<K, V>

    public func remove< V>(rbMap : Map<K, V>, key : K) : (Map<K,V>, ?V)

  };

The other functionality provided as standalone functions, as they
don't require comparator:

  public type Direction = { #fwd; #bwd };

  public func iter<K, V>(rbMap : Map<K, V>, direction : Direction) : I.Iter<(K, V)>

  public func entries<K, V>(m : Map<K, V>) : I.Iter<(K, V)>

  public func keys<K, V>(m : Map<K, V>, direction : Direction) : I.Iter<K>

  public func vals<K, V>(m : Map<K, V>, direction : Direction) : I.Iter<V>

  public func map<K, V1, V2>(f : (K, V1) -> V2, rbMap : Map<K, V1>) : Map<K, V2>

  public func size<K, V>(t : Map<K, V>) : Nat

  public func foldLeft<Key, Value, Accum>(
    combine : (Key, Value, Accum) -> Accum,
    base : Accum,
    rbMap : Map<Key, Value>
  ) : Accum

  And foldRight with the same signature as foldLeft

The following functions are new for the API:
- MapOps.put, MapOps.delete
- MapOps.fromIter, entries, keys, vals
- MapOps.mapFilter, map
- foldLeft, foldRight
Problem: now order is not consistent within new module and with old
modules as well.

Solution: make the map argument always go first
In addition to tests this patch removes `direction`
argument from `keys` and `values` function to keep
them simple and provides a new function `Map.empty`
to create a map without knowing its internal representation.
* rename `rbMap` into `m` in signature for brevity & consistent language
* rename `rbMap` into `map` in examples for brevity & encapsulation sake
* rename `tree` into `map` in doc comments for the encapsulation sake
@GoPavel
Copy link
Author

GoPavel commented Oct 21, 2024

@crusso
I've rebased the branch and fixed the name rbMap in doc comments and signatures.
Please resolve conversations that you think should be resolved, so we will see if something left about his PR.
Also, I would ask you to approve the CI workflow since now I see "This workflow is awaiting approval from a maintainer in" here

Copy link
Contributor

@crusso crusso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some comments

/// with `Map` to maintain invariant that comparator did not changed.
///
/// `MapOps` contains methods that require `compare` internally:
/// operations that may reshape a `Map` or should find something.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I guess we decide to put all operations in MapOps and leave Internal private for now at a 1-4% perf hit.
Does the perf hit disappear entirely if we inline the Internal functions in MapOps, or am I misunderstanding?

test/PersistentOrderedMap.prop.test.mo Outdated Show resolved Hide resolved
@GoPavel
Copy link
Author

GoPavel commented Oct 24, 2024

So I guess we decide to put all operations in MapOps and leave Internal private for now at a 1-4% perf hit.
Does the perf hit disappear entirely if we inline the Internal functions in MapOps, or am I misunderstanding?

We have tried it, and it doesn't affect the benchmark at all. See serokell#21

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants