Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speeding up Array Consistency Check by using sorted int-int AVL Tree #55

Closed
wants to merge 5 commits into from

Conversation

baldawar
Copy link
Collaborator

Issue #, if available: N/A

Description of changes:

This a "I'm throwing this out for everyone else and see if it sticks" kind of PR.

While digging around for bottlenecks for #54, I realized that we had a home-grown Int-Int map and its showing up as a bottleneck (for past context, we used to have a HashMap<Integer, Integer> there and moving to Int-Int map helped improved performance from 1735.6 events/sec to 2645.0 events/sec. The code has since improved but isn't at the same level as rest of the matchers.

Moving to using a AVL Tree for checking array membership leads to a significant boost from 2821.3 events/sec to 5275.7 events.sec. We have the option to gain some more performance if we stop the cloning behaviour in ArrayMembership(...) but I've left that as a future exercise (will create a issue for it if this gets merged).

The biggest downside is that its introducing a brand new dependency with lots of classes / structures that we aren't really using. There may be some use for bi-directional iterators or Int2Object Maps in future, but I'm unclear on the benefits for them. so for now, I'd like everyone to assume that we take this new dependency on fastutils-core, it'll be purely for the performance gains for array consistency.

We can also explore modifying our own IntIntMap, but looking a how fastutils-core did it, lots of complex edge-cases to solve.

Note: Apologies for any typos, I sent this PR out between few meetings

Benchmark / Performance (for source code changes):

Before

Reading citylots2
Read 213068 events
EXACT events/sec: 193346.6
WILDCARD events/sec: 123374.6
PREFIX events/sec: 189225.6
SUFFIX events/sec: 213068.0
EQUALS_IGNORE_CASE events/sec: 160322.0
NUMERIC events/sec: 111670.9
ANYTHING-BUT events/sec: 123517.7
**COMPLEX_ARRAYS events/sec: 2821.3
PARTIAL_COMBO events/sec: 74028.2
COMBO events/sec: 2517.7**
Reading citylots2
Read 213068 events
Finding Rules...
Lots: 10000
Lots: 20000
Lots: 30000
Lots: 40000
Lots: 50000
Lots: 60000
Lots: 70000
Lots: 80000
Lots: 90000
Lots: 100000
Lots: 110000
Lots: 120000
Lots: 130000
Lots: 140000
Lots: 150000
Lots: 160000
Lots: 170000
Lots: 180000
Lots: 190000
Lots: 200000
Lots: 210000
Lines: 213068, Msec: 13478
Events/sec: 15808.6
 Rules/sec: 110660.0
Before: 1767.1
After: 1434.2
Per rule: 832
Turning JSON into field-lists...
Finding Rules...
Lines: 213068, Msec: 2693
Events/sec: 79119.2
Before: 1919.4
After: 1362.7
Per rule: 1391
Reading lines...
Finding Rules...
Lots: 10000
Lots: 20000
Lots: 30000
Lots: 40000
Lots: 50000
Lots: 60000
Lots: 70000
Lots: 80000
Lots: 90000
Lots: 100000
Lots: 110000
Lots: 120000
Lots: 130000
Lots: 140000
Lots: 150000
Lots: 160000
Lots: 170000
Lots: 180000
Lots: 190000
Lots: 200000
Lots: 210000
Lines: 213068, Msec: 1632
Events/sec: 130556.4
 Rules/sec: 475486308.8
Reading citylots2
Read 213068 events
Lots: 10000
Lots: 20000
Lots: 30000
Lots: 40000
Lots: 50000
Lots: 60000
Lots: 70000
Lots: 80000
Lots: 90000
Lots: 100000
Lots: 110000
Lots: 120000
Lots: 130000
Lots: 140000
Lots: 150000
Lots: 160000
Lots: 170000
Lots: 180000
Lots: 190000
Lots: 200000
Lots: 210000
Matched: 52527
Lines: 213068, Msec: 12118
Events/sec: 17582.8
Reading lines...
Finding Rules...
Lots: 10000
Lots: 20000
Lots: 30000
Lots: 40000
Lots: 50000
Lots: 60000
Lots: 70000
Lots: 80000
Lots: 90000
Lots: 100000
Lots: 110000
Lots: 120000
Lots: 130000
Lots: 140000
Lots: 150000
Lots: 160000
Lots: 170000
Lots: 180000
Lots: 190000
Lots: 200000
Lots: 210000
Lines: 213068, Msec: 11923
Events/sec: 17870.3
 Rules/sec: 125092.3
DEEP EXACT events/sec: 11111.1

After

Reading citylots2
Read 213068 events
EXACT events/sec: 182109.4
WILDCARD events/sec: 140084.2
PREFIX events/sec: 198572.2
SUFFIX events/sec: 212430.7
EQUALS_IGNORE_CASE events/sec: 171414.3
NUMERIC events/sec: 102584.5
ANYTHING-BUT events/sec: 127053.1
**COMPLEX_ARRAYS events/sec: 5275.7
PARTIAL_COMBO events/sec: 89675.1
COMBO events/sec: 2727.0**
Reading citylots2
Read 213068 events
Finding Rules...
Lots: 10000
Lots: 20000
Lots: 30000
Lots: 40000
Lots: 50000
Lots: 60000
Lots: 70000
Lots: 80000
Lots: 90000
Lots: 100000
Lots: 110000
Lots: 120000
Lots: 130000
Lots: 140000
Lots: 150000
Lots: 160000
Lots: 170000
Lots: 180000
Lots: 190000
Lots: 200000
Lots: 210000
Lines: 213068, Msec: 13338
Events/sec: 15974.5
 Rules/sec: 111821.6
Before: 1648.6
After: 1468.3
Per rule: 450
Turning JSON into field-lists...
Finding Rules...
Lines: 213068, Msec: 3110
Events/sec: 68510.6
Before: 2158.4
After: 1350.1
Per rule: 2020
Reading lines...
Finding Rules...
Lots: 10000
Lots: 20000
Lots: 30000
Lots: 40000
Lots: 50000
Lots: 60000
Lots: 70000
Lots: 80000
Lots: 90000
Lots: 100000
Lots: 110000
Lots: 120000
Lots: 130000
Lots: 140000
Lots: 150000
Lots: 160000
Lots: 170000
Lots: 180000
Lots: 190000
Lots: 200000
Lots: 210000
Lines: 213068, Msec: 1528
Events/sec: 139442.4
 Rules/sec: 507849251.3
Reading citylots2
Read 213068 events
Lots: 10000
Lots: 20000
Lots: 30000
Lots: 40000
Lots: 50000
Lots: 60000
Lots: 70000
Lots: 80000
Lots: 90000
Lots: 100000
Lots: 110000
Lots: 120000
Lots: 130000
Lots: 140000
Lots: 150000
Lots: 160000
Lots: 170000
Lots: 180000
Lots: 190000
Lots: 200000
Lots: 210000
Matched: 52527
Lines: 213068, Msec: 10525
Events/sec: 20244.0
Reading lines...
Finding Rules...
Lots: 10000
Lots: 20000
Lots: 30000
Lots: 40000
Lots: 50000
Lots: 60000
Lots: 70000
Lots: 80000
Lots: 90000
Lots: 100000
Lots: 110000
Lots: 120000
Lots: 130000
Lots: 140000
Lots: 150000
Lots: 160000
Lots: 170000
Lots: 180000
Lots: 190000
Lots: 200000
Lots: 210000
Lines: 213068, Msec: 10936
Events/sec: 19483.2
 Rules/sec: 136382.2
DEEP EXACT events/sec: 8333.3


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

This leads to around 1.5x performance gains for array matching
`COMPLEX_ARRAYS events/sec: 4401.5` vs `COMPLEX_ARRAYS events/sec: 5275.7`.
In aggregate, we see benchmark tests complete within 3 min 40  seconds on
my mac pro 2019 w intel i7 compared to 4 mins and 30 seconds before.

The big downside of this commit is that we're introducing a new depenedency
with lots of new classes / structures (https://fastutil.di.unimi.it/docs/)
when we really just need a sorted AVL Int2Int Map.

Benchmark output

Before

```
Reading citylots2
Read 213068 events
EXACT events/sec: 193346.6
WILDCARD events/sec: 123374.6
PREFIX events/sec: 189225.6
SUFFIX events/sec: 213068.0
EQUALS_IGNORE_CASE events/sec: 160322.0
NUMERIC events/sec: 111670.9
ANYTHING-BUT events/sec: 123517.7
COMPLEX_ARRAYS events/sec: 4401.5
PARTIAL_COMBO events/sec: 94028.2
COMBO events/sec: 2517.7
Reading citylots2
Read 213068 events
Finding Rules...
Lots: 10000
Lots: 20000
Lots: 30000
Lots: 40000
Lots: 50000
Lots: 60000
Lots: 70000
Lots: 80000
Lots: 90000
Lots: 100000
Lots: 110000
Lots: 120000
Lots: 130000
Lots: 140000
Lots: 150000
Lots: 160000
Lots: 170000
Lots: 180000
Lots: 190000
Lots: 200000
Lots: 210000
Lines: 213068, Msec: 13478
Events/sec: 15808.6
 Rules/sec: 110660.0
Before: 1767.1
After: 1434.2
Per rule: 832
Turning JSON into field-lists...
Finding Rules...
Lines: 213068, Msec: 2693
Events/sec: 79119.2
Before: 1919.4
After: 1362.7
Per rule: 1391
Reading lines...
Finding Rules...
Lots: 10000
Lots: 20000
Lots: 30000
Lots: 40000
Lots: 50000
Lots: 60000
Lots: 70000
Lots: 80000
Lots: 90000
Lots: 100000
Lots: 110000
Lots: 120000
Lots: 130000
Lots: 140000
Lots: 150000
Lots: 160000
Lots: 170000
Lots: 180000
Lots: 190000
Lots: 200000
Lots: 210000
Lines: 213068, Msec: 1632
Events/sec: 130556.4
 Rules/sec: 475486308.8
Reading citylots2
Read 213068 events
Lots: 10000
Lots: 20000
Lots: 30000
Lots: 40000
Lots: 50000
Lots: 60000
Lots: 70000
Lots: 80000
Lots: 90000
Lots: 100000
Lots: 110000
Lots: 120000
Lots: 130000
Lots: 140000
Lots: 150000
Lots: 160000
Lots: 170000
Lots: 180000
Lots: 190000
Lots: 200000
Lots: 210000
Matched: 52527
Lines: 213068, Msec: 12118
Events/sec: 17582.8
Reading lines...
Finding Rules...
Lots: 10000
Lots: 20000
Lots: 30000
Lots: 40000
Lots: 50000
Lots: 60000
Lots: 70000
Lots: 80000
Lots: 90000
Lots: 100000
Lots: 110000
Lots: 120000
Lots: 130000
Lots: 140000
Lots: 150000
Lots: 160000
Lots: 170000
Lots: 180000
Lots: 190000
Lots: 200000
Lots: 210000
Lines: 213068, Msec: 11923
Events/sec: 17870.3
 Rules/sec: 125092.3
DEEP EXACT events/sec: 11111.1
```

After
```
Reading citylots2
Read 213068 events
EXACT events/sec: 182109.4
WILDCARD events/sec: 140084.2
PREFIX events/sec: 198572.2
SUFFIX events/sec: 212430.7
EQUALS_IGNORE_CASE events/sec: 171414.3
NUMERIC events/sec: 102584.5
ANYTHING-BUT events/sec: 127053.1
COMPLEX_ARRAYS events/sec: 5275.7
PARTIAL_COMBO events/sec: 89675.1
COMBO events/sec: 2727.0
Reading citylots2
Read 213068 events
Finding Rules...
Lots: 10000
Lots: 20000
Lots: 30000
Lots: 40000
Lots: 50000
Lots: 60000
Lots: 70000
Lots: 80000
Lots: 90000
Lots: 100000
Lots: 110000
Lots: 120000
Lots: 130000
Lots: 140000
Lots: 150000
Lots: 160000
Lots: 170000
Lots: 180000
Lots: 190000
Lots: 200000
Lots: 210000
Lines: 213068, Msec: 13338
Events/sec: 15974.5
 Rules/sec: 111821.6
Before: 1648.6
After: 1468.3
Per rule: 450
Turning JSON into field-lists...
Finding Rules...
Lines: 213068, Msec: 3110
Events/sec: 68510.6
Before: 2158.4
After: 1350.1
Per rule: 2020
Reading lines...
Finding Rules...
Lots: 10000
Lots: 20000
Lots: 30000
Lots: 40000
Lots: 50000
Lots: 60000
Lots: 70000
Lots: 80000
Lots: 90000
Lots: 100000
Lots: 110000
Lots: 120000
Lots: 130000
Lots: 140000
Lots: 150000
Lots: 160000
Lots: 170000
Lots: 180000
Lots: 190000
Lots: 200000
Lots: 210000
Lines: 213068, Msec: 1528
Events/sec: 139442.4
 Rules/sec: 507849251.3
Reading citylots2
Read 213068 events
Lots: 10000
Lots: 20000
Lots: 30000
Lots: 40000
Lots: 50000
Lots: 60000
Lots: 70000
Lots: 80000
Lots: 90000
Lots: 100000
Lots: 110000
Lots: 120000
Lots: 130000
Lots: 140000
Lots: 150000
Lots: 160000
Lots: 170000
Lots: 180000
Lots: 190000
Lots: 200000
Lots: 210000
Matched: 52527
Lines: 213068, Msec: 10525
Events/sec: 20244.0
Reading lines...
Finding Rules...
Lots: 10000
Lots: 20000
Lots: 30000
Lots: 40000
Lots: 50000
Lots: 60000
Lots: 70000
Lots: 80000
Lots: 90000
Lots: 100000
Lots: 110000
Lots: 120000
Lots: 130000
Lots: 140000
Lots: 150000
Lots: 160000
Lots: 170000
Lots: 180000
Lots: 190000
Lots: 200000
Lots: 210000
Lines: 213068, Msec: 10936
Events/sec: 19483.2
 Rules/sec: 136382.2
DEEP EXACT events/sec: 8333.3
```
@timbray
Copy link
Collaborator

timbray commented Nov 10, 2022

I love this direction, that IntInt map is a code smell. Note that Quamina has a really different simplest-thing-that-could-possibly-work approach, and that code never shows up in any profile output that I've seen. Hmm, I could try jamming a map[int]int in there and see if it makes a difference…

Quamina: https://github.com/timbray/quamina/blob/main/core_matcher.go#L243

@baldawar
Copy link
Collaborator Author

I was wondering if something like https://github.com/timbray/quamina/blob/main/core_matcher.go#L243 would work instead of IntIntMap but wasn't sure if there's some gremlins that I haven't considered yet. I'll try to give it a shot and if it works, update this PR

This leads to a noticable improvement from ~5K Events/sec to ~5.4K Events/sec.
@baldawar
Copy link
Collaborator Author

Looks like sorted tree performs better by small margin. Around 5.1K events/sec vs 5.4K events/sec (ran tests 4 times alternatively to make sure it wasn't just the laptop). The difference is large if I ignore the bottleneck because of the ArrayMembership(...) constructor (6.4K events/sec vs 10.2K events/sec)

@timbray
Copy link
Collaborator

timbray commented Nov 14, 2022

Around 5.1K events/sec vs 5.4K events/sec (ran tests 4 times alternatively to make sure it wasn't just the laptop).

Those numbers, 5.1K and 5.4K etc, which benchmark is that?

@@ -98,6 +99,11 @@
<scope>test</scope>
</dependency>

<dependency>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a standard about adopting dependencies? I remember doing quite a bit of work inside AWS to convince myself/others that a dependency was acceptable. Not saying anything about this one, never heard of it before, but have we done any investigation?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have a standard today. Created #57 for it.

On this library, I looked up this library based on this comment on the existing IntIntMap class https://github.com/aws/event-ruler/blob/main/src/main/software/amazon/event/ruler/IntIntMap.java#L12. The library was in active development and had good documentation. I see it used in some performance sensitive libraries so I gave it a try for Ruler.

@@ -122,8 +122,8 @@ public void WHEN_EventIsConstructed_THEN_HeterogeneousArraysAreHandled() throws
"lines.points", "lines.points", "lines.points", "lines.points", "lines.points.pp", "lines.points.pp"
};
String[] wantedArrayMemberships = {
"0[0] 2[0] 1[0] ", "0[0] 2[0] 1[0] ", "0[0] 2[1] 1[0] ", "0[0] 2[1] 1[0] ", "0[0] 2[0] 3[2] 1[0] ",
"0[0] 4[0] 2[1] 1[0] "
"0[0] 1[0] 2[0] ", "0[0] 1[0] 2[0] ", "0[0] 1[0] 2[1] ", "0[0] 1[0] 2[1] ", "0[0] 1[0] 2[0] 3[2] ",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why changing this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is compared against the int-int map within ArrayMembership class https://github.com/aws/event-ruler/blob/main/src/test/software/amazon/event/ruler/EventTest.java#L134 . When I changed the internal map to a sorter AVL tree, the tests we failing here because the elements were out-of-order.

@baldawar
Copy link
Collaborator Author

Around 5.1K events/sec vs 5.4K events/sec (ran tests 4 times alternatively to make sure it wasn't just the laptop).

Those numbers, 5.1K and 5.4K etc, which benchmark is that?

https://github.com/aws/event-ruler/blob/main/src/test/software/amazon/event/ruler/Benchmarks.java#L524

Rules are at https://github.com/aws/event-ruler/blob/main/src/test/software/amazon/event/ruler/Benchmarks.java#L189

@timbray
Copy link
Collaborator

timbray commented Nov 14, 2022

Here's my feeling: Adding a dependency to a super-low-level library like Ruler is pretty expensive, both in terms of the obvious stuff like code size, but in terms of governance, defending yourself against vulnerabilities in the dependency tree, and so on. Also, the case where you have lots and lots of arrays and this gets expensive is not that common.

So, my prejudice is would probably be against adopting this library, even though I admit that I haven't looked at it in depth. What the array-consistency check is trying to do is really pretty simple.

But I'm not going to lie down in the road over this one, if you guys think this is a worthwhile trade-off I can live with that.

@baldawar
Copy link
Collaborator Author

FWIW, I was on the fence with taking the dependency as well, so don't mind parking this change until we can find an alternate implementation. Replacing the current map based check altogether is more complex, so needs more bandwidth. That PR will take sometime.

FastUtil does seem to be a popular library https://github.com/vigna/fastutil but it's been challenging to discover how useful it is outside of Amazon. FasterXML/jackson-datatypes-collections#38 was one interesting discussion that implied eventually we might get it through our jackson dependency. Internally, I can see it be used across some performance sensitive places but none that I can easily elaborate on here. Still feels a lot of structures / classes to pull in for just one Int2IntAVLTreeMap.

@baldawar baldawar closed this Mar 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants