-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speeding up Array Consistency Check by using sorted int-int AVL Tree #55
Conversation
This leads to around 1.5x performance gains for array matching `COMPLEX_ARRAYS events/sec: 4401.5` vs `COMPLEX_ARRAYS events/sec: 5275.7`. In aggregate, we see benchmark tests complete within 3 min 40 seconds on my mac pro 2019 w intel i7 compared to 4 mins and 30 seconds before. The big downside of this commit is that we're introducing a new depenedency with lots of new classes / structures (https://fastutil.di.unimi.it/docs/) when we really just need a sorted AVL Int2Int Map. Benchmark output Before ``` Reading citylots2 Read 213068 events EXACT events/sec: 193346.6 WILDCARD events/sec: 123374.6 PREFIX events/sec: 189225.6 SUFFIX events/sec: 213068.0 EQUALS_IGNORE_CASE events/sec: 160322.0 NUMERIC events/sec: 111670.9 ANYTHING-BUT events/sec: 123517.7 COMPLEX_ARRAYS events/sec: 4401.5 PARTIAL_COMBO events/sec: 94028.2 COMBO events/sec: 2517.7 Reading citylots2 Read 213068 events Finding Rules... Lots: 10000 Lots: 20000 Lots: 30000 Lots: 40000 Lots: 50000 Lots: 60000 Lots: 70000 Lots: 80000 Lots: 90000 Lots: 100000 Lots: 110000 Lots: 120000 Lots: 130000 Lots: 140000 Lots: 150000 Lots: 160000 Lots: 170000 Lots: 180000 Lots: 190000 Lots: 200000 Lots: 210000 Lines: 213068, Msec: 13478 Events/sec: 15808.6 Rules/sec: 110660.0 Before: 1767.1 After: 1434.2 Per rule: 832 Turning JSON into field-lists... Finding Rules... Lines: 213068, Msec: 2693 Events/sec: 79119.2 Before: 1919.4 After: 1362.7 Per rule: 1391 Reading lines... Finding Rules... Lots: 10000 Lots: 20000 Lots: 30000 Lots: 40000 Lots: 50000 Lots: 60000 Lots: 70000 Lots: 80000 Lots: 90000 Lots: 100000 Lots: 110000 Lots: 120000 Lots: 130000 Lots: 140000 Lots: 150000 Lots: 160000 Lots: 170000 Lots: 180000 Lots: 190000 Lots: 200000 Lots: 210000 Lines: 213068, Msec: 1632 Events/sec: 130556.4 Rules/sec: 475486308.8 Reading citylots2 Read 213068 events Lots: 10000 Lots: 20000 Lots: 30000 Lots: 40000 Lots: 50000 Lots: 60000 Lots: 70000 Lots: 80000 Lots: 90000 Lots: 100000 Lots: 110000 Lots: 120000 Lots: 130000 Lots: 140000 Lots: 150000 Lots: 160000 Lots: 170000 Lots: 180000 Lots: 190000 Lots: 200000 Lots: 210000 Matched: 52527 Lines: 213068, Msec: 12118 Events/sec: 17582.8 Reading lines... Finding Rules... Lots: 10000 Lots: 20000 Lots: 30000 Lots: 40000 Lots: 50000 Lots: 60000 Lots: 70000 Lots: 80000 Lots: 90000 Lots: 100000 Lots: 110000 Lots: 120000 Lots: 130000 Lots: 140000 Lots: 150000 Lots: 160000 Lots: 170000 Lots: 180000 Lots: 190000 Lots: 200000 Lots: 210000 Lines: 213068, Msec: 11923 Events/sec: 17870.3 Rules/sec: 125092.3 DEEP EXACT events/sec: 11111.1 ``` After ``` Reading citylots2 Read 213068 events EXACT events/sec: 182109.4 WILDCARD events/sec: 140084.2 PREFIX events/sec: 198572.2 SUFFIX events/sec: 212430.7 EQUALS_IGNORE_CASE events/sec: 171414.3 NUMERIC events/sec: 102584.5 ANYTHING-BUT events/sec: 127053.1 COMPLEX_ARRAYS events/sec: 5275.7 PARTIAL_COMBO events/sec: 89675.1 COMBO events/sec: 2727.0 Reading citylots2 Read 213068 events Finding Rules... Lots: 10000 Lots: 20000 Lots: 30000 Lots: 40000 Lots: 50000 Lots: 60000 Lots: 70000 Lots: 80000 Lots: 90000 Lots: 100000 Lots: 110000 Lots: 120000 Lots: 130000 Lots: 140000 Lots: 150000 Lots: 160000 Lots: 170000 Lots: 180000 Lots: 190000 Lots: 200000 Lots: 210000 Lines: 213068, Msec: 13338 Events/sec: 15974.5 Rules/sec: 111821.6 Before: 1648.6 After: 1468.3 Per rule: 450 Turning JSON into field-lists... Finding Rules... Lines: 213068, Msec: 3110 Events/sec: 68510.6 Before: 2158.4 After: 1350.1 Per rule: 2020 Reading lines... Finding Rules... Lots: 10000 Lots: 20000 Lots: 30000 Lots: 40000 Lots: 50000 Lots: 60000 Lots: 70000 Lots: 80000 Lots: 90000 Lots: 100000 Lots: 110000 Lots: 120000 Lots: 130000 Lots: 140000 Lots: 150000 Lots: 160000 Lots: 170000 Lots: 180000 Lots: 190000 Lots: 200000 Lots: 210000 Lines: 213068, Msec: 1528 Events/sec: 139442.4 Rules/sec: 507849251.3 Reading citylots2 Read 213068 events Lots: 10000 Lots: 20000 Lots: 30000 Lots: 40000 Lots: 50000 Lots: 60000 Lots: 70000 Lots: 80000 Lots: 90000 Lots: 100000 Lots: 110000 Lots: 120000 Lots: 130000 Lots: 140000 Lots: 150000 Lots: 160000 Lots: 170000 Lots: 180000 Lots: 190000 Lots: 200000 Lots: 210000 Matched: 52527 Lines: 213068, Msec: 10525 Events/sec: 20244.0 Reading lines... Finding Rules... Lots: 10000 Lots: 20000 Lots: 30000 Lots: 40000 Lots: 50000 Lots: 60000 Lots: 70000 Lots: 80000 Lots: 90000 Lots: 100000 Lots: 110000 Lots: 120000 Lots: 130000 Lots: 140000 Lots: 150000 Lots: 160000 Lots: 170000 Lots: 180000 Lots: 190000 Lots: 200000 Lots: 210000 Lines: 213068, Msec: 10936 Events/sec: 19483.2 Rules/sec: 136382.2 DEEP EXACT events/sec: 8333.3 ```
I love this direction, that IntInt map is a code smell. Note that Quamina has a really different simplest-thing-that-could-possibly-work approach, and that code never shows up in any profile output that I've seen. Hmm, I could try jamming a map[int]int in there and see if it makes a difference… Quamina: https://github.com/timbray/quamina/blob/main/core_matcher.go#L243 |
I was wondering if something like https://github.com/timbray/quamina/blob/main/core_matcher.go#L243 would work instead of IntIntMap but wasn't sure if there's some gremlins that I haven't considered yet. I'll try to give it a shot and if it works, update this PR |
This leads to a noticable improvement from ~5K Events/sec to ~5.4K Events/sec.
Looks like sorted tree performs better by small margin. Around 5.1K events/sec vs 5.4K events/sec (ran tests 4 times alternatively to make sure it wasn't just the laptop). The difference is large if I ignore the bottleneck because of the |
Those numbers, 5.1K and 5.4K etc, which benchmark is that? |
@@ -98,6 +99,11 @@ | |||
<scope>test</scope> | |||
</dependency> | |||
|
|||
<dependency> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a standard about adopting dependencies? I remember doing quite a bit of work inside AWS to convince myself/others that a dependency was acceptable. Not saying anything about this one, never heard of it before, but have we done any investigation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have a standard today. Created #57 for it.
On this library, I looked up this library based on this comment on the existing IntIntMap class https://github.com/aws/event-ruler/blob/main/src/main/software/amazon/event/ruler/IntIntMap.java#L12. The library was in active development and had good documentation. I see it used in some performance sensitive libraries so I gave it a try for Ruler.
@@ -122,8 +122,8 @@ public void WHEN_EventIsConstructed_THEN_HeterogeneousArraysAreHandled() throws | |||
"lines.points", "lines.points", "lines.points", "lines.points", "lines.points.pp", "lines.points.pp" | |||
}; | |||
String[] wantedArrayMemberships = { | |||
"0[0] 2[0] 1[0] ", "0[0] 2[0] 1[0] ", "0[0] 2[1] 1[0] ", "0[0] 2[1] 1[0] ", "0[0] 2[0] 3[2] 1[0] ", | |||
"0[0] 4[0] 2[1] 1[0] " | |||
"0[0] 1[0] 2[0] ", "0[0] 1[0] 2[0] ", "0[0] 1[0] 2[1] ", "0[0] 1[0] 2[1] ", "0[0] 1[0] 2[0] 3[2] ", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why changing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is compared against the int-int map within ArrayMembership class https://github.com/aws/event-ruler/blob/main/src/test/software/amazon/event/ruler/EventTest.java#L134 . When I changed the internal map to a sorter AVL tree, the tests we failing here because the elements were out-of-order.
Rules are at https://github.com/aws/event-ruler/blob/main/src/test/software/amazon/event/ruler/Benchmarks.java#L189 |
Here's my feeling: Adding a dependency to a super-low-level library like Ruler is pretty expensive, both in terms of the obvious stuff like code size, but in terms of governance, defending yourself against vulnerabilities in the dependency tree, and so on. Also, the case where you have lots and lots of arrays and this gets expensive is not that common. So, my prejudice is would probably be against adopting this library, even though I admit that I haven't looked at it in depth. What the array-consistency check is trying to do is really pretty simple. But I'm not going to lie down in the road over this one, if you guys think this is a worthwhile trade-off I can live with that. |
FWIW, I was on the fence with taking the dependency as well, so don't mind parking this change until we can find an alternate implementation. Replacing the current map based check altogether is more complex, so needs more bandwidth. That PR will take sometime. FastUtil does seem to be a popular library https://github.com/vigna/fastutil but it's been challenging to discover how useful it is outside of Amazon. FasterXML/jackson-datatypes-collections#38 was one interesting discussion that implied eventually we might get it through our jackson dependency. Internally, I can see it be used across some performance sensitive places but none that I can easily elaborate on here. Still feels a lot of structures / classes to pull in for just one Int2IntAVLTreeMap. |
Issue #, if available: N/A
Description of changes:
This a "I'm throwing this out for everyone else and see if it sticks" kind of PR.
While digging around for bottlenecks for #54, I realized that we had a home-grown Int-Int map and its showing up as a bottleneck (for past context, we used to have a HashMap<Integer, Integer> there and moving to Int-Int map helped improved performance from 1735.6 events/sec to 2645.0 events/sec. The code has since improved but isn't at the same level as rest of the matchers.
Moving to using a AVL Tree for checking array membership leads to a significant boost from
2821.3 events/sec
to5275.7 events.sec
. We have the option to gain some more performance if we stop the cloning behaviour inArrayMembership(...)
but I've left that as a future exercise (will create a issue for it if this gets merged).The biggest downside is that its introducing a brand new dependency with lots of classes / structures that we aren't really using. There may be some use for bi-directional iterators or Int2Object Maps in future, but I'm unclear on the benefits for them. so for now, I'd like everyone to assume that we take this new dependency on fastutils-core, it'll be purely for the performance gains for array consistency.
We can also explore modifying our own IntIntMap, but looking a how fastutils-core did it, lots of complex edge-cases to solve.
Note: Apologies for any typos, I sent this PR out between few meetings
Benchmark / Performance (for source code changes):
Before
After
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.