-
-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix an N^2 in batch hydration #172
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you update the failing tests?
80aa97d
to
ad2791e
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #172 +/- ##
=======================================
Coverage 83.58% 83.58%
=======================================
Files 37 37
Lines 2498 2498
Branches 212 212
=======================================
Hits 2088 2088
Misses 198 198
Partials 212 212 ☔ View full report in Codecov by Sentry. |
Bump. I've rewritten the implementation of merging original instances with batch-hydrated instances again. I'm building an index that serves both as the indicator which instances should be taken into the batch to be hydrated and which not; and also it enumerates the instances-to-by-hydrated, so that on the second pass of this index, we can extract the hydrated instances by their position in the batch. The tests are now fixed. |
8828efd
to
17d2c5f
Compare
17d2c5f
to
e5f311d
Compare
e5f311d
to
d907334
Compare
This hadn't been noticed previously since N is generally small, eg. 50 or 60 at most. Working with a Metabase dashboard that loaded 10000 fields (100 each from 100 tables) in a single `select` + `hydrate`, I found this was consuming about 3 seconds. The slow path was effectively a very slow `conj`: taking the `acc` vector, and then for each `annotated-instances` doing: ```clojure (recur (vec (concat acc nil [(first hydrated-instances)])) ...) ``` which is pouring the whole vector into a lazy sequence and back into a vector, once for each instance in the list. The new version first creates an "index" of instances that need hydration, and later uses that index to map the instances in the hydrated instances batch to their places in the initial collection. Co-authored-by: Braden Shepherdson <braden@metabase.com>
d907334
to
4748350
Compare
This hadn't been noticed previously since N is generally small, eg. 50 or 60 at most. Working with a Metabase dashboard that loaded 10000 fields (100 each from 100 tables) in a single
select
+hydrate
, I found this was consuming about 3 seconds.The slow path was effectively a very slow
conj
: taking theacc
vector, and then for eachannotated-instances
doing:which is pouring the whole vector into a lazy sequence and back into a vector, once for each instance in the list.
The new version uses a custom transducer to do the same process in O(n) time (and without any seq overhead, as a bonus).