-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Return only unique record when using includes #8
base: main
Are you sure you want to change the base?
Return only unique record when using includes #8
Conversation
Thank you! Missed the notification on this PR, will look into soon. |
When looking into this, did you notice this? ActiveRecord::Base.logger = Logger.new(STDOUT)
og = User.includes(:organizations).where(organizations: { id: Organization.pluck(:id) }).limit(2).order(created_at: :asc)
[5] pry(#<FastPageTest>)> og.reload
D, [2023-07-04T12:04:49.742304 #1600] DEBUG -- : SQL (0.5ms) SELECT DISTINCT "users"."id" FROM "users" LEFT OUTER JOIN "user_organizations" ON "user_organizations"."user_id" = "users"."id" LEFT OUTER JOIN "organizations" ON "organizations"."id" = "user_organizations"."organization_id" WHERE "organizations"."id" IN (?, ?) ORDER BY "users"."created_at" ASC LIMIT ? [["id", 1], ["id", 2], ["LIMIT", 2]]
D, [2023-07-04T12:04:49.743943 #1600] DEBUG -- : SQL (0.2ms) SELECT "users"."id" AS t0_r0, "users"."login" AS t0_r1, "users"."created_at" AS t0_r2, "users"."updated_at" AS t0_r3, "organizations"."id" AS t1_r0, "organizations"."name" AS t1_r1, "organizations"."created_at" AS t1_r2, "organizations"."updated_at" AS t1_r3 FROM "users" LEFT OUTER JOIN "user_organizations" ON "user_organizations"."user_id" = "users"."id" LEFT OUTER JOIN "organizations" ON "organizations"."id" = "user_organizations"."organization_id" WHERE "organizations"."id" IN (?, ?) AND "users"."id" IN (?, ?) ORDER BY "users"."created_at" ASC [["id", 1], ["id", 2], ["id", 1], ["id", 2]] Surprisingly, It seems that ActiveRecord is already using the deferred join technique on this query automatically. It grabs the distinct IDs and then passes them into the 2nd query. In this case, I don't think |
@mscoutermarsh I had a similar case during my work with this gem (when there was a usage of Went more deep down into this gem implementation and found out the issue described above. Also, the spec I provide did not pass without the change proposed in this MR. Dunno, how it is possible you got a |
Guessing the query in your app in slightly different. Could you turn the query logger on and try the query in your app without fast_page? See if it runs the distinct query for you already? I agree this is a bug - I'm thinking the solution might be a bit different though. Since as is (even with your fix), fast_page is adding an extra query that is unnecessary. Making it slower. |
Will try to provide more details this week |
Thanks! I'm curious if the fix here is detecting when ActiveRecord behaves this way and "disabling" fast_page from running the extra query. I'll play around with it some more as well. |
Don't think so, the fix here is simply applying a |
I'll try to explain again, I wasn't clear. 😄 This example without fast_page. ActiveRecord::Base.logger = Logger.new(STDOUT)
og = User.includes(:organizations).where(organizations: { id: Organization.pluck(:id) }).limit(2).order(created_at: :asc)
[5] pry(#<FastPageTest>)> og.reload
D, [2023-07-04T12:04:49.742304 #1600] DEBUG -- : SQL (0.5ms) SELECT DISTINCT "users"."id" FROM "users" LEFT OUTER JOIN "user_organizations" ON "user_organizations"."user_id" = "users"."id" LEFT OUTER JOIN "organizations" ON "organizations"."id" = "user_organizations"."organization_id" WHERE "organizations"."id" IN (?, ?) ORDER BY "users"."created_at" ASC LIMIT ? [["id", 1], ["id", 2], ["LIMIT", 2]]
D, [2023-07-04T12:04:49.743943 #1600] DEBUG -- : SQL (0.2ms) SELECT "users"."id" AS t0_r0, "users"."login" AS t0_r1, "users"."created_at" AS t0_r2, "users"."updated_at" AS t0_r3, "organizations"."id" AS t1_r0, "organizations"."name" AS t1_r1, "organizations"."created_at" AS t1_r2, "organizations"."updated_at" AS t1_r3 FROM "users" LEFT OUTER JOIN "user_organizations" ON "user_organizations"."user_id" = "users"."id" LEFT OUTER JOIN "organizations" ON "organizations"."id" = "user_organizations"."organization_id" WHERE "organizations"."id" IN (?, ?) AND "users"."id" IN (?, ?) ORDER BY "users"."created_at" ASC [["id", 1], ["id", 2], ["id", 1], ["id", 2]] Notice how it does the DISTINCT query and then uses those results in the 2nd query. This is what But for this specific query, it seems that ActiveRecord is already doing it. Which makes Yes, the wrong results are a bug. But the fix might be having fast_page recognize it can't optimize anything further & calling the original query instead of running the deferred join again. |
IMO it may not be |
Ah could be. This was directly from the gem's test suite. Which is sqlite. |
That's Active Record: see ActiveRecord::FinderMethods#apply_join_dependency and ActiveRecord::ConnectionAdapters::SchemaStatements#distinct_relation_for_primary_key: def apply_join_dependency(eager_loading: group_values.empty?)
join_dependency = construct_join_dependency(
eager_load_values | includes_values, Arel::Nodes::OuterJoin
)
relation = except(:includes, :eager_load, :preload).joins!(join_dependency)
if eager_loading && has_limit_or_offset? && !(
using_limitable_reflections?(join_dependency.reflections) &&
using_limitable_reflections?(
construct_join_dependency(
select_association_list(joins_values).concat(
select_association_list(left_outer_joins_values)
), nil
).reflections
)
)
relation = skip_query_cache_if_necessary do
klass.connection.distinct_relation_for_primary_key(relation)
end
end
if block_given?
yield relation, join_dependency
else
relation
end
end and def distinct_relation_for_primary_key(relation) # :nodoc:
primary_key_columns = Array(relation.primary_key).map do |column|
visitor.compile(relation.table[column])
end
values = columns_for_distinct(
primary_key_columns,
relation.order_values
)
limited = relation.reselect(values).distinct!
limited_ids = select_rows(limited.arel, "SQL").map do |results|
results.last(Array(relation.primary_key).length) # ignores order values for MySQL and PostgreSQL
end
if limited_ids.empty?
relation.none!
else
relation.where!(**Array(relation.primary_key).zip(limited_ids.transpose).to_h)
end
relation.limit_value = relation.offset_value = nil
relation
end The references to eager-loaded has_many associations along with a limit/offset triggers this path since the outer joins needed for eager loading would break the logical limit/offset. It would be interesting to be able to explicitly trigger this behavior, since it'd be pretty helpful for pagination in general! |
The problem
In some cases when a query is using
includes
and the included tables are used in thewhere
clauses, the returned results seems to be duplicated and in the end, it returns too little records compared to setlimit
. Example from specs:It's happening because at some point (when
pluck
is called) a SQLJOIN
is executed and if theJOIN
operation multiplies the results, theids
contains duplicated record IDs. This way, when theids.first(limit_value)
is called, it takes firstlimit_value
values from a duplicated set.The solution
I propose to call
uniq
after taking theids
and this way, operating on a unique set of record IDs