Expand component #1
Comments
How costly is it in terms of memory when combined with grouping? Is it scalable with 100 groups, where each group has 3 sub groups and each sub group has 100 docs, running on top of an index with 10M docs? |
The expand component works with a single page of collapsed results. So if your page has 100 groups, with 3 sub groups, with 100 docs each, the component will have to work with 30,000 documents. Not an overwhelming number but not a small number. The 10 million document set will be collapsed by the CollapsingQParserPlugin. How many distinct top level groups are in the index? It sounds like there might be around 33,333 distinct top level groups if each top level group has 300 docs in it. The CollapsingQParserPlugin will eat that for lunch, very little memory used. |
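The arithmetic behind those two figures can be checked with a short sketch. It uses only the numbers quoted in this thread; the variable names are illustrative, and the group count assumes every top level group holds exactly 300 docs.

```python
# Back-of-the-envelope sizing using only the figures quoted in this thread.
groups_per_page = 100      # top level groups on one page of collapsed results
subgroups_per_group = 3    # sub groups inside each top level group
docs_per_subgroup = 100    # documents inside each sub group
index_size = 10_000_000    # total documents in the index

# Documents in one fully expanded top level group.
docs_per_group = subgroups_per_group * docs_per_subgroup

# Documents the expand component has to handle for a single page.
docs_to_expand = groups_per_page * docs_per_group

# ASSUMPTION: every top level group holds the same number of docs.
distinct_groups = index_size // docs_per_group

print(docs_per_group)    # 300
print(docs_to_expand)    # 30000
print(distinct_groups)   # 33333
```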
Kranti, I'll be putting the initial implementation up later today or over the weekend. It doesn't cover sub-grouping yet. So if you want to work on that, that would be excellent. We can collaborate on how to add this to the code. Joel |
How many distinct top level groups are in the index?
Can you help me roughly estimate the memory size and response time? |
Sure, I can work with you on this. You might need to answer my stupid questions at times :) |
The CollapsingQParserPlugin creates arrays based on the total number of unique values in the field. A rough estimate for 300,000 unique terms in the field is 3-5 MB of transient memory per query. I haven't measured the expanding of groups yet. With such a large page, part of the cost will be retrieving the stored values for all those documents, which can be very expensive. |
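To get a feel for what that estimate means at other cardinalities, here is a minimal sketch that scales the quoted 3-5 MB range. It assumes memory grows linearly with the number of unique values, which is suggested by the arrays being sized on unique values but has not been measured here; `estimate_collapse_mb` is a hypothetical helper, not part of Solr.

```python
# Figures quoted above: 3-5 MB of transient memory for 300,000 unique terms.
QUOTED_TERMS = 300_000
QUOTED_LOW_MB = 3.0
QUOTED_HIGH_MB = 5.0


def estimate_collapse_mb(unique_terms):
    """Scale the quoted range to another cardinality.

    ASSUMPTION: transient memory is linear in the number of unique terms.
    This is an extrapolation for illustration, not a measurement.
    """
    scale = unique_terms / QUOTED_TERMS
    return QUOTED_LOW_MB * scale, QUOTED_HIGH_MB * scale


# Roughly 33,333 distinct top level groups, as estimated earlier in the thread.
low_mb, high_mb = estimate_collapse_mb(33_333)
print(f"{low_mb:.2f}-{high_mb:.2f} MB per query")
```

This covers only the collapse step; the cost of expanding groups and fetching stored fields is not included.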
If we just need docIds at the docList level, that is, group1=>1234567 (the value of the group field), and we get TopGroups like the above, then the metadata can be based on whichever fields the user wants. I am trying to find out the memory use and response times for the above structure from the API call. |
Joel, is it possible to share the ExpandComponent on Saturday (11 Jan)? I can spend a good amount of time on Sunday and try to get the sub groups working. I also want to run a few performance tests comparing traditional grouping with the new collapsing+expanding implementation for the use cases I described above. |
Just committed the initial implementation of the ExpandComponent to my heliosearch clone, in the expand branch: https://github.com/joelbernstein2013/heliosearch/tree/expand The initial patch compiles but has not been tested yet. |
I think it's worth pointing to the commit itself. |
Joel, I deployed your branch code and started Solr with a pre-populated index having 5M+ documents. Sample Query:
The idea is to get the distinct program IDs (collapsing/grouping) and sort them by the windowStart field. Here is the response:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">28</int>
<lst name="params">
<str name="expand.rows">1</str>
<str name="sort">windowStart asc</str>
<str name="fl">programId,windowStart</str>
<str name="expand.limit">5</str>
<str name="start">0</str>
<str name="q">
relatedAllIds:8118784557012618112 AND showingType:linear
</str>
<str name="expand">true</str>
<str name="wt">xml</str>
<str name="fq">{!collapse field=programId min=windowStart}</str>
<str name="rows">2</str>
<str name="expand.field">showingId</str>
</lst>
</lst>
<result name="response" numFound="77" start="0">
<doc>
<long name="programId">8050846173392254112</long>
<long name="windowStart">1389375000000</long>
</doc>
<doc>
<long name="programId">8837586713084788112</long>
<long name="windowStart">1389382200000</long>
</doc>
</result>
<lst name="expanded"/>
</response>
Why is the expanded result empty? My expectation is that, for each programId in the collapsed result, I get the top 5 showings sorted by windowStart. How should I form the query? |
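The sample query itself did not survive in this thread, but the echoed params in the responseHeader suggest a request along the following lines. This is a sketch that rebuilds the URL from those params only; the host, port, and core name are placeholders, not values from the thread, and it reproduces the request as sent rather than a corrected one.

```python
from urllib.parse import urlencode

# ASSUMPTION: host, port, and core name are placeholders, not from the thread.
base_url = "http://localhost:8983/solr/collection1/select"

# Parameters copied from the echoed "params" block of the response above.
params = {
    "q": "relatedAllIds:8118784557012618112 AND showingType:linear",
    "fq": "{!collapse field=programId min=windowStart}",  # collapse on programId
    "sort": "windowStart asc",
    "fl": "programId,windowStart",
    "start": "0",
    "rows": "2",
    "expand": "true",             # turn the Expand component on
    "expand.field": "showingId",  # as sent; differs from the collapse field
    "expand.rows": "1",
    "expand.limit": "5",
    "wt": "xml",
}

# urlencode percent-encodes the local-params braces, spaces, and colons.
request_url = base_url + "?" + urlencode(params)
print(request_url)
```

Note that the collapse runs on programId while expand.field is showingId; whether that mismatch explains the empty expanded section is not established in this thread.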
Reopening - looks like my merge-up of trunk closed this accidentally. |
Added initial test case: |
Added a few more tests to cover the basic functionality. My plan now is to add the distributed test cases and test it at scale; after that I think this is nearing initial release condition. Kranti has a few more features he'd like to add (group level paging, subgroup support) and we can iterate further on these. |
Added basic distributed test cases: joel-bernstein@a9e0b4e. Also a small formatting update: joel-bernstein@c7b61a9. I also did some performance testing at scale, and the Expand component seems to perform at about the same speed as the CollapsingQParserPlugin, so performing a collapse and expand takes about twice as long as doing only the collapse. |
This issue introduces a new search component called the Expand component. The Expand component implements group expansion for a single page of results collapsed by the CollapsingQParserPlugin.
I'll be working on this ticket initially in my fork of the Heliosearch project, in a branch called "expand".
https://github.com/joelbernstein2013/heliosearch