-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add phrase operator for Atlas Search #1586
Conversation
@@ -81,13 +81,23 @@ public SearchConstructibleBsonElement fuzzy(final FuzzySearchOptions options) { | |||
} | |||
|
|||
@Override | |||
public TextSearchOperator synonyms(final String name) { | |||
public SearchConstructibleBsonElement synonyms(final String name) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Below, the implementation is:
return newWithMutatedValue(doc -> {
doc.remove("fuzzy");
doc.append("synonyms", notNull("name", name));
});
Previously, this was used just for text, so removing "fuzzy" was mostly harmless. However, now the behaviour is undocumented in PhraseSearchOperator synonyms
, and no longer makes sense.
For example, I don't see in principle why synonyms can't be applied before fuzzy matching, and the server might (hypothetically) add this in the future to either text or phrase search. When that happens, the API becomes broken, and the only way out is to break backwards compatibility by removing the remove. Of course, I expect the chances of this actually happening are low, but this is still a problem with the API, that we should at least avoid in the future (that is, we should never "correct" things, but throw a validation exception, though in such cases, we should let the server provide the exception).
We could consider deprecating the "and does not use X" docs for text synonyms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the explanation. The PhraseSearchOperator
does not have the "fuzzy" option, so I originally thought that doc.remove("fuzzy")
would not have an impact on the implementation above (since the key "fuzzy" is not present in the map that we are trying to remove from). But it makes sense that if I were to use the same synonyms()
the removal of "fuzzy" does not make sense in terms of phrase operator's synonyms().
Q: Are you suggesting that we should get rid of doc.remove("fuzzy")
completely and throw a validation exception instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot remove it without breaking backwards compatibility. For example, users might be creating a default options for text that might call fuzzy. Then, they "customize" this default by using the synonyms method. If we stopped removing fuzzy, code that previously worked would result in a server error.
We can either (in my order of preference):
- Do nothing
- Deprecate
- Break backwards compat by removing remove
- Create two versions of synonyms, which makes the API messy
@stIncMale Could you give a secondary review to this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, this was used just for text, so removing "fuzzy" was mostly harmless.
This was not only harmless, but by design:
text.fuzzy
andtext.synonyms
are mutually exclusive (and are allowed to be both omitted).- We wanted to allow building a
TextSearchOperator
based on anotherTextSearchOperator
, which means, we had to allow callingTextSearchOperator.synonyms
afterTextSearchOperator.fuzzy
having been called, and vice versa. That means each of these methods must undo the other one1.
I agree with Maxim, we should not piggy-back on SearchConstructibleBsonElement.synonyms
when implementing PhraseSearchOperator
. But we are in no trouble. We simply need to create another subclass of AbstractConstructibleBsonElement
implementing PhraseSearchOperator
, and implement synonyms
there the way it is suitable for PhraseSearchOperator
. You may see that this approach has been used multiple times in the past: there are multiple subclasses of both AbstractConstructibleBsonElement
and AbstractConstructibleBson
, and I remember that at least once I created a subclass because of the issue similar to the one at hand.
1 While TextSearchOperator.synonyms
and TextSearchOperator.fuzzy
undo each other, supporting the idea of building a TextSearchOperator
based on another TextSearchOperator
, there is no way of unsetting both them once one of them is set. That is, I completely failed to implement the API in a way that fully supports the aforementioned idea. Fortunately, fixing it (finishing the implementation) does not require any breaking changes.
Additionally, there is currently no way to build a TextSearchOperator
based on another one in a way that changes the required fields. Again, supporting this does not require breaking changes. But the fact that I did not do that, is quite embarrassing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We talked about this with Maxim in Slack, and I created JAVA-5752 Re-visit some aspects of the Atlas search API design before moving it out of beta
as a result. Given that the API is in beta, we have a chance to re-visit some of the design decisions, and it is worth doing. I mentioned the ticket here just because the current discussion initiated it, not because I see it relevant to the current PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Valentin's suggestion to create another implementing class looks good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just added another implementing class!
return newWithMutatedValue(doc -> { | ||
doc.remove("fuzzy"); | ||
doc.append("synonyms", notNull("name", name)); | ||
}); | ||
} | ||
|
||
@Override | ||
public PhraseSearchOperator slop() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is not necessary. The slop field, if omitted, defaults to using 0, so users can omit it here, instead of setting it to 0.
Is there a similar method in another part of this API, that this is based on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I was basing it off with the fuzzy()
method in the same file. If the user calls the fuzzy()
method, it defaults the FuzzySearchOptions to be an EMPTY_IMMUTABLE
, so I thought that having an option for the users to call slop()
with no parameter could also default the slop to be 0.
I was unaware that we don't have to explicitly set the default value inside of a method. I can go ahead and remove it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Maxim that we should not introduce PhraseSearchOperator.slop()
(the no-argument overload).
TextSearchOperator.fuzzy()
is different enough from PhraseSearchOperator.slop()
to give meaning to TextSearchOperator.fuzzy()
, despite PhraseSearchOperator.slop()
not being meaningful:
- Similarities: both
text.fuzzy
andphrase.slope
are optional fields. - Differences: not specifying
text.fuzzy
is not equivalent to specifyingtext.fuzzy
with the default value (which is the empty document). Not specifyingphrase.slope
is equivalent to specifyingphrase.slope
with the default value (which is 0). This means that theTextSearchOperator
for whichTextSearchOperator.fuzzy()
was called is not equivalent to one for which it wasn't called. However, thePhraseSearchOperator
for whichPhraseSearchOperator.slop()
was called is equivalent to one for which it wasn't called.
Thus, you can see why TextSearchOperator.fuzzy()
is needed, while PhraseSearchOperator.slop()
is not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you Valentin for the explanation! I went ahead and removed slop()
.
Ticket
JAVA-5724
Description
There are several Atlas Search query operators that are not implemented with first class support in the Java driver. This PR adds the
phrase
operator to Atlas Search.Testing
./gradlew check
atlas-search-test
on EvergreenSearchOperatorTest