Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add phrase operator for Atlas Search #1586

Merged
merged 4 commits into from
Jan 14, 2025
Merged

Conversation

joykim1005
Copy link
Contributor

@joykim1005 joykim1005 commented Dec 23, 2024

Ticket

JAVA-5724

Description

There are several Atlas Search query operators that are not implemented with first class support in the Java driver. This PR adds the phrase operator to Atlas Search.

Testing

  • ran ./gradlew check
  • ran atlas-search-test on Evergreen
  • ran SearchOperatorTest
  • ran all Evergreen patch

@joykim1005 joykim1005 changed the title Add phrase Add phrase operator for Atlas Search Dec 23, 2024
@joykim1005 joykim1005 requested review from a team and katcharov and removed request for a team January 6, 2025 14:19
@joykim1005 joykim1005 marked this pull request as ready for review January 7, 2025 15:35
@@ -81,13 +81,23 @@ public SearchConstructibleBsonElement fuzzy(final FuzzySearchOptions options) {
}

@Override
public TextSearchOperator synonyms(final String name) {
public SearchConstructibleBsonElement synonyms(final String name) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below, the implementation is:

        return newWithMutatedValue(doc -> {
            doc.remove("fuzzy");
            doc.append("synonyms", notNull("name", name));
        });

Previously, this was used just for text, so removing "fuzzy" was mostly harmless. However, now the behaviour is undocumented in PhraseSearchOperator synonyms, and no longer makes sense.

For example, I don't see in principle why synonyms can't be applied before fuzzy matching, and the server might (hypothetically) add this in the future to either text or phrase search. When that happens, the API becomes broken, and the only way out is to break backwards compatibility by removing the remove. Of course, I expect the chances of this actually happening are low, but this is still a problem with the API, that we should at least avoid in the future (that is, we should never "correct" things, but throw a validation exception, though in such cases, we should let the server provide the exception).

We could consider deprecating the "and does not use X" docs for text synonyms.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the explanation. The PhraseSearchOperator does not have the "fuzzy" option, so I originally thought that doc.remove("fuzzy") would not have an impact on the implementation above (since the key "fuzzy" is not present in the map that we are trying to remove from). But it makes sense that if I were to use the same synonyms() the removal of "fuzzy" does not make sense in terms of phrase operator's synonyms().

Q: Are you suggesting that we should get rid of doc.remove("fuzzy") completely and throw a validation exception instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot remove it without breaking backwards compatibility. For example, users might be creating a default options for text that might call fuzzy. Then, they "customize" this default by using the synonyms method. If we stopped removing fuzzy, code that previously worked would result in a server error.

We can either (in my order of preference):

  1. Do nothing
  2. Deprecate
  3. Break backwards compat by removing remove
  4. Create two versions of synonyms, which makes the API messy

@stIncMale Could you give a secondary review to this?

Copy link
Member

@stIncMale stIncMale Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, this was used just for text, so removing "fuzzy" was mostly harmless.

This was not only harmless, but by design:

  • text.fuzzy and text.synonyms are mutually exclusive (and are allowed to be both omitted).
  • We wanted to allow building a TextSearchOperator based on another TextSearchOperator, which means, we had to allow calling TextSearchOperator.synonyms after TextSearchOperator.fuzzy having been called, and vice versa. That means each of these methods must undo the other one1.

I agree with Maxim, we should not piggy-back on SearchConstructibleBsonElement.synonyms when implementing PhraseSearchOperator. But we are in no trouble. We simply need to create another subclass of AbstractConstructibleBsonElement implementing PhraseSearchOperator, and implement synonyms there the way it is suitable for PhraseSearchOperator. You may see that this approach has been used multiple times in the past: there are multiple subclasses of both AbstractConstructibleBsonElement and AbstractConstructibleBson, and I remember that at least once I created a subclass because of the issue similar to the one at hand.


1 While TextSearchOperator.synonyms and TextSearchOperator.fuzzy undo each other, supporting the idea of building a TextSearchOperator based on another TextSearchOperator, there is no way of unsetting both them once one of them is set. That is, I completely failed to implement the API in a way that fully supports the aforementioned idea. Fortunately, fixing it (finishing the implementation) does not require any breaking changes.

Additionally, there is currently no way to build a TextSearchOperator based on another one in a way that changes the required fields. Again, supporting this does not require breaking changes. But the fact that I did not do that, is quite embarrassing.

Copy link
Member

@stIncMale stIncMale Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We talked about this with Maxim in Slack, and I created JAVA-5752 Re-visit some aspects of the Atlas search API design before moving it out of beta
as a result. Given that the API is in beta, we have a chance to re-visit some of the design decisions, and it is worth doing. I mentioned the ticket here just because the current discussion initiated it, not because I see it relevant to the current PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valentin's suggestion to create another implementing class looks good to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just added another implementing class!

return newWithMutatedValue(doc -> {
doc.remove("fuzzy");
doc.append("synonyms", notNull("name", name));
});
}

@Override
public PhraseSearchOperator slop() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is not necessary. The slop field, if omitted, defaults to using 0, so users can omit it here, instead of setting it to 0.

Is there a similar method in another part of this API, that this is based on?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I was basing it off with the fuzzy() method in the same file. If the user calls the fuzzy() method, it defaults the FuzzySearchOptions to be an EMPTY_IMMUTABLE, so I thought that having an option for the users to call slop() with no parameter could also default the slop to be 0.

I was unaware that we don't have to explicitly set the default value inside of a method. I can go ahead and remove it!

Copy link
Member

@stIncMale stIncMale Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Maxim that we should not introduce PhraseSearchOperator.slop() (the no-argument overload).

TextSearchOperator.fuzzy() is different enough from PhraseSearchOperator.slop() to give meaning to TextSearchOperator.fuzzy(), despite PhraseSearchOperator.slop() not being meaningful:

  • Similarities: both text.fuzzy and phrase.slope are optional fields.
  • Differences: not specifying text.fuzzy is not equivalent to specifying text.fuzzy with the default value (which is the empty document). Not specifying phrase.slope is equivalent to specifying phrase.slope with the default value (which is 0). This means that the TextSearchOperator for which TextSearchOperator.fuzzy() was called is not equivalent to one for which it wasn't called. However, the PhraseSearchOperator for which PhraseSearchOperator.slop() was called is equivalent to one for which it wasn't called.

Thus, you can see why TextSearchOperator.fuzzy() is needed, while PhraseSearchOperator.slop() is not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Valentin for the explanation! I went ahead and removed slop().

@joykim1005 joykim1005 requested a review from katcharov January 13, 2025 18:14
@joykim1005 joykim1005 merged commit 8c49e38 into mongodb:main Jan 14, 2025
56 of 60 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants