Merge pull request #4 from querqy/dev
Dev
JohannesDaniel authored Jun 1, 2023
2 parents 55a582b + ea79126 commit 3cbc89c
Showing 80 changed files with 2,070 additions and 1,440 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -3,3 +3,4 @@
build
.DS_Store
local
todo.txt
182 changes: 176 additions & 6 deletions README.md
@@ -59,11 +59,17 @@ The subsequent examples will be based on the query input `iphone` with a single
Querqy configurations include a parser definition and rewriters. A Querqy configuration can be created as follows:
```java
QuerqyConfig.builder()
        .replaceRules(
                ReplaceRulesDefinition.builder()
                        .rewriterId("id1")
                        .rules("aple => apple")
                        .build()
        )
        .commonRules(
                CommonRulesDefinition.builder()
                        .rewriterId("id2")
                        .rules("iphone => \n SYNONYM: apple smartphone")
                        .build()
        )
        .build();
```
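The test added in this commit (`testThat_rewriterChainIsApplied_forGivenCommonRulesAndReplaceRewriter`) suggests that rewriters form a chain: the replace rewriter corrects `aple` to `apple` before the common rules are evaluated. The following plain-Java sketch illustrates that idea only. It does not use any Querqy classes; `RewriterChainSketch`, its methods, the `FILTER(...)` output format, and the assumption that rewriters run in the order in which they were added are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Illustrative sketch only: none of these names exist in Querqy-Unplugged.
final class RewriterChainSketch {

    // Stand-in for a replace rewriter: swaps a whole query for its replacement.
    static UnaryOperator<String> replaceRules(Map<String, String> replacements) {
        return query -> replacements.getOrDefault(query, query);
    }

    // Stand-in for a common-rules FILTER instruction on a single-term query.
    static UnaryOperator<String> filterRule(String input, String filter) {
        return query -> query.equals(input) ? query + " FILTER(" + filter + ")" : query;
    }

    // Assumption: rewriters are applied in the order in which they were added.
    static String rewrite(String query, List<UnaryOperator<String>> chain) {
        String current = query;
        for (UnaryOperator<String> rewriter : chain) {
            current = rewriter.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        List<UnaryOperator<String>> chain = List.of(
                replaceRules(Map.of("aple", "apple")),
                filterRule("apple", "smartphone"));
        System.out.println(rewrite("aple", chain));
    }
}
```

With the replace step first, the misspelled input reaches the filter rule as `apple` and the rule fires. If the two steps were swapped in this sketch, the filter rule would see `aple` and not apply.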
@@ -150,9 +156,147 @@ Be aware that the representation above is simplified, as scoring implications of
considered.

Note that fields can be configured either directly (as above), by passing a field name and a weight, or in more detail,
by creating a field config that takes a query type config (e.g. to define a specific Solr query parser for a field).

#### Boost configuration

Querqy rules can include boosts. The following rule boosts all products matching `apple` for queries containing `iphone`:

```text
iphone =>
UP(10): apple
```

The query configuration can be extended with a boost configuration, which defines how boost scores are handled:

```java
QueryConfig.builder()
        .field("name", 40.0f)
        .field("type", 20.0f)
        .minimumShouldMatch("100%")
        .tie(0.0f)
        .boostConfig(
                BoostConfig.builder()
                        // setter name as used in ExpandedQueryTest in this commit
                        .queryScoreConfig(BoostConfig.QueryScoreConfig.ADD_TO_BOOST_PARAM)
                        .build()
        )
        .build();
```

There are four boost modes:

*QueryScoreConfig.IGNORE_QUERY_SCORE (default)*

Only the score defined in the parameter of the boost rule is added to the result. Given that the term `iphone` matches in the
field `name`, the product gets a base score of `40`. If the term `apple` also matches in any field, a score
of `10` is added.

*QueryScoreConfig.ADD_TO_BOOST_PARAM*

The score of the parameter is added to the score of the boosting query. If the term `apple` matches in
the field `type`, an additional score of `30` (`20` boosting query score + `10` parameter score) is added.

*QueryScoreConfig.MULTIPLY_WITH_BOOST_PARAM*

The score of the parameter is multiplied by the score of the boosting query. If the term `apple` matches in
the field `type`, an additional score of `200` (`20` boosting query score × `10` parameter score) is added.

*QueryScoreConfig.CLASSIC*

This mode aims to provide boost scoring that is backwards-compatible with Querqy as a plugin. However, this mode is currently only
supported for the SolrMap client.
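
The arithmetic of the first three modes can be written down in a few lines. The following is an illustrative sketch, not Querqy-Unplugged code: `BoostScoreSketch`, its `Mode` enum, and `additionalScore` are hypothetical names that only mirror the documented behaviour, using the README values (`type` weight `20`, rule `UP(10)`).

```java
// Illustrative sketch only: this class is not part of Querqy-Unplugged.
final class BoostScoreSketch {

    // Hypothetical mirror of the three documented, non-CLASSIC modes.
    enum Mode { IGNORE_QUERY_SCORE, ADD_TO_BOOST_PARAM, MULTIPLY_WITH_BOOST_PARAM }

    /**
     * @param boostQueryScore score of the boosting query (e.g. 20 for a match in `type`)
     * @param boostParam      parameter of the boost rule (e.g. 10 for UP(10))
     * @return the score added to a document that matches the boosting query
     */
    static float additionalScore(Mode mode, float boostQueryScore, float boostParam) {
        switch (mode) {
            case IGNORE_QUERY_SCORE:
                return boostParam;
            case ADD_TO_BOOST_PARAM:
                return boostQueryScore + boostParam;
            case MULTIPLY_WITH_BOOST_PARAM:
                return boostQueryScore * boostParam;
            default:
                throw new IllegalArgumentException("Unsupported mode: " + mode);
        }
    }

    public static void main(String[] args) {
        for (Mode mode : Mode.values()) {
            System.out.println(mode + " -> " + additionalScore(mode, 20f, 10f));
        }
    }
}
```

For a boosting query score of `20` and a parameter of `10`, the sketch yields `10`, `30`, and `200`, matching the figures given for the three modes above.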


### QueryExpansion Configuration
Some use cases require enhancing a query irrespective of rules or rewriters. Such enhancements can be configured
via the `QueryExpansionConfig`. The easiest way to add queries is to pass them as strings. For the Elasticsearch Java Client,
the syntax must be compatible with query string queries (which are built under the hood). For the SolrMap Client, the syntax must be
compatible with the Lucene query parser.

```java
final QueryExpansionConfig<Query> queryExpansionConfig = QueryExpansionConfig.<Query>builder()
        .addAlternativeMatchingStringQuery("id:123", 50f)
        .addBoostUpStringQuery("brand:apple", 50f)
        .filterStringQuery("type:smartphone")
        .build();
```

Currently, three types of query expansions are supported:

*Filters* are added within a bool query in addition to the Querqy query.

```
bool(
    must(
        bool(
            dismax(...)
        )
    )
    filter(
        query-expansion-filter-query()
    )
)
```

*Boosts* are added within a bool query as should clauses in addition to the Querqy query.

```
bool(
    must(
        bool(
            dismax(...)
        )
    )
    should(
        query-expansion-boost-query()
    )
)
```

*Alternative matching queries* are fully qualified alternatives to the Querqy query, for instance to include a product with
a certain id in the results even though the regular query does not match it. The original Querqy query and the alternative
matching queries are combined as should clauses in an additional bool layer (note that the following query also includes a
query expansion boost query for demonstration purposes).

```
bool(
    should(
        bool(
            must(
                bool(
                    dismax(...)
                )
            )
            should(
                query-expansion-boost-query()
            )
        )
        query-expansion-alternative-matching-query()
    )
)
```
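
The three structures above can be summarised as one composition rule. The following plain-Java sketch builds the nested structure as a string for illustration. It is not the library's implementation: `QueryExpansionSketch` and `compose` are hypothetical, and placing filters inside the inner bool (so that they do not restrict alternative matching queries) is an assumption that the README does not confirm.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: this class is not part of Querqy-Unplugged.
final class QueryExpansionSketch {

    static String compose(String querqyQuery,
                          List<String> filters,
                          List<String> boosts,
                          List<String> alternatives) {
        // Inner bool: the Querqy query as must clause, plus optional filter and should clauses.
        List<String> inner = new ArrayList<>();
        inner.add("must(" + querqyQuery + ")");
        if (!filters.isEmpty()) {
            // Assumption: filters only restrict the Querqy query, not the alternatives.
            inner.add("filter(" + String.join(", ", filters) + ")");
        }
        if (!boosts.isEmpty()) {
            inner.add("should(" + String.join(", ", boosts) + ")");
        }
        String main = "bool(" + String.join(", ", inner) + ")";

        // Without alternatives, no additional bool layer is needed.
        if (alternatives.isEmpty()) {
            return main;
        }

        // Outer bool: the main query and every alternative as sibling should clauses.
        List<String> outer = new ArrayList<>();
        outer.add(main);
        outer.addAll(alternatives);
        return "bool(should(" + String.join(", ", outer) + "))";
    }

    public static void main(String[] args) {
        System.out.println(compose("bool(dismax(...))",
                List.of(), List.of("boost()"), List.of("alternative()")));
    }
}
```

Called with one boost and one alternative, the sketch reproduces the shape of the last diagram: an outer `bool(should(...))` that contains the boosted main query and the alternative as siblings.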


If string-based queries are not sufficient, queries can also be added as query
objects.

```java
import co.elastic.clients.elasticsearch._types.query_dsl.Query;
import co.elastic.clients.elasticsearch._types.query_dsl.TermQuery;

final QueryExpansionConfig<Query> queryExpansionConfig = QueryExpansionConfig.<Query>builder()
        .addBoostUpQuery(
                new Query(
                        new TermQuery.Builder()
                                .field("brand")
                                .value("apple")
                                .build()
                ),
                50f
        )
        .build();
```

### Converters
Converters are the search engine-specific part of Querqy-Unplugged. Converters are created for each query separately via
a factory. The classes related to converters as well as the class `QueryRewriting` make use of generic types, and the
@@ -205,6 +349,32 @@ final QueryRewriting<Query> queryRewriting = QueryRewriting.<Query>builder()

Using this converter requires including the dependency for the client as Querqy-Unplugged only includes it as `compileOnly`.

Defining RawQuery instructions has always been cumbersome for Elasticsearch, as it expects queries as JSON by default.
Therefore, users were required to define RawQuery instructions as follows:

```text
apple =>
FILTER: * {\"term\":{\"type\":\"smartphone\"}}
```

Querqy-Unplugged simplifies this by letting users define RawQuery instructions using the
[Query String Query](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html)
syntax, so the rule above can be written as follows:

```text
apple =>
FILTER: * type:smartphone
```

If you need Querqy-Unplugged to expect RawQuery instructions as JSON, pass an `ESJavaClientConverterConfig`
to the `ESJavaClientConverterFactory`:

```java
final ConverterFactory<Query> converterFactoryJson = ESJavaClientConverterFactory.of(
        ESJavaClientConverterConfig.builder()
                .rawQueryInputType(ESJavaClientConverterConfig.RawQueryInputType.JSON)
                .build());
```

#### Implementing additional converters

@@ -10,9 +10,11 @@
import querqy.QueryConfig;
import querqy.QueryRewriting;
import querqy.converter.ConverterFactory;
import querqy.converter.elasticsearch.javaclient.ESJavaClientConverterConfig;
import querqy.converter.elasticsearch.javaclient.ESJavaClientConverterFactory;
import querqy.parser.FieldAwareWhiteSpaceQuerqyParser;
import querqy.rewriter.builder.CommonRulesDefinition;
import querqy.rewriter.builder.ReplaceRulesDefinition;

import java.util.List;
import java.util.Map;
@@ -26,10 +28,12 @@ public class ExpandedQueryTest extends AbstractElasticsearchTest {
private static final Map<String, String> RULES = Map.of(
"filter", "apple => \n FILTER: smartphone",
"filter_with_field", "apple => \n FILTER: type:smartphone",
"raw_filter", "apple => \n FILTER: * type:smartphone",
"raw_filter_json", "apple => \n FILTER: * {\"term\":{\"type\":\"smartphone\"}}",
"boost", "apple => \n UP(100): smartphone",
"boost_additive", "apple => \n UP(10): smartphone",
"boost_multiplicative", "apple => \n UP(1.5): smartphone",
"down_boost", "apple => \n DOWN(10): case"
);

private final List<Product> products = List.of(
@@ -48,6 +52,12 @@ public class ExpandedQueryTest extends AbstractElasticsearchTest {

private final ConverterFactory<Query> converterFactory = ESJavaClientConverterFactory.create();

private final ConverterFactory<Query> converterFactoryJson = ESJavaClientConverterFactory.of(
ESJavaClientConverterConfig.builder()
.rawQueryInputType(ESJavaClientConverterConfig.RawQueryInputType.JSON)
.build()
);

@Test
public void testThat_allDocumentsAreReturned_forGivenMatchAllQuery() {
final QueryRewriting<Query> queryRewriting = queryRewriting();
@@ -66,6 +76,40 @@ public void testThat_documentsAreFiltered_forGivenFilterQuery() {
assertThat(toIdList(products)).containsExactlyInAnyOrder("1", "2");
}

@Test
public void testThat_rewriterChainIsApplied_forGivenCommonRulesAndReplaceRewriter() {
final QuerqyConfig querqyConfig = QuerqyConfig.builder()
.replaceRules(
ReplaceRulesDefinition.builder()
.rewriterId("id1")
.rules("aple => apple")
.build()
)
.commonRules(
CommonRulesDefinition.builder()
.rewriterId("id2")
.rules(RULES.get("filter"))
.querqyParserFactory(FieldAwareWhiteSpaceQuerqyParser::new)
.build()
)
.build();

final QueryRewriting<Query> queryRewriting = queryRewriting(querqyConfig, queryConfig);
final Query query = queryRewriting.rewriteQuery("aple").getConvertedQuery();

final List<Product> products = search(query);
assertThat(toIdList(products)).containsExactlyInAnyOrder("1", "2");
}

@Test
public void testThat_documentsAreFiltered_forGivenRawFilterJsonQuery() {
final QueryRewriting<Query> queryRewriting = queryRewriting("raw_filter_json", converterFactoryJson);
final Query query = queryRewriting.rewriteQuery("apple").getConvertedQuery();

final List<Product> products = search(query);
assertThat(toIdList(products)).containsExactlyInAnyOrder("1");
}

@Test
public void testThat_documentsAreFiltered_forGivenRawFilterQuery() {
final QueryRewriting<Query> queryRewriting = queryRewriting("raw_filter");
@@ -113,7 +157,7 @@ public void testThat_documentsAreBoosted_forGivenAdditiveBoostQuery() {
"boost_additive",
queryConfig.toBuilder()
.boostConfig(BoostConfig.builder()
.queryScoreConfig(BoostConfig.QueryScoreConfig.ADD_TO_BOOST_PARAM)
.build())
.build()
);
@@ -133,7 +177,7 @@ public void testThat_documentsAreBoosted_forGivenMultiplicativeBoostQuery() {
"boost_multiplicative",
queryConfig.toBuilder()
.boostConfig(BoostConfig.builder()
.queryScoreConfig(BoostConfig.QueryScoreConfig.MULTIPLY_WITH_BOOST_PARAM)
.build())
.build()
);
@@ -147,6 +191,26 @@ public void testThat_documentsAreBoosted_forGivenMultiplicativeBoostQuery() {
);
}

@Test
public void testThat_documentsArePunished_forGivenDownBoostQuery() {
final QueryRewriting<Query> queryRewriting = queryRewriting(
"down_boost",
queryConfig.toBuilder()
.boostConfig(BoostConfig.builder()
.queryScoreConfig(BoostConfig.QueryScoreConfig.ADD_TO_BOOST_PARAM)
.build())
.build()
);
final Query query = queryRewriting.rewriteQuery("apple").getConvertedQuery();

final List<Product> products = search(query);
assertThat(toIdAndScoreMaps(products)).containsExactlyInAnyOrder(
idAndScoreMap("1", 50.0),
idAndScoreMap("2", 40.0),
idAndScoreMap("3", 40.0)
);
}

private QueryRewriting<Query> queryRewriting() {
return queryRewriting(QuerqyConfig.empty(), queryConfig);
}
@@ -155,11 +219,19 @@ private QueryRewriting<Query> queryRewriting(final String rulesKey) {
return queryRewriting(querqyConfig(RULES.get(rulesKey)), queryConfig);
}

private QueryRewriting<Query> queryRewriting(final String rulesKey, final ConverterFactory<Query> converterFactory) {
return queryRewriting(querqyConfig(RULES.get(rulesKey)), queryConfig, converterFactory);
}

private QueryRewriting<Query> queryRewriting(final String rulesKey, final QueryConfig queryConfig) {
return queryRewriting(querqyConfig(RULES.get(rulesKey)), queryConfig);
}

private QueryRewriting<Query> queryRewriting(final QuerqyConfig querqyConfig, final QueryConfig queryConfig) {
return queryRewriting(querqyConfig, queryConfig, converterFactory);
}

private QueryRewriting<Query> queryRewriting(final QuerqyConfig querqyConfig, final QueryConfig queryConfig, final ConverterFactory<Query> converterFactory) {
return QueryRewriting.<Query>builder()
.queryConfig(queryConfig)
.querqyConfig(querqyConfig)