
[BUG] Hybrid Search returns incorrect results when search pipeline is missing #1011

Open
bzhangam opened this issue Dec 12, 2024 · 1 comment
Labels
bug Something isn't working hybrid search


@bzhangam
Contributor

bzhangam commented Dec 12, 2024

What is the bug?

Hybrid Search returns incorrect results when the search pipeline is missing. When a Hybrid Search query contains multiple sub-queries and no search pipeline is specified for score normalization and combination, OpenSearch still returns results. However, instead of combining the sub-query results as expected, it returns every match from each sub-query separately, so the same document can appear more than once in the hit list. No error is raised to indicate that the search pipeline is missing, which can lead to misleading results.

How can one reproduce the bug?

  1. Follow this tutorial to set up OpenSearch.
  2. Run a hybrid search without specifying a search pipeline:
GET /my-nlp-index/_search
{
  "_source": {
    "excludes": [
      "passage_embedding"
    ]
  },
  "query": {
    "hybrid": {
      "queries": [
        {
          "match": {
            "text": {
              "query": "cowboy rodeo bronco"
            }
          }
        },
        {
          "neural": {
            "passage_embedding": {
              "query_text": "wild west",
              "model_id": "<the ML model id>",
              "k": 5
            }
          }
        }
      ]
    }
  }
}
  3. The following incorrect results are returned:
{
    "took": 38,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 5,
            "relation": "eq"
        },
        "max_score": 2.261763,
        "hits": [
            {
                "_index": "my-nlp-index",
                "_id": "3",
                "_score": -9.549512E9,
                "_source": {
                    "text": "People line the stands which advertise Freemont 's orthopedics , a cowboy rides a light brown bucking bronco .",
                    "id": "2664027527.jpg"
                }
            },
            {
                "_index": "my-nlp-index",
                "_id": "3",
                "_score": -4.4224404E9,
                "_source": {
                    "text": "People line the stands which advertise Freemont 's orthopedics , a cowboy rides a light brown bucking bronco .",
                    "id": "2664027527.jpg"
                }
            },
            {
                "_index": "my-nlp-index",
                "_id": "3",
                "_score": 2.261763,
                "_source": {
                    "text": "People line the stands which advertise Freemont 's orthopedics , a cowboy rides a light brown bucking bronco .",
                    "id": "2664027527.jpg"
                }
            },
            {
                "_index": "my-nlp-index",
                "_id": "5",
                "_score": 2.1210756,
                "_source": {
                    "text": "A rodeo cowboy , wearing a cowboy hat , is being thrown off of a wild white horse .",
                    "id": "2691147709.jpg"
                }
            },
            {
                "_index": "my-nlp-index",
                "_id": "4",
                "_score": 0.87546873,
                "_source": {
                    "text": "A man who is riding a wild horse in the rodeo is very near to falling off .",
                    "id": "4427058951.jpg"
                }
            },
            {
                "_index": "my-nlp-index",
                "_id": "3",
                "_score": -4.4224404E9,
                "_source": {
                    "text": "People line the stands which advertise Freemont 's orthopedics , a cowboy rides a light brown bucking bronco .",
                    "id": "2664027527.jpg"
                }
            },
            {
                "_index": "my-nlp-index",
                "_id": "4",
                "_score": 0.015851954,
                "_source": {
                    "text": "A man who is riding a wild horse in the rodeo is very near to falling off .",
                    "id": "4427058951.jpg"
                }
            },
            {
                "_index": "my-nlp-index",
                "_id": "2",
                "_score": 0.015748847,
                "_source": {
                    "text": "A wild animal races across an uncut field with a minimal amount of trees .",
                    "id": "1775029934.jpg"
                }
            },
            {
                "_index": "my-nlp-index",
                "_id": "5",
                "_score": 0.01517796,
                "_source": {
                    "text": "A rodeo cowboy , wearing a cowboy hat , is being thrown off of a wild white horse .",
                    "id": "2691147709.jpg"
                }
            },
            {
                "_index": "my-nlp-index",
                "_id": "1",
                "_score": 0.013272899,
                "_source": {
                    "text": "A West Virginia university women 's basketball team , officials , and a small gathering of fans are in a West Virginia arena .",
                    "id": "4319130149.jpg"
                }
            }
        ]
    }
}
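The response shows two symptoms of the missing normalization step: document `3` appears four times, and two hits carry very large negative scores (`-9.549512E9` and `-4.4224404E9`). Those values look like internal marker scores that the hybrid query uses to separate each sub-query's result list before a normalization processor combines them; that reading is an assumption based on the values, not something stated above. A minimal Python sketch that flags both symptoms in an already-decoded response, assuming a single-index search (as here) so that a repeated `_id` means a repeated document. The function name and the `-1e9` threshold are hypothetical.

```python
from collections import Counter
from typing import Any

# Hypothetical cutoff: ordinary relevance scores are never this negative, so
# anything below it is treated as a leaked hybrid-query marker score.
MARKER_SCORE_THRESHOLD = -1e9


def find_unnormalized_hybrid_symptoms(response: dict[str, Any]) -> dict[str, list]:
    """Report signs that hybrid sub-query results were returned uncombined."""
    hits = response["hits"]["hits"]
    # Scores far below zero suggest raw, un-normalized sub-query output.
    marker_scores = [h["_score"] for h in hits if h["_score"] < MARKER_SCORE_THRESHOLD]
    # With a single index, the same _id appearing twice means a duplicated document.
    counts = Counter(h["_id"] for h in hits)
    duplicate_ids = sorted(doc_id for doc_id, n in counts.items() if n > 1)
    return {"marker_scores": marker_scores, "duplicate_ids": duplicate_ids}


# A few hits copied from the response above (_index and _source omitted).
sample = {
    "hits": {
        "hits": [
            {"_id": "3", "_score": -9.549512e9},
            {"_id": "3", "_score": -4.4224404e9},
            {"_id": "3", "_score": 2.261763},
            {"_id": "5", "_score": 2.1210756},
            {"_id": "5", "_score": 0.01517796},
        ]
    }
}

print(find_unnormalized_hybrid_symptoms(sample))
```

A correctly normalized and combined hybrid response should produce empty lists for both keys.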

What is the expected behavior?

An error should be returned saying the search pipeline is missing for the hybrid search.
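Until the server rejects such requests, a client could apply the requested check itself before sending the query. A minimal sketch, assuming the request body is a Python dict shaped like the one in the reproduction steps and that the pipeline name would be sent as the `search_pipeline` URL parameter. The function name and the pipeline name `my-search-pipeline` are hypothetical. The guard cannot see a pipeline configured as the index default through the `index.search.default_pipeline` setting, so it may reject requests that the server would in fact normalize.

```python
from typing import Any, Optional


def require_pipeline_for_hybrid(body: dict[str, Any], search_pipeline: Optional[str]) -> None:
    """Raise ValueError if a multi-query hybrid search names no search pipeline.

    Hypothetical client-side guard; only a top-level "hybrid" query is inspected.
    """
    hybrid = body.get("query", {}).get("hybrid")
    if hybrid is None:
        return  # Not a hybrid query, so there is nothing to check.
    sub_queries = hybrid.get("queries", [])
    if len(sub_queries) > 1 and not search_pipeline:
        raise ValueError(
            f"hybrid query has {len(sub_queries)} sub-queries but no search "
            "pipeline was specified for normalization and combination"
        )


# The request body from the reproduction steps.
request_body = {
    "query": {
        "hybrid": {
            "queries": [
                {"match": {"text": {"query": "cowboy rodeo bronco"}}},
                {"neural": {"passage_embedding": {
                    "query_text": "wild west", "model_id": "<the ML model id>", "k": 5}}},
            ]
        }
    }
}

require_pipeline_for_hybrid(request_body, "my-search-pipeline")  # hypothetical name; passes
try:
    require_pipeline_for_hybrid(request_body, None)  # no pipeline: rejected
except ValueError as err:
    print(err)
```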

What is your host/environment?

OpenSearch version 2.18.0.0

Do you have any screenshots?

N/A

Do you have any additional context?

N/A

@bzhangam bzhangam added bug Something isn't working untriaged labels Dec 12, 2024
@dblock dblock removed the untriaged label Jan 6, 2025
@dblock
Member

dblock commented Jan 6, 2025

[Catch All Triage - 1, 2, 3, 4, 5, 6]
