
[BUG] Incomplete search results against rollup index composed of data from multiple rollup jobs #903

Open
sharebear opened this issue Jul 25, 2023 · 3 comments
Labels
bug Something isn't working rollup small

Comments

@sharebear

Describe the bug
Incomplete results when querying rollup index with date histogram.

To Reproduce

Enter the following into the dev console and execute each request in sequence, waiting for the rollup jobs to complete before running the search queries.

# Insert some test data

POST sharej-data-2023-07-25/_doc
{
  "timestamp": "2023-07-25T14:10:43.1",
  "numberOfCalls": 1
}

POST sharej-data-2023-07-25/_doc
{
  "timestamp": "2023-07-25T14:12:43.1",
  "numberOfCalls": 1
}

POST sharej-data-2023-07-24/_doc
{
  "timestamp": "2023-07-24T13:14:43.1",
  "numberOfCalls": 4
}

POST sharej-data-2023-07-23/_doc
{
  "timestamp": "2023-07-23T15:11:43.1",
  "numberOfCalls": 2
}

POST sharej-data-2023-07-23/_doc
{
  "timestamp": "2023-07-23T17:18:43.1",
  "numberOfCalls": 4
}

# Create rollup job for above indexes, emulating what happens when you have an ISM policy applying the rollup to each index after X days
PUT _plugins/_rollup/jobs/sharej-rollup-2023-07-23
{
  "rollup": {
    "enabled": true,
    "source_index": "sharej-data-2023-07-23",
    "target_index": "rollup-sharej-data-2023",
    "schedule": {
      "interval": {
        "start_time": 1,
        "period": "1",
        "unit": "Minutes"
      }
    },
    "description": "Test rollup",
    "page_size": 1000,
    "delay": 0,
    "continuous": false,
    "dimensions": [
      {
        "date_histogram": {
          "source_field": "timestamp",
          "fixed_interval": "1h",
          "timezone": "UTC"
        }
      }
    ],
    "metrics": [
      {
        "source_field": "numberOfCalls",
        "metrics": [
          {
            "avg": {}
          },
          {
            "sum": {}
          },
          {
            "max": {}
          },
          {
            "min": {}
          },
          {
            "value_count": {}
          }
        ]
      }
    ]
  }
}

PUT _plugins/_rollup/jobs/sharej-rollup-2023-07-24
{
  "rollup": {
    "enabled": true,
    "source_index": "sharej-data-2023-07-24",
    "target_index": "rollup-sharej-data-2023",
    "schedule": {
      "interval": {
        "start_time": 1,
        "period": "1",
        "unit": "Minutes"
      }
    },
    "description": "Test rollup",
    "page_size": 1000,
    "delay": 0,
    "continuous": false,
    "dimensions": [
      {
        "date_histogram": {
          "source_field": "timestamp",
          "fixed_interval": "1h",
          "timezone": "UTC"
        }
      }
    ],
    "metrics": [
      {
        "source_field": "numberOfCalls",
        "metrics": [
          {
            "avg": {}
          },
          {
            "sum": {}
          },
          {
            "max": {}
          },
          {
            "min": {}
          },
          {
            "value_count": {}
          }
        ]
      }
    ]
  }
}

PUT _plugins/_rollup/jobs/sharej-rollup-2023-07-25
{
  "rollup": {
    "enabled": true,
    "source_index": "sharej-data-2023-07-25",
    "target_index": "rollup-sharej-data-2023",
    "schedule": {
      "interval": {
        "start_time": 1,
        "period": "1",
        "unit": "Minutes"
      }
    },
    "description": "Test rollup",
    "page_size": 1000,
    "delay": 0,
    "continuous": false,
    "dimensions": [
      {
        "date_histogram": {
          "source_field": "timestamp",
          "fixed_interval": "1h",
          "timezone": "UTC"
        }
      }
    ],
    "metrics": [
      {
        "source_field": "numberOfCalls",
        "metrics": [
          {
            "avg": {}
          },
          {
            "sum": {}
          },
          {
            "max": {}
          },
          {
            "min": {}
          },
          {
            "value_count": {}
          }
        ]
      }
    ]
  }
}

# Watch status of rollup jobs until complete

GET _plugins/_rollup/jobs/sharej-rollup-2023-07-23/_explain

GET _plugins/_rollup/jobs/sharej-rollup-2023-07-24/_explain

GET _plugins/_rollup/jobs/sharej-rollup-2023-07-25/_explain

# Execute query against source data. Three buckets returned
GET sharej-data-2023-*/_search
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggregations": {
    "by_day": {
      "date_histogram": {
        "field": "timestamp",
        "fixed_interval": "1d"
      },
      "aggregations": {
        "totalCalls": {
          "sum": {
            "field": "numberOfCalls"
          }
        }
      }
    }
  }
}

# Execute query against rollup data. Only 1 bucket returned!?!?!? Where's the rest of the data?
GET rollup-sharej-data-2023/_search
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggregations": {
    "by_day": {
      "date_histogram": {
        "field": "timestamp",
        "fixed_interval": "1d"
      },
      "aggregations": {
        "totalCalls": {
          "sum": {
            "field": "numberOfCalls"
          }
        }
      }
    }
  }
}

# Execute against rollup data without query. Expected result again (but this isn't the query we get when adding a visualisation)
GET rollup-sharej-data-2023/_search
{
  "size": 0,
  "aggregations": {
    "by_day": {
      "date_histogram": {
        "field": "timestamp",
        "fixed_interval": "1d"
      },
      "aggregations": {
        "totalCalls": {
          "sum": {
            "field": "numberOfCalls"
          }
        }
      }
    }
  }
}
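The "Watch status of rollup jobs until complete" step can also be checked from a script rather than by eye. The following is a minimal Python sketch. It assumes that each _explain response is keyed by job ID and carries a rollup_metadata.status field that reads "finished" once a non-continuous job is done. That response shape is an assumption here and should be verified against your version. The helper name unfinished_jobs and the sample data are hypothetical.

```python
from typing import Any, Dict, List

# Job IDs taken from the reproduction steps above.
JOB_IDS = [
    "sharej-rollup-2023-07-23",
    "sharej-rollup-2023-07-24",
    "sharej-rollup-2023-07-25",
]


def unfinished_jobs(explain: Dict[str, Any], job_ids: List[str]) -> List[str]:
    """Return the job IDs whose status is not (yet) "finished".

    `explain` is the merged JSON of the _explain responses, keyed by job ID.
    ASSUMPTION: the status lives at <job_id>.rollup_metadata.status.
    """
    pending = []
    for job_id in job_ids:
        # Metadata may be missing or null before the job has run once.
        metadata = (explain.get(job_id) or {}).get("rollup_metadata") or {}
        if metadata.get("status") != "finished":
            pending.append(job_id)
    return pending


# Hand-written sample, not captured output from a real cluster.
SAMPLE_EXPLAIN = {
    "sharej-rollup-2023-07-23": {"rollup_metadata": {"status": "finished"}},
    "sharej-rollup-2023-07-24": {"rollup_metadata": {"status": "started"}},
    "sharej-rollup-2023-07-25": {"rollup_metadata": None},
}

print(unfinished_jobs(SAMPLE_EXPLAIN, JOB_IDS))
```

Run the search queries only once this list is empty for all three jobs.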

Expected behavior

All three queries at the end should return the same results. Instead, when the query parameter is provided in a search against the rollup index, it appears that only the results from one of the rollup jobs are returned.
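One way to make the "same results" expectation checkable is to compare the buckets of two search responses. The sketch below is Python and works on the standard date_histogram response shape (aggregations.by_day.buckets). The helper names bucket_totals and missing_buckets are hypothetical, and the two sample responses are hand-written from the test data above (daily sums of 6, 4 and 2), not captured output. Which single bucket the rollup query returns is chosen arbitrarily for illustration.

```python
from typing import Any, Dict, List


def bucket_totals(response: Dict[str, Any],
                  agg: str = "by_day",
                  metric: str = "totalCalls") -> Dict[str, float]:
    """Map each histogram bucket key to the value of its sub-aggregation."""
    buckets = response.get("aggregations", {}).get(agg, {}).get("buckets", [])
    return {b["key_as_string"]: b[metric]["value"] for b in buckets}


def missing_buckets(expected: Dict[str, Any], actual: Dict[str, Any]) -> List[str]:
    """Return the bucket keys present in `expected` but absent from `actual`."""
    want = bucket_totals(expected)
    got = bucket_totals(actual)
    return sorted(key for key in want if key not in got)


# Hand-written sample: what the source-index query should contain.
SOURCE_SAMPLE = {"aggregations": {"by_day": {"buckets": [
    {"key_as_string": "2023-07-23T00:00:00.000Z", "totalCalls": {"value": 6.0}},
    {"key_as_string": "2023-07-24T00:00:00.000Z", "totalCalls": {"value": 4.0}},
    {"key_as_string": "2023-07-25T00:00:00.000Z", "totalCalls": {"value": 2.0}},
]}}}

# Hand-written sample: a rollup response with only one bucket (illustrative).
ROLLUP_SAMPLE = {"aggregations": {"by_day": {"buckets": [
    {"key_as_string": "2023-07-25T00:00:00.000Z", "totalCalls": {"value": 2.0}},
]}}}

print(missing_buckets(SOURCE_SAMPLE, ROLLUP_SAMPLE))
```

An empty list would mean the rollup search returned every day that the source search did.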

Host/Environment (please complete the following information):

  • OS: Linux (discovered in Aiven hosted version but behaviour reproduced locally with docker image)
  • Version 2.8.0

Additional context
We have metrics that we post to daily indexes. An ISM policy applied to the daily index pattern rolls each index up into an annual index after three days and then deletes the source index. When we create visualisations based on the rollup index, we get strange results. A hand-crafted search against the rollup index shows that all the expected data is there, but the equivalent query issued by a visualisation on a dashboard is missing data. The difference between my hand-crafted search and the dashboard's search is the presence of the query field, which narrows down the time frame and optionally drills down on other facets (not included in the code example above). How do we get our visualisations to show all the data, or have I stumbled upon a genuine bug here?
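To illustrate the kind of request described here, the following Python sketch builds a search body that adds a time-range query on top of the same aggregation used in the reproduction. It is only an approximation: the exact request a dashboard visualisation generates is not shown in this issue, so the bool/filter/range structure, the function name dashboard_like_body and the date bounds are all assumptions.

```python
import json
from typing import Any, Dict


def dashboard_like_body(gte: str, lte: str) -> Dict[str, Any]:
    """Build a search body with a time-range query plus the by_day aggregation.

    ASSUMPTION: a visualisation narrows the time frame with a range filter on
    the "timestamp" field; the real generated request may differ.
    """
    return {
        "size": 0,
        "query": {
            "bool": {
                "filter": [
                    {"range": {"timestamp": {"gte": gte, "lte": lte}}}
                ]
            }
        },
        "aggregations": {
            "by_day": {
                "date_histogram": {"field": "timestamp", "fixed_interval": "1d"},
                "aggregations": {
                    "totalCalls": {"sum": {"field": "numberOfCalls"}}
                },
            }
        },
    }


# Bounds chosen to cover the three test days from the reproduction.
body = dashboard_like_body("2023-07-23T00:00:00Z", "2023-07-25T23:59:59Z")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after GET rollup-sharej-data-2023/_search in the dev console to check whether a narrowed query shows the same missing buckets as the match_all case.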

@sharebear sharebear added bug Something isn't working untriaged labels Jul 25, 2023
@sharebear sharebear changed the title [BUG] [BUG] Incomplete search results against rollup index composed of data from multiple rollup jobs Jul 25, 2023
@msfroh

msfroh commented Aug 16, 2023

Should we move this to https://github.com/opensearch-project/index-management ?

@msfroh msfroh removed the untriaged label Aug 16, 2023
@dblock dblock transferred this issue from opensearch-project/OpenSearch Aug 22, 2023
@msfroh msfroh removed the untriaged label Aug 23, 2023
@KagariSan

Hi @sharebear and @msfroh,

I've identified the cause of the reported behavior and believe this issue can now be closed.

The behavior is related to the code snippet found here:

https://github.com/opensearch-project/index-management/blob/d4ee795e22f4490b78662f171f62d566a81c1abc/src/main/kotlin/org/opensearch/indexmanagement/rollup/interceptor/RollupInterceptor.kt#L347

This code references the setting "plugins.rollup.search.search_all_jobs", documented here:

https://opensearch.org/docs/2.4/im-plugin/index-rollups/settings/

To modify the current behavior, you can update your cluster configuration using the following API call:

PUT https://localhost:9200/_cluster/settings
Content-Type: application/json

{
  "persistent": {
    "plugins.rollup.search.search_all_jobs": true
  },
  "transient": {
    "plugins.rollup.search.search_all_jobs": true
  }
}

This update enables searching across all rollup jobs that match the query, rather than just one of them. The persistent entry survives a full cluster restart; the transient entry does not.
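For anyone applying the setting from a script rather than an HTTP client, here is a minimal Python sketch using only the standard library. It builds the PUT request but leaves sending it commented out, since authentication and TLS details depend on the cluster. The function name build_settings_request and the base URL are assumptions for illustration. Unlike the call above, this sketch sets only the persistent entry, which is a simplification on my part.

```python
import json
import urllib.request

SETTING = "plugins.rollup.search.search_all_jobs"


def build_settings_request(base_url: str, enabled: bool = True) -> urllib.request.Request:
    """Build (but do not send) a PUT _cluster/settings request for SETTING."""
    # Only the persistent entry is set here (a simplification of the call above).
    body = {"persistent": {SETTING: enabled}}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# ASSUMPTION: a local cluster at this address, as in the comment above.
request = build_settings_request("https://localhost:9200")
print(request.get_method(), request.full_url)

# Sending is left to the reader because credentials and certificates vary:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

After applying it, GET _cluster/settings should list the setting under persistent, and the match_all query from the reproduction can be re-run to confirm that all three buckets come back.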

Please don't hesitate to let me know if you have any questions or need any more help.

@sharebear
Author

Thanks, I've confirmed in local testing that the setting does seem to resolve my issue. I just need to work out how to get it set in my Aiven-hosted instance (not your problem).
