Elasticsearch date_histogram aggregations benchmark

Intro

This project is an ES Rally benchmark that measures the performance of the date_histogram aggregation under various workloads.

It was created to reproduce a performance issue reported on the Elastic forums and in GitHub issues.

Test datasets with different workloads have been uploaded here.

Run benchmarks

To run the benchmarks:

  1. Install the latest version of Rally, as described in the official Rally documentation.

  2. Configure Rally using esrally configure.

  3. Edit ~/.rally/rally.ini and add the date_histogram-benchmark track repository in the [tracks] section as shown below (more details in the Rally docs):

    [tracks]
    default.url = https://github.com/elastic/rally-tracks
    date_histogram-benchmark.url = https://github.com/csoulios/date_histogram-benchmark
    
  4. Run the Rally track with any of the supported challenges:

    esrally --on-error=abort --track-repository=date_histogram-benchmark --distribution-version=[elasticsearch_version] --track date_histogram --challenge=[challenge_name]
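To run several challenges in a row, the command above can be assembled programmatically. The following is only a minimal sketch: `build_rally_command` is a hypothetical helper, it assumes that the challenge names match the dataset names listed in this README, and the version `7.5.0` is a placeholder rather than a version this benchmark prescribes.

```python
import shlex

# Challenge names, assumed to match the dataset names listed in this README.
CHALLENGES = (
    "timestamps-gaussian-sameday",
    "timestamps-uniform-sameday",
    "timestamps-uniform-1s",
    "timestamps-uniform-10s",
)


def build_rally_command(es_version, challenge):
    """Return the esrally argument list for one challenge (hypothetical helper)."""
    if challenge not in CHALLENGES:
        raise ValueError(f"unknown challenge: {challenge}")
    # Flags are copied from the command shown in this README.
    return [
        "esrally",
        "--on-error=abort",
        "--track-repository=date_histogram-benchmark",
        f"--distribution-version={es_version}",
        "--track", "date_histogram",
        f"--challenge={challenge}",
    ]


if __name__ == "__main__":
    # "7.5.0" is a placeholder Elasticsearch version, not one the README names.
    for name in CHALLENGES:
        print(shlex.join(build_rally_command("7.5.0", name)))
```

The script only prints the commands. Passing each returned list to `subprocess.run` would execute them, provided Rally is installed and configured as described in the steps above.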

A separate challenge has been created for loading each dataset. The datasets differ in how their documents are distributed over time:

  • timestamps-gaussian-sameday: This dataset represents the actual distribution of log data during a production day. It is a Gaussian distribution centered around lunchtime (more documents during the day than at night). All documents fit within the same day.
  • timestamps-uniform-sameday: All documents fit within the same day but are evenly distributed (the same number of documents every hour).
  • timestamps-uniform-1s: Documents are spaced one second apart (the first is at 2000-01-01T00:00:00.000Z, the next is 1 second later).
  • timestamps-uniform-10s: Documents are spaced 10 seconds apart.
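
The four distributions can be illustrated with a short sketch. This is not the script used to build the published datasets. The function names, the Gaussian mean and standard deviation, the seed, and the reuse of 2000-01-01 as the day for the same-day datasets are all assumptions made for illustration. Only the start timestamp of timestamps-uniform-1s comes from the description above.

```python
import random
from datetime import datetime, timedelta, timezone

# Start given for timestamps-uniform-1s; reusing it elsewhere is an assumption.
START = datetime(2000, 1, 1, tzinfo=timezone.utc)
DAY_SECONDS = 24 * 60 * 60


def uniform_spaced(count, gap_seconds):
    """Timestamps a fixed gap apart, as in timestamps-uniform-1s and -10s."""
    return [START + timedelta(seconds=i * gap_seconds) for i in range(count)]


def uniform_sameday(count):
    """Timestamps spread evenly across a single day."""
    step = DAY_SECONDS / count
    return [START + timedelta(seconds=i * step) for i in range(count)]


def gaussian_sameday(count, mean_hour=12.0, stddev_hours=3.0, seed=0):
    """Normally distributed timestamps, resampled so all stay within one day.

    mean_hour and stddev_hours are illustrative values, not taken from the dataset.
    """
    rng = random.Random(seed)
    result = []
    while len(result) < count:
        offset = rng.gauss(mean_hour, stddev_hours) * 3600
        if 0 <= offset < DAY_SECONDS:  # reject samples outside the day
            result.append(START + timedelta(seconds=offset))
    return sorted(result)


if __name__ == "__main__":
    print(uniform_spaced(3, 10))
    print(uniform_sameday(4))
    print(gaussian_sameday(5))
```

In the same-day datasets every document falls into a single daily bucket, while the fixed-gap datasets cover a time range that grows with the number of documents.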

Acknowledgements

Special thanks to Bertrand Renuart for reporting this issue and creating the benchmark dataset.