forked from HdrHistogram/HdrHistogram
-
Notifications
You must be signed in to change notification settings - Fork 4
A High Dynamic Range (HDR) Histogram
License
AdaptiveConsulting/HdrHistogram
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
HdrHistogram: A High Dynamic Range (HDR) Histogram [excerpt from the Histogram JavaDoc] HdrHistogram supports the recording and analyzing sampled data value counts across a configurable integer value range with configurable value precision within the range. Value precision is expressed as the number of significant digits in the value recording, and provides control over value quantization behavior across the value range and the subsequent value resolution at any given level. For example, a Histogram could be configured to track the counts of observed integer values between 0 and 3,600,000,000 while maintaining a value precision of 3 significant digits across that range. Value quantization within the range will thus be no larger than 1/1,000th (or 0.1%) of any value. This example Histogram could be used to track and analyze the counts of observed response times ranging between 1 microsecond and 1 hour in magnitude, while maintaining a value resolution of 1 microsecond up to 1 millisecond, a resolution of 1 millisecond (or better) up to one second, and a resolution of 1 second (or better) up to 1,000 seconds. At it's maximum tracked value (1 hour), it would still maintain a resolution of 3.6 seconds (or better). The HdrHistogram package includes the Histogram implementation, which tracks value counts in long fields, and is expected to be the commonly used Histogram form. IntHistogram and ShortHistogram, which track value counts in int and short fields respectively, are provided for use cases where smaller count ranges are practical and smaller overall storage is beneficial. HDR Histogram is designed for recoding histograms of value measurements in latency and performance sensitive applications. Measurements show value recording times as low as 3-6 nanoseconds on modern (circa 2012) Intel CPUs. AbstractHistogram maintains a fixed cost in both space and time. A Histogram's memory footprint is constant, with no allocation operations involved in recording data values or in iterating through them. The memory footprint is fixed regardless of the number of data value samples recorded, and depends solely on the dynamic range and precision chosen. The amount of work involved in recording a sample is constant, and directly computes storage index locations such that no iteration or searching is ever involved in recording data values. A combination of high dynamic range and precision is useful for collection and accurate post-recording analysis of sampled value data distribution in various forms. Whether it's calculating or plotting arbitrary percentiles, iterating through and summarizing values in various ways, or deriving mean and standard deviation values, the fact that the recorded data information is kept in high resolution allows for accurate post-recording analysis with low [and ultimately configurable] loss in accuracy when compared to performing the same analysis directly on the potentially infinite series of sourced data values samples. Internally, AbstractHistogram data is maintained using a concept somewhat similar to that of floating point number representation: Using a an exponent a (non-normalized) mantissa to support a wide dynamic range at a high but varying (by exponent value) resolution. AbstractHistogram uses exponentially increasing bucket value ranges (the parallel of the exponent portion of a floating point number) with each bucket containing a fixed number (per bucket) set of linear sub-buckets (the parallel of a non-normalized mantissa portion of a floating point number). Both dynamic range and resolution are configurable, with highestTrackableValue controlling dynamic range, and numberOfSignificantValueDigits controlling resolution. An common use example of an HDR Histogram would be to record response times in units of microseconds across a dynamic range stretching from 1 usec to over an hour, with a good enough resolution to support later performing post-recording analysis on the collected data. Analysis can including computing, examining, and reporting of distribution by percentiles, linear or logarithmic value buckets, mean and standard deviation, or by any other means that can can be easily added by using the various iteration techniques supported by the Histogram. In order to facilitate the accuracy needed for various post-recording analysis techniques, this example can maintain where a resolution of ~1 usec or better for times ranging to ~2 msec in magnitude, while at the same time maintaining a resolution of ~1 msec or better for times ranging to ~2 sec, and a resolution of ~1 second or better for values up to 2,000 seconds. This sort of example resolution can be thought of as "always accurate to 3 decimal points." Such an example Histogram would simply be created with a highestTrackableValue of 3,600,000,000, and a numberOfSignificantValueDigits of 3, and would occupy a fixed, unchanging memory footprint of around 185KB (see "Footprint estimation" below). Synchronization and concurrent access In the interest of keeping value recording cost to a minimum, AbstractHistogram and it's direct implementations are NOT internally synchronized, and do NOT use atomic variables. Callers wishing to make potentially concurrent, multi-threaded updates or queries against Histogram objects should take care to externally synchronize and/or order their access. Iteration Histograms supports multiple convenient forms of iterating through the histogram data set, including linear, logarithmic, and percentile iteration mechanisms, as well as means for iterating through each recorded value or each possible value level. The iteration mechanisms are accessible through the HistogramData available through getHistogramData(). Iteration mechanisms all provide HistogramIterationValue data points along the histogram's iterated data set, and are available for the default (corrected) histogram data set via the following HistogramData methods: percentiles: An Iterable<HistogramIterationValue> through the histogram using a PercentileIterator linearBucketValues: An Iterable<HistogramIterationValue> through the histogram using a LinearIterator logarithmicBucketValues: An Iterable<HistogramIterationValue> through the histogram using a LogarithmicIterator recordedValues: An Iterable<HistogramIterationValue> through the histogram using a RecordedValuesIterator allValues: An Iterable<HistogramIterationValue> through the histogram using a AllValuesIterator Iteration is typically done with a for-each loop statement. E.g.: for (HistogramIterationValue v : histogram.getHistogramData().percentiles(ticksPerHalfDistance)) { ... } or for (HistogramIterationValue v : histogram.getRawHistogramData().linearBucketValues(unitsPerBucket)) { ... } The iterators associated with each iteration method are resettable, such that a caller that would like to avoid allocating a new iterator object for each iteration loop can re-use an iterator to repeatedly iterate through the histogram. This iterator re-use usually takes the form of a traditional for loop using the Iterator's hasNext() and next() methods: to avoid allocating a new iterator object for each iteration loop: PercentileIterator iter = histogram.getHistogramData().percentiles().iterator(ticksPerHalfDistance<); ... iter.reset(percentileTicksPerHalfDistance); for (iter.hasNext() { HistogramIterationValue v = iter.next(); ... } Equivalent Values and value ranges Due to the finite (and configurable) resolution of the histogram, multiple adjacent integer data values can be "equivalent". Two values are considered "equivalent" if samples recorded for both are always counted in a common total count due to the histogram's resolution level. Histogram provides methods for determining the lowest and highest equivalent values for any given value, as we as determining whether two values are equivalent, and for finding the next non-equivalent value for a given value (useful when looping through values, in order to avoid double-counting count). Corrected vs. Raw recordValue()calls In order to support a common use case needed when histogram values are used to track response time distribution, Histogram provides for the recording of corrected histogram value by supporting an optional expectedIntervalBetweenValueSamples parameter to the recordValue() method. This form is useful in [common load generator] scenarios where response times may exceed the expected interval between issuing requests, leading to "dropped" response time measurements that would typically correlate with "bad" results. When a value recorded in the histogram exceeds the expectedIntervalBetweenValueSamples parameter (when provided), recorded histogram data will reflect an appropriate number of additional values, linearly decreasing in steps of expectedIntervalBetweenValueSamples, down to the last value that would still be higher than expectedIntervalBetweenValueSamples. To illustrate why this corrective behavior is critically needed in order to accurately represent value distribution when large value measurements may lead to missed samples, imagine a system for which response times samples are taken once every 10 msec to characterize response time distribution. The hypothetical system behaves "perfectly" for 100 seconds (10,000 recorded samples), with each sample showing a 1msec response time value. At each sample for 100 seconds (10,000 logged samples at 1 msec each). The hypothetical system then encounters a 100 sec pause during which only a single sample is recorded (with a 100 second value). The raw data histogram collected for such a hypothetical system (over the 200 second scenario above) would show ~99.99% of results at 1 msec or below, which is obviously "not right". The same histogram, corrected with the knowledge of an expectedIntervalBetweenValueSamples of 10msec will correctly represent the response time distribution. Only ~50% of results will be at 1 msec or below, with the remaining 50% coming from the auto-generated value records covering the missing increments spread between 10msec and 100 sec. Data sets recorded with and without an expectedIntervalBetweenValueSamples parameter will differ only if at least one value recorded with the recordValue method was greater than it's associated expectedIntervalBetweenValueSamples parameter. Data sets recorded with an expectedIntervalBetweenValueSamples parameter will be identical to ones recorded without it if all values recorded via the recordValue calls were smaller than their associated (and optional) expectedIntervalBetweenValueSamples parameters. When used for response time characterization, the recording with the optional expectedIntervalBetweenValueSamples parameter will tend to produce data sets that would much more accurately reflect the response time distribution that a random, uncoordinated request would have experienced. Footprint estimation Due to it's dynamic range representation, Histogram is relatively efficient in memory space requirements given the accuracy and dynamic range it covers. Still, it is useful to be able to estimate the memory footprint involved for a given highestTrackableValue and numberOfSignificantValueDigits combination. Beyond a relatively small fixed-size footprint used for internal fields and stats (which can be estimated as "fixed at well less than 1KB"), the bulk of a Histogram's storage is taken up by it's data value recording counts array. The total footprint can be conservatively estimated by: largestValueWithSingleUnitResolution = 2 * (10 ^ numberOfSignificantValueDigits); subBucketSize = roundedUpToNearestPowerOf2(largestValueWithSingleUnitResolution); expectedHistogramFootprintInBytes = 512 + ({primitive type size} / 2) * (log2RoundedUp((highestTrackableValue) / subBucketSize) + 2) * subBucketSize A conservative (high) estimate of a Histogram's footprint in bytes is available via the getEstimatedFootprintInBytes() method.
About
A High Dynamic Range (HDR) Histogram
Resources
License
Security policy
Stars
Watchers
Forks
Packages 0
No packages published