How does ADE organize information

Jim Caffrey edited this page Mar 22, 2016 · 1 revision

How ADE organizes the information for analysis

ADE organizes the information for Linux logs into time slices and groups of similar systems.

ADE divides the log into days (periods) and then divides each period into time slices (intervals). It then organizes the data by those time slices and applies its statistical techniques to a single time slice or to combinations of consecutive time slices.

Days (periods)

ADE divides the logs into periods, which by default are 24 hours long. ADE starts each period at midnight. Midnight can be either UTC or system time, depending on how the system is configured. The default configuration uses UTC.
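
The period rule above can be illustrated with a small sketch. It assumes the default configuration (24-hour periods starting at midnight UTC). The function name `period_bounds` is hypothetical and is not part of ADE.

```python
from datetime import datetime, timedelta, timezone

PERIOD = timedelta(hours=24)  # default period length described above


def period_bounds(ts):
    """Return (start, end) of the UTC period that contains ts.

    ts must be timezone-aware; it is converted to UTC before truncating.
    """
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    ts_utc = ts.astimezone(timezone.utc)
    start = ts_utc.replace(hour=0, minute=0, second=0, microsecond=0)
    return start, start + PERIOD


# A message logged at 01:00 local time in a UTC+3 zone falls into the
# period of the previous UTC day.
local = datetime(2016, 3, 22, 1, 0, tzinfo=timezone(timedelta(hours=3)))
start, end = period_bounds(local)
```

With system time instead of UTC, the same truncation would be applied in the system's time zone rather than in UTC.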

Time slices (intervals)

ADE divides the periods into time slices and calls the time slices intervals. There are two reasons why ADE divides the logs into time slices.

  • The statistical techniques used by ADE require that messages written to Linux logs be grouped into time slices.
  • ADE assumes that the order in which messages are written to a log is not deterministic. That is, message C from process 31 could arrive before or after message B from process 103, depending on how those processes were dispatched.
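
To illustrate the second reason, the sketch below groups messages into time slices and counts them per slice. Within a slice, the result does not depend on the order in which the messages arrived. This is not ADE's implementation: the ten-minute slice length, the `(timestamp_ms, message_id)` tuples, and the function name `count_by_interval` are assumptions made for the example.

```python
from collections import Counter, defaultdict

INTERVAL_MS = 600_000  # assumed ten-minute time slice


def count_by_interval(messages, interval_ms=INTERVAL_MS):
    """Map interval index -> Counter of message ids seen in that interval."""
    buckets = defaultdict(Counter)
    for timestamp_ms, message_id in messages:
        buckets[timestamp_ms // interval_ms][message_id] += 1
    return dict(buckets)


# Two possible arrival orders for messages B and C in the first slice.
arrival_a = [(1_000, "B"), (1_500, "C"), (650_000, "B")]
arrival_b = [(1_000, "C"), (1_500, "B"), (650_000, "B")]

# Both orders produce the same per-slice counts.
same = count_by_interval(arrival_a) == count_by_interval(arrival_b)
```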

There are three different intervals used by ADE:

  • Analysis interval
  • Analysis snapshot
  • Database interval

Analysis interval

When ADE calculates the anomaly score for an interval and the anomaly scores for the messages that occur during that interval, it combines the messages that occurred during a time slice called an analysis interval. The analysis interval used by ADE is specified in the flowlayout.xml.

The following specifies that upload updates the database every ten minutes. The TrainingIntervalFactor specifies that train uses six intervals (60 minutes).

<tns:UploadFramingFlow xmlns:tns="http://flow.impl.ade.openmainframe.org/factory">tenMinutesTrain</tns:UploadFramingFlow>
<tns:TrainingIntervalFactor xmlns:tns="http://flow.impl.ade.openmainframe.org/factory">6</tns:TrainingIntervalFactor>

<tns:FramingFlow consecutive="true" duration="600000" name="tenMinutesTrain" databaseId="0" xmlns:tns="http://flow.impl.ade.openmainframe.org/factory"><tns:FramerClass>ConsecutiveTimeFramer</tns:FramerClass></tns:FramingFlow>
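
As a rough check of the numbers above, the following sketch reads the three elements with Python's standard `xml.etree.ElementTree` and derives the upload and training durations. The wrapping `<flowlayout>` root element is hypothetical and is added only so the fragment parses as one document. The real flowlayout.xml has its own structure.

```python
import xml.etree.ElementTree as ET

NS = {"tns": "http://flow.impl.ade.openmainframe.org/factory"}

# The <flowlayout> wrapper is a hypothetical root used only for this example.
SNIPPET = """<flowlayout>
<tns:UploadFramingFlow xmlns:tns="http://flow.impl.ade.openmainframe.org/factory">tenMinutesTrain</tns:UploadFramingFlow>
<tns:TrainingIntervalFactor xmlns:tns="http://flow.impl.ade.openmainframe.org/factory">6</tns:TrainingIntervalFactor>
<tns:FramingFlow consecutive="true" duration="600000" name="tenMinutesTrain" databaseId="0" xmlns:tns="http://flow.impl.ade.openmainframe.org/factory"><tns:FramerClass>ConsecutiveTimeFramer</tns:FramerClass></tns:FramingFlow>
</flowlayout>"""

root = ET.fromstring(SNIPPET)

# UploadFramingFlow names the FramingFlow that defines the interval length.
flow_name = root.find("tns:UploadFramingFlow", NS).text
factor = int(root.find("tns:TrainingIntervalFactor", NS).text)
flow = root.find(f"tns:FramingFlow[@name='{flow_name}']", NS)

upload_minutes = int(flow.get("duration")) // 60_000  # duration is in milliseconds
training_minutes = upload_minutes * factor

print(flow_name, upload_minutes, training_minutes)
```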

The following specifies that analysis runs every ten minutes, using the current ten-minute interval together with the previous five intervals. For analysis and train, the AnalysisFramingFlow specifies how the logs (a stream of data) are split into intervals. The AnalysisFramingFlow value "oneHour" points to the FramingFlow description with that name:

<tns:AnalysisFramingFlow xmlns:tns="http://flow.impl.ade.openmainframe.org/factory">oneHour</tns:AnalysisFramingFlow>  
<tns:FramingFlow consecutive="true" duration="3600000" name="oneHour" databaseId="6" xmlns:tns="http://flow.impl.ade.openmainframe.org/factory"><tns:FramerClass>ContinuousTimeFramer</tns:FramerClass>
<tns:FramerProperty Key="Permanent_Split_Factor" Value="6" /><!--60 minutes will be split into 6 permanent XML outputs, which is 10 minutes per output-->
<tns:FramerProperty Key="Temporary_Split_Factor" Value="5" /><!--10 minutes will be split into 5 temporary XML outputs, which is 2 minutes per output-->
</tns:FramingFlow>
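
The split factors in the XML comments can be checked with simple arithmetic. The sketch below only restates the numbers from the "oneHour" FramingFlow. The function name `split_durations` is hypothetical and does not correspond to any ADE API.

```python
MS_PER_MINUTE = 60_000


def split_durations(duration_ms, permanent_factor, temporary_factor):
    """Return (permanent_ms, temporary_ms) output lengths for a framing flow."""
    permanent_ms = duration_ms // permanent_factor
    temporary_ms = permanent_ms // temporary_factor
    return permanent_ms, temporary_ms


# Values taken from the "oneHour" FramingFlow above.
permanent_ms, temporary_ms = split_durations(3_600_000, 6, 5)

print(permanent_ms // MS_PER_MINUTE)  # minutes per permanent output
print(temporary_ms // MS_PER_MINUTE)  # minutes per temporary output
```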

The default flowlayout values provided with ADE were chosen to provide reasonable results for a very wide range of Linux systems. They are based on the analysis of a significant number of production Linux systems.

Analysis snapshot

During analysis, ADE writes results every ten minutes (a snapshot). Each snapshot contains the current analysis interval. That is, every snapshot written to the file system covers the current ten minutes and the previous fifty minutes.
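
The relationship between snapshots and the analysis interval can be pictured as a sliding window. The sketch below assumes that each snapshot ends on a ten-minute boundary and covers the preceding 60 minutes. The function name and the millisecond timestamps are illustrative assumptions, and the sketch does not cover the start of a period.

```python
SNAPSHOT_MS = 600_000    # a snapshot is written every ten minutes
ANALYSIS_MS = 3_600_000  # each snapshot covers a 60-minute analysis interval


def slices_in_snapshot(snapshot_end_ms):
    """List the (start, end) ten-minute slices covered by one snapshot."""
    window_start = snapshot_end_ms - ANALYSIS_MS
    return [
        (start, start + SNAPSHOT_MS)
        for start in range(window_start, snapshot_end_ms, SNAPSHOT_MS)
    ]


# Two consecutive snapshots, ending 120 and 130 minutes into the period.
first = slices_in_snapshot(7_200_000)
second = slices_in_snapshot(7_800_000)

# Consecutive snapshots share the five most recent slices of the earlier one.
shared = sorted(set(first) & set(second))
```

In this model, each new snapshot drops the oldest ten-minute slice and adds the newest one.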

Database interval

The flowlayout.xml specifies that the outputters which update the database slice the data into ten minute intervals.