How does ADE organize information
ADE organizes the information for Linux logs into time slices and groups of similar systems.
ADE divides the log into days (periods) and then divides each period into time slices (intervals). It then organizes the data by those time slices and applies its statistical techniques to individual time slices or to combinations of consecutive time slices.
ADE divides the logs into periods, which by default are 24 hours long. ADE starts each period at midnight. Midnight can be either UTC or system time, depending on how the system is configured. The default configuration uses UTC.
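The period boundary described above can be sketched as follows. This is a minimal illustration, not ADE's own code: the function name period_bounds is hypothetical, and "system time" is approximated by the time zone attached to the timestamp.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a period is 24 hours long and starts at midnight, as described above.
PERIOD = timedelta(hours=24)


def period_bounds(ts: datetime, use_utc: bool = True) -> tuple[datetime, datetime]:
    """Return (start, end) of the period that contains the aware datetime ts.

    With use_utc=True (ADE's default) the period starts at UTC midnight;
    with use_utc=False it starts at midnight in ts's own time zone, which
    stands in here for "system time". DST transitions are ignored.
    """
    if use_utc:
        ts = ts.astimezone(timezone.utc)
    start = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return start, start + PERIOD
```

For example, a message logged at 01:30 on 1 May in a UTC+02:00 zone is 23:30 UTC on 30 April, so with the default UTC setting it falls in the period that began at midnight UTC on 30 April. With system time it falls in the period that began at local midnight on 1 May.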
ADE divides each period into time slices, which it calls intervals. There are two reasons why ADE divides the logs into time slices:
- The statistical techniques used by ADE require that messages written to Linux logs be grouped into time slices.
- ADE assumes that the order in which messages are written to a log is not deterministic. That is, message C from process 31 could arrive before or after message B from process 103, depending on how those processes were dispatched.
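Both reasons can be illustrated with a small sketch that groups messages into ten-minute time slices and keeps only per-slice counts of each message ID. Because counts do not record arrival order, swapping messages B and C within a slice does not change the result. This is an illustration of the idea, not ADE's implementation; the function slice_messages and the (timestamp, message ID) input format are hypothetical, and the ten-minute length is the default described later on this page.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Assumption: ten-minute intervals, the default described later on this page.
INTERVAL = timedelta(minutes=10)


def slice_messages(messages, period_start: datetime) -> dict[int, Counter]:
    """Group (timestamp, message_id) pairs into per-interval counts.

    Returns {interval_index: Counter(message_id -> count)}, where index 0 is
    the first interval of the period. Only counts are kept, so the order in
    which messages arrived within an interval does not affect the result.
    """
    slices: dict[int, Counter] = {}
    for ts, msg_id in messages:
        index = (ts - period_start) // INTERVAL  # timedelta // timedelta -> int
        slices.setdefault(index, Counter())[msg_id] += 1
    return slices
```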
There are three different intervals used by ADE:
- Analysis interval
- Analysis snapshot
- Database interval
When ADE calculates the anomaly score for an analysis interval, and the anomaly scores for the messages that occurred during that interval, it combines all of the messages from that time slice. The analysis interval used by ADE is specified in the flowlayout.xml file.
The flowlayout.xml settings for upload specify that upload updates the database every ten minutes. The TrainingIntervalFactor specifies that train uses six of these intervals (60 minutes).
The flowlayout.xml settings for analysis specify that analysis runs every ten minutes, using the current ten-minute interval together with the previous five intervals.
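The span of data combined by one analysis run can be computed with a short sketch. It assumes that intervals are aligned to ten-minute boundaries counted from the Unix epoch in UTC; that alignment and the function names interval_index and analysis_window are assumptions for illustration, not ADE's code. The ten-minute interval and the five previous intervals come from the text.

```python
from datetime import datetime, timedelta, timezone

# Values from the text; the alignment to the Unix epoch is an assumption.
INTERVAL = timedelta(minutes=10)
PREVIOUS_INTERVALS = 5
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def interval_index(ts: datetime) -> int:
    """Number of whole ten-minute intervals between the epoch and the aware datetime ts."""
    return (ts - EPOCH) // INTERVAL


def analysis_window(ts: datetime) -> tuple[datetime, datetime]:
    """Return (start, end) of the data combined by the analysis run for the
    interval containing ts: the current interval plus the PREVIOUS_INTERVALS
    intervals before it."""
    current = interval_index(ts)
    start = EPOCH + (current - PREVIOUS_INTERVALS) * INTERVAL
    end = EPOCH + (current + 1) * INTERVAL
    return start, end
```

For a message at 10:04 UTC, the current interval is 10:00 to 10:10, so the run covers 09:10 to 10:10. That is six ten-minute intervals, or 60 minutes in total.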
For analysis and train, the AnalysisFramingFlow specifies how the logs (a stream of data) are split into intervals. The AnalysisFramingFlow value tns "oneHour" points to the description of the FramingFlow:
<tns:FramerProperty Key="Permanent_Split_Factor" Value="6" /><!--60 minutes are split into 6 permanent XML outputs, which is 10 minutes per output-->
<tns:FramerProperty Key="Temporary_Split_Factor" Value="5" /><!--10 minutes are split into 5 temporary XML outputs, which is 2 minutes per output-->
</tns:FramingFlow>
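The two split factors nest. The frame is first divided by Permanent_Split_Factor, and each permanent output is then divided by Temporary_Split_Factor. The sketch below shows that arithmetic. It assumes that the "oneHour" flow frames 60 minutes, as the XML comments state. The function split_durations is hypothetical and does not read flowlayout.xml.

```python
from datetime import timedelta


def split_durations(frame: timedelta, permanent_split: int,
                    temporary_split: int) -> tuple[timedelta, timedelta]:
    """Return (permanent, temporary) output durations for a framing flow.

    The frame is divided into permanent_split permanent outputs, and each
    permanent output is divided again into temporary_split temporary outputs.
    """
    permanent = frame / permanent_split   # timedelta / int -> timedelta
    temporary = permanent / temporary_split
    return permanent, temporary


# With the values above: 60 min / 6 = 10 min, and 10 min / 5 = 2 min.
permanent, temporary = split_durations(timedelta(hours=1), 6, 5)
```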
The default flowlayout.xml values provided with ADE were chosen to give reasonable results for a very wide range of Linux systems. They are based on the analysis of a significant number of production Linux systems.
During analysis, ADE writes results every ten minutes, and each of these outputs is called a snapshot. A snapshot contains the current analysis interval. That is, every snapshot written to the file system contains the current ten minutes plus the previous fifty minutes.
The flowlayout.xml file specifies that the outputters that update the database slice the data into ten-minute intervals.