Log parsing and feature extraction were performed using the LogParser and Loglizer libraries. Principal component analysis (PCA) was used for dimensionality reduction. The models developed include logistic regression, k-nearest neighbors (KNN), quadratic discriminant analysis, random forests, and gradient boosting classification trees, all implemented with scikit-learn in Python. A tuned gradient boosting classifier was selected; it attains a weighted binary cross-entropy loss of 0.0049 and an F1 score of 0.9978 on a withheld 15% test set.
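The modeling steps described above can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: the data is synthetic (generated with `make_classification` as a stand-in for a parsed event count matrix), the scaling step, the number of PCA components, and the default classifier hyperparameters are all assumptions, and the "balanced" class weighting used for the loss is only one plausible reading of "weighted binary cross-entropy".

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score, log_loss
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.utils.class_weight import compute_sample_weight

# Synthetic, imbalanced stand-in for the extracted features (NOT the HDFS data).
X, y = make_classification(
    n_samples=2000, n_features=30, n_informative=10,
    weights=[0.95, 0.05], random_state=0,
)

# Hold out 15% of the samples as a test set, preserving the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=0,
)

# Scale, reduce dimensionality with PCA, then fit gradient boosting.
# n_components=10 and the default hyperparameters are assumed, not tuned.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=10, random_state=0),
    GradientBoostingClassifier(random_state=0),
)
model.fit(X_train, y_train)

proba = model.predict_proba(X_test)[:, 1]  # predicted P(anomaly)
pred = model.predict(X_test)

# Assumed weighting: each sample weighted inversely to its class frequency.
weights = compute_sample_weight("balanced", y_test)
weighted_loss = log_loss(y_test, proba, sample_weight=weights, labels=[0, 1])
f1 = f1_score(y_test, pred)

print(f"weighted log loss: {weighted_loss:.4f}, F1: {f1:.4f}")
```

Wrapping the scaler, PCA, and classifier in one pipeline keeps the transformations fitted on the training split only, so the held-out metrics are not contaminated by test data.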
ericphillips99/hdfs_log_anomaly_detection
Automated anomaly detection on HDFS (Hadoop Distributed File System) log files