Random Forest Model for Malware Image Classification

Overview

This project implements a Random Forest model that classifies malware images by family. The dataset contains images from 17 malware families plus a benign class, and the task is to assign each image to its correct class. Each image is preprocessed, flattened into a feature vector, and passed to a Random Forest classifier, which predicts the class label. The approach detects malware from pixel-level patterns in the images.

Results

Model Accuracy:

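The Random Forest classifier labels four out of five validation images correctly. This figure is dominated by the two largest classes, AgentTesla and Heodo, which together account for about 65% of the validation samples, so it should be read alongside the per-class metrics in the classification report.

Accuracy: 80% (calculated from the validation set of 25,001 images)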

Confusion Matrix:

The confusion matrix compares predicted labels with true labels for every class, showing how well the model separates the malware categories and which ones it confuses with one another.
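As a rough illustration of how such a matrix is built and read, the sketch below uses scikit-learn's `confusion_matrix` on a handful of made-up labels. The toy `y_true` and `y_pred` lists are assumptions for demonstration only and are not the project's predictions.

```python
from sklearn.metrics import confusion_matrix

# Toy labels for illustration only; these are NOT the project's results.
labels = ["AgentTesla", "Formbook", "Heodo"]
y_true = ["AgentTesla", "AgentTesla", "Formbook", "Formbook", "Heodo", "Heodo"]
y_pred = ["AgentTesla", "AgentTesla", "AgentTesla", "Formbook", "Heodo", "Heodo"]

# Rows are true classes and columns are predicted classes,
# both in the order given by `labels`.
cm = confusion_matrix(y_true, y_pred, labels=labels)

# Per-class recall is the diagonal (correct predictions)
# divided by the row sums (all samples of that true class).
recall = cm.diagonal() / cm.sum(axis=1)

print(cm)
print(dict(zip(labels, recall)))
```

Off-diagonal entries are the interesting part: in this toy case the single non-zero off-diagonal cell says one Formbook sample was predicted as AgentTesla.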

Classification Report:

The classification report provides detailed performance metrics such as precision, recall, and F1-score for each class, offering insight into the model's ability to identify malware categories.

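| Class Name     | Precision | Recall | F1-Score | Support |
|----------------|-----------|--------|----------|---------|
| AgentTesla     | 0.67      | 0.96   | 0.79     | 7797    |
| Benign         | 0.78      | 0.28   | 0.41     | 105     |
| CoinMinerXMRig | 1.00      | 0.48   | 0.65     | 27      |
| Danabot        | 0.77      | 0.84   | 0.81     | 212     |
| Dridex         | 0.97      | 0.94   | 0.96     | 324     |
| Formbook       | 0.70      | 0.35   | 0.47     | 3588    |
| Gh0stRAT       | 1.00      | 0.11   | 0.20     | 37      |
| Glupteba       | 0.88      | 0.61   | 0.72     | 62      |
| Gozi           | 1.00      | 0.84   | 0.91     | 358     |
| Heodo          | 0.99      | 0.99   | 0.99     | 8392    |
| NanoCore       | 0.89      | 0.20   | 0.33     | 990     |
| Quakbot        | 1.00      | 0.99   | 0.99     | 734     |
| RecordBreaker  | 0.05      | 0.06   | 0.05     | 213     |
| RedLineStealer | 0.07      | 0.06   | 0.06     | 241     |
| Remcos         | 0.83      | 0.34   | 0.48     | 980     |
| Tinba          | 1.00      | 0.96   | 0.98     | 27      |
| Trickbot       | 1.00      | 0.91   | 0.95     | 832     |
| Zeus           | 1.00      | 0.20   | 0.33     | 82      |
| Accuracy       |           |        | 0.80     | 25001   |
| Macro avg      | 0.81      | 0.56   | 0.62     | 25001   |
| Weighted avg   | 0.82      | 0.80   | 0.78     | 25001   |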
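The gap between the macro and weighted averages comes from class imbalance. The sketch below recomputes both recall averages from the per-class values in the report above, using plain Python. Because the inputs are already rounded to two decimals, the results only approximately reproduce the summary rows; the helper `f1` is a name introduced here for illustration.

```python
# (precision, recall, support) per class, copied from the report above.
rows = {
    "AgentTesla": (0.67, 0.96, 7797), "Benign": (0.78, 0.28, 105),
    "CoinMinerXMRig": (1.00, 0.48, 27), "Danabot": (0.77, 0.84, 212),
    "Dridex": (0.97, 0.94, 324), "Formbook": (0.70, 0.35, 3588),
    "Gh0stRAT": (1.00, 0.11, 37), "Glupteba": (0.88, 0.61, 62),
    "Gozi": (1.00, 0.84, 358), "Heodo": (0.99, 0.99, 8392),
    "NanoCore": (0.89, 0.20, 990), "Quakbot": (1.00, 0.99, 734),
    "RecordBreaker": (0.05, 0.06, 213), "RedLineStealer": (0.07, 0.06, 241),
    "Remcos": (0.83, 0.34, 980), "Tinba": (1.00, 0.96, 27),
    "Trickbot": (1.00, 0.91, 832), "Zeus": (1.00, 0.20, 82),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


total = sum(support for _, _, support in rows.values())

# Macro average: every class counts equally, regardless of size.
macro_recall = sum(r for _, r, _ in rows.values()) / len(rows)

# Weighted average: each class counts in proportion to its support.
# For recall, this equals overall accuracy.
weighted_recall = sum(r * s for _, r, s in rows.values()) / total

print(f"samples: {total}")
print(f"macro recall: {macro_recall:.2f}")
print(f"weighted recall: {weighted_recall:.2f}")
print(f"AgentTesla F1: {f1(0.67, 0.96):.2f}")
```

The macro recall is pulled down by small classes with low recall (for example Gh0stRAT and Zeus), while the weighted recall is dominated by AgentTesla and Heodo.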

Features

  • Image Preprocessing: The images are resized to a standard dimension of 64x64 pixels and normalized (values scaled between 0 and 1).
  • Model: The Random Forest model is used for classification, with 100 estimators (trees).
  • Feature Extraction: The image data is reshaped into a flat vector before being fed into the model.
  • Performance Metrics: The model is evaluated using accuracy, a confusion matrix, and a classification report.
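The steps listed above can be sketched end to end as follows. This is a minimal sketch under stated assumptions, not the project's actual code: the images are synthetic grayscale arrays that are already 64x64 (the real loading and resizing step is not shown), and the two-class setup, the 80/20 split, and the `random_state` values are illustrative choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the dataset (assumption): 200 grayscale images,
# already 64x64, in two classes. Class 1 is made brighter so that there
# is a learnable pattern.
labels = rng.integers(0, 2, size=200)
images = rng.integers(0, 256, size=(200, 64, 64), dtype=np.uint8)
images[labels == 1] = images[labels == 1] // 2 + 128

# Normalize pixel values to the range [0, 1].
X = images.astype(np.float32) / 255.0

# Flatten each 64x64 image into a 4096-element feature vector.
X = X.reshape(len(X), -1)

# Hold out a validation set (the 80/20 ratio is an assumption).
X_train, X_val, y_train, y_val = train_test_split(
    X, labels, test_size=0.2, random_state=0, stratify=labels
)

# Random Forest with 100 trees, as described in the feature list.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

val_accuracy = accuracy_score(y_val, clf.predict(X_val))
print(f"feature matrix shape: {X.shape}")
print(f"validation accuracy: {val_accuracy:.2f}")
```

Flattening discards the spatial layout of the pixels, so the forest treats each pixel position as an independent feature. That keeps the pipeline simple, at the cost of ignoring local structure in the image.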

Sprints

  • Sprint 1 - Data Preprocessing: Loaded and resized the images, normalized pixel values, and split the dataset into training and validation sets.
  • Sprint 2 - Model Training: Trained the Random Forest classifier on the preprocessed data.
  • Sprint 3 - Model Evaluation: Evaluated model performance using accuracy, confusion matrix, and classification report.

Conclusion

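This project applies a Random Forest classifier to malware image classification and reaches 80% accuracy on the validation set. Performance varies widely by class: families such as Heodo, Quakbot, Tinba, Dridex, and Trickbot are classified with F1-scores of 0.95 or higher, while RecordBreaker and RedLineStealer are barely recognized (F1-scores of 0.05 and 0.06), and several classes, including Gh0stRAT, NanoCore, and Zeus, combine high precision with low recall. Future work could involve experimenting with other models (e.g., Convolutional Neural Networks) to improve classification performance, particularly on the weaker classes.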