leovidith/Malware-Images-Classification

Random Forest Model for Malware Image Classification

Overview

This project trains a Random Forest classifier on malware images. The dataset contains images from 17 malware families plus a Benign class, and the task is to assign each image to its class. Images are resized, normalized, and flattened into pixel vectors, which the Random Forest uses to predict the class label.

Results

Model Accuracy:

The Random Forest classifier reaches 80% accuracy on the validation set (25,001 images across 18 classes).

Confusion Matrix:

The project's confusion matrix plot compares predicted and true labels across all classes, showing how well the model separates the malware categories and which ones it confuses with one another.
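
As a rough illustration of how such a matrix is tallied, here is a minimal sketch using only the Python standard library. The labels below are made up for the example and are not taken from the project's validation set; the class names are borrowed from the report only to keep the example familiar. Rows are true classes and columns are predicted classes.

```python
from collections import Counter

# Hypothetical labels for illustration only (not the project's data).
classes = ["AgentTesla", "Formbook", "Heodo"]
y_true = ["AgentTesla", "AgentTesla", "Formbook", "Formbook", "Formbook", "Heodo", "Heodo"]
y_pred = ["AgentTesla", "AgentTesla", "AgentTesla", "Formbook", "AgentTesla", "Heodo", "Heodo"]

# Count each (true, predicted) pair, then lay the counts out as a square matrix.
pair_counts = Counter(zip(y_true, y_pred))
matrix = [[pair_counts[(t, p)] for p in classes] for t in classes]

for name, row in zip(classes, matrix):
    print(f"{name:<12}", row)

# Per-class recall is the diagonal entry divided by the row total.
recall = {name: matrix[i][i] / sum(matrix[i]) for i, name in enumerate(classes)}

# Per-class precision is the diagonal entry divided by the column total.
precision = {
    name: matrix[i][i] / sum(row[i] for row in matrix)
    for i, name in enumerate(classes)
}
```

In this toy case two of the three Formbook samples are predicted as AgentTesla, so Formbook recall drops to 1/3 and AgentTesla precision drops to 2/4, even though every AgentTesla sample is found.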

Classification Report:

The classification report provides detailed performance metrics such as precision, recall, and F1-score for each class, offering insight into the model's ability to identify malware categories.

Class Name        Precision  Recall  F1-Score  Support
AgentTesla        0.67       0.96    0.79      7797
Benign            0.78       0.28    0.41      105
CoinMinerXMRig    1.00       0.48    0.65      27
Danabot           0.77       0.84    0.81      212
Dridex            0.97       0.94    0.96      324
Formbook          0.70       0.35    0.47      3588
Gh0stRAT          1.00       0.11    0.20      37
Glupteba          0.88       0.61    0.72      62
Gozi              1.00       0.84    0.91      358
Heodo             0.99       0.99    0.99      8392
NanoCore          0.89       0.20    0.33      990
Quakbot           1.00       0.99    0.99      734
RecordBreaker     0.05       0.06    0.05      213
RedLineStealer    0.07       0.06    0.06      241
Remcos            0.83       0.34    0.48      980
Tinba             1.00       0.96    0.98      27
Trickbot          1.00       0.91    0.95      832
Zeus              1.00       0.20    0.33      82
Accuracy                             0.80      25001
Macro avg         0.81       0.56    0.62      25001
Weighted avg      0.82       0.80    0.78      25001
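
The gap between the macro and weighted averages comes from class imbalance: the macro average treats every class equally, while the weighted average scales each class by its support. The short sketch below recomputes both recall averages from the recall and support columns of the report. The variable names are chosen for the example; only the numbers come from the table.

```python
# (class, recall, support) copied from the classification report.
rows = [
    ("AgentTesla", 0.96, 7797), ("Benign", 0.28, 105),
    ("CoinMinerXMRig", 0.48, 27), ("Danabot", 0.84, 212),
    ("Dridex", 0.94, 324), ("Formbook", 0.35, 3588),
    ("Gh0stRAT", 0.11, 37), ("Glupteba", 0.61, 62),
    ("Gozi", 0.84, 358), ("Heodo", 0.99, 8392),
    ("NanoCore", 0.20, 990), ("Quakbot", 0.99, 734),
    ("RecordBreaker", 0.06, 213), ("RedLineStealer", 0.06, 241),
    ("Remcos", 0.34, 980), ("Tinba", 0.96, 27),
    ("Trickbot", 0.91, 832), ("Zeus", 0.20, 82),
]

total_support = sum(support for _, _, support in rows)

# Macro average: unweighted mean over classes.
macro_recall = sum(recall for _, recall, _ in rows) / len(rows)

# Weighted average: each class contributes in proportion to its support.
weighted_recall = sum(recall * support for _, recall, support in rows) / total_support

print(f"total support:   {total_support}")
print(f"macro recall:    {macro_recall:.2f}")
print(f"weighted recall: {weighted_recall:.2f}")
```

This reproduces the 0.56 macro and 0.80 weighted recall in the report. Support-weighted recall is the same quantity as overall accuracy, which is why it matches the 0.80 accuracy row. The two largest classes, Heodo and AgentTesla, account for roughly 65% of the validation images, so they dominate the weighted figures.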

Features

  • Image Preprocessing: The images are resized to a standard dimension of 64x64 pixels and normalized (values scaled between 0 and 1).
  • Model: The Random Forest model is used for classification, with 100 estimators (trees).
  • Feature Extraction: The image data is reshaped into a flat vector before being fed into the model.
  • Performance Metrics: The model is evaluated using accuracy, a confusion matrix, and a classification report.
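
The steps listed above can be sketched roughly as follows. This is not the repository's code: it assumes NumPy and scikit-learn are installed, and it substitutes synthetic grayscale arrays for images that have already been resized to 64x64. The 80/20 stratified split, the random seeds, and all variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for resized images: 200 grayscale 64x64 arrays, two classes.
# Class 1 is shifted brighter so the toy problem is trivially learnable.
n_samples = 200
labels = rng.integers(0, 2, size=n_samples)
pixels = rng.integers(0, 100, size=(n_samples, 64, 64)) + 128 * labels[:, None, None]
images = pixels.astype(np.uint8)  # values stay within 0..227

# Normalize pixel values to [0, 1], then flatten each image to a 4096-long vector.
X = images.astype(np.float32) / 255.0
X = X.reshape(n_samples, -1)

# Hold out a validation set (assumed 80/20 split, stratified by class).
X_train, X_val, y_train, y_val = train_test_split(
    X, labels, test_size=0.2, random_state=0, stratify=labels
)

# Random Forest with 100 trees, as stated in the feature list.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

val_accuracy = accuracy_score(y_val, clf.predict(X_val))
print(f"validation accuracy: {val_accuracy:.2f}")
```

Flattening discards the 2-D layout of the image, so each tree only sees individual pixel intensities. With real data, the synthetic arrays would be replaced by the loaded and resized malware images and their family labels.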

Sprints

  • Sprint 1 - Data Preprocessing: Loaded and resized the images, normalized pixel values, and split the dataset into training and validation sets.
  • Sprint 2 - Model Training: Trained the Random Forest classifier on the preprocessed data.
  • Sprint 3 - Model Evaluation: Evaluated model performance using accuracy, confusion matrix, and classification report.

Conclusion

This project applies a Random Forest classifier to malware image classification and reaches 80% accuracy on the validation set. Performance is uneven across classes: Heodo, Quakbot, Tinba, and Dridex score an F1 of 0.96 or higher, while RecordBreaker and RedLineStealer score below 0.10, and several small classes (e.g., Gh0stRAT, Zeus) combine high precision with low recall. Future work could involve experimenting with other models (e.g., Convolutional Neural Networks) to further improve classification performance.
