This project evaluates the impact of obfuscation techniques on classification models using a malware dataset, aiming to quantify the resilience of these models against evolving malware threats. The findings provide insights to inform the development of more robust, obfuscation-resilient approaches to cybersecurity.
Reference: Aryal, K., Gupta, M., & Abdelsalam, M. (2021). A survey on adversarial attacks for malware analysis. arXiv preprint arXiv:2111.08223.
The script performs the following key tasks:
- Data Loading and Preprocessing:
  - Loads category and metadata datasets from CSV files.
  - Applies various levels of obfuscation to the data.
  - Implements a deobfuscation step to compare obfuscated and deobfuscated data.
- Model Training:
  - Constructs a neural network model using TensorFlow/Keras.
  - Utilizes a custom `KerasClassifierWrapper` for compatibility with scikit-learn (see the sketch after this list).
  - Optimizes model hyperparameters through grid search.
- Evaluation and Visualization:
  - Performs cross-validation using `cross_validate_model`.
  - Evaluates the model's performance on test sets for each obfuscation level and data type (obfuscated/deobfuscated).
  - Calculates comprehensive metrics: accuracy, ROC AUC, precision, recall, F1 score.
  - Plots training and validation accuracy to visualize the effects of obfuscation on model performance.
  - Conducts statistical significance testing to compare obfuscated and deobfuscated results.
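A minimal sketch of what such a wrapper can look like is shown below. It is not the script's exact implementation: the constructor arguments (`hidden_units`, `learning_rate`, `epochs`, `batch_size`) and the two-layer architecture are illustrative assumptions.

```python
# A minimal sketch of a scikit-learn-compatible Keras wrapper.
# Assumptions (not from the original script): the hyperparameter names
# and the two-layer architecture are illustrative only.
import numpy as np
import tensorflow as tf
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.model_selection import GridSearchCV


class KerasClassifierWrapper(BaseEstimator, ClassifierMixin):
    def __init__(self, hidden_units=64, learning_rate=1e-3,
                 epochs=10, batch_size=32):
        # Store constructor args under matching names so BaseEstimator's
        # get_params()/set_params() work, which GridSearchCV relies on.
        self.hidden_units = hidden_units
        self.learning_rate = learning_rate
        self.epochs = epochs
        self.batch_size = batch_size

    def _build(self, n_features):
        model = tf.keras.Sequential([
            tf.keras.Input(shape=(n_features,)),
            tf.keras.layers.Dense(self.hidden_units, activation="relu"),
            tf.keras.layers.Dense(1, activation="sigmoid"),
        ])
        model.compile(optimizer=tf.keras.optimizers.Adam(self.learning_rate),
                      loss="binary_crossentropy", metrics=["accuracy"])
        return model

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.model_ = self._build(X.shape[1])
        self.model_.fit(X, y, epochs=self.epochs,
                        batch_size=self.batch_size, verbose=0)
        return self

    def predict_proba(self, X):
        p = self.model_.predict(X, verbose=0).ravel()
        return np.column_stack([1.0 - p, p])

    def predict(self, X):
        return (self.predict_proba(X)[:, 1] >= 0.5).astype(int)


# Grid search over illustrative hyperparameter values.
grid = GridSearchCV(KerasClassifierWrapper(),
                    param_grid={"hidden_units": [32, 64],
                                "learning_rate": [1e-3, 1e-4]},
                    cv=3, scoring="accuracy")
```

Because the wrapper subclasses `BaseEstimator`, scikit-learn utilities such as `GridSearchCV` and `cross_validate` can clone it and search over its constructor arguments, which is what makes the grid search and the `cross_validate_model` step possible.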
BODMAS dataset:
Yang, L., Ciptadi, A., Laziuk, I., Ahmadzadeh, A., & Wang, G. (2021). BODMAS: An Open Dataset for Learning based Temporal Analysis of PE Malware. 4th Deep Learning and Security Workshop.
- `bodmas_malware_category.csv`: Contains malware samples with categories.
- `bodmas_metadata.csv`: Contains metadata associated with malware samples.
BODMAS is a dataset released in collaboration with Blue Hexagon, containing 57,293 malware samples and 77,142 benign samples collected between August 2019 and September 2020, with detailed family information for research purposes.
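A minimal loading-and-merging sketch for the two CSV files is shown below. The join key (`sha`) and the exact column layout are assumptions; check the actual CSV headers before merging.

```python
# Sketch: load the BODMAS category and metadata tables and join them.
# The join key "sha" is an assumption about the CSV headers.
import pandas as pd

categories = pd.read_csv("bodmas_malware_category.csv")
metadata = pd.read_csv("bodmas_metadata.csv")

# Inner-join so only samples present in both tables are kept.
df = metadata.merge(categories, on="sha", how="inner")
print(df.shape, df.columns.tolist())
```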
- Implements feature extraction methods.
- Applies obfuscation techniques (noise addition, random characters, feature shuffling); see the sketch after this list.
- Includes a deobfuscation step for comparison.
- Uses a neural network with customizable layers and learning rates.
- Employs grid search to find the best hyperparameters.
- Utilizes `KerasClassifierWrapper` for scikit-learn compatibility.
- Performs cross-validation for robust performance evaluation.
- Calculates comprehensive metrics (accuracy, ROC AUC, precision, recall, F1 score).
- Visualizes training and validation accuracy across different obfuscation levels for both obfuscated and deobfuscated data.
- Conducts statistical significance testing to compare results.
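The three obfuscation techniques listed above can be sketched as follows for a numeric feature matrix `X`, plus a string field for the random-character case. The per-level scales in `NOISE_SCALE` are illustrative assumptions, not the script's actual values.

```python
# Hedged sketch of the three obfuscation techniques; the level-to-scale
# mapping below is an assumption, not the script's actual configuration.
import numpy as np

rng = np.random.default_rng(42)
NOISE_SCALE = {"none": 0.0, "low": 0.01, "medium": 0.05, "high": 0.1}


def add_noise(X, level):
    """Gaussian noise addition, scaled by obfuscation level."""
    return X + rng.normal(0.0, NOISE_SCALE[level], size=X.shape)


def shuffle_features(X, level):
    """Shuffle a level-dependent fraction of feature columns."""
    n_shuffle = int(X.shape[1] * NOISE_SCALE[level] * 10)
    cols = rng.choice(X.shape[1], size=n_shuffle, replace=False)
    X = X.copy()
    for c in cols:
        rng.shuffle(X[:, c])  # permute values within the column
    return X


def inject_random_chars(name, level):
    """Insert random lowercase characters into a string field."""
    n_insert = int(len(name) * NOISE_SCALE[level] * 10)
    chars = list(name)
    for _ in range(n_insert):
        pos = rng.integers(0, len(chars) + 1)
        chars.insert(pos, chr(rng.integers(97, 123)))  # a-z
    return "".join(chars)
```

In the script, a corresponding deobfuscation step reverses these transformations where possible so the two conditions can be compared.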
Statistical significance testing was conducted to compare the model's performance on the obfuscated and deobfuscated datasets. The test produced an F-statistic of 1.588 with a p-value of 0.254. Since the p-value is well above the conventional 0.05 threshold, the performance differences between the obfuscated and deobfuscated datasets are not statistically significant, indicating that obfuscation did not meaningfully impair the model's ability to classify the malware.
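The section reports only the F-statistic and p-value; assuming the comparison is a one-way ANOVA over per-run scores, it can be reproduced with `scipy.stats.f_oneway`, as sketched below. The score arrays are placeholders, not the actual fold results.

```python
# Sketch: one-way F-test (ANOVA) over per-fold accuracy scores.
# The score lists below are placeholders for the real fold results.
from scipy.stats import f_oneway

obfuscated_scores = [0.9999, 0.9999, 0.9999, 0.9999]
deobfuscated_scores = [0.9995, 0.9999, 0.9998, 0.9999]

f_stat, p_value = f_oneway(obfuscated_scores, deobfuscated_scores)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")
# A p-value above 0.05 means we fail to reject the null hypothesis
# that the two conditions perform the same.
```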
The implementation provides:
- Quantification of obfuscation impact on malware classification accuracy.
- Comparison between obfuscated and deobfuscated data performance.
- Optimized model hyperparameters through grid search.
- Insights into model robustness across varying obfuscation levels.
- Statistical analysis of the differences between obfuscated and deobfuscated results.
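The metrics reported in the table below map directly onto standard scikit-learn functions; a minimal sketch, assuming `y_true`, `y_pred`, and `y_score` (predicted probabilities) come from the evaluation loop:

```python
# Sketch: compute the five reported metrics with scikit-learn.
from sklearn.metrics import (accuracy_score, roc_auc_score,
                             precision_score, recall_score, f1_score)


def compute_metrics(y_true, y_pred, y_score):
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "roc_auc": roc_auc_score(y_true, y_score),  # needs probabilities
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
    }
```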
The table below summarizes the performance of the model across different levels of obfuscation and deobfuscation:
| Obfuscation Level | Data Type | Accuracy | ROC AUC | Precision | Recall | F1 Score |
|---|---|---|---|---|---|---|
| none | obfuscated | 0.9999 | 1.0000 | 0.9997 | 1.0000 | 0.9998 |
| none | deobfuscated | 0.9995 | 0.9999 | 0.9982 | 1.0000 | 0.9991 |
| low | obfuscated | 0.9999 | 0.9999 | 0.9997 | 1.0000 | 0.9998 |
| low | deobfuscated | 0.9999 | 0.9999 | 0.9997 | 1.0000 | 0.9998 |
| medium | obfuscated | 0.9999 | 0.9999 | 0.9997 | 1.0000 | 0.9998 |
| medium | deobfuscated | 0.9998 | 0.9999 | 0.9994 | 1.0000 | 0.9997 |
| high | obfuscated | 0.9999 | 0.9999 | 0.9997 | 1.0000 | 0.9998 |
| high | deobfuscated | 0.9999 | 0.9999 | 0.9997 | 1.0000 | 0.9998 |
These results demonstrate that the model performs consistently well across all obfuscation levels. The minor variations in accuracy, precision, recall, and F1 score indicate that the model is highly robust, suggesting that the obfuscation techniques used do not significantly hinder its ability to classify malware accurately.
This project and its associated code are provided for educational and research purposes only. The effectiveness of the techniques demonstrated may vary depending on the specific characteristics of the data and the implementation details. Use of the code in production environments or for critical applications should be approached with caution and proper validation. The author is not responsible for any issues or damages that may arise from the use of this code.
Copyright 2024 Eric Yocam
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.