AndMal-Detect

Android Malware Detection using Function Call Graphs and Graph Convolutional Networks

What?

This is research work that I (Vinayaka K V) carried out during my MTech (Research) degree in the Department of IT, NITK.

The objectives of the research were:

To evaluate whether Graph Convolutional Networks (GCNs) are effective at detecting Android malware using Function Call Graphs (FCGs), and which GCN algorithm is best suited to this task.
To enhance the FCGs by incorporating the callback information obtained from the framework code, and to evaluate the enhanced FCGs against the normal FCGs.

Code organization

The code for the first objective is on the master (current) branch, while the code for the second objective is on the experiment branch.

Methodology

Datasets

Stored in the /data folder. Currently, it contains the SHA256 hashes of the APKs in the training and testing splits.
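Because the splits are identified by SHA256, a downloaded APK can be checked against them before processing. The snippet below is a minimal sketch and not part of the repository: the helper names are hypothetical, and it assumes a split file that lists one hex digest per line, which may not match the actual layout under /data.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_hashes(split_file):
    """Assumed format: one hex digest per line; blank lines are ignored."""
    lines = Path(split_file).read_text().splitlines()
    return {line.strip().lower() for line in lines if line.strip()}


def is_in_split(apk_path, known_hashes):
    """True if the APK's digest appears in the given set of digests."""
    return sha256_of_file(apk_path) in known_hashes
```

Reading in chunks keeps memory use flat even for large APKs.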

APK Size Balancer

Computes the histogram of APK sizes and adds APKs wherever there is a large imbalance in the number of APKs between the classes.

Note: The provided dataset is already APK Size balanced 🥳
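The size-balancing idea can be illustrated with a small sketch that bins APK sizes per class and flags the bins where one class dominates. This is not the repository's implementation: the function names, the fixed bin width, and the max_ratio threshold are assumptions chosen for illustration.

```python
from collections import Counter


def size_histogram(sizes, bin_width):
    """Count APKs per size bin; bin k covers [k * bin_width, (k + 1) * bin_width)."""
    return Counter(size // bin_width for size in sizes)


def imbalanced_bins(benign_sizes, malware_sizes, bin_width, max_ratio=2.0):
    """Return {bin: (benign_count, malware_count)} for bins where the larger
    class exceeds max_ratio times the smaller one (hypothetical threshold).
    The smaller count is floored at 1 so empty bins do not divide by zero."""
    benign = size_histogram(benign_sizes, bin_width)
    malware = size_histogram(malware_sizes, bin_width)
    flagged = {}
    for bin_index in sorted(set(benign) | set(malware)):
        b, m = benign[bin_index], malware[bin_index]
        if max(b, m) > max_ratio * max(min(b, m), 1):
            flagged[bin_index] = (b, m)
    return flagged
```

The flagged bins are the size ranges where more APKs of the under-represented class would need to be added.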

FCG Extractor

Implemented in scripts/process_dataset.py.

The class FeatureExtractors provides two public methods:

get_user_features() - Returns a 15-bit feature vector for internal methods
get_api_features() - Returns a one-hot feature vector for external methods

The method process extracts the FCG and assigns node features.
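To make the one-hot encoding of external methods concrete, here is a minimal sketch. It is not the code from scripts/process_dataset.py: build_api_index and one_hot are hypothetical helpers, and the API names in the usage note are only examples. The 15-bit vector for internal methods is not reproduced, since its bit layout is defined in the repository.

```python
def build_api_index(api_names):
    """Map each known API name to a fixed position (sorted for determinism)."""
    return {name: i for i, name in enumerate(sorted(set(api_names)))}


def one_hot(api_name, api_index):
    """Return a one-hot list for api_name; all zeros if the API is unknown."""
    vector = [0] * len(api_index)
    position = api_index.get(api_name)
    if position is not None:
        vector[position] = 1
    return vector
```

For example, with an index built from "Landroid/util/Log;->d" and "Ljava/lang/String;->length", the second name encodes to [0, 1], and a name outside the index encodes to [0, 0].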

Node Count Balancer

Balances the dataset so that the node count distribution of the APKs between the classes is exactly the same.

Implemented in scripts/split_dataset.py.

Note: The provided dataset is already node-count balanced to ensure reproducibility 🤩
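One way to obtain identical node-count distributions is to downsample both classes so that every node count appears equally often in each. The sketch below shows that idea only; it is an assumption about the approach, not the logic of scripts/split_dataset.py, and balance_by_node_count is a hypothetical name.

```python
import random
from collections import defaultdict


def balance_by_node_count(benign, malware, seed=0):
    """benign/malware: dicts mapping a sample id to its FCG node count.
    Returns two id lists whose node-count distributions are identical."""
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible

    def group(samples):
        groups = defaultdict(list)
        for sample_id, node_count in samples.items():
            groups[node_count].append(sample_id)
        return groups

    benign_groups, malware_groups = group(benign), group(malware)
    kept_benign, kept_malware = [], []
    # Only node counts present in both classes can be matched.
    for node_count in sorted(set(benign_groups) & set(malware_groups)):
        keep = min(len(benign_groups[node_count]), len(malware_groups[node_count]))
        kept_benign += rng.sample(sorted(benign_groups[node_count]), keep)
        kept_malware += rng.sample(sorted(malware_groups[node_count]), keep)
    return kept_benign, kept_malware
```

Samples whose node count has no counterpart in the other class are dropped, which is the price of an exact match.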

GCN Classifier

A multi-layer GCN with a dense layer at the end.

Implemented in core/model.py
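The architecture can be illustrated with a dependency-free forward pass. This is only a sketch of the computation and not the model in core/model.py, which is built on dgl: the symmetric normalization with self-loops, the mean readout, the sigmoid output, and the use of an undirected adjacency matrix are all assumptions made for the example.

```python
import math


def gcn_layer(adjacency, features, weight):
    """One graph convolution, ReLU(D^-1/2 (A + I) D^-1/2 H W), on plain lists."""
    n = len(adjacency)
    # Add self-loops so every node keeps its own features.
    a_hat = [[adjacency[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    in_dim, out_dim = len(features[0]), len(weight[0])
    output = []
    for i in range(n):
        # Aggregate the normalized features of node i and its neighbours.
        agg = [0.0] * in_dim
        for j in range(n):
            if a_hat[i][j]:
                coeff = a_hat[i][j] * inv_sqrt_deg[i] * inv_sqrt_deg[j]
                for k in range(in_dim):
                    agg[k] += coeff * features[j][k]
        # Linear transform followed by ReLU.
        output.append([
            max(0.0, sum(agg[k] * weight[k][o] for k in range(in_dim)))
            for o in range(out_dim)
        ])
    return output


def classify(adjacency, features, conv_weights, dense_weight, dense_bias):
    """Stack graph convolutions, average node embeddings, apply a dense layer."""
    hidden = features
    for weight in conv_weights:
        hidden = gcn_layer(adjacency, hidden, weight)
    n = len(hidden)
    graph_vector = [sum(row[k] for row in hidden) / n for k in range(len(hidden[0]))]
    logit = sum(g * w for g, w in zip(graph_vector, dense_weight)) + dense_bias
    return 1.0 / (1.0 + math.exp(-logit))  # assumed: probability of the malware class
```

Passing an empty conv_weights list reduces the model to a dense layer over the averaged input features.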

The Execution Pipeline

Obtain the APKs with the given SHA256 hashes from AndroZoo
Build the container (either singularity or docker), and get into its shell

Run scripts/process_dataset.py on the downloaded dataset

 # --override: overwrite existing processed files (optional)
 # --dry: perform a dry run (optional)
 python process_dataset.py \
     --source-dir <source_directory> \
     --dest-dir <dest_directory> \
     --override \
     --dry

Train the model! For configuration, refer to the section below.

python train_model.py

Configuration

Configuration is handled by Hydra. See config/conf.yaml for the available configuration options.

Any configuration option can be overridden in the command line. As an example, to change the number of convolution layers to 2, invoke the program as

python train_model.py model.convolution_count=2

You can also perform a sweep, for example,

    python train_model.py --multirun \
        model.convolution_count=0,1,2,3 \
        model.convolution_algorithm=GraphConv,SAGEConv,TAGConv,SGConv,DotGatConv \
        features=degree,method_attributes,method_summary

to train the model in all possible configurations! 🥳
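To get a feel for the size of such a sweep, the combinations can be enumerated with a plain Cartesian product. The sketch below only mirrors the kind of grid a sweep covers using the values listed above; it does not call Hydra, and expand is a hypothetical helper.

```python
from itertools import product

sweep = {
    "model.convolution_count": [0, 1, 2, 3],
    "model.convolution_algorithm": ["GraphConv", "SAGEConv", "TAGConv", "SGConv", "DotGatConv"],
    "features": ["degree", "method_attributes", "method_summary"],
}


def expand(sweep):
    """Return one dict of overrides per combination of swept values."""
    keys = list(sweep)
    return [dict(zip(keys, values)) for values in product(*(sweep[k] for k in keys))]


runs = expand(sweep)
print(len(runs))  # 4 * 5 * 3 = 60 training runs
```

Each added option multiplies the number of runs, so it is worth checking the total before launching a sweep.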

Stack

androguard - for FCG extraction and feature assignment
pytorch - for neural networks
dgl - for GCN modules
pytorch-lightning - for organization and pipeline 💖
hydra - for configuring experiments
wandb - for tracking experiments 🔥

Cite as

The research paper corresponding to this work is available at IEEE Xplore. If you find this work helpful and use it, please cite it as

    @INPROCEEDINGS{9478141,
            author={V, Vinayaka K and D, Jaidhar C},
            booktitle={2021 2nd International Conference on Secure Cyber Computing and Communications (ICSCCC)},
            title={Android Malware Detection using Function Call Graph with Graph Convolutional Networks},
            year={2021},
            volume={},
            number={},
            pages={279-287},
            doi={10.1109/ICSCCC51823.2021.9478141}
    }
