carolhanna01/binary2name

Binary2Name

Automatic detection for binary code functionality

This project was developed by Carol Hanna and Abdallah Yassin as part of the Project in Computer Security course at Technion. Project Advisor: Dr. Gabi Nakibly.

Introduction:

The main motivation for this project is to provide a helpful tool for researchers of binary code. We started with binary datasets as input and used Angr, a symbolic analysis tool, to obtain an intermediate representation of the code. The most extensive step in the project followed: preprocessing the intermediate code so that it can be used as input to a neural network. We used a deep neural network adopted from code2seq, which pursues the same goal but takes source code as input instead of binaries.

We suggest reading our report about this project here before starting to run the code.

Getting started:

Requirements:

-   python3
-   rouge package, version 0.3.2 (pip install rouge==0.3.2)
-   TensorFlow, version 1.13

Full preprocessing and training:

Extract our datasets:

cd our_dataset/

tar -xzf <dataset_name>.tar.gz
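For setups where the `tar` command is not convenient, the same extraction step can be sketched with Python's standard `tarfile` module. This is a minimal sketch, not part of the repository: the function name `extract_dataset` and the path check are assumptions added for illustration.

```python
import tarfile
from pathlib import Path


def extract_dataset(archive_path, dest_dir="."):
    """Extract a .tar.gz archive into dest_dir and return the member names.

    Hypothetical helper, equivalent in spirit to `tar -xzf <archive>`.
    """
    dest = Path(dest_dir).resolve()
    with tarfile.open(str(archive_path), "r:gz") as archive:
        members = archive.getmembers()
        for member in members:
            # Refuse entries that would be written outside dest_dir.
            target = (dest / member.name).resolve()
            if dest != target and dest not in target.parents:
                raise ValueError("unsafe path in archive: " + member.name)
        archive.extractall(str(dest))
    return [member.name for member in members]
```

Calling `extract_dataset("our_dataset/<dataset_name>.tar.gz", "our_dataset")` with the placeholder replaced by a real archive name would unpack the dataset next to its archive.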

Preprocessing:

We provide more than one model for preprocessing the data (the <model_name>_main.py files). First, edit the run_exps.sh file to run the desired model (the default is path with constraints).

run_exps.sh <dataset name: coreutils_ds|dpdk_linux_ds|gnu_dataset>

code2seq training:

cd code2seq

./train.sh

Get the best results quickly - TBD:

We have uploaded our best models, together with the preprocessed data. To run them automatically, use the following commands:

cd code2seq

continue_best_model.sh --dataset=<coreutils|coreutils_dpdk>
