Useful malware features #5
A very basic implementation of the features listed above is complete.
Hi So-Cool, I read your blog post on this issue. The dataset you distributed in another blog post does not contain the API call sequences, and I'm not sure why. However, if you run a sample through Cuckoo, you can access the calls a process makes in the following way:
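As an illustration only, and not necessarily the approach meant here, the following is a minimal Python sketch that pulls per-process API call sequences out of a Cuckoo JSON report. It assumes calls are stored under report["behavior"]["processes"][i]["calls"], with each call carrying an "api" name and each process a "process_id". These key names are assumptions and may differ between Cuckoo versions, so check them against a real report.json. The function names are hypothetical.

```python
import json
from typing import Dict, List, Optional


def load_report(path: str) -> dict:
    """Load a Cuckoo report.json file (the path is supplied by the caller)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def api_calls_per_process(report: dict) -> Dict[Optional[int], List[str]]:
    """Map each process ID to the ordered list of API names it called.

    ASSUMPTION: calls live under report["behavior"]["processes"][i]["calls"],
    each call has an "api" key and each process a "process_id" key.
    """
    sequences: Dict[Optional[int], List[str]] = {}
    for proc in report.get("behavior", {}).get("processes", []):
        pid = proc.get("process_id")
        # Preserve call order: sequence features are built from it.
        sequences[pid] = [call["api"] for call in proc.get("calls", []) if "api" in call]
    return sequences
```

Typical use would be api_calls_per_process(load_report(path_to_report_json)), which yields one ordered call list per process.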
I'm currently attempting to implement the ideas on API call sequences from the paper mentioned above, and would be glad to discuss approaches for feature construction (how to represent a single API call and a sequence of three calls) and how to vectorize them. Are you still working on CuckooML?
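One possible representation is sketched below under stated assumptions. Each call becomes a string token made from its API name, optionally prefixed with a category (Cuckoo reports are assumed to record one per call). A "sequence of three" is taken to mean overlapping trigrams produced by a sliding window of length 3. Both choices and the helper names call_token and call_trigrams are hypothetical.

```python
from typing import List, Optional, Tuple


def call_token(api: str, category: Optional[str] = None) -> str:
    """Represent one API call as a string token, optionally prefixed by its category."""
    return f"{category}:{api}" if category else api


def call_trigrams(tokens: List[str]) -> List[Tuple[str, str, str]]:
    """Return overlapping trigrams (sliding window of length 3) over a call sequence."""
    # zip stops at the shortest slice, so sequences shorter than 3 yield [].
    return list(zip(tokens, tokens[1:], tokens[2:]))
```

Prefixing the category makes two calls with the same API name but different categories distinct features. Dropping it keeps the feature space smaller.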
@dueland, you might want to check out scikit-learn's CountVectorizer.
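The sketch below shows one way CountVectorizer could be applied. It assumes each sample's API call sequence is joined into a space-separated string, so that ngram_range=(3, 3) counts call trigrams. The toy sequences are made up for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy data: one API call sequence per analysed sample.
sequences = [
    ["NtCreateFile", "NtWriteFile", "NtClose", "NtCreateFile", "NtWriteFile", "NtClose"],
    ["RegOpenKeyExW", "RegQueryValueExW", "RegCloseKey"],
]
# CountVectorizer works on strings, so join each sequence with spaces.
docs = [" ".join(seq) for seq in sequences]

vectorizer = CountVectorizer(
    ngram_range=(3, 3),    # count API call trigrams only
    token_pattern=r"\S+",  # every whitespace-separated API name is one token
    lowercase=False,       # keep API names exactly as reported
)
X = vectorizer.fit_transform(docs)  # sparse matrix: samples x distinct trigrams
```

The vocabulary built by fit grows with every distinct trigram seen. On scikit-learn 1.0 or later, vectorizer.get_feature_names_out() lists the trigrams. scikit-learn's HashingVectorizer accepts the same ngram_range and token_pattern parameters but stores no vocabulary, at the cost of possible collisions.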
@hgascon, thanks for the link. I propose the following:
Can you spot any shortcomings in that approach? Also, does it use what is known as the hashing trick? UPDATE:
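On the hashing-trick question: the hashing trick uses a hash function to map each feature (here, an API call trigram) directly to a column of a fixed-size vector. No trigram-to-index vocabulary is stored, so an approach that keeps an explicit dictionary of trigrams is not using it. The following is a minimal pure-Python sketch. The function name, MD5 as the hash, and the 2**20 default width are assumptions. scikit-learn's FeatureHasher and HashingVectorizer implement the same idea with MurmurHash3.

```python
import hashlib
from typing import Dict, Sequence


def hashed_trigram_features(calls: Sequence[str], n_features: int = 2 ** 20) -> Dict[int, int]:
    """Map API call trigrams into a fixed-size sparse vector via the hashing trick.

    Returns {column_index: signed_count}. No vocabulary is kept, so distinct
    trigrams may collide in the same column.
    """
    vec: Dict[int, int] = {}
    for gram in zip(calls, calls[1:], calls[2:]):
        key = " ".join(gram).encode("utf-8")
        # Stable hash: Python's built-in hash() of str is salted per process.
        h = int.from_bytes(hashlib.md5(key).digest()[:8], "little")
        index = h % n_features
        sign = -1 if h >> 63 else 1  # top bit picks the sign to reduce collision bias
        vec[index] = vec.get(index, 0) + sign
    return vec
```

The alternating sign is optional. It makes colliding features partially cancel instead of always adding up, which keeps inner products between hashed vectors closer to unbiased.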
Sounds good. I also had an idea to build a transition network whose edge weights count how many transitions of each type have been seen so far, but it's probably a bit more complicated.
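The following is a minimal sketch of such a transition network. It assumes edges connect directly consecutive API calls, so the edge weights are bigram counts. Normalising each node's outgoing weights then gives first-order transition probabilities. The helper names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def transition_counts(calls: List[str]) -> Counter:
    """Edge weights: how often call a is directly followed by call b."""
    return Counter(zip(calls, calls[1:]))


def transition_probabilities(counts: Counter) -> Dict[str, Dict[str, float]]:
    """Normalise each call's outgoing edge weights (first-order Markov chain)."""
    totals: Dict[str, int] = defaultdict(int)
    for (src, _dst), n in counts.items():
        totals[src] += n
    probs: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (src, dst), n in counts.items():
        probs[src][dst] = n / totals[src]
    return dict(probs)
```

Counter.update adds to existing counts, so one counter can keep accumulating the transitions seen so far as new sequences arrive, e.g. counts.update(zip(new_calls, new_calls[1:])).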
The ML features for binaries analysed by Cuckoo are going to be inspired by Reviewer Integration and Performance Measurement for Malware Detection by B. Miller et al. (available here).
The paper lists all kinds of binary features, both static and dynamic, which seem a good starting point for this project:
Once implemented, they should be reviewed and revised with regard to their usefulness for this project.