The dataset is a single .h5 file that stores the feature vectors and metadata together in one DataFrame, which can be loaded with pandas (key='xy').
~23k samples in total (both malicious and benign)
~14k samples from malware families not seen in BODMAS
Link to dataset(s): https://drive.google.com/drive/folders/1KGeUYS7SKCJprYJQoqPqew0DXnaDGQgz?usp=sharing
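
As a minimal sketch of loading the file once it has been downloaded from the link above: the file name `dataset.h5` is a placeholder (the actual name is not stated here), and the helper names `load_dataset` and `summarize` are illustrative only. Only the key `'xy'` comes from the description above. Note that `pandas.read_hdf` depends on the optional PyTables package (`pip install tables`).

```python
from pathlib import Path

import pandas as pd

# Placeholder file name (an assumption): replace with the .h5 file you downloaded.
DATASET_PATH = Path("dataset.h5")


def load_dataset(path, key="xy"):
    """Load the combined feature/metadata DataFrame stored under `key`."""
    # read_hdf needs the optional PyTables dependency to be installed.
    return pd.read_hdf(path, key=key)


def summarize(df):
    """Return basic shape information without assuming any column names."""
    return {"rows": len(df), "columns": len(df.columns)}


if __name__ == "__main__":
    if DATASET_PATH.exists():
        df = load_dataset(DATASET_PATH)
        print(summarize(df))
        # Inspect the column types to see which columns hold features vs. metadata.
        print(df.dtypes)
    else:
        print(f"{DATASET_PATH} not found; download it first.")
```

Because the column layout is not documented here, the sketch only reports row and column counts; inspect `df.columns` and `df.dtypes` to work out which columns are features and which are metadata.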
If you would like to use the framework demonstrated in the study or access the original malware samples for this dataset, please send a message to 4thdsec@gmail.com from your work or academic institution email address.