In this repository you will find a Python implementation of IoTDevID; a fingerprinting method for device identification.
Attention! There is an updated version of this work. Please see: IoTDevIDv2
This is the first version of IoTDevID . It is highly recommended that you check out the second version as well.
Device identification is one way to secure a network of IoT devices whereby devices identified as suspicious can subsequently be isolated from a network. We introduce a novel device identification (fingerprinting) method, IoTDevID, that uses machine learning to model the behaviour of IoT devices based on the network packets that they communicate. Our method uses an enhanced combination of features from previous work and includes an approach for dealing with unbalanced device data via data augmentation. We further demonstrate how to enhance device identification via a group-wise data aggregation. We provide a comparative evaluation of our method against two recent identification methods using five public IoT datasets ( Aalto University , UNSW-Sydney IEEE TMC , IoTFinder , UNSW-Sydney ACM SOSR*, and IoT Network Intrusion Dataset* ) which together contain data from over 100 devices, two of which include both benign and malicious data. Through our evaluation we demonstrate improved performance over previous results with F1 scores above 99%, with considerable improvement gained from data aggregation.
Python 3.6 was used to create the application files. Before running the files, it must be ensured that Python 3.6 and the following libraries are installed.
Library | Task |
---|---|
Scapy | Packet(Pcap) crafting |
Sklearn | Machine Learning & Data Preparation |
Imblearn | Data Augmentation |
Numpy | Mathematical Operations |
Pandas | Data Analysis |
Matplotlib | Graphics and Visuality |
Seaborn | Graphics and Visuality |
The technical features of the computer used for experiments are given below.
Central Processing Unit | : | Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz 2.90 GHz |
Random Access Memory | : | 8 GB (7.74 GB usable) |
Operating System | : | Windows 10 Pro 64-bit |
Graphics Processing Unit | : | AMD Readon (TM) 530 |
The implementation phase consists of 5 steps, which are:
- Fingerprinting
- Initial Fingerprint Method Evaluation
- Data Augmentation
- Augmentated and Aggregated Fingerprint Method Evaluation
- Malicious Device Dataset Evaluation*
Each of these steps contains one or more Python files. The same file was saved with both "py" and "ipynb" extensions. The code they contain is exactly the same. The file with the ipynb extension has the advantage of saving the state of the last run of that file and the screen output. Thus, screen output can be seen without re-running the files. Files with the ipynb extension can be run using the jupyter notebook program.
This step contains the 1.1 PCAP2CSV.ipynb file. This file converts the files with pcap extension to single packet-based, csv extension fingerprint files (IoT Sentinel, IoTSense, IoTDevID individual packet based feature sets) and makes labeling.
This step contains the 2.1 Classification of Individual packets for Aalto University Dataset file. This file makes machine learning application for individual packets for Aalto University and allows to compare 3 different featuresets (IoT Sentinel, IoTSense, IoTDevID individual packet based feature sets). It uses these algorithms: RF (Random Forest), NB (Naïve Bayes), kNN (k-Nearest Neighbours), GB (Gradient Boosting), DT (Decision Trees), and SVM (Support Vector Machine)
This step contains the 3.1 Data Augmentation.ipynb file. This file first divides the datasets into two as train and test. It then applies data augmentation for the required classes using resampling and SMOTE methods.
This step contains these 4 files:
4.1 Aalto university results with augmentation and aggregation.ipynb file makes machine learning (RF) application for augmented version of Aalto University dataset based individual packet level using IoTDevID method. It then produces results for 4 different group sizes (3, 6, 9, 12) using the packet aggregation method.
4.2 IoTfinder results with augmentation and aggregation file makes machine learning (RF) application for augmented version of IoTfinder dataset based individual packet level using IoTDevID method. It then produces results for 4 different group sizes (3, 6, 9, 12) using the packet aggregation method.
4.3 UNSW_benign_ results with augmentation and aggregation file makes machine learning (RF) application for augmented version of UNSW-Sydney IEEE TMC dataset based individual packet level using IoTDevID method. It then produces results for 4 different group sizes (3, 6, 9, 12) using the packet aggregation method.
4.4 Aalto university results with combined labels.ipynb file makes machine learning (RF) application for augmented version of Aalto University dataset based individual packet level using IoTDevID method. It then produces results for 4 different group sizes (3, 6, 9, 12) using the packet aggregation method. However, in this file, very similar devices are considered as a group in the Aalto University dataset and collected under the same label.
This step contains the 5.1 UNSW_Malicious_ results with augmentation and aggregation file. This file makes machine learning (RF) application for UNSW-Sydney ACM SOSR and IoT Network Intrusion datasets based individual packet level using IoTDevID method. It then produces results for 4 different group sizes (3, 6, 9, 12) using the packet aggregation method. However, unlike other steps, this step contains benign and malicious data produced by the same devices. The purpose is not to prevent these attacks, but to show that the device can be detected if it behaves differently. Therefore, not all data of malicious datasets are used. The data used includes only cases where IoT devices are attacker. Before creating the fingerprint for this process, we parsed the pcap files as benign and malicious, and then extracted the fingerprints. The information required for the filtering process are clearly stated on the datasets website. You can perform these operations using Wireshark. You can also use tshark-filter to automate this process.
The processed datasets are shared in depository. However, raw versions of the datasets used in the study and their addresses are given below.
Dataset | capture year | Number of Devices | Type |
---|---|---|---|
Aalto University | 2016 | 31 | Benign |
UNSW-Sydney IEEE TMC | 2016 | 31 | Benign |
IoTFinder | 2018 | 51 | Benign |
UNSW-Sydney ACM SOSR* | 2018 | 28 | Benign & Malicious |
IoT Network Intrusion Dataset* | 2019 | 2 | Benign & Malicious |
This project is licensed under the MIT License - see the LICENSE file for details
If you use the source code please cite the following paper:
Kahraman Kostas, Mike Just, and Michael A. Lones. IoTDevID: A behaviour-based fingerprintingmethod for device identification in the IoT, arXiv preprint, arxiv:2102.08866, 2021.
@misc{kostas2021iotdevid1,
title={{IoTDevID}: A Behaviour-Based Fingerprinting Method for Device Identification in the {IoT}},
author={Kahraman Kostas and Mike Just and Michael A. Lones},
year={2021},
eprint={2102.08866v1},
archivePrefix={arXiv},
primaryClass={cs.CR}
}
Contact: Kahraman Kostas kahramankostas@gmail.com
*Items with the * sign are not included in the paper. They have been prepared for a longer version of it.