The system's main goal is to create machine learning models for anomaly detection on UAVs.
The system allows creation and loading of machine learning models by using dynamic inputs. In the next stage, each model will classify anomalies in the test observations.
The system also displays output plots and evaluation metrics that compare the different models, along with a diagnosis of the anomalies that were found.
Running the system with dynamic parameters will allow us to extract many different machine learning models.
Comparing them based on different evaluation metrics will lead to obtaining the best machine learning models for anomaly detection.
Those models will be used as a baseline for a real-time, lightweight anomaly detection algorithm based on streaming data from UAV sensors, in order to detect GPS spoofing attacks on UAVs as early as possible.
Drones are used in a variety of fields:
- Agriculture : accurate and cheap spraying.
- Security : used for patrolling and following suspects in real time.
- Rescue : locating distressed people.
- Military : intelligence operational activities.
- Commerce : From food deliveries, through improved medical services to aerial photography.
- Unmanned Aerial Systems (UAS) are vulnerable to different cyber-attacks such as GPS spoofing. In a GPS spoofing attack, a malicious user transmits fake signals to the GPS receiver in the UAS.
- GPS spoofing attacks are aimed at stealing or crashing a UAV by misleading it to a different path than the original course planned by the operator.
Before running the system, run the following command in the console:
pip install -r requirements.txt
** See explanation below - Requirements File
Train directory - should be in the following structure:
- Chosen directory
    - Route_Name_1 - directory
        - without_anom.csv
    - Route_Name_2 - directory
        - without_anom.csv
Test directory - should be in the following structure:
- Chosen directory
    - Route_Name_1 - directory
        - Attack_Name_1 - directory
            - sensors_0.csv
        - Attack_Name_2 - directory
            - sensors_0.csv
        - Attack_Name_3 - directory
            - sensors_0.csv
        - Attack_Name_4 - directory
            - sensors_0.csv
    - Route_Name_2 - directory
        - Attack_Name_1 - directory
            - sensors_0.csv
        - Attack_Name_2 - directory
            - sensors_0.csv
        - Attack_Name_3 - directory
            - sensors_0.csv
        - Attack_Name_4 - directory
            - sensors_0.csv
Results directory - any directory to save all model configurations and results.
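As a rough illustration of the test layout described above, the following sketch walks a test directory and groups the CSV files by route and attack. The function name `list_test_files` is hypothetical and not part of the project; it only assumes the route / attack / CSV nesting.

```python
from pathlib import Path


def list_test_files(test_root):
    """Map (route name, attack name) to the sorted CSV files found under it.

    Hypothetical helper: assumes <test_root>/<route>/<attack>/*.csv.
    """
    files = {}
    for route_dir in sorted(p for p in Path(test_root).iterdir() if p.is_dir()):
        for attack_dir in sorted(p for p in route_dir.iterdir() if p.is_dir()):
            files[(route_dir.name, attack_dir.name)] = sorted(attack_dir.glob("*.csv"))
    return files
```

Calling `list_test_files("path/to/test_dir")` before starting the GUI can be a quick way to confirm that every route contains the expected attack directories.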
First, you should clone the project to your local environment:
git clone https://github.com/liorpizman/AnomalyDetection.git
Run the 'guiController.py' file to start the system.
Choose between two different options:
Insert simulated data / ADS-B data set input files
Select algorithms for which you want to build anomaly detection models
Select the values for each of the following parameters or run a GridSearch with chosen parameters
The decision function of the GridSearch is max AUC and min Delay
Please choose both input and target features
** See next step under the title: Both Flows - similarity functions step
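The GridSearch decision rule (max AUC and min Delay) could be sketched as below. How the two criteria are combined is not stated here, so this sketch assumes AUC takes priority and delay only breaks ties. The dictionary keys and the numbers are illustrative, not the project's data format or measured results.

```python
def pick_best_configuration(results):
    """Return the result with the highest AUC; ties go to the lowest delay.

    Assumption: AUC takes priority over delay. The keys "params", "auc" and
    "delay" are illustrative, not the project's actual data format.
    """
    if not results:
        raise ValueError("results must not be empty")
    return max(results, key=lambda r: (r["auc"], -r["delay"]))


# Made-up candidate results, for illustration only.
candidates = [
    {"params": {"window_size": 2}, "auc": 0.91, "delay": 12},
    {"params": {"window_size": 3}, "auc": 0.95, "delay": 20},
    {"params": {"window_size": 5}, "auc": 0.95, "delay": 8},
]
best = pick_best_configuration(candidates)
print(best["params"])  # {'window_size': 5}
```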
Insert input files for existing model
Insert paths for existing models
View only - Hyper parameters from existing models
** See next step under the title: Both Flows - similarity functions step
Choose similarity functions from the following options
Choose an algorithm and a flight route in order to get the results table or image plots
Click on the 'Export table' button to export the results table, or click on the 'Display & Export best algorithm params' button to export the best params of the GridSearch
Click on 'Export to PNG' button to export the image plots
- LSTM - Long Short-Term Memory
- SVR - Support Vector Regression
- Random Forest
- MLP Neural Network
Algorithm | Description |
---|---|
LSTM | An artificial recurrent neural network (RNN) architecture used in the field of deep learning. |
SVR | The regression variant of support vector machines, a popular machine learning tool for classification and regression. |
Random Forest | A supervised ensemble-learning model used for classification and regression. |
MLP Neural Network | Multi-layer Perceptron regressor. This model optimizes the squared-loss using LBFGS or stochastic gradient descent. |
Data Set | Description |
---|---|
Train Set | Records containing sensors' values for non-anomalous drone flights. |
Test Set | Records containing sensors' values for flights that have been attacked in various predefined attacks. |
Attack | Description |
---|---|
Up attack | An attempt to crash the drone by changing its height sensor data in the dataset. |
Down attack | An attempt to raise the drone and take it off its real route. |
Fore attack | Randomly change sensors’ values. |
Random attack | Injection of real sensors’ data from another flight to current flight. |
Attack | Description |
---|---|
Constant attack | Constant height and constant velocity. |
Changing velocity attack | Constant height and changing velocity. |
Changing height attack | Changing height and constant velocity. |
Mixed attack | Changing height and changing velocity. |
Regression algorithms are not intended for time series prediction. Therefore, to predict a record from the N previous records, we need to reshape the data: the previous N records are taken and flattened into a single input vector.
Assume that the following data matrix exists: (We will mark each line with different color for convenience)
Now, let's assume we want to process this matrix to fit time series prediction problem.
We will define the window size to be 2 - that means, each record will be predicted by using 2 previous records.
For example: to predict the fourth record, we need to use records 2 and 3.
In order to do it, we should combine each record with the following record - that means, combine records 1 and 2, combine records 2 and 3, and combine records 3 and 4.
The following table will be used as training vectors:
The training vectors should look like this:
Another example with window size = 3
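The windowing described above can be sketched with NumPy as follows. The function name `make_windows` and the sample matrix are assumptions made for illustration; the project's own pre-processing code may differ.

```python
import numpy as np


def make_windows(data, window_size):
    """Build (inputs, targets) for predicting each record from the previous ones.

    Each input row is `window_size` consecutive records flattened into one
    vector; the target is the record that immediately follows the window.
    """
    data = np.asarray(data)
    n_samples = len(data) - window_size
    if window_size < 1 or n_samples < 1:
        raise ValueError("need at least window_size + 1 records")
    inputs = np.stack([data[i:i + window_size].ravel() for i in range(n_samples)])
    targets = data[window_size:]
    return inputs, targets


# Illustrative 4-record matrix with 3 sensor values per record.
records = np.arange(12).reshape(4, 3)
inputs, targets = make_windows(records, window_size=2)
print(inputs.shape)   # (2, 6)
print(inputs[1])      # [3 4 5 6 7 8]  (records 2 and 3 flattened)
print(targets[1])     # [ 9 10 11]     (the fourth record)
```

With a window size of 3, the same call produces input vectors three records long, and the first predictable record is the fourth one.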
Example:
Algorithm: SVR
Similarity function: Cosine similarity
Route: Cross route
Normal behavior - green dots
Spoofed path - black dots
Good model prediction example:
- Imbalanced data sets - the amount of data about attacks is very small compared to drone's regular behavior data.
- Duration of the attack detection - the true positive rate (TPR) of GPS attack detection may be high, but if detection takes too long the drone can still be hijacked even though the attack was detected.
- Results expectations – machine learning models results can be different from our initial expectations.
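To make the trade-off between detection rate and detection duration concrete, here is a minimal sketch that computes both from per-record labels. The function names and the boolean-list representation are assumptions for illustration; they are not taken from the project.

```python
def detection_delay(is_attack, is_flagged):
    """Records elapsed between the start of the attack and its first detection.

    Returns None if there is no attack or the attack is never flagged.
    """
    if True not in is_attack:
        return None
    start = is_attack.index(True)
    for i in range(start, len(is_attack)):
        if is_attack[i] and is_flagged[i]:
            return i - start
    return None


def true_positive_rate(is_attack, is_flagged):
    """Fraction of attacked records that were flagged as anomalous."""
    attacked = [flag for attack, flag in zip(is_attack, is_flagged) if attack]
    return sum(attacked) / len(attacked) if attacked else 0.0


# Made-up labels: the attack starts at record 3 and is first flagged at record 6.
is_attack = [False] * 3 + [True] * 7
is_flagged = [False] * 6 + [True] * 4
print(detection_delay(is_attack, is_flagged))  # 3
```

A model can flag most attacked records and still have a large delay if the first flags only appear late in the attack.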
- Keras - the Python Deep Learning library.
- Scikit-learn - an open source machine learning library that supports supervised and unsupervised learning.
- ADS-B data sets - Automatic Dependent Surveillance-Broadcast (ADS-B) data sets
- Simulated data sets - data sets which are generated by a simulator
To create the requirements.txt file, we used the pipreqs package.
pipreqs - generates a pip requirements.txt file based on the imports of a project (automatically detects Python dependencies).
Why not use pip freeze?
As the pipreqs GitHub repo says:
- pip freeze saves all packages in the environment including even those that you don't use in your current project.
- pip freeze is harmful. Dependencies may be deprecated as our libraries are updated, but will then be left in our requirements.txt file with no good reason, polluting our dependency list.
See the article "$ pip freeze > requirements.txt considered harmful".
- PyCharm - the Python IDE for Professional Developers
- Flightradar24 - ADS-B data sets
See also the list of contributors who participated in this project.
1. run_models
Inputs | algorithm, similarity function, test set path, results path, indicator - new/existing model
Description | Execute models creation/loading process
Output | Plots + evaluation metrics per attack and algorithm
2. toggle_results
Inputs | selected_algorithm, selected_flight_route, selected_similarity_function
Description | Toggle permutation of results
Output | A results table permutation (algorithm, flight route and similarity function)
3. time_series_split
Inputs | X, test_size=.2, number=False, output_numpy=True
Description | Splits a dataset according to the time the data was taken
Output | X_train, X_test
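A time-based split keeps the records in their original order instead of shuffling them. The sketch below shows the idea under the assumption that the records are already sorted by time; `chronological_split` is a hypothetical name and does not reproduce the `number` or `output_numpy` options of `time_series_split`.

```python
def chronological_split(records, test_size=0.2):
    """Split time-ordered records into an earlier train part and a later test part.

    Assumption: `records` is already sorted by time. No shuffling is done,
    so every test record is newer than every train record.
    """
    if not 0 < test_size < 1:
        raise ValueError("test_size must be between 0 and 1")
    n_test = round(len(records) * test_size)
    split_index = len(records) - n_test
    return records[:split_index], records[split_index:]


train, test = chronological_split(list(range(10)), test_size=0.2)
print(train)  # [0, 1, 2, 3, 4, 5, 6, 7]
print(test)   # [8, 9]
```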
4. clean_data
Inputs | Data
Description | Clean the data by different steps as part of data pre-processing
Output | Clean data
5. anomaly_score_multi
Inputs | input vectors, output vectors, similarity function
Description | Calculate the anomaly score of a multiple-output prediction
Output | anomaly score based on the similarity function
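One plausible way to turn a similarity function into an anomaly score is to compare each actual record with the model's prediction and take one minus the similarity. The sketch below does this with cosine similarity. The exact formula used by `anomaly_score_multi` is not shown here, so both the formula and the function names are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def anomaly_scores(actual_vectors, predicted_vectors, similarity=cosine_similarity):
    """Score each record as 1 - similarity(actual, predicted).

    Assumption: a low similarity between the observed record and the model's
    prediction indicates an anomaly, so higher scores are more anomalous.
    """
    return [1.0 - similarity(a, p) for a, p in zip(actual_vectors, predicted_vectors)]


scores = anomaly_scores([[3.0, 4.0], [1.0, 0.0]], [[3.0, 4.0], [0.0, 1.0]])
print(scores)  # [0.0, 1.0]
```

Passing a different callable as `similarity` lets the same scoring loop be reused for other similarity functions.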