Anomaly Detection of GPS Spoofing Attacks on UAVs

The system's main goal is to create machine learning models for anomaly detection on UAVs. The system creates new models or loads existing ones based on user-supplied inputs. Each model then classifies anomalies in the test observations.

The system also displays output plots and evaluation metrics that compare the models and diagnose the anomalies that were found. Running the system with different parameters lets us produce many different machine learning models. Comparing them on several evaluation metrics identifies the best models for anomaly detection.

Those models will serve as a baseline for a real-time, lightweight anomaly detection algorithm based on streaming data from UAV sensors, in order to detect GPS spoofing attacks on UAVs as early as possible.

Background & Motivation

Drones are used in a variety of fields:

Agriculture: accurate and cheap spraying.
Security: patrolling and following suspects in real time.
Rescue: locating people in distress.
Military: intelligence and operational activities.
Commerce: from food deliveries, through improved medical services, to aerial photography.

What is GPS spoofing, and how is it harmful?

Unmanned Aerial Systems (UAS) are vulnerable to various cyber-attacks, such as GPS spoofing. In a GPS spoofing attack, a malicious user transmits fake signals to the GPS receiver in the UAS.
GPS spoofing attacks aim to steal or crash a UAV by misleading it onto a different path than the course planned by the operator.

Crashed drone

Kidnapped drone

System flow

Prerequisites

Before running the system, run the following command in the console:

pip install -r requirements.txt

** See explanation below - Requirements File

Directories Structure

Train directory - should have the following structure:

Chosen directory
- Route_Name_1 - directory
  - without_anom.csv
- Route_Name_2 - directory
  - without_anom.csv

Test directory - should have the following structure:

Chosen directory
- Route_Name_1 - directory
  - Attack_Name_1 - directory
    - sensors_0.csv
  - Attack_Name_2 - directory
    - sensors_0.csv
  - Attack_Name_3 - directory
    - sensors_0.csv
  - Attack_Name_4 - directory
    - sensors_0.csv
- Route_Name_2 - directory
  - Attack_Name_1 - directory
    - sensors_0.csv
  - Attack_Name_2 - directory
    - sensors_0.csv
  - Attack_Name_3 - directory
    - sensors_0.csv
  - Attack_Name_4 - directory
    - sensors_0.csv

Results directory - any directory to save all model configurations and results.
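
The layouts above can be checked before launching the system. The following is a minimal sketch and not part of the project: the helper names `list_train_files` and `list_test_files` are hypothetical, and only the file names `without_anom.csv` and `sensors_0.csv` come from the structure described above.

```python
from pathlib import Path


def list_train_files(train_dir):
    """Map each route name to its without_anom.csv file (hypothetical helper)."""
    files = {}
    for route in sorted(Path(train_dir).iterdir()):
        csv_path = route / "without_anom.csv"
        if route.is_dir() and csv_path.is_file():
            files[route.name] = csv_path
    return files


def list_test_files(test_dir):
    """Map each (route, attack) pair to its sensors_0.csv file (hypothetical helper)."""
    files = {}
    for route in sorted(Path(test_dir).iterdir()):
        if not route.is_dir():
            continue
        for attack in sorted(route.iterdir()):
            csv_path = attack / "sensors_0.csv"
            if attack.is_dir() and csv_path.is_file():
                files[(route.name, attack.name)] = csv_path
    return files
```

Routes or attacks that are missing their CSV file are simply left out of the result, so comparing the returned keys with the expected route and attack names shows which directories are incomplete.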

Getting Started

First, you should clone the project to your local environment:

git clone https://github.com/liorpizman/AnomalyDetection.git

Run the 'guiController.py' file to start the system.

Choose between two different options:

First Flow - Create new machine learning model

Insert simulated data / ADS-B data set input files

Select algorithms for which you want to build anomaly detection models

Select a value for each of the following parameters, or run a GridSearch over the chosen parameters
The GridSearch selects the configuration with the maximum AUC and the minimum detection delay
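
One way to read this decision rule is: prefer the highest AUC and break ties with the lowest detection delay. The sketch below assumes that ordering, since this README does not say how the two criteria are combined. The function name, the result keys (`params`, `auc`, `delay`) and the numbers are hypothetical and only illustrate the idea.

```python
def pick_best_configuration(results):
    """Return the result with the highest AUC; ties go to the lowest delay.

    `results` is a list of dicts with hypothetical keys 'params', 'auc', 'delay'.
    """
    if not results:
        raise ValueError("results must not be empty")
    # Sort key: larger AUC first (negated), then smaller delay.
    return min(results, key=lambda r: (-r["auc"], r["delay"]))


# Made-up grid search outcomes, for illustration only.
candidates = [
    {"params": {"window_size": 2}, "auc": 0.91, "delay": 14},
    {"params": {"window_size": 3}, "auc": 0.95, "delay": 20},
    {"params": {"window_size": 5}, "auc": 0.95, "delay": 9},
]
best = pick_best_configuration(candidates)
print(best["params"])  # {'window_size': 5}
```

Other combinations are possible, for example a weighted score of AUC and delay; the lexicographic rule is used here only because it needs no extra weighting parameter.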

Please choose both input and target features

** See next step under the title: Both Flows - similarity functions step

Second Flow - Load existing machine learning model

Insert input files for existing model

Insert paths for existing models

View only - hyperparameters of the existing models

** See next step under the title: Both Flows - similarity functions step

Both Flows - similarity functions step

Choose similarity functions from the following options

Loading model, please wait...

Choose an algorithm and a flight route in order to get the results table or image plots

Click on the 'Export table' button to export the results table, or click on the 'Display & Export best algorithm params' button to export the best parameters found by the GridSearch

Click on 'Export to PNG' button to export the image plots

Generated Machine Learning Models

LSTM - Long Short-Term Memory
SVR - Support Vector Regression
Random Forest
MLP Neural Network

Algorithm	Description
LSTM	An artificial recurrent neural network (RNN) architecture used in the field of deep learning.
SVR	Support Vector Regression, the regression form of support vector machines, a popular machine learning tool for classification and regression.
Random Forest	Supervised ensemble-learning models used for classification and regression.
MLP Neural Network	Multi-layer Perceptron regressor. This model optimizes the squared-loss using LBFGS or stochastic gradient descent.

Train & Test Explained

Data Set	Description
Train Set	Records containing sensors' values for non-anomalous drone flights.
Test Set	Records containing sensors' values for flights that have been attacked in various predefined attacks.

GPS Spoofing Attacks - ADS-B Data Sets

Attack	Description
Up attack	Try to crash the drone by changing its height sensor data in the data set.
Down attack	An attempt to raise the drone and divert it from its real route.
Fore attack	Randomly change sensors’ values.
Random attack	Injection of real sensors’ data from another flight into the current flight.

GPS Spoofing Attacks - Simulated Data Sets

Attack	Description
Constant attack	Constant height and constant velocity.
Changing velocity attack	Constant height and changing velocity.
Changing height attack	Changing height and constant velocity.
Mixed attack	Changing height and changing velocity.

Anomaly detection process

Time Series Regression

Regression algorithms are not designed for time series prediction. Therefore, to predict a record from the N previous records, we need to reshape the data: the previous N records are taken and flattened into a single vector.

Assume the following data matrix exists (each row is marked with a different color for convenience):

Now suppose we want to process this matrix to fit a time series prediction problem.
We define the window size to be 2, which means each record is predicted from the 2 previous records.
For example, to predict the fourth record, we use records 2 and 3.
To do this, we combine each record with the record that follows it: records 1 and 2, records 2 and 3, and records 3 and 4.

The following table will be used as training vectors:

The training vectors should look like this:

Another example with window size = 3
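
The windowing described in this section can be sketched in a few lines of plain Python. This is an illustrative sketch, not the project's preprocessing code: the function name `make_windows` and the sample values are assumptions, and each flattened window is paired with the record that follows it as the prediction target.

```python
def make_windows(records, window_size):
    """Flatten each run of `window_size` consecutive records into one input
    vector and pair it with the record that follows as the target."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    inputs, targets = [], []
    for start in range(len(records) - window_size):
        window = records[start:start + window_size]
        # Concatenate the records of the window into a single flat vector.
        inputs.append([value for record in window for value in record])
        targets.append(list(records[start + window_size]))
    return inputs, targets


# Four illustrative records with two sensor values each (made-up numbers).
records = [[1, 10], [2, 20], [3, 30], [4, 40]]
inputs, targets = make_windows(records, window_size=2)
print(inputs)   # [[1, 10, 2, 20], [2, 20, 3, 30]]
print(targets)  # [[3, 30], [4, 40]]
```

With four records and a window size of 2, records 1 and 2 predict record 3, and records 2 and 3 predict record 4. The combination of records 3 and 4 would predict a fifth record, so it only becomes a training pair once that record exists.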

Metrics Comparison Results Table

Example:
Algorithm: SVR
Similarity function: Cosine similarity
Route: Cross route

Outlier Score Testing Results - Visual Illustration

Normal behavior - green dots
Spoofed path - black dots

LSTM - Results Example

Good model prediction example:

Bad model prediction example:

SVR - Results Example

Good model prediction example:

Bad model prediction example:

Random Forest - Results Example

Good model prediction example:

Bad model prediction example:

MLP Neural Network - Results Example

Good model prediction example:

Bad model prediction example:

Sensor value - Actual vs. Predicted - Results - Visual Illustration

LSTM - Results Example

Good test prediction example:

Bad test prediction example:

SVR - Results Example

Good test prediction example:

Bad test prediction example:

Random Forest - Results Example

Good test prediction example:

Bad test prediction example:

MLP Neural Network - Results Example

Good test prediction example:

Bad test prediction example:

Receiver Operating Characteristic (ROC)

MLP - Results Example

High AUC

Low AUC
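
For readers who want to see what the AUC value measures, here is a dependency-free sketch. It computes AUC as the fraction of (attacked, normal) record pairs in which the attacked record receives the higher anomaly score, with ties counted as half. It is not the project's evaluation code, and the labels and scores are made-up illustrations.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for attacked records, 0 for normal records.
    scores: anomaly scores, higher means more anomalous.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


labels = [0, 0, 0, 1, 1]
good_scores = [0.1, 0.2, 0.3, 0.8, 0.9]  # attacks clearly separated
poor_scores = [0.5, 0.9, 0.4, 0.6, 0.3]  # attacks mixed with normal records
print(roc_auc(labels, good_scores))  # 1.0
print(roc_auc(labels, poor_scores))  # roughly 0.33
```

The pairwise loop is quadratic in the number of records, which is fine for a small illustration; a library routine is the better choice for real flight logs.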

Research Risks

Imbalanced data sets - the amount of attack data is very small compared to the data on the drone's regular behavior.
Duration of the attack detection - the true positive rate (TPR) of GPS attack detection may be high, but detection may take so long that the drone is abducted even though the attack was detected.
Results expectations - the results of the machine learning models may differ from our initial expectations.
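
The second risk, a high TPR combined with a long detection delay, can be illustrated with a small sketch. It assumes that a record is flagged when its anomaly score exceeds a threshold and that delay is the number of records between the start of the attack and the first flagged record. Both definitions, the function name and the scores are assumptions made for illustration; the README does not define them.

```python
def detection_summary(scores, attack_start, threshold):
    """Return (true positive rate, detection delay) for one attacked flight.

    scores: one anomaly score per record, in time order.
    attack_start: index of the first attacked record (assumed known).
    threshold: scores above this value are flagged as anomalies (assumed rule).
    """
    if not 0 <= attack_start < len(scores):
        raise ValueError("attack_start must index a record in scores")
    flagged = [score > threshold for score in scores]
    attacked = flagged[attack_start:]
    tpr = sum(attacked) / len(attacked)
    # Delay: records elapsed from attack start to the first flagged record.
    delay = attacked.index(True) if any(attacked) else None
    return tpr, delay


# Made-up scores: the attack starts at index 3 but is first flagged at index 5.
scores = [0.1, 0.2, 0.1, 0.3, 0.4, 0.9, 0.8, 0.9, 0.95, 0.9]
tpr, delay = detection_summary(scores, attack_start=3, threshold=0.5)
print(tpr, delay)
```

Here most attacked records are flagged, yet the first two go unnoticed. On a longer attack the same pattern yields a TPR close to 1 while the drone has already left its route.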

Python Libraries We Used

Keras - the Python Deep Learning library.
Scikit-learn - an open source machine learning library that supports supervised and unsupervised learning.

Data Sets

ADS-B data sets - Automatic Dependent Surveillance-Broadcast (ADS-B) data sets
Simulated data sets - data sets which are generated by a simulator

Requirements File

We used the pipreqs package to create the requirements.txt file.
pipreqs generates a pip requirements.txt file based on the imports of a project, so the Python dependencies are collected automatically.

Why not use pip freeze?

As the pipreqs GitHub repo says:

pip freeze saves all packages in the environment including even those that you don't use in your current project.
pip freeze is harmful. Dependencies may be deprecated as our libraries are updated, but will then be left in our requirements.txt file with no good reason, polluting our dependency list.

See the article "$ pip freeze > requirements.txt considered harmful".

Built With

PyCharm - the Python IDE for Professional Developers
Flightradar24 - ADS-B data sets

Authors

Lior Pizman - Final project, SISE, BGU - Github
Yehuda Pashay - Final project, SISE, BGU - Github

See also the list of contributors who participated in this project.

Appendices

Main methods

1. run_models
        Inputs      | algorithm, similarity function, test set path, results path, indicator - new/existing model
        Description | Execute models creation/loading process
        Output      | Plots + evaluation metrics per attack and algorithm

2. toggle_results
        Inputs      | selected_algorithm, selected_flight_route, selected_similarity_function
        Description | Toggle permutation of results
        Output      | A results table permutation (algorithm, flight route and similarity function)

3. time_series_split
        Inputs      | X, test_size=.2, number=False, output_numpy=True
        Description | Splits a dataset according to the time the data was taken
        Output      | X_train, X_test

4. clean_data
        Inputs      | Data
        Description | Clean the data by different steps as part of data pre-processing
        Output      | Clean data

5. anomaly_score_multi
        Inputs      | input vectors, output vectors, similarity function
        Description | Calculate the anomaly of a multiple output prediction
        Output      | anomaly score based on the similarity function
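
Two of the methods above can be illustrated with simplified stand-ins. The sketch below is not the project's code: `chronological_split` models only the `X` and `test_size` inputs of `time_series_split`, and `cosine_anomaly_scores` assumes that the anomaly score is 1 minus the cosine similarity between the actual and the predicted vector. Both function names and the scoring rule are assumptions.

```python
import math


def chronological_split(X, test_size=0.2):
    """Keep time order: the last `test_size` fraction becomes the test part."""
    n_test = int(round(len(X) * test_size))
    split_at = len(X) - n_test
    return X[:split_at], X[split_at:]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cosine_anomaly_scores(actual, predicted):
    """Assumed rule: score = 1 - similarity, so poor predictions score higher."""
    return [1.0 - cosine_similarity(a, p) for a, p in zip(actual, predicted)]


train, test = chronological_split(list(range(10)))
print(train, test)  # [0, 1, 2, 3, 4, 5, 6, 7] [8, 9]

# First pair matches exactly, second pair is orthogonal (made-up vectors).
scores = cosine_anomaly_scores([[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]])
print(scores)
```

Splitting by position rather than at random keeps later records out of the training part, which matters when each record is predicted from the ones before it.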
