SigMahaKNN (signature_mahalanobis_knn) combines the variance norm (a generalisation of the Mahalanobis distance) with path signatures for anomaly detection on multivariate streams. The signature_mahalanobis_knn library is a Python implementation of the SigMahaKNN method described in Dimensionless Anomaly Detection on Multivariate Streams with Variance Norm and Path Signature.
The examples from the paper are in the paper-examples folder, which includes notebooks for downloading the datasets and running the experiments.
The key contributions of this library are:
- A simple and efficient implementation of the variance norm distance as provided by the signature_mahalanobis_knn.Mahalanobis class. The class has two main methods:
  - The fit method to fit the variance norm distance to a training dataset
  - The distance method to compute the distance between two numpy arrays x1 and x2
- A simple and efficient implementation of the SigMahaKNN method as provided by the signature_mahalanobis_knn.SignatureMahalanobisKNN class. The class has two main methods:
  - The fit method to fit a model to a training dataset
    - The fit method can take in a corpus of streams as its input (whose path signatures are computed using the sktime library with esig or iisignature) or a corpus of path signatures as its input. This also opens up the possibility of using other feature representations and other applications of the variance norm distance for anomaly detection
    - Currently, the library uses either sklearn's NearestNeighbors class or pynndescent's NNDescent class to efficiently compute the nearest neighbour distances of a new data point to the corpus of training data
  - The conformance method to compute the conformance score for a set of new data points
    - Similarly to the fit method, the conformance method can take in a corpus of streams as its input (whose path signatures are computed using the sktime library with esig or iisignature) or a corpus of path signatures as its input
The SigMahaKNN library is available on PyPI and can be installed with pip:
pip install signature_mahalanobis_knn
As noted above, the signature_mahalanobis_knn library has two main classes: Mahalanobis, a class for computing the variance norm distance, and SignatureMahalanobisKNN, a class for computing the conformance score for a set of new data points.
To compute the variance norm (a generalisation of the Mahalanobis distance) for a pair of data points x1 and x2 given a corpus of training data X (a two-dimensional numpy array), you can use the Mahalanobis class as follows:
import numpy as np
from signature_mahalanobis_knn import Mahalanobis
# create a corpus of training data
X = np.random.rand(100, 10)
# initialise the Mahalanobis class
mahalanobis = Mahalanobis()
mahalanobis.fit(X)
# compute the variance norm distance between two data points
x1 = np.random.rand(10)
x2 = np.random.rand(10)
distance = mahalanobis.distance(x1, x2)
Here we provided an example with the default initialisation of the Mahalanobis class. There are also a few parameters that can be set when initialising the class (see details in Dimensionless Anomaly Detection on Multivariate Streams with Variance Norm and Path Signature):
- subspace_thres: (float) threshold for deciding whether or not a point is in the subspace, default is 1e-3
- svd_thres: (float) threshold for deciding the numerical rank of the data matrix, default is 1e-12
- zero_thres: (float) threshold for deciding whether the distance should be set to zero, default is 1e-12
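To make the roles of the three thresholds concrete, the following is a minimal numpy sketch of a variance norm computation. It is not the library's implementation: the function names fit_variance_norm and variance_norm_distance are hypothetical, and the way each threshold is applied here (an absolute cut-off on singular values, a relative residual for the subspace test, and an (n - 1) covariance normalisation) is an assumption made for illustration.

```python
import numpy as np


def fit_variance_norm(X, svd_thres=1e-12):
    """Hypothetical helper: principal directions and singular values of the centred corpus."""
    X = np.asarray(X, dtype=float)
    _, s, vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    keep = s > svd_thres  # assumed rule for the numerical rank
    return vt[keep], s[keep]


def variance_norm_distance(
    x1, x2, vt, s, n_samples, subspace_thres=1e-3, zero_thres=1e-12
):
    """Hypothetical helper: variance norm of x1 - x2 relative to the fitted corpus."""
    d = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    norm_d = np.linalg.norm(d)
    if norm_d < zero_thres:
        # the two points are numerically identical
        return 0.0
    coords = vt @ d  # coordinates of d in the corpus subspace
    residual = np.linalg.norm(d - vt.T @ coords)
    if residual / norm_d > subspace_thres:
        # d has a component the corpus never varies along
        return float("inf")
    # equals sqrt(d^T pinv(cov) d) with cov = Xc^T Xc / (n_samples - 1)
    return float(np.sqrt((n_samples - 1) * np.sum((coords / s) ** 2)))


rng = np.random.default_rng(0)
X = rng.random((100, 4))
vt, s = fit_variance_norm(X)
distance = variance_norm_distance(rng.random(4), rng.random(4), vt, s, len(X))
```

When the corpus has full rank, this quantity coincides with the classical Mahalanobis distance. The variance norm differs only in the degenerate case, where a displacement that leaves the span of the corpus is assigned an infinite distance.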
To use the SigMahaKNN method for anomaly detection of multivariate streams, you can use the SignatureMahalanobisKNN class by first initialising the class and then using the fit and conformance methods to fit a model to a training dataset of streams and compute the conformance score for a set of new data streams, respectively:
import numpy as np
from signature_mahalanobis_knn import SignatureMahalanobisKNN
# create a corpus of training data
# X is a three-dimensional numpy array with shape (n_samples, length, channels)
X = np.random.rand(100, 10, 3)
# initialise the SignatureMahalanobisKNN class
sig_maha_knn = SignatureMahalanobisKNN()
sig_maha_knn.fit(
    knn_library="sklearn",
    X_train=X,
    signature_kwargs={"depth": 3},
)
# create a set of test data streams
Y = np.random.rand(10, 10, 3)
# compute the conformance score for the test data streams
conformance_scores = sig_maha_knn.conformance(X_test=Y, n_neighbors=5)
Note that this example passes a corpus of streams to both fit the model and compute the conformance scores. We use the sktime library to compute path signatures of the streams.
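For intuition about what a path signature contains, here is a minimal numpy sketch of the depth-2 signature of a piecewise-linear stream. The function signature_depth2 is hypothetical and is not how sktime, esig or iisignature compute signatures; the library's features may also differ in ordering, depth and any path augmentations applied, so the values are not expected to match.

```python
import numpy as np


def signature_depth2(path):
    """Hypothetical helper: depth-2 signature of a path with shape (length, channels)."""
    path = np.asarray(path, dtype=float)
    increments = np.diff(path, axis=0)  # one row per linear segment
    level1 = increments.sum(axis=0)  # total displacement per channel
    # displacement from the starting point at the beginning of each segment
    start_offsets = path[:-1] - path[0]
    # iterated integrals: sum over segments of offset_i * dX_j + 0.5 * dX_i * dX_j
    level2 = start_offsets.T @ increments + 0.5 * increments.T @ increments
    return np.concatenate([level1, level2.ravel()])


# a corpus of streams with shape (n_samples, length, channels)
streams = np.random.default_rng(0).random((100, 10, 3))
# one feature row per stream: 3 level-1 terms and 9 level-2 terms
features = np.stack([signature_depth2(stream) for stream in streams])
```

The result is a two-dimensional array with one row of features per stream, which is the same general shape of input as a corpus of precomputed signatures.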
However, if you have already computed signatures or you are using another feature representation method, you can pass the corpus of signatures to the fit and conformance methods instead of the streams. You do this by passing the arguments signatures_train and signatures_test to the fit and conformance methods, respectively:
import numpy as np
from signature_mahalanobis_knn import SignatureMahalanobisKNN
# create a corpus of training data (signatures or other feature representations)
# features is a two-dimensional numpy array with shape (n_samples, n_features)
features = np.random.rand(100, 10)
# initialise the SignatureMahalanobisKNN class
sig_maha_knn = SignatureMahalanobisKNN()
sig_maha_knn.fit(
    knn_library="sklearn",
    signatures_train=features,
)
# create a set of test features
features_y = np.random.rand(10, 10)
# compute the conformance score for the test features
conformance_scores = sig_maha_knn.conformance(signatures_test=features_y, n_neighbors=5)
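As a rough illustration of what a conformance score measures, the sketch below uses plain numpy and a brute-force search. It assumes that conformance means the variance norm distance from a new point to its closest point in the corpus, and it assumes a full-rank corpus (it does not handle points outside the corpus subspace). The functions whiten and nearest_neighbour_conformance are hypothetical and are not the library's implementation, which relies on sklearn or pynndescent for the neighbour search.

```python
import numpy as np


def whiten(corpus, points, svd_thres=1e-12):
    """Hypothetical helper: map points to coordinates where the corpus has unit covariance."""
    mean = corpus.mean(axis=0)
    _, s, vt = np.linalg.svd(corpus - mean, full_matrices=False)
    keep = s > svd_thres  # assumed rule for discarding degenerate directions
    scale = np.sqrt(len(corpus) - 1) / s[keep]
    return (points - mean) @ vt[keep].T * scale


def nearest_neighbour_conformance(corpus, new_points):
    """Hypothetical helper: distance from each new point to its closest corpus point."""
    corpus_w = whiten(corpus, corpus)
    new_w = whiten(corpus, new_points)
    # pairwise Euclidean distances in whitened coordinates, shape (n_new, n_corpus)
    diffs = new_w[:, None, :] - corpus_w[None, :, :]
    return np.linalg.norm(diffs, axis=2).min(axis=1)


rng = np.random.default_rng(0)
corpus = rng.normal(size=(200, 5))
typical = rng.normal(size=(10, 5))  # drawn like the corpus
outliers = rng.normal(size=(10, 5)) + 10.0  # shifted far from the corpus
scores_typical = nearest_neighbour_conformance(corpus, typical)
scores_outliers = nearest_neighbour_conformance(corpus, outliers)
```

Points drawn like the corpus receive small scores, while the shifted points receive much larger ones. The brute-force search is quadratic in memory, which is one reason a dedicated nearest neighbour library is preferable for large corpora.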
The core implementation of the SigMahaKNN method is in the src/signature_mahalanobis_knn folder:
- mahal_distance.py contains the implementation of the Mahalanobis class to compute the variance norm distance
- sig_maha_knn.py contains the implementation of the SignatureMahalanobisKNN class to compute the conformance scores for a set of new data points against a corpus of training data
- utils.py contains some utility functions that are useful for the library
- baselines/ is a folder containing some of the baseline methods we look at in the paper - see paper-examples/README.md for more details
There are various examples in the paper-examples folder:
- paper-examples contains the examples used in our paper Dimensionless Anomaly Detection on Multivariate Streams with Variance Norm and Path Signature where we compare the SigMahaKNN method to other baseline approaches (e.g. Isolation Forest and Local Outlier Factor) on real-world datasets
  - There are notebooks for downloading and preprocessing the datasets for the examples - see paper-examples/README.md for more details
To take advantage of pre-commit, which will automatically format your code and run some basic checks before you commit:
pip install pre-commit # or brew install pre-commit on macOS
pre-commit install # will install a pre-commit hook into the git repo
After doing this, each time you commit, some linters will be applied to format the codebase. You can also/alternatively run pre-commit run --all-files to run the checks.
See CONTRIBUTING.md for more information on running the test suite using nox.