This page was built in support of our paper "Motiflets - Simple and Accurate Detection of Motifs in Time Series" by Patrick Schäfer and Ulf Leser, published at PVLDB, 16(4): 725 - 737, 2022.
Supporting Material
- notebooks: Jupyter Notebooks with use cases
- csvs: The results of the scalability experiments
- motiflets: Code implementing k-Motiflets
- datasets: Use cases in the paper
- jars: Java code of the competitors used in our paper: EMMA, Latent Motifs and Set Finder.
Intuitively speaking, a k-Motiflet is a set of exactly k subsequences that are most similar to each other.
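To make the definition concrete, the following brute-force sketch approximates a k-Motiflet with NumPy. It assumes z-normalized Euclidean distance (the default distance mentioned in the usage example) and treats the extent of a set as its largest pairwise distance. Candidate sets are built greedily from each subsequence's nearest non-overlapping neighbours. The helper names (`z_normed_distances`, `approx_k_motiflet`) and the toy data are hypothetical. This is not the motiflets package's implementation, which is far more efficient.

```python
import numpy as np


def z_normed_distances(series, l):
    """All-pairs z-normalized Euclidean distances between length-l subsequences."""
    w = np.lib.stride_tricks.sliding_window_view(np.asarray(series, float), l)
    std = w.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # avoid division by zero on flat subsequences
    z = (w - w.mean(axis=1, keepdims=True)) / std
    sq = (z ** 2).sum(axis=1)
    return np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * z @ z.T, 0.0))


def approx_k_motiflet(series, k, l):
    """Greedy approximation: for every subsequence, collect its k-1 nearest
    non-overlapping neighbours and keep the set with the smallest extent."""
    dist = z_normed_distances(series, l)
    best_set, best_extent = None, np.inf
    for i in range(len(dist)):
        chosen = [i]
        for j in np.argsort(dist[i]):
            if all(abs(int(j) - c) >= l for c in chosen):  # no overlap allowed
                chosen.append(int(j))
                if len(chosen) == k:
                    break
        if len(chosen) < k:
            continue  # not enough non-overlapping neighbours for this start
        extent = dist[np.ix_(chosen, chosen)].max()  # largest pairwise distance
        if extent < best_extent:
            best_set, best_extent = sorted(chosen), extent
    return best_set, best_extent


# Toy series: Gaussian noise with one pattern implanted three times.
rng = np.random.default_rng(0)
series = rng.normal(size=1000)
pattern = 3 * np.sin(np.linspace(0, 4 * np.pi, 50))
for pos in (100, 400, 700):
    series[pos:pos + 50] += pattern
motif, ext = approx_k_motiflet(series, k=3, l=50)
print(motif, ext)
```

The quadratic distance matrix limits this sketch to short series. On the toy series above, the returned offsets are expected to lie close to the three implanted positions.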
In essence, there is no need to tune a real-valued similarity threshold by trial and error, as is the case for virtually all motif set competitors.
Instead, for k-Motiflets the user sets the number k of repetitions.
We argue that guessing k is almost always easier, since the question of how many repetitions of a motif one expects is easy to understand. The guess itself need not be easy, though, and thus we also offer algorithms to learn k from the data.
The following video highlights the ease of use of k-Motiflets.
showcase.mp4
The easiest way to install motiflets is via pip:
pip install motiflets
You can also install the project from source.
First, download the repository.
git clone https://github.com/patrickzib/motiflets.git
Change into the directory and build the package from source:
cd motiflets
pip install .
Here we illustrate how to use k-Motiflets.
The following time series (TS) is an ECG from the Long Term Atrial Fibrillation (LTAF) database, which is often used for demonstrations in motif discovery (MD). The problem is particularly difficult for MD, as two motifs actually exist: the first half of the TS contains a rectangular calibration signal with 6 occurrences, and the second half shows ECG heartbeats with 16 to 17 occurrences.
The major challenges in motif discovery are to learn the length of interesting motifs and to find the largest set of the same motif, i.e. all repetitions.
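One way to compare candidate lengths l is to look at how quickly the extent of the best motif set grows with k for each length. The sketch below is a hypothetical heuristic and not necessarily what `fit_motif_length` does. For every l it computes an approximate extent curve for k = 2..k_max (greedy candidates, z-normalized Euclidean distance), divides it by its maximum and averages it. Lengths whose motif sets stay tight for many values of k therefore receive small scores. All function names, the scoring rule and the toy data are illustrative assumptions.

```python
import numpy as np


def z_normed_distances(series, l):
    """Pairwise z-normalized Euclidean distances of all length-l subsequences."""
    w = np.lib.stride_tricks.sliding_window_view(np.asarray(series, float), l)
    w = (w - w.mean(axis=1, keepdims=True)) / np.maximum(w.std(axis=1, keepdims=True), 1e-8)
    sq = (w ** 2).sum(axis=1)
    return np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * w @ w.T, 0.0))


def extent_curve(series, l, k_max):
    """Approximate smallest extent for k = 2..k_max, grown greedily from each
    subsequence's nearest non-overlapping neighbours (inf if k is never reached)."""
    dist = z_normed_distances(series, l)
    best = np.full(k_max - 1, np.inf)
    for i in range(len(dist)):
        chosen = [i]
        for j in np.argsort(dist[i]):
            if all(abs(int(j) - c) >= l for c in chosen):
                chosen.append(int(j))
                k = len(chosen)
                best[k - 2] = min(best[k - 2], dist[np.ix_(chosen, chosen)].max())
                if k == k_max:
                    break
    return best


def score_lengths(series, length_range, k_max):
    """Hypothetical score: mean of the max-normalized extent curve per length."""
    scores = {}
    for l in length_range:
        curve = extent_curve(series, int(l), k_max)
        curve = curve[np.isfinite(curve)]
        scores[int(l)] = float(np.mean(curve / curve.max()))
    return scores


# Toy series: noise with a fixed, non-periodic pattern of length 50 implanted 4 times.
rng = np.random.default_rng(1)
series = rng.normal(size=600)
shape = rng.normal(size=50).cumsum()
for pos in (50, 200, 350, 500):
    series[pos:pos + 50] += 2 * (shape - shape.mean()) / shape.std()
scores = score_lengths(series, (25, 50, 100), k_max=6)
best_length = min(scores, key=scores.get)
print(scores, best_length)
```

Normalizing each curve by its maximum makes lengths comparable even though raw z-normalized distances grow with l. The heuristic is only a rough proxy and can disagree with the package's own choice.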
We first extract meaningful motif lengths (l) from this use case:
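import numpy as np
from motiflets.plotting import Motiflets  # import path as used in the project's notebooks (assumption)
# The Motiflets-class; ds_name, series, distance, df_gt and n_jobs are supplied by the user
ml = Motiflets(
    ds_name,   # the name of the series
    series,    # the data
    distance,  # distance measure used, default: z-normed ED
    df_gt,     # ground truth, if available
    n_jobs     # number of jobs (cores) to be used
)
k_max = 20                             # largest motif set size k considered
length_range = np.arange(25, 200, 25)  # candidate motif lengths l (in samples)
motif_length = ml.fit_motif_length(k_max, length_range)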
The plot shows that meaningful motifs are within a range of 0.8s to 1s, equal to roughly a heartbeat rate of 60-80 bpm.
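The conversion behind this statement is simple: a motif of l samples recorded at sampling rate f spans l / f seconds, and one beat every T seconds corresponds to 60 / T bpm. The sketch below assumes the 128 Hz sampling rate stated for the ECG Heartbeats dataset in the dataset list and converts the candidate lengths from `length_range`. The helper names are hypothetical.

```python
import numpy as np

SAMPLING_RATE_HZ = 128  # assumed: rate stated for the ECG Heartbeats dataset


def samples_to_seconds(n_samples, rate_hz=SAMPLING_RATE_HZ):
    """Duration in seconds covered by n_samples at the given sampling rate."""
    return n_samples / rate_hz


def period_to_bpm(period_s):
    """Heart rate in beats per minute for one beat every period_s seconds."""
    return 60.0 / period_s


length_range = np.arange(25, 200, 25)  # candidate motif lengths from the example
for l in length_range:
    seconds = samples_to_seconds(int(l))
    print(f"l = {int(l):3d} samples -> {seconds:.2f} s -> {period_to_bpm(seconds):.0f} bpm")
```

For example, 100 and 125 samples correspond to about 0.78 s and 0.98 s, i.e. roughly 77 and 61 bpm, while periods of 0.8 s and 1 s correspond to 75 and 60 bpm.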
To extract meaningful motif sizes (k) from this use case, we run
dists, candidates, elbow_points = ml.fit_k_elbow(
    k_max,
    motif_length
)
The variable elbow_points holds the characteristic motif sizes found.
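Elbow points are values of k at which the extent of the best motif set jumps sharply once one more occurrence is added. As a rough illustration only (the package's own elbow criterion may differ), the hypothetical `find_elbows` below flags k when the increase from k to k+1 is at least `ratio` times the increase from k-1 to k. The toy extent curve is made up to mimic jumps after 6 and 16 occurrences. It is not output of `fit_k_elbow`.

```python
import numpy as np


def find_elbows(extents, k_start=2, ratio=2.0):
    """Hypothetical elbow heuristic on an extent curve.

    extents[i] is the extent of the best motif set of size k = k_start + i.
    k is reported when the increase from k to k + 1 is at least `ratio`
    times the increase from k - 1 to k.
    """
    extents = np.asarray(extents, dtype=float)
    diffs = np.diff(extents)  # diffs[i] = extent(k_start + i + 1) - extent(k_start + i)
    elbows = []
    for i in range(1, len(diffs)):
        previous = max(diffs[i - 1], 1e-12)  # guard against flat segments
        if diffs[i] / previous >= ratio:
            elbows.append(k_start + i)
    return elbows


# Made-up extent curve for k = 2..20 with jumps after k = 6 and k = 16.
k_values = np.arange(2, 21)
toy_extents = np.select(
    [k_values <= 6, k_values <= 16, k_values > 16],
    [1.0 + 0.1 * (k_values - 2), 3.0 + 0.1 * (k_values - 7), 6.0 + 0.1 * (k_values - 17)],
)
print(find_elbows(toy_extents))  # -> [6, 16]
```

A ratio test on consecutive slopes is one simple way to detect elbows. It is scale-free, but sensitive to noise in the extent curve, so real curves may need smoothing or a stricter `ratio`.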
Elbow points represent meaningful motif sizes. Here,
We finally plot these motifs:
The first repetitions perfectly match the calibration signal (orange), while the latter 16 repetitions perfectly match the ECG waves (green).
Data Sets: We collected challenging real-life datasets to assess the quality and scalability of MD algorithms. An overview of the datasets can be found in Table 2 of our paper.
- Jupyter-Notebook Use Cases for k-Motiflets: highlights all use cases from the paper and shows the unique ability of k-Motiflets to learn their parameters from the data and find interesting motif sets.
- Jupyter-Notebook Vanilla Ice - Ice Ice Baby: This TS is extracted from the pop song Ice Ice Baby by Vanilla Ice using the 2nd MFCC channel sampled at 100 Hz. The song is particularly famous, as it is alleged to have copied its riff from "Under Pressure" by Queen and David Bowie. It contains 20 repeats of the riff in 5 blocks, with each riff being 3.6-4 s long.
- Jupyter-Notebook Muscle Activation was collected from professional in-line speed skating on a large motor-driven treadmill, with electromyography (EMG) data of multiple movements. It consists of 29,899 measurements at 100 Hz, corresponding to 30 s in total. The known motifs are the muscle movement and a recovery phase.
- Jupyter-Notebook ECG Heartbeats contains the heartbeats of a patient (ID 71) from the LTAF database. It consists of 3,000 measurements at 128 Hz, corresponding to 23 s. The heart rate is around 60 to 80 bpm. There are two motifs: a calibration signal and the actual heartbeats.
- Jupyter-Notebook Physiodata - EEG sleep data contains a recording of an afternoon nap of a healthy, nonsmoking person between 20 and 40 years old [10]. Data was recorded with an extrathoracic strain belt. The dataset consists of 269,286 points at 100 Hz, corresponding to 45 min. Known motifs are so-called sleep spindles and K-complexes.
- Jupyter-Notebook Industrial Winding Process is a snapshot of a process where a plastic web is unwound from a first reel (unwinding reel), goes over a second traction reel and is finally rewound on a third rewinding reel. The recordings correspond to the angular speed of the second (traction) reel. The data contains 2,500 points sampled every 0.1 s, corresponding to 250 s. No documented motifs exist.
- Jupyter-Notebook Functional near-infrared spectroscopy (fNIRS) contains brain imaging data recorded as light intensity at 690 nm. There are 208,028 measurements in total. The data is known to be a difficult example, as it contains four motion artifacts, caused by movements of the patient, which dominate MD. No documented motifs exist.
- Jupyter-Notebook Semi-Synthetic with implanted Ground Truth: One example series from our 25 semi-synthetic time series. To measure the precision of the different MD methods, we created a semi-synthetic dataset using the first 25 datasets of an anomaly benchmark and implanted motif sets of varying sizes $k \in [5, \dots, 10]$ of fixed length $l = 500$.
- Jupyter-Notebook Full results for the Semi-Synthetic Dataset with implanted Ground Truth: To measure the precision of the different MD methods, we created a semi-synthetic dataset using the first 25 datasets of an anomaly benchmark and implanted motif sets of varying sizes $k \in [5, \dots, 10]$ of fixed length $l = 500$.
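The following is a hypothetical sketch of implanting one motif set, not the authors' procedure. It overwrites k non-overlapping, randomly placed windows of length l = 500 with noisy copies of a pattern, with k drawn from [5, 10]. The random-walk base series, the pattern, the 5% noise level and the rescaling to the series' standard deviation are all assumptions made for illustration.

```python
import numpy as np


def implant_motif_set(series, motif, k, rng, noise_level=0.05):
    """Hypothetical sketch: overwrite k non-overlapping, randomly placed windows
    of `series` with noisy, rescaled copies of `motif`.

    Returns the modified copy of the series and the sorted start offsets.
    """
    out = np.asarray(series, dtype=float).copy()
    motif = np.asarray(motif, dtype=float)
    n, l = len(out), len(motif)
    if k * l > n:
        raise ValueError("series too short for k non-overlapping copies")
    # Sorted cut points in [0, free] plus i * l keep consecutive copies >= l apart.
    free = n - k * l
    cuts = np.sort(rng.integers(0, free + 1, size=k))
    offsets = [int(cuts[i]) + i * l for i in range(k)]
    # Rescale the z-normalized motif to the series' spread and local level.
    scale = out.std() if out.std() > 0 else 1.0
    motif_z = (motif - motif.mean()) / motif.std()
    for off in offsets:
        level = out[off:off + l].mean()
        noise = rng.normal(scale=noise_level * scale, size=l)
        out[off:off + l] = level + scale * motif_z + noise
    return out, offsets


rng = np.random.default_rng(42)
base = np.cumsum(rng.normal(size=20_000))  # stand-in for a benchmark series (random walk)
pattern = np.sin(np.linspace(0, 6 * np.pi, 500)) + np.linspace(0, 1, 500)
k = int(rng.integers(5, 11))  # motif set size k in [5, 10]
series, offsets = implant_motif_set(base, pattern, k, rng)
print(k, offsets)
```

Placing copies via sorted cut points guarantees non-overlap without rejection sampling. The returned offsets then serve as the ground truth against which discovered motif sets can be scored.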
If you use this work, please cite as:
@article{motiflets2022,
title={Motiflets - Simple and Accurate Detection of Motifs in Time Series},
author={Schäfer, Patrick and Leser, Ulf},
journal={Proceedings of the VLDB Endowment},
volume={16},
number={4},
pages={725--737},
year={2022},
publisher={PVLDB}
}
Link to the paper.