This code implements a sequential pattern mining (SPM) algorithm using a breadth-first search approach. SPM finds the frequent subsequences in a given dataset of sequences.
1. make
2. ./gsp_cpu <frequency> <input file> <Dumping candidates (yes = 1, No = 0)> <Dumping results or frequent candidates (yes = 1, No = 0)> <Allow gap between itemsets (yes = 1, No = 0)>
You can set "Allow gap between itemsets" to "0" to mine only the frequent consecutive itemsets.
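The difference between the two gap settings can be illustrated with a small Python sketch. This is not the tool's code: the function name `contains` and the exact matching rule are assumptions made for illustration. It assumes a pattern itemset matches a sequence itemset when it is a subset of it, and that "no gap" means the matched itemsets must be adjacent in the sequence.

```python
def contains(sequence, pattern, allow_gap):
    """Return True if `pattern` occurs in `sequence` (both are lists of itemsets).

    Hypothetical helper for illustration; the matching rule is an assumption.
    """
    seq_sets = [set(s) for s in sequence]
    pat_sets = [set(p) for p in pattern]
    if allow_gap:
        # Greedy scan: match each pattern itemset against the earliest
        # remaining sequence itemset that contains it.
        matched = 0
        for s in seq_sets:
            if matched < len(pat_sets) and pat_sets[matched] <= s:
                matched += 1
        return matched == len(pat_sets)
    # No gaps: the pattern itemsets must line up with adjacent sequence itemsets.
    n, m = len(seq_sets), len(pat_sets)
    return any(
        all(pat_sets[j] <= seq_sets[start + j] for j in range(m))
        for start in range(n - m + 1)
    )


sequence = [[1, 2], [5], [3, 4]]
pattern = [[1], [3]]
print(contains(sequence, pattern, allow_gap=True))   # True
print(contains(sequence, pattern, allow_gap=False))  # False
```

In this example the pattern is found only when gaps are allowed, because the itemset containing 5 sits between the two matching itemsets.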
Sample input file
1 2 -1 3 4 -1 -2
5 6 -1 -2
-2
'-1' is a delimiter between itemsets.
'-2' is a delimiter between sequences.
An additional '-2' should be added as the last line of the file, as in the sample above.
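As a minimal sketch of how this input format could be read, the Python snippet below parses the sample into a list of sequences, each a list of itemsets. The function name `parse_sequences` is hypothetical and not part of the tool. It assumes item IDs are non-negative integers separated by whitespace.

```python
def parse_sequences(text):
    """Parse the input format into a list of sequences (each a list of itemsets).

    Hypothetical helper for illustration; assumes non-negative integer item IDs.
    """
    sequences = []
    current_sequence = []
    current_itemset = []
    for token in text.split():
        value = int(token)
        if value == -1:
            # '-1' closes the current itemset.
            current_sequence.append(current_itemset)
            current_itemset = []
        elif value == -2:
            # '-2' closes the current sequence; the final lone '-2'
            # closes an empty sequence, which is skipped.
            if current_sequence:
                sequences.append(current_sequence)
            current_sequence = []
        else:
            current_itemset.append(value)
    return sequences


sample = "1 2 -1 3 4 -1 -2\n5 6 -1 -2\n-2\n"
print(parse_sequences(sample))  # [[[1, 2], [3, 4]], [[5, 6]]]
```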
Sample output file
1-2--1-3
5--1
'-' is a delimiter to separate items in one itemset (subsequence). '--' is a delimiter to separate itemsets in the sequence.
GPU and multi-threaded CPU implementations of the code are available. Please contact "elaheh@virginia.edu" for more information.
Please cite the following papers if you are using this tool for your research.