- Author: Vaux Gomes
- Contact: vauxgomes@gmail.com
- Version: 0.1
Here we have implementations of LAC, Adaboost, a version of Confidence-rated Adaboost, and the SLIPPER algorithm. To mine the association rules, one can use the d-peeler, multidupehack, or lcm tools without having to adapt the code. Adapting the code to other miners, however, is not very complicated.
LAC (Lazy Associative Classification) is a rule-based, demand-driven, lazy machine learning algorithm. For each test instance, the algorithm projects the data onto the region where that instance lies. In effect, it decomposes the problem of fitting a single function that explains the whole data into many smaller problems. Indeed, for a given test set, it is possible that not all regions of the data will be explored by the algorithm. LAC predicts the class of a test instance by averaging the confidence values of the induced rules and taking a majority vote among the rules' classes.
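To make the prediction step concrete, here is a minimal sketch of it in Python; `Rule` and `lac_predict` are illustrative names, not this repository's actual API, and the lazy rule induction itself is omitted:

```python
from collections import defaultdict

class Rule:
    """Hypothetical association rule: a set of items implying a class."""
    def __init__(self, items, label, confidence):
        self.items = frozenset(items)
        self.label = label
        self.confidence = confidence

def lac_predict(instance, rules):
    """Average rule confidences per class and vote for the best average.

    `rules` are assumed to have been induced, lazily, from the projection
    of the training data onto the items of `instance`."""
    items = set(instance)
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rule in rules:
        if rule.items <= items:  # the rule applies to this test instance
            totals[rule.label] += rule.confidence
            counts[rule.label] += 1
    if not counts:
        return None  # no applicable rule: abstain
    return max(counts, key=lambda c: totals[c] / counts[c])
```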
Boosting is a method for improving the accuracy of machine learning algorithms; it combines classifiers by assigning them voting influence values (or simply, weights). Essentially, boosting builds an additive model by iteratively combining many classifiers, the so-called weak hypotheses, all generated by a base learner.
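For concreteness, here is a compact sketch of this additive scheme as it appears in Discrete Adaboost, assuming binary labels in {-1, +1}; `base_learner` stands for an arbitrary weak learner and is not this repository's interface:

```python
import math

def adaboost(examples, labels, base_learner, rounds):
    """Discrete Adaboost: reweight examples, combine hypotheses additively."""
    n = len(examples)
    weights = [1.0 / n] * n
    ensemble = []  # (voting influence, weak hypothesis) pairs
    for _ in range(rounds):
        h = base_learner(examples, labels, weights)
        error = sum(w for w, x, y in zip(weights, examples, labels)
                    if h(x) != y)
        if error == 0.0 or error >= 0.5:
            break  # perfect or useless weak hypothesis: stop early
        alpha = 0.5 * math.log((1 - error) / error)  # voting influence
        ensemble.append((alpha, h))
        # Raise the weight of misclassified examples, then renormalize
        weights = [w * math.exp(-alpha * y * h(x))
                   for w, x, y in zip(weights, examples, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return lambda x: 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1
```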
The Confidence-rated Adaboost algorithm is an adaptation of the original Discrete Adaboost that allows classifiers to express a degree of certainty in their predictions.
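In this variant a weak hypothesis returns a real value whose sign is the predicted class and whose magnitude is the certainty. A minimal sketch of the round weight under Schapire and Singer's formulation, assuming hypotheses ranging over [-1, +1] (the function name is illustrative):

```python
import math

def confidence_rated_alpha(weights, margins):
    """Voting influence of one confidence-rated round.

    margins[i] = y_i * h(x_i), with h(x) in [-1, +1]. With
    r = sum_i w_i * y_i * h(x_i), the round weight is
    0.5 * ln((1 + r) / (1 - r))."""
    r = sum(w * m for w, m in zip(weights, margins))
    return 0.5 * math.log((1 + r) / (1 - r))
```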
SLIPPER is a rule-based algorithm that uses Adaboost's method to build sets of rules.
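As a hedged illustration, the smoothed rule confidence from the original SLIPPER paper (Cohen and Singer, 1999) can be sketched as follows, where `w_plus` and `w_minus` stand for the total weights of the positive and negative training examples covered by a rule:

```python
import math

def slipper_confidence(w_plus, w_minus, n):
    """Smoothed confidence with which a SLIPPER rule votes.

    The smoothing term 1 / (2n), with n training examples, keeps the
    confidence finite when a rule covers no negative (or no positive)
    examples."""
    eps = 1.0 / (2.0 * n)
    return 0.5 * math.log((w_plus + eps) / (w_minus + eps))
```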
Options of the `main.py` module:

- `-s`: Training set files (required)
- `-t`: Testing set file (required)
- `-i`: Itemsets file (required)
- `-b`: Maximum number of rounds
- `-z`: Original sizes of each class
- `-w`: Pre-set weights
- `-Z`: ZERO Adaboost
- `-A`: Associative Classifier
- `-D`: Discrete Adaboost
- `-C`: Confidence-rated Adaboost
- `-S`: SLIPPER Classifier
- `-free`: Use free itemsets
- `-rmode`: Associative Classifier
- `-seed`: Seed for the random objects
There is a file called `settings.py` in the `utils` directory. Within that file there are a few variables that can be set:
- `RANDOM_SEED`: Seed for controlling the random objects
- `MIN_ROUNDS`: Minimum number of rounds for the Adaboost algorithms
- `MAX_ROUNDS`: Maximum number of rounds for the Adaboost algorithms
- `GAMMA`: Gamma value for the Discrete Adaboost algorithms (see Adaboost)
- `kICV`: Number of internal cross validations of the SLIPPER algorithm
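Purely as an illustration, `utils/settings.py` could look like the sketch below; the values are placeholders, not the repository's defaults:

```python
# utils/settings.py -- illustrative placeholder values
RANDOM_SEED = 42   # seed controlling the random objects
MIN_ROUNDS = 10    # minimum number of Adaboost rounds
MAX_ROUNDS = 100   # maximum number of Adaboost rounds
GAMMA = 0.1        # gamma for the Discrete Adaboost algorithms
kICV = 5           # internal cross validations of the SLIPPER algorithm
```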
```sh
# Classes: TMP.class.0 TMP.class.1
# Itemsets: TMP.itemsets
# Test: TMP.testset
$ main.py -s TMP.class.* -t TMP.testset -i TMP.itemsets

# Calling Discrete Adaboost routine
$ main.py -s TMP.class.* -t TMP.testset -i TMP.itemsets -D
```
The `runner` script is used to mine rules for given train and test sets and, after that, to call the booster module. Its options are:
- `-h`: Displays help
- `-s`: Training files
- `-t`: Testing files
- `-z`: MINER: Minimum support size
- `-m`: MINER: Use Multidupehack to mine the itemsets (default: D-peeler)
- `-l`: MINER: Use LCM to mine the itemsets (default: D-peeler)
- `-e`: BOOSTER: Activates Eager Mode
- `-Z`: BOOSTER: Deactivates ZERO classifier
- `-A`: BOOSTER: Deactivates Associative Classifier
- `-D`: BOOSTER: Activates Discrete Adaboost
- `-C`: BOOSTER: Activates Conf-rated Adaboost
- `-S`: BOOSTER: Activates SLIPPER Boost
- `-o`: BOOSTER: Activates use of original train size
- `-j`: BOOSTER: Activates use of Jaccard's index
- `-b`: BOOSTER: Maximum number of rounds
- `-f`: BOOSTER: Uses only free itemsets
Note: This code works only for LUCS-KDD files.
```sh
$ ./runner.sh -s train -t test

# Calling Discrete Adaboost in the eager mode using multidupehack
$ ./runner.sh -s train -t test -emD
```
Note: It is advisable to always pass the option `-f` together with the option `-m`.
This script runs a battery of datasets using the `runner` script. See the `array` variable in the script. Its options are:
- `-h`: Displays help
- `-a`: Use multi-class and binary class problem datasets
- `-x`: File extension name*
- `-m`: MINER: miner options, according to the runner script*
- `-p`: Path for result outputs
- `-l`: Progress log file (default: .batt)
- `-n`: Show notifications (default: false)
```sh
# Calling a battery using options C and D of the main module
$ ./battery.sh -x ext -m "-CD"
```
The input is formed of a series of lines, one per instance, in the following format:

```
<int_features> <class>
```

The LUCS-KDD format fits very well!
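For instance, a made-up binary problem, with integer-coded items and the class as the last integer on each line, could look like:

```
1 5 9 10
2 5 8 10
3 6 9 11
```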
The output is formed of a header in the format:

```
# Mode: <Lazy/Eager>
# Miner: <Multidupehack/D-peeler/Lcm>
# Train: <train file>
# Test: <test file>
#
# Date: Wed Nov 2 01:06:52 BRST 2016
# Host: ubt13z
```
followed by an empty line and a set of whitespace-separated lines representing the predictions of the algorithms. Each line follows the format:

```
<correct_class> ~<alg1_name> <pred1_alg1> ... <predN_alg1> ... ~<algM_name> <pred1_algM> ... <predN_algM>
```
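As a purely illustrative example (algorithm names and values made up), a line for an instance whose correct class is 1, with two algorithms reporting three predictions each, could read:

```
1 ~lac 1 1 0 ~slipper 1 1 1
```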