Mulan is an open-source Java library for learning from multi-label datasets; including both Multi-Label Classification (MLC) and Multi-Target Regression (MTR) methods [Tso11]. The Mulan library only provides a Java API, but no any GUI or command-line interface to execute its methods.
In this repository, we created an executable .jar file to execute MLC and MTR methods from the Mulan library. The aim of this project was to have a personal way to execute experiments with the different methods as simple as possible. The .jar file is available to download at the latest release. This code was implemented for its use in the article [Moy18].
To execute it, the following command have to be executed in a console:
java -jar ExecuteMulan_1.3.jar [parameters]
The jar file allows several parameters to indicate the method to execute, the dataset, and so on. Some of the parameters are common to all or almost all algorithms, and are the following:
- With the
-t
parameter, we define the .arff filename of the training dataset. In case of performing CV procedure, it defines the full set. - With the
-T
parameter, we define the .arff filename of the test dataset. It should not be used if CV procedure is used. If not CV is performed and test data is not given, train data is used as default for testing too. - With the
-x
parameter, we define the .xml filename of the dataset (know more about the format of the datasets). - With the
-a
parameter, the MLC or MTR method to be executed is indicated. Below is provided a list with all the available methods in this jar. - With the
-f
parameter, we define the number of folds for the cross-validation procedure. If it is not set, the CV procedure is not performed, and both train and test are used. - With the
-o
parameter, we define the filename of the output file where the results are stored as a .csv file. - With the
-i
parameter, the number of different seeds for random numbers (for those algorithms that need) is indicated; i.e., the number of executions for those algorithms that use random numbers. If this parameter is passed to a method that does not use random numbers, it is omitted. By default, its value is 1. - If the
-l
flag is included, the values of the macro-averaged measures for each label are reported. By default, they are not reported.
Each method also accept a number of different parameters to tune them. The methods printDefaultParameters() and printParametersDescription() of the LearnerParameters class are very helpful to understand which parameters need each of the method. However, we present some of the most common here:
- With the
-c
parameter, we indicate the base learner to use in the given method. Allowed values for this parameter are J48 (J48/C4.5 decision tree), RT (Random Tree), SMO (SMO Support Vector) for multi-label classification algorithms, and also REPTree for multi-target regression algorithms. However, if another base learner is needed to be used, the full java classname for the desired base learner can be given, as for example:-c weka.classifiers.trees.DecisionStump
. - For ensemble-based methods, with the parameter
-n
we indicate the number of members to use in the ensemble.
The full list of methods availables in this jar is provided below. It is shown the value of the -a
parameter and the full name of the method:
- Multi-label classification. Problem Transformation methods:
BR
: Binary RelevanceCC
: Classifier ChainsCLR
: Calibrated Label RankingLP
: Label PowersetLPBR
: LP-BR (a.k.a. Chi-Dep)PS
: Pruned Sets
- Multi-label classification. Algorithm Adaptation methods:
AdaBoostMH
: AdaBoost.MHBPMLL
: Back-Propagation for Multi-Label LearningIBLR
: Instance-Based Logistic Regression for Multi-Label LearningMLkNN
: Multi-Label k-Nearest Neighbors
- Multi-label classification. Ensembles of Multi-Label Classifiers:
CDE
: Chi-Dep EnsembleEBR
: Ensemble of Binary RelevancesECC
: Ensemble of Classifier ChainsELP
: Ensemble of Label PowersetsEMLS
: Ensemble of Multi-Label SamplingEPS
: Ensemble of Pruned SetsHOMER
: Hierarchy Of Multilabel classifiERsMLS
: Multi-Label StackingRAkEL
: RAndom k-labELsetsRFPCT
: Random Forest of Predictive Clustering Trees
- Multi-target regression methods:
ERC
: Ensemble of Regressor ChainsRC
: Regressor ChainRLC
: Random Linear Combinations (Normalized)ST
: Single-TargetSST
: Stacked ST
Two datasets have been included in the repository as example: Emotions [Tso08] (for multi-label classification), and andro [Hat08] (for multi-target regression); a wide variety of dataset are available at the KDIS Research Group Repository and the Mulan webpage.
[Hat08] E. V. Hatzikos, G. Tsoumakas, G. Tzanis, N. Bassiliades, and I. Vlahavas. (2008). An empirical study on sea water quality prediction. Knowledge-Based Systems, 21(6), 471-478.
[Tso08] G. Tsoumakas, I. Katakis, and I. Vlahavas. (2008). Effective and Efficient Multilabel Classification in Domains with Large Number of Labels. In Proc. ECML/PKDD 2008 Workshop on Mining Multidimensional Data (MMD’08), 53-59.
[Tso11] G. Tsoumakas, E. Spyromitros-Xioufis, J. Vilcek, and I. Vlahavas. (2011). Mulan: A java library for multi-label learning. Journal of Machine Learning Research, 12, 2411-2414.
[Moy18] J. M. Moyano and E. L. Gibaja and K. J. Cios and S. Ventura. (2018). Review of ensembles of multi-label classifiers: Models, experimental study and prospects. Information Fusion, 44, 33-45.