A paired comparison of methods in machine learning is a direct, head-to-head comparison between two models or algorithms across multiple tasks or datasets. The goal is to determine which model performs better according to a given metric, such as accuracy, precision, recall, or any other relevant performance measure. In this context, each dataset serves as a paired observation on which the performance of one model is directly compared to that of another. This method is particularly useful for understanding the relative strengths and weaknesses of different models in specific scenarios.
If you use this package, please cite:

```bibtex
@misc{pairComparison2024,
  author = {Elaine Cecília Gatto},
  title  = {pairComparison: A package for performing comparisons between pairs of methods},
  year   = {2024},
  note   = {R package version 0.1.0. Licensed under CC BY-NC-SA 4.0},
  doi    = {10.13140/RG.2.2.28587.04642},
  url    = {https://github.com/cissagatto/pairComparison}
}
```
The goal of this approach is to generate a matrix that records, for each pair of methods, on how many datasets one method outperforms the other. The comparison rule depends on the direction of the evaluation metric.
For evaluation metrics where a higher value indicates better performance (e.g., accuracy, F1-score):
- Comparison rule: if the value of $m_{i}$ is greater than the value of $m_{j}$ on a specific dataset, count that dataset as a win for $m_{i}$ (assign a score of 1); otherwise, assign a score of 0.

Formally, for each pair of methods $(m_{i}, m_{j})$ and each dataset $d$:

$$S_{ij}(d) = \begin{cases} 1 & \text{if } v_{i}(d) > v_{j}(d) \\ 0 & \text{otherwise} \end{cases}$$

where $v_{i}(d)$ denotes the performance value of method $m_{i}$ on dataset $d$. Here, the matrix entry for the pair $(m_{i}, m_{j})$ is the total number of wins, $W_{ij} = \sum_{d=1}^{D} S_{ij}(d)$, where $D$ is the number of datasets.
For evaluation metrics where a lower value indicates better performance (e.g., Hamming loss):
- Comparison rule: if the value of $m_{i}$ is less than the value of $m_{j}$ on a specific dataset, count that dataset as a win for $m_{i}$ (assign a score of 1); otherwise, assign a score of 0.

Formally, for each pair of methods $(m_{i}, m_{j})$ and each dataset $d$:

$$S_{ij}(d) = \begin{cases} 1 & \text{if } v_{i}(d) < v_{j}(d) \\ 0 & \text{otherwise} \end{cases}$$

where $v_{i}(d)$ is the performance value of method $m_{i}$ on dataset $d$. Here, as before, $W_{ij} = \sum_{d=1}^{D} S_{ij}(d)$.
The code can also perform additional comparisons to calculate:
- $m_{i} \geq m_{j}$: the number of datasets where the performance value of $m_{i}$ is greater than or equal to that of $m_{j}$.
- $m_{i} \leq m_{j}$: the number of datasets where the performance value of $m_{i}$ is less than or equal to that of $m_{j}$.
- $m_{i} = m_{j}$: the number of datasets where the performance values of $m_{i}$ and $m_{j}$ are equal.
These additional comparisons can be useful for other types of analysis, such as determining ties or dominance in a set of models.
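As a rough illustration of how all of these counts can be computed, here is a minimal sketch in base R. It is not the package's internal implementation; the data frame `perf` and the helper `pairwise.count` are hypothetical.

```r
# Toy performance table: one row per dataset, one column per method
perf <- data.frame(
  Model_1 = c(0.91, 0.85, 0.78),
  Model_2 = c(0.88, 0.86, 0.75),
  Model_3 = c(0.90, 0.80, 0.78)
)

# Count, for every ordered pair of methods, the datasets where `op` holds
pairwise.count <- function(perf, op) {
  methods <- colnames(perf)
  k <- length(methods)
  counts <- matrix(0L, nrow = k, ncol = k, dimnames = list(methods, methods))
  for (i in seq_len(k)) {
    for (j in seq_len(k)) {
      counts[i, j] <- sum(op(perf[[i]], perf[[j]]))
    }
  }
  counts
}

pairwise.count(perf, `>`)   # wins, for higher-is-better measures
pairwise.count(perf, `<`)   # wins, for lower-is-better measures
pairwise.count(perf, `>=`)  # greater-or-equal counts
pairwise.count(perf, `==`)  # ties
```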
The final output is a comparison matrix in which the cell at row $m_{i}$ and column $m_{j}$ holds the number of datasets on which $m_{i}$ wins against $m_{j}$. For example:
|         | Model_1 | Model_2 | Model_3 | Model_4 |
|---------|---------|---------|---------|---------|
| Model_1 | 14      | 9       | 5       | 6       |
| Model_2 | 7       | 14      | 2       | 4       |
| Model_3 | 9       | 12      | 14      | 9       |
| Model_4 | 8       | 10      | 5       | 14      |
- Interpretation:
- The value in the cell at row "Model_1" and column "Model_2" is 9. This means that Model_1 outperforms Model_2 on 9 datasets.
- Similarly, the value in the cell at row "Model_2" and column "Model_3" is 2, indicating that Model_2 outperforms Model_3 on 2 datasets.
In machine learning, paired comparisons help in:
- Model Selection: By comparing models pairwise across datasets, you can identify which model is generally better or more consistent.
- Understanding Performance Variability: Some models may perform exceptionally well on certain datasets but poorly on others. Paired comparisons allow for the identification of such patterns.
- Statistical Testing: Paired comparisons are also the basis for statistical tests, such as the Wilcoxon signed-rank test or the Friedman test, which help to determine if the observed differences in performance are statistically significant.
In summary, paired comparisons provide a systematic way to evaluate and compare the performance of multiple models across different datasets, helping practitioners make informed decisions in model selection and evaluation.
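To make the statistical-testing point concrete, here is a small illustration using base R's `wilcox.test` and `friedman.test`; the accuracy values in `acc` are invented for the example.

```r
# Hypothetical accuracies of three models on the same six datasets
acc <- data.frame(
  Model_1 = c(0.91, 0.85, 0.78, 0.88, 0.90, 0.83),
  Model_2 = c(0.88, 0.86, 0.75, 0.84, 0.89, 0.80),
  Model_3 = c(0.90, 0.80, 0.80, 0.85, 0.87, 0.82)
)

# Paired Wilcoxon signed-rank test between two models
wilcox.test(acc$Model_1, acc$Model_2, paired = TRUE)

# Friedman test across all models (rows = datasets/blocks, columns = methods)
friedman.test(as.matrix(acc))
```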
First, install the package from GitHub:
```r
# install.packages("devtools")
library("devtools")
devtools::install_github("https://github.com/cissagatto/pairComparison")
```
Then load the package and run a comparison for a single CSV file:

```r
library(pairComparison)

# Method names as they appear in the CSV columns
names.methods <- c("Model_1", "Model_2", "Model_3", "Model_4")

# Full path of the CSV file to be processed
filename <- "~/pairComparison/data/accuracy.csv"

# FolderData and FolderResults are assumed to be defined
# (see the folder structure described at the end of this document)
results <- pair.comparison(filename = filename,
                           FolderOrigin = FolderData,
                           FolderDestiny = FolderResults,
                           measure.name = "accuracy",
                           names.methods = names.methods)

print(results$Accuracy$greater_or_equal)
print(results$Accuracy$greater)
print(results$Accuracy$less_or_equal)
print(results$Accuracy$less)
print(results$Accuracy$equal)
```
To process every measure at once, point `pair.comparison.all.measures` at all the CSV files in the data folder:

```r
# Set the working directory to the folder containing the CSV files
setwd(FolderData)

# Get the list of all CSV files in the directory
files <- list.files(pattern = "\\.csv$", full.names = TRUE)

# Normalize file paths for consistency
full_paths <- sapply(files, normalizePath)

# Extract measure names from file paths
extract_measure_names <- function(file_paths) {
  # Extract file names from the paths
  file_names <- basename(file_paths)
  # Remove file extensions to get the measure names
  measure_names <- tools::file_path_sans_ext(file_names)
  return(measure_names)
}

measure_names <- extract_measure_names(full_paths)
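# For example, hypothetical inputs "accuracy.csv" and "hamming-loss.csv"
# would yield the measure names "accuracy" and "hamming-loss".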
# Perform comparison for all measures
results = pair.comparison.all.measures(names.csvs = full_paths,
FolderOrigin = FolderData,
FolderDestiny = FolderResults,
names.methods = names.methods,
names.measures = measure_names)
print(results$greater_or_equal)
print(results$greater)
print(results$less_or_equal)
print(results$less)
print(results$equal)
The saved matrices can then be rendered as heatmaps, one per measure:

```r
library(dplyr)

# Set the base path
base_path <- "~/pairComparison/results"
setwd(base_path)

# Get directory names and paths
directories <- list.dirs(full.names = TRUE, recursive = FALSE)
directory_names <- basename(directories)

# Get the measure types (1 = higher is better, otherwise lower is better)
measurements <- pc.measures()

# Define the desired order of methods in the heatmap
desired_order <- c("Lo", "G", "H.Ra", "NH.Ra", "H.J.K1", "H.Ro.K1",
                   "H.J.K2", "H.Ro.K2", "H.J.K3", "H.Ro.K3",
                   "H.J.T0", "H.Ro.T0", "H.J.T1", "H.Ro.T1",
                   "NH.J.K1", "NH.Ro.K1", "NH.J.K2", "NH.Ro.K2",
                   "NH.J.K3", "NH.Ro.K3", "NH.J.T0", "NH.Ro.T0",
                   "NH.J.T1", "NH.Ro.T1")

# Function to process each directory
process_directory <- function(index) {
  # Get the current directory path and its corresponding measure
  current_path <- directories[index]
  measurement_name <- directory_names[index]

  # Load the measure type for the current directory
  measurement_type <- filter(measurements, names == measurement_name)$type
  suffix <- ifelse(measurement_type == 1, "greater", "less")

  # Build the file name and path
  file_name <- paste(measurement_name, "-", suffix, "-datasets.csv", sep = "")
  file_path <- file.path(current_path, file_name)

  # Read the data
  data <- read.csv(file_path)
  names(data)[1] <- "Method1"  # Rename the first column

  # Generate the heatmap
  heatmap_plot <- pc.plot.heatmap(data, title = "Comparison Heatmap", desired_order)

  # Save the heatmap as a PDF
  output_folder <- file.path(base_path, measurement_name)
  dir.create(output_folder, showWarnings = FALSE, recursive = TRUE)  # Create the directory if it doesn't exist
  save.heatmap.as.pdf(heatmap_plot,
                      file_path = output_folder,
                      file_name = paste(measurement_name, "-", suffix, sep = ""),
                      width = 10, height = 6)

  gc()  # Trigger garbage collection to free memory
}

# Process each directory
lapply(seq_along(directories), process_directory)
```
For a given CSV file, the result might look like this:
```
,Model_1,Model_2,Model_3,Model_4
Model_1,14,9,5,6
Model_2,7,14,2,4
Model_3,9,12,14,9
Model_4,8,10,5,14
```
In this matrix:
- The cell at row `Model_1` and column `Model_2` shows `9`, meaning `Model_1` outperforms `Model_2` on 9 datasets.
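For quick inspection, such a file can be loaded back into R. A small sketch, assuming the matrix above was saved as `comparison.csv`:

```r
# Hypothetical read-back of the matrix shown above
m <- read.csv("comparison.csv", row.names = 1)
m["Model_1", "Model_2"]  # 9: datasets where Model_1 outperforms Model_2
```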
The `pair.comparison` function compares methods within a single CSV file by counting, for each pair of methods, the number of datasets on which one outperforms the other. It processes the given CSV file and saves the comparison results in a structured way inside a results folder.
- `filename`: A character string specifying the full path of the CSV file to be processed. The CSV should have a structure where each row represents a dataset and each column (except the first) holds a method's performance on that dataset.
- `FolderOrigin`: A character string specifying the path to the folder where the CSV file is located. This parameter is currently unused but is kept for compatibility with other functions.
- `FolderDestiny`: A character string specifying the path to the folder where results will be saved. The function will create a subfolder here for each measure.
- `measure.name`: A character string specifying the name of the measure being processed. This name is used to organize the results (see `pc.measures()`).
- `names.methods`: A character vector containing the names of the methods used as column names in the data. These names are used to label the results.
The function writes multiple CSV files with the comparison results to the specified folder. The results are stored in the following files:
- `greater-or-equal-datasets.csv`: the number of datasets on which each method's performance value is greater than or equal to the other method's.
- `greater-datasets.csv`: the number of datasets on which each method's performance value is greater than the other method's.
- `less-or-equal-datasets.csv`: the number of datasets on which each method's performance value is less than or equal to the other method's.
- `less-datasets.csv`: the number of datasets on which each method's performance value is less than the other method's.
- `equal-datasets.csv`: the number of datasets on which each method's performance value is equal to the other method's.
Ensure the following folder structure is set up:
- `FolderRoot`: root directory of the project.
- `FolderData`: directory where the CSV data files are stored.
- `FolderResults`: directory where results and plots are saved.
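One possible way to create this structure in R (the paths below are assumptions; adjust them to your environment):

```r
# Assumed layout; adjust to your environment
FolderRoot <- "~/pairComparison"
FolderData <- file.path(FolderRoot, "data")
FolderResults <- file.path(FolderRoot, "results")
dir.create(FolderData, showWarnings = FALSE, recursive = TRUE)
dir.create(FolderResults, showWarnings = FALSE, recursive = TRUE)
```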
For more detailed documentation on each function, check out the `~/pairComparison/docs` folder.
We welcome contributions from the community! If you have suggestions, improvements, or bug fixes, please submit a pull request or open an issue in the GitHub repository.
- This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.
- This study was financed in part by the Conselho Nacional de Desenvolvimento Científico e Tecnológico - Brasil (CNPQ) - Process number 200371/2022-3.
- The authors also thank the Brazilian research agency FAPESP for financial support.
For any questions or support, please contact:
- Prof. Elaine Cecilia Gatto (elainececiliagatto@gmail.com)