-
Here's the code for the above plots:

from collections import Counter
from itertools import cycle

import matplotlib.pyplot as plt
import numpy as np  # needed for np.round and np.log below
from sklearn.metrics import auc

plt.style.use('seaborn-v0_8-bright')

def perfprof_plot(df, perf_measure):
    """Plot a performance profile: P[alg_perf >= x] per algorithm for the given metric."""
    lines = ["-", "--", "-.", ":"]
    linecycler = cycle(lines)
    plt.figure(figsize=(20, 10))
    # One row per algorithm, one column per dataset.
    tab = df.pivot(index="algorithm", columns="dataset", values=perf_measure)
    n_problems = len(tab.columns)
    for name, v in tab.iterrows():
        # Keep only non-negative scores and sort them to build the profile.
        v = v[v >= 0].sort_values()
        n_gt0 = v.shape[0]
        perf_x = [0]
        perf_y = [n_gt0 / n_problems]
        # Walk through the distinct values in increasing order; at each value k,
        # record the fraction of problems on which this algorithm scores at least k.
        for k, v1 in Counter(v).items():
            if k == 0:
                n_gt0 = n_gt0 - v1
                continue
            perf_x.append(k)
            perf_y.append(n_gt0 / n_problems)
            n_gt0 = n_gt0 - v1
        plt.plot(perf_x, perf_y, next(linecycler),
                 label=f'{name}\n(AUC = {np.round(auc(perf_x, perf_y), 2)})')
    plt.xlabel(perf_measure, fontsize=18)
    plt.ylabel("P[alg_perf >= x]", fontsize=18)
    plt.legend(loc='upper center', bbox_to_anchor=(0.5, 1.1),
               ncol=10, fancybox=True, shadow=True, prop={'size': 16})

df_plot["log_model_size"] = np.log(df_plot.model_size)
algs = ["AFP", "uDSR", "FFX", "GP-GOMEA", "Operon", "PS-Tree", "ITEA"]

perfprof_plot(df_plot, "r2_test")
plt.savefig("r2_perf_full.png")
perfprof_plot(df_plot[df_plot.algorithm.isin(algs)], "r2_test")
plt.savefig("r2_perf.png")
perfprof_plot(df_plot[df_plot.algorithm.isin(algs)], "log_model_size")
plt.savefig("size_perf.png")
-
Since we will likely end the discussion about datasets tomorrow, it is time to start discussing how to compare algorithms. One idea we can borrow from optimization competitions is to use a performance profile (or a CDF of the performance) to compare algorithms on a single metric:
We can read this plot as: "what is the fraction of problems in which algorithm A achieves an R^2 greater than x?" This plot also has the convenience of handling cases with R^2 < 0 better than the error bars.
The AUC can give us an aggregated value to rank algorithms.
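As a rough sketch, the aggregated ranking could reuse the same profile construction as in the code above and return one AUC per algorithm (the function name perfprof_auc and the df_plot layout are assumptions for illustration):

from collections import Counter

import pandas as pd
from sklearn.metrics import auc

def perfprof_auc(df, perf_measure):
    """Return algorithms ranked by the area under their performance profile."""
    tab = df.pivot(index="algorithm", columns="dataset", values=perf_measure)
    n_problems = len(tab.columns)
    scores = {}
    for name, v in tab.iterrows():
        v = v[v >= 0].sort_values()
        n_gt0 = v.shape[0]
        perf_x, perf_y = [0], [n_gt0 / n_problems]
        for k, v1 in Counter(v).items():
            if k == 0:
                n_gt0 -= v1
                continue
            perf_x.append(k)
            perf_y.append(n_gt0 / n_problems)
            n_gt0 -= v1
        scores[name] = auc(perf_x, perf_y)
    return pd.Series(scores).sort_values(ascending=False)

# Example: perfprof_auc(df_plot, "r2_test") gives the algorithms ranked by AUC.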
Besides these plots, we should still keep the Pareto front of a pair of metrics and the histogram of how many times an algorithm obtained a rank k or higher; a sketch of both follows below.
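A minimal sketch of those two complementary views, again assuming the df_plot layout above (the helper names rank_counts and pareto_front, and the use of per-algorithm medians, are illustrative choices, not part of the original analysis):

import pandas as pd

def rank_counts(df, perf_measure, k):
    """Count, per algorithm, on how many datasets it reaches rank k or better.

    Intended for a performance metric such as r2_test, where higher is better.
    """
    tab = df.pivot(index="dataset", columns="algorithm", values=perf_measure)
    ranks = tab.rank(axis=1, ascending=False)  # rank 1 = best score on that dataset
    return (ranks <= k).sum().sort_values(ascending=False)

def pareto_front(df, metrics=("r2_test", "model_size")):
    """Return the algorithms not dominated on (higher r2_test, lower model_size),
    using the median of each metric per algorithm."""
    agg = df.groupby("algorithm").agg({metrics[0]: "median", metrics[1]: "median"})
    keep = []
    for a, row in agg.iterrows():
        r2, size = row[metrics[0]], row[metrics[1]]
        # a is dominated if some algorithm is at least as good on both metrics
        # and strictly better on at least one.
        dominated = ((agg[metrics[0]] >= r2) & (agg[metrics[1]] <= size)
                     & ((agg[metrics[0]] > r2) | (agg[metrics[1]] < size))).any()
        if not dominated:
            keep.append(a)
    return agg.loc[keep]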