Soft metrics for evaluation with disagreements

The move towards preserving judgement disagreements in NLP requires the identification of adequate evaluation metrics. We identify a set of key properties that such metrics should have, and assess the extent to which natural candidates for soft evaluation, such as Cross Entropy, satisfy these properties. We employ a theoretical framework, supported by a visual approach, practical examples, and the analysis of a real case scenario. Our results indicate that Cross Entropy can yield fairly paradoxical results in some cases, whereas other measures, such as Manhattan distance and Euclidean distance, exhibit more intuitive behavior, at least for the case of binary classification.
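To make the comparison concrete, here is a minimal sketch of the three measures applied to soft labels in the binary case. It is an illustration written for this README, not the code from the notebook or the paper: the function names, the `eps` clipping, and the example distributions are all assumptions. It shows one easily checked property of Cross Entropy, which may or may not coincide with the cases analysed in the paper: a prediction that matches a maximally uncertain target exactly still scores worse than an imperfect prediction on a unanimous target, while the two distances give the exact match a score of zero.

```python
import math

# Hypothetical helpers for illustration only; names and defaults are assumptions.

def cross_entropy(target, pred, eps=1e-12):
    """Cross entropy between a soft target and a predicted distribution (natural log)."""
    # Clip predictions at eps to avoid log(0); eps is an illustrative choice.
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, pred))

def manhattan(target, pred):
    """Manhattan (L1) distance between two distributions."""
    return sum(abs(t - p) for t, p in zip(target, pred))

def euclidean(target, pred):
    """Euclidean (L2) distance between two distributions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(target, pred)))

# Case A: annotators are split evenly and the model matches them exactly.
split_target, exact_pred = [0.5, 0.5], [0.5, 0.5]
# Case B: annotators are unanimous and the model is only partly confident.
unanimous_target, hedged_pred = [1.0, 0.0], [0.6, 0.4]

for name, metric in [("cross entropy", cross_entropy),
                     ("manhattan", manhattan),
                     ("euclidean", euclidean)]:
    a = metric(split_target, exact_pred)
    b = metric(unanimous_target, hedged_pred)
    print(f"{name}: exact match on split target = {a:.4f}, "
          f"imperfect match on unanimous target = {b:.4f}")
```

With Cross Entropy, the exact match in case A scores the entropy of the target (ln 2) rather than zero, which is higher than the score of the imperfect prediction in case B. Both distances rank the two cases the other way around.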

Citation

If you found our work useful, please cite our paper:

Soft metrics for evaluation with disagreements: an assessment

@inproceedings{rizzi2024soft,
  title={Soft metrics for evaluation with disagreements: an assessment},
  author={Rizzi, Giulia and Leonardelli, Elisa and Poesio, Massimo and Uma, Alexandra and Pavlovic, Maja and Paun, Silviu and Rosso, Paolo and Fersini, Elisabetta},
  booktitle={Proceedings of the 3rd Workshop on Perspectivist Approaches to NLP (NLPerspectives)@ LREC-COLING 2024},
  pages={84--94},
  year={2024}
}
