Code used in More Labels or Cases? Assessing Label Variation in Natural Language Inference.

mainlp/label-variation-nli

More Labels or Cases? Assessing Label Variation in Natural Language Inference.

In this work, we analyze the uncertainty that is inherently present in the labels used for supervised machine learning in natural language inference (NLI). In cases where multiple annotations per instance are available, neither the majority vote nor the frequency of individual class votes is a trustworthy representation of the labeling uncertainty. We propose modeling the votes via a Bayesian mixture model to recover the data-generating process, i.e., the posterior distribution of the "true" latent classes, and thus gain insight into the class variations. This enables a better understanding of the confusion that occurs during the annotation process. We also assess the stability of the proposed estimation procedure by systematically varying the number of i) instances and ii) labels. We observe that a few instances with many labels can predict the latent class borders reasonably well, while the estimation fails for many instances with only a few labels. This leads us to conclude that multiple labels are a crucial building block for properly analyzing label uncertainty.
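To make the idea of latent classes behind the votes more concrete, the following is a minimal sketch of a mixture of multinomials fitted with EM on per-instance vote counts. It is not the model from `notebooks/1_bayesian_mixture_model.ipynb` or `src/model_funcs.py`: it returns point estimates rather than a posterior distribution, and the function name `fit_multinomial_mixture`, the add-`alpha` smoothing, the initialization, and the toy vote counts are all assumptions made for illustration.

```python
import math


def fit_multinomial_mixture(counts, n_components=3, n_iter=100, alpha=1.0):
    """Fit a mixture of multinomials to vote counts with EM (hypothetical helper).

    counts: list of per-instance vote-count vectors, e.g. [n_e, n_n, n_c].
    Returns (weights, thetas, responsibilities).
    """
    n_classes = len(counts[0])
    weights = [1.0 / n_components] * n_components
    # Assumed initialization: component k leans toward observed class k.
    thetas = [
        [0.6 if j == k % n_classes else 0.4 / (n_classes - 1)
         for j in range(n_classes)]
        for k in range(n_components)
    ]
    resp = []
    for _ in range(n_iter):
        # E-step: probability of each latent component given the votes.
        # The multinomial coefficient is constant across components and cancels.
        resp = []
        for x in counts:
            log_p = [
                math.log(weights[k])
                + sum(c * math.log(thetas[k][j]) for j, c in enumerate(x))
                for k in range(n_components)
            ]
            m = max(log_p)
            p = [math.exp(lp - m) for lp in log_p]
            total = sum(p)
            resp.append([v / total for v in p])
        # M-step: re-estimate weights and class probabilities, with add-alpha
        # smoothing so that no probability collapses to exactly zero.
        for k in range(n_components):
            n_k = sum(r[k] for r in resp)
            weights[k] = (n_k + alpha) / (len(counts) + alpha * n_components)
            num = [
                alpha + sum(r[k] * x[j] for r, x in zip(resp, counts))
                for j in range(n_classes)
            ]
            total = sum(num)
            thetas[k] = [v / total for v in num]
    return weights, thetas, resp


# Toy vote counts (entailment, neutral, contradiction); made up for illustration.
toy_counts = [[90, 6, 4], [85, 10, 5], [5, 88, 7],
              [8, 80, 12], [4, 6, 90], [10, 10, 80]]
weights, thetas, resp = fit_multinomial_mixture(toy_counts)
latent = [max(range(3), key=lambda k: r[k]) for r in resp]
print(latent)
```

Each row of `resp` is a soft assignment of one instance to the latent components, which is the quantity that a majority vote or raw vote frequencies cannot provide.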

Repository

The file structure of this project is as follows:

├── README.md
├── data
│   ├── bootstrap
│   ├── final
│   └── raw
├── figs
│   ├── appendix
│   ├── full_bootstrap.png
│   └── scatter_latent.png
├── notebooks
│   ├── 0_descriptives.ipynb
│   ├── 1_bayesian_mixture_model.ipynb
│   └── 2_stability_estimation.ipynb
└── src
    ├── __init__.py
    ├── __pycache__
    ├── bootstrap_funcs.py
    ├── config.py
    ├── load_data.py
    ├── model_funcs.py
    ├── plotting_funcs.py
    └── utils

The folder data contains

  • raw: the raw data as given in https://github.com/easonnie/ChaosNLI,
  • final: the cleaned data used for the final analysis, and
  • bootstrap: the data generated by the bootstrapping procedure, which can be used to exactly recreate the results.

The folder figs contains the figures used in the paper.

The folder notebooks contains the notebooks used to generate the results.

  • 0_descriptives.ipynb: code used for the initial descriptive analysis.
  • 1_bayesian_mixture_model.ipynb: code used for the estimation of the Bayesian mixture model.
  • 2_stability_estimation.ipynb: code used for the bootstrap estimation of the stability of the Bayesian mixture model.
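The stability analysis varies the number of instances and the number of labels per instance. A minimal sketch of such a resampling step is shown below, assuming that instances are drawn with replacement and that labels are redrawn with replacement from each instance's observed vote distribution. The function name `bootstrap_votes` and its parameters are hypothetical; the actual procedure lives in `src/bootstrap_funcs.py` and may differ.

```python
import random


def bootstrap_votes(counts, n_instances, n_labels, seed=None):
    """Resample vote counts (hypothetical helper, not the repository's code).

    counts: list of per-instance vote-count vectors, each with at least one vote.
    n_instances: number of instances to draw with replacement.
    n_labels: number of labels to redraw for every sampled instance.
    """
    rng = random.Random(seed)
    n_classes = len(counts[0])
    sample = []
    for _ in range(n_instances):
        votes = rng.choice(counts)  # pick one instance with replacement
        # Redraw labels in proportion to the observed votes of that instance.
        drawn = rng.choices(range(n_classes), weights=votes, k=n_labels)
        new_counts = [0] * n_classes
        for label in drawn:
            new_counts[label] += 1
        sample.append(new_counts)
    return sample


# Toy vote counts, made up for illustration: few instances with many labels
# versus many instances with few labels.
toy_counts = [[90, 6, 4], [5, 88, 7], [4, 6, 90]]
few_instances_many_labels = bootstrap_votes(toy_counts, n_instances=5, n_labels=50, seed=0)
many_instances_few_labels = bootstrap_votes(toy_counts, n_instances=50, n_labels=5, seed=0)
```

Fitting the mixture model on each resampled data set and comparing the estimates across settings is one way to see how the number of instances and the number of labels affect stability.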

The folder src contains the code files that were used throughout the project.

  • bootstrap_funcs.py: functions used for the bootstrapping procedure.
  • config.py: configuration file for the project.
  • load_data.py: loads the data from data/raw, cleans it, and saves the results to data/final.
  • model_funcs.py: functions used for the Bayesian mixture model.
  • plotting_funcs.py: functions used for plotting the results.
