Code for the paper "Bias Silhouette Analysis: Towards Assessing the Quality of Bias Metrics for Word Embedding Models".

webis-de/IJCAI-21

Bias Silhouette Analysis

This repository contains the code to reproduce the results of the paper "Bias Silhouette Analysis: Towards Assessing the Quality of Bias Metrics for Word Embedding Models", as presented at the IJCAI 2021 conference. Please find the full reference below:

@InProceedings{spliethoever:2021,
  title     = {Bias Silhouette Analysis: Towards Assessing the Quality of Bias Metrics for Word Embedding Models},
  author    = {Maximilian Splieth{\"o}ver and Henning Wachsmuth},
  booktitle = {Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, {IJCAI-21}},
  publisher = {International Joint Conferences on Artificial Intelligence Organization},
  editor    = {Zhi-Hua Zhou},
  pages     = {552--559},
  year      = {2021},
  month     = {aug},
  doi       = {10.24963/ijcai.2021/77},
  url       = {https://doi.org/10.24963/ijcai.2021/77},
}

Supplementary Material

You can find the paper's supplementary material in the bias-silhouette-analysis-supplementary.pdf file.

Reproduce the results

The code in this repository was tested, and the results were produced, with Python 3.8 on Linux, as specified in the Pipfile.

Install required Python packages

All required Python packages and their versions are defined in the Pipfile. When using pipenv, install the requirements with:

$ pipenv install

Some of the scripts also rely on the spaCy language model en_core_web_sm, which needs to be installed separately:

$ python -m spacy download en_core_web_sm

Download the data and baseline models

The study builds on several pre-trained word embedding models. Both the GloVe embedding model and the ConceptNet Numberbatch model are downloaded at runtime by the embeddings library (only on the first run; subsequent runs re-use the downloaded models). Since the library did not support the latter model at the time of evaluation, we ship a custom implementation with this repository.

Run the pipeline

The central processing and evaluation pipeline is defined and commented in the run-pipeline.sh file. All critical settings are defined using variables at the beginning of the file and can be adapted if necessary. By default, all generated outputs will be placed in a sub-directory of output/metric-evaluation/, which is created at run time. When done adapting the variables, simply start the pipeline.

$ bash run-pipeline.sh

Extending the experiments

This code is built in a way that should make it fairly easy to extend and run with different metrics and word embedding models, as long as those fulfill certain requirements.

Adding a metric

  1. Most importantly, each metric needs its own evaluation function. The evaluation function runs the metric with a given lexicon on a given embedding model. A custom implementation per metric is required because metrics differ in the number of input lexicons they use and in how they expect the input to be formatted. If your metric uses four input lexicons, the evaluation function of the WEAT metric, weat_evaluation, can serve as a guideline (and can probably be re-used almost entirely). If your metric uses three inputs, the evaluation function of the RNSB metric, rnsb_evaluation, can be of help. For the documentation and parameters of an evaluation function, please refer to one of these implementations.
  2. Secondly, you need to add the metric to the metric_evaluation.py file. For this, you can follow the necessary additions from, e.g., the WEAT metric, which you'll find in line 118ff. and 183. If your metric uses either the WEAT or the RNSB lexicon format, you can use the lexicon preparation functions implemented in webias/utils.py. Otherwise, you need to supply your own lexicon preparation, which creates all the different lexicon shuffles/variations with which a metric will be evaluated.
  3. Lastly, you have to adapt the run-pipeline.sh file to actually run the evaluation with your new metric.
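
As a rough orientation for step 1, the following sketch shows what the core of a four-lexicon evaluation function might compute, using the WEAT effect size as described in the original WEAT paper. It is not the repository's weat_evaluation: the function names, the plain-dict embedding lookup, and the toy vectors are assumptions made purely for illustration.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(word, attr_a, attr_b, emb):
    """s(w, A, B): mean similarity to A minus mean similarity to B."""
    sim_a = mean(cosine(emb[word], emb[a]) for a in attr_a)
    sim_b = mean(cosine(emb[word], emb[b]) for b in attr_b)
    return sim_a - sim_b


def weat_effect_size(target_x, target_y, attr_a, attr_b, emb):
    """Difference of the mean associations of X and Y, divided by the
    standard deviation of the associations of all target words."""
    assoc_x = [association(x, attr_a, attr_b, emb) for x in target_x]
    assoc_y = [association(y, attr_a, attr_b, emb) for y in target_y]
    # Sample standard deviation; some implementations use the population variant.
    return (mean(assoc_x) - mean(assoc_y)) / stdev(assoc_x + assoc_y)


# Made-up 2-d vectors standing in for an embedding model (hypothetical tokens).
toy_emb = {
    "x1": [1.0, 0.1], "x2": [0.9, 0.2],  # target set X
    "y1": [0.1, 1.0], "y2": [0.2, 0.9],  # target set Y
    "a1": [1.0, 0.0],                    # attribute set A
    "b1": [0.0, 1.0],                    # attribute set B
}
effect = weat_effect_size(["x1", "x2"], ["y1", "y2"], ["a1"], ["b1"], toy_emb)
swapped = weat_effect_size(["y1", "y2"], ["x1", "x2"], ["a1"], ["b1"], toy_emb)
print(effect, swapped)
```

Swapping the two target sets flips the sign of the effect size, which is a quick sanity check for any new four-lexicon metric implementation.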

Adding a word embedding model

  1. If your embedding model is in the standard word2vec format that is interpretable with the gensim library (basically a .txt file where each line represents a token and its vector, space separated), simply copy the file to the data/word-vectors directory and adapt the variables at the top of the run-pipeline.sh file accordingly (namely, you need to adjust the contents of the MODEL_PATHS and MODEL_LOWERCASED variables).
  2. If the model is not in that format, a new embedding model reader might need to be implemented. You can add a new class in webias/word_vectors.py that inherits from the BaseEmbeddings abstract class. Furthermore, an additional special case needs to be added to the metric_evaluation.py file in line 33ff. and to the filter_bias_lexicons.py file in line 48ff.
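
For orientation, the plain-text layout described in step 1 can be read with a few lines of standard-library Python. This is only a sketch of the file format, not the reader the repository uses: the function name read_text_vectors, the made-up demo file, and the handling of an optional "<vocabulary size> <dimensions>" header line are assumptions.

```python
import os
import tempfile


def read_text_vectors(path):
    """Read 'token v1 v2 ... vn' lines into a dict of token -> list of floats."""
    vectors = {}
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle):
            parts = line.rstrip().split(" ")
            # Assumption: a first line with exactly two fields is the optional
            # "<vocabulary size> <dimensions>" header and carries no vector.
            if line_number == 0 and len(parts) == 2:
                continue
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors


# Tiny made-up model file, only to show the expected layout.
demo_lines = ["2 3", "doctor 0.1 0.2 0.3", "nurse 0.4 0.5 0.6"]
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("\n".join(demo_lines) + "\n")

demo_vectors = read_text_vectors(tmp.name)
os.remove(tmp.name)
print(sorted(demo_vectors))
```

A model that does not fit this layout is the case covered by step 2, where a dedicated reader class is needed instead.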

Notes on the metric (re-)implementations

WEAT

Since, at the time of conducting the experiments, there was no official WEAT implementation available publicly, we re-implemented the approach from the information available in the original paper and its supplementary material (you can find both here). The evaluation results of the pre-trained word embedding models obtained with our implementation do not exactly match the published ones; we attribute these small differences to implementation details. You can run the score replications with $ python -m unittest webias.tests.weat_score_replication_w2v for the word2vec embedding model and $ python -m unittest webias.tests.weat_score_replication_glove for the GloVe embedding model. A test passes if the replicated score lies within a boundary specified in the webias/constants.py file. The tests use the word lists published in the original paper, which can also be found at webias/data/weat_tests.json.

The tests additionally require you to download the word2vec embedding model from here and place the extracted file (GoogleNews-vectors-negative300.bin) into the data/word-vectors directory. As described above, the GloVe embeddings will be downloaded automatically during runtime (if they are not already present).
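
The idea of a replication test that passes within a boundary can be sketched as follows. The constant name, the tolerance value, and both scores are placeholders chosen for illustration; the actual boundary is defined in webias/constants.py and the actual tests live in webias/tests.

```python
import unittest

# Hypothetical tolerance; the real boundary is defined in webias/constants.py.
REPLICATION_TOLERANCE = 0.05


class ScoreReplicationSketch(unittest.TestCase):
    """Checks that a replicated score stays close to a reference score."""

    def test_score_within_boundary(self):
        reference_score = 1.50   # placeholder, not a value from any paper
        replicated_score = 1.47  # placeholder for a re-implementation's output
        # Passes if |replicated - reference| <= REPLICATION_TOLERANCE.
        self.assertAlmostEqual(
            replicated_score, reference_score, delta=REPLICATION_TOLERANCE
        )


# Run the single test case programmatically and keep the result object.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ScoreReplicationSketch)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Using an absolute tolerance (the delta argument) rather than exact equality allows for the small numerical differences that different implementations of the same metric can produce.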

RNSB

Since, at the time of conducting the experiments, there was no official RNSB implementation available publicly, we re-implemented the approach from the information available in the original paper (you can find it here). The evaluation results of the pre-trained word embedding models obtained with our implementation do not exactly match the published ones; we attribute these small differences to implementation details. You can run the score replications with $ python -m unittest webias.tests.rnsb_score_replication_w2v for the word2vec embedding model, $ python -m unittest webias.tests.rnsb_score_replication_conceptnet for the Numberbatch embedding model and $ python -m unittest webias.tests.weat_score_replication_glove for the GloVe embedding model. A test passes if the replicated score lies within a boundary specified in the webias/constants.py file. The tests use the word lists published in the original paper, which can also be found at webias/data/rnsb_tests.json.

The tests additionally require you to download the word2vec embedding model from here and place the extracted file (GoogleNews-vectors-negative300.bin) into the data/word-vectors directory. As described above, the GloVe and Numberbatch embedding models will be downloaded automatically during runtime (if they are not already present).

ECT

The implementation of the ECT metric was taken from the code published by the authors. A more detailed description can be found in the comments of the code file. As our implementation uses more or less the same code as published by the authors, we didn't see a need to additionally verify the implementation.

Notes on word lists

The file data/social-bias-lexicon.json contains a multitude of word lists compiled from different related works. The sources are referenced in the respective "source" field of each list. Only a selection of these lists was used in our original publication.

Contributing

Contributions and PRs are welcome! Please try to follow the flake8 and editorconfig rules specified in the respective files (there are editor plugins for both). :)
