This is the official page for the paper:
BioNLI: Generating a Biomedical NLI Dataset Using Lexico-semantic Constraints for Adversarial Examples
accepted at EMNLP 2022 (Findings).
BioNLI is the first dataset for biomedical natural language inference (NLI). It pairs premises taken from abstracts in the biomedical literature with mechanistic hypotheses, where the adversarial hypotheses are generated with nine different strategies.
The following example shows an entry in the BioNLI dataset. Some supporting text was removed to save space. The premise is a set of sentences about two biomedical entities. The consistent hypothesis is the original conclusion sentence from the abstract; the inconsistent hypothesis is a sentence generated with one of the nine strategies.
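The entry structure described above can be sketched in Python as follows. This is only an illustration: the class name, the field names, and the label strings are assumptions made for this sketch, and the released files may use different keys and formats.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BioNLIEntry:
    # All field names here are hypothetical, not taken from the released files.
    premise: str                  # supporting sentences from the abstract
    consistent_hypothesis: str    # original conclusion sentence of the abstract
    inconsistent_hypothesis: str  # conclusion rewritten by one of the strategies
    strategy: str                 # placeholder name of the strategy that was used


def to_nli_pairs(entry: BioNLIEntry) -> List[Tuple[str, str, str]]:
    """Expand one entry into (premise, hypothesis, label) triples."""
    return [
        (entry.premise, entry.consistent_hypothesis, "consistent"),
        (entry.premise, entry.inconsistent_hypothesis, "inconsistent"),
    ]


# Placeholder strings stand in for real text; they are not dataset content.
example = BioNLIEntry(
    premise="<premise sentences>",
    consistent_hypothesis="<original conclusion>",
    inconsistent_hypothesis="<perturbed conclusion>",
    strategy="<strategy name>",
)
pairs = to_nli_pairs(example)
```

Each entry yields two premise-hypothesis pairs that share the same premise, which is the usual input shape for an NLI classifier.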
There are two versions of this dataset: the full distribution, which contains all possible perturbations, and the balanced distribution. Both share the same test set. For the full distribution, we generate as many perturbations as possible for the dev and test sets, but for the training set each instance is perturbed only once.
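The difference between the splits of the full distribution can be pictured with a short Python sketch. The function name and the split labels are hypothetical, and picking the single training perturbation at random is an assumption made here; the page does not say how that one perturbation is chosen.

```python
import random
from typing import List, Optional


def select_perturbations(
    perturbations: List[str],
    split: str,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Return the perturbed hypotheses to keep for one instance.

    Training instances keep one perturbation; dev and test instances
    keep every perturbation that could be generated.
    """
    if split == "train":
        if not perturbations:
            return []
        # Assumption: the single perturbation is sampled uniformly at random.
        rng = rng or random.Random(0)
        return [rng.choice(perturbations)]
    return list(perturbations)


# Placeholder perturbation labels, not real dataset content.
candidates = ["<perturbation A>", "<perturbation B>", "<perturbation C>"]
train_kept = select_perturbations(candidates, "train")
dev_kept = select_perturbations(candidates, "dev")
```

Under this reading, the training set has one inconsistent hypothesis per instance, while the dev and test sets can have several.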
The dataset can be downloaded here:
The full set can be downloaded from here.
The balanced set can be downloaded from here.
To access the test set, please contact me.
BioNLI is distributed under the CC BY 4.0 license.
To cite BioNLI, please use the following BibTeX entry:
@inproceedings{bastan-etal-2022-bionli,
title = "{B}io{NLI}: Generating a Biomedical {NLI} Dataset Using Lexico-semantic Constraints for Adversarial Examples",
author = "Bastan, Mohaddeseh and
Surdeanu, Mihai and
Balasubramanian, Niranjan",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2022",
month = dec,
year = "2022",
address = "Abu Dhabi, United Arab Emirates",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.findings-emnlp.374",
pages = "5093--5104",
}