CREST is created to help researchers working on causal/counterfactual relation extraction/classification, commonsense reasoning, and reading comprehension in natural language communicate more easily and leverage the data resources scattered around this topic. CREST is a user-friendly, machine-readable format stored as a pandas DataFrame.
Each relation in a CREST-formatted DataFrame has the following fields/values:
- `original_id`: the id of a relation in the original dataset, if such an id exists.
- `span1`: a list of strings of the first span/argument of the relation.
- `span2`: a list of strings of the second span/argument of the relation.
- `signal`: a list of strings of signals/markers of the relation in context, if any.
- `context`: a text string of the context in which the relation appears.
- `idx`: a dictionary in the form of `{'span1': [], 'span2': [], 'signal': []}` that stores the indexes of span1, span2, and signal in context. Each value in the `idx` dictionary is a list of lists of start and end indexes of the spans and the signal. For example, if span1 consists of multiple sub-spans in context with `start:end` indexes 2:5 and 10:13, respectively, span1's value in `idx` will be `[[2, 5], [10, 13]]`. Lists are sorted by start index.
- `label`: label of the relation, `0: non-causal`, `1: causal`.
- `direction`: direction between span1 and span2, `0: span1 => span2`, `1: span1 <= span2`, `-1: not-specified`.
- `source`: id of the source dataset (ids are listed in the table below).
- `split`: `0: train`, `1: dev`, `2: test`. This is the split to which the relation belongs in the original dataset. If no split is specified for a relation in the original dataset, we assign the relation to the `train` split by default.
Note: The reason we save a list of strings instead of a single string for span1, span2, and signal is that these text spans may contain multiple non-consecutive sub-spans in context.
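To make the schema concrete, the snippet below builds a single CREST-formatted relation as one row of a pandas DataFrame. This is an illustrative sketch only: the example sentence, the values, and the assumption that `idx` holds character-level offsets into `context` are ours, not taken from any of the converted datasets.

```python
import pandas as pd

# A single CREST-formatted relation. All values are made up for illustration;
# the indexes in `idx` are assumed here to be character-level start/end
# offsets into `context`.
relation = {
    "original_id": "42",
    "span1": ["heavy rain"],
    "span2": ["flooding"],
    "signal": ["caused"],
    "context": "The heavy rain caused flooding in the valley.",
    "idx": {"span1": [[4, 14]], "span2": [[22, 30]], "signal": [[15, 21]]},
    "label": 1,      # 1: causal
    "direction": 0,  # 0: span1 => span2
    "source": 2,     # e.g., SemEval 2010 task 8 (see the table below)
    "split": 0,      # 0: train
}

df = pd.DataFrame([relation])
print(df[["span1", "span2", "signal", "label", "direction"]])
```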
We provide helper methods to convert CREST-formatted data to popular formats and annotation schemes, mainly formats used across relation extraction/classification tasks. Below is the list of formats for which we have already developed CREST converter methods:
- `brat`: we provide helper methods to convert CREST-formatted data frames to brat (brat-to-CREST converters will be added soon). brat is a popular web-based annotation tool that has been used for a variety of relation extraction NLP tasks. We use brat for two main reasons: 1) better visualization of causal and non-causal relations and their arguments, and 2) modifying relation annotations, if needed, and adding new annotations to the provided context. Below is a sample of a CREST-formatted relation converted to brat (the example is taken from the CaTeRS dataset).
- `TACRED`: TACRED is a large-scale relation extraction dataset. We convert samples from CREST to TACRED since TACRED-formatted data can easily be used as input to many transformer-based language models.
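As a rough illustration of what a CREST-to-TACRED conversion involves, the sketch below maps one CREST relation (as a Python dict, e.g., a DataFrame row converted with `to_dict()`) to a minimal TACRED-style record. This is not the repository's converter; the helper name `crest_to_tacred_example`, the naive whitespace tokenization, and the subject/object field mapping are assumptions for illustration only.

```python
# Hypothetical sketch, not the repository's actual converter.
def crest_to_tacred_example(relation: dict) -> dict:
    """Map one CREST relation to a minimal TACRED-style record."""
    tokens = relation["context"].split()  # naive whitespace tokenization

    def token_span(span_text: str):
        # Find the first occurrence of the span's tokens in the token list.
        words = span_text.split()
        for i in range(len(tokens) - len(words) + 1):
            if tokens[i:i + len(words)] == words:
                return i, i + len(words) - 1
        return -1, -1

    subj_start, subj_end = token_span(relation["span1"][0])
    obj_start, obj_end = token_span(relation["span2"][0])
    return {
        "id": relation["original_id"],
        "token": tokens,
        "subj_start": subj_start,
        "subj_end": subj_end,
        "obj_start": obj_start,
        "obj_end": obj_end,
        "relation": "causal" if relation["label"] == 1 else "non-causal",
    }
```

A full converter would also need to handle multi-sub-span arguments and the `direction` field, which this sketch ignores.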
List of data resources already converted to CREST format:
Id | Data resource | Samples | Causal | Non-causal | Availability |
---|---|---|---|---|---|
1 | SemEval 2007 task 4 | 1,529 | 114 | 1,415 | Public |
2 | SemEval 2010 task 8 | 10,717 | 1,331 | 9,386 | Public |
3 | EventCausality | 485 | 485 | - | Public |
4 | Causal-TimeBank | 318 | 318 | - | Public |
5 | EventStoryLine v1.5 | 2,608 | 2,608 | - | Public |
6 | CaTeRS | 2,502 | 308 | 2,194 | Public |
7 | BECauSE v2.1 | 729 | 554 | 175 | Partially Public |
8 | Choice of Plausible Alternatives (COPA) | 2,000 | 1,000 | 1,000 | Public |
9 | The Penn Discourse Treebank (PDTB) 3.0 | 7,991 | 7,991 | - | Not Public |
10 | BioCause Corpus | 844 | 844 | - | Public |
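Since each relation carries its `source` and `split` ids, a loaded CREST DataFrame can be sliced per resource. Below is a small sketch, assuming `df` is a CREST-formatted DataFrame obtained elsewhere (the exact loading path depends on how you store the data).

```python
# Assumes `df` is a CREST-formatted pandas DataFrame loaded elsewhere.
semeval_2010_train = df[(df["source"] == 2) & (df["split"] == 0)]  # SemEval 2010 task 8, train
copa_dev = df[(df["source"] == 8) & (df["split"] == 1)]            # COPA, dev
print(len(semeval_2010_train), "SemEval 2010 train relations,", len(copa_dev), "COPA dev relations")
```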
- Are there any related datasets you don't see in the list? Let us know or feel free to submit a Pull Request (PR); we actively check PRs and appreciate them ☺️
- Is there a well-known or widely used machine-readable format you think should be added? Let us know so we can add the helper methods for conversion, or submit a PR.
For now, please cite our arXiv paper:
```bibtex
@article{hosseini2021predicting,
  title={Predicting Directionality in Causal Relations in Text},
  author={Hosseini, Pedram and Broniatowski, David A and Diab, Mona},
  journal={arXiv preprint arXiv:2103.13606},
  year={2021}
}
```