hsci-r/fi-quote-coref-corpus

Quote and coreference corpus of Finnish news

This repository contains a corpus of quote and coreference annotations on publicly available Finnish news media articles.

The directories a01 to a10 each contain the data produced by one of the 10 annotators. Each directory contains:

  • the source texts in CoNLL format,
  • the annotated files in WebAnno-TSV format,
  • the text and annotations converted to a single CSV file per annotator using the attached conversion script.
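
The layout above can be checked after cloning with a short script. This is only a sketch: the helper names `annotator_dirs` and `summarize_corpus` are made up for illustration and are not part of the repository, and the script assumes nothing about file names or extensions inside each directory.

```shell
# Hypothetical helpers (not part of the repository).

annotator_dirs() {
    # Print the ten annotator directory names, a01 ... a10, one per line.
    local i=1
    while [ "$i" -le 10 ]; do
        printf 'a%02d\n' "$i"
        i=$((i + 1))
    done
}

summarize_corpus() {
    # For each annotator directory under the given root (default: current
    # directory), print the directory name and the number of regular files
    # it contains, separated by a tab. Missing directories are reported.
    local root="${1:-.}" d n
    annotator_dirs | while IFS= read -r d; do
        if [ -d "$root/$d" ]; then
            n=$(find "$root/$d" -type f | wc -l | tr -d ' ')
            printf '%s\t%s\n' "$d" "$n"
        else
            printf '%s\tmissing\n' "$d"
        fi
    done
}
```

Running `summarize_corpus .` from the repository root should print one line per annotator directory. A directory reported as `missing` most likely means the checkout is incomplete.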

Note that the repository uses Git LFS to manage the data files.
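
Because the data files are stored with Git LFS, a clone made without LFS support contains small text pointer files instead of the real data. The sketch below shows one way to spot such files. It relies on the LFS pointer format, whose first line begins with `version https://git-lfs.github.com/spec/v1`. The function names `is_lfs_pointer` and `list_unfetched` are hypothetical and not part of the repository.

```shell
# Hypothetical helpers (not part of the repository).

is_lfs_pointer() {
    # Succeed (exit status 0) if the first line of the file is the
    # Git LFS pointer header, i.e. the real content was not fetched.
    local first_line=""
    IFS= read -r first_line < "$1" || true
    case "$first_line" in
        "version https://git-lfs.github.com/spec/v1"*) return 0 ;;
        *) return 1 ;;
    esac
}

list_unfetched() {
    # Print every regular file under the given root (default: current
    # directory) that is still an LFS pointer, skipping Git's own metadata.
    local root="${1:-.}" f
    find "$root" -type f ! -path '*/.git/*' -print | while IFS= read -r f; do
        if is_lfs_pointer "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

If `list_unfetched .` prints any paths, running `git lfs install` once and then `git lfs pull` inside the clone should replace the pointers with the actual data files.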

The conversion script convert.sh requires flopo-formats.

The dataset is described in the following upcoming publication:

Maciej Janicki, Antti Kanner and Eetu Mäkelä. Detection and attribution of quotes in Finnish news media: BERT vs. rule-based approach. In: Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), Tórshavn, Faroe Islands, May 2023.
