Skip to content

A dataset of sentences with continuous formality annotations

Notifications You must be signed in to change notification settings

ee-2/in_formal_sentences

Repository files navigation

In_Formal Sentences

This repository contains a dataset with sentences from a wide range of genres with human formality assessments on a continuous formal-informal scale. Each sentence has a score between +1 (most formal) and -1 (most informal).

We provide the following files:

  • in_formal_sentences.tsv: the dataset
  • in_formal_sentences.json: the dataset with further information (e.g., reference, twitter id, or url)
  • annotator_instructions.md: the annotation guidelines in German

Altogether, the dataset includes 3,000 German sentences from 12 domains (tweets, Reddit posts, subtitles, comments, emails, blogs, fiction, news, Wikipedia, political speeches, legal documents, and scientific texts) to cover the broad spectrum of formality. Formality annotations were obtained using crowdsourcing (data workers were German native speakers) and Best-worst scaling.

Citation

When using the data, please cite:

@inproceedings{eder-etal-2023,
    title = "A Question of Style: A Dataset for Analyzing Formality on Different Levels",
    author = "Eder, Elisabeth  and
      	      Krieg-Holz, Ulrike  and
      	      Wiegand, Michael",
    booktitle = "Findings of the Association for Computational Linguistics: EACL 2023",
    month = may,
    year = "2023",
    address = "Dubrovnik, Croatia",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-eacl.42",
    pages = "580--593"
}

License

Sentences with an id containing comments are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0) since they originate from a corpus restricted to non-commercial use only. For the other sentences a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0) applies.

About

A dataset of sentences with continuous formality annotations

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published