
Corpus of Portuguese Dialogues extracted from Twitter and annotated for Sentiment.


NLP-CISUC/TwitterDialogueSAPT


Twitter Dialogues for Sentiment Analysis

The Twitter Dialogues for Sentiment Analysis corpus contains the IDs of tweets grouped into dialogues. It is intended for evaluating sentiment analysis in dialogue systems.

The dialogues come from Twitter accounts related to telecommunications, healthcare, and e-commerce.

The annotation was performed manually by three annotators and achieved moderate inter-annotator agreement. For the multi-class labels, which range from Very Negative (-2) to Positive (1), the average agreement is 0.48 ± 0.13 with Fleiss' kappa and 0.66 ± 0.16 with Krippendorff's alpha. The binary labels distinguish Negative from Non-Negative sentiment and reach an average of 0.67 ± 0.16 with both metrics.
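To make the Fleiss figure concrete, here is a minimal standard-library sketch of Fleiss' kappa as it is usually defined for a fixed number of raters per item. It is not the authors' evaluation code. The function name `fleiss_kappa` is hypothetical, and each inner list stands for the labels the three annotators gave one tweet (for example `Annot_1_M`, `Annot_2_M`, `Annot_3_M`).

```python
from collections import Counter
from typing import Hashable, Sequence


def fleiss_kappa(ratings: Sequence[Sequence[Hashable]]) -> float:
    """Fleiss' kappa; ratings[i] holds the labels every rater gave to item i."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    category_totals: Counter = Counter()
    observed = 0.0
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        # P_i: fraction of rater pairs that agree on this item.
        agreeing = sum(c * c for c in counts.values()) - n_raters
        observed += agreeing / (n_raters * (n_raters - 1))

    p_bar = observed / n_items  # mean observed agreement
    total = n_items * n_raters
    # P_e: chance agreement from the overall label proportions.
    p_e = sum((c / total) ** 2 for c in category_totals.values())
    if p_e == 1:
        raise ValueError("kappa is undefined when every label is identical")
    return (p_bar - p_e) / (1 - p_e)
```

For example, `fleiss_kappa([[0, 0, 1], [1, 1, 1], [0, 1, 1]])` gives a value of about 0, because the observed agreement equals what the label proportions alone would predict. Krippendorff's alpha is computed differently and is not covered by this sketch.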

The annotations are distributed as an .xlsx file with the following eleven columns:

  • Tweet_ID: ID of the tweet
  • Dialog_ID: ID shared by all tweets that belong to the same dialogue
  • Median_Multiclass: Median of the 3 annotations for the multi-class scenario
  • Median_Binary: Median of the 3 annotations for the binary scenario
  • Annot_1_M: Annotation of annotator 1 for the multi-class scenario
  • Annot_2_M: Annotation of annotator 2 for the multi-class scenario
  • Annot_3_M: Annotation of annotator 3 for the multi-class scenario
  • Annot_1_B: Annotation of annotator 1 for the binary scenario
  • Annot_2_B: Annotation of annotator 2 for the binary scenario
  • Annot_3_B: Annotation of annotator 3 for the binary scenario
  • Speaker: Author of the tweet, either USER or SERVICE
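The sketch below shows one way to rebuild dialogues from these columns and to check the median columns against the three individual annotations. It assumes the sheet has been exported to CSV with the header names above (reading the .xlsx directly would need a third-party library). It also assumes that tweet IDs are Twitter Snowflake IDs, which grow over time, so sorting them numerically approximates posting order. The function names `load_dialogues` and `median_matches` are hypothetical, and the encoding of the binary labels is not specified here.

```python
import csv
import statistics
from collections import defaultdict


def load_dialogues(csv_lines):
    """Group rows by Dialog_ID; csv_lines is an open file or list of CSV lines."""
    dialogues = defaultdict(list)
    for row in csv.DictReader(csv_lines):
        dialogues[row["Dialog_ID"]].append(row)
    for turns in dialogues.values():
        # Assumption: Snowflake tweet IDs increase over time, so this
        # numeric sort approximates the order in which tweets were posted.
        turns.sort(key=lambda r: int(r["Tweet_ID"]))
    return dict(dialogues)


def median_matches(row, scenario="M"):
    """Check a stored median against the three annotator columns.

    scenario is "M" (multi-class) or "B" (binary), matching the
    Annot_<n>_M / Annot_<n>_B column suffixes.
    """
    labels = [float(row[f"Annot_{i}_{scenario}"]) for i in (1, 2, 3)]
    column = "Median_Multiclass" if scenario == "M" else "Median_Binary"
    return statistics.median(labels) == float(row[column])
```

With three annotators the median is simply the middle label, so `median_matches` should hold for every row if the export is intact.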

Versions

twitter_full_dataset_v1_sharable (October 2022): First version of the corpus, comprising 381 dialogues and 954 utterances from accounts related to telecommunications, healthcare, and e-commerce. The dialogues were collected in April and May 2022.

twitter_full_dataset_v2_sharable (February 2023): Second version of the corpus, expanded to 916 dialogues and 2,285 utterances. The new dialogues were collected in November and December 2022.

How to Cite

A paper describing the creation of the first version of this corpus, together with some experiments on it, was published in the Proceedings of IberSPEECH 2022. BibTeX:

@inproceedings{carvalho22_iberspeech,
  author={Isabel Carvalho and Hugo Gonçalo Oliveira and Catarina Silva},
  title={{Sentiment Analysis in Portuguese Dialogues}},
  year=2022,
  booktitle={Proc. IberSPEECH 2022},
  pages={176--180},
  doi={10.21437/IberSPEECH.2022-36}
}
