Frame²

This repo contains the scripts and data related to the Frame² dataset.

The dataset comprises three JSON Lines files: one for visual objects (data/VO.jsonl) and two for textual annotations (data/PT.jsonl and data/EN.jsonl).
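Since the files are JSON Lines, each line can be parsed as an independent JSON object. The following is a minimal sketch of a reader, not a script shipped with this repo; the helper name read_jsonl is hypothetical, and the file paths are the ones listed above.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Yield one parsed JSON object per non-empty line of a JSON Lines file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines, e.g. a trailing newline
                yield json.loads(line)


# Example usage (assumes the repo root is the working directory):
#   visual_objects = list(read_jsonl("data/VO.jsonl"))
#   english_sentences = list(read_jsonl("data/EN.jsonl"))
```

Reading lazily with a generator avoids holding a whole file in memory; wrap the call in list() when random access is needed.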

Visual annotations

The visual annotations file consists of a list of objects similar to this:

{
  "episode": 1,
  "objectId": 313136,
  "objectTimespan": [850.36, 852.76],
  "frame": "People_by_origin",
  "frameElement": "Person",
  "boundingBoxes": [
    [850.36, 120.0, 22.0, 617.0, 444.0],
    ...
  ]
}

Transcription annotations

Each textual annotation has the following format:

{
  "episode": 2,
  "sentenceId": 214710,
  "sentenceTimespan": [899.3, 902.2],
  "sentence": "Two hot dogs .",
  "tokens": ["Two", "hot", "dogs", "."],
  "frames": [
    {
      "id": "Cardinal_numbers",
      "span": [0, 0],
      "frameElements": [{ "id": "Entity", "span": [1, 2] }]
    }
  ]
}

The episode field identifies the Pedro Pelo Mundo episode in which the object or sentence appears. The objectTimespan and sentenceTimespan fields are tuples giving the start and end milliseconds of the video segment in which the object appears or the transcribed sentence is spoken.

The frame and frameElement fields are the FrameNet entities that the visual object represents. In the textual annotations, the frames field lists all frames evoked by the sentence, together with their frame elements. The id field gives the label of each frame or frame element, and the span field identifies the tokens that evoke the frame or fill the frame element.

Finally, boundingBoxes is a variable-size array with at least one element. Each element is a 5-tuple (millisecond, x, y, width, height): a fixed time point at which the bounding box appears, followed by four numbers describing the box itself. The video resolution is 854 x 480 (480p).
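As an illustration of the span and boundingBoxes fields, here is a small sketch that reuses the two examples above. It rests on assumptions read off those examples rather than confirmed by the authors: span is taken to be a pair of inclusive token indices (so [1, 2] covers "hot dogs"), and x, y, width and height are taken to be pixel values in the 854 x 480 frame. The helper names span_tokens, box_corners and box_fits_frame are hypothetical.

```python
def span_tokens(tokens, span):
    """Return the tokens covered by a span, assuming inclusive [start, end] indices."""
    start, end = span
    return tokens[start:end + 1]


def box_corners(box):
    """Convert (time, x, y, width, height) to (time, (x_min, y_min, x_max, y_max))."""
    time, x, y, width, height = box
    return time, (x, y, x + width, y + height)


def box_fits_frame(box, frame_width=854, frame_height=480):
    """Check that a box lies inside the frame (assumes pixel coordinates)."""
    _, (x_min, y_min, x_max, y_max) = box_corners(box)
    return x_min >= 0 and y_min >= 0 and x_max <= frame_width and y_max <= frame_height


# Values copied from the examples in this README.
tokens = ["Two", "hot", "dogs", "."]
frame = {
    "id": "Cardinal_numbers",
    "span": [0, 0],
    "frameElements": [{"id": "Entity", "span": [1, 2]}],
}

print(frame["id"], span_tokens(tokens, frame["span"]))
for element in frame["frameElements"]:
    print(element["id"], span_tokens(tokens, element["span"]))

first_box = [850.36, 120.0, 22.0, 617.0, 444.0]
print(box_corners(first_box), box_fits_frame(first_box))
```

Under these assumptions the frame-evoking span resolves to "Two", the Entity frame element resolves to "hot dogs", and the example box stays inside the 854 x 480 frame.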

How to cite

Frederico Belcavello, Tiago Timponi Torrent, Ely E. Matos, Adriana S. Pagano, Maucha Gamonal, Natalia Sigiliano, Lívia Vicente Dutra, Helen de Andrade Abreu, Mairon Samagaio, Mariane Carvalho, Franciany Campos, Gabrielly Azalim, Bruna Mazzei, Mateus Fonseca de Oliveira, Ana Carolina Loçasso Luz, Lívia Pádua Ruiz, Júlia Bellei, Amanda Pestana, Josiane Costa, et al. 2024. Frame2: A FrameNet-based Multimodal Dataset for Tackling Text-image Interactions in Video. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 7429–7437, Torino, Italia. ELRA and ICCL.

License

This dataset is shared under a CC BY-NC 4.0 license.
