Commit

Small README note added.
jponttuset committed Jan 23, 2024
1 parent 2789aa6 commit ad44c38
Showing 1 changed file with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion web-data/README.md
@@ -46,7 +46,8 @@ We label the validation split as `dev`.

The original COCO dataset has five captions per `image_id`. We flattened it by
converting each COCO record into five records with one caption each and with
-`image_id` set to `image_id_N` for the Nth caption where N=\(1,2,3,4,5\).
+`image_id` set to `image_id_N` for the Nth caption where N=\(1,2,3,4,5\). The
+captions were tokenized and lowercased.

The published CC3M data does not provide an `image_id` hence we use `rec_num` to
allow our users to identify the corresponding image and caption in the published
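
The flattening described in the README hunk above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual preprocessing code: the input field `captions`, the output field `caption`, and the function name `flatten_coco_record` are hypothetical, and because the README does not name the tokenizer, plain whitespace splitting stands in for it here.

```python
from typing import Any, Dict, Iterator


def flatten_coco_record(record: Dict[str, Any]) -> Iterator[Dict[str, str]]:
    """Yield one record per caption, with image_id suffixed by the caption index."""
    # Captions are numbered from 1, matching the image_id_N convention.
    for n, caption in enumerate(record["captions"], start=1):
        # Assumption: whitespace splitting approximates the unnamed tokenizer.
        tokens = caption.lower().split()
        yield {
            "image_id": f"{record['image_id']}_{n}",
            "caption": " ".join(tokens),
        }


if __name__ == "__main__":
    example = {"image_id": 42, "captions": ["A Dog  runs.", "Two cats"]}
    for flat in flatten_coco_record(example):
        print(flat)
```

A generator keeps the one-to-many expansion lazy, which may matter if the full caption file is streamed rather than loaded at once.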
