Name		Name	Last commit message	Last commit date
parent directory ..
code_sft		code_sft
nextqa		nextqa
nlvr2		nlvr2
refcoco		refcoco
README.md		README.md
download_data_hf.py		download_data_hf.py

README.md

This document provides examples to fine-tune Aria on three different datasets: single-image data, multi-image data and video data.

Fine-tune on single-image dataset

We use a 30k subset of the RefCOCO dataset as an example. RefCOCO is a visual grounding task. Given an image and a description of the reference object as input, the model is expected to output corresponding bounding box. For a given bounding box, we normalize its coordinates to [0,1000) and transform it into "(x1,y1), (x2,y2)". Please refer to RefCOCO_Example for more details!

Fine-tune on multi-image dataset

We use the NLVR2 dataset as an example. NLVR2 (Natural Language for Visual Reasoning) is a task where given two images, the model needs to determine whether a claim is true by answering yes or no. Please refer to NLVR2_Example for details!

Fine-tune on video dataset

We use the NextQA dataset as an example. NextQA requires the model to select an answer from several options according to the video input and question. The model is expected to output the correct option's character. Please refer to NextQA_Example for details!

Fine-tune on code dataset

We use the Magicoder-Evol-Instruct-110k dataset as an example to further finetune Aria for generating high-quality code. Please refer to Code-SFT_Example for details!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

examples

examples

README.md

Fine-tune on single-image dataset

Fine-tune on multi-image dataset

Fine-tune on video dataset

Fine-tune on code dataset

Files

examples

Directory actions

More options

Directory actions

More options

Latest commit

History

examples

Folders and files

parent directory

README.md

Fine-tune on single-image dataset

Fine-tune on multi-image dataset

Fine-tune on video dataset

Fine-tune on code dataset