Enhancing-High-Vocabulary-IA-with-a-Novel-Attention-Based-Pooling

Official PyTorch implementation of "Enhancing High-Vocabulary Image Annotation with a Novel Attention-Based Pooling"

Datasets

Three well-known datasets, Corel 5k, ESP Game, and IAPR TC-12, are commonly used in automatic image annotation (AIA) tasks. In addition, we use VG-500, a dataset with a significantly larger number of images and a 500-word vocabulary, which makes it considerably more complex. The table below provides details about these datasets, which can also be downloaded from the links provided. (After downloading each dataset, use its 'images' folder to replace the corresponding 'images' folder inside the repository's 'datasets' folder.)

Dataset	Images	Training images	Test images	Vocabulary size	Labels per image	Images per label
Corel 5k	5,000	4,500	500	260	3.4	58.6
ESP Game	20,770	18,689	2,081	268	4.7	362.7
IAPR TC-12	19,627	17,665	1,962	291	5.7	347.7
VG-500	92,904	82,904	10,000	500	13.6	2256.6
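Both per-dataset statistics in the last two columns follow directly from the label annotations. As a hedged sketch, assuming the annotations are available as a binary image-by-label matrix (the function name annotation_stats is hypothetical), "labels per image" is the mean row sum and "images per label" is the mean column sum. The README does not say whether the table's figures were computed on the training split or on the full dataset, so recomputed values may differ slightly depending on the split used.

```python
# Sketch (assumption): annotations stored as a binary image-by-label matrix,
# rows = images, columns = vocabulary words. Not the repository's datasets.py.

def annotation_stats(label_matrix):
    """Return (labels_per_image, images_per_label) for a 0/1 label matrix."""
    num_images = len(label_matrix)
    num_labels = len(label_matrix[0])
    # Total number of (image, label) annotation pairs.
    total = sum(sum(row) for row in label_matrix)
    labels_per_image = total / num_images   # mean row sum
    images_per_label = total / num_labels   # mean column sum
    return labels_per_image, images_per_label


toy = [
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
    [0, 0, 1],
]
# 8 annotation pairs: 2.0 labels per image and 8/3 images per label.
print(annotation_stats(toy))
```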

We followed the SSGRL settings for the VG-500 dataset: the 500 most frequent categories are selected, and the images containing them are divided into training and test subsets. We also attempted to recover the label names (vocabulary) for this dataset; please let us know if you find any errors.
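The selection step can be sketched as follows. This is a generic illustration under stated assumptions, not the released SSGRL split: the annotation format (image id mapped to a list of category names), the function names top_k_vocabulary and split_images, and the seeded random split are all hypothetical. For VG-500, the table above corresponds to k=500 and a 10,000-image test set.

```python
import random
from collections import Counter


def top_k_vocabulary(annotations, k=500):
    """Keep the k most frequent categories and the images that contain them.

    annotations: dict mapping image id -> list of category names (assumed format).
    """
    counts = Counter()
    for labels in annotations.values():
        counts.update(set(labels))  # count each category at most once per image
    vocab = [name for name, _ in counts.most_common(k)]
    keep = set(vocab)
    filtered = {}
    for image_id, labels in annotations.items():
        kept = sorted(set(labels) & keep)
        if kept:  # drop images left with no label from the vocabulary
            filtered[image_id] = kept
    return vocab, filtered


def split_images(image_ids, num_test, seed=0):
    """Deterministic random train/test split; returns (train_ids, test_ids)."""
    ids = sorted(image_ids)
    random.Random(seed).shuffle(ids)
    return ids[num_test:], ids[:num_test]


annotations = {
    "img1": ["dog", "tree"],
    "img2": ["dog", "car"],
    "img3": ["dog", "tree", "sky"],
    "img4": ["boat"],
}
vocab, filtered = top_k_vocabulary(annotations, k=2)
train_ids, test_ids = split_images(filtered, num_test=1)
print(vocab, sorted(filtered), len(train_ids), len(test_ids))
```

Counting each category once per image makes the frequency ranking reflect how many images carry a label rather than how many times it was annotated.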

Model

Attention Maps

Training and Evaluation

To train the model in the Spyder IDE, use the following command:

run main.py --data {select training dataset} --loss-function {select loss function}
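In Spyder, `run` is IPython's %run magic, which passes the remaining arguments to the script as sys.argv, so the terminal equivalent is `python main.py --data Corel-5k --loss-function proposedLoss`. The argument definitions in main.py are not shown in this README; as a hedged sketch, assuming main.py uses argparse, the documented flags (--data, --loss-function, --evaluate) could be declared as below. The choices list mirrors the dataset names in this README, while build_parser and the help strings are hypothetical.

```python
import argparse

# Dataset names accepted by --data, as listed in this README.
DATASETS = ["Corel-5k", "ESP-Game", "IAPR-TC-12", "VG-500"]


def build_parser():
    # Hypothetical reconstruction; the real main.py may define more options.
    parser = argparse.ArgumentParser(description="Train or evaluate the annotation model")
    parser.add_argument("--data", choices=DATASETS, required=True,
                        help="dataset to train or evaluate on")
    parser.add_argument("--loss-function",
                        help="loss function name, e.g. proposedLoss")
    parser.add_argument("--evaluate", action="store_true",
                        help="evaluate a trained model instead of training")
    return parser


# Equivalent of: run main.py --data Corel-5k --loss-function proposedLoss
args = build_parser().parse_args(["--data", "Corel-5k", "--loss-function", "proposedLoss"])
# argparse stores --loss-function under the attribute name loss_function.
print(args.data, args.loss_function, args.evaluate)
```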

Please note that:

Replace {select training dataset} with Corel-5k, ESP-Game, IAPR-TC-12, or VG-500.
Replace {select loss function} with proposedLoss.
When using the VG-500 dataset, set "image-size" to 576, set "gamma_neg" in proposedLoss to 2, and set the batch size to 128.
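This README does not define proposedLoss. The parameter name gamma_neg is, however, the name Asymmetric Loss (ASL) uses for its negative focusing exponent, which multiplies the loss of each negative label by a power of its (optionally margin-shifted) predicted probability, so confidently rejected negatives contribute less. Purely as an illustration of what such a parameter controls, and not as the authors' proposedLoss, here is a per-label ASL-style loss; the function name asl_style_loss and the clip/gamma_pos defaults are assumptions.

```python
import math


def asl_style_loss(p, y, gamma_neg, gamma_pos=0.0, clip=0.0, eps=1e-8):
    """Per-label loss in the style of Asymmetric Loss (ASL); illustration only.

    p: predicted probability for one label (after a sigmoid).
    y: 1 if the label is present, 0 otherwise.
    """
    if y == 1:
        # Positive term: optionally focused with gamma_pos.
        return -((1.0 - p) ** gamma_pos) * math.log(max(p, eps))
    # Negative term: optional probability margin (clip), then the
    # p_m ** gamma_neg factor shrinks the loss of easy negatives (small p_m).
    p_m = max(p - clip, 0.0)
    return -(p_m ** gamma_neg) * math.log(max(1.0 - p_m, eps))


# An easy negative (p = 0.1) under increasing negative focusing.
for g in (0, 2, 4):
    print(g, asl_style_loss(0.1, 0, gamma_neg=g))
```

A larger gamma_neg suppresses easy negatives more strongly, which matters in high-vocabulary settings where most labels of each image are negative.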

To evaluate the model in the Spyder IDE, use the following command:

run main.py --data {select training dataset} --loss-function {select loss function} --evaluate

Results

Proposed method:

Dataset	Precision	Recall	F1-score	N+	mAP
Corel 5k	0.453	0.611	0.520	202	-
IAPR TC-12	0.515	0.584	0.547	287	-
ESP Game	0.442	0.500	0.470	262	-
VG-500	0.409	0.502	0.451	477	42.515
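The precision, recall, F1-score, and N+ columns appear to follow the common AIA convention: precision and recall are computed per label and averaged over the vocabulary, the F1-score is the harmonic mean of those two averages (for Corel 5k, 2 × 0.453 × 0.611 / (0.453 + 0.611) ≈ 0.520), and N+ counts the labels with non-zero recall. The sketch below assumes that convention and binary prediction matrices; the function name annotation_metrics is hypothetical, and the repository's evaluation_metrics.py may differ in details such as how the predicted labels per image are selected.

```python
def annotation_metrics(y_true, y_pred):
    """Per-label averaged precision/recall, F1 of the averages, and N+.

    y_true, y_pred: lists of rows (one per image) of 0/1 values, one column per label.
    """
    num_labels = len(y_true[0])
    precisions, recalls = [], []
    n_plus = 0
    for j in range(num_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        predicted = sum(p[j] for p in y_pred)   # images annotated with label j
        relevant = sum(t[j] for t in y_true)    # images that truly have label j
        precisions.append(tp / predicted if predicted else 0.0)
        recalls.append(tp / relevant if relevant else 0.0)
        if tp > 0:  # label recalled at least once
            n_plus += 1
    precision = sum(precisions) / num_labels
    recall = sum(recalls) / num_labels
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1, n_plus


y_true = [[1, 0, 1], [0, 1, 1]]
y_pred = [[1, 0, 0], [1, 0, 1]]
print(annotation_metrics(y_true, y_pred))
```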

Citation

If this project helps your research, please consider citing our paper. The BibTeX entry is as follows:

@article{salar2024enhancing,
  title={Enhancing high-vocabulary image annotation with a novel attention-based pooling},
  author={Salar, Ali and Ahmadi, Ali},
  journal={The Visual Computer},
  pages={1--15},
  year={2024},
  publisher={Springer}
}

Contact

I would be happy to answer any questions you may have - Ali Salar (parham1998resume@gmail.com)