Visual relationship detection is the task of identifying the relations between objects in an image.
Machine learning approaches to this task usually require a large dataset, because a model has to be
trained on almost every type of relationship. This project tries to tackle that problem. It has two
distinguishing features: 1) a model was trained to extract the features of the objects and their
relationship, and 2) these extracted features were combined with the output of a word embedding
algorithm to train a model that detects the relationships. By introducing a language bias in this way,
the dataset required for training can be relatively small, and the approach remains useful in cases
where a relationship occurs infrequently.
This project uses a visual relationship dataset that contains 5,000 images with 37,993 relations, 100 object categories, and 70 predicate categories.
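The two steps described above (extract image features, then combine them with word embeddings to train a relationship classifier) can be illustrated with a minimal NumPy sketch. It is not the project's actual code: the feature sizes, the random stand-in embedding table, the synthetic data, and the helper names `build_input` and `train_step` are all assumptions made for illustration. Only the counts of 100 object categories and 70 predicate categories come from the dataset description.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_OBJECTS = 100     # object categories (from the dataset description)
NUM_PREDICATES = 70   # predicate categories (from the dataset description)
VISUAL_DIM = 32       # hypothetical size of an extracted image feature
EMBED_DIM = 16        # hypothetical word-embedding size

# Random stand-in for a pretrained word-embedding table (assumption).
word_embeddings = rng.normal(size=(NUM_OBJECTS, EMBED_DIM))


def build_input(visual_feat, subj_id, obj_id):
    """Concatenate the image feature with subject and object word vectors."""
    return np.concatenate(
        [visual_feat, word_embeddings[subj_id], word_embeddings[obj_id]]
    )


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_step(W, b, X, y, lr=0.1):
    """One gradient-descent step of a softmax classifier; updates W, b in place."""
    probs = softmax(X @ W + b)
    loss = -np.log(probs[np.arange(len(y)), y]).mean()
    grad = probs.copy()
    grad[np.arange(len(y)), y] -= 1.0  # d(loss)/d(logits), before averaging
    grad /= len(y)
    W -= lr * (X.T @ grad)
    b -= lr * grad.sum(axis=0)
    return loss


# Synthetic batch standing in for real (subject, object, predicate) samples.
n = 64
visual = rng.normal(size=(n, VISUAL_DIM))
subj = rng.integers(0, NUM_OBJECTS, size=n)
obj = rng.integers(0, NUM_OBJECTS, size=n)
labels = rng.integers(0, NUM_PREDICATES, size=n)

X = np.stack([build_input(v, s, o) for v, s, o in zip(visual, subj, obj)])
W = np.zeros((X.shape[1], NUM_PREDICATES))
b = np.zeros(NUM_PREDICATES)
losses = [train_step(W, b, X, labels) for _ in range(50)]
```

Because the word vectors of the subject and object are part of the classifier input, pairs with similar meanings get similar inputs. That is one way a language bias could help with rarely seen relationships.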
Transfer learning reuses the knowledge from previously trained models to train new models, which in
turn reduces the need for a large dataset when training the new model. This project uses that
approach.
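As a rough illustration of transfer learning used as feature extraction, the sketch below keeps a "pretrained" layer frozen and trains only a small new classifier on top of it. Everything here is an assumption for illustration: `backbone_W` is a random stand-in for weights that would normally be loaded from a previously trained network, and the data, sizes, and function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for weights learned on a large source dataset (assumption:
# in practice these would be loaded from a previously trained model).
backbone_W = rng.normal(size=(20, 8)) / np.sqrt(20)


def extract_features(x):
    """Frozen feature extractor: its weights are never updated."""
    return np.maximum(x @ backbone_W, 0.0)  # linear layer followed by ReLU


def train_head(feats, y, num_classes, lr=0.1, steps=200):
    """Train only a new softmax head on top of the frozen features."""
    W = np.zeros((feats.shape[1], num_classes))
    losses = []
    for _ in range(steps):
        logits = feats @ W
        logits -= logits.max(axis=1, keepdims=True)
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        losses.append(-np.log(probs[np.arange(len(y)), y]).mean())
        probs[np.arange(len(y)), y] -= 1.0  # gradient w.r.t. the logits
        W -= lr * feats.T @ probs / len(y)
    return W, losses


# A small target dataset: only 40 labelled samples, 3 classes (synthetic).
x_small = rng.normal(size=(40, 20))
y_small = rng.integers(0, 3, size=40)

frozen_copy = backbone_W.copy()
head_W, head_losses = train_head(extract_features(x_small), y_small, num_classes=3)
```

Only `head_W` is learned on the small dataset, while `backbone_W` stays exactly as it was. Reusing the backbone is what lets the new model get by with fewer training samples.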
To run this model:
- Download the train files from https://drive.google.com/drive/folders/10TxM6vc8xUrSxeiX8CnwD6YFNc1nYgGR?usp=sharing.
- Open the "Image features" file using Google Colab (only needed if you are creating new image feature files).
- Open the "Word Embedding + Image features" file using Google Colab.