"A picture is worth a thousand words" is an idiom that conveys its meaning as 'Seeing something is better for learning than having it described'. But what if you see something(image/scene) for the first time and your brain can't analyze what is it?
Don't worry! An automatic AI model which generators caption or explain the scene is all you need to analyze smth you see for the first time.
This project is all about generating captions: features are extracted from the images, and the model predicts a caption for each one.
The MS COCO (Microsoft Common Objects in Context) dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. It consists of 328K images. The 2017 training/validation split is 118K/5K images, and the test set is a further subset of 41K images. The dataset has captioning annotations: natural-language descriptions of the images. You can download the dataset here: https://cocodataset.org/#download
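For reference, the caption annotations ship as a JSON file. A minimal sketch for grouping captions by image (assuming the standard 2017 annotation file name and extraction layout, which may differ from this repo's setup):

```python
import json
from collections import defaultdict

# Path is an assumption -- adjust to wherever the COCO annotations were extracted.
ANNOTATION_FILE = "annotations/captions_train2017.json"

with open(ANNOTATION_FILE, "r") as f:
    annotations = json.load(f)

# Group the natural-language captions by image id.
captions_by_image = defaultdict(list)
for ann in annotations["annotations"]:
    captions_by_image[ann["image_id"]].append(ann["caption"])

# Map image ids to file names so features can be matched to captions later.
image_files = {img["id"]: img["file_name"] for img in annotations["images"]}

print(f"{len(image_files)} images, "
      f"{sum(len(c) for c in captions_by_image.values())} captions")
```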
ResNet50, pretrained for image classification, is used as the encoder to extract image features.
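A minimal sketch of the feature-extraction step, assuming TensorFlow/Keras and the standard ImageNet-pretrained ResNet50 (input size and preprocessing here are assumptions and may differ from the actual pipeline):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input

# ResNet50 pretrained on ImageNet; the classification head is dropped so the
# network acts purely as a feature extractor. Keeping the 7x7x2048 output of
# the last convolutional block lets the attention mechanism attend over the
# 49 spatial locations.
feature_extractor = ResNet50(include_top=False, weights="imagenet")

def extract_features(image_path):
    """Return a (49, 2048) feature map for a single image."""
    img = tf.keras.utils.load_img(image_path, target_size=(224, 224))
    x = tf.keras.utils.img_to_array(img)
    x = preprocess_input(x[np.newaxis, ...])                 # (1, 224, 224, 3)
    features = feature_extractor(x)                          # (1, 7, 7, 2048)
    return tf.reshape(features, (-1, features.shape[-1]))    # (49, 2048)

# Example usage (path is illustrative):
# feats = extract_features("train2017/000000000009.jpg")
```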
An attention-based mechanism is used for caption generation. I have used Bahdanau (additive) attention: at each decoding step the attention mechanism focuses on the relevant parts of the image, so the decoder uses only specific regions of the feature map.
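A minimal sketch of an additive (Bahdanau-style) attention layer in TensorFlow/Keras, shown for illustration; the layer names, unit sizes, and feature shapes are assumptions, not the exact implementation in this project:

```python
import tensorflow as tf

class BahdanauAttention(tf.keras.layers.Layer):
    """Additive (Bahdanau-style) attention over the image feature map."""

    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)  # projects image features
        self.W2 = tf.keras.layers.Dense(units)  # projects decoder hidden state
        self.V = tf.keras.layers.Dense(1)       # scores each spatial location

    def call(self, features, hidden):
        # features: (batch, 49, 2048) -- encoder feature map
        # hidden:   (batch, units)    -- current decoder state
        hidden_with_time_axis = tf.expand_dims(hidden, 1)         # (batch, 1, units)
        score = self.V(tf.nn.tanh(
            self.W1(features) + self.W2(hidden_with_time_axis)))  # (batch, 49, 1)
        attention_weights = tf.nn.softmax(score, axis=1)          # over the 49 locations
        context_vector = tf.reduce_sum(attention_weights * features, axis=1)
        return context_vector, attention_weights
```

The context vector is then concatenated with the current word embedding and fed to the decoder, while the attention weights show which image regions the model looked at for each generated word.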
This is a Phase 1 model; in later phases I will improve it using Transformer-based architectures or a GPT-3 model.