This repository contains the code and resources for building an Image Caption Generator that combines a VGG16 Convolutional Neural Network (CNN) with a Long Short-Term Memory (LSTM) network, trained on the Flickr8k dataset.
The Image Caption Generator is a deep learning model that generates textual descriptions for images. It combines the power of a pre-trained VGG16 CNN to extract image features and an LSTM network to generate captions. This project showcases how to preprocess images, build and train the model, and generate captions for new images.
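At a high level, the model follows the common "merge" design for CNN+LSTM captioning: image features and a partial caption are encoded separately, combined, and used to predict the next word. The sketch below is a minimal, hypothetical version of such a model; the layer sizes and the `vocab_size` and `max_length` values are illustrative assumptions, not the repository's exact configuration.

```python
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
from tensorflow.keras.models import Model

vocab_size = 8000  # assumption: size of the caption vocabulary
max_length = 35    # assumption: longest tokenized caption after preprocessing

# Image branch: a 4096-dim feature vector from VGG16's fc2 layer
image_input = Input(shape=(4096,))
x1 = Dropout(0.5)(image_input)
x1 = Dense(256, activation='relu')(x1)

# Text branch: the partial caption as a padded token sequence
text_input = Input(shape=(max_length,))
x2 = Embedding(vocab_size, 256, mask_zero=True)(text_input)
x2 = Dropout(0.5)(x2)
x2 = LSTM(256)(x2)

# Merge the two branches and predict the next word in the caption
merged = add([x1, x2])
merged = Dense(256, activation='relu')(merged)
output = Dense(vocab_size, activation='softmax')(merged)

model = Model(inputs=[image_input, text_input], outputs=output)
model.compile(loss='categorical_crossentropy', optimizer='adam')
```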
- Utilizes the VGG16 model pre-trained on ImageNet for image feature extraction (see the extraction sketch after this list).
- Uses LSTM (Long Short-Term Memory) for sequence generation.
- Trains on the Flickr8k dataset, a widely used dataset for image captioning tasks.
- Provides a user-friendly interface for generating captions for custom images.
- Easily customizable for different datasets and model architectures.
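Since the features above rely on VGG16 activations, here is a minimal sketch of how such extraction is typically done with Keras. The `extract_features` helper and the choice of the fc2 layer are assumptions for illustration, not names taken from this repository.

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.models import Model

base = VGG16(weights='imagenet')  # full model with the classifier head
# Drop the final softmax layer; keep the 4096-dim fc2 activations
extractor = Model(inputs=base.input, outputs=base.layers[-2].output)

def extract_features(image_path):
    """Return a (1, 4096) feature vector for one image (illustrative helper)."""
    img = load_img(image_path, target_size=(224, 224))  # VGG16 input size
    arr = img_to_array(img)
    arr = preprocess_input(np.expand_dims(arr, axis=0))
    return extractor.predict(arr, verbose=0)
```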
Before you begin, ensure you have met the following requirements:
- Python 3.x
- TensorFlow 2.x
- Keras
- NumPy
- Matplotlib
- Flickr8k dataset
- Flickr8k text descriptions
- Clone this repository: `git clone https://github.com/ashishyadav2/SeptaSEM.git`
- Install the required packages: `pip install -r requirements.txt`
- Download and unzip the Flickr8k dataset and place it in the `data` directory.
- Train the model: `python train.py` (a sketch of how captions become training pairs follows this list).
- Once the model is trained, generate captions for your own images: `python generate_caption.py --image <path_to_image>` (a greedy-decoding sketch also follows this list).
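For the training step, the usual approach with this architecture is to expand each caption into (image, prefix) → next-word pairs, so the LSTM learns to predict one word at a time. The sketch below is an illustrative, hypothetical version; `make_pairs` and its arguments are not names from this repository.

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical

def make_pairs(feature, seq, max_length, vocab_size):
    """Expand one encoded caption into (image, prefix) -> next-word pairs."""
    X_img, X_text, y = [], [], []
    for i in range(1, len(seq)):
        in_seq = pad_sequences([seq[:i]], maxlen=max_length)[0]         # caption prefix
        out_word = to_categorical([seq[i]], num_classes=vocab_size)[0]  # next word, one-hot
        X_img.append(feature)
        X_text.append(in_seq)
        y.append(out_word)
    return np.array(X_img), np.array(X_text), np.array(y)
```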
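For the generation step, a common decoding strategy is greedy search: feed the image features plus the caption so far, and repeatedly append the most probable next word until an end token appears. The following sketch assumes `startseq`/`endseq` boundary tokens and a Keras `Tokenizer`, which may differ from the repository's actual setup.

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def generate_caption(model, tokenizer, photo_feature, max_length):
    """Greedily decode a caption for one (1, 4096) image feature vector."""
    text = 'startseq'
    for _ in range(max_length):
        seq = tokenizer.texts_to_sequences([text])[0]
        seq = pad_sequences([seq], maxlen=max_length)
        probs = model.predict([photo_feature, seq], verbose=0)
        word = tokenizer.index_word.get(int(np.argmax(probs)))
        if word is None or word == 'endseq':  # stop at the end token
            break
        text += ' ' + word
    return text.replace('startseq', '', 1).strip()
```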
Here are some sample results from the model:
Contributions are welcome! If you have any ideas, enhancements, or bug fixes, please open an issue or create a pull request.
This project is licensed under the MIT License - see the LICENSE file for details.
- This project is inspired by the paper "Show and Tell: A Neural Image Caption Generator" by Oriol Vinyals et al.
- The Flickr8k dataset was originally compiled by Micah Hodosh, Peter Young, and Julia Hockenmaier at the University of Illinois.