This repository contains the code and resources for building an Image Caption Generator that combines a VGG16 Convolutional Neural Network (CNN) with a Long Short-Term Memory (LSTM) network, trained on the Flickr8k dataset.
The Image Caption Generator is a deep learning model that generates textual descriptions for images. It combines the power of a pre-trained VGG16 CNN to extract image features and an LSTM network to generate captions. This project showcases how to preprocess images, build and train the model, and generate captions for new images.
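At a high level, the model follows the common "merge" design for CNN+LSTM captioning: image features and a partial caption are encoded separately, combined, and used to predict the next word. The sketch below is a minimal, hypothetical version of such a model; the layer sizes and the `vocab_size` and `max_length` values are illustrative assumptions, not the repository's exact configuration.

```python
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
from tensorflow.keras.models import Model

vocab_size = 8000  # assumption: size of the caption vocabulary
max_length = 35    # assumption: longest tokenized caption after preprocessing

# Image branch: a 4096-dim feature vector from VGG16's fc2 layer
image_input = Input(shape=(4096,))
x1 = Dropout(0.5)(image_input)
x1 = Dense(256, activation='relu')(x1)

# Text branch: the partial caption as a padded token sequence
text_input = Input(shape=(max_length,))
x2 = Embedding(vocab_size, 256, mask_zero=True)(text_input)
x2 = Dropout(0.5)(x2)
x2 = LSTM(256)(x2)

# Merge the two branches and predict the next word in the caption
merged = add([x1, x2])
merged = Dense(256, activation='relu')(merged)
output = Dense(vocab_size, activation='softmax')(merged)

model = Model(inputs=[image_input, text_input], outputs=output)
model.compile(loss='categorical_crossentropy', optimizer='adam')
```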
- Utilizes the VGG16 model pre-trained on ImageNet for image feature extraction (see the extraction sketch after this list).
- Uses LSTM (Long Short-Term Memory) for sequence generation.
- Trains on the Flickr8k dataset, a widely used dataset for image captioning tasks.
- Provides a user-friendly interface for generating captions for custom images.
- Easily customizable for different datasets and model architectures.
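Since the features above rely on VGG16 activations, here is a minimal sketch of how such extraction is typically done with Keras. The `extract_features` helper and the choice of the fc2 layer are assumptions for illustration, not names taken from this repository.

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.models import Model

base = VGG16(weights='imagenet')  # full model with the classifier head
# Drop the final softmax layer; keep the 4096-dim fc2 activations
extractor = Model(inputs=base.input, outputs=base.layers[-2].output)

def extract_features(image_path):
    """Return a (1, 4096) feature vector for one image (illustrative helper)."""
    img = load_img(image_path, target_size=(224, 224))  # VGG16 input size
    arr = img_to_array(img)
    arr = preprocess_input(np.expand_dims(arr, axis=0))
    return extractor.predict(arr, verbose=0)
```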
Before you begin, ensure you have met the following requirements:
- Python 3.x
- TensorFlow 2.x
- Keras
- NumPy
- Matplotlib
- Flickr8k dataset
- Flickr8k text descriptions
- Clone this repository: `git clone https://github.com/ashishyadav2/SeptaSEM.git`
- Install the required packages: `pip install -r requirements.txt`
- Download and unzip the Flickr8k dataset and place it in the `data` directory.
- Train the model: `python train.py` (a sketch of how captions become training pairs follows this list).
- Once the model is trained, generate captions for your own images: `python generate_caption.py --image <path_to_image>` (a greedy-decoding sketch also follows this list).
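For the training step, the usual approach with this architecture is to expand each caption into (image, prefix) → next-word pairs, so the LSTM learns to predict one word at a time. The sketch below is an illustrative, hypothetical version; `make_pairs` and its arguments are not names from this repository.

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical

def make_pairs(feature, seq, max_length, vocab_size):
    """Expand one encoded caption into (image, prefix) -> next-word pairs."""
    X_img, X_text, y = [], [], []
    for i in range(1, len(seq)):
        in_seq = pad_sequences([seq[:i]], maxlen=max_length)[0]         # caption prefix
        out_word = to_categorical([seq[i]], num_classes=vocab_size)[0]  # next word, one-hot
        X_img.append(feature)
        X_text.append(in_seq)
        y.append(out_word)
    return np.array(X_img), np.array(X_text), np.array(y)
```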
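For the generation step, a common decoding strategy is greedy search: feed the image features plus the caption so far, and repeatedly append the most probable next word until an end token appears. The following sketch assumes `startseq`/`endseq` boundary tokens and a Keras `Tokenizer`, which may differ from the repository's actual setup.

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def generate_caption(model, tokenizer, photo_feature, max_length):
    """Greedily decode a caption for one (1, 4096) image feature vector."""
    text = 'startseq'
    for _ in range(max_length):
        seq = tokenizer.texts_to_sequences([text])[0]
        seq = pad_sequences([seq], maxlen=max_length)
        probs = model.predict([photo_feature, seq], verbose=0)
        word = tokenizer.index_word.get(int(np.argmax(probs)))
        if word is None or word == 'endseq':  # stop at the end token
            break
        text += ' ' + word
    return text.replace('startseq', '', 1).strip()
```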
Here are some sample results from the model:
Contributions are welcome! If you have any ideas, enhancements, or bug fixes, please open an issue or create a pull request.
This project is licensed under the MIT License - see the LICENSE file for details.
- This project is inspired by the paper "Show and Tell: A Neural Image Caption Generator" by Oriol Vinyals et al.
- The Flickr8k dataset was originally compiled by Micah Hodosh, Peter Young, and Julia Hockenmaier at the University of Illinois.