Neural Image Captioning

Image captioning with a deep recurrent architecture that combines computer vision and natural language processing. A VGG-16 convolutional neural network, pretrained on ImageNet, extracts a 4096-dimensional feature vector from each image, and an LSTM decoder generates a novel natural-language sentence from these features. Trained on the Flickr8k dataset, the model obtains a BLEU score on par with state-of-the-art papers that use a similar implementation [1, 2, 3].
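For concreteness, the sketch below shows one common way to wire this pipeline up in Keras. It illustrates the approach rather than reproducing this repository's exact code: `fc2` is the standard Keras layer name for VGG-16's 4096-dimensional fully connected layer, while the vocabulary size, caption length, and embedding width are placeholder assumptions.

```python
# A minimal sketch of the VGG-16 + LSTM captioning pipeline (illustrative,
# not the repository's actual code). Hyperparameters below are assumptions.

import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing import image
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add

# 1. Feature extractor: VGG-16 pretrained on ImageNet, truncated at fc2,
#    which outputs the 4096-dimensional image feature vector.
vgg = VGG16(weights="imagenet")
extractor = Model(inputs=vgg.input, outputs=vgg.get_layer("fc2").output)

def extract_features(img_path):
    """Load an image, preprocess it for VGG-16, and return its fc2 features."""
    img = image.load_img(img_path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return extractor.predict(x)  # shape: (1, 4096)

# 2. Caption decoder: an LSTM over the partial caption, merged with the
#    image features to predict the next word.
vocab_size = 8000  # caption vocabulary size (assumption)
max_len = 34       # longest caption, in tokens (assumption)

img_in = Input(shape=(4096,))
img_emb = Dense(256, activation="relu")(Dropout(0.5)(img_in))

seq_in = Input(shape=(max_len,))
seq_emb = Embedding(vocab_size, 256, mask_zero=True)(seq_in)
seq_feat = LSTM(256)(Dropout(0.5)(seq_emb))

merged = Dense(256, activation="relu")(add([img_emb, seq_feat]))
out = Dense(vocab_size, activation="softmax")(merged)  # next-word distribution

model = Model(inputs=[img_in, seq_in], outputs=out)
model.compile(loss="categorical_crossentropy", optimizer="adam")
```

This sketch merges the image features with the partial-caption encoding additively before the softmax; the Show and Tell model of [1] instead feeds the image embedding into the LSTM as its first input. At inference time, captions are generated word by word, feeding each predicted token back into the sequence input until an end token is produced.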

References:

[1] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and Tell: A Neural Image Caption Generator. arXiv preprint arXiv:1411.4555, 2014.

[2] A. Karpathy and L. Fei-Fei. Deep Visual-Semantic Alignments for Generating Image Descriptions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3128–3137, 2015.

[3] K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. S. Zemel, and Y. Bengio. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. arXiv preprint arXiv:1502.03044, 2015.
