Image captioning using computer vision and natural language processing methods. Implemented a VGG-16 convolutional neural network to extract a 4096-dimensional feature vector from each image, and an LSTM to generate novel natural-language sentences from it. Trained the model on the Flickr8k dataset and obtained a BLEU score on par with state-of-the-art papers that use a similar implementation.
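As a rough illustration of how a BLEU score for a generated caption can be computed, here is a minimal pure-Python sketch. It is an assumption-laden example, not the evaluation code used in this project: the function names `ngrams` and `sentence_bleu` are hypothetical, the score is sentence-level with uniform n-gram weights and no smoothing, and published results are usually reported as corpus-level BLEU-1 to BLEU-4.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(references, candidate, max_n=4):
    """Sentence-level BLEU with uniform weights and no smoothing (a sketch)."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate is shorter than n tokens
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0  # without smoothing, one zero precision zeroes the score
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    # Geometric mean of the modified n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


# Hypothetical captions, tokenized by whitespace.
references = ["a dog runs on the grass".split(), "a dog is running outside".split()]
candidate = "a dog runs on grass".split()
score = sentence_bleu(references, candidate, max_n=2)
```

Each Flickr8k image comes with several human-written captions, so passing all of them as `references` lets a generated caption match any of them. In practice an established implementation with smoothing is preferable for short captions, since an unsmoothed score drops to zero whenever a single n-gram order has no match.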
[1] Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. 2014. Show and Tell: A Neural Image Caption Generator. arXiv preprint arXiv:1411.4555.
[2] Andrej Karpathy and Li Fei-Fei. 2015. Deep Visual-Semantic Alignments for Generating Image Descriptions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3128–3137.
[3] Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron C. Courville, Ruslan Salakhutdinov, Richard S. Zemel, and Yoshua Bengio. 2015. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. arXiv preprint arXiv:1502.03044.