Taking the image description task on the MS-COCO data set as an example, the template code of Image_to_Text is shown.
以在MS-COCO 数据集上的图片描述任务为例,展示了Image_to_Text的模板代码。
The model consists of CNN-Encoder and RNN-Decoder. The CNN-Encoder is used to extract the information of the input image to generate the intermediate representation H, and then use RNN-Decode to gradually decode the H (using Bahdanau Attention) to generate a text description corresponding to the image.
模型由CNN-Encoder和RNN-Decoder组成,首先使用CNN-Encoder提取输入图片的信息生成中间表示H,然后使用RNN-Decode对H逐步解码(使用了BahdanauAttention)生成图片对应的文本描述。
Input: image_features.shape (16, 299, 299, 3)
---------------Pass by cnn_encoder---------------
Output: image_features_encoder.shape (16, 64, 256)
Input: batch_words.shape (16, 1)
Input: rnn state shape (16, 512)
---------------Pass by rnn_decoder---------------
Output: out_batch_words.shape (16, 5031)
Output: out_state.shape (16, 512)
Output: attention_weights.shape (16, 64, 1)
- python 3+
- tensorflow version 2
python dataset_utils.py
python train_image2text_model.py
python inference_by_image2text_model.py
EPOCHS=20
loss
inference_image_caption outputs