Image_to_Text

This directory provides template code for Image_to_Text, using the image captioning task on the MS-COCO dataset as the example.

Main principle

The model consists of a CNN-Encoder and an RNN-Decoder. The CNN-Encoder extracts information from the input image to produce an intermediate representation H. The RNN-Decoder then decodes H step by step, using Bahdanau attention, to generate the text description for the image.

Input: image_features.shape (16, 299, 299, 3)
---------------Pass by cnn_encoder---------------
Output: image_features_encoder.shape (16, 64, 256)

Input: batch_words.shape (16, 1)
Input: rnn state shape (16, 512)
---------------Pass by rnn_decoder---------------
Output: out_batch_words.shape (16, 5031)
Output: out_state.shape (16, 512)
Output: attention_weights.shape (16, 64, 1)
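
The attention shapes in the trace above can be reproduced with a minimal NumPy sketch of additive (Bahdanau) attention. This is not the repository's code: the weight names `W1`, `W2`, `v`, the attention width of 512, and the random weights are assumptions chosen only to match the shapes listed above.

```python
import numpy as np

def bahdanau_attention(features, hidden, W1, W2, v):
    """Additive attention over image locations (illustrative sketch).

    features: (batch, locations, embed_dim) encoder output
    hidden:   (batch, units) previous decoder state
    """
    # Add a location axis so the state broadcasts over all image locations.
    hidden_with_time_axis = hidden[:, np.newaxis, :]            # (batch, 1, units)
    # One unnormalised score per image location.
    score = np.tanh(features @ W1 + hidden_with_time_axis @ W2) @ v  # (batch, locations, 1)
    # Softmax over the location axis (shifted for numerical stability).
    score = score - score.max(axis=1, keepdims=True)
    weights = np.exp(score)
    weights = weights / weights.sum(axis=1, keepdims=True)      # (batch, locations, 1)
    # Context vector: attention-weighted sum of the encoder features.
    context = (weights * features).sum(axis=1)                  # (batch, embed_dim)
    return context, weights

# Shapes taken from the trace above; the attention width is an assumption.
batch, locations, embed_dim, units, attn_units = 16, 64, 256, 512, 512
rng = np.random.default_rng(0)
features = rng.normal(size=(batch, locations, embed_dim))
hidden = rng.normal(size=(batch, units))
W1 = rng.normal(scale=0.01, size=(embed_dim, attn_units))
W2 = rng.normal(scale=0.01, size=(units, attn_units))
v = rng.normal(scale=0.01, size=(attn_units, 1))

context, weights = bahdanau_attention(features, hidden, W1, W2, v)
print(context.shape, weights.shape)
```

The weights sum to 1 across the 64 image locations for each sample, so each decoding step attends to a soft mixture of image regions.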

Requirements

Python 3+
TensorFlow 2

Code usage

1. Prepare Data

python dataset_utils.py

2. Train Model

python train_image2text_model.py

3. Model Inference

python inference_by_image2text_model.py
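
As a rough illustration of step-by-step decoding at inference time, the sketch below runs a greedy loop that feeds each predicted word back into the decoder. It is an assumption about how such a loop is commonly written, not a description of `inference_by_image2text_model.py`; `greedy_decode`, `toy_step`, and the token ids are hypothetical stand-ins.

```python
import numpy as np

def greedy_decode(step_fn, features, start_id, end_id, units=512, max_len=20):
    """Generate token ids one step at a time, always taking the argmax."""
    state = np.zeros((1, units))        # initial decoder state
    word = np.array([[start_id]])       # shape (1, 1), like batch_words
    result = []
    for _ in range(max_len):
        logits, state, _ = step_fn(word, features, state)
        next_id = int(np.argmax(logits[0]))
        if next_id == end_id:
            break
        result.append(next_id)
        word = np.array([[next_id]])    # feed the prediction back in
    return result

def toy_step(word, features, state):
    """Hypothetical stand-in for a trained decoder: predicts (word + 1) mod 6."""
    vocab_size = 6
    logits = np.zeros((1, vocab_size))
    logits[0, (int(word[0, 0]) + 1) % vocab_size] = 1.0
    return logits, state, None

caption_ids = greedy_decode(toy_step, np.zeros((1, 64, 256)), start_id=0, end_id=4)
print(caption_ids)
```

In a real run, `step_fn` would wrap the trained RNN-Decoder and the returned ids would be mapped back to words through the tokenizer vocabulary.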

Experimental result

EPOCHS=20

Loss curve: loss.png

Inference caption outputs: output_inference_image


wuqi7812/WU
