A practice project on sentiment analysis.
Hotel review texts, 1,000 positive and 1,000 negative; this experiment performs sentiment classification on top of this corpus. Dataset source: http://www.datatang.com/data/11936.
Preprocessing consists of the following steps (see the sketch after this list):
- Replace irrelevant symbols
- Remove punctuation
- Word segmentation
- Remove stopwords (see the stopword list)
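A minimal sketch of these steps, assuming jieba for Chinese word segmentation; the stopword file path `stopwords.txt` and the exact symbol-cleaning rules are placeholders for illustration, not necessarily the project's actual ones.

```python
import re
import jieba  # assumption: jieba is used for Chinese word segmentation


def load_stopwords(path="stopwords.txt"):  # hypothetical stopword-list path
    with open(path, encoding="utf-8") as f:
        return set(line.strip() for line in f if line.strip())


def preprocess(text, stopwords):
    # Replace irrelevant symbols (URLs, Latin letters, digits) with spaces
    text = re.sub(r"http\S+|[A-Za-z0-9]+", " ", text)
    # Remove punctuation (ASCII and Chinese); \w keeps CJK characters
    text = re.sub(r"[^\w\s]|_", "", text)
    # Word segmentation
    tokens = jieba.lcut(text)
    # Drop stopwords and whitespace-only tokens
    return [t for t in tokens if t.strip() and t not in stopwords]
```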
Parameter settings
embedding_unit = 200
lstm_unit = 120
hidden_units = [80, ]
Architecture
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_4 (Embedding) (None, None, 200) 2381000
_________________________________________________________________
lstm_1 (LSTM) (None, 120) 154080
_________________________________________________________________
dense_6 (Dense) (None, 80) 9680
_________________________________________________________________
dense_7 (Dense) (None, 1) 81
=================================================================
Total params: 2,544,841
Trainable params: 2,544,841
Non-trainable params: 0
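A minimal Keras sketch that reproduces the architecture and parameter counts above. The vocabulary size (11,905 = 2,381,000 / 200), the ReLU/sigmoid activations, the Adam optimizer, and the ~10% validation split (196 of 1,953 samples seen in the logs) are inferred assumptions rather than values taken from the project code.

```python
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

vocab_size = 11905        # inferred: 2,381,000 embedding params / 200 dims
embedding_unit = 200
lstm_unit = 120
hidden_units = [80]

model = Sequential()
model.add(Embedding(vocab_size, embedding_unit))          # (None, None, 200)
model.add(LSTM(lstm_unit))                                # (None, 120)
for units in hidden_units:
    model.add(Dense(units, activation='relu'))            # (None, 80), assumed activation
model.add(Dense(1, activation='sigmoid'))                 # binary positive/negative output

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
# x_train / y_train are the padded, preprocessed sequences and labels (hypothetical names):
# model.fit(x_train, y_train, validation_split=0.1, epochs=40, batch_size=32)
```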
Results
Epoch 30/40
1757/1757 [==============================] - 7s 4ms/step - loss: 4.2043e-05 - acc: 1.0000 - val_loss: 4.0901e-05 - val_acc: 1.0000
Epoch 31/40
1757/1757 [==============================] - 7s 4ms/step - loss: 4.0512e-05 - acc: 1.0000 - val_loss: 3.9406e-05 - val_acc: 1.0000
Epoch 32/40
1757/1757 [==============================] - 7s 4ms/step - loss: 3.9030e-05 - acc: 1.0000 - val_loss: 3.7960e-05 - val_acc: 1.0000
Epoch 33/40
1757/1757 [==============================] - 7s 4ms/step - loss: 3.7592e-05 - acc: 1.0000 - val_loss: 3.6552e-05 - val_acc: 1.0000
Epoch 34/40
1757/1757 [==============================] - 7s 4ms/step - loss: 3.6195e-05 - acc: 1.0000 - val_loss: 3.5186e-05 - val_acc: 1.0000
Epoch 35/40
1757/1757 [==============================] - 8s 5ms/step - loss: 3.4838e-05 - acc: 1.0000 - val_loss: 3.3860e-05 - val_acc: 1.0000
Epoch 36/40
1757/1757 [==============================] - 7s 4ms/step - loss: 3.3522e-05 - acc: 1.0000 - val_loss: 3.2575e-05 - val_acc: 1.0000
Epoch 37/40
1757/1757 [==============================] - 7s 4ms/step - loss: 3.2247e-05 - acc: 1.0000 - val_loss: 3.1331e-05 - val_acc: 1.0000
Epoch 38/40
1757/1757 [==============================] - 8s 5ms/step - loss: 3.1013e-05 - acc: 1.0000 - val_loss: 3.0130e-05 - val_acc: 1.0000
Epoch 39/40
1757/1757 [==============================] - 7s 4ms/step - loss: 2.9836e-05 - acc: 1.0000 - val_loss: 2.9014e-05 - val_acc: 1.0000
Epoch 40/40
1757/1757 [==============================] - 7s 4ms/step - loss: 2.8728e-05 - acc: 1.0000 - val_loss: 2.7935e-05 - val_acc: 1.0000
Hyperparameters
embedding_size = 200
conv_size = 5
filters = 4
hidden_units = [80, ]
Architecture
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_10 (Embedding) (None, None, 200) 2381000
_________________________________________________________________
conv1d_9 (Conv1D) (None, None, 4) 4004
_________________________________________________________________
global_max_pooling1d_5 (Glob (None, 4) 0
_________________________________________________________________
dense_17 (Dense) (None, 80) 400
_________________________________________________________________
dense_18 (Dense) (None, 1) 81
=================================================================
Total params: 2,385,485
Trainable params: 2,385,485
Non-trainable params: 0
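A minimal Keras sketch matching this CNN architecture and its parameter counts (Conv1D: 5 × 200 × 4 + 4 = 4,004). As with the LSTM sketch, the vocabulary size, activations, optimizer, and validation split are assumptions inferred from the summary and the training logs.

```python
from keras.models import Sequential
from keras.layers import Embedding, Conv1D, GlobalMaxPooling1D, Dense

vocab_size = 11905        # same inferred vocabulary as the LSTM model
embedding_size = 200
conv_size = 5             # 1D convolution kernel width
filters = 4
hidden_units = [80]

model = Sequential()
model.add(Embedding(vocab_size, embedding_size))                 # (None, None, 200)
model.add(Conv1D(filters, conv_size, activation='relu'))         # (None, None, 4), assumed activation
model.add(GlobalMaxPooling1D())                                  # (None, 4)
for units in hidden_units:
    model.add(Dense(units, activation='relu'))                   # (None, 80)
model.add(Dense(1, activation='sigmoid'))                        # binary output

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
# Hypothetical training call consistent with the logged 1757/196 split:
# model.fit(x_train, y_train, validation_split=0.1, epochs=30, batch_size=32)
```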
Results
Train on 1757 samples, validate on 196 samples
Epoch 20/30
1757/1757 [==============================] - 2s 1ms/step - loss: 0.0012 - acc: 1.0000 - val_loss: 0.0012 - val_acc: 1.0000
Epoch 21/30
1757/1757 [==============================] - 2s 1ms/step - loss: 0.0012 - acc: 1.0000 - val_loss: 0.0012 - val_acc: 1.0000
Epoch 22/30
1757/1757 [==============================] - 2s 1ms/step - loss: 0.0011 - acc: 1.0000 - val_loss: 0.0011 - val_acc: 1.0000
Epoch 23/30
1757/1757 [==============================] - 2s 1ms/step - loss: 0.0011 - acc: 1.0000 - val_loss: 0.0011 - val_acc: 1.0000
Epoch 24/30
1757/1757 [==============================] - 2s 1ms/step - loss: 0.0010 - acc: 1.0000 - val_loss: 0.0010 - val_acc: 1.0000
Epoch 25/30
1757/1757 [==============================] - 2s 1ms/step - loss: 9.8992e-04 - acc: 1.0000 - val_loss: 9.6657e-04 - val_acc: 1.0000
Epoch 26/30
1757/1757 [==============================] - 2s 1ms/step - loss: 9.4789e-04 - acc: 1.0000 - val_loss: 9.2600e-04 - val_acc: 1.0000
Epoch 27/30
1757/1757 [==============================] - 2s 1ms/step - loss: 9.0847e-04 - acc: 1.0000 - val_loss: 8.8791e-04 - val_acc: 1.0000
Epoch 28/30
1757/1757 [==============================] - 2s 1ms/step - loss: 8.7143e-04 - acc: 1.0000 - val_loss: 8.5210e-04 - val_acc: 1.0000
Epoch 29/30
1757/1757 [==============================] - 2s 994us/step - loss: 8.3659e-04 - acc: 1.0000 - val_loss: 8.1839e-04 - val_acc: 1.0000
Epoch 30/30
1757/1757 [==============================] - 2s 995us/step - loss: 8.0378e-04 - acc: 1.0000 - val_loss: 7.8663e-04 - val_acc: 1.0000