Multi-task Solution for Aspect Category Sentiment Analysis (ACSA) on Vietnamese Datasets
We address two tasks in Vietnamese Aspect-based Sentiment Analysis: Aspect Category Detection (ACD) and Sentiment Polarity Classification (SPC). We also propose end-to-end models that handle both tasks simultaneously for the two domains (Restaurant and Hotel) of the VLSP 2018 ABSA dataset, using PhoBERT as the pre-trained language model for Vietnamese. The models come in two variants, listed in the results as Multi-task and Multi-task Multi-branch.
The models achieved good results when concatenating the last 4 layers of BERT together. You can download the model weights here.
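
As a rough illustration of concatenating the last 4 layers, the sketch below joins the vectors of the last four encoder layers into a single feature vector. It uses plain Python lists so it runs without any model; the function name `concat_last_layers`, the toy data, and the concatenation order are assumptions for illustration, not taken from this repository.

```python
from typing import List, Sequence


def concat_last_layers(
    hidden_states: Sequence[Sequence[float]], num_layers: int = 4
) -> List[float]:
    """Join the vectors of the last `num_layers` layers into one feature vector."""
    if not 1 <= num_layers <= len(hidden_states):
        raise ValueError("num_layers must be between 1 and the number of layers")
    features: List[float] = []
    # Assumed order: earliest of the selected layers first, final layer last.
    for layer_vector in hidden_states[-num_layers:]:
        features.extend(layer_vector)
    return features


# Toy stand-in for one token's hidden states:
# 13 vectors (embedding output + 12 encoder layers), 3 dimensions each.
toy_hidden_states = [[float(layer)] * 3 for layer in range(13)]
features = concat_last_layers(toy_hidden_states)
print(len(features))                # 12
print(features[:3], features[-3:])  # [9.0, 9.0, 9.0] [12.0, 12.0, 12.0]
```

With Hugging Face Transformers, per-layer hidden states are returned when the model is called with `output_hidden_states=True`, and the same concatenation can be done along the last dimension with `torch.cat`. For an encoder with 768-dimensional hidden states, the resulting vector has 4 × 768 = 3,072 dimensions.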
- The VLSP 2018 Aspect-based Sentiment Analysis dataset:
Domain | Dataset | Reviews | Aspects | AvgLength | VocabSize | DiffVocab |
---|---|---|---|---|---|---|
Restaurant | Training | 2,961 | 9,034 | 54 | 5,168 | - |
Restaurant | Dev | 1,290 | 3,408 | 50 | 3,398 | 1,702 |
Restaurant | Test | 500 | 2,419 | 163 | 3,375 | 1,729 |
Hotel | Training | 3,000 | 13,948 | 47 | 3,908 | - |
Hotel | Dev | 2,000 | 7,111 | 23 | 2,745 | 1,059 |
Hotel | Test | 600 | 2,584 | 30 | 1,631 | 346 |
- Preprocessing:
```mermaid
flowchart LR
    A[Remove\nHTML] --> B[Standardize\nUnicode] --> C[Normalize\nAcronym] --> D[Word\nSegmentation] --> E[Remove\nunnecessary\ncharacters]
```
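
The five preprocessing steps in the flowchart can be sketched as a chain of small functions. This is a minimal illustration using only the Python standard library, not the repository's implementation: the `ACRONYMS` and `COMPOUND_WORDS` tables are hypothetical, and the greedy two-syllable lookup merely stands in for a real Vietnamese word segmenter. PhoBERT expects word-segmented input in which the syllables of a multi-syllable word are joined by underscores.

```python
import html
import re
import unicodedata


def nfc(text: str) -> str:
    """Return the composed (NFC) Unicode form of `text`."""
    return unicodedata.normalize("NFC", text)


# Hypothetical lookup tables, for illustration only.
ACRONYMS = {"ks": nfc("khách sạn"), "nv": nfc("nhân viên"), "ko": nfc("không")}
COMPOUND_WORDS = {(nfc("khách"), nfc("sạn")), (nfc("nhân"), nfc("viên"))}


def remove_html(text: str) -> str:
    # Replace tags with spaces, then decode entities such as &amp;
    return html.unescape(re.sub(r"<[^>]+>", " ", text))


def standardize_unicode(text: str) -> str:
    # Compose base letters and tone marks so each accented letter is one code point
    return nfc(text)


def normalize_acronyms(text: str) -> str:
    return " ".join(ACRONYMS.get(token.lower(), token) for token in text.split())


def segment_words(text: str) -> str:
    # Stand-in segmenter: join known two-syllable words with an underscore
    tokens = text.split()
    segmented = []
    i = 0
    while i < len(tokens):
        pair = (tokens[i].lower(), tokens[i + 1].lower()) if i + 1 < len(tokens) else None
        if pair in COMPOUND_WORDS:
            segmented.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            segmented.append(tokens[i])
            i += 1
    return " ".join(segmented)


def remove_unnecessary_characters(text: str) -> str:
    # Keep letters, digits, underscores and whitespace; collapse repeated spaces
    text = re.sub(r"[^\w\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def preprocess(text: str) -> str:
    steps = (
        remove_html,
        standardize_unicode,
        normalize_acronyms,
        segment_words,
        remove_unnecessary_characters,
    )
    for step in steps:
        text = step(text)
    return text


print(preprocess("<p>KS đẹp, nv thân thiện!</p>"))
```

Unicode standardization runs before character removal on purpose: in decomposed text the tone marks are separate combining characters, which the `[^\w\s]` pattern would strip.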
Task | Method | Hotel Precision | Hotel Recall | Hotel F1-score | Restaurant Precision | Restaurant Recall | Restaurant F1-score |
---|---|---|---|---|---|---|---|
Aspect Detection | VLSP best submission | 76.00 | 66.00 | 70.00 | 79.00 | 76.00 | 77.00 |
Aspect Detection | Bi-LSTM+CNN | 84.03 | 72.52 | 77.85 | 82.02 | 77.51 | 79.70 |
Aspect Detection | BERT-based Hierarchical | - | - | 82.06 | - | - | 84.23 |
Aspect Detection | Multi-task | 87.45 | 78.17 | 82.55 | 81.09 | 85.61 | 83.29 |
Aspect Detection | Multi-task Multi-branch | 63.21 | 57.86 | 60.42 | 80.81 | 87.39 | 83.97 |
Aspect + Polarity | VLSP best submission | 66.00 | 57.00 | 61.00 | 62.00 | 60.00 | 61.00 |
Aspect + Polarity | Bi-LSTM+CNN | 76.53 | 66.04 | 70.90 | 66.66 | 63.00 | 64.78 |
Aspect + Polarity | BERT-based Hierarchical | - | - | 74.69 | - | - | 71.30 |
Aspect + Polarity | Multi-task | 81.90 | 73.22 | 77.32 | 69.66 | 73.54 | 71.55 |
Aspect + Polarity | Multi-task Multi-branch | 57.55 | 52.67 | 55.00 | 68.69 | 74.29 | 71.38 |
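
The F1-scores reported for the Multi-task model are consistent with the standard definition of F1 as the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short check below recomputes them from the table's precision and recall values. The helper name `f1_score` is chosen here for illustration, and the snippet says nothing about how the evaluation script aggregates predictions.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) for the Multi-task rows of the results table
reported = {
    ("Aspect Detection", "Hotel"): (87.45, 78.17, 82.55),
    ("Aspect Detection", "Restaurant"): (81.09, 85.61, 83.29),
    ("Aspect + Polarity", "Hotel"): (81.90, 73.22, 77.32),
    ("Aspect + Polarity", "Restaurant"): (69.66, 73.54, 71.55),
}

for (task, domain), (p, r, f1) in reported.items():
    # Print the recomputed value next to the reported one
    print(f"{task} / {domain}: recomputed {f1_score(p, r):.2f}, reported {f1:.2f}")
```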