HuynhXuanLam-IT44/VLSP-2018-ABSA

Aspect-based Sentiment Analysis for Vietnamese

Multi-task Solution for Aspect Category Sentiment Analysis (ACSA) on Vietnamese Datasets

Overview

We address two tasks in Vietnamese Aspect-based Sentiment Analysis: Aspect Category Detection (ACD) and Sentiment Polarity Classification (SPC). We also propose end-to-end models that handle both tasks simultaneously for the two domains (Restaurant and Hotel) of the VLSP 2018 ABSA dataset, using PhoBERT as the pre-trained language model for Vietnamese, in two ways:

  • Multi-task (ACSA-v1):
  • Multi-task with Multi-branch approach (ACSA-v2):

The models achieved good results when the last 4 layers of BERT were concatenated. You can download the model weights here.
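
The layer concatenation mentioned above can be sketched as follows. This is a minimal illustration in plain Python, not the repository's code: the function name `concat_last_layers` and the nested-list layout of `hidden_states` (one entry per layer, each holding one vector per token) are assumptions made for the example.

```python
def concat_last_layers(hidden_states, num_layers=4):
    """Concatenate each token's vectors from the last `num_layers` layers.

    hidden_states: list of layers; each layer is a list of per-token vectors
    (lists of floats) of equal hidden size. This layout is an assumption.
    Returns one vector per token with length num_layers * hidden_size.
    """
    if len(hidden_states) < num_layers:
        raise ValueError("not enough layers to concatenate")

    selected = hidden_states[-num_layers:]  # last layers, in original order
    seq_len = len(selected[0])

    return [
        [value for layer in selected for value in layer[token_idx]]
        for token_idx in range(seq_len)
    ]


# Toy input: 6 layers, 3 tokens, hidden size 2; layer i is filled with float(i).
toy_states = [[[float(layer)] * 2 for _ in range(3)] for layer in range(6)]
features = concat_last_layers(toy_states)
print(len(features), len(features[0]))  # 3 tokens, each with 4 * 2 = 8 values
```

With a tensor library the same step is usually a single concatenation along the hidden dimension; the nested loops here only make the resulting shape explicit.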

Dataset

  • The VLSP 2018 Aspect-based Sentiment Analysis dataset:
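
| Domain | Dataset | Reviews | Aspects | AvgLength | VocabSize | DiffVocab |
|---|---|---|---|---|---|---|
| Restaurant | Training | 2,961 | 9,034 | 54 | 5,168 | - |
| Restaurant | Dev | 1,290 | 3,408 | 50 | 3,398 | 1,702 |
| Restaurant | Test | 500 | 2,419 | 163 | 3,375 | 1,729 |
| Hotel | Training | 3,000 | 13,948 | 47 | 3,908 | - |
| Hotel | Dev | 2,000 | 7,111 | 23 | 2,745 | 1,059 |
| Hotel | Test | 600 | 2,584 | 30 | 1,631 | 346 |
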
  • Preprocessing:
flowchart LR
A[Remove\nHTML] --> B[Standardize\nUnicode] --> C[Normalize\nAcronym] --> D[Word\nSegmentation] --> E[Remove\nunnecessary\ncharacters]
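
The five preprocessing stages in the flowchart can be sketched as a chain of small functions. This is a hedged illustration using only the Python standard library, not the repository's implementation: every function name, the `ACRONYMS` table, and the exact regular expressions are assumptions, and `segment_words` is a placeholder where a real Vietnamese word segmenter would be called.

```python
import html
import re
import unicodedata

# Hypothetical acronym table; the repository's real mapping is not shown here.
ACRONYMS = {"ko": "không", "nv": "nhân viên", "ks": "khách sạn"}


def remove_html(text):
    # Replace tags with a space, then decode entities such as &amp;.
    return html.unescape(re.sub(r"<[^>]+>", " ", text))


def standardize_unicode(text):
    # NFC composes base letters and combining diacritics into single code points.
    return unicodedata.normalize("NFC", text)


def normalize_acronyms(text, table=ACRONYMS):
    # Whole-token lookup; tokens with attached punctuation are left unchanged.
    return " ".join(table.get(token.lower(), token) for token in text.split())


def segment_words(text):
    # Placeholder (assumption): a real pipeline would call a word segmenter,
    # which typically joins the syllables of one word with underscores.
    return text


def remove_unnecessary_characters(text):
    # Keep letters, digits, underscores, and whitespace; collapse extra spaces.
    text = re.sub(r"[^\w\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def preprocess(text):
    # Apply the stages in the order given by the flowchart.
    for step in (remove_html, standardize_unicode, normalize_acronyms,
                 segment_words, remove_unnecessary_characters):
        text = step(text)
    return text


print(preprocess("<b>ks</b> &amp; nv ok!!"))
```

Running Unicode standardization before the character filter matters in this sketch: `\w` does not match standalone combining marks, so decomposed diacritics would otherwise be stripped.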

Results

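| Task | Method | Hotel Precision | Hotel Recall | Hotel F1-score | Restaurant Precision | Restaurant Recall | Restaurant F1-score |
|---|---|---|---|---|---|---|---|
| Aspect Detection | VLSP best submission | 76.00 | 66.00 | 70.00 | 79.00 | 76.00 | 77.00 |
| Aspect Detection | Bi-LSTM+CNN | 84.03 | 72.52 | 77.85 | 82.02 | 77.51 | 79.70 |
| Aspect Detection | BERT-based Hierarchical | - | - | 82.06 | - | - | 84.23 |
| Aspect Detection | Multi-task | 87.45 | 78.17 | 82.55 | 81.09 | 85.61 | 83.29 |
| Aspect Detection | Multi-task Multi-branch | 63.21 | 57.86 | 60.42 | 80.81 | 87.39 | 83.97 |
| Aspect + Polarity | VLSP best submission | 66.00 | 57.00 | 61.00 | 62.00 | 60.00 | 61.00 |
| Aspect + Polarity | Bi-LSTM+CNN | 76.53 | 66.04 | 70.90 | 66.66 | 63.00 | 64.78 |
| Aspect + Polarity | BERT-based Hierarchical | - | - | 74.69 | - | - | 71.30 |
| Aspect + Polarity | Multi-task | 81.90 | 73.22 | 77.32 | 69.66 | 73.54 | 71.55 |
| Aspect + Polarity | Multi-task Multi-branch | 57.55 | 52.67 | 55.00 | 68.69 | 74.29 | 71.38 |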
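
The reported F1-scores are consistent with the harmonic mean of the precision and recall in the same row. A small check of that relationship, where `f1_score` is an illustrative helper name and not a function from the repository:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Multi-task, Hotel, Aspect Detection: precision 87.45, recall 78.17.
print(round(f1_score(87.45, 78.17), 2))  # 82.55, as reported in the results

# Multi-task Multi-branch, Hotel, Aspect Detection: precision 63.21, recall 57.86.
print(round(f1_score(63.21, 57.86), 2))  # 60.42, as reported in the results
```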