This baseline system performs relevance classification, sentiment analysis, aspect categorization and opinion target extraction on a document. It was designed for the GermEval 2017 Shared Task on Aspect Based Sentiment Analysis (https://sites.google.com/view/germeval2017-absa/). JavaDoc documentation is available on the documentation page.
You can download the compiled project from: https://ltnas1.informatik.uni-hamburg.de:8081/owncloud/index.php/s/0dht3oSyBhTZDzL
The baseline system contains two classifiers. An SVM is used for relevance classification, sentiment analysis and aspect categorization. A CRF classifier is used for opinion target identification.
The SVM uses the following features:
- term frequency
- German sentiment lexica [1]
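The two SVM feature types can be pictured as per-document counts. The following is a minimal sketch, assuming whitespace tokenization and lowercasing; the class name, the method names and the toy lexicon are hypothetical and are not taken from the baseline code, which may tokenize, weight and look up terms differently.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; not part of the baseline's code base.
class TermFrequencySketch {

    // Counts how often each whitespace-separated token occurs in the text.
    // Lowercasing and whitespace splitting are assumptions of this sketch.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : text.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Counts the tokens that appear in a sentiment lexicon.
    // The lexicon is passed in by the caller and is assumed to be lowercase.
    static int lexiconHits(String text, Set<String> lexicon) {
        int hits = 0;
        for (Map.Entry<String, Integer> entry : termFrequencies(text).entrySet()) {
            if (lexicon.contains(entry.getKey())) {
                hits += entry.getValue();
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Toy lexicon for demonstration only; not the lexica from [1].
        Set<String> positive = new HashSet<>(Arrays.asList("gut", "schnell"));
        String text = "Die Bahn ist gut die Bahn ist schnell";
        System.out.println(termFrequencies(text));
        System.out.println(lexiconHits(text, positive));
    }
}
```

In a real pipeline the counts would be mapped to a fixed-size feature vector before being passed to the SVM; that step is omitted here.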
The CRF classifier uses these features:
- the token (without standardization/lemmatization/lowercasing)
- the POS tag
Both are unigram features on the current token; preceding and following tokens are not taken into account.
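As an illustration of what "unigram features on the current token" means, the sketch below builds one feature list per token from the token itself and its POS tag, without looking at neighbouring positions. The class name, method name and feature string format are hypothetical, and the POS tags are assumed to be supplied by an external tagger; the baseline's actual feature encoding may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not part of the baseline's code base.
class UnigramFeatureSketch {

    // Builds the feature list for every position in a sentence.
    // Each position only sees its own token and POS tag (no context window).
    static List<List<String>> extract(String[] tokens, String[] posTags) {
        if (tokens.length != posTags.length) {
            throw new IllegalArgumentException("tokens and POS tags must have the same length");
        }
        List<List<String>> features = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            List<String> current = new ArrayList<>();
            current.add("token=" + tokens[i]); // surface form, no lowercasing or lemmatization
            current.add("pos=" + posTags[i]);  // POS tag of the same position
            features.add(current);
        }
        return features;
    }

    public static void main(String[] args) {
        // Example tags are illustrative; the tagset used by the baseline is not specified here.
        String[] tokens = {"Die", "Bahn", "ist", "pünktlich"};
        String[] posTags = {"ART", "NN", "VAFIN", "ADJD"};
        for (List<String> f : extract(tokens, posTags)) {
            System.out.println(f);
        }
    }
}
```

Adding context features would mean also emitting entries for positions i-1 and i+1, which is exactly what the baseline leaves out.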
The download package and the project come with pre-trained models. To train the models, use the following command:
java -cp ABSA-Baseline-0.0.1-SNAPSHOT.jar uhh_lt.GermEval2017.baseline.TrainAllClassifiers path/to/train.xml
You can also train from the TSV file. Note that in this case no model for aspect target identification is trained.
java -cp ABSA-Baseline-0.0.1-SNAPSHOT.jar uhh_lt.GermEval2017.baseline.TrainAllClassifiers path/to/train.tsv
You can test the models by executing:
java -cp ABSA-Baseline-0.0.1-SNAPSHOT.jar uhh_lt.GermEval2017.baseline.Classify path/to/dev.xml
or
java -cp ABSA-Baseline-0.0.1-SNAPSHOT.jar uhh_lt.GermEval2017.baseline.Classify path/to/dev.tsv
This will produce a file with "_classified" added to its name. This file contains the predictions.
To evaluate your classifier's performance, you should use the official GermEval2017 evaluation script: https://github.com/muchafel/GermEval2017
You can compile the project using Maven:
- go into the project folder
- execute
mvn clean compile package
- the compiled project is under ./target/
You can use the system to obtain predictions for use in an ensemble system. A quick way to do this is to add the following repository to your project's POM file:
<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>
Now, you can add the project's dependency:
<dependency>
    <groupId>com.github.uhh-lt</groupId>
    <artifactId>GermEval2017-Baseline</artifactId>
    <version>-SNAPSHOT</version>
</dependency>
To access the models, you need to copy them into the data folder; you can get models from the compiled project.
Now, you can add the 'AbSentiment' classifier to your code and analyze text documents:
import uhh_lt.GermEval2017.baseline.type.Result;
import uhh_lt.GermEval2017.baseline.AbSentiment;

public class MyClass {
    public static void main(String[] args) {
        AbSentiment analyzer = new AbSentiment();
        Result result = analyzer.analyzeText("This is the input string");
        // get the sentiment of the text and its score
        System.out.println(result.getSentiment());
        System.out.println(result.getSentimentScore());
    }
}
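One simple way to combine the baseline's prediction with those of other systems is a majority vote. The sketch below is only an illustration under stated assumptions: it assumes each system's prediction has already been converted to a string label, the class and method names are hypothetical, and ties are broken in favour of the label that appears first in the input list. It does not depend on the baseline's classes.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not part of the baseline's code base.
class MajorityVoteSketch {

    // Returns the label predicted by the most systems.
    // On a tie, the label that occurs first in the list wins.
    static String majorityVote(List<String> predictions) {
        if (predictions.isEmpty()) {
            throw new IllegalArgumentException("at least one prediction is required");
        }
        // LinkedHashMap keeps first-seen order, which makes the tie-break deterministic.
        Map<String, Integer> votes = new LinkedHashMap<>();
        for (String label : predictions) {
            votes.merge(label, 1, Integer::sum);
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> entry : votes.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // The first label stands in for the baseline's output; the others for further systems.
        List<String> predictions = Arrays.asList("negative", "positive", "negative");
        System.out.println(majorityVote(predictions));
    }
}
```

A weighted vote that uses each system's confidence score would be a natural extension, but it requires scores that are comparable across systems.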
This software is published under the Apache License 2.0.
[1] Waltinger, U. (2010): Sentiment Analysis Reloaded: A Comparative Study On Sentiment Polarity Identification Combining Machine Learning And Subjectivity Features, In: Proceedings of the 6th International Conference on Web Information Systems and Technologies (WEBIST '10), Valencia, Spain.