Mining stylometric representation for authorship analysis. This repository contains multi-language NLP utilities for text processing and several models for authorship analysis. The ca.mcgill.sis.dmas.nlp.model.astyle package implements the following models:
- Joint Topical-Lexical Modality
- Character Modality
- Syntactic Modality
- LDA, LSA
- N-grams, static features, typed N-grams
- Two baselines from PAN2016
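To make the N-gram entry in the list above concrete, here is a minimal sketch of character N-gram counting. It is an illustration only and not the code in the astyle package: the class name `CharNGramSketch` and the method `countCharNGrams` are hypothetical, and the sketch works on UTF-16 code units rather than full Unicode code points.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; not part of the repository's API.
public class CharNGramSketch {

    // Counts every contiguous character sequence of length n in the text.
    // A TreeMap keeps the N-grams sorted so the output is deterministic.
    public static Map<String, Integer> countCharNGrams(String text, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i + n <= text.length(); i++) {
            counts.merge(text.substring(i, i + n), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints {an=2, ba=1, na=2}
        System.out.println(countCharNGrams("banana", 2));
    }
}
```

Counts like these are typically normalized by document length before being used as features; whether and how the repository does so should be checked in the source code.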
Example runs for the following tasks are included in the ca.mcgill.sis.dmas.nlp.exp package; refer to the source code for API usage:
- PAN2014 Authorship Verification
- IMDB62 Authorship Identification
- ICWSM2012 Authorship Characterization
- PAN2013 Authorship Characterization
StyloMatrix was developed by Steven H. H. Ding under the supervision of Benjamin C. M. Fung of the Data Mining and Security Lab at McGill University in Canada. If you find StyloMatrix useful, please cite our paper:
- S. H. H. Ding, B. C. M. Fung, F. Iqbal, and W. K. Cheung. Learning stylometric representations for authorship analysis. IEEE Transactions on Cybernetics (CYB), 49(1):107-121, January 2019. IEEE Systems, Man, and Cybernetics Society.
This project is written entirely in Java and built with Maven. You need the following dependencies:
- [Required] The latest x64 8.x/9.x JRE/JDK distribution from Oracle.
- [Required] The latest Maven distribution. Its 'bin' folder should be on your system's 'Path' environment variable.
Run the following commands from the root directory of the source code to compile the project:
pushd lib/
# The jar paths below are relative to lib/, which is the current directory after pushd.
# Install the POS tagger for Greek and its resources.
mvn install:install-file -Dfile=GreekTagger-0.0.1.jar -DgroupId=local -DartifactId=greek-tagger -Dversion=0.0.1 -Dpackaging=jar
# Install the hunspell spell checking package.
mvn install:install-file -Dfile=hunspell.jar -DgroupId=local -DartifactId=hunspell -Dversion=0.0.1 -Dpackaging=jar
# Install the AUROC calculation package.
mvn install:install-file -Dfile=auc.jar -DgroupId=local -DartifactId=auc -Dversion=0.0.1 -Dpackaging=jar
popd
# Build the final jar with all dependencies:
mvn package
# The compiled jar file target/authorship-0.0.1-SNAPSHOT-jar-with-dependencies.jar contains all the dependencies.
# We suggest adding this jar file to your system's 'CLASSPATH' environment variable for this session
# (Windows syntax shown; on Unix-like shells use 'export CLASSPATH=' followed by the same path):
SET CLASSPATH=absolute_path_of_the_authorship-0.0.1-SNAPSHOT-jar-with-dependencies.jar
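After setting 'CLASSPATH', it can help to confirm that the JVM actually sees the jar. The following is a minimal sketch for that check; the class `ClasspathCheck` and its method `containsJar` are hypothetical helpers written for this README and are not part of the project. It relies only on the standard `java.class.path` system property.

```java
import java.io.File;
import java.util.regex.Pattern;

// Hypothetical helper; not part of the repository.
public class ClasspathCheck {

    // Returns true if any class path entry has the given file name.
    public static boolean containsJar(String classpath, String separator, String jarName) {
        for (String entry : classpath.split(Pattern.quote(separator))) {
            if (new File(entry).getName().equals(jarName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // java.class.path reflects CLASSPATH when no -cp option is given.
        String classpath = System.getProperty("java.class.path");
        String jar = "authorship-0.0.1-SNAPSHOT-jar-with-dependencies.jar";
        boolean found = containsJar(classpath, File.pathSeparator, jar);
        System.out.println(found ? "Jar found on class path." : "Jar missing from class path.");
    }
}
```

Because 'CLASSPATH' then holds only the jar, the directory containing the compiled `ClasspathCheck` class must also be on the class path when running it, for example by adding the current directory '.' as a further entry in 'CLASSPATH'.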
This project was developed in Eclipse. You can import it as an existing Eclipse Maven project. Other Java IDEs that support Maven projects are also compatible; please refer to the instructions of your chosen IDE to import this project. You also need to execute the following Maven commands in your IDE to resolve the local dependencies:
# Install the POS tagger for Greek and its resources.
mvn install:install-file -Dfile=${basedir}/lib/GreekTagger-0.0.1.jar -DgroupId=local -DartifactId=greek-tagger -Dversion=0.0.1 -Dpackaging=jar
# Install the hunspell spell checking package.
mvn install:install-file -Dfile=${basedir}/lib/hunspell.jar -DgroupId=local -DartifactId=hunspell -Dversion=0.0.1 -Dpackaging=jar
# Install the AUROC calculation package.
mvn install:install-file -Dfile=${basedir}/lib/auc.jar -DgroupId=local -DartifactId=auc -Dversion=0.0.1 -Dpackaging=jar
The software was developed by Steven H. H. Ding under the supervision of Benjamin C. M. Fung at the McGill Data Mining and Security Lab. It is distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives License. Please refer to LICENSE.txt for details.
Copyright 2017 McGill University. All rights reserved.