Character-level Text Classification

Implementation of character-level deep neural networks for text classification. Three models (CNN, VDCNN and GRU) are evaluated on four binary text classification datasets (Blog Authorship Corpus, PAN13, PAN14 and the Enron Email Dataset). Results:

Model	Blogs	PAN13	PAN14	Enron
CNN	65%	55%	69%	57%
VDCNN	66%	74%	67%	64%
GRU	62%	60%	63%	62%
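
To compare the models at a glance, the table can be reduced to a mean and a spread (highest minus lowest value) per model. The following is a minimal sketch that assumes the percentages are classification accuracies; the function names are illustrative and not part of this repository.

```python
from statistics import mean

# Values (%) copied from the results table above.
RESULTS = {
    "CNN":   {"Blogs": 65, "PAN13": 55, "PAN14": 69, "Enron": 57},
    "VDCNN": {"Blogs": 66, "PAN13": 74, "PAN14": 67, "Enron": 64},
    "GRU":   {"Blogs": 62, "PAN13": 60, "PAN14": 63, "Enron": 62},
}


def summarize(results):
    """Return the mean and the spread (max - min) of each model's scores."""
    summary = {}
    for model, scores in results.items():
        values = list(scores.values())
        summary[model] = {
            "mean": mean(values),
            "spread": max(values) - min(values),
        }
    return summary


def best_per_dataset(results):
    """Return the name of the highest-scoring model for each dataset."""
    datasets = next(iter(results.values())).keys()
    return {d: max(results, key=lambda m: results[m][d]) for d in datasets}


for model, stats in summarize(RESULTS).items():
    print(f"{model}: mean {stats['mean']:.2f}%, spread {stats['spread']} points")
print(best_per_dataset(RESULTS))
```

On these numbers the means are 61.50% (CNN), 67.75% (VDCNN) and 61.75% (GRU), with spreads of 14, 10 and 3 percentage points respectively.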

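Overall, the VDCNN model is the most accurate, scoring highest on three of the four datasets (all except PAN14), but the GRU model displays more consistent results, staying between 60% and 63% on every dataset.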
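
As background on what "character-level" means for the models above: such networks typically receive text as a fixed-length sequence of character indices rather than as word tokens. The sketch below illustrates that general idea only. The alphabet, the padding scheme and the `encode` function are assumptions made for illustration; the preprocessing actually used by this repository is not described here.

```python
import string

# Hypothetical alphabet; the one used by the models here is not stated.
ALPHABET = string.ascii_lowercase + string.digits + string.punctuation + " "

# Index 0 is reserved for padding and for characters outside the alphabet.
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}


def encode(text, max_len):
    """Map text to a list of exactly max_len integer character indices."""
    # Lowercase, truncate to max_len, and look up each character.
    indices = [CHAR_TO_INDEX.get(ch, 0) for ch in text.lower()[:max_len]]
    # Right-pad short inputs with zeros so every sequence has the same length.
    return indices + [0] * (max_len - len(indices))


print(encode("Hi!", 6))
```

A sequence encoded this way can be fed to an embedding layer or expanded to one-hot vectors before the convolutional or recurrent layers.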

Installation

A working Python 3 installation is assumed. Install the required packages using:

pip install -r requirements.txt

Note that requirements.txt references the tensorflow-gpu package, because training the models on a GPU is recommended. If no GPU is available, install the tensorflow package instead.
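
One way to make that substitution without editing requirements.txt by hand is to rewrite the requirement line programmatically. The following is a minimal sketch; the helper name `use_cpu_tensorflow` and the sample requirement lines in the example are hypothetical, since the real contents of requirements.txt are not shown here.

```python
import re

# Matches "tensorflow-gpu" at the start of a requirement line, but not a
# longer package name that merely begins with it. Any version specifier
# after the name is left untouched.
_GPU_REQUIREMENT = re.compile(
    r"^([ \t]*)tensorflow-gpu(?![A-Za-z0-9_.-])",
    re.MULTILINE | re.IGNORECASE,
)


def use_cpu_tensorflow(requirements_text):
    """Return the requirements text with tensorflow-gpu replaced by tensorflow."""
    return _GPU_REQUIREMENT.sub(r"\1tensorflow", requirements_text)


# Hypothetical file contents, for illustration only.
sample = "numpy\ntensorflow-gpu>=1.0\n"
print(use_cpu_tensorflow(sample))
```

The rewritten text can be saved to a separate file (for example a hypothetical requirements-cpu.txt) and passed to pip install -r in place of the original.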

Usage

Download the training data using:

./download.sh

Run the preprocessing steps using:

./process.sh

Now, you can train a model using:

./train.py -a vdcnn -d blogs pan13_tr_en

Run ./train.py -h for more information on the available options.
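
The three steps above can also be driven from a single Python script. The sketch below only assembles and runs the commands shown in this section; the helper names are illustrative, and the values after -d are passed through verbatim because their exact meaning is documented by train.py -h rather than here.

```python
import subprocess


def build_train_command(architecture, dataset_args):
    """Build the train.py invocation for one architecture."""
    return ["./train.py", "-a", architecture, "-d", *dataset_args]


def pipeline_commands(architecture, dataset_args):
    """Return the download, preprocessing and training commands in order."""
    return [
        ["./download.sh"],
        ["./process.sh"],
        build_train_command(architecture, dataset_args),
    ]


def run_pipeline(architecture, dataset_args):
    """Run each step in turn and stop at the first failing command."""
    for command in pipeline_commands(architecture, dataset_args):
        subprocess.run(command, check=True)


# Print the commands without executing them.
for command in pipeline_commands("vdcnn", ["blogs", "pan13_tr_en"]):
    print(" ".join(command))
```

Calling run_pipeline("vdcnn", ["blogs", "pan13_tr_en"]) from the repository root would execute the same steps as the commands above; check=True makes a failed download or preprocessing step abort the run before training starts.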
