The AraNLP library is a Java-based toolkit for processing Arabic text. It supports the key preprocessing steps, such as diacritic and punctuation removal, tokenization, sentence segmentation, part-of-speech tagging, root stemming, light stemming, and word segmentation. These steps are typically needed to prepare text for more advanced NLP tasks.
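This README does not name AraNLP's classes or methods, so the following is only a hypothetical, stand-alone Java sketch of what three of the listed steps (diacritic removal, punctuation removal, and tokenization) do to a sentence. It is not AraNLP's API: the class name ArabicPreprocessSketch and its methods are illustrative, it uses only java.util.regex, and the choice of marks to strip (U+064B to U+0652 plus U+0670) and simple whitespace tokenization are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of simple Arabic preprocessing steps.
 * Not AraNLP's API; names and character ranges are illustrative assumptions.
 */
public class ArabicPreprocessSketch {

    // Assumed set of diacritics: fathatan (U+064B) through sukun (U+0652),
    // which includes shadda, plus the superscript alef (U+0670).
    private static final Pattern DIACRITICS = Pattern.compile("[\\u064B-\\u0652\\u0670]");

    // Unicode punctuation category: covers ASCII punctuation as well as the
    // Arabic comma (U+060C), semicolon (U+061B) and question mark (U+061F).
    private static final Pattern PUNCTUATION = Pattern.compile("\\p{P}");

    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /** Deletes the assumed diacritic marks, leaving base letters intact. */
    public static String removeDiacritics(String text) {
        return DIACRITICS.matcher(text).replaceAll("");
    }

    /** Replaces punctuation with a space so adjacent words stay separate. */
    public static String removePunctuation(String text) {
        return PUNCTUATION.matcher(text).replaceAll(" ");
    }

    /** Naive whitespace tokenization (an assumption, not AraNLP's tokenizer). */
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : WHITESPACE.split(text.trim())) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "ذَهَبَ الوَلَدُ إِلَى المَدْرَسَةِ، ثُمَّ عَادَ.";
        String clean = removePunctuation(removeDiacritics(input));
        System.out.println(tokenize(clean));
    }
}
```

Punctuation is replaced with a space rather than deleted so that words separated only by a mark such as the Arabic comma are not glued together. Because the file contains Arabic literals, it should be compiled as UTF-8 (the default from JDK 18; earlier JDKs need javac -encoding UTF-8).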
The goal of AraNLP is to gather most of the vital Arabic text preprocessing tools into a single, easily accessible library. To this end, we added tools that were previously missing and incorporated existing algorithmic resources.
AraNLP has already been used to prepare Arabic text in many experiments, and it successfully preprocessed the corpora involved.
The paper describing AraNLP is available at http://www.lrec-conf.org/proceedings/lrec2014/pdf/621_Paper.pdf
Please cite our paper in any published work using this resource:
@inproceedings{Althobaiti14AraNLP,
  title={{AraNLP: a Java-Based Library for the Processing of Arabic Text}},
  author={M. Althobaiti and U. Kruschwitz and M. Poesio},
  booktitle={Proceedings of the 9th Language Resources and Evaluation Conference (LREC)},
  year={2014},
  address={Reykjavik}
}