Excite provides a simple Ruby API for parsing citations from plain-text strings or from HTML.
```ruby
require 'excite'

Excite.parse_string("Wilcox, Rhonda V. 1991. Shifting roles and synthetic women in Star trek: The next generation. Studies in Popular Culture 13 (June): 53-65.")

Excite.parse_html("<span>Devine, PG, & Sherman, SJ</span><span>(1992)</span><strong>Intuitive versus rational judgment and the role of stereotyping in the human condition: Kirk or Spock?</strong><em>Psychological Inquiry</em><span>3(2), 153-159</span>")
```
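Both entry points take a single string. If your citations live in a plain-text file with one citation per line, a small wrapper can feed each line to the parser. The helper below is a hypothetical sketch, not part of Excite. It takes the parser as an argument, so you can pass `Excite.method(:parse_string)` or a stub, and it makes no assumption about what the parser returns.

```ruby
# Hypothetical helper (not part of the Excite gem): treat each non-blank
# line of `text` as one citation and run it through `parser`, which can be
# any object that responds to #call with a single string argument.
def parse_citation_lines(text, parser)
  # Strip surrounding whitespace/newlines and skip blank separator lines.
  lines = text.each_line.map(&:strip).reject(&:empty?)
  lines.map { |line| parser.call(line) }
end

# Usage with the gem installed:
#   require 'excite'
#   results = parse_citation_lines(File.read("references.txt"),
#                                  Excite.method(:parse_string))
```

Passing the parser in, rather than calling `Excite.parse_string` directly, keeps the helper testable without CRF++ installed.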
Excite is derived from FreeCite, minus Rails and all UI elements. The most up-to-date fork of FreeCite that I know of is rsinger's. FreeCite was in turn inspired by ParsCit.
The main changes are:
- No UI, just a gem;
- New model for parsing HTML;
- Tokenization and part-of-speech features from EngTagger.
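To give a rough idea of what per-token features look like in the column format CRF++ reads (one token per line, whitespace-separated feature columns, a blank line ending each sequence), here is a hypothetical sketch. It swaps in a simple regex tokenizer and a few hand-picked orthographic features in place of EngTagger's tokenization and part-of-speech tags, so it is an illustration of the format, not Excite's actual feature set.

```ruby
# Hypothetical feature extractor (not Excite's real features): turn one
# citation string into CRF++-style rows, one token per line with
# space-separated feature columns, followed by a blank line.
def crf_rows(citation)
  # Words/numbers are one token each; every punctuation mark is its own token.
  tokens = citation.scan(/\w+|[^\w\s]/)
  rows = tokens.map do |tok|
    [
      tok,                                             # surface form
      tok.downcase,                                    # lowercased form
      tok.match?(/\A\d{4}\z/) ? "year" : "noyear",     # looks like a year
      tok.match?(/\A[A-Z]/) ? "cap" : "nocap",         # starts with a capital
      tok.match?(/\A[[:punct:]]\z/) ? "punct" : "nopunct" # single punctuation mark
    ].join(" ")
  end
  rows.join("\n") + "\n\n" # blank line marks the end of this sequence
end

puts crf_rows("Wilcox, Rhonda V. 1991.")
```

In the real gem, a part-of-speech column from EngTagger would presumably sit among these columns; the sketch omits it to stay dependency-free.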
Credit is due to the authors of all the projects mentioned above, as well as to Laura Durkay, who marked up the HTML training data.
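Excite depends on CRF++, which must be installed first. To build CRF++ 0.57 from source:

```sh
wget http://crfpp.googlecode.com/files/CRF%2B%2B-0.57.tar.gz
tar xvzf CRF++-0.57.tar.gz
cd CRF++-0.57
./configure
make
sudo make install
```

On Ubuntu, you can instead install the `libcrf++` package from a third-party APT repository:

```sh
sudo apt-add-repository 'deb http://cl.naist.jp/~eric-n/ubuntu-nlp oneiric all'
sudo apt-get update
sudo apt-get install libcrf++
```

On macOS with Homebrew:

```sh
brew install crf++
```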
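After a source or Homebrew install, a quick sanity check is to confirm that the CRF++ command-line tools, `crf_learn` and `crf_test`, are on your `PATH`. The script below is a minimal sketch of that check, essentially a Ruby version of `which`; the `find_in_path` helper is hypothetical and not part of Excite.

```ruby
# Hypothetical helper (not part of Excite): return the full path of the
# first executable file named `name` found in the directories listed in
# `search_path`, or nil if there is none. Works like a minimal Unix `which`.
def find_in_path(name, search_path = ENV.fetch("PATH", ""))
  search_path.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Report whether each CRF++ command-line tool is visible on PATH.
%w[crf_learn crf_test].each do |tool|
  path = find_in_path(tool)
  puts(path ? "#{tool}: #{path}" : "#{tool}: not found on PATH")
end
```

If either tool is reported missing after `sudo make install`, check that `/usr/local/bin` (the default install prefix for a source build) is on your `PATH`.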