
A machine learning approach for processing mathematical language in scientific documents


TU-Berlin/project-mlp

 
 


This repository is retired. New developments happen in the mathosphere.

Mathematical Language Processing


Run

  • Compile the Maven project.
  • Adapt the paths in cluster-run.sh to your Stratosphere environment.
  • Set the parameters of the ranking algorithm, also in cluster-run.sh.
  • Execute the script.
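The steps above can be sketched as a small shell helper. This is only an illustration under stated assumptions: `mvn clean package` is assumed to be the right Maven invocation (the README only says "compile"), cluster-run.sh is assumed to be executable and located in the current directory, and the names `run`, `mlp_run_steps`, and `DRY_RUN` are hypothetical. By default the helper only prints the commands, because the path and parameter edits in cluster-run.sh have to be made by hand first.

```shell
#!/bin/sh
# Sketch of the run steps. DRY_RUN=1 (the default) only prints the commands;
# set DRY_RUN=0 to execute them. All names here are hypothetical.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Echo the command, then execute it only when DRY_RUN is 0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

mlp_run_steps() {
    # Step 1: build the Maven project (assumes mvn is on PATH and
    # pom.xml is in the current directory).
    run mvn clean package || return 1
    # Steps 2 and 3 are manual: edit the paths and the ranking
    # parameters in cluster-run.sh before going further.
    # Step 4: execute the script (assumed to be executable).
    run ./cluster-run.sh
}
```

Calling `mlp_run_steps` prints the two commands; running it with `DRY_RUN=0` executes them once cluster-run.sh has been adapted.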

Notice

To start the processor, an additional model file is needed. Download the Stanford POS tagger from http://nlp.stanford.edu/software/tagger.shtml. The archive contains a directory called pos-tagger-models/, which holds a variety of model files for several languages.

If uncertain, the english-left3words-distsim.tagger model is a good starting point.

Tested with http://nlp.stanford.edu/software/stanford-postagger-2012-11-11.zip. The most recent version, http://nlp.stanford.edu/software/stanford-postagger-2014-01-04.zip, is currently being tested.
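Once one of the archives above has been downloaded and unpacked, the model file still has to be picked out of it. A minimal sketch of that step follows; the function name `find_tagger_model` is hypothetical, and the search goes by file name so that it does not depend on what the models directory is called in a given release.

```shell
#!/bin/sh
# Locate a tagger model inside an unpacked Stanford POS tagger archive.
# Usage: find_tagger_model ARCHIVE_DIR [MODEL_NAME]
# The function name is hypothetical; the default model is the one
# suggested in this README.
find_tagger_model() {
    archive_dir="$1"
    model_name="${2:-english-left3words-distsim.tagger}"
    # Search by file name and take the first match.
    model_path="$(find "$archive_dir" -type f -name "$model_name" | head -n 1)"
    if [ -z "$model_path" ]; then
        echo "model $model_name not found under $archive_dir" >&2
        return 1
    fi
    printf '%s\n' "$model_path"
}
```

For example, after unpacking the 2014-01-04 archive in the current directory, `cp "$(find_tagger_model stanford-postagger-2014-01-04)" ~` would copy the default model to the home directory.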

Log

To trace what was done on the MLP server:

  • install Stratosphere via the Debian package
  • physikerwelt@mlp:~/stanford-postagger-2014-01-04/models$ cp english-left3words-distsim.tagger ~
