librAIry NLP toolkit

The librAIry NLP service provides an efficient and easy way to analyze large amounts of text through standard HTTP and TCP APIs.

Features

Built on top of several open-source NLP tools, it offers:

Part-of-Speech Tagger (and filter)
Lemmatizer
N-Grams Identifier
Wikipedia Relations
WordNet Synsets

All of this is available through JSON-based queries over HTTP or TCP, with the service deployed in Docker containers.

Quick Start

Demo

An online service is available at: http://librairy.linkeddata.es/nlp

This instance is intended for small load tests. If you need more resources, we recommend running an instance on your own servers.

Run locally

Install Docker

Run the service with:

```
$ docker run --rm -p 7777:7777 -e "REST_PATH=/nlp" librairy/nlp:latest
```

Once started, the service should be available at: http://localhost:7777/nlp
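Because the container can take a few seconds to boot, scripts that depend on it may want to wait until the endpoint answers before sending work. The following is a minimal Python sketch using only the standard library. It assumes that any HTTP response from the URL (even an error status such as 404) means the server is listening, which is an assumption rather than documented behavior; `wait_for_service` is a hypothetical helper name.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_service(url: str, timeout: float = 60.0, interval: float = 1.0,
                     request_timeout: float = 5.0) -> bool:
    """Poll `url` until the server answers or `timeout` seconds elapse.

    Any HTTP response, including error statuses, counts as "up";
    connection failures count as "not ready yet".
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=request_timeout):
                return True
        except urllib.error.HTTPError as err:
            # The server replied with an error status, so it is listening.
            err.close()
            return True
        except (OSError, http.client.HTTPException):
            # Refused, reset or timed-out connections (URLError is an
            # OSError subclass): the service is not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For instance, `wait_for_service("http://localhost:7777/nlp", timeout=120)` should return `True` once the container started above is accepting requests, and `False` if it is still unreachable after two minutes.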

Run in distributed mode

Create a Docker Swarm and configure as many services as you need.

Configuration

The service can be tuned with the following environment variables when defining the Docker container:

| Variable | Description |
|---|---|
| `REST_PATH` | Service namespace (`/` by default) |
| `REST_PORT` | HTTP listening port (`7777` by default) |
| `NLP_AVRO_PORT` | TCP listening port (`65111` by default) |
| `SPOTLIGHT_ENDPOINT` | DBpedia Spotlight URL (required to discover Wikipedia references) |
| `SPOTLIGHT_THRESHOLD` | DBpedia Spotlight confidence threshold (required to discover Wikipedia references) |
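A client running in the same environment as the container (for example, a CI job that sets these variables) can derive the HTTP base URL from `REST_PORT` and `REST_PATH`, falling back to the defaults in the table above. This is a minimal Python sketch under that assumption; the host is passed in explicitly, and `service_base_url` is a hypothetical helper name.

```python
import os
from typing import Mapping, Optional


def service_base_url(host: str = "localhost",
                     env: Optional[Mapping[str, str]] = None) -> str:
    """Build the HTTP base URL from REST_PORT and REST_PATH.

    Uses the documented defaults (port 7777, path "/") when a
    variable is unset or empty.
    """
    env = os.environ if env is None else env
    port = int(env.get("REST_PORT", "7777") or "7777")
    path = env.get("REST_PATH", "/") or "/"
    if not path.startswith("/"):
        path = "/" + path
    # Drop any trailing slash so callers can append "/tokens", "/groups", ...
    path = path.rstrip("/")
    return f"http://{host}:{port}{path}"
```

With `REST_PATH=/nlp` and the default port, as in the `docker run` command under Run locally, this yields `http://localhost:7777/nlp`.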

Services

All services can include lemmatization, part-of-speech tagging and even n-gram identification.

Given a text, each endpoint:

/annotations: adds grammatical information.
/groups: creates a bag-of-words.
/tokens: filters words from the text.
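The following Python sketch shows how a client might send a JSON query to one of these endpoints using only the standard library. It assumes the endpoints accept `POST` requests with a JSON body and reply with JSON; the request fields are not documented in this README, so the payload is left entirely to the caller. `build_request` and `call_service` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, Dict

# The three HTTP endpoints described above.
SERVICES = ("annotations", "groups", "tokens")


def build_request(base_url: str, service: str,
                  payload: Dict[str, Any]) -> urllib.request.Request:
    """Prepare a JSON POST request for one endpoint (assumed method)."""
    if service not in SERVICES:
        raise ValueError(f"unknown service: {service!r}")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{service}",
        data=body,
        headers={"Content-Type": "application/json",
                 "Accept": "application/json"},
        method="POST",
    )


def call_service(base_url: str, service: str, payload: Dict[str, Any],
                 timeout: float = 30.0) -> Any:
    """Send the request and decode the JSON response body."""
    request = build_request(base_url, service, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `call_service("http://localhost:7777/nlp", "tokens", payload)`, where `payload` is a dictionary whose field names (such as a hypothetical `"text"` key) must be checked against the running service.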

DBpedia Spotlight

The service uses DBpedia Spotlight to identify Wikipedia resources referenced in the text.

If you enable the `references` option in your requests, the corresponding DBpedia Spotlight module must also be deployed:

Install Docker Compose
Create and move into a directory named nlp/

Create a file named docker-compose.yml with the following content:

```
version: '2'
services:
  dbpedia-en-spotlight:
    image: dbpedia/spotlight-english:latest
    command: java -Dfile.encoding=UTF-8 -Xmx15G -Dthreads.max=15 -Dthreads.core=15 -jar /opt/spotlight/dbpedia-spotlight-nightly-build.jar /opt/spotlight/en  http://0.0.0.0:80/rest
    restart: always
  nlp:
    image: librairy/nlp:latest
    restart: always
    ports:
      - "7777:7777"
    environment:
      - REST_PATH=/nlp
      - JAVA_OPTS=-Xmx32768m
```

Make sure you have added the DBpedia Spotlight modules for the languages you are going to use.
Check that you have the necessary resources (e.g. memory, CPU, disk), as these modules are very demanding.
Run the services with:
```
$ docker-compose up
```

Check that the following log traces appear (depending on the environment, this may take a few minutes):

```
nlp_1 | [main] INFO  e.u.o.l.n.Application - Started Application in 16.847 seconds (JVM running for 18.964)
...
dbpedia-en-spotlight_1  | Server started in / listening on http://0.0.0.0:80/rest
```
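To automate that check, for example by feeding the output of `docker-compose logs` into a script, the traces can be matched line by line. This Python sketch assumes the docker-compose v1 log prefix `<service>_<index> |` seen in the traces and uses the two trace fragments as readiness markers; newer Compose versions may format the prefix differently. `ready_services` is a hypothetical helper name.

```python
from typing import Iterable, Set

# Log fragments that signal each container has finished starting,
# taken from the traces shown above.
READY_MARKERS = {
    "nlp": "Started Application",
    "dbpedia-en-spotlight": "Server started",
}


def ready_services(log_lines: Iterable[str]) -> Set[str]:
    """Return the services whose start-up trace appears in the logs."""
    ready = set()
    for line in log_lines:
        # Split "<service>_<index> | <message>" at the first "|".
        prefix, sep, message = line.partition("|")
        if not sep:
            continue
        service = prefix.strip().rsplit("_", 1)[0]
        marker = READY_MARKERS.get(service)
        if marker is not None and marker in message:
            ready.add(service)
    return ready
```

Both services are up once the returned set equals the keys of `READY_MARKERS`.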

Once started, the service should be available at: http://localhost:7777/nlp

The command above runs two services, DBpedia Spotlight and librAIry NLP, using the settings specified in docker-compose.yml.
The DBpedia Spotlight service starts lazily, so the first requests will be slower until all of its resources are initialized.
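Because those first requests can be slow or fail while DBpedia Spotlight loads its resources, clients may want to retry them. This is a minimal, library-agnostic Python sketch of exponential backoff; the attempt count and delays are illustrative assumptions, not values recommended by the service, and `with_retries` is a hypothetical helper name.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    send: Callable[[], T],
    attempts: int = 5,
    initial_delay: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `send`, retrying with exponential backoff on `retry_on` errors."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return send()
        except retry_on:
            if attempt == attempts:
                raise  # Give up and surface the last error.
            sleep(delay)
            delay *= 2  # Double the wait after each failure.
    raise AssertionError("unreachable")
```

Wrap a zero-argument function that sends one request. By default any `OSError` triggers a retry, which includes urllib's `URLError` and `HTTPError`; narrow `retry_on` if client errors such as HTTP 400 should fail immediately.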

Reference

You can use the following to cite the service:

```
@inproceedings{Badenes-Olmedo:2017:DTM:3103010.3121040,
 author = {Badenes-Olmedo, Carlos and Redondo-Garcia, Jos{\'e} Luis and Corcho, Oscar},
 title = {Distributing Text Mining Tasks with librAIry},
 booktitle = {Proceedings of the 2017 ACM Symposium on Document Engineering},
 series = {DocEng '17},
 year = {2017},
 isbn = {978-1-4503-4689-4},
 pages = {63--66},
 numpages = {4},
 url = {http://doi.acm.org/10.1145/3103010.3121040},
 doi = {10.1145/3103010.3121040},
 acmid = {3121040},
 publisher = {ACM},
 keywords = {data integration, large-scale text analysis, nlp, scholarly data, text mining},
}
```

Contact

This repository is maintained by Carlos Badenes-Olmedo. Please send me an e-mail or open a GitHub issue if you have questions.
