This repository contains all the files required to customise DBpedia Spotlight for use within OpeNER.
**Requirements:**
Java 1.7
Scala 2.9+
Maven 3
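A quick way to confirm that the required tools are installed is sketched below. The helper name check_tools is hypothetical, and it only checks that each command is on the PATH. It does not check versions.

```shell
# Report whether each tool passed as an argument is available on the PATH.
# Returns 0 only if every tool was found.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Example: check_tools java scala mvn git
```

The versions themselves can then be compared against the list above with java -version, scala -version and mvn -version.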
git clone git@github.com:ialdabe/IXA-EHU-DBpedia-spotlight.git
OR
git clone git://github.com/ialdabe/IXA-EHU-DBpedia-spotlight.git
Either command creates a directory called IXA-EHU-DBpedia-spotlight. It contains everything needed to build a modified version of DBpedia Spotlight that focuses on disambiguating given entities.
sh install.bash
The installation script obtains the latest version of DBpedia Spotlight and modifies some of its files so that "our" version of Spotlight can be run. The same steps can also be carried out one at a time in order to control each stage of the process manually.
Steps:
git clone https://github.com/dbpedia-spotlight/dbpedia-spotlight.git
This obtains the latest version of dbpedia-spotlight and stores it in the "dbpedia-spotlight" directory.
Copy the following files from IXA-EHU-DBpedia-spotlight to dbpedia-spotlight:
- pom.xml
- core/pom.xml
- conf/server_en.properties
- conf/server_es.properties
The server_en.properties and server_es.properties files contain the information needed to run two separate services: one for English and one for Spanish.
The two pom.xml files are modified versions of the original ones that adapt the dependencies to our needs.
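The copy step can be sketched as a small shell function. The name copy_modified_files is hypothetical, and the sketch assumes that the four files sit at the same relative paths in the IXA-EHU-DBpedia-spotlight checkout as in dbpedia-spotlight.

```shell
# Copy the four modified files from the IXA-EHU checkout into the
# dbpedia-spotlight checkout, keeping their relative paths.
# Assumption: the source checkout uses the same relative paths.
copy_modified_files() {
  src_dir=$1   # path to IXA-EHU-DBpedia-spotlight
  dst_dir=$2   # path to dbpedia-spotlight
  for rel_path in pom.xml core/pom.xml \
                  conf/server_en.properties conf/server_es.properties; do
    mkdir -p "$dst_dir/$(dirname "$rel_path")" || return 1
    cp "$src_dir/$rel_path" "$dst_dir/$rel_path" || return 1
  done
}

# Example: copy_modified_files IXA-EHU-DBpedia-spotlight dbpedia-spotlight
```

The function stops at the first file that cannot be copied, so a missing source file is noticed before the build starts.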
cd dbpedia-spotlight
mvn clean install
cd dist
mvn clean package
This command creates, among other files:
dbpedia-spotlight-0.6-jar-with-dependencies.jar
This jar contains all the classes needed to run dbpedia-spotlight, together with all the required dependencies. It serves two purposes: a) to run the server, and b) to be used by other programs.
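In the dbpedia-spotlight directory, create the data directory, then download and extract the English and Spanish indexes so that they end up in data/index-en and data/index-es:
mkdir data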
wget 'https://siuc05.si.ehu.es/~ragerri/index-spotlight/index-en.tgz'
wget 'https://siuc05.si.ehu.es/~ragerri/index-spotlight/index-es.tgz'
tar xvzf index-en.tgz
tar xvzf index-es.tgz
Find the pos-en-general-brown.HiddenMarkovModel file.
Although this file is not used, its location must be set correctly in the server properties files (server_en.properties and server_es.properties) for Spotlight to work properly.
find . -name "*HiddenMarkovModel"
Change the value manually.
The properties files contain the default value:
org.dbpedia.spotlight.tagging.hmm = pos-en-general-brown.HiddenMarkovModel
Replace this value with the path returned by the command:
find . -name "pos-en-general-brown.HiddenMarkovModel"
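For readers who prefer not to edit the files by hand, the change can be sketched with sed. The function name set_hmm_path is hypothetical. The sketch assumes that the property appears at the start of a line in each properties file and that the model path contains no '|', '&' or backslash characters.

```shell
# Rewrite the org.dbpedia.spotlight.tagging.hmm line of a properties file
# so that it points at the given model path.
set_hmm_path() {
  props_file=$1
  model_path=$2
  tmp_file=$(mktemp) || return 1
  # '|' is the sed delimiter because file paths contain '/'.
  sed "s|^org\.dbpedia\.spotlight\.tagging\.hmm[[:space:]]*=.*|org.dbpedia.spotlight.tagging.hmm = $model_path|" \
    "$props_file" > "$tmp_file" || return 1
  # Write back through cat so the original file keeps its permissions.
  cat "$tmp_file" > "$props_file" && rm -f "$tmp_file"
}

# Example, run from the dbpedia-spotlight directory:
#   model=$(find "$PWD" -name "pos-en-general-brown.HiddenMarkovModel" | head -n 1)
#   set_hmm_path conf/server_en.properties "$model"
#   set_hmm_path conf/server_es.properties "$model"
```

Searching from "$PWD" instead of "." makes find print an absolute path. That is a choice made for this sketch: the servers are started from the conf directory, so a path relative to another directory may not resolve.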
Before running the servers, verify that the dbpedia-spotlight directory contains:
data/index-en directory
data/index-es directory
the correct location of the pos-en-general-brown.HiddenMarkovModel model
dist/target/dbpedia-spotlight-0.6-jar-with-dependencies.jar
If something is missing, go through the installation script step by step.
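The directory and jar checks from the list above can be sketched as a shell function. The name verify_layout is hypothetical, and the sketch does not check the HiddenMarkovModel setting in the properties files.

```shell
# Check that the index directories and the jar exist under the
# dbpedia-spotlight directory. Prints each missing path and returns
# non-zero if anything is absent.
verify_layout() {
  base_dir=$1
  result=0
  for dir in data/index-en data/index-es; do
    if [ ! -d "$base_dir/$dir" ]; then
      echo "missing directory: $dir"
      result=1
    fi
  done
  jar=dist/target/dbpedia-spotlight-0.6-jar-with-dependencies.jar
  if [ ! -f "$base_dir/$jar" ]; then
    echo "missing file: $jar"
    result=1
  fi
  return "$result"
}

# Example, run from the dbpedia-spotlight directory: verify_layout .
```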
Once everything is correct, go to the conf directory.
- Run the server to disambiguate English entities (from the conf directory):
java -jar ../dist/target/dbpedia-spotlight-0.6-jar-with-dependencies.jar server_en.properties
- Run the server to disambiguate Spanish entities (from the conf directory):
java -jar ../dist/target/dbpedia-spotlight-0.6-jar-with-dependencies.jar server_es.properties
The English service is available at: http://localhost:2020/rest
The Spanish service is available at: http://localhost:2222/rest
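Since the two launch commands differ only in the properties file, they can be wrapped in one helper. This is a minimal sketch: the function name start_spotlight is hypothetical, and it assumes the directory layout described above.

```shell
# Start one Spotlight server from the conf directory.
#   $1: path to the dbpedia-spotlight directory
#   $2: language code, "en" or "es"
start_spotlight() {
  spotlight_dir=$1
  lang=$2
  case "$lang" in
    en|es) ;;
    *) echo "unsupported language: $lang" >&2; return 1 ;;
  esac
  # The subshell keeps the caller's working directory unchanged.
  (
    cd "$spotlight_dir/conf" || exit 1
    java -jar ../dist/target/dbpedia-spotlight-0.6-jar-with-dependencies.jar \
      "server_${lang}.properties"
  )
}

# Example: start_spotlight /path/to/dbpedia-spotlight en
```

The call blocks while the server is running, so each service needs its own terminal, or the call can be sent to the background with &.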
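Requests to the disambiguation service have the following form, where CONFIDENCE, SUPPORT and TEXT stand for the parameter values and TEXT must be URL-encoded:
curl "http://localhost:2020/rest/disambiguate?spotter=SpotXmlParser&confidence=CONFIDENCE&support=SUPPORT&text=TEXT"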
To disambiguate the given entities, the system requires input of the following form:
<annotation text="Brazilian oil giant Petrobras and U.S. oilfield service company Halliburton have signed a technological cooperation agreement, Petrobras announced Monday. The two companies agreed on three projects: studies on contamination of fluids in oil wells, laboratory simulation of well production, and research on solidification of salt and carbon dioxide formations, said Petrobras. Twelve other projects are still under negotiation.">
<surfaceForm name="oil" offset="10"/>
<surfaceForm name="company" offset="56"/>
<surfaceForm name="Halliburton" offset="64"/>
<surfaceForm name="oil" offset="237"/>
<surfaceForm name="other" offset="383"/>
</annotation>
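An annotation like the one above contains spaces, quotes and angle brackets, so it has to be URL-encoded before it is sent. The sketch below leaves that to curl. The function name disambiguate and the file name annotation.xml are hypothetical, and the confidence and support values in the example are placeholders, not recommended settings.

```shell
# Send an annotation file to the disambiguation endpoint.
#   $1: base URL of the service, e.g. http://localhost:2020/rest
#   $2: file holding the <annotation> XML
#   $3: confidence value
#   $4: support value
disambiguate() {
  base_url=$1
  annotation_file=$2
  confidence=$3
  support=$4
  # -G sends the URL-encoded fields as a GET query string;
  # "text@FILE" makes curl read the file and URL-encode its contents.
  curl -G "$base_url/disambiguate" \
    --data-urlencode "spotter=SpotXmlParser" \
    --data-urlencode "confidence=$confidence" \
    --data-urlencode "support=$support" \
    --data-urlencode "text@$annotation_file"
}

# Example (placeholder values):
#   disambiguate http://localhost:2020/rest annotation.xml 0.2 20
```

Passing http://localhost:2222/rest as the first argument sends the same request to the Spanish service.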