
Sam3448/elastic4clir

 
 


elastic4clir

An ElasticSearch package for Cross-Lingual Information Retrieval

The goal of elastic4clir is to provide a flexible framework for running cross-lingual information retrieval (CLIR) experiments. It implements various retrieval techniques and benchmarks while using ElasticSearch/Lucene as the backend index and search components.

Installation

These instructions assume you have Anaconda (or Miniconda) installed. If not, download Miniconda and add it to your PATH:

wget https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh
bash Miniconda2-latest-Linux-x86_64.sh
export PATH=$HOME/miniconda2/bin:$PATH

Now, clone this repo, and create a conda environment that contains the dependencies for this package:

cd ~/your-working-directory
git clone (this repo)
cd elastic4clir
conda env create -f conda-clir-env.yml

You also need to download ElasticSearch separately. This version of elastic4clir is based on elasticsearch-5.6.3. Other versions can be downloaded from: https://www.elastic.co/downloads/past-releases

wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.6.3.tar.gz
gunzip elasticsearch-5.6.3.tar.gz
tar -xvf elasticsearch-5.6.3.tar

That's it! To start the ElasticSearch server as a daemon:

./elasticsearch-5.6.3/bin/elasticsearch -d
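The daemon can take a few seconds to come up, so it helps to confirm that it is answering before indexing anything. The following is a minimal sketch, not part of this repository, assuming Python 3 and the default HTTP address http://localhost:9200; the function names `parse_es_version` and `get_es_version` are hypothetical.

```python
import json
import urllib.error
import urllib.request


def parse_es_version(body):
    """Return the version string from the JSON body of the root endpoint."""
    info = json.loads(body)
    return info["version"]["number"]


def get_es_version(url="http://localhost:9200", timeout=2.0):
    """Return the server version, or None if the server is not reachable.

    The URL is an assumption: it is the default for a local, unmodified
    ElasticSearch install.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return parse_es_version(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError, KeyError):
        # Connection refused, timeout, or an unexpected response body.
        return None
```

Calling `get_es_version()` shortly after starting the daemon should return the version string once the server is ready, and `None` while it is still starting or if it failed to start.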

To stop the ElasticSearch server, first find its process ID:

jps | grep Elasticsearch

Then kill the process using its PID (the first column of the output). Note that indices persist in the data directory of ElasticSearch (elasticsearch-5.6.3/data by default). To clean up, delete that directory.
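The lookup-and-kill step can also be scripted. The sketch below is an illustration rather than a script shipped with this package: it assumes Python 3 and that `jps` is on the PATH, and the helper names `find_es_pids` and `stop_elasticsearch` are hypothetical.

```python
import os
import signal
import subprocess


def find_es_pids(jps_output):
    """Extract PIDs of Elasticsearch JVMs from the text printed by `jps`."""
    pids = []
    for line in jps_output.splitlines():
        parts = line.split()
        # Each jps line looks like "<pid> <main class name>".
        if len(parts) >= 2 and parts[0].isdigit() and parts[1] == "Elasticsearch":
            pids.append(int(parts[0]))
    return pids


def stop_elasticsearch():
    """Send SIGTERM to every Elasticsearch process reported by jps."""
    output = subprocess.check_output(["jps"]).decode("utf-8")
    pids = find_es_pids(output)
    for pid in pids:
        # SIGTERM asks the JVM to shut down cleanly.
        os.kill(pid, signal.SIGTERM)
    return pids
```

`stop_elasticsearch()` returns the list of PIDs it signalled, which is empty when no server is running.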

The repository contains the following directories:

  • index_research: all scripts for indexing and searching. Two datasets are included, TREC and CACM, and the Wiki_sw dataset is also ready for indexing and searching.

  • evaluation: all evaluation scripts.

  • web: reserved for future development (experimental; ignore it for now).

Running an experiment (index, query, then evaluate)
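The repository's own scripts for these steps live in index_research and evaluation and are not reproduced here. Purely as an illustration of the three stages, the sketch below builds the request payloads that ElasticSearch 5.x accepts for bulk indexing and for a match query, and computes precision at k over a ranked result list. It assumes Python 3. The index name, type name, field name, document IDs, and all function names are hypothetical, and precision at k is not necessarily the metric the evaluation scripts compute.

```python
import json


def build_bulk_body(index, doc_type, docs):
    """Build the newline-delimited JSON body for the _bulk endpoint.

    `docs` is an iterable of (doc_id, document_dict) pairs. ElasticSearch 5.x
    expects an action line followed by a source line for each document, and
    the body must end with a newline.
    """
    lines = []
    for doc_id, doc in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


def build_match_query(field, text, size=10):
    """Build a full-text match query body for the _search endpoint."""
    return {"size": size, "query": {"match": {field: text}}}


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked documents that are relevant."""
    if k <= 0:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / float(k)


# Hypothetical toy data, for illustration only.
docs = [("d1", {"text": "information retrieval"}), ("d2", {"text": "cooking"})]
bulk_body = build_bulk_body("toy_index", "doc", docs)
query_body = build_match_query("text", "retrieval", size=5)
```

The bulk body would be sent with POST to /_bulk using the Content-Type application/x-ndjson, and the query body with POST to /toy_index/_search; the ranked document IDs in the response can then be scored against relevance judgments.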
