librairy/harvester

Harvester

Collect and process unstructured files to retrieve the full-text content and derived tokens from them.

Get Started!

Prerequisite: Docker Compose must be installed on your system.

You can run this service in isolation (see the Distributed Deployment section) or as an extension of the api. In the latter case, add the following services to the existing docker-compose.yml file:

ftp:
  container_name: ftp
  image: librairy/ftp:1.0
  ports:
    - "5051:21"
  volumes:
    - ./data:/home/ftpusers/librairy
harvester:
  container_name: harvester
  image: librairy/harvester
  volumes:
    - ./data:/librairy/files/custom
  links:
    - column-db
    - document-db
    - graph-db
    - event-bus

Then deploy it by typing:

$ docker-compose up

That's all! The harvester should now be running on your system alongside librairy.
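To confirm that the containers came up, one option is to list the service and read its recent log output. This is only a sketch: it assumes the service is named harvester as in the fragment above, and the COMPOSE variable and check_harvester function are hypothetical helpers, not part of librairy.

```shell
# COMPOSE is a hypothetical override; set COMPOSE=echo to preview the commands.
COMPOSE="${COMPOSE:-docker-compose}"

check_harvester() {
  # Show the state of the harvester service, then its last 50 log lines.
  "$COMPOSE" ps harvester && "$COMPOSE" logs --tail=50 harvester
}
```

Call check_harvester from the directory that contains docker-compose.yml.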

Distributed Deployment

Instead of deploying all containers together, you can deploy each of them independently. This is useful for running the service in a distributed way across several host machines.

  • FTP Server:

    $ docker run -it --rm --name ftp -p 5051:21 -v ./ftp:/librairy/files/custom librairy/ftp:1.0
  • Harvester:

    $ docker run -it --rm --name harvester -v ./documents:/librairy/files librairy/harvester

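Remember that with the flags -it --rm the service runs in the foreground and its container is removed when it exits. To run it in the background as a daemon that restarts automatically, replace those flags with -d --restart=always (Docker does not accept --rm together with --restart).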
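As an illustration, the harvester command above could be adapted for background mode as follows. This is a sketch under assumptions: the DOCKER variable and run_harvester function are hypothetical helpers, and the host path is written as an absolute path with $PWD because some Docker versions only accept absolute host paths in -v.

```shell
# DOCKER is a hypothetical override; set DOCKER=echo to preview the command.
DOCKER="${DOCKER:-docker}"

run_harvester() {
  # -d detaches the container; --restart=always restarts it after a crash or reboot.
  # --rm is omitted because Docker rejects it in combination with --restart.
  "$DOCKER" run -d --restart=always --name harvester \
    -v "$PWD/documents:/librairy/files" \
    librairy/harvester
}
```

Call run_harvester to start the container, then stop it later with docker stop harvester.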