
Architecture

Features

Database

  • URLs are saved to a NoSQL database (Apache CouchDB or Riak) that supports map/reduce queries; see the sketch after this list
  • external and internal URL referrals can also be saved
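
The snippet below is a minimal sketch (in Python, against CouchDB's standard HTTP API) of the kind of map/reduce query this storage choice enables. The database name (`ebot_urls`) and the document fields are illustrative assumptions, not ebot's actual schema:

```python
# Sketch only: count stored URLs per domain with a CouchDB map/reduce view.
# Database name and document fields are assumptions, not ebot's schema.
import json
import requests

COUCH = "http://localhost:5984"
DB = "ebot_urls"  # hypothetical database name

# Design document with a map/reduce view: emit each URL's domain,
# then count occurrences with CouchDB's built-in _count reducer.
design = {
    "_id": "_design/stats",
    "views": {
        "urls_by_domain": {
            "map": "function(doc) { if (doc.domain) emit(doc.domain, 1); }",
            "reduce": "_count",
        }
    },
}
requests.put(f"{COUCH}/{DB}/_design/stats", data=json.dumps(design))

# Query the view, grouped by domain.
rows = requests.get(
    f"{COUCH}/{DB}/_design/stats/_view/urls_by_domain",
    params={"group": "true"},
).json()["rows"]
for row in rows:
    print(row["key"], row["value"])
```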

Crawlers

  • many crawlers can run concurrently, even on remote nodes
  • URLs to be analysed are divided into several queues, depending on their depth/priority (see the first sketch after this list)
  • you can run any custom method over the body of each visited page
  • URLs/domains can be filtered using regular expressions (see the second sketch after this list)
  • URLs can be normalized/rewritten using many options (max_depth, remove_queries, … )
  • many other options: see the ebot.app and ebot_local.config files
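
A minimal sketch of splitting the crawl frontier into per-depth queues, as described above. The queue count and the "shallow first" scheduling policy are illustrative assumptions, not ebot's implementation:

```python
# Sketch only: one queue per depth level, shallow (high-priority) first.
from collections import deque

MAX_DEPTH = 3  # hypothetical depth limit
queues = [deque() for _ in range(MAX_DEPTH + 1)]

def enqueue(url: str, depth: int) -> None:
    """Place a URL in the queue matching its depth (capped at MAX_DEPTH)."""
    queues[min(depth, MAX_DEPTH)].append((url, depth))

def next_url():
    """Drain shallow queues first, so low-depth URLs are crawled earlier."""
    for q in queues:
        if q:
            return q.popleft()
    return None

enqueue("http://example.com/", 0)
print(next_url())  # -> ('http://example.com/', 0)
```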
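
And a minimal sketch of the regex filtering and normalization ideas; the option name `remove_queries` mirrors the one mentioned above, but the allow/deny patterns and the code are illustrative, not ebot's implementation:

```python
# Sketch only: allow/deny regex filtering plus simple URL normalization.
import re
from urllib.parse import urlsplit, urlunsplit

ALLOW = [re.compile(r"^https?://([^/]+\.)?example\.com")]  # hypothetical allow-list
DENY = [re.compile(r"\.(jpg|png|gif|pdf)$")]               # hypothetical deny-list

def url_allowed(url: str) -> bool:
    """Keep a URL only if it matches an allow pattern and no deny pattern."""
    return any(p.search(url) for p in ALLOW) and not any(p.search(url) for p in DENY)

def normalize(url: str, remove_queries: bool = True) -> str:
    """Lower-case the host, drop fragments, and optionally strip query strings."""
    parts = urlsplit(url)
    query = "" if remove_queries else parts.query
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path or "/", query, ""))

print(normalize("http://Example.com/a/b?session=42#top"))
# -> http://example.com/a/b
```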

Statistics

ebot statistics are saved to round-robin databases (using rrdtool); a minimal sketch follows
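
The sketch below feeds a counter into a round-robin database via the rrdtool CLI. The data-source name, step, and archive layout are assumptions, not ebot's actual RRD schema:

```python
# Sketch only: create and update an RRD file with the rrdtool CLI.
import subprocess

RRD = "ebot_stats.rrd"  # hypothetical file name

# Create an RRD sampled every 60s, keeping one day of 1-minute averages.
subprocess.run([
    "rrdtool", "create", RRD, "--step", "60",
    "DS:visited_urls:COUNTER:120:0:U",   # counter of URLs visited
    "RRA:AVERAGE:0.5:1:1440",            # 1440 one-minute samples = 24h
], check=True)

# Push the current counter value ("N" = now).
subprocess.run(["rrdtool", "update", RRD, "N:12345"], check=True)
```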

Web Services

  • a REST web interface (usage sketch below) for:
      • managing start/stop of crawlers
      • submitting URLs to crawlers (synchronously or asynchronously)
      • showing ebot statistics
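
A minimal sketch of driving such an interface over HTTP. The host, port, and endpoint paths below are hypothetical stand-ins, not ebot's documented routes (check the project source for the real ones):

```python
# Sketch only: hypothetical REST calls, not ebot's documented API.
import requests

BASE = "http://localhost:8011"  # hypothetical host/port

# Submit a URL to be crawled (async: fire and forget).
requests.post(f"{BASE}/crawl", params={"url": "http://example.com/", "async": "true"})

# Fetch crawler statistics.
print(requests.get(f"{BASE}/statistics").text)
```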