TODO.md

File metadata and controls

33 lines (30 loc) · 1.23 KB

General

  • Change all os.Getenv to os.LookupEnv
  • Implement graph
  • Improve environment variable loading
  • Use logger object
  • Log each service to a different file
  • Implement service
  • Periodically check proxy health
  • Add configuration structs for services

Concurrency

  • Fix msgChan and errChan buffer sizes to prevent deadlocks
  • Improve goroutine tracking with WaitGroups, because right now it's a mess

Storage

  • Make PageStorage an interface
  • Refactor ElasticPageStorage
  • Save all data on SIGINT
  • Make ElasticPageStorage concurrent
  • Make MongoJobsStorage concurrent
  • Store response headers
  • Save pages in case of error
  • Save timed-out links and the number of times each one timed out, and use that to revisit pages
  • Open connections only when needed
  • Organize data by domain
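
A rough sketch of what "make PageStorage an interface" could look like, with a mutex-guarded in-memory implementation standing in for a concurrent backend. The `Page` fields, the `SavePage` method, and `memoryPageStorage` are all assumptions made for illustration; the real ElasticPageStorage would implement the same interface with its own client.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Page is a hypothetical minimal page record.
type Page struct {
	URL  string
	Body string
}

// PageStorage is the behaviour callers depend on; the method set is assumed.
type PageStorage interface {
	SavePage(p Page) error
}

// memoryPageStorage is an illustrative implementation that is safe for
// concurrent use because every map access holds the mutex.
type memoryPageStorage struct {
	mu    sync.Mutex
	pages map[string]Page
}

func newMemoryPageStorage() *memoryPageStorage {
	return &memoryPageStorage{pages: make(map[string]Page)}
}

func (s *memoryPageStorage) SavePage(p Page) error {
	if p.URL == "" {
		return errors.New("page has no URL")
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	s.pages[p.URL] = p
	return nil
}

// Count returns the number of stored pages.
func (s *memoryPageStorage) Count() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.pages)
}

func main() {
	mem := newMemoryPageStorage()
	var storage PageStorage = mem // callers only see the interface

	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			_ = storage.SavePage(Page{URL: fmt.Sprintf("http://example.com/%d", n)})
		}(i)
	}
	wg.Wait()
	fmt.Println(mem.Count())
}
```

With the interface in place, an in-memory implementation like this one can also double as a test fake for code that currently needs a live Elasticsearch instance.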

Collectors

  • Make another collector for URLs added from the webserver, in order to be able to crawl clearnet and subreddits
  • Merge the getCollector functions into one and use a flag to choose an onion collector or a normal one
  • Make some periodic collectors for places where links get published
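
The getCollector merge could be sketched as one function taking a flag. Everything here except the name getCollector is hypothetical: `collectorOptions` is a stand-in for whatever the real collector is built from, and the SOCKS address assumes a local Tor proxy on its default port.

```go
package main

import (
	"fmt"
	"strings"
)

// collectorOptions is a hypothetical stand-in for the real collector settings.
type collectorOptions struct {
	ProxyAddr  string // empty means a direct connection
	HostSuffix string // empty means any host is allowed
}

// getCollector returns onion settings when onion is true and clearnet
// settings otherwise, replacing two near-identical constructors.
func getCollector(onion bool) collectorOptions {
	if onion {
		return collectorOptions{
			ProxyAddr:  "socks5://127.0.0.1:9050", // assumed local Tor SOCKS proxy
			HostSuffix: ".onion",
		}
	}
	return collectorOptions{}
}

// allows reports whether host matches the collector's suffix restriction.
// An empty suffix matches every host.
func (o collectorOptions) allows(host string) bool {
	return strings.HasSuffix(host, o.HostSuffix)
}

func main() {
	onion := getCollector(true)
	normal := getCollector(false)
	fmt.Println(onion.allows("example.com"), normal.allows("example.com"))
}
```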