DmitryKey/url-crawler

Python crawler with an HTML dashboard. It checks statuses of URLs for specific rules.
System description

This crawler system performs configurable periodic checks of URLs against two requirements:

  1. the URL must respond with status code 200
  2. the URL must match the configured rules
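As a sketch, the two requirements above could be combined into a single check like this (`check_url` and `match_rules` are illustrative names, not taken from the project):

```python
# Minimal sketch of a per-URL check: the URL passes only if it
# returns HTTP 200 and its body matches every configured rule.
# check_url / match_rules are hypothetical names for illustration.
import re
import urllib.request


def match_rules(body, rules):
    """Return True only if every rule (string or regex) matches the body."""
    return all(re.search(rule, body) for rule in rules)


def check_url(url, rules, timeout=10.0):
    """Return True if the URL responds with 200 and satisfies all rules."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            if resp.status != 200:
                return False
            body = resp.read().decode("utf-8", errors="replace")
    except OSError:  # HTTPError, URLError, timeouts
        return False
    return match_rules(body, rules)
```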

The crawler is implemented as a dashboard UI: the backend is Python 3.6 and the frontend is HTML + AJAX.

The rules and intervals for the periodic checks are configured per URL in the file conf/urls_rules.tab. It is a TAB-separated file with the following fields:

  1. URL
  2. check interval in milliseconds
  3. string / regular expression to look for

There can be multiple rules for a given URL; separate them with TABs.
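A hypothetical conf/urls_rules.tab might look like the following (the URLs, intervals, and patterns are illustrative; the columns are separated by real TAB characters):

```
https://example.com	60000	Example Domain
https://example.com/about	120000	About	Contact
```

The second line shows two rules for one URL, both of which must match.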

At each check, the log file logs/status_checks.log is appended to.

The logic is implemented as a Flask dashboard application.
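Loading the per-URL configuration from conf/urls_rules.tab could look like the following sketch (`parse_rules` is a hypothetical helper, not a name from the project):

```python
# Hypothetical parser for conf/urls_rules.tab. The file format
# (URL, interval in ms, then one or more rules, TAB-separated)
# follows the description above.
import csv


def parse_rules(path):
    """Yield (url, interval_ms, rules) tuples from the TAB-separated file."""
    with open(path, newline="") as f:
        for row in csv.reader(f, delimiter="\t"):
            if len(row) < 3:
                continue  # skip blank or malformed lines
            url, interval_ms, *rules = row
            yield url, int(interval_ms), rules
```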

How to start

In order to start the system, open a terminal in the root directory of the crawler project and run:

  1. export FLASK_APP=src/crawler.py
  2. python -m flask run
  3. open http://127.0.0.1:5000/check in browser

You should see a dashboard like this: [Crawler Dashboard screenshot]

The dashboard will automatically recheck the URLs at the configured, URL-specific intervals.
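In the project the rechecks are driven from the browser via AJAX, but a server-side equivalent of per-URL scheduling could be sketched with the standard library (`schedule_check` is a hypothetical name):

```python
# Sketch of URL-specific periodic rechecking using threading.Timer.
# schedule_check is illustrative only; the real project triggers
# rechecks from the frontend at each URL's configured interval.
import threading


def schedule_check(url, interval_ms, check_fn):
    """Run check_fn(url) immediately, then again every interval_ms ms."""
    def run():
        check_fn(url)
        timer = threading.Timer(interval_ms / 1000.0, run)
        timer.daemon = True  # don't block interpreter shutdown
        timer.start()
    run()
```

Each URL gets its own timer chain, so intervals stay independent per URL.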
