rodrigodelmonte/site_checker

site_checker

Python Version | Code Style: Black | Checked with mypy

Overview

site_checker monitors website availability over the network, produces metrics about it, and passes these events through a Kafka instance into a PostgreSQL database. site_checker is divided into two components:

  • producer: collects website metrics and publishes the results to Kafka.
  • consumer: consumes metrics from Kafka topics and saves them into a PostgreSQL database.
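As a rough illustration of the producer side, the sketch below requests a URL once, times the request, and serializes the result as a JSON message value. It uses only the Python standard library; the function names `check_site` and `to_message_value` and the event fields are hypothetical and do not necessarily match what site_checker actually publishes. Sending the bytes to Kafka would additionally need a Kafka client library, which is left out here.

```python
import json
import time
import urllib.error
import urllib.request
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def check_site(url: str, timeout: float = 10.0) -> Dict[str, Any]:
    """Request `url` once and return a metrics event (hypothetical schema)."""
    status_code: Optional[int] = None
    error: Optional[str] = None
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status_code = response.status
    except urllib.error.HTTPError as exc:
        # Non-2xx responses raise HTTPError but still carry a status code.
        status_code = exc.code
    except (urllib.error.URLError, OSError) as exc:
        # DNS failures, refused connections, timeouts: the site is unreachable.
        error = str(exc)
    elapsed_ms = (time.monotonic() - start) * 1000
    return {
        "url": url,
        "status_code": status_code,
        "response_time_ms": round(elapsed_ms, 2),
        "error": error,
        "checked_at": datetime.now(timezone.utc).isoformat(),
    }


def to_message_value(event: Dict[str, Any]) -> bytes:
    """Serialize an event as UTF-8 JSON, a common choice for Kafka message values."""
    return json.dumps(event).encode("utf-8")
```

For example, `to_message_value(check_site("https://www.apache.org"))` returns the bytes a producer could send to a topic such as apache.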

architecture diagram

Demo

asciicast

Getting Started

This section guides you through the steps to reproduce the demo above locally.

This tutorial assumes you already have Kafka and PostgreSQL instances running. For more details, please check How to set up managed Apache Kafka and How to deploy an open source database.

One last request before you start 🙏: on the Aiven Kafka dashboard, go to the Overview tab, open the Advanced configuration section, and enable the kafka.auto_create_topics_enable parameter. This lets you produce messages to Kafka without creating a topic beforehand.

  1. Check out this repository:
$ git clone git@github.com:aiven-recruitment/site_checker.git
$ cd site_checker
  2. Set up the credentials:
$ vim example/example.consumer.env # replace <CHANGEME> entries
$ vim example/example.producer.env # replace <CHANGEME> entries
# Copy the Kafka and Postgres SSL certificates to the example/ folder
$ cp ~/Downloads/ca.pem example/
$ cp ~/Downloads/service.* example/
  3. Start docker-compose:
$ make run
  4. Check the application logs:
$ make logs
  5. Check the data saved into Postgres:
$ psql -U <USER> -h <HOSTNAME> -p <PORT> defaultdb
defaultdb=> select * from site_checker.apache;
  6. Clean up:
$ make clean
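Before running make run, it can help to confirm that no <CHANGEME> placeholder was left behind in the two env files from step 2. The helper below is a hypothetical convenience function, not part of the repository; it only assumes the files are plain text, as Docker env files usually are.

```python
from pathlib import Path
from typing import Dict, List

PLACEHOLDER = "<CHANGEME>"


def find_placeholders(paths: List[Path]) -> Dict[str, List[int]]:
    """Map each file to the 1-based line numbers that still contain PLACEHOLDER."""
    leftovers: Dict[str, List[int]] = {}
    for path in paths:
        lines = path.read_text(encoding="utf-8").splitlines()
        hits = [n for n, line in enumerate(lines, start=1) if PLACEHOLDER in line]
        if hits:
            leftovers[str(path)] = hits
    return leftovers
```

For example, `find_placeholders([Path("example/example.consumer.env"), Path("example/example.producer.env")])` returns an empty dict once every entry has been replaced.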

Production deployment

Configuration approaches

  • config.ini: checks multiple websites. For more details, please check the example/example.config.ini file.
  • Docker .env file or CLI parameters: checks a single website. For more details, please check the example/example.producer.env file.

Scalability considerations

  • The config.ini approach creates one thread per website. If you run site_checker on a single host to check multiple websites, performance is limited by the host's resources: too many websites (and therefore threads) cause excessive context switching, which degrades performance. site_checker_config_ini diagram

  • The CLI parameters approach, also used in the Demo, creates a single Python process that checks one website. To monitor more than one website with this approach, build a container image using the Dockerfile as a starting point and launch one container per website in your container orchestration system, e.g. Kubernetes, AWS ECS, or Mesos. site_checker_cli_parameters diagram
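The one-thread-per-website model of the config.ini approach can be sketched with configparser and threading. The INI layout assumed here (one section per website with a hypothetical url key) is only an assumption; the real format is defined by example/example.config.ini. The check function is passed in as a stub so the sketch runs offline, and each thread calls it once, whereas a real checker would presumably repeat the check on an interval.

```python
import configparser
import threading
from typing import Callable, Dict, List

# Hypothetical layout: one section per website, each with a `url` key.
SAMPLE_CONFIG = """
[apache]
url = https://www.apache.org

[python]
url = https://www.python.org
"""


def load_sites(text: str) -> Dict[str, str]:
    """Return {section_name: url} for every section in the INI text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {name: parser[name]["url"] for name in parser.sections()}


def run_one_thread_per_site(
    sites: Dict[str, str], check: Callable[[str, str], None]
) -> List[threading.Thread]:
    """Start one thread per website; each thread calls check(name, url) once."""
    threads = [
        threading.Thread(target=check, args=(name, url), name=f"checker-{name}")
        for name, url in sites.items()
    ]
    for thread in threads:
        thread.start()
    return threads
```

Because every website gets its own thread, the thread count grows linearly with the number of configured sites, which is where the context-switching cost described above comes from.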

Security considerations

The site_checker consumer automatically creates one table per topic; for example, the topic apache creates the table apache under the site_checker schema in the Postgres instance. The Demo and Getting Started sections use admin credentials to keep the steps simple; however, this approach is not suitable for production workloads.
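The topic-to-table mapping can be illustrated with a small helper that builds a CREATE TABLE IF NOT EXISTS statement for a topic. This is a hypothetical sketch: the column names and types are assumptions, not the consumer's actual schema. Because a table name cannot be passed as a query parameter, the topic name is validated against a whitelist and quoted as an identifier instead of being interpolated blindly.

```python
import re

# Conservative whitelist (an assumption): lowercase so the quoted name matches the
# unquoted query site_checker.apache, and at most 63 characters, PostgreSQL's
# identifier limit. Kafka itself also allows '.', '-' and uppercase in topic names.
_VALID_TOPIC = re.compile(r"[a-z_][a-z0-9_]{0,62}")


def table_for_topic(topic: str, schema: str = "site_checker") -> str:
    """Return the schema-qualified, quoted table name for a Kafka topic."""
    if not _VALID_TOPIC.fullmatch(topic):
        raise ValueError(f"unsupported topic name for a table: {topic!r}")
    return f'"{schema}"."{topic}"'


def create_table_sql(topic: str) -> str:
    """Build DDL for a per-topic metrics table (hypothetical columns)."""
    return (
        f"CREATE TABLE IF NOT EXISTS {table_for_topic(topic)} ("
        "id BIGSERIAL PRIMARY KEY, "
        "url TEXT NOT NULL, "
        "status_code INTEGER, "
        "response_time_ms DOUBLE PRECISION, "
        "checked_at TIMESTAMPTZ NOT NULL)"
    )
```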

For production, we recommend creating an application user whose privileges are limited to the site_checker schema. Example:

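-- Run as an admin user. The schema must exist before privileges can be granted on it.
CREATE SCHEMA IF NOT EXISTS site_checker;
CREATE USER site_checker WITH PASSWORD '<STRONGPASSWORD>';
GRANT USAGE ON SCHEMA site_checker TO site_checker;
-- Lets the consumer create one table per topic.
GRANT CREATE ON SCHEMA site_checker TO site_checker;
-- Applies only to tables that already exist (e.g. created earlier with admin credentials).
GRANT INSERT ON ALL TABLES IN SCHEMA site_checker TO site_checker;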
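One way to fill in <STRONGPASSWORD> is to generate it with Python's secrets module. The helpers below are a hypothetical sketch, not part of site_checker; they also double any single quote so the value is safe inside the SQL string literal, although token_urlsafe output only contains letters, digits, '-' and '_'.

```python
import secrets


def strong_password(n_bytes: int = 32) -> str:
    """Return a random URL-safe password built from n_bytes of randomness."""
    return secrets.token_urlsafe(n_bytes)


def create_user_sql(password: str) -> str:
    """Build the CREATE USER statement, escaping single quotes in the literal."""
    escaped = password.replace("'", "''")
    return f"CREATE USER site_checker WITH PASSWORD '{escaped}';"
```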

Contribute

For more details, please check the CONTRIBUTING.md guide.
