This repository has been archived by the owner on Oct 11, 2020. It is now read-only.

Healthbot rules in this repo

Khelil Sator edited this page Mar 12, 2019 · 13 revisions

Here is a short description of the Healthbot rules available in this repository.

Cross-device correlation

The rule check-bgp-state-using-openconfig collects BGP details using OpenConfig and monitors session state.

The rule compare-peer-type queries the database updated by the rule check-bgp-state-using-openconfig and compares the BGP peer type configured on two BGP peers (both peers should use the same peer type).

The rule compare-as queries the database updated by the rule check-bgp-state-using-openconfig and compares the local-as configured on a router with the peer-as configured on one of its BGP peers (both should use the same AS).
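
The comparison performed by compare-as can be pictured with a minimal Python sketch. This is not Healthbot rule syntax: the dictionary layout, the router names, and the function name `find_as_mismatches` are assumptions made only for illustration.

```python
# Illustrative only: the data layout and names are assumptions,
# not the schema Healthbot uses in its database.

def find_as_mismatches(bgp_config):
    """Return (router, peer, peer_as_on_router, local_as_on_peer) tuples
    where the peer-as configured on `router` differs from the local-as
    configured on `peer`."""
    mismatches = []
    for router, details in bgp_config.items():
        for peer, peer_as in details["peers"].items():
            if peer not in bgp_config:
                continue  # no data collected for this peer, nothing to compare
            peer_local_as = bgp_config[peer]["local_as"]
            if peer_as != peer_local_as:
                mismatches.append((router, peer, peer_as, peer_local_as))
    return mismatches


example = {
    "r1": {"local_as": 65001, "peers": {"r2": 65002}},
    "r2": {"local_as": 65002, "peers": {"r1": 65009}},  # wrong peer-as for r1
}
print(find_as_mismatches(example))  # [('r2', 'r1', 65009, 65001)]
```

The point of the sketch is that the check needs data from two devices at once, which is why the rule reads from the database instead of collecting from a single device.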

Closed-loop automation

The rule enforce-interfaces-state monitors interface status using OpenConfig.
If one of the interfaces monitored by this rule is disabled, Healthbot re-enables it.

Cross-device correlation and closed-loop automation

The rule collect-interfaces-mtu uses OpenConfig to collect the MTU of each interface and stores it in the database.

The rule compare-interfaces-mtu queries the database updated by the rule collect-interfaces-mtu and compares the MTU of two connected interfaces. If the MTUs do not match, it changes the Junos configuration to enforce the same MTU on both interfaces.
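
As a rough illustration of the detection half of compare-interfaces-mtu, the sketch below lists the links whose two ends report different MTUs. The function name, the `(device, interface)` keys, and the sample values are hypothetical. Which of the two values the rule enforces is not stated here, so the sketch stops at detection.

```python
# Illustrative only: names and data layout are assumptions, not Healthbot's schema.

def find_mtu_mismatches(mtu_by_interface, links):
    """Return one entry per link whose two ends report different MTU values.

    mtu_by_interface maps (device, interface) -> MTU in bytes.
    links is a list of ((device_a, if_a), (device_b, if_b)) pairs.
    """
    mismatches = []
    for end_a, end_b in links:
        mtu_a = mtu_by_interface.get(end_a)
        mtu_b = mtu_by_interface.get(end_b)
        if mtu_a is None or mtu_b is None:
            continue  # one end was not collected, so no comparison is possible
        if mtu_a != mtu_b:
            mismatches.append((end_a, mtu_a, end_b, mtu_b))
    return mismatches


mtus = {
    ("r1", "ge-0/0/0"): 1514,
    ("r2", "ge-0/0/0"): 9192,
    ("r1", "ge-0/0/1"): 1514,
    ("r3", "ge-0/0/1"): 1514,
}
links = [
    (("r1", "ge-0/0/0"), ("r2", "ge-0/0/0")),
    (("r1", "ge-0/0/1"), ("r3", "ge-0/0/1")),
]
print(find_mtu_mismatches(mtus, links))
```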

Automated zoom in for root cause analysis

The device rule check-bgp-state-with-automatic-zoom collects BGP details, stores the data in the database, and monitors session state. This rule doesn't run advanced tests (no cross-device correlation for root cause analysis, just BGP session state monitoring). If a BGP session moves to a non-established state, this rule automatically instantiates a playbook with BGP troubleshooting rules.

These BGP troubleshooting rules do not collect data from devices. They process the data stored in the database, using cross-device correlation, and help identify the root cause of BGP issues.

Machine learning

The rule check-bgp-routes-with-3-sigma uses OpenConfig to monitor the number of BGP prefixes received, and uses the 3-sigma machine learning algorithm to indicate whether the current value is normal.
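
For readers unfamiliar with the 3-sigma idea, here is a minimal sketch of the textbook rule: a value is treated as normal if it lies within three standard deviations of the mean of past values. How Healthbot trains and applies its own model is not described on this page, so the function name, the use of the population standard deviation, and the sample prefix counts are assumptions.

```python
import statistics

# Illustrative only: a textbook 3-sigma check, not Healthbot's implementation.

def is_normal_3_sigma(history, current):
    """Return True if `current` lies within mean +/- 3 standard deviations of `history`."""
    mean = statistics.mean(history)
    sigma = statistics.pstdev(history)  # population standard deviation
    return abs(current - mean) <= 3 * sigma


# Hypothetical number of BGP prefixes received over time.
history = [100, 102, 98, 101, 99, 100, 103, 97]
print(is_normal_3_sigma(history, 104))  # True
print(is_normal_3_sigma(history, 150))  # False
```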

The rule check-bgp-routes-with-k-means uses OpenConfig to monitor the number of BGP prefixes received, and uses the k-means machine learning algorithm to indicate whether the current value is normal.
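
The sketch below shows one way k-means could be used for this kind of check: cluster past values, then treat a new value as normal if it falls inside the spread of its nearest cluster. This page does not say how Healthbot turns clusters into a normal/abnormal verdict, so that decision rule, the function names, and the sample data are all assumptions.

```python
# Illustrative only: plain one-dimensional k-means plus an assumed decision rule.

def kmeans_1d(values, k=2, iterations=20):
    """Lloyd's algorithm on scalars; returns the sorted list of centroids."""
    ordered = sorted(values)
    # Deterministic start: pick k evenly spaced points from the sorted data.
    step = max(k - 1, 1)
    centroids = [ordered[(i * (len(ordered) - 1)) // step] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Keep the old centroid when a cluster ends up empty.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)


def is_normal_kmeans(history, current, k=2):
    """Assumed rule: normal if `current` is no farther from its nearest
    centroid than the farthest training point of that cluster."""
    centroids = kmeans_1d(history, k)

    def nearest(v):
        return min(centroids, key=lambda c: abs(v - c))

    radius = {c: 0.0 for c in centroids}
    for v in history:
        c = nearest(v)
        radius[c] = max(radius[c], abs(v - c))
    c = nearest(current)
    return abs(current - c) <= radius[c]


# Hypothetical prefix counts with two usual operating levels.
history = [100, 101, 99, 100, 200, 201, 199, 200]
print(kmeans_1d(history))              # [100.0, 200.0]
print(is_normal_kmeans(history, 199))  # True
print(is_normal_kmeans(history, 150))  # False
```

Unlike the 3-sigma check, clustering can accept several distinct "normal" levels, which is the usual reason to prefer it when a metric has more than one steady state.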

Physical topology validation

The rule collect-lldp uses NETCONF to collect LLDP details and stores them in the database.

The rule check-lldp validates the physical topology. This rule doesn't collect data from devices. It compares the expected LLDP details against the LLDP details collected by the rule collect-lldp.
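
The expected-versus-collected comparison can be sketched as a dictionary diff. The function name, the key and value layout, and the device names below are hypothetical and do not reflect how the rule stores its data.

```python
# Illustrative only: names and data layout are assumptions.

def check_lldp(expected, collected):
    """Compare the expected topology with the collected LLDP neighbors.

    Both arguments map (device, local_interface) -> (neighbor, neighbor_interface).
    Returns {link: (expected_neighbor, collected_neighbor)} for every expected
    link that is missing or differs in the collected data.
    """
    problems = {}
    for link, want in expected.items():
        got = collected.get(link)  # None when no neighbor was seen
        if got != want:
            problems[link] = (want, got)
    return problems


expected = {
    ("r1", "ge-0/0/0"): ("r2", "ge-0/0/0"),
    ("r1", "ge-0/0/1"): ("r3", "ge-0/0/1"),
}
collected = {
    ("r1", "ge-0/0/0"): ("r2", "ge-0/0/0"),
    ("r1", "ge-0/0/1"): ("r4", "ge-0/0/5"),  # cabled to the wrong device
}
print(check_lldp(expected, collected))
```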

Simple rules

These simple rules:

  • collect data from the network devices (using SNMP, NETCONF, or OpenConfig telemetry)
  • store data in the database
  • continuously query the Healthbot database to compare the actual state (i.e. the last datapoint in the database) against the desired state (described in the Healthbot rule)
  • indicate the result in the Healthbot GUI and database

For example: continuously checking whether BGP sessions are established, ...
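
The "last datapoint versus desired state" comparison described in the list above can be sketched as follows. The function name, the `(timestamp, value)` layout, and the state strings are assumptions for illustration, not Healthbot's query interface.

```python
# Illustrative only: names and data layout are assumptions.

def matches_desired_state(datapoints, desired):
    """Compare the most recent datapoint with the desired state.

    datapoints is a list of (timestamp, value) pairs in any order.
    Returns None when nothing has been collected yet.
    """
    if not datapoints:
        return None
    _, latest_value = max(datapoints, key=lambda point: point[0])
    return latest_value == desired


# Hypothetical BGP session states stored for one peer.
session_states = [(1, "Active"), (2, "Established")]
print(matches_desired_state(session_states, "Established"))  # True
```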

Product name and software version analyzer

The rule check-system-info uses NETCONF to collect device details and indicates whether the product name and Junos version are the expected ones.

The rule check-hw-sw uses NETCONF to collect device details and indicates whether the product name and Junos version are the expected ones.

The two rules above achieve the same thing, using different NETCONF RPCs.

OSPF

The rule check-ospf-state-using-netconf monitors OSPF state using NETCONF.

The rule check-ospf-state-using-snmp monitors OSPF state using SNMP.

Interfaces

The rule check-interfaces-status-using-snmp monitors interface status using SNMP.

The rule check-interfaces-status-using-openconfig monitors interface status using OpenConfig.

The rule check-interfaces-description uses OpenConfig to collect interface descriptions and indicates which interfaces have no description configured.

BGP

The rule check-bgp-state-using-netconf monitors BGP state using NETCONF.

The rule check-bgp-state-using-snmp monitors BGP state using SNMP.

The rule check-bgp-state-using-openconfig monitors BGP state using OpenConfig with recent Junos releases.

The rule check-bgp-state-using-openconfig-bgp-path monitors BGP state using OpenConfig with old Junos releases.

The rule check-bgp-routes monitors the number of BGP prefixes using OpenConfig.

Rules checking if a value increased compared to a previous value

The rule bgp-flap-detection-using-netconf uses NETCONF to monitor whether the number of BGP flaps increased.

The rule monitor-interfaces-errors-using-openconfig uses OpenConfig to monitor whether the number of errors increased.
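
Both rules in this group boil down to comparing the latest value of a counter with the previous one. A minimal sketch, with a hypothetical function name and made-up counter values:

```python
# Illustrative only: the function name and sample values are assumptions.

def increased(samples):
    """Return True if the most recent sample is greater than the one before it.

    samples is a list of counter values ordered from oldest to newest.
    """
    if len(samples) < 2:
        return False  # no previous value to compare against
    return samples[-1] > samples[-2]


print(increased([3, 3, 4]))  # True: the flap or error counter went up
print(increased([3, 3, 3]))  # False: the counter is stable
```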

Load these files to Healthbot

The directory rules has rules
The directory playbooks has playbooks
The directory tables_and_views has tables and views
The directory functions has functions

Load these files (rules, playbooks, functions, tables and views) into Healthbot using the Healthbot API with Python:

requirements

$ pip install requests
$ pip install pyyaml

Update this file with the Healthbot IP address and credentials

$ vi add_files_to_healthbot.py

Run this command to load these tables and views, functions, rules and playbooks into your Healthbot instance

$ python add_files_to_healthbot.py