Automated zoom in and zoom out for root cause analysis (demo with BGP)

Monitoring rule/playbook

The monitoring playbook bgp-monitoring-with-automatic-zoom uses the device rule check-bgp-state-with-automatic-zoom.

The device rule check-bgp-state-with-automatic-zoom collects BGP details, stores the data in the database, and monitors the state of the sessions. This rule doesn't run advanced tests: it only monitors BGP session state and does no cross-device correlation for root cause analysis.
If a BGP session moves to a non-established state, the device rule check-bgp-state-with-automatic-zoom uses the Python script bgp_zoom_in.py to automatically instantiate a BGP troubleshooting playbook.

Troubleshooting rules/playbook

These rules do not collect data from devices. They process the data already stored in the database, correlating it across devices, to help identify the root cause of BGP issues:

The network rule troubleshooting-peer-type queries the database updated by the rule check-bgp-state-with-automatic-zoom and compares the BGP peer type configured on the two BGP peers (both should use the same peer type).
The network rule troubleshooting-as queries the database updated by the rule check-bgp-state-with-automatic-zoom and compares the local-as configured on a router with the peer-as configured on one of its BGP peers (both should be the same AS).

The troubleshooting playbook bgp-zoom uses the network rules troubleshooting-as and troubleshooting-peer-type.
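
The two cross-device checks described above can be illustrated with a small Python sketch. This is not how the network rules are implemented; it only shows the comparison they are described as making. The BgpNeighbor record, the find_reverse and correlate helpers, and the sample rows are all hypothetical. The sample rows loosely mirror the demo later on this page, but the local addresses and the vMX1 local AS are invented for illustration.

```python
from typing import List, NamedTuple, Optional


class BgpNeighbor(NamedTuple):
    """One configured BGP neighbor (hypothetical shape of a database row)."""
    device: str      # router that holds this neighbor statement
    local_ip: str    # local address of the session
    peer_ip: str     # neighbor address
    local_as: int
    peer_as: int
    peer_type: str   # for example "external" or "internal"


def find_reverse(entry: BgpNeighbor, table: List[BgpNeighbor]) -> Optional[BgpNeighbor]:
    """Find the neighbor statement configured on the other end of the session."""
    for other in table:
        if other.local_ip == entry.peer_ip and other.peer_ip == entry.local_ip:
            return other
    return None


def correlate(entry: BgpNeighbor, table: List[BgpNeighbor]) -> List[str]:
    """Compare both ends of one session and return the mismatches found."""
    remote = find_reverse(entry, table)
    if remote is None:
        return [f"{entry.device}: no matching neighbor found for {entry.peer_ip}"]
    findings = []
    # Same idea as troubleshooting-peer-type: both ends should use the same peer type.
    if entry.peer_type != remote.peer_type:
        findings.append(
            f"peer type mismatch: {entry.device}={entry.peer_type}, "
            f"{remote.device}={remote.peer_type}"
        )
    # Same idea as troubleshooting-as: the peer-as must equal the remote local-as.
    if entry.peer_as != remote.local_as:
        findings.append(
            f"AS mismatch: {entry.device} peer-as {entry.peer_as}, "
            f"{remote.device} local-as {remote.local_as}"
        )
    return findings


# Invented sample rows: local addresses and the vMX1 local AS (101) are assumptions.
table = [
    BgpNeighbor("vMX1", "192.168.1.0", "192.168.1.1", 101, 200, "external"),
    BgpNeighbor("vMX4", "192.168.1.1", "192.168.1.0", 104, 101, "external"),
]

findings = correlate(table[0], table)
print(findings)
```

Checking the session from the vMX1 side reports the AS mismatch, while the same check from the vMX4 side reports nothing, which is why the correlation has to look at both ends of the session.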

Workflow overview

Instantiate the BGP monitoring playbook bgp-monitoring-with-automatic-zoom.
Do not instantiate the BGP troubleshooting playbook bgp-zoom.
If a BGP session moves to a non-established state, the device rule check-bgp-state-with-automatic-zoom automatically instantiates the BGP troubleshooting playbook bgp-zoom.
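
The decision behind this workflow can be sketched in a few lines of Python. This is only an illustration of the trigger logic, not the content of bgp_zoom_in.py: the plan_action function, the "Established" state string, and the zoom_active flag are hypothetical, and the zoom-out branch is an assumption based on the title of this page. The sketch deliberately makes no HealthBot API calls.

```python
from typing import Dict

# Assumed label for a healthy session; the real rule may use another value.
ESTABLISHED = "Established"


def plan_action(session_states: Dict[str, str], zoom_active: bool) -> str:
    """Decide what to do with the troubleshooting playbook.

    session_states maps a peer address to its current BGP state.
    zoom_active tells whether the troubleshooting playbook is already running.
    """
    down = [peer for peer, state in session_states.items() if state != ESTABLISHED]
    if down and not zoom_active:
        return "zoom-in"    # instantiate the troubleshooting playbook
    if not down and zoom_active:
        return "zoom-out"   # assumed counterpart: remove the playbook again
    return "none"           # nothing to change


states = {
    "192.168.1.1": "Active",
    "192.168.1.3": "Established",
    "192.168.1.5": "Established",
    "192.168.1.7": "Established",
}
print(plan_action(states, zoom_active=False))
```

Keeping the decision separate from the action makes it easy to check that the playbook is instantiated only once per incident rather than on every evaluation of the rule.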

Automated zoom in demo

Instantiate the BGP monitoring playbook bgp-monitoring-with-automatic-zoom.

All BGP sessions are established. The Healthbot GUI shows all devices in a good state. No network group is configured at this point.

Let's break a BGP session: connect to vMX1 and apply a bad configuration change that breaks the BGP session between vMX1 and vMX4.

jcluser@vMX-addr-0# show | compare
[edit protocols bgp group underlay neighbor 192.168.1.1]
-     peer-as 104;
+     peer-as 200;

[edit]
jcluser@vMX-addr-0# commit and-quit
commit complete
Exiting configuration mode

jcluser@vMX-addr-0> show bgp summary
Groups: 1 Peers: 4 Down peers: 1
Table          Tot Paths  Act Paths Suppressed    History Damp State    Pending
inet.0
                      33         12          0          0          0          0
Peer                     AS      InPkt     OutPkt    OutQ   Flaps Last Up/Dwn State|#Active/Received/Accepted/Damped...
192.168.1.1             200          0          0       0       0           8 Active
192.168.1.3             105        423        426       0       0     3:08:56 4/11/11/0            0/0/0/0
192.168.1.5             106        423        428       0       0     3:08:53 4/11/11/0            0/0/0/0
192.168.1.7             107        423        428       0       0     3:08:51 4/11/11/0            0/0/0/0

jcluser@vMX-addr-0>

The monitoring rule shows the issue.

The monitoring rule uses a UDA (user-defined action) to automatically instantiate the troubleshooting playbook.
Healthbot shows the root cause of the issue (AS configuration mismatch between the vMX4 local-as and the vMX1 peer-as).
