Automated zoom in and zoom out for root cause analysis (demo with BGP)
The monitoring playbook bgp-monitoring-with-automatic-zoom uses the device rule check-bgp-state-with-automatic-zoom.
The device rule check-bgp-state-with-automatic-zoom collects BGP details, stores them in the database, and monitors the BGP session state. This rule doesn't run advanced tests: there is no cross-device correlation for root cause analysis, only BGP session state monitoring.
If a BGP session moves to a non-established state, the device rule check-bgp-state-with-automatic-zoom uses the Python script bgp_zoom_in.py to automatically instantiate a BGP troubleshooting playbook.
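The zoom-in trigger can be pictured with a short sketch. This is not the actual bgp_zoom_in.py: it is a minimal Python sketch that assumes the script can see the previous and current state of each (device, neighbor) session, and that "moves to a non-established state" means a transition away from Established. The `instantiate` callback and all function names are hypothetical stand-ins for whatever call the real script makes to Healthbot.

```python
"""Hypothetical sketch of the zoom-in trigger; not the real bgp_zoom_in.py."""
from typing import Callable, Dict, List, Tuple

# BGP finite state machine states (RFC 4271); only Established means "up".
BGP_STATES = {"Idle", "Connect", "Active", "OpenSent", "OpenConfirm", "Established"}

Session = Tuple[str, str]  # (device, neighbor address)


def is_non_established(state: str) -> bool:
    """Return True for any known BGP state other than Established."""
    if state not in BGP_STATES:
        raise ValueError(f"unknown BGP state: {state!r}")
    return state != "Established"


def zoom_in_on_transitions(
    previous: Dict[Session, str],
    current: Dict[Session, str],
    instantiate: Callable[[str, str, str], None],
    playbook: str = "bgp-zoom",
) -> List[Session]:
    """Call the hypothetical `instantiate` hook once per session that just went down.

    A session triggers only if it was Established in `previous` and is in another
    state in `current`; sessions that were already down, or are new, are skipped.
    """
    triggered: List[Session] = []
    for session, state in sorted(current.items()):
        if previous.get(session) == "Established" and is_non_established(state):
            device, neighbor = session
            instantiate(playbook, device, neighbor)
            triggered.append(session)
    return triggered


if __name__ == "__main__":
    before = {("vMX1", "192.168.1.1"): "Established", ("vMX1", "192.168.1.3"): "Established"}
    after = {("vMX1", "192.168.1.1"): "Active", ("vMX1", "192.168.1.3"): "Established"}
    # prints: instantiate bgp-zoom for vMX1 -> 192.168.1.1
    zoom_in_on_transitions(before, after, lambda p, d, n: print(f"instantiate {p} for {d} -> {n}"))
```

Firing only on the Established-to-down edge keeps a session that stays down from re-instantiating bgp-zoom on every collection cycle; whether the real rule behaves exactly this way is an assumption.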
The following network rules do not collect data from devices. Instead, they process the data stored in the database and correlate it across devices. These rules help to find the root cause of BGP issues:
- The network rule troubleshooting-peer-type queries the database updated by the rule check-bgp-state-with-automatic-zoom and compares the BGP peer type configured on two BGP peers (both peers should use the same peer type).
- The network rule troubleshooting-as queries the database updated by the rule check-bgp-state-with-automatic-zoom and compares the local-as configured on a router with the peer-as configured on one of its BGP peers (both should be the same AS).
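The cross-device correlation done by these two network rules can be sketched as a join between the two views of the same session: the session as seen from the router, and the same session as seen from its peer. This is a minimal Python sketch, not the rules' actual logic; the `BgpSession` record and its field names are hypothetical stand-ins for the data stored by check-bgp-state-with-automatic-zoom, not Healthbot's schema.

```python
"""Hypothetical sketch of the checks done by troubleshooting-as and
troubleshooting-peer-type. Field names are illustrative, not Healthbot's schema."""
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class BgpSession:
    """One BGP session as seen from one device (hypothetical record layout)."""
    device: str
    local_address: str
    peer_address: str
    local_as: int
    peer_as: int
    peer_type: str  # for example "internal" or "external"


def find_mismatches(sessions: List[BgpSession]) -> List[str]:
    """Pair each session with the peer's view of it and report mismatches."""
    by_endpoints: Dict[Tuple[str, str], BgpSession] = {
        (s.local_address, s.peer_address): s for s in sessions
    }
    problems: List[str] = []
    for s in sessions:
        other = by_endpoints.get((s.peer_address, s.local_address))
        if other is None:
            continue  # the peer's view of this session is not in the database
        # troubleshooting-as: the peer-as configured here must be the peer's local-as.
        if s.peer_as != other.local_as:
            problems.append(
                f"AS mismatch: {s.device} peer-as {s.peer_as} != "
                f"{other.device} local-as {other.local_as}"
            )
        # troubleshooting-peer-type: both ends should use the same peer type.
        # Report each pair once, from the side with the smaller local address string.
        if s.peer_type != other.peer_type and s.local_address < other.local_address:
            problems.append(
                f"peer type mismatch: {s.device} {s.peer_type} != "
                f"{other.device} {other.peer_type}"
            )
    return problems


if __name__ == "__main__":
    # Demo-like data. vMX1's address and local AS 101 are hypothetical;
    # AS 104 for vMX4 is the peer-as vMX1 used before the misconfiguration.
    sessions = [
        BgpSession("vMX1", "192.168.1.0", "192.168.1.1", 101, 200, "external"),
        BgpSession("vMX4", "192.168.1.1", "192.168.1.0", 104, 101, "external"),
    ]
    for problem in find_mismatches(sessions):
        print(problem)  # AS mismatch: vMX1 peer-as 200 != vMX4 local-as 104
```

The AS check is directional (each side's peer-as against the other side's local-as), so a mismatch is reported from the side that holds the wrong peer-as; the peer-type check is symmetric, so it is reported once per pair.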
The troubleshooting playbook bgp-zoom uses the network rules troubleshooting-as and troubleshooting-peer-type.
- Instantiate the BGP monitoring playbook bgp-monitoring-with-automatic-zoom.
- Do not instantiate the BGP troubleshooting playbook bgp-zoom.
- If a BGP session moves to a non-established state, the device rule check-bgp-state-with-automatic-zoom automatically instantiates the BGP troubleshooting playbook bgp-zoom.
Instantiate the BGP monitoring playbook bgp-monitoring-with-automatic-zoom.
All BGP sessions are established. The Healthbot GUI shows that all devices are in a good state. Also, no network group is configured.
Let's break a BGP session. Let's connect to vMX1 and apply a bad configuration change to break the BGP session between vMX1 and vMX4:
jcluser@vMX-addr-0# show | compare
[edit protocols bgp group underlay neighbor 192.168.1.1]
- peer-as 104;
+ peer-as 200;
[edit]
jcluser@vMX-addr-0# commit and-quit
commit complete
Exiting configuration mode
jcluser@vMX-addr-0> show bgp summary
Groups: 1 Peers: 4 Down peers: 1
Table          Tot Paths  Act Paths Suppressed    History Damp State    Pending
inet.0
                      33         12          0          0          0          0
Peer                     AS      InPkt     OutPkt    OutQ   Flaps Last Up/Dwn State|#Active/Received/Accepted/Damped...
192.168.1.1             200          0          0       0       0           8 Active
192.168.1.3             105        423        426       0       0     3:08:56 4/11/11/0            0/0/0/0
192.168.1.5             106        423        428       0       0     3:08:53 4/11/11/0            0/0/0/0
192.168.1.7             107        423        428       0       0     3:08:51 4/11/11/0            0/0/0/0
jcluser@vMX-addr-0>
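The same condition can be spotted in the CLI output: established peers show route counters in the State column, while the broken session shows a state name (Active). A small parser can pick those rows out. This is a minimal Python sketch for the single-table output shown above, not a general Junos parser; its column assumptions are stated in the docstring.

```python
"""Minimal sketch: find the peers that are not up in Junos `show bgp summary`.

Assumptions: peer rows start with an IPv4 address, the Last Up/Dwn value is a
single token (as in the output above), and an up session shows route counters
(such as 4/11/11/0) or "Establ" in the State column.
"""
import re
from typing import Dict

# Peer, AS, InPkt, OutPkt, OutQ, Flaps, Last Up/Dwn, then the State column.
PEER_ROW = re.compile(
    r"^(?P<peer>\d{1,3}(?:\.\d{1,3}){3})\s+(?P<asn>\d+)"
    r"(?:\s+\d+){4}\s+(?P<updown>\S+)\s+(?P<state>\S+)"
)
ROUTE_COUNTERS = re.compile(r"\d+(?:/\d+)+")


def down_peers(output: str) -> Dict[str, str]:
    """Map each peer address to its state when the session is not established."""
    down: Dict[str, str] = {}
    for line in output.splitlines():
        match = PEER_ROW.match(line.strip())
        if not match:
            continue  # header, table summary or prompt line
        state = match.group("state")
        if state != "Establ" and not ROUTE_COUNTERS.fullmatch(state):
            down[match.group("peer")] = state
    return down
```

Applied to the output above, it returns only 192.168.1.1 with state Active, which agrees with "Down peers: 1". For anything beyond a quick check, Junos structured output (`show bgp summary | display xml` or `| display json`) is more robust than parsing the text table.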
The monitoring rule shows the issue.
The monitoring rule uses a UDA (user-defined action) to automatically instantiate the troubleshooting playbook bgp-zoom.
Healthbot then shows the root cause of the issue: an AS configuration mismatch between the vMX4 local-as and the vMX1 peer-as.