New option to trigger alert on failed restarts
Current versions of Puppet treat resources that failed to restart as a
different kind of failure, and these are not necessarily reported; Puppet bug
PUP-2280 explains this in more detail. Until it is fixed in Puppet 5, it would
be useful to see such failures in monitoring, so I added an option to report
them. The option is off by default so that the usual behavior does not change.
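As a rough illustration of what the new option reads, the sketch below pulls the `failed_to_restart` counter out of the `resources` section of a last-run summary, the same keys the diff uses. The sample YAML values and the helper name `restart_failures` are hypothetical and exist only for this example; the real check reads the file named by `config[:summary_file]`.

```ruby
require 'yaml'

# Hypothetical excerpt of a Puppet last-run summary; values are made up.
SAMPLE_SUMMARY = <<~YAML
  resources:
    failed: 0
    failed_to_restart: 2
  events:
    failure: 0
YAML

# Hypothetical helper: returns the restart-failure count, or 0 when
# reporting is disabled or the summary has no 'resources' section.
def restart_failures(summary, report_restart_failures)
  return 0 unless report_restart_failures && summary['resources']

  # nil.to_i is 0, so a missing 'failed_to_restart' key also yields 0.
  summary['resources']['failed_to_restart'].to_i
end

summary = YAML.safe_load(SAMPLE_SUMMARY)
puts restart_failures(summary, true)   # option enabled
puts restart_failures(summary, false)  # option disabled (the default)
```

With the option disabled the helper always yields 0, which is why existing deployments see no change in behavior.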
antonidabek authored and majormoses committed May 31, 2017
1 parent 0c98425 commit 7a433dd
Showing 2 changed files with 19 additions and 1 deletion.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,8 @@ This project adheres to [Semantic Versioning](http://semver.org/).
 This CHANGELOG follows the format listed at [Keep A Changelog](http://keepachangelog.com/)

 ## [Unreleased]
+### Added
+- `check-puppet-last-run.rb`: Added option for reporting failed restarts (@antonidabek)

 ## [1.1.0] - 2017-01-30
 ### Added
18 changes: 17 additions & 1 deletion bin/check-puppet-last-run.rb
@@ -59,6 +59,13 @@ class PuppetLastRun < Sensu::Plugin::Check::CLI
          default: '/opt/puppetlabs/puppet/cache/state/agent_disabled.lock',
          description: 'Path to agent disabled lock file'

+  option :report_restart_failures,
+         short: '-r',
+         long: '--report-restart-failures',
+         boolean: true,
+         default: false,
+         description: 'Raise alerts if restart failures have happened'
+
   def run
     unless File.exist?(config[:summary_file])
       unknown "File #{config[:summary_file]} not found"
@@ -78,6 +85,11 @@ def run
       else
         critical "#{config[:summary_file]} is missing information about the events"
       end
+      @restart_failures = if config[:report_restart_failures] && summary['resources']
+                            summary['resources']['failed_to_restart'].to_i
+                          else
+                            0
+                          end
     rescue
       unknown "Could not process #{config[:summary_file]}"
     end
@@ -95,7 +107,11 @@ def run
       @message += " with #{@failures} failures"
     end

-    if @now - @last_run > config[:crit_age] || @failures > 0
+    if @restart_failures > 0
+      @message += " with #{@restart_failures} restart failures"
+    end
+
+    if @now - @last_run > config[:crit_age] || @failures > 0 || @restart_failures > 0
       critical @message
     elsif @now - @last_run > config[:warn_age]
       warning @message
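
The effect of the changed condition can be sketched in isolation. The helper below is hypothetical: it returns a symbol instead of calling the Sensu `critical`/`warning` exit helpers so that it runs standalone, the threshold values are invented, and the final `:ok` branch is an assumption because the diff context ends before that part of the file.

```ruby
# Hypothetical stand-in for the check's final branch. 'age' is the number of
# seconds since the last Puppet run (@now - @last_run in the check).
def severity(age, failures, restart_failures, crit_age:, warn_age:)
  if age > crit_age || failures > 0 || restart_failures > 0
    :critical
  elsif age > warn_age
    :warning
  else
    :ok # assumed; this branch is not visible in the diff
  end
end

# Invented thresholds, in seconds.
limits = { crit_age: 7200, warn_age: 3600 }

puts severity(60, 0, 1, **limits)    # recent run, one restart failure
puts severity(60, 0, 0, **limits)    # recent run, nothing failed
puts severity(4000, 0, 0, **limits)  # stale run, nothing failed
```

A single restart failure escalates an otherwise healthy run to critical, matching how ordinary `@failures` were already handled.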