disk-usage: Verbosity parameter which includes only FS in an error state. #782

Open
drapiti opened this issue Oct 7, 2024 · 2 comments

drapiti commented Oct 7, 2024

Describe the solution you'd like

Please add a verbosity parameter to the disk-usage plugin which, when set to 0 for example, outputs only the file systems (FS) that are actually in an error state (warning or critical). At the moment, all checked FS are included in the output, with no way to filter them.
This is a standard feature which is normally present in most checks. We have systems with hundreds of FS, and when we publish an alert to an event console, for example, the length of the output causes problems.
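
To illustrate the request, here is a minimal sketch of the kind of filtering being asked for. It is not the plugin's code: the state constants, the `classify` and `failing_only` helpers, and the sample data are hypothetical, and it assumes plain "percent used or more" thresholds.

```python
# Hypothetical state constants, mirroring the usual Nagios-style return codes.
STATE_OK, STATE_WARN, STATE_CRIT = 0, 1, 2


def classify(percent_used, warn=90.0, crit=95.0):
    """Return the state for one file system ('percent used or more' thresholds)."""
    if percent_used >= crit:
        return STATE_CRIT
    if percent_used >= warn:
        return STATE_WARN
    return STATE_OK


def failing_only(usage_by_mountpoint, warn=90.0, crit=95.0):
    """Keep only the file systems that are in a warning or critical state."""
    return {
        mountpoint: percent
        for mountpoint, percent in usage_by_mountpoint.items()
        if classify(percent, warn, crit) != STATE_OK
    }


# Made-up sample data: mountpoint -> percent used.
sample = {'/': 42.0, '/var': 91.5, '/data': 97.2, '/home': 10.0}
print(failing_only(sample))
```

With hundreds of file systems, only the entries that need attention would then reach the event console, while the overall check state stays unchanged.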

Additional context

No response

@drapiti drapiti added the enhancement New feature or request label Oct 7, 2024
Member

markuslf commented Oct 7, 2024

In other checks we use the --lengthy switch to explicitly show more information. In this check we could introduce the opposite, a --brief switch, as for most systems a full output makes sense.

@markuslf markuslf changed the title [disk-usage] Verbosity parameter which includes only FS in an error state. disk-usage: Verbosity parameter which includes only FS in an error state. Oct 7, 2024
@markuslf markuslf added this to the M006 milestone Oct 7, 2024
Author

drapiti commented Dec 6, 2024

Hi, here is improved code that adds the --brief functionality and gracefully handles disk accessibility (see "disk-usage: handle disk accessibility" #792):

#!/usr/bin/env python3
# -*- coding: utf-8; py-indent-offset: 4 -*-

import argparse
import re
import sys

import lib.args
import lib.base
import lib.human
from lib.globals import (STATE_CRIT, STATE_OK, STATE_UNKNOWN, STATE_WARN)

try:
    import psutil
except ImportError:
    lib.base.cu('Python module "psutil" is not installed.')

__author__ = 'Linuxfabrik GmbH, Zurich/Switzerland'
__version__ = '2024120404'

DESCRIPTION = 'Checks the used disk space for each partition, including an accessibility check.'

DEFAULT_WARN = '90%USED'
DEFAULT_CRIT = '95%USED'

def parse_args():
    """Parse command line arguments using argparse."""
    parser = argparse.ArgumentParser(description=DESCRIPTION)

    parser.add_argument(
        '-V', '--version',
        action='version',
        version='{0}: v{1} by {2}'.format('%(prog)s', __version__, __author__)
    )

    parser.add_argument(
        '--always-ok',
        help='Always returns OK.',
        dest='ALWAYS_OK',
        action='store_true',
        default=False,
    )

    parser.add_argument(
        '--brief',
        help='Only print disks/mount points that are in a warning or critical state. '
             'Inaccessible mount points are always printed.',
        dest='BRIEF',
        action='store_true',
        default=False,
    )

    parser.add_argument(
        '-c', '--critical',
        help='Critical threshold, '
             'of the form "<number>[unit][method]", where unit is '
             'one of `%%|K|M|G|T|P` and method is one of `USED|FREE`. If "unit" is omitted, '
             '`%%` is assumed. `K` means `kibibyte` etc. If "method" is omitted, '
             '`USED` is assumed. `USED` means "number or more", `FREE` means "number or less". '
             'Examples: '
             '`95` = alert at 95%% usage or more. '
             '`9.5M` = alert when 9.5 MiB or more is used. '
             'Other self-explanatory examples are '
             '`95%%USED`, `5%%FREE`, `9.5GFREE`, `1400GUSED`. '
             'Default: %(default)s',
        dest='CRIT',
        type=lib.args.number_unit_method,
        default=DEFAULT_CRIT,
    )

    parser.add_argument(
        '--exclude-pattern',
        help='Any line matching this pattern (case-insensitive) will count as an exclude. '
             'The mountpoint is excluded if it contains the specified value. '
             'Example: "boot" excludes "/boot" as well as "/boot/efi". '
             'Can be specified multiple times. '
             'On Windows, use drive letters without backslash ("Y:" or "Y"). '
             'Includes are matched before excludes.',
        dest='EXCLUDE_PATTERN',
        action='append',
        default=[],
    )

    parser.add_argument(
        '--exclude-regex',
        help='Any line matching this python regex (case-insensitive) will count as an exclude. '
             'Can be specified multiple times. '
             'On Windows, use drive letters without backslash ("Y:" or "Y"). '
             'Includes are matched before excludes.',
        dest='EXCLUDE_REGEX',
        action='append',
        default=[],
    )

    parser.add_argument(
        '--include-pattern',
        help='Any line matching this pattern (case-insensitive) will count as an include. '
             'The mountpoint is included if it contains the specified value. '
             'Example: "boot" includes "/boot" as well as "/boot/efi". '
             'Can be specified multiple times. '
             'On Windows, use drive letters without backslash ("Y:" or "Y"). '
             'Includes are matched before excludes.',
        dest='INCLUDE_PATTERN',
        action='append',
        default=[],
    )

    parser.add_argument(
        '--include-regex',
        help='Any line matching this python regex (case-insensitive) will count as an include. '
             'Can be specified multiple times. '
             'On Windows, use drive letters without backslash ("Y:" or "Y"). '
             'Includes are matched before excludes.',
        dest='INCLUDE_REGEX',
        action='append',
        default=[],
    )

    parser.add_argument(
        '--perfdata-regex',
        help='Only print perfdata keys matching this python regex. '
             'Can be specified multiple times.',
        action='append',
        dest='PERFDATA_REGEX',
        default=[],
    )

    parser.add_argument(
        '-w', '--warning',
        help='Warning threshold, '
             'of the form "<number>[unit][method]", where unit is '
             'one of `%%|K|M|G|T|P` and method is one of `USED|FREE`. If "unit" is omitted, '
             '`%%` is assumed. `K` means `kibibyte` etc. If "method" is omitted, '
             '`USED` is assumed. `USED` means "number or more", `FREE` means "number or less". '
             'Examples: '
             '`95` = alert at 95%% usage or more. '
             '`9.5M` = alert when 9.5 MiB or more is used. '
             'Other self-explanatory examples are '
             '`95%%USED`, `5%%FREE`, `9.5GFREE`, `1400GUSED`. '
             'Default: %(default)s',
        dest='WARN',
        type=lib.args.number_unit_method,
        default=DEFAULT_WARN,
    )

    return parser.parse_args()

def compile_regex(regex, what):
    """Return a compiled regex."""
    try:
        return [re.compile(item, re.IGNORECASE) for item in regex]
    except re.error as e:
        lib.base.oao(
            'Your {} "{}" contains one or more errors: {}'.format(
                what,
                regex,
                e,
            ),
            STATE_UNKNOWN,
        )

def check_disk_accessibility(mountpoint):
    """Check if a disk is accessible."""
    try:
        psutil.disk_usage(mountpoint)
        return True
    except OSError:
        # PermissionError and FileNotFoundError are subclasses of OSError.
        return False

def main():
    """The main function."""
    try:
        args = parse_args()
    except SystemExit:
        sys.exit(STATE_UNKNOWN)

    try:
        float(args.WARN[0])
        float(args.CRIT[0])
    except ValueError:
        lib.base.oao(
            'Invalid parameter value.',
            STATE_UNKNOWN,
        )

    compiled_include_regex = compile_regex(args.INCLUDE_REGEX, 'include-regex')
    compiled_exclude_regex = compile_regex(args.EXCLUDE_REGEX, 'exclude-regex')
    compiled_perfdata_regex = compile_regex(args.PERFDATA_REGEX, 'perfdata-regex')

    state = STATE_OK
    perfdata = ''
    table_data = []

    try:
        parts = psutil.disk_partitions(all=False)
    except AttributeError:
        lib.base.oao(
            'Did not find physical devices (e.g. hard disks, cd-rom drives, USB keys).',
            STATE_UNKNOWN,
        )

    for part in parts:
        if part.fstype in ['CDFS', 'iso9660', 'squashfs', 'UDF'] or part.opts in ['cdrom']:
            continue

        mountpoint = part.mountpoint.lower()
        if args.INCLUDE_PATTERN or args.INCLUDE_REGEX:
            if not any(include_pattern.lower() in mountpoint for include_pattern in args.INCLUDE_PATTERN) \
            and not any(item.search(mountpoint) for item in compiled_include_regex):
                continue
        if args.EXCLUDE_PATTERN or args.EXCLUDE_REGEX:
            if any(exclude_pattern.lower() in mountpoint for exclude_pattern in args.EXCLUDE_PATTERN) \
            or any(item.search(mountpoint) for item in compiled_exclude_regex):
                continue

        is_accessible = check_disk_accessibility(part.mountpoint)
        
        if not is_accessible:
            table_data.append({
                'mountpoint': '{}'.format(part.mountpoint),
                'type': '{}'.format(part.fstype),
                'used': 'N/A',
                'avail': 'N/A',
                'size': 'N/A',
                'percent': 'N/A',
                'accessible': 'No'
            })
            continue

        try:
            usage = psutil.disk_usage(part.mountpoint)
        except Exception:
            table_data.append({
                'mountpoint': '{}'.format(part.mountpoint),
                'type': '{}'.format(part.fstype),
                'used': 'N/A',
                'avail': 'N/A',
                'size': 'N/A',
                'percent': 'N/A',
                'accessible': 'No'
            })
            continue

        disk_state = STATE_OK
        
        if args.WARN[1] == '%' and args.WARN[2] == 'USED':
            disk_state = lib.base.get_state(
                usage.percent,
                args.WARN[0],
                None,
                'ge',
            )
        elif args.WARN[1] == '%' and args.WARN[2] == 'FREE':
            disk_state = lib.base.get_state(
                100.0 - usage.percent,
                args.WARN[0],
                None,
                'le',
            )
        elif args.WARN[1] != '%' and args.WARN[2] == 'USED':
            disk_state = lib.base.get_state(
                usage.used,
                lib.human.human2bytes(''.join(args.WARN[:2])),
                None,
                'ge',
            )
        elif args.WARN[1] != '%' and args.WARN[2] == 'FREE':
            disk_state = lib.base.get_state(
                usage.free,
                lib.human.human2bytes(''.join(args.WARN[:2])),
                None,
                'le',
            )

        if args.CRIT[1] == '%' and args.CRIT[2] == 'USED':
            disk_state = lib.base.get_worst(
                disk_state,
                lib.base.get_state(
                    usage.percent,
                    None,
                    args.CRIT[0],
                    'ge',
                ),
            )
        elif args.CRIT[1] == '%' and args.CRIT[2] == 'FREE':
            disk_state = lib.base.get_worst(
                disk_state,
                lib.base.get_state(
                    100.0 - usage.percent,
                    None,
                    args.CRIT[0],
                    'le',
                ),
            )
        elif args.CRIT[1] != '%' and args.CRIT[2] == 'USED':
            disk_state = lib.base.get_worst(
                disk_state,
                lib.base.get_state(
                    usage.used,
                    None,
                    lib.human.human2bytes(''.join(args.CRIT[:2])),
                    'ge',
                ),
            )
        elif args.CRIT[1] != '%' and args.CRIT[2] == 'FREE':
            disk_state = lib.base.get_worst(
                disk_state,
                lib.base.get_state(
                    usage.free,
                    None,
                    lib.human.human2bytes(''.join(args.CRIT[:2])),
                    'le',
                ),
            )

        state = lib.base.get_worst(state, disk_state)

        perfdata_key = '{}-usage'.format(part.mountpoint)
        if not args.PERFDATA_REGEX \
        or any(item.search(perfdata_key) for item in compiled_perfdata_regex):
            perfdata += lib.base.get_perfdata(
                perfdata_key,
                usage.used,
                uom='B',
                warn=None,
                crit=None,
                _min=0,
                _max=usage.total,
            )
        perfdata_key = '{}-total'.format(part.mountpoint)
        if not args.PERFDATA_REGEX \
        or any(item.search(perfdata_key) for item in compiled_perfdata_regex):
            perfdata += lib.base.get_perfdata(
                perfdata_key,
                usage.total,
                uom='B',
                warn=None,
                crit=None,
                _min=0,
                _max=usage.total,
            )
        perfdata_key = '{}-percent'.format(part.mountpoint)
        if not args.PERFDATA_REGEX \
        or any(item.search(perfdata_key) for item in compiled_perfdata_regex):
            perfdata += lib.base.get_perfdata(
                perfdata_key,
                usage.percent,
                uom='%',
                warn=None,
                crit=None,
                _min=0,
                _max=100,
            )

        # Only add to table data if not in brief mode or if disk is warning/critical
        if not args.BRIEF or disk_state != STATE_OK:
            table_data.append({
                'mountpoint': '{}'.format(part.mountpoint),
                'type': '{}'.format(part.fstype),
                'used': lib.human.bytes2human(usage.used),
                'avail': lib.human.bytes2human(usage.free),
                'size': lib.human.bytes2human(usage.total),
                'percent': '{}%{}'.format(usage.percent, lib.base.state2str(disk_state, prefix=' ')),
                'accessible': 'Yes'
            })

    if not table_data:
        msg = 'Everything is ok.'
    elif len(table_data) == 1:
        msg = '{} {} - total: {}, free: {}, used: {} (warn={} crit={}) - Accessible: {}'.format(
            table_data[0]['mountpoint'],
            table_data[0]['percent'],
            table_data[0]['size'],
            table_data[0]['avail'],
            table_data[0]['used'],
            ''.join(args.WARN),
            ''.join(args.CRIT),
            table_data[0]['accessible']
        )
    else:
        if state == STATE_CRIT:
            msg = 'There are critical errors.'
        elif state == STATE_WARN:
            msg = 'There are warnings.'
        else:
            msg = 'Everything is ok.'
        msg = '{} (warn={} crit={})\n\n{}'.format(
            msg,
            ''.join(args.WARN),
            ''.join(args.CRIT),
            lib.base.get_table(
                table_data,
                ['mountpoint', 'type', 'size', 'used', 'avail', 'percent', 'accessible'],
                ['Mountpoint', 'Type', 'Size', 'Used', 'Avail', 'Use%', 'Accessible'],
                'percent',
            ),
        )

    lib.base.oao(msg, state, perfdata, always_ok=args.ALWAYS_OK)

if __name__ == '__main__':
    try:
        main()
    except Exception:
        lib.base.cu()
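
The accessibility handling in the script above can be tried without psutil or the Linuxfabrik libraries. The following is a minimal sketch that swaps in `shutil.disk_usage` from the standard library for `psutil.disk_usage`. The `usage_or_none` and `build_row` helpers and the reduced row layout are hypothetical. It queries the usage once per path and treats any `OSError` as "not accessible".

```python
import os
import shutil
import tempfile


def usage_or_none(path):
    """Return disk usage for path, or None if the path cannot be accessed."""
    try:
        return shutil.disk_usage(path)
    except OSError:
        # PermissionError and FileNotFoundError are subclasses of OSError.
        return None


def build_row(path):
    """Build one table row; inaccessible paths get 'N/A' placeholders."""
    usage = usage_or_none(path)
    if usage is None:
        return {'mountpoint': path, 'percent': 'N/A', 'accessible': 'No'}
    percent = round(usage.used / usage.total * 100, 1) if usage.total else 0.0
    return {'mountpoint': path, 'percent': '{}%'.format(percent), 'accessible': 'Yes'}


# An existing directory and a path that is assumed not to exist.
existing = tempfile.gettempdir()
missing = os.path.join(existing, 'no-such-mountpoint-for-this-sketch')
print(build_row(existing))
print(build_row(missing))
```

Returning the usage object (or `None`) from a single call avoids querying the same mountpoint twice, which may matter on slow or hanging network mounts.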
           
