disk-usage: handle disk accessibility #792

Open
6 of 8 tasks
drapiti opened this issue Dec 4, 2024 · 2 comments
Labels
bug Something isn't working
@drapiti

drapiti commented Dec 4, 2024

This issue respects the following points:

Which variant of the Monitoring Plugins do you use?

  • .rpm/.deb package from repo.linuxfabrik.ch
  • Compiled for Linux (.tar/.zip from download.linuxfabrik.ch)
  • Compiled for Windows (from download.linuxfabrik.ch)
  • Source Code from GitHub

Bug description

Currently, any disk that is not accessible makes the plugin crash with an ugly traceback such as:

Traceback (most recent call last):
  File "C:\PROGRA~3\icinga2\usr\lib64\nagios\plugins\disk-usage", line 393, in <module>
  File "C:\PROGRA~3\icinga2\usr\lib64\nagios\plugins\disk-usage", line 233, in main
  File "C:\PROGRA~3\icinga2\usr\lib64\nagios\plugins\psutil\__init__.py", line 2040, in disk_usage
  File "C:\PROGRA~3\icinga2\usr\lib64\nagios\plugins\psutil\_pswindows.py", line 290, in disk_usage
FileNotFoundError: [WinError 3] The system cannot find the path specified: 'C:\\ClusterStorage\\A_001_PURE_BRONZE_SYSCLUHVA21S\\'

Please add error handling for disks that are not accessible. From a disk-usage point of view everything is fine, so inaccessible disks should simply be skipped. I would like the check to stay OK, perhaps with a "not accessible" note, instead of the current behavior, where the traceback puts the check into UNKNOWN for all disks. We see this a lot on Windows systems.
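A minimal sketch of the requested behavior, assuming the standard library's `shutil.disk_usage()` as a stand-in for `psutil.disk_usage()` so it runs without psutil (both raise an `OSError` subclass such as `FileNotFoundError` or `PermissionError` when a path cannot be queried). `collect_usage()` is a hypothetical helper, not part of the plugin: it queries each mountpoint once and sets aside the ones that fail instead of letting one traceback turn the whole check UNKNOWN.

```python
import os
import shutil
import tempfile


def collect_usage(mountpoints):
    """Split mountpoints into readable usage data and inaccessible paths.

    Hypothetical helper: shutil.disk_usage() stands in for psutil.disk_usage().
    """
    accessible = {}
    inaccessible = []
    for mountpoint in mountpoints:
        try:
            accessible[mountpoint] = shutil.disk_usage(mountpoint)
        except OSError:
            # FileNotFoundError, PermissionError etc. are all OSError
            # subclasses: remember the disk instead of crashing.
            inaccessible.append(mountpoint)
    return accessible, inaccessible


if __name__ == '__main__':
    # One readable path and one that does not exist.
    readable = tempfile.gettempdir()
    missing = os.path.join(readable, 'no-such-dir-792', 'volume')
    usage, skipped = collect_usage([readable, missing])
    for path, result in usage.items():
        print('{}: {} bytes free'.format(path, result.free))
    for path in skipped:
        print('{}: not accessible'.format(path))
```

The skipped paths can then be listed in the output without influencing the overall state.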

Steps to reproduce - Plugin call

'C:\ProgramData\icinga2\usr\lib64\nagios\plugins\disk-usage.exe' '--critical' '96%USED' '--include-pattern' 'C:\ClusterStorage'

Steps to reproduce - Data

Run the check against any disk that is present but not accessible.

Environment

Mainly Windows servers

Plugin Version

any

Python version

3.12

List of Python modules

No response

Additional Information

No response

drapiti added the bug label Dec 4, 2024
@drapiti
Author

drapiti commented Dec 4, 2024

Here is the fixed code, which resolves this issue:

#!/usr/bin/env python3
# -*- coding: utf-8; py-indent-offset: 4 -*-

import argparse
import re
import sys

import lib.args
import lib.base
import lib.human
from lib.globals import (STATE_CRIT, STATE_OK, STATE_UNKNOWN, STATE_WARN)

try:
    import psutil
except ImportError:
    lib.base.cu('Python module "psutil" is not installed.')

__author__ = 'Linuxfabrik GmbH, Zurich/Switzerland'
__version__ = '2024120404'

DESCRIPTION = 'Checks the used disk space for each partition and lists partitions that are not accessible.'

DEFAULT_WARN = '90%USED'
DEFAULT_CRIT = '95%USED'

def parse_args():
    """Parse command line arguments using argparse."""
    parser = argparse.ArgumentParser(description=DESCRIPTION)

    parser.add_argument(
        '-V', '--version',
        action='version',
        version='{0}: v{1} by {2}'.format('%(prog)s', __version__, __author__)
    )

    parser.add_argument(
        '--always-ok',
        help='Always returns OK.',
        dest='ALWAYS_OK',
        action='store_true',
        default=False,
    )

    parser.add_argument(
        '-c', '--critical',
        help='Critical threshold, '
             'of the form "<number>[unit][method]", where unit is '
             'one of `%%|K|M|G|T|P` and method is one of `USED|FREE`. If "unit" is omitted, '
             '`%%` is assumed. `K` means `kibibyte` etc. If "method" is omitted, '
             '`USED` is assumed. `USED` means "number or more", `FREE` means "number or less". '
             'Examples: '
             '`95` = alert at 95%% usage or more. '
             '`9.5M` = alert when 9.5 MiB or more is used. '
             'Other self-explanatory examples are '
             '`95%%USED`, `5%%FREE`, `9.5GFREE`, `1400GUSED`. '
             'Default: %(default)s',
        dest='CRIT',
        type=lib.args.number_unit_method,
        default=DEFAULT_CRIT,
    )

    parser.add_argument(
        '--exclude-pattern',
        help='Any line matching this pattern (case-insensitive) will count as an exclude. '
             'The mountpoint is excluded if it contains the specified value. '
             'Example: "boot" excludes "/boot" as well as "/boot/efi". '
             'Can be specified multiple times. '
             'On Windows, use drive letters without backslash ("Y:" or "Y"). '
             'Includes are matched before excludes.',
        dest='EXCLUDE_PATTERN',
        action='append',
        default=[],
    )

    parser.add_argument(
        '--exclude-regex',
        help='Any line matching this python regex (case-insensitive) will count as an exclude. '
             'Can be specified multiple times. '
             'On Windows, use drive letters without backslash ("Y:" or "Y"). '
             'Includes are matched before excludes.',
        dest='EXCLUDE_REGEX',
        action='append',
        default=[],
    )

    parser.add_argument(
        '--include-pattern',
        help='Any line matching this pattern (case-insensitive) will count as an include. '
             'The mountpoint is included if it contains the specified value. '
             'Example: "boot" includes "/boot" as well as "/boot/efi". '
             'Can be specified multiple times. '
             'On Windows, use drive letters without backslash ("Y:" or "Y"). '
             'Includes are matched before excludes.',
        dest='INCLUDE_PATTERN',
        action='append',
        default=[],
    )

    parser.add_argument(
        '--include-regex',
        help='Any line matching this python regex (case-insensitive) will count as an include. '
             'Can be specified multiple times. '
             'On Windows, use drive letters without backslash ("Y:" or "Y"). '
             'Includes are matched before excludes.',
        dest='INCLUDE_REGEX',
        action='append',
        default=[],
    )

    parser.add_argument(
        '--perfdata-regex',
        help='Only print perfdata keys matching this python regex. '
             'Can be specified multiple times.',
        action='append',
        dest='PERFDATA_REGEX',
        default=[],
    )

    parser.add_argument(
        '-w', '--warning',
        help='Warning threshold, '
             'of the form "<number>[unit][method]", where unit is '
             'one of `%%|K|M|G|T|P` and method is one of `USED|FREE`. If "unit" is omitted, '
             '`%%` is assumed. `K` means `kibibyte` etc. If "method" is omitted, '
             '`USED` is assumed. `USED` means "number or more", `FREE` means "number or less". '
             'Examples: '
             '`95` = alert at 95%% usage. '
             '`9.5M` = alert when 9.5 MiB is used. '
             'Other self-explanatory examples are '
             '`95%%USED`, `5%%FREE`, `9.5GFREE`, `1400GUSED`. '
             'Default: %(default)s',
        dest='WARN',
        type=lib.args.number_unit_method,
        default=DEFAULT_WARN,
    )

    return parser.parse_args()

def compile_regex(regex, what):
    """Return a compiled regex."""
    try:
        return [re.compile(item, re.IGNORECASE) for item in regex]
    except re.error as e:
        lib.base.oao(
            'Your {} "{}" contains one or more errors: {}'.format(
                what,
                regex,
                e,
            ),
            STATE_UNKNOWN,
        )

def check_disk_accessibility(mountpoint):
    """Return True if the disk usage of `mountpoint` can be read, else False."""
    try:
        psutil.disk_usage(mountpoint)
        return True
    except OSError:
        # PermissionError and FileNotFoundError are subclasses of OSError.
        return False

def main():
    """The main function."""
    try:
        args = parse_args()
    except SystemExit:
        sys.exit(STATE_UNKNOWN)

    try:
        float(args.WARN[0])
        float(args.CRIT[0])
    except ValueError:
        lib.base.oao(
            'Invalid parameter value.',
            STATE_UNKNOWN,
        )

    compiled_include_regex = compile_regex(args.INCLUDE_REGEX, 'include-regex')
    compiled_exclude_regex = compile_regex(args.EXCLUDE_REGEX, 'exclude-regex')
    compiled_perfdata_regex = compile_regex(args.PERFDATA_REGEX, 'perfdata-regex')

    state = STATE_OK
    perfdata = ''
    table_data = []

    try:
        parts = psutil.disk_partitions(all=False)
    except AttributeError:
        lib.base.oao(
            'Did not find physical devices (e.g. hard disks, cd-rom drives, USB keys).',
            STATE_UNKNOWN,
        )

    for part in parts:
        if part.fstype in ['CDFS', 'iso9660', 'squashfs', 'UDF'] or part.opts in ['cdrom']:
            continue

        mountpoint = part.mountpoint.lower()
        if args.INCLUDE_PATTERN or args.INCLUDE_REGEX:
            if not any(include_pattern.lower() in mountpoint for include_pattern in args.INCLUDE_PATTERN) \
            and not any(item.search(mountpoint) for item in compiled_include_regex):
                continue
        if args.EXCLUDE_PATTERN or args.EXCLUDE_REGEX:
            if any(exclude_pattern.lower() in mountpoint for exclude_pattern in args.EXCLUDE_PATTERN) \
            or any(item.search(mountpoint) for item in compiled_exclude_regex):
                continue

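        # Disks that are present but cannot be queried (for example an
        # unavailable volume below C:\ClusterStorage) are listed as not
        # accessible and do not change the overall state.
        is_accessible = check_disk_accessibility(part.mountpoint)

        if not is_accessible: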
            table_data.append({
                'mountpoint': '{}'.format(part.mountpoint),
                'type': '{}'.format(part.fstype),
                'used': 'N/A',
                'avail': 'N/A',
                'size': 'N/A',
                'percent': 'N/A',
                'accessible': 'No'
            })
            continue

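        try:
            usage = psutil.disk_usage(part.mountpoint)
        except OSError:
            # The disk may become inaccessible between the check above and
            # this call; report it the same way instead of crashing.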
            table_data.append({
                'mountpoint': '{}'.format(part.mountpoint),
                'type': '{}'.format(part.fstype),
                'used': 'N/A',
                'avail': 'N/A',
                'size': 'N/A',
                'percent': 'N/A',
                'accessible': 'No'
            })
            continue

        disk_state = STATE_OK
        
        if args.WARN[1] == '%' and args.WARN[2] == 'USED':
            disk_state = lib.base.get_state(
                usage.percent,
                args.WARN[0],
                None,
                'ge',
            )
        elif args.WARN[1] == '%' and args.WARN[2] == 'FREE':
            disk_state = lib.base.get_state(
                100.0 - usage.percent,
                args.WARN[0],
                None,
                'le',
            )
        elif args.WARN[1] != '%' and args.WARN[2] == 'USED':
            disk_state = lib.base.get_state(
                usage.used,
                lib.human.human2bytes(''.join(args.WARN[:2])),
                None,
                'ge',
            )
        elif args.WARN[1] != '%' and args.WARN[2] == 'FREE':
            disk_state = lib.base.get_state(
                usage.free,
                lib.human.human2bytes(''.join(args.WARN[:2])),
                None,
                'le',
            )

        if args.CRIT[1] == '%' and args.CRIT[2] == 'USED':
            disk_state = lib.base.get_worst(
                disk_state,
                lib.base.get_state(
                    usage.percent,
                    None,
                    args.CRIT[0],
                    'ge',
                ),
            )
        elif args.CRIT[1] == '%' and args.CRIT[2] == 'FREE':
            disk_state = lib.base.get_worst(
                disk_state,
                lib.base.get_state(
                    100.0 - usage.percent,
                    None,
                    args.CRIT[0],
                    'le',
                ),
            )
        elif args.CRIT[1] != '%' and args.CRIT[2] == 'USED':
            disk_state = lib.base.get_worst(
                disk_state,
                lib.base.get_state(
                    usage.used,
                    None,
                    lib.human.human2bytes(''.join(args.CRIT[:2])),
                    'ge',
                ),
            )
        elif args.CRIT[1] != '%' and args.CRIT[2] == 'FREE':
            disk_state = lib.base.get_worst(
                disk_state,
                lib.base.get_state(
                    usage.free,
                    None,
                    lib.human.human2bytes(''.join(args.CRIT[:2])),
                    'le',
                ),
            )

        state = lib.base.get_worst(state, disk_state)

        perfdata_key = '{}-usage'.format(part.mountpoint)
        if not args.PERFDATA_REGEX \
        or any(item.search(perfdata_key) for item in compiled_perfdata_regex):
            perfdata += lib.base.get_perfdata(
                perfdata_key,
                usage.used,
                uom='B',
                warn=None,
                crit=None,
                _min=0,
                _max=usage.total,
            )
        perfdata_key = '{}-total'.format(part.mountpoint)
        if not args.PERFDATA_REGEX \
        or any(item.search(perfdata_key) for item in compiled_perfdata_regex):
            perfdata += lib.base.get_perfdata(
                perfdata_key,
                usage.total,
                uom='B',
                warn=None,
                crit=None,
                _min=0,
                _max=usage.total,
            )
        perfdata_key = '{}-percent'.format(part.mountpoint)
        if not args.PERFDATA_REGEX \
        or any(item.search(perfdata_key) for item in compiled_perfdata_regex):
            perfdata += lib.base.get_perfdata(
                perfdata_key,
                usage.percent,
                uom='%',
                warn=None,
                crit=None,
                _min=0,
                _max=100,
            )

        table_data.append({
            'mountpoint': '{}'.format(part.mountpoint),
            'type': '{}'.format(part.fstype),
            'used': lib.human.bytes2human(usage.used),
            'avail': lib.human.bytes2human(usage.free),
            'size': lib.human.bytes2human(usage.total),
            'percent': '{}%{}'.format(usage.percent, lib.base.state2str(disk_state, prefix=' ')),
            'accessible': 'Yes'
        })

    if not table_data:
        msg = 'Nothing checked.'
    elif len(table_data) == 1:
        msg = '{} {} - total: {}, free: {}, used: {} (warn={} crit={}) - Accessible: {}'.format(
            table_data[0]['mountpoint'],
            table_data[0]['percent'],
            table_data[0]['size'],
            table_data[0]['avail'],
            table_data[0]['used'],
            ''.join(args.WARN),
            ''.join(args.CRIT),
            table_data[0]['accessible']
        )
    else:
        if state == STATE_CRIT:
            msg = 'There are critical errors.'
        elif state == STATE_WARN:
            msg = 'There are warnings.'
        else:
            msg = 'Everything is ok.'
        msg = '{} (warn={} crit={})\n\n{}'.format(
            msg,
            ''.join(args.WARN),
            ''.join(args.CRIT),
            lib.base.get_table(
                table_data,
                ['mountpoint', 'type', 'size', 'used', 'avail', 'percent', 'accessible'],
                ['Mountpoint', 'Type', 'Size', 'Used', 'Avail', 'Use%', 'Accessible'],
                'percent',
            ),
        )

    lib.base.oao(msg, state, perfdata, always_ok=args.ALWAYS_OK)

if __name__ == '__main__':
    try:
        main()
    except Exception:
        lib.base.cu()
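The threshold semantics from the `--warning`/`--critical` help text (and the `USED`/`FREE` branches in `main()`) can be summarized in a small standalone sketch. `parse_threshold()` and `breached()` are hypothetical helpers, not the plugin's `lib.args.number_unit_method` or `lib.base.get_state`; units are assumed to be binary (`K` = KiB) as the help text states, and `Usage` only mimics the fields of psutil's `disk_usage()` result.

```python
import re
from collections import namedtuple

# Mimics the fields of psutil.disk_usage()'s result (sizes in bytes).
Usage = namedtuple('Usage', 'total used free percent')

# Binary multipliers, as in the help text ("K" means kibibyte etc.).
UNITS = {'K': 1024, 'M': 1024 ** 2, 'G': 1024 ** 3, 'T': 1024 ** 4, 'P': 1024 ** 5}

THRESHOLD_RE = re.compile(r'(\d+(?:\.\d+)?)([%KMGTP]?)(USED|FREE)?', re.IGNORECASE)


def parse_threshold(value):
    """Split '<number>[unit][method]' into (number, unit, method).

    Hypothetical helper; defaults follow the help text: unit '%', method 'USED'.
    """
    match = THRESHOLD_RE.fullmatch(value.strip())
    if match is None:
        raise ValueError('invalid threshold: {!r}'.format(value))
    number, unit, method = match.groups()
    return float(number), (unit or '%').upper(), (method or 'USED').upper()


def breached(threshold, usage):
    """Return True if `usage` reaches the (number, unit, method) threshold."""
    number, unit, method = threshold
    if unit == '%':
        # Percent thresholds look at the used or the free percentage.
        value = usage.percent if method == 'USED' else 100.0 - usage.percent
        limit = number
    else:
        # Absolute thresholds look at used or free bytes.
        value = usage.used if method == 'USED' else usage.free
        limit = number * UNITS[unit]
    # USED means "number or more", FREE means "number or less".
    return value >= limit if method == 'USED' else value <= limit
```

For a disk like `/var` in the example output (95% used, 0.5 GiB free), `90%USED` is reached while `450MFREE` is not, because 0.5 GiB is 512 MiB.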
           

@drapiti
Author

drapiti commented Dec 4, 2024

Example output:

There are warnings. (warn=90%USED crit=450MFREE)

Mountpoint ! Type ! Size    ! Used    ! Avail   ! Use%            ! Accessible
-----------+------+---------+---------+---------+-----------------+-----------
/          ! ext4 ! 50.0GiB ! 20.0GiB ! 30.0GiB ! 40.0%           ! Yes
/var       ! ext4 ! 10.0GiB ! 9.5GiB  ! 0.5GiB  ! 95.0% [WARNING] ! Yes
/mnt/data  ! xfs  ! N/A     ! N/A     ! N/A     ! N/A             ! No
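How the summary line follows from the table rows can be sketched as below. This assumes the usual plugin exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL), so the worst state is simply the maximum; that shortcut does not hold once UNKNOWN (3) is involved. `summarize()` and the per-row `state` key are hypothetical. Inaccessible disks are modeled with `state=None` because the proposed `main()` skips them before folding anything into `state`.

```python
# Standard plugin exit codes; higher means worse for these three.
STATE_OK, STATE_WARN, STATE_CRIT = 0, 1, 2


def summarize(rows):
    """Return (state, message) for a multi-disk result table.

    Hypothetical helper: rows of inaccessible disks carry state None and are
    skipped, so they can never raise the overall state.
    """
    state = STATE_OK
    for row in rows:
        if row['state'] is not None:
            state = max(state, row['state'])
    if state == STATE_CRIT:
        msg = 'There are critical errors.'
    elif state == STATE_WARN:
        msg = 'There are warnings.'
    else:
        msg = 'Everything is ok.'
    return state, msg


rows = [
    {'mountpoint': '/', 'state': STATE_OK},
    {'mountpoint': '/var', 'state': STATE_WARN},
    {'mountpoint': '/mnt/data', 'state': None},  # not accessible
]
print(summarize(rows))  # (1, 'There are warnings.')
```

With the example rows (`/` OK, `/var` WARNING, `/mnt/data` not accessible) the overall state is WARNING, and a run where every matched disk is inaccessible stays OK.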
