Extend capabilities of journald-query #795

Open
wants to merge 3 commits into
base: main
68 changes: 39 additions & 29 deletions check-plugins/journald-query/README.rst
@@ -24,7 +24,7 @@ Fact Sheet

.. csv-table::
:widths: 30, 70

"Check Plugin Download", "https://github.com/Linuxfabrik/monitoring-plugins/tree/main/check-plugins/journald-query"
"Check Interval Recommendation", "Once a minute"
"Can be called without parameters", "Yes"
@@ -39,9 +39,11 @@ Help
     usage: journald-query [-h] [-V] [--always-ok] [--facility FACILITY]
                           [--identifier IDENTIFIER]
                           [--ignore-pattern IGNORE_PATTERN]
-                          [--ignore-regex IGNORE_REGEX] [--priority PRIORITY]
-                          [--severity {warn,crit}] [--since SINCE] [--test TEST]
-                          [--unit UNIT] [--user-unit USER_UNIT]
+                          [--ignore-regex IGNORE_REGEX] [--grep GREP]
+                          [--priority PRIORITY] [--severity {warn,crit}]
+                          [--since SINCE] [--test TEST] [--unit UNIT]
+                          [--user-unit USER_UNIT] [--count COUNT]
+                          [--match MATCH]

     Query the systemd journal and alert on any events found. For help on any of
     the journalctl-specific parameters, see `man journalctl`.
@@ -67,6 +69,10 @@ Help
                             `journalctl`, you can easily use a regex to ignore
                             certain messages. Example: '(?i)linuxfabrik' for a
                             case-insensitive search for "linuxfabrik".
+      --grep GREP           journalctl: Filter output to entries where the
+                            MESSAGE= field matches the specified regular
+                            expression. PERL-compatible regular expressions are
+                            used
       --priority PRIORITY   journalctl: Filter output by message priorities or
                             priority ranges. Default: emerg..err
       --severity {warn,crit}
@@ -83,6 +89,10 @@ Help
                             journalctl: Show messages for the specified user
                             session unit. This parameter can be specified multiple
                             times. Default: None
+      --count COUNT         Number of events to trigger the state. Default: 1
+      --match MATCH         journalctl: Filter journal entries by specific fields'
+                            values. Should be in the format "FIELD=VALUE", see
+                            `man journalctl` for details.


 Usage Examples
@@ -98,21 +108,21 @@ Output:

.. code-block:: text

27 events. Latest event at 2022-07-28 15:08:04 from systemd-resolved, level err: `Failed to send hostname reply: Transport endpoint is not connected` [WARNING].
27 events. Latest event at 2022-07-28 15:08:04 from systemd-resolved, level err: `Failed to send hostname reply: Transport endpoint is not connected` [WARNING].
Attention: Table below is shortened and just shows the 5 newest and the 5 oldest messages.

Timestamp ! Unit ! Prio ! Message
Timestamp ! Unit ! Prio ! Message
--------------------+------------------+------+-------------------------------------------------------------------------------------------------------------------------------------------
2022-07-28 15:08:04 ! systemd-resolved ! err ! Failed to send hostname reply: Transport endpoint is not connected
2022-07-28 09:27:03 ! dnf-makecache ! err ! Failed to start dnf makecache.
2022-07-28 09:10:55 ! session-c1.scope ! err ! GLib-GObject: g_object_unref: assertion 'G_IS_OBJECT (object)' failed
2022-07-28 09:10:51 ! user@1000 ! err ! Failed to start Application launched by gnome-session-binary.
2022-07-28 09:10:51 ! user@1000 ! err ! Failed to start Application launched by gnome-session-binary.
2022-07-27 20:36:52 ! user@1000 ! err ! Ignoring duplicate name 'org.freedesktop.FileManager1' in service file '/usr/share//dbus-1/services/org.freedesktop.FileManager1.service'
2022-07-27 20:36:36 ! user@1000 ! err ! Ignoring duplicate name 'org.freedesktop.FileManager1' in service file '/usr/share//dbus-1/services/org.freedesktop.FileManager1.service'
2022-07-27 20:36:36 ! user@1000 ! err ! Ignoring duplicate name 'org.freedesktop.FileManager1' in service file '/usr/share//dbus-1/services/org.freedesktop.FileManager1.service'
2022-07-27 20:36:34 ! user@1000 ! err ! Ignoring duplicate name 'org.freedesktop.FileManager1' in service file '/usr/share//dbus-1/services/org.freedesktop.FileManager1.service'
2022-07-27 20:36:34 ! user@1000 ! err ! Ignoring duplicate name 'org.freedesktop.FileManager1' in service file '/usr/share//dbus-1/services/org.freedesktop.FileManager1.service'
2022-07-28 15:08:04 ! systemd-resolved ! err ! Failed to send hostname reply: Transport endpoint is not connected
2022-07-28 09:27:03 ! dnf-makecache ! err ! Failed to start dnf makecache.
2022-07-28 09:10:55 ! session-c1.scope ! err ! GLib-GObject: g_object_unref: assertion 'G_IS_OBJECT (object)' failed
2022-07-28 09:10:51 ! user@1000 ! err ! Failed to start Application launched by gnome-session-binary.
2022-07-28 09:10:51 ! user@1000 ! err ! Failed to start Application launched by gnome-session-binary.
2022-07-27 20:36:52 ! user@1000 ! err ! Ignoring duplicate name 'org.freedesktop.FileManager1' in service file '/usr/share//dbus-1/services/org.freedesktop.FileManager1.service'
2022-07-27 20:36:36 ! user@1000 ! err ! Ignoring duplicate name 'org.freedesktop.FileManager1' in service file '/usr/share//dbus-1/services/org.freedesktop.FileManager1.service'
2022-07-27 20:36:36 ! user@1000 ! err ! Ignoring duplicate name 'org.freedesktop.FileManager1' in service file '/usr/share//dbus-1/services/org.freedesktop.FileManager1.service'
2022-07-27 20:36:34 ! user@1000 ! err ! Ignoring duplicate name 'org.freedesktop.FileManager1' in service file '/usr/share//dbus-1/services/org.freedesktop.FileManager1.service'
2022-07-27 20:36:34 ! user@1000 ! err ! Ignoring duplicate name 'org.freedesktop.FileManager1' in service file '/usr/share//dbus-1/services/org.freedesktop.FileManager1.service'

Use `journalctl --reverse --priority=emerg..err --since=-24h` as a starting point for debugging. Be aware of the fact that you might see even more messages then, as we apply a lot of unit filters to only get messages from basic system services.
The full command used was:
@@ -131,18 +141,18 @@ Output:
994 events. Latest event at 2022-07-28 18:00:04 from httpd, level err: `[proxy_fcgi:error] [pid 896:tid 929] [client 127.0.0.1:50256] AH01071: Got error 'Primary script unknown'` [CRITICAL].
Attention: Table below is shortened and just shows the 5 newest and the 5 oldest messages.

Timestamp ! Unit ! Prio ! Message
Timestamp ! Unit ! Prio ! Message
--------------------+-------+------+-----------------------------------------------------------------------------------------------------------
2022-07-28 18:00:04 ! httpd ! err ! [proxy_fcgi:error] [pid 896:tid 929] [client 127.0.0.1:50256] AH01071: Got error 'Primary script unknown'
2022-07-28 17:59:55 ! httpd ! err ! [proxy_fcgi:error] [pid 896:tid 927] [client 127.0.0.1:57732] AH01071: Got error 'Primary script unknown'
2022-07-28 17:59:04 ! httpd ! err ! [proxy_fcgi:error] [pid 896:tid 945] [client 127.0.0.1:53908] AH01071: Got error 'Primary script unknown'
2022-07-28 17:58:55 ! httpd ! err ! [proxy_fcgi:error] [pid 896:tid 943] [client 127.0.0.1:56074] AH01071: Got error 'Primary script unknown'
2022-07-28 17:58:04 ! httpd ! err ! [proxy_fcgi:error] [pid 896:tid 936] [client 127.0.0.1:44684] AH01071: Got error 'Primary script unknown'
2022-07-28 09:45:55 ! httpd ! err ! [proxy_fcgi:error] [pid 896:tid 947] [client 127.0.0.1:52536] AH01071: Got error 'Primary script unknown'
2022-07-28 09:45:04 ! httpd ! err ! [proxy_fcgi:error] [pid 896:tid 940] [client 127.0.0.1:53256] AH01071: Got error 'Primary script unknown'
2022-07-28 09:44:55 ! httpd ! err ! [proxy_fcgi:error] [pid 896:tid 938] [client 127.0.0.1:44544] AH01071: Got error 'Primary script unknown'
2022-07-28 09:44:04 ! httpd ! err ! [proxy_fcgi:error] [pid 897:tid 904] [client 127.0.0.1:40142] AH01071: Got error 'Primary script unknown'
2022-07-28 09:43:55 ! httpd ! err ! [proxy_fcgi:error] [pid 896:tid 931] [client 127.0.0.1:34050] AH01071: Got error 'Primary script unknown'
2022-07-28 18:00:04 ! httpd ! err ! [proxy_fcgi:error] [pid 896:tid 929] [client 127.0.0.1:50256] AH01071: Got error 'Primary script unknown'
2022-07-28 17:59:55 ! httpd ! err ! [proxy_fcgi:error] [pid 896:tid 927] [client 127.0.0.1:57732] AH01071: Got error 'Primary script unknown'
2022-07-28 17:59:04 ! httpd ! err ! [proxy_fcgi:error] [pid 896:tid 945] [client 127.0.0.1:53908] AH01071: Got error 'Primary script unknown'
2022-07-28 17:58:55 ! httpd ! err ! [proxy_fcgi:error] [pid 896:tid 943] [client 127.0.0.1:56074] AH01071: Got error 'Primary script unknown'
2022-07-28 17:58:04 ! httpd ! err ! [proxy_fcgi:error] [pid 896:tid 936] [client 127.0.0.1:44684] AH01071: Got error 'Primary script unknown'
2022-07-28 09:45:55 ! httpd ! err ! [proxy_fcgi:error] [pid 896:tid 947] [client 127.0.0.1:52536] AH01071: Got error 'Primary script unknown'
2022-07-28 09:45:04 ! httpd ! err ! [proxy_fcgi:error] [pid 896:tid 940] [client 127.0.0.1:53256] AH01071: Got error 'Primary script unknown'
2022-07-28 09:44:55 ! httpd ! err ! [proxy_fcgi:error] [pid 896:tid 938] [client 127.0.0.1:44544] AH01071: Got error 'Primary script unknown'
2022-07-28 09:44:04 ! httpd ! err ! [proxy_fcgi:error] [pid 897:tid 904] [client 127.0.0.1:40142] AH01071: Got error 'Primary script unknown'
2022-07-28 09:43:55 ! httpd ! err ! [proxy_fcgi:error] [pid 896:tid 931] [client 127.0.0.1:34050] AH01071: Got error 'Primary script unknown'

The full command used was:
journalctl --reverse --priority=emerg..err --since=-24h --unit="httpd.service"
@@ -160,8 +170,8 @@ Perfdata / Metrics
.. csv-table::
:widths: 25, 15, 60
:header-rows: 1
Name, Type, Description

Name, Type, Description
journald-query, Number, Number of events found in journald


41 changes: 39 additions & 2 deletions check-plugins/journald-query/journald-query
@@ -38,6 +38,8 @@ DEFAULT_SERVERITY = 'warn'
 DEFAULT_SINCE = '-8h'
 DEFAULT_UNIT = None
 DEFAULT_USER_UNIT = None
+DEFAULT_COUNT = 1
+DEFAULT_GREP = None

 # don't sort JOURNALD_PRIOS alphabetically, we need the indexes (0 = emerg etc.)
 JOURNALD_PRIOS = [
@@ -110,6 +112,15 @@ def parse_args():
         dest='IGNORE_REGEX',
     )

+    parser.add_argument(
+        '--grep',
+        help='journalctl: Filter output to entries where the MESSAGE= field '
+             'matches the specified regular expression. PERL-compatible '
+             'regular expressions are used',
+        default=DEFAULT_GREP,
+        dest='GREP',
+    )
+
     parser.add_argument(
         '--priority',
         help='journalctl: Filter output by message priorities or priority '
@@ -166,6 +177,24 @@ def parse_args():
         action='append',
     )

+    parser.add_argument(
+        '--count',
+        help='Number of events to trigger the state. Default: %(default)d',
+        dest='COUNT',
+        default=DEFAULT_COUNT,
+        type=lib.args.int_or_none,
+    )
+
+    parser.add_argument(
+        '--match',
+        help='journalctl: Filter journal entries by specific fields\' values.'
+             ' Should be in the format "FIELD=VALUE", see `man journalctl` for '
+             'details.',
+        action='append',
+        default=[],
+        dest='MATCH',
+    )
+
     return parser.parse_args()
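
A minimal sketch of how the three new options behave at the argparse level. `int_or_none` below is a hypothetical stand-in for `lib.args.int_or_none`, whose exact behaviour is not shown in this diff; treating the string `none` as `None` is an assumption.

```python
import argparse


def int_or_none(value):
    """Hypothetical stand-in for lib.args.int_or_none (an assumption):
    map the string 'none' to None, anything else to int."""
    if value is None or str(value).lower() == 'none':
        return None
    return int(value)


def build_parser():
    parser = argparse.ArgumentParser(prog='journald-query')
    parser.add_argument('--grep', default=None, dest='GREP')
    parser.add_argument('--count', default=1, dest='COUNT', type=int_or_none)
    # action='append' collects every occurrence of --match into one list
    parser.add_argument('--match', action='append', default=[], dest='MATCH')
    return parser


args = build_parser().parse_args([
    '--count=5',
    '--match', '_SYSTEMD_UNIT=httpd.service',
    '--match', 'PRIORITY=3',
])
print(args.COUNT, args.GREP, args.MATCH)
```

Because `--match` defaults to an empty list rather than `None`, the `if args.MATCH is not None` check in `main()` is always true; without any `--match` the loop simply runs zero times.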


@@ -243,6 +272,13 @@ def main():
     if args.USER_UNIT is not None:
         for unit in args.USER_UNIT:
             cmd += '--user-unit="{}" '.format(unit)
+    if args.GREP is not None:
+        cmd += ' --grep="{}" '.format(args.GREP)
+    if args.MATCH is not None:
+        for match in args.MATCH:
+            if match.find('=') == -1:
+                lib.base.cu('Invalid match specification: {}'.format(match))
+            cmd += ' "{}" '.format(match)
     cmd = cmd.strip()
     stdout, stderr, retc = lib.base.coe(lib.shell.shell_exec(cmd))  # pylint: disable=W0612
     if stderr:
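
The hunk above places the `--grep` and `--match` values between double quotes. Depending on how `lib.shell.shell_exec` tokenizes the command string (not shown in this diff), a value that itself contains a double quote could be split into unexpected arguments. One possible alternative, sketched here with the hypothetical helper `build_filter_args`, is to quote each value with `shlex.quote` from the standard library:

```python
import shlex


def build_filter_args(grep=None, matches=()):
    """Return the extra journalctl arguments as one shell-safe string."""
    parts = []
    if grep is not None:
        parts.append('--grep={}'.format(shlex.quote(grep)))
    for match in matches:
        # same validation as in the PR: a match must look like FIELD=VALUE
        if '=' not in match:
            raise ValueError('Invalid match specification: {}'.format(match))
        parts.append(shlex.quote(match))
    return ' '.join(parts)


extra = build_filter_args(
    grep='Got error "Primary script unknown"',
    matches=['_SYSTEMD_UNIT=httpd.service'],
)
print(extra)
```

The sketch raises `ValueError` where the plugin calls `lib.base.cu()`; that substitution only keeps the example self-contained.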
@@ -259,8 +295,8 @@ def main():
     # analyze data
     if stdout:
         # found something, so nothing good
-        state = lib.base.str2state(args.SEVERITY)
         result = stdout.splitlines()
+        threshold = args.COUNT or 1

         compiled_ignore_regex = [re.compile(item) for item in args.IGNORE_REGEX]
         for item in result:
@@ -302,14 +338,15 @@ def main():

     # build the message
     if table_data:
+        state = lib.base.str2state(args.SEVERITY) if cnt >= threshold else STATE_OK
         msg = '{} {}. Latest event at {} from {}, level {}: `{}`{}'.format(
             cnt,
             lib.txt.pluralize('event', cnt),
             table_data[0]['timestamp'],
             table_data[0]['unit'],
             table_data[0]['priority'],
             table_data[0]['MESSAGE'],
-            lib.base.state2str(lib.base.str2state(args.SEVERITY), prefix=' '),
+            lib.base.state2str(state, prefix=' '),
         )
         if shortened:
             msg += '\nAttention: Table below is truncated, showing the 5 newest and ' \
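
The threshold rule introduced by `--count` can be sketched in isolation. The state constants and the `decide_state` helper are assumptions for illustration: the numeric codes follow the usual Nagios plugin convention, and the real plugin takes them from its own library.

```python
# Assumption: state codes follow the usual Nagios plugin convention.
STATE_OK = 0
STATE_WARN = 1
STATE_CRIT = 2

SEVERITY_TO_STATE = {'warn': STATE_WARN, 'crit': STATE_CRIT}


def decide_state(cnt, severity='warn', count=1):
    """Return the alert state only once cnt reaches the threshold."""
    threshold = count or 1  # None or 0 fall back to 1, as in the hunk above
    if cnt >= threshold:
        return SEVERITY_TO_STATE[severity]
    return STATE_OK


for count in (5, 2, None):
    print(count, decide_state(3, 'warn', count))
```

This mirrors the two new unit tests: three events stay OK with `--count=5` and become WARNING with `--count=2`.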
28 changes: 28 additions & 0 deletions check-plugins/journald-query/unit-test/run
@@ -52,6 +52,34 @@ class TestCheck(unittest.TestCase):
         self.assertEqual(stderr, '')
         self.assertEqual(retc, STATE_CRIT)

+    def test_if_check_respects_ignores_EXAMPLE02c(self):
+        stdout, stderr, retc = lib.base.coe(lib.shell.shell_exec(self.check + ' --ignore-pattern="iSCSI" --test=stdout/EXAMPLE02,,0'))
+        self.assertIn('Everything is ok.', stdout)
+        self.assertEqual(stderr, '')
+        self.assertEqual(retc, STATE_OK)
+
+
+    def test_events_below_threshold_EXAMPLE03a(self):
+        stdout, stderr, retc = lib.base.coe(lib.shell.shell_exec(self.check + ' --count=5 --test=stdout/EXAMPLE03,,0'))
+        self.assertIn('3 events. Latest event at 2022-07-28 14:29:48 from iscsid, level err: `iSCSI daemon with pid=865 started!`', stdout)
+        self.assertIn('Timestamp ! Unit ! Prio ! Message', stdout)
+        self.assertIn('--------------------+--------+------+------------------------------------', stdout)
+        self.assertIn('2022-07-28 14:29:48 ! iscsid ! err ! iSCSI daemon with pid=865 started!', stdout)
+        self.assertIn('2022-07-28 14:29:48 ! iscsid ! err ! iSCSI daemon with pid=866 started!', stdout)
+        self.assertIn('2022-07-28 14:29:48 ! iscsid ! err ! iSCSI daemon with pid=867 started!', stdout)
+        self.assertEqual(stderr, '')
+        self.assertEqual(retc, STATE_OK)
+
+    def test_events_above_threshold_EXAMPLE03b(self):
+        stdout, stderr, retc = lib.base.coe(lib.shell.shell_exec(self.check + ' --count=2 --test=stdout/EXAMPLE03,,0'))
+        self.assertIn('3 events. Latest event at 2022-07-28 14:29:48 from iscsid, level err: `iSCSI daemon with pid=865 started!` [WARNING]', stdout)
+        self.assertIn('Timestamp ! Unit ! Prio ! Message', stdout)
+        self.assertIn('--------------------+--------+------+------------------------------------', stdout)
+        self.assertIn('2022-07-28 14:29:48 ! iscsid ! err ! iSCSI daemon with pid=865 started!', stdout)
+        self.assertIn('2022-07-28 14:29:48 ! iscsid ! err ! iSCSI daemon with pid=866 started!', stdout)
+        self.assertIn('2022-07-28 14:29:48 ! iscsid ! err ! iSCSI daemon with pid=867 started!', stdout)
+        self.assertEqual(stderr, '')
+        self.assertEqual(retc, STATE_WARN)

 if __name__ == '__main__':
     unittest.main()
3 changes: 3 additions & 0 deletions check-plugins/journald-query/unit-test/stdout/EXAMPLE03
@@ -0,0 +1,3 @@
{ "__CURSOR" : "s=8ba8080b764946c2b09652c8f6f6d573;i=2e3;b=5aa7e94ae99b4bcab2e262a17cfeeb2a;m=25a642;t=5e4dcb07afe37;x=1705c95a45c73ab2", "__REALTIME_TIMESTAMP" : "1659011388341815", "__MONOTONIC_TIMESTAMP" : "2467394", "_BOOT_ID" : "5aa7e94ae99b4bcab2e262a17cfeeb2a", "SYSLOG_FACILITY" : "3", "_UID" : "0", "_GID" : "0", "_SYSTEMD_SLICE" : "system.slice", "_MACHINE_ID" : "80e8db6b3ccf05cef005708f62ceaaf7", "_HOSTNAME" : "ubuntu1604.localdomain", "_CAP_EFFECTIVE" : "3fffffffff", "_TRANSPORT" : "syslog", "SYSLOG_IDENTIFIER" : "iscsid", "_COMM" : "iscsid", "PRIORITY" : "3", "MESSAGE" : "iSCSI daemon with pid=865 started!", "_PID" : "863", "_EXE" : "/sbin/iscsid", "_CMDLINE" : "/sbin/iscsid", "_SYSTEMD_CGROUP" : "/system.slice/iscsid.service", "_SYSTEMD_UNIT" : "iscsid.service", "_SOURCE_REALTIME_TIMESTAMP" : "1659011388341744" }
{ "__CURSOR" : "s=8ba8080b764946c2b09652c8f6f6d573;i=2e3;b=5aa7e94ae99b4bcab2e262a17cfeeb2a;m=25a642;t=5e4dcb07afe37;x=1705c95a45c73ab2", "__REALTIME_TIMESTAMP" : "1659011388341815", "__MONOTONIC_TIMESTAMP" : "2467394", "_BOOT_ID" : "5aa7e94ae99b4bcab2e262a17cfeeb2a", "SYSLOG_FACILITY" : "3", "_UID" : "0", "_GID" : "0", "_SYSTEMD_SLICE" : "system.slice", "_MACHINE_ID" : "80e8db6b3ccf05cef005708f62ceaaf7", "_HOSTNAME" : "ubuntu1604.localdomain", "_CAP_EFFECTIVE" : "3fffffffff", "_TRANSPORT" : "syslog", "SYSLOG_IDENTIFIER" : "iscsid", "_COMM" : "iscsid", "PRIORITY" : "3", "MESSAGE" : "iSCSI daemon with pid=866 started!", "_PID" : "863", "_EXE" : "/sbin/iscsid", "_CMDLINE" : "/sbin/iscsid", "_SYSTEMD_CGROUP" : "/system.slice/iscsid.service", "_SYSTEMD_UNIT" : "iscsid.service", "_SOURCE_REALTIME_TIMESTAMP" : "1659011388341744" }
{ "__CURSOR" : "s=8ba8080b764946c2b09652c8f6f6d573;i=2e3;b=5aa7e94ae99b4bcab2e262a17cfeeb2a;m=25a642;t=5e4dcb07afe37;x=1705c95a45c73ab2", "__REALTIME_TIMESTAMP" : "1659011388341815", "__MONOTONIC_TIMESTAMP" : "2467394", "_BOOT_ID" : "5aa7e94ae99b4bcab2e262a17cfeeb2a", "SYSLOG_FACILITY" : "3", "_UID" : "0", "_GID" : "0", "_SYSTEMD_SLICE" : "system.slice", "_MACHINE_ID" : "80e8db6b3ccf05cef005708f62ceaaf7", "_HOSTNAME" : "ubuntu1604.localdomain", "_CAP_EFFECTIVE" : "3fffffffff", "_TRANSPORT" : "syslog", "SYSLOG_IDENTIFIER" : "iscsid", "_COMM" : "iscsid", "PRIORITY" : "3", "MESSAGE" : "iSCSI daemon with pid=867 started!", "_PID" : "863", "_EXE" : "/sbin/iscsid", "_CMDLINE" : "/sbin/iscsid", "_SYSTEMD_CGROUP" : "/system.slice/iscsid.service", "_SYSTEMD_UNIT" : "iscsid.service", "_SOURCE_REALTIME_TIMESTAMP" : "1659011388341744" }
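
A rough sketch of how records like the ones in this fixture can be turned into event rows, assuming the plugin consumes one JSON object per line as the fixture suggests. The `parse_events` helper and the shortened sample records are hypothetical; only the field names are taken from the fixture.

```python
import json
from datetime import datetime, timezone

# Three shortened records modelled on the fixture above (most fields dropped).
FIXTURE = '\n'.join(
    json.dumps({
        '__REALTIME_TIMESTAMP': '1659011388341815',
        'SYSLOG_IDENTIFIER': 'iscsid',
        'PRIORITY': '3',
        'MESSAGE': 'iSCSI daemon with pid={} started!'.format(pid),
    })
    for pid in (865, 866, 867)
)


def parse_events(text):
    """Parse one JSON object per line into small event dicts."""
    events = []
    for line in text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        # __REALTIME_TIMESTAMP is microseconds since the epoch, as a string
        seconds = int(record['__REALTIME_TIMESTAMP']) // 1000000
        events.append({
            'timestamp': datetime.fromtimestamp(seconds, tz=timezone.utc),
            'unit': record['SYSLOG_IDENTIFIER'],
            'priority': int(record['PRIORITY']),
            'MESSAGE': record['MESSAGE'],
        })
    return events


events = parse_events(FIXTURE)
print(len(events), events[0]['timestamp'].isoformat())
```

The timestamp corresponds to 12:29:48 UTC, while the unit tests expect `14:29:48`, which suggests the expected output was rendered in a UTC+2 local time zone. The sketch sticks to UTC so that it is deterministic.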