GA support for reading from journald #37086
Comments
@rdner I am interested to get your opinion on this given the amount of time you are spending trying to migrate and drive consistency between the log, filestream, and container input that already exist. |
For Option 1, do we have to provide a conditional? I think both inputs could be enabled at the same time; it would just have to be non-fatal for the source not to be present. For example, you can enable journald, logfile, and UDP in the iptables integration all at the same time (UDP and journald are on by default). |
If we don't have a conditional we risk duplicated logs. I think if we defaulted to always using both inputs we'd get a small amount of duplicated logs today on Debian 11: it looks like the kernel boot logs go to both journald and /var/log/.
craig_mackenzie@cmackenzie-debian11-test:~$ journalctl
-- Journal begins at Tue 2023-11-14 20:15:17 UTC, ends at Tue 2023-11-14 20:19:35 UTC. --
Nov 14 20:15:17 debian kernel: Linux version 5.10.0-26-cloud-amd64 (debian-kernel@lists.debian.org) (gcc-10 (D>
Nov 14 20:15:17 debian kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-5.10.0-26-cloud-amd64 root=UUID=62c0943b>
Nov 14 20:15:17 debian kernel: BIOS-provided physical RAM map:
Nov 14 20:15:17 debian kernel: BIOS-e820: [mem 0x0000000000000000-0x0000000000000fff] reserved
Nov 14 20:15:17 debian kernel: BIOS-e820: [mem 0x0000000000001000-0x0000000000054fff] usable
Nov 14 20:15:17 debian kernel: BIOS-e820: [mem 0x0000000000055000-0x000000000005ffff] reserved
Nov 14 20:15:17 debian kernel: BIOS-e820: [mem 0x0000000000060000-0x0000000000097fff] usable
Nov 14 20:15:17 debian kernel: BIOS-e820: [mem 0x0000000000098000-0x000000000009ffff] reserved
Nov 14 20:15:17 debian kernel: BIOS-e820: [mem 0x0000000000100000-0x00000000bf8ecfff] usable
Nov 14 20:15:17 debian kernel: BIOS-e820: [mem 0x00000000bf8ed000-0x00000000bf9ecfff] reserved
Nov 14 20:15:17 debian kernel: BIOS-e820: [mem 0x00000000bf9ed000-0x00000000bfaecfff] type 20
Nov 14 20:15:17 debian kernel: BIOS-e820: [mem 0x00000000bfaed000-0x00000000bfb6cfff] reserved
Nov 14 20:15:17 debian kernel: BIOS-e820: [mem 0x00000000bfb6d000-0x00000000bfb7efff] ACPI data
Nov 14 20:15:17 debian kernel: BIOS-e820: [mem 0x00000000bfb7f000-0x00000000bfbfefff] ACPI NVS
Nov 14 20:15:17 debian kernel: BIOS-e820: [mem 0x00000000bfbff000-0x00000000bffdffff] usable
Nov 14 20:15:17 debian kernel: BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
Nov 14 20:15:17 debian kernel: BIOS-e820: [mem 0x0000000100000000-0x000000013fffffff] usable
Nov 14 20:15:17 debian kernel: printk: bootconsole [earlyser0] enabled
Nov 14 20:15:17 debian kernel: NX (Execute Disable) protection: active
Nov 14 20:15:17 debian kernel: efi: EFI v2.70 by EDK II
Nov 14 20:15:17 debian kernel: efi: TPMFinalLog=0xbfbf7000 ACPI=0xbfb7e000 ACPI 2.0=0xbfb7e014 SMBIOS=0xbf9ca0>
Nov 14 20:15:17 debian kernel: secureboot: Secure boot disabled
Nov 14 20:15:17 debian kernel: SMBIOS 2.4 present.
Nov 14 20:15:17 debian kernel: DMI: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/09/2023
Nov 14 20:15:17 debian kernel: Hypervisor detected: KVM
Nov 14 20:15:17 debian kernel: kvm-clock: Using msrs 4b564d01 and 4b564d00
Nov 14 20:15:17 debian kernel: kvm-clock: cpu 0, msr 78801001, primary cpu clock
Nov 14 20:15:17 debian kernel: kvm-clock: using sched offset of 7655756989 cycles
Nov 14 20:15:17 debian kernel: clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max>
Nov 14 20:15:17 debian kernel: tsc: Detected 2200.158 MHz processor
Nov 14 20:15:17 debian kernel: e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
Nov 14 20:15:17 debian kernel: e820: remove [mem 0x000a0000-0x000fffff] usable
Nov 14 20:15:17 debian kernel: last_pfn = 0x140000 max_arch_pfn = 0x400000000
Nov 14 20:15:17 debian kernel: MTRR default type: write-back
Nov 14 20:15:17 debian kernel: MTRR fixed ranges enabled:
Nov 14 20:15:17 debian kernel: 00000-9FFFF write-back
Nov 14 20:15:17 debian kernel: A0000-FFFFF uncachable
craig_mackenzie@cmackenzie-debian11-test:~$ grep -rn 'kvm-clock: cpu 0, msr 78801001, primary cpu cloc' /var/log
grep: /var/log/journal/3465bc73197d954b92a16251605729f5/system.journal: binary file matches
grep: /var/log/private: Permission denied
grep: /var/log/btmp: Permission denied
/var/log/syslog:125:Nov 14 20:15:18 debian kernel: [ 0.000000] kvm-clock: cpu 0, msr 78801001, primary cpu clock
/var/log/messages:27:Nov 14 20:15:18 debian kernel: [ 0.000000] kvm-clock: cpu 0, msr 78801001, primary cpu clock
grep: /var/log/chrony: Permission denied
/var/log/kern.log:27:Nov 14 20:15:18 debian kernel: [ 0.000000] kvm-clock: cpu 0, msr 78801001, primary cpu clock
Granted if someone set their logs path to |
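The journalctl output and the grep results above show the same kernel message with a different timestamp (20:15:17 vs 20:15:18) and, in the files under /var/log, an extra `[ 0.000000]` uptime prefix, so a byte-for-byte comparison would not flag the duplicate. A minimal Go sketch of one way to compare such lines follows. The helper name `dedupKey` and the choice of host, program, and message as the key are assumptions for illustration, not something the inputs do today.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// syslogLine matches the "Mon DD HH:MM:SS host program: message" layout seen
// in both the journalctl output and the /var/log files quoted in the thread.
var syslogLine = regexp.MustCompile(`^[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} (\S+) ([^:]+): (.*)$`)

// uptimePrefix matches the "[ 0.000000] " kernel uptime marker that the
// /var/log files carry but the journalctl short output does not.
var uptimePrefix = regexp.MustCompile(`^\[\s*\d+\.\d+\]\s*`)

// dedupKey (hypothetical helper) reduces a line to host, program and message,
// ignoring the timestamp and the kernel uptime prefix.
func dedupKey(line string) (string, bool) {
	m := syslogLine.FindStringSubmatch(strings.TrimSpace(line))
	if m == nil {
		return "", false
	}
	msg := uptimePrefix.ReplaceAllString(m[3], "")
	return m[1] + "|" + m[2] + "|" + msg, true
}

func main() {
	fromJournal := "Nov 14 20:15:17 debian kernel: kvm-clock: cpu 0, msr 78801001, primary cpu clock"
	fromKernLog := "Nov 14 20:15:18 debian kernel: [ 0.000000] kvm-clock: cpu 0, msr 78801001, primary cpu clock"

	a, okA := dedupKey(fromJournal)
	b, okB := dedupKey(fromKernLog)
	fmt.Println("parsed:", okA && okB, "duplicate:", a == b)
}
```

Dropping the timestamp from the key also merges messages that are legitimately repeated, so this is only useful for estimating overlap between the two sources, not for filtering events.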
It also looks like the journald input is using go-systemd/sdjournal/, which is just wrapping the systemd journal C API:
func NewJournal() (j *Journal, err error) {
	j = &Journal{}
	sd_journal_open, err := getFunction("sd_journal_open")
	if err != nil {
		return nil, err
	}
	r := C.my_sd_journal_open(sd_journal_open, &j.cjournal, C.SD_JOURNAL_LOCAL_ONLY)
This wouldn't fit with the idea of just using a filestream parser for journald. At best we could hide the entire journald input inside filestream so there's a single log input, but we'd probably still need dedicated configuration specific to reading journald files. |
The default journald configuration that reads everything is only two lines, so I think at this point I'm convinced that keeping the journald input and improving it is the best path:
# Read all journald logs
- type: journald
  id: everything
I don't think folding this into filestream will make filestream easier to use, or be easier to maintain. |
To summarise what we discussed with @cmacknz on a call:
|
A few things that come to mind related to journald:
|
Thanks, I think it would make sense to compare the events collected without the journald inputs to those collected with it for the sources needed by the system integration. If the event content is significantly different it will cause problems for dashboards and queries. |
This is alluded to in some linked issues, but I wanted to explicitly mention that the journald library version in our container images is v245 (from Mar 6, 2020), and when deploying this image on Ubuntu 22.04 nodes, which uses v249, you can't collect logs from the host (no crashes, just no logs). My workaround has been to repack the filebeat binaries in with a more recent base image. We might want to consider bumping our base image as part of making this GA. |
I found another bug, probably another blocker: #39352 It seems that if Filebeat falls too far behind with the Journal the input will crash shortly after starting. |
#32782 and #39352 happen intermittently on my test environments, so far I did not manage to isolate them but they both are coming from a call to
#39352 I only managed to reproduce with Journald |
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane) |
Now that we support reporting status per input when running under Elastic-Agent and given that Journald can cause Filebeat to crash (as detailed by #34077), I added a task to ensure we have input status reporting for Journald before we GA so we can better report unsupported Systemd versions. |
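As a rough illustration of what reporting an unsupported systemd version could look like, here is a minimal Go sketch that parses the first line of `journalctl --version` (for example `systemd 249 (249.11-0ubuntu3.11)`) and compares it with the version of the journal library shipped in the image. The function names and the status strings are hypothetical, and "host newer than reader" is only a heuristic based on the v245 vs v249 report above, not a documented compatibility rule.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSystemdVersion extracts the major version from the first line of
// `journalctl --version`, e.g. "systemd 249 (249.11-0ubuntu3.11)".
func parseSystemdVersion(output string) (int, error) {
	firstLine, _, _ := strings.Cut(output, "\n")
	fields := strings.Fields(firstLine)
	if len(fields) < 2 || fields[0] != "systemd" {
		return 0, fmt.Errorf("unexpected version output: %q", firstLine)
	}
	return strconv.Atoi(fields[1])
}

// statusFor (hypothetical) turns the reader and host versions into a
// human-readable status. The strings are illustrative only.
func statusFor(readerVersion, hostVersion int) string {
	if hostVersion > readerVersion {
		return fmt.Sprintf(
			"degraded: host journal written by systemd %d, reader built against systemd %d",
			hostVersion, readerVersion)
	}
	return "healthy"
}

func main() {
	host, err := parseSystemdVersion("systemd 249 (249.11-0ubuntu3.11)\n+PAM +AUDIT")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// 245 is the library version in the container image mentioned in the thread.
	fmt.Println(statusFor(245, host))
}
```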
Direct use of |
From the comment in #37086 (comment) implying the documents the journald input produces are unlikely to be optimal today, we need to add a task to compare the structure and metadata of the logs collected by journald to those collected today from syslog with the log input and ensure we only see expected differences. |
@cmacknz, @pierrehilbert I've been working on using However, the filtering we have ( So far I could assess that the problem is caused by two issues:
Bear in mind this Which brings me some product related questions:
Just to give an example, the following YAML:
include_matches:
  - and:
    - match:
      - FOO=bar
    - match:
      - BAR=bar
  - or:
    - match:
      - FOO_BAR=foo_bar
generates the following logic expression:
Yes, there is an I would expect the above YAML to produce the following logic expression:
Honestly, I don't think we should accept YAML above, it is confusing how the |
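For comparison, journalctl's own command-line matching has simple, documented rules: matches on different fields are ANDed, matches on the same field are ORed, and a standalone `+` combines the groups on either side with OR. The Go sketch below models only those rules, so that `journalctl FOO=bar BAR=bar + FOO_BAR=foo_bar` reads as `(FOO=bar AND BAR=bar) OR FOO_BAR=foo_bar`. It is not the expression from the comment above (that one is not reproduced here), and the type and function names are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// matchGroup holds the FIELD=VALUE matches that sit between two "+"
// separators on a journalctl command line.
type matchGroup []string

// groupMatches applies journalctl's rule inside one group:
// same field -> alternatives (OR), different fields -> all required (AND).
func groupMatches(entry map[string]string, g matchGroup) bool {
	byField := map[string][]string{}
	for _, m := range g {
		field, value, ok := strings.Cut(m, "=")
		if !ok {
			return false // not a FIELD=VALUE match
		}
		byField[field] = append(byField[field], value)
	}
	for field, values := range byField {
		got, present := entry[field]
		matched := false
		for _, v := range values {
			if present && got == v {
				matched = true
				break
			}
		}
		if !matched {
			return false
		}
	}
	return true
}

// entryMatches ORs the groups together, as "+" does. No groups means no filter.
func entryMatches(entry map[string]string, groups []matchGroup) bool {
	if len(groups) == 0 {
		return true
	}
	for _, g := range groups {
		if groupMatches(entry, g) {
			return true
		}
	}
	return false
}

func main() {
	// Equivalent of: journalctl FOO=bar BAR=bar + FOO_BAR=foo_bar
	groups := []matchGroup{{"FOO=bar", "BAR=bar"}, {"FOO_BAR=foo_bar"}}
	fmt.Println(entryMatches(map[string]string{"FOO": "bar", "BAR": "bar"}, groups))
	fmt.Println(entryMatches(map[string]string{"FOO": "bar"}, groups))
	fmt.Println(entryMatches(map[string]string{"FOO_BAR": "foo_bar"}, groups))
}
```

A flat "groups of matches" model like this cannot express arbitrary nesting, which is one reason a nested `and`/`or` YAML syntax can be hard to map onto it.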
The journald input is in technical preview, you can make whatever changes are needed breaking or not. That said, people are using it as is so don't make breaking changes that don't actually help. Our end goal is not just to GA journald, it is to make Debian 12 based systems (and other distros that default to journald) work in our system integration. This means we want users to be able to get the same information out of journald that they would get out of syslog, without breaking any queries, dashboards, or alerts that already exist. So breaking changes need to be focused on the shape of the output data, and if it can be filtered in the same ways, and not on the input configuration syntax used to obtain it. If we can make it so the journald input is a drop in replacement for the log input in the system integration with no configuration changes that is even better, but I'm not sure this will be possible. |
Thanks @cmacknz, I'll take a look at the system integration on a Debian 12 host and what kind of filtering it uses for logs so I can draft the minimal requirements. |
TL;DR: The standard journald input, with no filters, will collect all the data we need. To set the dataset correctly we can rely on the syslog facility code that journald already adds to the events and that the current version of the journald input already publishes. So for #39820 I'll focus on getting the core of the journald input working with
Long version:
The Debian 12 Vagrant box I've been using comes with /etc/rsyslog.conf
Wikipedia lists the facilities and their names. With that information it should be easy to "migrate" the system integration to use the journald input and test all ingest pipelines/dashboards. |
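To make the facility-based routing concrete, here is a minimal Go sketch that maps the numeric syslog facility code (journald stores it as a decimal string in the `SYSLOG_FACILITY` field) to its conventional name and picks a dataset. The split used here, `auth` and `authpriv` to `system.auth` and everything else to `system.syslog`, is an assumption for illustration and is not taken from the comment above; the function names are hypothetical, and codes 12 to 15 are left out of the name table.

```go
package main

import (
	"fmt"
	"strconv"
)

// facilityNames lists the conventional syslog facility names for codes
// 0-11 and 16-23 (codes 12-15 are omitted in this sketch).
var facilityNames = map[int]string{
	0: "kern", 1: "user", 2: "mail", 3: "daemon", 4: "auth", 5: "syslog",
	6: "lpr", 7: "news", 8: "uucp", 9: "cron", 10: "authpriv", 11: "ftp",
	16: "local0", 17: "local1", 18: "local2", 19: "local3",
	20: "local4", 21: "local5", 22: "local6", 23: "local7",
}

// facilityName returns the name for a facility code, if it is in the table.
func facilityName(code int) (string, bool) {
	name, ok := facilityNames[code]
	return name, ok
}

// datasetFor (hypothetical) picks a dataset from the SYSLOG_FACILITY value.
// Assumption: auth (4) and authpriv (10) go to system.auth, the rest to
// system.syslog.
func datasetFor(syslogFacility string) (string, error) {
	code, err := strconv.Atoi(syslogFacility)
	if err != nil {
		return "", fmt.Errorf("invalid facility %q: %w", syslogFacility, err)
	}
	if code < 0 || code > 23 {
		return "", fmt.Errorf("facility code out of range: %d", code)
	}
	switch code {
	case 4, 10:
		return "system.auth", nil
	}
	return "system.syslog", nil
}

func main() {
	for _, raw := range []string{"0", "4", "10", "3"} {
		dataset, err := datasetFor(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		code, _ := strconv.Atoi(raw)
		name, _ := facilityName(code)
		fmt.Printf("facility %s (%s) -> %s\n", raw, name, dataset)
	}
}
```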
A lot of systems do not have (and do not want) rsyslog installed, so please don't make it a requirement for reading from journald. |
I agree with @nerijus here. This is the main reason we need to support journald itself: to avoid forcing our users to install rsyslog. |
We won't ;). I just mentioned rsyslog as an easy means of testing/comparing the traditional log files harvesting and the journald on the same system. We won't make rsyslog a dependency for the journald input. |
I'm not sure if that's critical to GA the journald input, but we should better support binary fields. So far I found them in two situations:
Anyway I created an issue for that: #40479 |
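For context on what a binary field can look like on the wire: in journalctl's JSON output (`journalctl -o json`), a field containing non-printable or non-UTF-8 bytes is serialized as an array of unsigned byte values instead of a string, and an oversized field is replaced by `null`. The Go sketch below decodes those shapes. It assumes events arrive in that JSON format, the name `decodeField` is hypothetical, and arrays of strings (a field repeated within one entry) are deliberately reported as unsupported to keep the example short.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeField (hypothetical) converts one journal JSON field value to a Go
// string. It accepts a JSON string, a JSON array of byte values (0-255), or
// null (returned as an empty string).
func decodeField(raw json.RawMessage) (string, error) {
	var s string
	if err := json.Unmarshal(raw, &s); err == nil {
		return s, nil // plain string, or null (which leaves s empty)
	}

	// []byte would expect base64 text, so decode the numbers as ints first.
	var nums []int
	if err := json.Unmarshal(raw, &nums); err != nil {
		return "", fmt.Errorf("unsupported field encoding: %w", err)
	}
	buf := make([]byte, len(nums))
	for i, n := range nums {
		if n < 0 || n > 255 {
			return "", fmt.Errorf("byte value out of range: %d", n)
		}
		buf[i] = byte(n)
	}
	return string(buf), nil
}

func main() {
	entry := []byte(`{"MESSAGE":[104,105,7],"_COMM":"sshd","BIG":null}`)

	var fields map[string]json.RawMessage
	if err := json.Unmarshal(entry, &fields); err != nil {
		fmt.Println("error:", err)
		return
	}
	for name, raw := range fields {
		value, err := decodeField(raw)
		fmt.Printf("%s = %q (err: %v)\n", name, value, err)
	}
}
```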
I took a look at the current state of the fields produced by the input. My suggestions for aligning closer to ECS are in this table. Once we settle on mappings we should add a table to the input's documentation.
|
I found an issue with our current ( I've already added it to the must have list for GA. The previous implementation did not have this problem. I already have an idea of how to fix it and how to test it. |
I came across a case where I don't think it's a blocker, but we should at least document it. GitHub issue: elastic/integrations#11717 |
We ideally would not do this and should track this in a separate issue. This breaks use cases where we receive journald logs from a remote host. I think this was handled in other places by defining a |
I saw that, and that's also what I recommended in the discuss thread. However, I'm not sure what the best approach to "solve" this issue is: Do we add the My gut feeling tells me we should use different fields, at least in this context of one Filebeat ingesting logs from multiple hosts. I'll centralise this discussion on a single issue because it's affecting multiple projects in different ways. |
Removed #42208 from the nice to have and moved it to the must have section of this issue. |
system.auth and system.syslog are not available for Debian 12 under Data streams tab. elastic-agent#3650
As of Debian 12, system logs are exclusively available via journald by default. Today we support reading journald logs via the Filebeat journald input, which is still in technical preview and has several major bugs filed against it. See https://github.com/elastic/beats/issues?q=is%3Aissue+is%3Aopen+journald notably:
We need to provide a GA way to read journald logs. There are two paths to this:
Fold the existing journald functionality into filestream, so that there is only one way to read log files and all existing uses of filestream to read system logs continue to work with no or minimal modification. In the ideal case we detect we are reading journald logs based on a .journal extension or well known file paths, but we may need a configuration flag for this. If we do end up with a configuration flag we could consider implementing journald support as a type of parser: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-filestream.html#_parsers
Edit:
Option 1 is the path forward, we'll keep the separate journald input.
To close this issue we'll need to:
Must have
Nice to have
include_matches to reach feature parity with what is exposed by journalctl #40185