
auditbeat ERROR: get status request failed:failed to get audit status reply: no reply received #125

Open
mdnfiras opened this issue Oct 19, 2022 · 6 comments

Comments

@mdnfiras

mdnfiras commented Oct 19, 2022

Original issue: elastic/beats#33258

Long story short: we run Auditbeat as a DaemonSet on GKE clusters with slightly different versions; some nodes run Docker, others run containerd.

Auditbeat runs with all the permissions it needs, and journald is already unregistered by an initContainer so that Auditbeat can receive audit events.
The problem is that some random Auditbeat pods keep outputting this error until we restart them:

ERROR: get status request failed:failed to get audit status reply: no reply received

And if we restart a perfectly healthy Auditbeat pod, it may start outputting that error too.

However, the error doesn't stop Auditbeat from writing audit logs to Elasticsearch: the pods outputting the error ship as many audit logs as the other pods.

I traced the error down to this block of code:

go-libaudit/audit.go

Lines 496 to 498 in 6fba496

	if len(msgs) == 0 {
		return nil, errors.New("no reply received")
	}

Wouldn't it be acceptable for msgs to be empty? At this point we have already gone through this loop without any error:

go-libaudit/audit.go

Lines 480 to 494 in 6fba496

	for i := 0; i < 10; i++ {
		msgs, err = c.Netlink.Receive(true, parseNetlinkAuditMessage)
		if err != nil {
			switch {
			case errors.Is(err, syscall.EINTR):
				continue
			case errors.Is(err, syscall.EAGAIN):
				time.Sleep(50 * time.Millisecond)
				continue
			default:
				return nil, fmt.Errorf("error receiving audit reply: %w", err)
			}
		}
		break
	}

And func (c *NetlinkClient) Receive() already has the appropriate error checks:

go-libaudit/netlink.go

Lines 152 to 190 in 6fba496

func (c *NetlinkClient) Receive(nonBlocking bool, p NetlinkParser) ([]syscall.NetlinkMessage, error) {
	var flags int
	if nonBlocking {
		flags |= syscall.MSG_DONTWAIT
	}
	// XXX (akroh): A possible enhancement is to use the MSG_PEEK flag to
	// check the message size and increase the buffer size to handle it all.
	nr, from, err := syscall.Recvfrom(c.fd, c.readBuf, flags)
	if err != nil {
		// EAGAIN or EWOULDBLOCK will be returned for non-blocking reads where
		// the read would normally have blocked.
		return nil, err
	}
	if nr < syscall.NLMSG_HDRLEN {
		return nil, fmt.Errorf("not enough bytes (%v) received to form a netlink header", nr)
	}
	fromNetlink, ok := from.(*syscall.SockaddrNetlink)
	if !ok || fromNetlink.Pid != 0 {
		// Spoofed packet received on audit netlink socket.
		return nil, errors.New("message received was not from the kernel")
	}
	buf := c.readBuf[:nr]
	// Dump raw data for inspection purposes.
	if c.respWriter != nil {
		if _, err = c.respWriter.Write(buf); err != nil {
			return nil, err
		}
	}
	msgs, err := p(buf)
	if err != nil {
		return nil, fmt.Errorf("failed to parse netlink messages (bytes_received=%v): %w", nr, err)
	}
	return msgs, nil
}

Shouldn't len(msgs) == 0 be reported as a warning instead of an error?

@efd6
Contributor

efd6 commented Nov 14, 2022

We could define the error returned by getReply as a warning-only sentinel, but it would be good to understand why the systems you are running demonstrate this behaviour. The only path that explains it is *NetlinkClient.Receive repeatedly getting EINTR or EAGAIN from syscall.Recvfrom. Do you have any idea why your hosts would either not be sending the messages or be seeing heavy use of interrupts? Can you determine which of these is the case?

@efd6

This comment was marked as outdated.

@efd6

This comment was marked as outdated.

@sc07kvm

sc07kvm commented Dec 13, 2024

I see the same behavior with GetRules:

failed receiving rule data: no reply received

EAGAIN is returned on all 10 attempts.

@nicholasberlin

It's possible this was fixed in 8.16: elastic/beats#41207.
@sc07kvm what version are you running?

@sc07kvm

sc07kvm commented Dec 14, 2024

I don't use elastic/beats. I have intermittent problems with this code:

func getRules() error {
	cli, err := libaudit.NewAuditClient(nil)
	if err != nil {
		return err
	}
	defer func() { _ = cli.Close() }()

	rules, err := cli.GetRules()
	if err != nil {
		return err
	}

	...

	return nil
}

Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants