Metrics for DNS probe failed? #739

Closed
siju-vasudevan opened this issue Jan 20, 2021 · 6 comments

Comments

@siju-vasudevan

Hi,
This is regarding the ICMP check.
If I am not wrong, blackbox exporter does a DNS lookup first and then performs the ICMP check.
If the DNS lookup fails for any reason (one such reason is the limit on concurrent requests in Docker, moby/libnetwork#2601), blackbox exporter considers the ICMP check failed for that host. But here the actual issue is on the DNS side and the hosts are reachable. This could generate a lot of alerts. Can we have a metric for the DNS probe status as well?

Regards,
Siju

@brian-brazil
Contributor

You can determine this by probe_icmp_duration_seconds not having a resolve time. More generally, if DNS resolution fails then it is correct that the whole ICMP probe fails as a DNS outage is a serious problem and should generate alerts.

It makes more sense to ask questions like this on the prometheus-users mailing list rather than in a GitHub issue. On the mailing list, more people are available to potentially respond to your question, and the whole community can benefit from the answers provided.

@siju-vasudevan
Author

Hi Brian,

Thanks for the details.

I agree that a DNS outage is a serious problem, but because of the DNS outage the ICMP check will fail for all hosts, which is a kind of false alert. Also, if you have thousands of servers monitored by blackbox exporter (ICMP), it will create many false alerts.

Regards,
Siju

@brian-brazil
Contributor

I'd suggest setting up your group_by in alertmanager so that you get only a single notification with all the alerts firing, rather than a notification per alert.
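
As a rough illustration of the group_by suggestion above, an Alertmanager route might look like the fragment below. This is only a sketch: the receiver name `ops-team`, the choice of grouping labels, and the timing values are assumptions for illustration and do not come from this thread.

```yaml
# Hypothetical Alertmanager routing fragment (names and timings are assumptions).
route:
  receiver: ops-team            # hypothetical receiver name
  # Alerts sharing these label values are batched into one notification,
  # so a DNS outage affecting many hosts produces a single message.
  group_by: ['alertname', 'job']
  group_wait: 30s               # wait briefly so related alerts arrive together
  group_interval: 5m            # minimum gap before sending updates for a group
  repeat_interval: 4h           # how often to re-send while alerts keep firing

receivers:
  - name: ops-team              # notification settings (email, webhook, ...) omitted
```

Leaving `instance` out of `group_by` is what collapses the per-host alerts into one notification; each alert still exists individually and is listed inside that notification.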

@siju-vasudevan
Author

Hi Brian,

Thanks for the suggestion!

My scenario is to have an alert for each host, so group_by doesn't help. How can I correlate probe_icmp_duration_seconds and probe_success? I want to create a host-down alert only if the host was resolved. I was trying to create a query something like the one below:
probe_icmp_duration_seconds{phase =~ "resolve"} == 0 and probe_success == 0
but since the two metrics have different labels, I think it will not work.

Do you think it is possible?

Regards,
Siju
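
One possible way to handle the label mismatch described above is PromQL vector matching: `ignoring(phase)` tells the `and` operator to disregard the extra `phase` label when pairing series. The rule file below is a sketch under stated assumptions, not a confirmed answer from the maintainers. It assumes, following the earlier reply, that a failed lookup leaves the resolve phase at zero, so the host-down rule tests `> 0` rather than `== 0`. The job name `blackbox_icmp`, the alert names, the `for` durations, and the severity labels are all hypothetical.

```yaml
# Hypothetical Prometheus alerting rules (job name, alert names, durations are assumptions).
groups:
  - name: blackbox-icmp
    rules:
      # Fires per host when the probe failed although DNS resolution took
      # a non-zero time, i.e. the name resolved but ICMP got no answer.
      - alert: HostDownAfterDNSResolved
        expr: |
          probe_success{job="blackbox_icmp"} == 0
          and ignoring(phase)
          probe_icmp_duration_seconds{job="blackbox_icmp", phase="resolve"} > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.instance }} resolved in DNS but did not answer ICMP"

      # Fires when the probe failed and no resolve time was recorded,
      # which points at DNS rather than at the host itself.
      - alert: ProbeDNSResolutionFailed
        expr: |
          probe_success{job="blackbox_icmp"} == 0
          and ignoring(phase)
          probe_icmp_duration_seconds{job="blackbox_icmp", phase="resolve"} == 0
        for: 5m
        labels:
          severity: warning
```

Because `and` is a set operator, the result keeps the labels of the left-hand side (`probe_success`), so each alert still carries its own `instance` label. It is worth checking in the expression browser that the resolve phase really is zero on failed lookups in the exporter version in use before relying on these rules.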

@brian-brazil
Contributor

That's a PromQL usage question best taken to the mailing list.

@siju-vasudevan
Author

OK. I have raised my question on the mailing list, but I am not sure how quickly I will get a reply.
Thanks a lot for the quick response.
