Metrics for DNS probe failed? #739

siju-vasudevan · 2021-01-20T13:17:22Z

Hi,
This is regarding the ICMP check.
If I am not wrong, blackbox exporter does a DNS lookup first and then runs the ICMP check.
If the DNS lookup fails for any reason (one such reason is the limit on concurrent requests in Docker, moby/libnetwork#2601), blackbox exporter considers the ICMP check failed for that host. But the actual issue here is on the DNS side, and the hosts are reachable. This could generate a lot of alerts. Can we have a metric for DNS probe status as well?

Regards,
Siju

brian-brazil · 2021-01-20T13:30:02Z

You can determine this by probe_icmp_duration_seconds not having a resolve time. More generally, if DNS resolution fails then it is correct that the whole ICMP probe fails as a DNS outage is a serious problem and should generate alerts.
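As a rough sketch of that check in PromQL, assuming a failed lookup leaves the `resolve` phase at zero or absent (inferred from the statement above, not verified against a specific exporter version), failed probes with no resolve time could be selected like this:

```promql
# Failed probes for which no positive resolve time was recorded,
# i.e. failures that likely happened at the DNS step.
# ignoring(phase) is needed because probe_success has no phase label.
probe_success == 0
  unless ignoring(phase)
probe_icmp_duration_seconds{phase="resolve"} > 0
```

Using `unless` rather than `== 0` keeps the query working whether the resolve series is zero or missing.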

It makes more sense to ask questions like this on the prometheus-users mailing list rather than in a GitHub issue. On the mailing list, more people are available to potentially respond to your question, and the whole community can benefit from the answers provided.

siju-vasudevan · 2021-01-20T15:05:27Z

Hi Brian,

Thanks for the details.

I agree that a DNS outage is a serious problem, but because of the DNS outage the ICMP check fails for every host, which is a kind of false alert. With thousands of servers monitored by blackbox exporter (ICMP), it will create a lot of false alerts.

Regards,
Siju

brian-brazil · 2021-01-20T15:18:13Z

I'd suggest setting up your group_by in alertmanager so that you get only a single notification with all the alerts firing, rather than a notification per alert.

siju-vasudevan · 2021-01-21T13:21:17Z

Hi Brian,

Thanks for the suggestion!

My scenario requires an alert for each host, so group_by doesn't help. How can I correlate probe_icmp_duration_seconds and probe_success? I want to create a host-down alert only if the host name resolved. I was trying to write a query like the one below:
probe_icmp_duration_seconds{phase =~ "resolve"} == 0 and probe_success == 0
but since the two metrics have different labels, I think it will not work.

Do you think it is possible?

Regards,
Siju

brian-brazil · 2021-01-21T13:41:33Z

That's a PromQL usage question best taken to the mailing list.
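As a starting point for that discussion, here is a minimal sketch of how two series with different label sets can be combined. It assumes the only label that differs between them is `phase`, and that a successful lookup records a resolve time greater than zero. Neither assumption is confirmed for every exporter version.

```promql
# Host-down candidates: the probe failed even though the name resolved.
# ignoring(phase) makes the two sides match on their remaining labels.
probe_success == 0
  and ignoring(phase)
probe_icmp_duration_seconds{phase="resolve"} > 0
```

Without `ignoring(phase)`, the `and` operator requires identical label sets on both sides, so the expression returns nothing.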

siju-vasudevan · 2021-01-21T14:30:14Z

OK. I have raised my question on the mailing list, but I am not sure how quickly I will get a reply.
Thanks a lot for the quick response.

brian-brazil closed this as completed Jan 20, 2021
