incorrect smartctl_devices count with hardware HBAs #236

Open
aieri opened this issue Jul 30, 2024 · 4 comments

@aieri

aieri commented Jul 30, 2024

I have a server in which smartctl_exporter reports an incorrect number of devices:

root@server:~# curl -s localhost:10201/metrics | grep devices
# HELP smartctl_devices Number of devices configured or dynamically discovered
# TYPE smartctl_devices gauge
smartctl_devices 6

root@server:~# lsblk -o NAME,MODEL,SERIAL -d | grep -v loop
NAME    MODEL            SERIAL
sda     MTFDDAV480TDS-1A 324EC57C
sdb     SSDSC2KB240G8L   PHYF207300QZ240AGN
sdc     SSDSC2KB240G8L   PHYF20740177240AGN
nvme0n1 SSDPF2KX019T9L   PHAB123403KS1P9SGN

IIRC, the exporter uses smartctl --scan in the readSMARTctlDevices function to collect the list of devices. Indeed, smartctl returns some duplicates:

root@server:~# smartctl --scan
/dev/sda -d scsi # /dev/sda, SCSI device
/dev/sdb -d scsi # /dev/sdb, SCSI device
/dev/sdc -d scsi # /dev/sdc, SCSI device
/dev/bus/0 -d megaraid,0 # /dev/bus/0 [megaraid_disk_00], SCSI device
/dev/bus/0 -d megaraid,1 # /dev/bus/0 [megaraid_disk_01], SCSI device
/dev/nvme0 -d nvme # /dev/nvme0, NVMe device
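To illustrate why the count comes out as 6, here is a minimal Go sketch that parses the text form of smartctl --scan shown above ("<name> -d <type> # <comment>") into name/type pairs. The scanDevice type and parseScan function are hypothetical and are not the exporter's actual readSMARTctlDevices code, which may parse a different output format.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// scanDevice is a hypothetical representation of one "smartctl --scan" entry.
type scanDevice struct {
	Name string // device path, e.g. /dev/bus/0
	Type string // value passed to -d, e.g. megaraid,0
}

// parseScan parses lines of the form "<name> -d <type> # <comment>".
// Lines that do not have exactly that shape are skipped.
func parseScan(out string) []scanDevice {
	var devs []scanDevice
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		line := sc.Text()
		// Drop the trailing human-readable comment, if any.
		if i := strings.Index(line, "#"); i >= 0 {
			line = line[:i]
		}
		fields := strings.Fields(line)
		if len(fields) != 3 || fields[1] != "-d" {
			continue
		}
		devs = append(devs, scanDevice{Name: fields[0], Type: fields[2]})
	}
	return devs
}

func main() {
	out := `/dev/sda -d scsi # /dev/sda, SCSI device
/dev/sdb -d scsi # /dev/sdb, SCSI device
/dev/sdc -d scsi # /dev/sdc, SCSI device
/dev/bus/0 -d megaraid,0 # /dev/bus/0 [megaraid_disk_00], SCSI device
/dev/bus/0 -d megaraid,1 # /dev/bus/0 [megaraid_disk_01], SCSI device
/dev/nvme0 -d nvme # /dev/nvme0, NVMe device`
	devs := parseScan(out)
	fmt.Println(len(devs)) // 6, the same value as smartctl_devices above
	for _, d := range devs {
		fmt.Println(d.Name, d.Type)
	}
}
```

Each megaraid entry shares the path /dev/bus/0 and differs only in its -d argument, so the name alone is not enough to tell entries apart, let alone to detect that they alias /dev/sdb and /dev/sdc.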

root@server:~# smartctl -i /dev/sdb | grep -i serial
Serial Number:    PHYF207300QZ240AGN
root@server:~# smartctl -i /dev/bus/0 -d megaraid,0 | grep -i serial
Serial Number:    PHYF207300QZ240AGN
root@server:~# smartctl -i /dev/sdc | grep -i serial
Serial Number:    PHYF20740177240AGN
root@server:~# smartctl -i /dev/bus/0 -d megaraid,1 | grep -i serial
Serial Number:    PHYF20740177240AGN

The exporter should probably have some extra logic to deduplicate devices that can be accessed in multiple ways. This should be possible by using either the serial number or the WWN as a unique identifier, e.g.:

root@server:~# smartctl -i /dev/bus/0 -d megaraid,1 --json | jq -r '.serial_number, .wwn.id'
PHYF20740177240AGN
5724752636
root@server:~# smartctl -i /dev/sdc --json | jq -r '.serial_number, .wwn.id'
PHYF20740177240AGN
5724752636
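A minimal Go sketch of the proposed deduplication, assuming the exporter has already run smartctl -i --json for every scanned device and extracted serial_number and wwn.id. The device type, key and dedupe functions are hypothetical. The sketch keys on the serial number first, falls back to wwn.id, and keeps the first entry seen per drive; devices with neither identifier are always kept. Values not shown in the issue (e.g. the WWN of sda and sdb) are left empty.

```go
package main

import "fmt"

// device is a hypothetical record built from "smartctl -i --json" output.
type device struct {
	Name   string // path passed to smartctl, e.g. /dev/sdb
	Type   string // -d argument, e.g. scsi or megaraid,0
	Serial string // .serial_number, "" if absent
	WWNID  uint64 // .wwn.id, 0 if absent
}

// key returns an identifier for the physical drive, or "" if none is known.
func key(d device) string {
	switch {
	case d.Serial != "":
		return "serial:" + d.Serial
	case d.WWNID != 0:
		return fmt.Sprintf("wwn:%d", d.WWNID)
	default:
		return ""
	}
}

// dedupe keeps the first device seen for each physical drive.
func dedupe(devs []device) []device {
	seen := make(map[string]bool)
	var out []device
	for _, d := range devs {
		if k := key(d); k != "" {
			if seen[k] {
				continue // another path to a drive we already have
			}
			seen[k] = true
		}
		out = append(out, d)
	}
	return out
}

func main() {
	devs := []device{
		{"/dev/sda", "scsi", "324EC57C", 0},
		{"/dev/sdb", "scsi", "PHYF207300QZ240AGN", 0},
		{"/dev/sdc", "scsi", "PHYF20740177240AGN", 5724752636},
		{"/dev/bus/0", "megaraid,0", "PHYF207300QZ240AGN", 0},
		{"/dev/bus/0", "megaraid,1", "PHYF20740177240AGN", 5724752636},
		{"/dev/nvme0", "nvme", "PHAB123403KS1P9SGN", 0},
	}
	unique := dedupe(devs)
	fmt.Println(len(unique)) // 4, matching the lsblk listing
	for _, d := range unique {
		fmt.Println(d.Name, d.Type)
	}
}
```

Because smartctl --scan lists the /dev/sdX paths before the megaraid ones on this server, "first seen wins" keeps the direct paths here; the scan order is not guaranteed in general, so a real implementation would want an explicit preference.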
@k0ste
Contributor

k0ste commented Jul 30, 2024

> I have a server in which smartctl_exporter reports an incorrect number of devices:

Add the exclude regex --smartctl.device-exclude=^/dev/bus/[0-9]+$ to avoid scanning MegaRAID devices.

@aieri
Author

aieri commented Jul 30, 2024

Thanks, this would indeed work given the specifics of this one server. However, I am working on mass deployments of smartctl_exporter in which manual configuration is not feasible. While our automation could provide an autoconfiguration layer, it would effectively duplicate what the exporter already does. I think solving this at the lowest layer would be preferable.

@k0ste
Contributor

k0ste commented Jul 31, 2024

> Thanks, this would indeed work given the specifics of this one server. However, I am working on mass deployments of smartctl_exporter in which manual configuration is not feasible. While our automation could provide an autoconfiguration layer, it would effectively duplicate what the exporter already does. I think solving this at the lowest layer would be preferable.

Unfortunately, a magic solution at the lowest level is impossible. The administrator will still have to choose which polling protocol (SATA or MegaRAID) takes priority. The option I offered is something you can put in your IaC. It is not a fix for one specific server; it is a fix for the MegaRAID controller. I doubt you will find any other controllers in your fleet, but if you do, please share the regular expression.
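The choice of which protocol takes priority could be expressed as an ordered preference list rather than a regex. A minimal Go sketch, assuming each device's serial number is already known; the priority list, rank and pickPreferred are hypothetical and not an existing exporter option. The direct paths appear as scsi in the scan output above, so that is the type name used here. strings.Cut needs Go 1.18 or later.

```go
package main

import (
	"fmt"
	"strings"
)

type device struct {
	Name, Type, Serial string
}

// rank returns the position of the type's base name (the part before any
// comma, so "megaraid,1" -> "megaraid") in priority; unknown types rank last.
func rank(t string, priority []string) int {
	base, _, _ := strings.Cut(t, ",")
	for i, p := range priority {
		if p == base {
			return i
		}
	}
	return len(priority)
}

// pickPreferred keeps one device per serial: the one whose type ranks best.
// Ties keep the first seen; devices without a serial are always kept.
func pickPreferred(devs []device, priority []string) []device {
	idx := make(map[string]int) // serial -> position in out
	var out []device
	for _, d := range devs {
		i, ok := idx[d.Serial]
		if d.Serial == "" || !ok {
			if d.Serial != "" {
				idx[d.Serial] = len(out)
			}
			out = append(out, d)
			continue
		}
		if rank(d.Type, priority) < rank(out[i].Type, priority) {
			out[i] = d
		}
	}
	return out
}

func main() {
	devs := []device{
		{"/dev/sda", "scsi", "324EC57C"},
		{"/dev/sdb", "scsi", "PHYF207300QZ240AGN"},
		{"/dev/sdc", "scsi", "PHYF20740177240AGN"},
		{"/dev/bus/0", "megaraid,0", "PHYF207300QZ240AGN"},
		{"/dev/bus/0", "megaraid,1", "PHYF20740177240AGN"},
		{"/dev/nvme0", "nvme", "PHAB123403KS1P9SGN"},
	}
	for _, d := range pickPreferred(devs, []string{"megaraid", "scsi", "nvme"}) {
		fmt.Println(d.Name, d.Type)
	}
}
```

With megaraid first, the two aliased drives are polled through /dev/bus/0; putting scsi first keeps /dev/sdb and /dev/sdc instead. Either way the result has four devices.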

P.S.: see #205

@aieri
Author

aieri commented Aug 1, 2024

I strongly disagree that deduplicating via a unique key and applying a heuristic to choose the polling protocol is magic, and I also don't appreciate the bitterness. But sure, if this suggestion is unwelcome, I'll figure something out in a higher layer.
