This is a simple tool to extract zpool status and output Influx line protocol;
it is designed to be used with a telegraf exec plugin.
This parses the output of zpool list -H -p
and zpool status -s -p
and has
been developed against Ubuntu 20.04 with ZFS 0.8.3 and InfluxDB 1.x for
generating compatible line protocol.
This is sample zpool -H -p
output this tool expects to parse:
testpool111 20401094656 18467381760 1933712896 - - 72 90 1.00 SUSPENDED -
testpool222 20401094656 18467233280 1933861376 - - 68 90 1.00 DEGRADED -
testpool333 10200547328 139776 10200407552 - - 0 0 1.00 ONLINE -
and sample zpool status -s -p
:
pool: testpool111
state: SUSPENDED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
see: http://zfsonlinux.org/msg/ZFS-8000-HC
scan: scrub in progress since Fri Feb 5 20:00:59 2021
17.2G scanned at 297K/s, 0B issued at 0B/s, 17.2G total
0B repaired, 0.00% done, no estimated completion time
config:
NAME STATE READ WRITE CKSUM SLOW
testpool111 UNAVAIL 1 0 0 0 insufficient replicas
/home/isaac/disk3.img UNAVAIL 0 0 0 0 corrupted data
/home/isaac/disk4.img ONLINE 0 0 0 0
errors: No known data errors
pool: testpool222
state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-4J
scan: scrub repaired 0B in 0 days 00:00:27 with 0 errors on Fri Feb 5 19:54:00 2021
config:
NAME STATE READ WRITE CKSUM SLOW
testpool222 DEGRADED 0 0 0 0
mirror-0 ONLINE 0 0 0 0
/home/isaac/disk1.img ONLINE 0 0 0 0
/home/isaac/disk2.img ONLINE 0 0 0 0
mirror-1 DEGRADED 0 0 0 0
/home/isaac/disk5.img UNAVAIL 0 0 18 0 corrupted data
/home/isaac/disk6.img ONLINE 0 0 0 0
errors: No known data errors
pool: testpool333
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM SLOW
testpool333 ONLINE 0 0 0 0
mirror-0 ONLINE 0 0 0 0
/home/isaac/disk7.img ONLINE 0 0 0 0
/home/isaac/disk8.img ONLINE 0 0 0 0
errors: No known data errors
The compiled tool can be run interactively. It assumes by default that the
TextFSM template
is in your current directory, but that can be set with the --template
CLI
option.
./telegraf-exec-zpool-status
zpool,alternative_root=-,pool=testpool111 allocated=18467381760i,capacity=90i,checkpoint=0i,dedup=1,expand=0i,fragmentation=72i,free=1933712896i,health=6i,size=20401094656i 1612641748237439221
zpool,alternative_root=-,pool=testpool222 allocated=18467233280i,capacity=90i,checkpoint=0i,dedup=1,expand=0i,fragmentation=68i,free=1933861376i,health=2i,size=20401094656i 1612641748237439221
zpool,alternative_root=-,pool=testpool333 allocated=139776i,capacity=0i,checkpoint=0i,dedup=1,expand=0i,fragmentation=0i,free=10200407552i,health=0i,size=10200547328i 1612641748237439221
zpool_device,device=testpool111,pool=testpool111 checksum_errors=0i,health=5i,notes="insufficient replicas",read_errors=0i,slow_ios=0i,write_errors=0i 1612641748251082882
zpool_device,device=/home/isaac/disk3.img,pool=testpool111 checksum_errors=0i,health=5i,notes="corrupted data",read_errors=0i,slow_ios=0i,write_errors=0i 1612641748251082882
zpool_device,device=/home/isaac/disk4.img,pool=testpool111 checksum_errors=0i,health=0i,notes="",read_errors=0i,slow_ios=0i,write_errors=0i 1612641748251082882
zpool_errors,pool=testpool111 errors="No known data errors",errors_found=0i 1612641748251082882
zpool_scrub,pool=testpool222 bytes_repaired=0u,errors_found=0i 1612641748251082882
zpool_device,device=testpool222,pool=testpool222 checksum_errors=0i,health=2i,notes="",read_errors=0i,slow_ios=0i,write_errors=0i 1612641748251082882
zpool_device,device=mirror-0,pool=testpool222 checksum_errors=0i,health=0i,notes="",read_errors=0i,slow_ios=0i,write_errors=0i 1612641748251082882
zpool_device,device=/home/isaac/disk1.img,pool=testpool222 checksum_errors=0i,health=0i,notes="",read_errors=0i,slow_ios=0i,write_errors=0i 1612641748251082882
zpool_device,device=/home/isaac/disk2.img,pool=testpool222 checksum_errors=0i,health=0i,notes="",read_errors=0i,slow_ios=0i,write_errors=0i 1612641748251082882
zpool_device,device=mirror-1,pool=testpool222 checksum_errors=0i,health=2i,notes="",read_errors=0i,slow_ios=0i,write_errors=0i 1612641748251082882
zpool_device,device=/home/isaac/disk5.img,pool=testpool222 checksum_errors=18i,health=5i,notes="corrupted data",read_errors=0i,slow_ios=0i,write_errors=0i 1612641748251082882
zpool_device,device=/home/isaac/disk6.img,pool=testpool222 checksum_errors=0i,health=0i,notes="",read_errors=0i,slow_ios=0i,write_errors=0i 1612641748251082882
zpool_errors,pool=testpool222 errors="No known data errors",errors_found=0i 1612641748251082882
zpool_device,device=testpool333,pool=testpool333 checksum_errors=0i,health=0i,notes="",read_errors=0i,slow_ios=0i,write_errors=0i 1612641748251082882
zpool_device,device=mirror-0,pool=testpool333 checksum_errors=0i,health=0i,notes="",read_errors=0i,slow_ios=0i,write_errors=0i 1612641748251082882
zpool_device,device=/home/isaac/disk7.img,pool=testpool333 checksum_errors=0i,health=0i,notes="",read_errors=0i,slow_ios=0i,write_errors=0i 1612641748251082882
zpool_device,device=/home/isaac/disk8.img,pool=testpool333 checksum_errors=0i,health=0i,notes="",read_errors=0i,slow_ios=0i,write_errors=0i 1612641748251082882
zpool_errors,pool=testpool333 errors="No known data errors",errors_found=0i 1612641748251082882
This is a sample telegraf exec input that assumes the binary has been installed
to /usr/local/bin/telegraf-exec-zpool-status
and the TextFSM template to
/etc/telegraf/zpool_status_template.txt
:
[[inputs.exec]]
commands = ["/usr/local/bin/telegraf-exec-zpool-status --template=/etc/telegraf/zpool_status_template.txt"]
timeout = "5s"
data_format = "influx"
Then in InfluxDB:
> show field keys from zpool
name: zpool
fieldKey fieldType
-------- ---------
allocated integer
capacity integer
checkpoint integer
dedup float
expand integer
fragmentation integer
free integer
health integer
size integer
> show tag keys from zpool
name: zpool
tagKey
------
alternative_root
host
pool
> show field keys from zpool_device
name: zpool_device
fieldKey fieldType
-------- ---------
checksum_errors integer
health integer
notes string
read_errors integer
slow_ios integer
write_errors integer
> show tag keys from zpool_device
name: zpool_device
tagKey
------
device
host
pool
> show field keys from zpool_scrub
name: zpool_scrub
fieldKey fieldType
-------- ---------
bytes_repaired integer
errors_found integer
> show tag keys from zpool_scrub
name: zpool_scrub
tagKey
------
host
pool
> show field keys from zpool_errors
name: zpool_errors
fieldKey fieldType
-------- ---------
errors string
errors_found integer
> show tag keys from zpool_errors
name: zpool_errors
tagKey
------
host
pool
In order to facilitate graphing I express the health as an integer. Based on the man page I identified the following states to map:
State | Integer |
---|---|
ONLINE | 0 |
OFFLINE | 1 |
DEGRADED | 2 |
FAULTED | 3 |
REMOVED | 4 |
UNAVAIL | 5 |
SUSPENDED | 6 |
The default value if a match isn't found is 99.
Once influxdata/telegraf#6724 is merged the
zpool -H -p
functionality would be redundant and a native telegraf plugin
could be used. However the zpool status -s -p
functionality is outstanding and
would be a useful addition.
Tests should be added especially considering the sensitivity of parsing.