Adding metrics for PCIe devices, including their link information


Naoki MATSUMOTO

May 29, 2025, 4:10:12 AM
to prometheus...@googlegroups.com, Yuichiro Ueno, ks...@preferred.jp
Dear Prometheus Developers,

I'm working on a feature to collect PCIe devices' link status.

# Goal
The link status of PCIe devices sometimes changes: links can downgrade in speed or width, and devices can disappear entirely.
Such failures often happen on servers with many PCIe devices (a bunch
of NVMe drives or GPUs). I'd like to detect these failures with a PCIe
device collector.

# Proposal
## Approach: detect the current pcie device status from sysfs
Each device has a directory like
`/sys/devices/pci0000:00/0000:00:01.3/0000:09:00.0`. It contains
files with useful information, such as:

- max_link_speed: `8.0 GT/s PCIe`
- max_link_width: `4`
- current_link_speed: `8.0 GT/s PCIe`
- current_link_width: `4`
- class: `0x010802`
- vendor: `0x144d`
- subsystem_vendor: `0x144d`
- subsystem_device: `0xa801`
- device: `0xa809`

Also, the path to the directory indicates:
- segment (0000)
- parent bus (00:01.3)
- device bus (09:00.0)

These should be included in the metrics so that PCI bus speed degradation
can be checked hierarchically (i.e. check the device's link speed, then
the PCIe switch's link speed).
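
For illustration, here is a minimal sketch of reading these attributes and decoding the path, assuming only the sysfs layout described above; the helper names are mine, not the procfs PR's actual API.

```
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// readAttr reads one sysfs attribute file and trims the trailing newline.
func readAttr(devPath, name string) (string, error) {
	b, err := os.ReadFile(filepath.Join(devPath, name))
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(b)), nil
}

func main() {
	devPath := "/sys/devices/pci0000:00/0000:00:01.3/0000:09:00.0"

	// The trailing path elements encode the hierarchy: the last one is
	// the device's own address, the one before it its parent bridge.
	parts := strings.Split(devPath, "/")
	fmt.Println("device:", parts[len(parts)-1]) // 0000:09:00.0 (segment:bus:device.function)
	fmt.Println("parent:", parts[len(parts)-2]) // 0000:00:01.3

	for _, attr := range []string{
		"max_link_speed", "current_link_speed",
		"max_link_width", "current_link_width",
	} {
		v, err := readAttr(devPath, attr)
		if err != nil {
			continue // not every device exposes every attribute
		}
		fmt.Printf("%s = %q\n", attr, v)
	}
}
```

Note that the speed attributes are strings like `8.0 GT/s PCIe`, so the numeric prefix has to be parsed out before it can become a gauge value.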

## Current status:
I've implemented a PoC collector and exporter for procfs and node_exporter.
PR(procfs): https://github.com/prometheus/procfs/pull/728
PR(node_exporter): https://github.com/prometheus/node_exporter/pull/3339

This is an example of exported metrics.
```
# HELP node_pcidevice_info Non-numeric data from /sys/bus/pci/devices/<location>, value is always 1.
# TYPE node_pcidevice_info gauge
node_pcidevice_info{bus="00",class_id="0x60000",device="00",device_id="0x1630",function="0",parent_bus="*",parent_device="*",parent_function="*",parent_segment="*",segment="0000",subsystem_device_id="0x5095",subsystem_vendor_id="0x17aa",vendor_id="0x1022"} 1
node_pcidevice_info{bus="01",class_id="0x10802",device="00",device_id="0x540a",function="0",parent_bus="00",parent_device="02",parent_function="1",parent_segment="0000",segment="0000",subsystem_device_id="0x5021",subsystem_vendor_id="0xc0a9",vendor_id="0xc0a9"} 1

# HELP node_pcidevice_max_link_speed Value of maximum link speed (GT/s)
# TYPE node_pcidevice_max_link_speed gauge
node_pcidevice_max_link_speed{bus="00",device="02",function="1",segment="0000"} 8
node_pcidevice_max_link_speed{bus="00",device="02",function="2",segment="0000"} 8

# HELP node_pcidevice_current_link_speed Value of current link speed (GT/s)
# TYPE node_pcidevice_current_link_speed gauge
node_pcidevice_current_link_speed{bus="00",device="02",function="1",segment="0000"} 8
node_pcidevice_current_link_speed{bus="00",device="02",function="2",segment="0000"} 2.5

# HELP node_pcidevice_max_link_width Value of maximum link width (number of lanes)
# TYPE node_pcidevice_max_link_width gauge
node_pcidevice_max_link_width{bus="00",device="02",function="1",segment="0000"} 8
node_pcidevice_max_link_width{bus="00",device="02",function="2",segment="0000"} 1

# HELP node_pcidevice_current_link_width Value of current link width (number of lanes)
# TYPE node_pcidevice_current_link_width gauge
node_pcidevice_current_link_width{bus="00",device="02",function="1",segment="0000"} 4
node_pcidevice_current_link_width{bus="00",device="02",function="2",segment="0000"} 1
```
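
For context, here is a hedged sketch of how one of these gauges could be emitted with client_golang; the descriptor and function names are illustrative, not the PR's actual code.

```
package collector

import "github.com/prometheus/client_golang/prometheus"

// Label dimensions identifying a PCI device by its address.
var pciLabels = []string{"segment", "bus", "device", "function"}

// currentLinkSpeedDesc describes the gauge shown in the example above.
var currentLinkSpeedDesc = prometheus.NewDesc(
	"node_pcidevice_current_link_speed",
	"Value of current link speed (GT/s)",
	pciLabels, nil,
)

// emitCurrentLinkSpeed sends one sample per device; speed is the numeric
// GT/s value parsed from sysfs (e.g. 8 from "8.0 GT/s PCIe").
func emitCurrentLinkSpeed(ch chan<- prometheus.Metric, segment, bus, device, function string, speed float64) {
	ch <- prometheus.MustNewConstMetric(
		currentLinkSpeedDesc, prometheus.GaugeValue, speed,
		segment, bus, device, function,
	)
}
```

With matching label sets on the max/current pairs, a downgrade then shows up directly in PromQL as `node_pcidevice_current_link_speed < node_pcidevice_max_link_speed`.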

I'm looking forward to any feedback or suggestions to make this better!

Thanks,
Naoki MATSUMOTO

Jain Johny

Sep 20, 2025, 1:57:55 AM
to prometheus...@googlegroups.com, Naoki MATSUMOTO
Hi all,
I have added some more metrics to this collector (I had maintained a similar collector in my private node_exporter repo for some time and have deprecated it in favour of pcidevice_linux.go).

```
NumaNode              *int32         // /sys/bus/pci/devices/<Location>/numa_node
SriovDriversAutoprobe *bool          // /sys/bus/pci/devices/<Location>/sriov_drivers_autoprobe
SriovNumvfs           *uint32        // /sys/bus/pci/devices/<Location>/sriov_numvfs
SriovTotalvfs         *uint32        // /sys/bus/pci/devices/<Location>/sriov_totalvfs
SriovVfTotalMsix      *uint64        // /sys/bus/pci/devices/<Location>/sriov_vf_total_msix
D3coldAllowed         *bool          // /sys/bus/pci/devices/<Location>/d3cold_allowed
PowerState            *PciPowerState // /sys/bus/pci/devices/<Location>/power_state
```
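
As a hypothetical example of how one of these optional attributes could be parsed (the function name is mine, not the PR's), a pointer lets a missing file yield nil rather than a fake zero:

```
package collector

import (
	"errors"
	"io/fs"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// readNumaNode returns nil when the attribute file does not exist, so
// "no NUMA information" stays distinguishable from "NUMA node 0".
// The kernel reports -1 when the device has no NUMA affinity.
func readNumaNode(devPath string) (*int32, error) {
	b, err := os.ReadFile(filepath.Join(devPath, "numa_node"))
	if errors.Is(err, fs.ErrNotExist) {
		return nil, nil
	}
	if err != nil {
		return nil, err
	}
	n, err := strconv.ParseInt(strings.TrimSpace(string(b)), 10, 32)
	if err != nil {
		return nil, err
	}
	v := int32(n)
	return &v, nil
}
```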

node_exporter PR: https://github.com/prometheus/node_exporter/pull/3425 (build failing because of the procfs dependency)

I have also added an option in the collector to resolve PCI vendor/device/class IDs to strings. This has been very useful for me (for both human and LLM use), but it increases the size of the metrics. That is fine for me, as my polling interval for these types of metrics is very long.

In my private repo, I also have an option to filter out PCI devices which are not important for monitoring (like PCH features, bridges/switches, on-board graphics, etc.). It is substring-match-based blacklisting. I am thinking of a better approach (maybe a class-based blacklist) and I can add that option to this PR as well. This would significantly reduce the overall metric volume from this collector.
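
To make the class-based idea concrete, here is a rough sketch (assumed names, not part of the PR): the top byte of the 24-bit PCI class code is the base class, so whole categories such as bridges can be skipped without string matching.

```
package main

import "fmt"

// baseClassDenylist lists PCI base classes to skip entirely.
var baseClassDenylist = map[uint8]bool{
	0x06: true, // bridge devices (host bridges, PCI-to-PCI bridges, ...)
	0x08: true, // base system peripherals
}

// skip reports whether a device with the given class code (e.g.
// 0x010802 for an NVMe controller) should be excluded from the metrics.
func skip(classCode uint32) bool {
	return baseClassDenylist[uint8(classCode>>16)]
}

func main() {
	fmt.Println(skip(0x010802)) // NVMe controller: false, kept
	fmt.Println(skip(0x060400)) // PCI-to-PCI bridge: true, filtered
}
```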

Looking forward to feedback and suggestions.

Thanks
Jain Johny
