Hi All,
I am currently working on a couple of Ironic patches that, once complete, will enable basic hardware health monitoring through the Redfish driver. My main goal is to provide a simple, lightweight mechanism for ensuring that the hardware managed by Ironic is in good operational condition, and to help operators quickly identify any hardware that may need attention.
In the initial implementation, I aim to collect information from the overall server health field and include it in the node information available to operators. Changes to this field will also be logged in node history. Once these patches are complete, they will create the opportunity to incorporate similar functionality in metal3.
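To make the idea concrete, here is a rough sketch of the flow described above. It is not the PoC code: it assumes the overall health is read from the `Status.Health` property of the Redfish system resource, and the `node` dict, the `history` list and both function names are hypothetical stand-ins for the real Ironic objects.

```python
from typing import Optional

# Health values defined for the Redfish Status object.
REDFISH_HEALTH_VALUES = {"OK", "Warning", "Critical"}


def extract_health(system: dict) -> Optional[str]:
    """Return Status.Health from a Redfish system payload, or None if absent/invalid."""
    status = system.get("Status") or {}
    health = status.get("Health")
    return health if health in REDFISH_HEALTH_VALUES else None


def update_health(node: dict, system: dict, history: list) -> bool:
    """Store the health on a (hypothetical) node dict and log any change.

    Returns True when the stored value changed.
    """
    new = extract_health(system)
    old = node.get("health")
    if new == old:
        return False  # nothing changed, so nothing is logged
    node["health"] = new
    history.append(f"health changed from {old} to {new}")
    return True
```

Only transitions are written to the history, so a node that stays healthy does not generate entries on every poll.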
If time allows, I would appreciate a brief discussion about the feature to gauge interest in this functionality among metal3 community members and to gather any early feedback they may have.
The Google Doc below has more details, including CLI outputs from the PoC code (Ironic) as well as synthetic outputs that metal3 components could provide if we were to go ahead with this work:
https://docs.google.com/document/d/1nbtFJHh6n6WaccpCAuBNiA-_redsdiGcip3N8P9aoSE/edit?usp=sharing
Please let me know your thoughts.
Best Regards,