Hi,
We wrote a collector for Amazon EFA, a high-speed network interface similar to InfiniBand.
This interface is used for tightly coupled applications in HPC (WRF, Ansys Fluent, GROMACS...) and distributed ML (think LLMs like BLOOM and OPT, or diffusion-based models like Stable Diffusion). The metrics are used for optimizing and troubleshooting these computational workloads. The collector we wrote is based on the one used for InfiniBand, since EFA metrics are exposed in a similar way, and it also involved changes to procfs.
Would the team be open to us creating a PR that adds a new collector for this network interface?
Thanks,
Perif