Assistance Required: High IO Utilization in HLS4ML Autoencoder Mapping


Sayanti Pal

Mar 25, 2025, 10:38:49 AM
to Vladimi...@cern.ch, hls4m...@gmail.com, maurizio...@cern.ch, adriana...@gmail.com

Dear HLS4ML Team,

I am Sayanti, a researcher at the University of Rostock. I am currently mapping an autoencoder model (~33,000 parameters) with HLS4ML onto the xcu250-figd2104-2L-e FPGA and am facing high IO utilization. With 3-bit quantization, IO utilization reaches 122%, and with 4-bit quantization it rises to 200%. Even with a larger FPGA, IO stays above 100%.
I have configured the precision (ap_fixed<3,1> for 3-bit quantization), a ReuseFactor of 32, the "Resource" strategy, io_type set to io_stream, layer-specific reuse factors, and an unroll factor of 1. However, HLS4ML appears to ignore some of these settings, leading to full loop unrolling. I tried manually modifying the generated firmware (myproject.cpp) and the VHDL files to reduce IO usage, but this is tedious and error-prone.
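For reference, the setup described above looks roughly like the following Python sketch (the trained Keras model, the layer name 'dense_1', and the output directory are placeholders rather than my actual code):

    import hls4ml

    # Per-layer configuration generated from the trained Keras autoencoder
    config = hls4ml.utils.config_from_keras_model(model, granularity='name')

    # Global settings described above
    config['Model']['Strategy'] = 'Resource'
    config['Model']['ReuseFactor'] = 32
    config['Model']['Precision'] = 'ap_fixed<3,1>'

    # Example of a layer-specific reuse factor ('dense_1' is a placeholder name)
    config['LayerName']['dense_1']['ReuseFactor'] = 32

    hls_model = hls4ml.converters.convert_from_keras_model(
        model,
        hls_config=config,
        io_type='io_stream',
        part='xcu250-figd2104-2L-e',
        output_dir='hls4ml_autoencoder_prj',
    )
    hls_model.compile()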

Could you please advise on:
  1. Are there any configurations I am missing that would lower IO utilization, or is this the expected behaviour?
  2. Are there alternative ways to enforce these settings without manual edits?
  3. Are there additional optimizations on HLS4ML's side that could help limit IO usage?
I appreciate your time and support. Looking forward to your guidance.

Best regards,
Sayanti Pal
University of Rostock
sayan...@uni-rostock.de

Sayanti Pal

Mar 25, 2025, 12:02:04 PM
to Vladimir Loncar, hls4m...@gmail.com, Maurizio Pierini, Adrian Alan Pol


Hi Vladimir,


Thank you for your suggestion regarding model pruning. We did try structured pruning, but unfortunately, it didn’t have a significant impact on our model, and the IO utilization remains high.


I am still trying to understand whether this high IO utilization is expected behavior when using the HLS4ML tool, or whether I am missing a specific setting. I have set io_type to 'io_stream' for the layers, but from the logs the behavior still looks closer to fully parallel operation than to streaming/pipelined communication.

Would you be able to confirm if this is the expected behavior or if there’s something I can adjust? I will also open an issue on the HLS4ML GitHub page as you suggested, providing the relevant code and configuration details.


Thank you for your continued support, and I look forward to your feedback.


Best regards,
Sayanti



From: Vladimir Loncar <vladimi...@cern.ch>
Sent: 25 March 2025 16:30:20
To: Sayanti Pal; hls4m...@gmail.com
Cc: Maurizio Pierini; Adrian Alan Pol
Subject: Re: Assistance Required: High IO Utilization in HLS4ML Autoencoder Mapping
 

Hi Sayanti,

General advice is to also try pruning your model, if possible, in addition to quantization. This will save some IO, but it won't be the only measure you need to take.
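For concreteness, a rough sketch of what magnitude pruning with the TensorFlow Model Optimization toolkit could look like before converting with hls4ml (your model, training data, sparsity target, and training settings would replace the placeholders here; other pruning approaches work just as well):

    import tensorflow_model_optimization as tfmot

    # Wrap the model so a fraction of the weights are pruned to zero (target sparsity is a placeholder)
    pruning_params = {
        'pruning_schedule': tfmot.sparsity.keras.ConstantSparsity(0.75, begin_step=0)
    }
    pruned_model = tfmot.sparsity.keras.prune_low_magnitude(model, **pruning_params)
    pruned_model.compile(optimizer='adam', loss='mse')

    # Fine-tune; the pruning callback updates the sparsity masks during training
    pruned_model.fit(
        x_train, x_train,
        epochs=10,
        callbacks=[tfmot.sparsity.keras.UpdatePruningStep()],
    )

    # Strip the pruning wrappers before handing the model to hls4ml
    final_model = tfmot.sparsity.keras.strip_pruning(pruned_model)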

I suggest you open an issue on the hls4ml GitHub page, following the template, which will ask you to provide the code/config that you use; this will help us figure out how best to assist you.

Regards,
Vladimir

________________________________________
From: Sayanti Pal <sayan...@uni-rostock.de>
Sent: Tuesday, March 25, 2025 3:38 PM
To: Vladimir Loncar; hls4m...@gmail.com
Cc: Maurizio Pierini; Adrian Alan Pol
Subject: Assistance Required: High IO Utilization in HLS4ML Autoencoder Mapping

