Hi Vladimir,
Thank you for your suggestion regarding model pruning. We did try structured pruning, but unfortunately it did not have a significant effect on the model's resource footprint, and the IO utilization remains high.
I am still trying to understand whether this high IO utilization is expected behavior when using the HLS4ML tool, or whether I am missing a specific setting. I have applied the 'io_stream' setting for the layers, but the logs still suggest parallel operation rather than streaming/pipelined communication between layers.
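For reference, my conversion setup looks roughly like the sketch below. The model path, output directory, and variable names are illustrative; the actual model and configuration will be in the GitHub issue.

```python
import hls4ml
from tensorflow.keras.models import load_model

# Load the trained Keras model (path is illustrative).
model = load_model('model.h5')

# Generate a per-layer configuration for the model.
config = hls4ml.utils.config_from_keras_model(model, granularity='name')

# Convert with io_type='io_stream' to request streaming (FIFO-based)
# communication between layers instead of fully parallel IO.
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    io_type='io_stream',
    output_dir='hls4ml_prj',
)
hls_model.compile()
```

This is essentially how I am invoking the converter; if the streaming interface requires additional settings beyond `io_type`, that may be what I am missing.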
Would you be able to confirm whether this is the expected behavior, or whether there is something I can adjust? I will also open an issue on the HLS4ML GitHub page as you suggested, providing the relevant code and configuration details.
Thank you for your continued support, and I look forward to your feedback.
Best regards,
Sayanti