OK, that issue is solved by reducing the parallel access to SRAM.
Now I'm hitting a different issue. I poked around a bit in the IR info, and x490 and x964 are related to some loop-unrolling subroutine.
But our code is very simple and the loop is very standard, so I'm not sure what went wrong or why it complains about a type in the unrolled-loop IR:
/gen/lenet/target/scala-2.12/classes ...
[error] /home/pb/cs217/***/gen/lenet/scala/x507_inr_Foreach_kernel.scala:15:56: type mismatch;
[error] found : emul.FixedPoint
[error] required: Double
[error] val x490 = x964_sub / scala.math.pow(2,FixedPoint(BigDecimal("4"),FixFormat(true,16,0)))
[error] ^
[error] one error found
[error] (Compile / compileIncremental) Compilation failed
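
As far as I can tell, the error itself is just about the signature of `scala.math.pow`, which is declared as `pow(x: Double, y: Double)`, while the generated line passes a `FixedPoint` as the exponent. Here is a minimal sketch that reproduces the same kind of mismatch outside of the generated code. Note that `Fix` is a hypothetical stand-in I wrote for `emul.FixedPoint` (not the real class), and the values are made up for illustration:

```scala
// Hypothetical stand-in for emul.FixedPoint; NOT the real emul class.
final case class Fix(value: BigDecimal) {
  def toDouble: Double = value.toDouble
  // Division by a Double, just enough to mirror the shape of the generated line.
  def /(that: Double): Double = value.toDouble / that
}

object PowMismatchSketch {
  val shift   = Fix(BigDecimal("4"))   // plays the role of the FixedPoint exponent
  val x964Sub = Fix(BigDecimal("64"))  // made-up value standing in for x964_sub

  // Does not compile: pow expects (Double, Double) and there is no
  // implicit conversion from Fix to Double.
  //   found   : Fix
  //   required: Double
  // val bad = x964Sub / scala.math.pow(2, shift)

  // Compiles once the exponent is converted to Double explicitly.
  val ok: Double = x964Sub / scala.math.pow(2, shift.toDouble)

  def main(args: Array[String]): Unit = println(ok)
}
```

So the generated `x490` line looks like a divide by 2^4 where the exponent was never converted to `Double`. Since that file is generated, I assume the real fix has to come from how our loop is written or from the code generator, not from hand-editing the kernel.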