Hi Martin,
Great to hear your fixed-point CNN is working on the FPGA. As for the integer-power-of-two weights: Ristretto doesn't quantize the bias in this mode; see base_ristretto_layer.cu, QuantizeWeights_gpu() for details. However, on the FPGA (or DSP) you need to handle the bias differently: use dynamic fixed point for it. You can quantize the bias similarly to how Ristretto quantizes activations to dynamic fixed point: use enough integer bits that there is no saturation during quantization. 8 bits should be enough for the bias in most CNNs. If you want to know how Ristretto computes the number of integer bits required to avoid saturation, take a look at quantization.cpp, line 164.
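In case it helps, here's a rough C++ sketch of that idea (not Ristretto's actual code, just the logic behind it): pick the integer length so the largest bias magnitude fits, then spend the remaining bits on the fraction.

  #include <algorithm>
  #include <cmath>
  #include <vector>

  // Sketch: quantize a bias vector to dynamic fixed point. The integer
  // length is chosen from max|bias| to avoid saturation (the same idea
  // Ristretto uses for activations); the remaining bits are fractional.
  std::vector<float> QuantizeBias(const std::vector<float>& bias,
                                  int bit_width) {
    float max_abs = 0.0f;
    for (float b : bias) max_abs = std::max(max_abs, std::fabs(b));
    if (max_abs == 0.0f) return bias;  // all-zero bias, nothing to do
    int il = static_cast<int>(std::ceil(std::log2(max_abs) + 1));  // incl. sign
    int fl = bit_width - il;                // fractional length
    float scale = std::pow(2.0f, fl);
    float max_q = std::pow(2.0f, bit_width - 1) - 1.0f;  // e.g. 127 for 8 bit
    std::vector<float> out(bias.size());
    for (size_t i = 0; i < bias.size(); ++i) {
      float q = std::round(bias[i] * scale);
      q = std::max(-max_q - 1.0f, std::min(max_q, q));  // saturate
      out[i] = q / scale;  // back to float, for simulating the error
    }
    return out;
  }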
So anyway, you observe nearly no error for the bias in Q16.16. Your challenge now is to keep the error small at smaller bit widths, and for that it is important to choose good fixed-point formats.
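A quick way to explore that is to sweep bit widths offline and look at the worst-case error, reusing the QuantizeBias sketch above (LoadBias is just a placeholder for however you read your trained biases):

  #include <cstdio>

  int main() {
    std::vector<float> bias = LoadBias();  // placeholder, your own loader
    for (int bw : {16, 8, 6, 4}) {
      std::vector<float> q = QuantizeBias(bias, bw);
      float err = 0.0f;
      for (size_t i = 0; i < bias.size(); ++i)
        err = std::max(err, std::fabs(bias[i] - q[i]));
      std::printf("bit width %2d: max abs error %g\n", bw, err);
    }
    return 0;
  }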
You also asked about the average pooling layer. That layer is a bit trickier than max pooling; here you should also use dynamic fixed point. The math part of the layer is pretty straightforward, and you should be able to implement it on a DSP. And yes, the average pooling layer computes the average over a kernel window, so you'll need division operations (see the sketch below).
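For one window that could look something like this (again just a sketch, assuming 16-bit fixed point inputs and a 32-bit accumulator):

  #include <cstdint>

  // Sketch: average one k x k window of Qm.n fixed point values. The sum
  // can grow by log2(k*k) bits, hence the wider accumulator. The divisor
  // is a per-layer constant, so on a DSP you can replace the division
  // with a multiply by a precomputed reciprocal.
  int16_t AvgPoolWindow(const int16_t* window, int k) {
    int32_t area = k * k;
    int32_t sum = 0;  // wide accumulator so the summation cannot overflow
    for (int32_t i = 0; i < area; ++i) sum += window[i];
    // Round to nearest (exact for non-negative sums, e.g. after ReLU).
    return static_cast<int16_t>((sum + area / 2) / area);
  }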
Best,
Philipp