Ignoring the FIFO for a moment, you want to read 128 bits of data in
16-bit chunks. That means you need to read the chunks serially. You can
convert the 128-bit data to serial 16-bit data either before the FIFO
(which makes it a 16-bit FIFO) or after the FIFO (which makes it a
128-bit FIFO). The choice depends on the requirements of your design.
If you need to stream 128-bit data into the FIFO on successive clock
cycles, you need a 128-bit FIFO. If the data comes in with sufficient
time between samples that you can serially write the eight 16-bit words
into a 16-bit FIFO, then that will potentially save bits in the FIFO.

Any of this make sense?
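
If it helps to see the conversion itself, here is a rough software sketch in Python (not HDL) of breaking one 128-bit word into eight 16-bit chunks. The function name and the choice of sending the least significant chunk first are my assumptions for illustration only; your design may need the opposite order.

```python
def split_128_to_16(word, lsw_first=True):
    """Split a 128-bit integer into eight 16-bit chunks.

    The chunk order is an assumption: least significant chunk first
    by default, most significant first if lsw_first is False.
    """
    if not 0 <= word < (1 << 128):
        raise ValueError("word must fit in 128 bits")
    # Shift each 16-bit slice down to bit 0 and mask off the rest.
    chunks = [(word >> (16 * i)) & 0xFFFF for i in range(8)]
    return chunks if lsw_first else chunks[::-1]
```

In hardware this is the same thing as an 8-to-1 mux on 16-bit slices driven by a 3-bit counter, placed either on the write side or the read side of the FIFO.
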
In many FPGAs the block RAMs (which are often used for FIFOs) can have
different data widths on the two data ports and automatically do the
muxing, so data can be read out at a different width than it was
written. That way the FIFO and the data mux can be combined in one
element.
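
As a behavioral illustration only, a FIFO with a 128-bit write port and a 16-bit read port can be modeled in Python like this. The class name, method names, depth parameter and the least-significant-chunk-first read order are all assumptions of mine, not the behavior of any particular vendor's block RAM or FIFO core, so check your device documentation for the real ordering.

```python
from collections import deque


class AsymmetricFifo:
    """Toy model: 128-bit writes in, 16-bit reads out (hypothetical)."""

    def __init__(self, depth_words=4):
        # Capacity is tracked in 16-bit chunks: 8 chunks per 128-bit word.
        self.capacity = depth_words * 8
        self.chunks = deque()

    def write128(self, word):
        """Store one 128-bit word; return False if there is no room."""
        if not 0 <= word < (1 << 128):
            raise ValueError("word must fit in 128 bits")
        if len(self.chunks) + 8 > self.capacity:
            return False  # full
        # Assumed order: least significant 16-bit chunk comes out first.
        for i in range(8):
            self.chunks.append((word >> (16 * i)) & 0xFFFF)
        return True

    def read16(self):
        """Return the next 16-bit chunk, or None if the FIFO is empty."""
        if not self.chunks:
            return None
        return self.chunks.popleft()
```

One write fills eight read slots, so the read side has to run eight reads per write on average to keep up.
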
--
Rick