I've pipelined the output stage, which avoids the delay of the
cascaded adders. It's a bit faster and fatter. The primary delay is
an xeq mux cascade, but that can wait. As long as it is less than
the multiply, cordics, etc., it should be fine.
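
A rough way to see the trade-off is a toy timing model. This is only a sketch: the function names and every delay value are made-up placeholders, not measurements from the actual design. It assumes the slowest stage sets the clock period, so the mux cascade only matters once it exceeds the multiply or cordic delay.

```python
# Toy timing model. All names and delay values are hypothetical placeholders.

def cascaded_delay(n_adders, t_add):
    """Combinational delay when n adders feed one another within one cycle."""
    return n_adders * t_add


def pipelined_delay(t_add, t_reg):
    """Per-stage delay when a register follows each adder (costs extra area)."""
    return t_add + t_reg


def critical_path(stage_delays):
    """Name of the slowest stage, which sets the minimum clock period."""
    return max(stage_delays, key=stage_delays.get)


# Assumed delays in arbitrary units, chosen only for illustration.
stages = {
    "output_adders": pipelined_delay(t_add=1.0, t_reg=0.2),
    "xeq_mux_cascade": 2.5,
    "multiply": 4.0,
    "cordic": 3.5,
}

print(critical_path(stages))
```

With these placeholder numbers the multiply remains the limiting stage, so the mux cascade would not need attention yet.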
brucee