The machine I had Visual Studio 2005 on seems to be nice and screwed
and needs a reinstall, so I haven't tested these with MSVC yet.
Could someone please tell me whether they compile there, and whether
they're any faster?
openjpeg-svn528-t1-optimize.patch
The main blob, nearly ready for merging. It gives a 25% speedup on
Athlon 64. It doesn't change the T1 algorithm at all; it's purely code
optimization. It only really touches the decoder. The encoder is still
slow, and applying the same optimizations to it should give a similar
speedup. It does:
* Unrolling of the 4-step inner loops (by far the most common case)
* Transposition of the data/flag arrays, optimizing cache usage in the
inner loop
* Loop unswitching
* Branch optimization hints for gcc
* Misc cleanup
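For anyone unfamiliar with these tricks, here's a rough sketch of the
first four items on a toy code-block walk. It is NOT code from the
patch: `visit()` and `process_block()` are hypothetical stand-ins, and
the fully transposed (column-major) layout is just one simple way to
make the 4 samples of a stripe column contiguous. The `LIKELY()` macro
wraps gcc's `__builtin_expect()` and falls back to a plain condition
elsewhere.

```c
#include <assert.h>

/* Branch hints: __builtin_expect is a GCC (and Clang) builtin; on other
   compilers, e.g. MSVC, the macros expand to the bare condition. */
#if defined(__GNUC__)
#  define LIKELY(x)   __builtin_expect(!!(x), 1)
#  define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#  define LIKELY(x)   (x)
#  define UNLIKELY(x) (x)
#endif

/* Hypothetical per-sample work standing in for one T1 decoding step. */
static void visit(int *sample)
{
    *sample += 1;
}

/* Hypothetical walk over a w x h code-block in stripes of 4 rows.
   The array is stored transposed (column-major): sample (x, y) lives at
   data[x * h + y], so the 4 samples of one column within a stripe are
   adjacent in memory instead of w elements apart. */
static void process_block(int *data, int w, int h)
{
    int k, i, y;
    assert(w >= 0 && h >= 0);
    for (k = 0; k < h; k += 4) {
        if (LIKELY(k + 4 <= h)) {
            /* Full stripe (the common case): the stripe-height test is
               hoisted out of the column loop (loop unswitching) and the
               4-step inner loop is fully unrolled. */
            for (i = 0; i < w; ++i) {
                int *p = &data[i * h + k];
                visit(&p[0]);
                visit(&p[1]);
                visit(&p[2]);
                visit(&p[3]);
            }
        } else {
            /* Partial last stripe: generic, per-row bounded loop. */
            for (i = 0; i < w; ++i)
                for (y = k; y < h; ++y)
                    visit(&data[i * h + y]);
        }
    }
}
```

With a 5x6 block, the first stripe takes the unrolled path and the
remaining 2 rows take the generic one; every sample is visited exactly
once either way.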
openjpeg-1.3-t1-mqc-inline.patch
A trick borrowed from Jasper: inlining the MQC gains another 3% on
Athlon 64. Proof of concept, not ready for merging.
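The general idea, sketched on a hypothetical toy bit reader (this is
NOT the MQ-coder algorithm, and `bitreader_t`, `br_getbit()` and
`br_getbits()` are made-up names): a small, hot function is defined
`static inline`, as it would be in a shared header, so the compiler can
fold it into the caller's loop instead of making a call per bit. Since
MSVC's C compiler only knows `__inline`, a small macro covers both
compilers.

```c
#include <assert.h>
#include <stddef.h>

/* Portable inline keyword: MSVC's C compiler spells it __inline,
   and GCC accepts __inline__ in every language mode. */
#if defined(_MSC_VER)
#  define INLINE __inline
#elif defined(__GNUC__)
#  define INLINE __inline__
#else
#  define INLINE inline
#endif

/* Hypothetical decoder state standing in for the MQC state. */
typedef struct {
    const unsigned char *buf;
    size_t len;
    size_t pos; /* current bit position, MSB first */
} bitreader_t;

/* Defined static INLINE (as in a header) so the call below can be
   inlined into the hot loop rather than compiled as a real call. */
static INLINE int br_getbit(bitreader_t *br)
{
    size_t byte = br->pos >> 3;
    int bit;
    assert(byte < br->len);
    bit = (br->buf[byte] >> (7 - (br->pos & 7))) & 1;
    br->pos++;
    return bit;
}

/* Hot loop: reads n bits MSB-first into an unsigned value. */
static unsigned br_getbits(bitreader_t *br, int n)
{
    unsigned v = 0;
    while (n-- > 0)
        v = (v << 1) | (unsigned)br_getbit(br);
    return v;
}
```

Reading 0xA5 (binary 1010 0101) four bits at a time yields 10 and then
5. Whether the compiler actually inlines is up to it; `static inline`
only makes it possible across source files.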
openjpeg-1.3-t1-opt5.patch
Really experimental: adding branch hints to the MQC gains 1.6% on
Athlon 64, but is slower on PIII. Don't merge this one, although the
two DWT memcpy() patches in it really should be merged.
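I haven't described what those DWT patches copy, so here's only the
general pattern, with hypothetical function names: an element-by-element
copy of a contiguous row replaced by memcpy(), which C libraries
typically implement with wide loads and stores.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical element-by-element copy of one row into a scratch buffer. */
static void copy_row_loop(int *dst, const int *src, int n)
{
    int i;
    for (i = 0; i < n; ++i)
        dst[i] = src[i];
}

/* Same copy via memcpy(). Only valid because the row is contiguous and
   dst and src do not overlap; overlapping ranges would need memmove(). */
static void copy_row_memcpy(int *dst, const int *src, int n)
{
    assert(n >= 0);
    memcpy(dst, src, (size_t)n * sizeof *dst);
}
```

This only applies to contiguous (row) copies; strided column copies
still need a loop.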
In total that's a 30% speedup on Athlon 64, and IIRC even more on
i386. I'm a little concerned that all this unrolling and inlining might
hurt some embedded platforms, in which case it could be made optional
with some #ifdefs. If anyone sees a slowdown on MIPS/ARM/PPC etc.,
please tell me.
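Making the unrolling optional could look something like this. It's
only a sketch: `T1_NO_UNROLL` is a hypothetical build switch (e.g.
`-DT1_NO_UNROLL`) and `process_column()` a made-up example function.

```c
#include <assert.h>

/* Hypothetical switch: define T1_NO_UNROLL on targets where code size
   or I-cache pressure matters more than raw speed. */
static void process_column(int *col, int n)
{
    assert(n >= 0);
#ifdef T1_NO_UNROLL
    {
        /* Compact version: one generic loop for every column height. */
        int j;
        for (j = 0; j < n; ++j)
            col[j] *= 2;
    }
#else
    if (n == 4) {
        /* Unrolled fast path for the common 4-sample column. */
        col[0] *= 2;
        col[1] *= 2;
        col[2] *= 2;
        col[3] *= 2;
    } else {
        int j;
        for (j = 0; j < n; ++j)
            col[j] *= 2;
    }
#endif
}
```

Both builds compute the same result; only code size and speed differ.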