It's a great project, Shawn!
In terms of SIMD, only a few attempts have been made in the past, generally with mixed success.
In the best case, a few percent were gained in the copy or comparison functions, and even that benefit depended on the type of data being compressed.
though I have no idea how to check that nor if it really involves SIMD.
The LZ4 reference source code tries to be portable, and therefore avoids architecture-specific features.
The code is written for the C abstract machine, and tries to give the compiler enough information to allow efficient transformations.
As a consequence, the final assembly generated by the C compiler may contain some SIMD instructions (generally in the memory copy section, as that is the easiest part to vectorize).
That's the best possible outcome: code that is pure portable C, yet is compiled locally into optimized SIMD instructions, given the right compiler and set of flags.
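To illustrate the idea, here is a minimal sketch of a copy loop written in plain portable C. It is not taken from the LZ4 sources: the name `copy_in_16_byte_steps` and the 16-byte step are assumptions chosen for the example. Because each `memcpy` has a constant size, an optimizing compiler is free to replace it with wide register moves on targets that have them, but that depends entirely on the compiler and flags used.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only (not LZ4's actual code).
 * Copies `length` bytes in fixed 16-byte steps, so it may read and write
 * up to 15 bytes past the requested length. The caller must guarantee
 * that much slack in both buffers, and the buffers must not overlap. */
static void copy_in_16_byte_steps(unsigned char *dst,
                                  const unsigned char *src,
                                  size_t length)
{
    unsigned char *const end = dst + length;
    while (dst < end) {
        /* Constant-size memcpy: the compiler knows the size at compile
         * time and may lower it to wide loads and stores. */
        memcpy(dst, src, 16);
        dst += 16;
        src += 16;
    }
}
```

To see whether SIMD instructions were actually produced, one option is to look at the generated assembly (for example with `gcc -O2 -S`) and check the body of the loop. The result will differ between compilers, versions and target architectures.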
Regards