Multi-core CPUs help a lot, as would more RAM if you were on a 64-bit capable machine. I doubt the proprietary driver will improve things; proprietary drivers sometimes have worse 2D performance than the open-source one. You might try tweaking the priority of the different processes with 'nice' to improve responsiveness, or adjusting sysctls such as 'vm.dirty_ratio' to prevent I/O stalls. At the end of the day, though, it comes down to finding out what your bottleneck is and improving that, whether from the hardware side or the software side.
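For what it's worth, here is a rough sketch of the 'nice' and sysctl tweaks. The helper name run_low_prio and the niceness of 10 are my own placeholders, and the sysctl value is only an illustration, not a recommendation. Which process should get the lower priority is something only measurement will tell you.

```shell
# Hypothetical helper: run any command at a lower scheduling priority.
# The niceness of 10 is an arbitrary illustrative value.
run_low_prio() {
    nice -n 10 "$@"
}

# Example (not run here): start the ingest at lower priority.
#   run_low_prio ./v4l2_ingest

# Inspecting and changing the writeback thresholds needs the sysctl tool,
# and changing them needs root. The value below is illustrative only.
#   sysctl vm.dirty_ratio vm.dirty_background_ratio
#   sudo sysctl -w vm.dirty_ratio=10
```

Lowering priority this way needs no special privileges; raising it (a negative niceness) does.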
One more thing, if you haven't already... check 'ldd ./v4l2_ingest' to make sure that libjpeg-turbo is really what's being linked in at runtime. If you see 'libjpeg.so.62 => /path/to/libjpeg-turbo/lib/libjpeg.so.62' then you're fine. On the other hand, if you see 'libjpeg.so.62 => /usr/lib/libjpeg.so.62', you have a problem... see below.
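If you'd rather not eyeball the output, the check above can be scripted. This is just a sketch: which_libjpeg is a made-up helper name, and it assumes the usual 'name => path (address)' layout of ldd output.

```shell
# Hypothetical helper: reads `ldd` output on stdin and reports which
# libjpeg the loader resolved. $1 is the libjpeg-turbo lib directory.
which_libjpeg() {
    turbo_dir=$1
    # ldd lines look like: "libjpeg.so.62 => /some/dir/libjpeg.so.62 (0x...)"
    # or "libjpeg.so.62 => not found" when the loader cannot resolve it.
    resolved=$(awk '$1 ~ /^libjpeg\.so/ && $2 == "=>" { print $3; exit }')
    case $resolved in
        "")             echo "none" ;;     # binary does not link libjpeg
        not)            echo "missing" ;;  # "=> not found"
        "$turbo_dir"/*) echo "turbo" ;;    # resolved inside the turbo directory
        *)              echo "other" ;;    # some other libjpeg, e.g. the system one
    esac
}

# Example (not run here):
#   ldd ./v4l2_ingest | which_libjpeg /path/to/libjpeg-turbo/lib
```

An answer of "other" corresponds to the problem case described above.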
On some systems I've compiled against libjpeg-turbo only to find that when the program loads, it's linked against the system's libjpeg.so.62 instead of libjpeg-turbo's. If that's what you're seeing, you can fix it either by editing ld.so.conf (and re-running 'ldconfig' afterwards) or by running 'export LD_LIBRARY_PATH=/path/to/libjpeg-turbo/lib' in any shell you're running openreplay from. LD_LIBRARY_PATH is the most sure-fire way to get the right one because it takes precedence over just about everything else. Once you've made the change, you can use 'ldd' to confirm you're getting the right libjpeg.
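A slight variation on the plain 'export', in case LD_LIBRARY_PATH is already set to something you want to keep: the sketch below puts the libjpeg-turbo directory in front of whatever is there instead of overwriting it. prepend_lib_path is a made-up helper name, not anything standard.

```shell
# Hypothetical helper: put a directory at the front of LD_LIBRARY_PATH.
prepend_lib_path() {
    # ${VAR:+...} expands only when VAR is set and non-empty, so no stray
    # trailing ':' is left behind (an empty entry means the current directory).
    LD_LIBRARY_PATH=$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
    export LD_LIBRARY_PATH
}

prepend_lib_path /path/to/libjpeg-turbo/lib

# Then, from the same shell (not run here):
#   ldd ./v4l2_ingest | grep libjpeg
```

The setting only applies to that shell and the programs started from it, so it has to be repeated (or put in a startup file) for each new shell.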