Hi Nils,
I'm attaching a slightly modified script to remove some apparently unused operations, but I don't think it will make much difference.
From your system characteristics screenshot, it looks like you have 6 free cores available on the machine. These would be the easier resource to exploit for speeding up processing, e.g. by running the tracking of different color planes in separate threads.
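To make the idea concrete, here is a minimal sketch of tracking each color plane in its own thread. It is written in plain Python purely for illustration, not as a change to your script: `track_plane` and `track_all_planes` are hypothetical names, and the brightness-centroid "tracker" is just a stand-in for whatever per-plane processing you actually run. Note that in CPython a real speedup would only show up if the per-plane work releases the interpreter lock (as native image-processing calls typically do); the sketch only shows the structure.

```python
from concurrent.futures import ThreadPoolExecutor


def track_plane(plane, threshold=128):
    """Stand-in tracker: centroid (x, y) of pixels above threshold, or None."""
    sum_x = sum_y = count = 0
    for y, row in enumerate(plane):
        for x, value in enumerate(row):
            if value > threshold:
                sum_x += x
                sum_y += y
                count += 1
    return (sum_x / count, sum_y / count) if count else None


def track_all_planes(planes, max_workers=3):
    """Run track_plane on every plane in parallel, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(track_plane, planes))


# Tiny made-up 3x3 planes, one per color channel (assumed data).
red = [[0, 0, 0], [0, 255, 0], [0, 0, 0]]
green = [[255, 0, 0], [0, 0, 0], [0, 0, 0]]
blue = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]

results = track_all_planes([red, green, blue])
print(results)
```

The planes share no state, so each one can be handed to a separate worker without any locking.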
While you can definitely modify your code to use the GPU, it would take substantial rethinking and reimplementation of your entire approach to processing the data, as you cannot write the same types of programs for GPUs as for CPUs (memory sharing and execution characteristics are completely different). Some algorithms such as FindContours also don't have an immediate GPU equivalent. Finally, code running on the GPU normally cannot write directly to the file system, for example to log text files.
I still find it surprising that you are hitting such limitations, as we routinely record from 5, 6 or even 8 FLIR/Point Grey cameras at the same time on a single computer with no problem. Can you share more information about your acquisition parameters (frame rate, resolution, dynamic range of images, color space, etc.)? These can sometimes have a much bigger impact on the data bandwidth than any of the specific algorithms.
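To give a feel for how much these parameters matter, here is a back-of-the-envelope estimate of the uncompressed data rate per camera. The function name and the example numbers (1280x1024 at 100 fps) are assumptions for illustration only, not your actual settings.

```python
def camera_bandwidth_mb_per_s(width, height, fps, bits_per_channel=8, channels=1):
    """Uncompressed data rate of one camera in MB/s (1 MB = 10**6 bytes)."""
    bytes_per_frame = width * height * channels * bits_per_channel / 8
    return bytes_per_frame * fps / 1e6


# Assumed example settings: 1280x1024 pixels at 100 frames per second.
mono8 = camera_bandwidth_mb_per_s(1280, 1024, 100)               # 8-bit grayscale
rgb8 = camera_bandwidth_mb_per_s(1280, 1024, 100, channels=3)    # 8-bit RGB
mono16 = camera_bandwidth_mb_per_s(1280, 1024, 100, bits_per_channel=16)

print(f"mono8:  {mono8:.1f} MB/s")
print(f"rgb8:   {rgb8:.1f} MB/s")
print(f"mono16: {mono16:.1f} MB/s")
```

With these assumed numbers, switching from 8-bit grayscale to 8-bit RGB triples the amount of data that has to be moved and processed for every frame, and going to 16-bit doubles it, before any tracking algorithm even runs.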
Hope this helps.