http://www.engineering.com/ElectronicsDesign/ElectronicsDesignArticle...
uses OpenCV. This could be interesting for a lot of applications,
though I've already observed MANY times that multi-core solutions for a
LOT of common operations aren't as spectacular an improvement as they
may seem.
Here are some threaded 'things' I've toyed with:
FFT/DFT and inverse - 1 core per harmonic gives you the best benefit.
If you use fewer than 64 harmonics, you won't see a 'full benefit' from
the 64 cores. But it is pretty linear with respect to # of cores used,
up to the # of harmonics you're calculating.
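To make the 'one core per harmonic' idea concrete, here is a minimal Python sketch, not the code I actually ran. The names harmonic() and dft() and the workers parameter are made up for illustration. Each harmonic depends only on the input samples, so every one can be handed out as its own task, and useful parallelism tops out at the number of harmonics requested.

```python
import cmath
from concurrent.futures import ThreadPoolExecutor

def harmonic(samples, k):
    """Return DFT coefficient k; it needs no result from any other harmonic."""
    n = len(samples)
    return sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
               for t, x in enumerate(samples))

def dft(samples, n_harmonics, workers=4):
    """Compute harmonics 0..n_harmonics-1, one independent task per harmonic."""
    # More workers than harmonics cannot help: there are only n_harmonics tasks.
    workers = max(1, min(workers, n_harmonics))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda k: harmonic(samples, k),
                             range(n_harmonics)))

if __name__ == "__main__":
    # A constant signal has all of its energy in harmonic 0.
    print(dft([1.0, 1.0, 1.0, 1.0], n_harmonics=4))
```

One caveat: in standard CPython the global interpreter lock keeps threads from running this arithmetic on several cores at once, so the sketch shows the task split rather than a real speedup. Swapping in ProcessPoolExecutor (with a module-level function instead of the lambda) is the usual way to get actual multi-core execution.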
quick sort: 4 threads gave me around 3 times the performance. Doubling
the thread count doesn't improve things much (maybe gets you 4 times).
64 threads wouldn't be much different from 8.
PI calculation: By adding threads on a quad-core, I gained an approx. 30%
speed benefit (2+ hours instead of 3+ hours on calculating 1 million
digits of pi). At least 50% of the time was single-thread. In some
cases, adding threads made that part of the calculation take LONGER so I
put in a 'tuneable' to only use threads 'above a certain limit'.
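As a rough sanity check on the pi numbers, Amdahl's law (my choice of model here, not something I measured against) gives the best possible speedup when a fixed fraction of the run time stays single-threaded. The function name and the 0.5 serial fraction are assumptions taken from the 'at least 50%' estimate above.

```python
def amdahl_speedup(serial_fraction, cores):
    """Upper bound on speedup when serial_fraction of the work cannot be threaded."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

if __name__ == "__main__":
    # Assumed: half of the run time is single-threaded.
    for cores in (4, 8, 64):
        print(cores, "cores ->", round(amdahl_speedup(0.5, cores), 2))
    # 4 cores -> 1.6
    # 8 cores -> 1.78
    # 64 cores -> 1.97
```

With half the work serial, 4 cores can give at most 1.6 times the speed, which is in the same ballpark as going from 3+ hours to 2+ hours, and even 64 cores would stay below 2 times.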
It's also worth pointing out that the coprocessors run at 1 GHz, a little
more than 1/3 the speed of the host processor. Assuming that
instruction fetch/execute takes the SAME NUMBER OF CLOCK TICKS as on the
host, you have to have an algorithm that's more than 3 times faster
using threads to make this worth doing. So quicksort and PI
calculations are now "out".
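The break-even arithmetic can be written out as a tiny sketch. The function name and the flat 1/3 clock ratio are assumptions (the real ratio is 'a little more than' 1/3), and it keeps the same-ticks-per-instruction assumption from above. The thread speedups fed in are the rough figures from this post, with the FFT case assuming the linear scaling holds all the way to 64 cores.

```python
def net_speedup(thread_speedup, clock_ratio=1/3):
    """Speedup versus one host core, after paying for the slower coprocessor clock."""
    return thread_speedup * clock_ratio

if __name__ == "__main__":
    cases = {
        "quicksort, 4 threads (about 3x)": 3.0,
        "pi, quad-core (3h -> 2h, about 1.5x)": 1.5,
        "FFT, 64 harmonics (assumed linear, 64x)": 64.0,
    }
    for name, speedup in cases.items():
        # Anything at or below 1.0 is no better than staying on the host.
        print(f"{name}: net {net_speedup(speedup):.2f}x")
```

Quicksort lands right at break-even and the pi calculation ends up at about half the host's speed, while the FFT case still comes out more than 20 times ahead.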
So far the only calculation I've heard that makes sense is an FFT or DFT
[which is common in video compression, as I recall]. So MPEG decoding
*could* use this to its benefit. But what else?
I'm thinking ROBOTICS, where you have 64 independent processes doing
things, and GAMES, where independent threads control non-player
characters, moving objects, and other environmental factors [not to
mention rendering]. For these things, 64 cores would be very useful.
But yeah, you HAVE to be able to break up the processing into
independent jobs or 'chunks' for this to work, and the breaking up
process can't take longer than the original calculation on single-core.
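Here is a minimal sketch of that 'chunks' idea, combined with the same kind of 'tuneable' I mentioned for the pi calculation: below a size threshold, skip the threads entirely because the split-up overhead would cost more than it saves. The sum-of-squares workload, the names chunked() and parallel_sum_of_squares(), and the min_parallel default are all made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def chunked(items, n_chunks):
    """Split items into at most n_chunks contiguous, independent pieces."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division, never 0
    return [items[i:i + size] for i in range(0, len(items), size)]

def parallel_sum_of_squares(items, workers=4, min_parallel=10_000):
    """Sum x*x over items, using threads only above the min_parallel threshold."""
    if len(items) < min_parallel:
        # Small job: breaking it up would cost more than just doing it.
        return sum(x * x for x in items)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda chunk: sum(x * x for x in chunk),
                            chunked(items, workers))
        return sum(partials)

if __name__ == "__main__":
    data = list(range(100))
    print(parallel_sum_of_squares(data, min_parallel=10))      # threaded path
    print(parallel_sum_of_squares(data, min_parallel=10_000))  # serial path
```

Both calls return the same total. The only thing the threshold changes is whether the work gets split, so the right value has to be found by timing on the actual hardware.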
In any case, the geek factor of having 64 cores is pretty cool.
Related, this:
http://www.adapteva.com/white-papers/using-a-scalable-parallel-2d-fft...
and this
http://www.kickstarter.com/projects/adapteva/parallella-a-supercomput...