I am running some benchmarks in Haskell, in this case on 2D convolution, using the convolution provided by the repa library itself, not repa-algorithms.
In single-core runs the library performs well, but the more cores are used, the slower the application becomes, until it is slower than a fairly naive C implementation. To be fair, the C version is a custom implementation that adds a border to each image to avoid any if-conditionals, and it is parallelized with OpenMP.
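For reference, the border-padding idea can be sketched roughly as follows. This is a minimal illustration, not the actual benchmark code: it assumes a 3x3 kernel, row-major `float` images and a zero border, and the names `pad_image` and `convolve3x3` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Copy a w x h image into a (w+2) x (h+2) buffer surrounded by a zero border.
   `padded` must have room for (w+2)*(h+2) floats. (Hypothetical helper.) */
void pad_image(const float *src, float *padded, size_t w, size_t h)
{
    size_t pw = w + 2;
    memset(padded, 0, pw * (h + 2) * sizeof(float));
    for (size_t y = 0; y < h; ++y)
        memcpy(padded + (y + 1) * pw + 1, src + y * w, w * sizeof(float));
}

/* Apply a 3x3 kernel to the padded image and write a w x h result.
   The kernel is applied without flipping (correlation form).
   Because of the border, the inner loops need no bounds checks. */
void convolve3x3(const float *padded, const float kernel[9],
                 float *dst, size_t w, size_t h)
{
    size_t pw = w + 2;
    /* Rows are independent, so they can be distributed across threads.
       Without OpenMP enabled, the pragma is ignored and the loop runs serially. */
    #pragma omp parallel for
    for (long y = 0; y < (long)h; ++y) {
        for (size_t x = 0; x < w; ++x) {
            float acc = 0.0f;
            for (size_t ky = 0; ky < 3; ++ky)
                for (size_t kx = 0; kx < 3; ++kx)
                    acc += kernel[ky * 3 + kx]
                         * padded[((size_t)y + ky) * pw + (x + kx)];
            dst[(size_t)y * w + x] = acc;
        }
    }
}
```

The point of the padding is that every output pixel reads a full 3x3 neighbourhood, so the hot loop contains no branches for the image edges.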
The parallel efficiency is especially poor on machines with many cores (up to 160), as the linked images show.
Is there an explanation for this behaviour?
The Haskell program consists almost entirely of calls to the library functions, so it adds almost no overhead of its own.