Hi Nathaniel,
You are right, and I was wrong to say that the rays are sampled uniformly. In the file I sent they are actually cosine-weighted.
I tried rtrace with the new params but it didn't increase the speed.
Here are the results for 3 scenarios, run on a machine with a GeForce 1060:
1. rtrace cpu (6 cores) with 288 input sensors finished in ~30s
rtrace -I -ab 10 -ad 40000 -as 60 -aa 0 -lw 0.000025 model.oct < points.txt > results.txt
2. rtrace gpu with 288 input sensors finished in 154s
rtrace -I -ab 10 -ad 40000 -as 60 -aa 0 -lw 0.00000001 model.oct < points.txt > results.txt
3. rtrace gpu with 800000 rays (20 sensors x 40000 rays) finished in 108s
rtrace -ab 9 -ad 20000 -as 0 -aa 0 -lr -9 -lw 4e-4 model.oct < rays.txt > results.txt
In the 3rd scenario I only send rays for 20 sensors, because otherwise it throws an out-of-memory exception. So I think I have to send the rays in several iterations. Even so, it is much slower than scenario 2: 108s for 20 sensors versus 154s for 288 sensors.
Any suggestions on this?
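To make "send the rays in several iterations" concrete, here is a rough sketch of what I have in mind, in Python. The helper names (`batches`, `trace_in_batches`) are just my own, the options are copied from scenario 3, and the default batch size of 800000 rays is simply the size that fit in memory in that scenario. I have not tuned any of this.

```python
import subprocess
from itertools import islice


def batches(lines, size):
    """Yield successive lists of at most `size` items from an iterable."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# Options copied from scenario 3; model.oct is the same octree as above.
RTRACE = ["rtrace", "-ab", "9", "-ad", "20000", "-as", "0", "-aa", "0",
          "-lr", "-9", "-lw", "4e-4", "model.oct"]


def trace_in_batches(ray_path, out_path, rays_per_batch=800_000):
    """Run rtrace once per batch of rays and append each run's output.

    Assumption: one ray per input line. Each rtrace run writes its own
    header, so the combined file will contain one header per batch unless
    header output is switched off.
    """
    with open(ray_path) as rays, open(out_path, "w") as out:
        for chunk in batches(rays, rays_per_batch):
            done = subprocess.run(RTRACE, input="".join(chunk),
                                  capture_output=True, text=True, check=True)
            out.write(done.stdout)
```

I would call it as `trace_in_batches("rays.txt", "results.txt")`. Each batch starts a new rtrace process, so any per-run setup cost is paid once per batch.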
Also, on a bigger model (~12k triangles) with ~18k sensors, running rtrace with the params from the 1st scenario, the cpu version (6 cores) finished in 386s compared to 3860s for the gpu version, so the gpu was 10x slower.
Looking at your papers, I tend to conclude that you don't gain much from the gpu with -aa 0? At least the results I get don't suggest any gain.
I also ran the same tests on a graphics card with RT cores (RTX 2080), and the time was then comparable to the cpu version (again with 6 cores). So it seems that cards without RT cores take a large performance hit.
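For reference, this is how I normalize the timings above to seconds per sensor. It is only a quick sketch: the labels are my own, the ~18k sensor count is rounded to 18000, and scenario 3 uses different parameters from the others, so its figure is not strictly comparable.

```python
# (seconds, sensors) for each run reported above.
timings = {
    "scenario 1, cpu": (30, 288),
    "scenario 2, gpu": (154, 288),
    "scenario 3, gpu, explicit rays": (108, 20),
    "bigger model, cpu": (386, 18000),   # ~18k sensors, rounded
    "bigger model, gpu": (3860, 18000),  # ~18k sensors, rounded
}


def seconds_per_sensor(seconds, sensors):
    """Wall-clock time divided by the number of sensors in the run."""
    return seconds / sensors


per_sensor = {name: seconds_per_sensor(*t) for name, t in timings.items()}

for name, value in per_sensor.items():
    print(f"{name}: {value:.3f} s/sensor")
```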