I have listened to Andrew's excellent talks at LGM about nona-gpu (and enblend/enfuse) at:
http://river-valley.tv/nona-gpu-image-remapping-on-the-graphics-processor/
http://river-valley.tv/enblendenfuse-technology-talk/
and finally gave nona-gpu a quick try on my workstation.
Here is the time for remapping a single fisheye image using CPU rendering ( Intel(R) Core(TM)2 Quad CPU Q9450 @ 2.66GHz ):
[angelo@vogon hugin_6aroundtilted_testcase_my]$ time nona -r ldr -m TIFF_m -o 01_MG_4453-01_MG_4458_Linux -i 0 /home/angelo/private/panorama/hugin-test/fisheye/hugin_6aroundtilted_testcase_my/01_MG_4453-01_MG_4458_Linux.pto
real 0m9.845s
user 0m25.421s
sys 0m0.129s
So it takes a little less than 10 seconds.
The same on GPU (NVIDIA Quadro FX 1700):
[angelo@vogon hugin_6aroundtilted_testcase_my]$ time nona -g -r ldr -m TIFF_m -o 01_MG_4453-01_MG_4458_Linux -i 0 /home/angelo/private/panorama/hugin-test/fisheye/hugin_6aroundtilted_testcase_my/01_MG_4453-01_MG_4458_Linux.pto
nona: using graphics card: NVIDIA Corporation Quadro FX 1700/PCI/SSE2
destStart=[0, 0]
destEnd=[7712, 3811]
destSize=[(7712, 3811)]
srcSize=[(2304, 3456)]
gpu total time = 1.28589
real 0m5.198s
user 0m4.155s
sys 0m0.864s
So the GPU rendering seems to be rather speedy (around 5 times faster than my already very fast CPU), given that something like 3-4 seconds seem to be spent on reading and writing images from/to disk.
Here is where the GPU remapping routines spent most of their time:
[angelo@vogon hugin_6aroundtilted_testcase_my]$ cat ti.log | awk ' BEGIN {upload=0; readback=0; setup=0; render=0} /upload/ {upload+=$NF} /setup/ {setup+=$NF} /render/ {render+=$NF} /readback/ {readback+=$NF} END {printf("upload total: %f\nsetup total: %f\nrender total: %f\nreadback total: %f\n", upload, setup, render, readback)}'
upload total: 0.047785
setup total: 0.111390
render total: 0.901859
readback total: 0.117712
This contradicts Andrew's observations, where uploading and especially reading back the images took most of the time. It looks like my NVIDIA card is a lot better in that respect than the ATI card Andrew used for testing.
It also looks like we will have to work on further improvements to the image reading/writing if we want to speed things up even more (when using a good graphics card).
ciao
Pablo
When watching these videos last week I had a thought: instead of
remapping the images on the GPU, writing the output to files, and
then blending those files with enblend, wouldn't it be possible to
do the image decomposition first? i.e.:
1. deconstruct the image pyramid for each photo with the CPU
2. remap each pyramid 'layer' separately on the GPU (with a 33%
extra rendering cost)
3. While rendering, overlapping images would be spliced with a
'hard' seam - which would have the advantage that unused pixels
don't need to be rendered
4. merge the multiresolution output with the CPU; you then have a
remapped, enblended panorama after reading and writing input/output
pixels exactly once.
An additional advantage is that the zenith and nadir 'whorl'
artefacts you get with enblend won't appear (probably).
A disadvantage is that it won't be possible to do seam optimisation,
but there are workarounds: e.g. the control points in the .pto
project define points that it would be safe to drive a seam through,
so we could use these instead.
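A minimal NumPy sketch of steps 1, 3 and 4 above (decompose, hard-seam splice per layer, merge). This is my own illustration, not Hugin or enblend code: the function names are made up, and a crude 2x2 box filter stands in for the Gaussian filtering a real implementation would use. Step 2 (remapping each layer) is omitted since that part would run on the GPU.

```python
import numpy as np

def reduce2(a):
    """Crude 2x2 box downsample (stand-in for a proper Gaussian reduce)."""
    h, w = a.shape
    return a.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def expand2(a):
    """Nearest-neighbour upsample back to double resolution."""
    return a.repeat(2, axis=0).repeat(2, axis=1)

def build_pyramid(img, levels):
    """Step 1: deconstruct the image into band-pass layers on the CPU."""
    pyramid, current = [], img.astype(np.float64)
    for _ in range(levels - 1):
        low = reduce2(current)
        pyramid.append(current - expand2(low))  # detail (band-pass) layer
        current = low
    pyramid.append(current)                     # coarsest residual
    return pyramid

def splice(pyr_a, pyr_b, mask):
    """Step 3: hard seam per layer -- each pixel comes wholly from a or b."""
    out, m = [], mask.astype(np.float64)
    for i, (la, lb) in enumerate(zip(pyr_a, pyr_b)):
        out.append(np.where(m > 0.5, la, lb))   # re-binarised: a hard seam
        if i < len(pyr_a) - 1:
            m = reduce2(m)                      # track the seam down the levels
    return out

def collapse(pyramid):
    """Step 4: merge the multiresolution output back into one image."""
    current = pyramid[-1]
    for detail in reversed(pyramid[:-1]):
        current = expand2(current) + detail
    return current
```

Rendering every layer of an n-pixel image touches n * (1 + 1/4 + 1/16 + ...) ~= 4n/3 pixels, which is where the 33% extra cost in step 2 comes from. Because the seam is hard at every level, each output pixel of each layer is taken from exactly one source image, so unused pixels never need to be rendered.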
--
Bruno