On nona-gpu performance.


Pablo d'Angelo

May 20, 2009, 7:41:58 AM
to hugi...@googlegroups.com
Hi all,

I have listened to Andrew's excellent talks at LGM about nona-gpu (and enblend/enfuse) at:
http://river-valley.tv/nona-gpu-image-remapping-on-the-graphics-processor/
http://river-valley.tv/enblendenfuse-technology-talk/

and finally gave nona-gpu a quick try on my workstation.

Here is the time for remapping a single fisheye image using CPU rendering ( Intel(R) Core(TM)2 Quad CPU Q9450 @ 2.66GHz ):

[angelo@vogon hugin_6aroundtilted_testcase_my]$ time nona -r ldr -m TIFF_m -o 01_MG_4453-01_MG_4458_Linux -i 0 /home/angelo/private/panorama/hugin-test/fisheye/hugin_6aroundtilted_testcase_my/01_MG_4453-01_MG_4458_Linux.pto

real 0m9.845s
user 0m25.421s
sys 0m0.129s

So it takes a little less than 10 seconds.

The same on GPU (NVIDIA Quadro FX 1700):

[angelo@vogon hugin_6aroundtilted_testcase_my]$ time nona -g -r ldr -m TIFF_m -o 01_MG_4453-01_MG_4458_Linux -i 0 /home/angelo/private/panorama/hugin-test/fisheye/hugin_6aroundtilted_testcase_my/01_MG_4453-01_MG_4458_Linux.pto
nona: using graphics card: NVIDIA Corporation Quadro FX 1700/PCI/SSE2
destStart=[0, 0]
destEnd=[7712, 3811]
destSize=[(7712, 3811)]
srcSize=[(2304, 3456)]

gpu total time = 1.28589

real 0m5.198s
user 0m4.155s
sys 0m0.864s

So the GPU remapping itself seems to be rather speedy (around 5 times faster than on my already very fast CPU), since something like 3-4 seconds of each run seem to be spent on reading and writing images from/to disk.
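A rough check of the "around 5 times" figure from the numbers above. This is only a sketch: it assumes that `gpu total time` is reported in seconds and that everything in the GPU run outside that figure is I/O and other overhead which the CPU run pays as well.

```latex
\begin{align*}
t_{\text{overhead}} &\approx 5.198\,\text{s} - 1.286\,\text{s} \approx 3.9\,\text{s} \\
t_{\text{CPU remap}} &\approx 9.845\,\text{s} - 3.9\,\text{s} \approx 5.9\,\text{s} \\
\text{remapping speedup} &\approx \frac{5.9\,\text{s}}{1.286\,\text{s}} \approx 4.6 \\
\text{end-to-end speedup} &= \frac{9.845\,\text{s}}{5.198\,\text{s}} \approx 1.9
\end{align*}
```

Under these assumptions the remapping step is about 4.6 times faster, while the whole run is only about 1.9 times faster because the overhead does not shrink.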

Here is where most of the time was spent by the GPU remapping routines:

[angelo@vogon hugin_6aroundtilted_testcase_my]$ cat ti.log | awk ' BEGIN {upload=0; readback=0; setup=0; render=0} /upload/ {upload+=$NF} /setup/ {setup+=$NF} /render/ {render+=$NF} /readback/ {readback+=$NF} END {printf("upload total: %f\nsetup total: %f\nrender total: %f\nreadback total: %f\n", upload, setup, render, readback)}'
upload total: 0.047785
setup total: 0.111390
render total: 0.901859
readback total: 0.117712
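For anyone without awk at hand (e.g. on Windows), the same per-stage summation can be sketched in C++. The function name `sumStageTimes` is hypothetical, and the sketch assumes only what the awk one-liner assumes: each relevant log line mentions the stage name and ends with the time as its last whitespace-separated field.

```cpp
#include <cstdlib>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper mirroring the awk one-liner: for every line that
// mentions a stage name, add the line's last field to that stage's total.
std::map<std::string, double> sumStageTimes(std::istream& log) {
    const char* stages[] = {"upload", "setup", "render", "readback"};
    std::map<std::string, double> totals;
    for (const char* stage : stages) totals[stage] = 0.0;

    std::string line;
    while (std::getline(log, line)) {
        // Find the last whitespace-separated field (awk's $NF).
        std::istringstream fields(line);
        std::string field, last;
        while (fields >> field) last = field;

        // Like awk, treat a non-numeric field as 0.
        const double value = std::strtod(last.c_str(), nullptr);

        for (const char* stage : stages) {
            if (line.find(stage) != std::string::npos) totals[stage] += value;
        }
    }
    return totals;
}
```

Passing an `std::ifstream` opened on `ti.log` and printing the four totals should reproduce the figures above, provided the log format matches these assumptions.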

This contradicts Andrew's observations, where uploading and especially reading back the images took most of the time. It looks like my NVIDIA card is a lot better in that respect than the ATI card Andrew used for testing.

Looks like we will have to work on further improvements to the image reading / writing if we want to speed up things even more (when using a good graphics card).

ciao
Pablo

Bruno Postle

May 20, 2009, 8:45:54 AM
to Hugin ptx
On Wed 20-May-2009 at 13:41 +0200, Pablo d'Angelo wrote:
>
>I have listened to Andrew's excellent talks at LGM about nona-gpu (and enblend/enfuse) at:
>http://river-valley.tv/nona-gpu-image-remapping-on-the-graphics-processor/
>http://river-valley.tv/enblendenfuse-technology-talk/

When watching these videos last week I had a thought: instead of
remapping the images on the GPU, writing the output to files and then
blending these files with enblend, wouldn't it be possible to do
the image decomposition first? I.e.:

1. deconstruct the image pyramid for each photo with the CPU

2. remap each pyramid 'layer' separately on the GPU (with a 33%
extra rendering cost)

3. While rendering, overlapping images would be spliced with a
'hard' seam - which would have the advantage that unused pixels
don't need to be rendered

4. merge the multiresolution output with the CPU; you then have a
remapped, enblended panorama after reading and writing input/output
pixels exactly once.
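The "33% extra rendering cost" in step 2 follows from the pyramid geometry. As a sketch, assume each pyramid level halves the width and height of the previous one, so it holds a quarter of the pixels; for a base image of N pixels the total number of pixels to remap is then bounded by a geometric series:

```latex
\[
N \sum_{k=0}^{\infty} \left(\frac{1}{4}\right)^{k}
  = \frac{N}{1 - \frac{1}{4}}
  = \frac{4}{3}\,N
  \approx 1.33\,N
\]
```

A real pyramid has finitely many levels, so the actual overhead stays slightly below one third.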

An additional advantage is that the zenith and nadir 'whorl'
artefacts you get with enblend won't appear (probably).

A disadvantage is that it won't be possible to do seam optimisation,
but there are workarounds, e.g. control points in the .pto project
define points that it would be safe to drive a seam through, so we
can use these instead.

--
Bruno

T. Modes

May 22, 2009, 5:13:01 AM
to hugin and other free panoramic software
I also want to give nona-gpu a try, but the branch doesn't compile on
Windows (MS VCEE 2008).

The following errors occur:

fatal error C1083: "sys/time.h": No such file or directory d:\src\nona-gpu\src\hugin_base\vigra_ext\ImageTransformsGPU.cpp 34

error C4716: 'vigra_ext::ImageInterpolator<vigra::ConstBasicImageIterator<vigra::RGBValue<float,0,1,2>,vigra::RGBValue<float,0,1,2> * *>,vigra::RGBAccessor<vigra::RGBValue<float,0,1,2> >,vigra_ext::interp_cubic>::emitGLSL': Must return value d:\src\nona-gpu\src\hugin_base\vigra_ext\interpolators.h 461

The last one is reported over 100 times with different functions, but
always at line 461 of interpolators.h.
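For context on C4716: MSVC rejects a function that is declared to return a value but can reach its end without a `return`, whereas GCC only warns about it (`-Wreturn-type`), which would explain why the branch builds on Linux. The sketch below uses a hypothetical struct and method name, not the real code at interpolators.h line 461, and only illustrates the usual shape of the fix: return a value on every path, or declare the function `void`.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for an interpolator; NOT the real vigra_ext class.
struct ExampleInterpolator {
    // Declared to return bool, so every control path must return a value.
    // Without the final return statement MSVC stops with error C4716.
    bool emitExampleGLSL(std::ostringstream& oss) const {
        oss << "    // interpolation shader code would be emitted here\n";
        return true;  // the statement whose absence triggers C4716
    }
};
```

If the return value is never used by callers, changing the return type to `void` is the alternative fix; which one applies here depends on how `emitGLSL` is actually called.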

Thomas

Harry van der Wolf

May 22, 2009, 6:13:28 AM
to hugi...@googlegroups.com
Hi Thomas,

You should use something like this:

#ifdef __unix__
#include <sys/time.h>
#include <sys/resource.h>
#endif

See src/hugin1/panoinc.h

Harry


T. Modes

May 22, 2009, 6:49:06 AM
to hugin and other free panoramic software
Harry van der Wolf wrote:
> Hi Thomas,
>
> You should place something like
>
> #ifdef __unix__
> #include <sys/time.h>
> #include <sys/resource.h>
> #endif
>
> See src/hugin1/panoinc.h
>
> Harry
>

Hi Harry,

I tried this as well. But when I make sys/time.h conditional, the
compiler complains about the missing timeval type and gettimeofday function.

I googled a bit and found a timeval.h by Wu Yongwei, which declares
timeval and gettimeofday. But when I include this one, I get many
compilation errors in vigra/numerictraits.hxx.
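One possible way around the missing gettimeofday is sketched below, assuming those calls in ImageTransformsGPU.cpp are only used to time the GPU stages. The helper name `wallClockSeconds` is hypothetical. It hides the platform difference behind one function, using `QueryPerformanceCounter` on Windows and `gettimeofday` elsewhere. `NOMINMAX` is defined before `<windows.h>` because that header otherwise defines `min`/`max` macros, which commonly clash with template code; whether that is the cause of the numerictraits.hxx errors here is not confirmed.

```cpp
#ifdef _WIN32
  #ifndef NOMINMAX
    #define NOMINMAX  // keep <windows.h> from defining min/max macros
  #endif
  #include <windows.h>
#else
  #include <sys/time.h>
#endif

// Hypothetical helper: wall-clock time in seconds from an arbitrary origin.
// Only differences between two calls are meaningful.
inline double wallClockSeconds() {
#ifdef _WIN32
    LARGE_INTEGER frequency, now;
    QueryPerformanceFrequency(&frequency);  // counter ticks per second
    QueryPerformanceCounter(&now);          // current tick count
    return static_cast<double>(now.QuadPart) /
           static_cast<double>(frequency.QuadPart);
#else
    timeval tv;
    gettimeofday(&tv, 0);
    return static_cast<double>(tv.tv_sec) + tv.tv_usec / 1e6;
#endif
}
```

A stage would then be timed as `double t0 = wallClockSeconds(); /* work */ double elapsed = wallClockSeconds() - t0;`, with no `timeval` in the calling code.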

Thomas