nona-gpu - has anybody got it working?

Yuval Levy

Aug 4, 2009, 11:51:24 PM
to hugin-ptx
Hi all

I just tested the latest SVN on my workstation with the following GPU
(see diagnostic commands and output below) and it fails. Is this my GPU
or do we have a bug in the code? This is on Ubuntu 64bit, and I've even
tried with 9.10 (alpha2) because it has a slightly newer GLEW.

$ lspci | grep VGA

00:05.0 VGA compatible controller: nVidia Corporation C51PV [GeForce 6150] (rev a2)

$ glxinfo | grep OpenGL

OpenGL vendor string: NVIDIA Corporation
OpenGL renderer string: GeForce 6150/PCI/SSE2
OpenGL version string: 2.1.2 NVIDIA 185.18.14
OpenGL shading language version string: 1.20 NVIDIA via Cg compiler
OpenGL extensions:

$ glewinfo | grep GL_ARB
GL_ARB_color_buffer_float: OK
GL_ARB_depth_buffer_float: MISSING
GL_ARB_depth_texture: OK
GL_ARB_draw_buffers: OK
GL_ARB_draw_instanced: OK [MISSING]
GL_ARB_fragment_program: OK
GL_ARB_fragment_program_shadow: OK
GL_ARB_fragment_shader: OK
GL_ARB_framebuffer_object: MISSING [OK]
GL_ARB_framebuffer_sRGB: MISSING
GL_ARB_geometry_shader4: OK [MISSING]
GL_ARB_half_float_pixel: OK
GL_ARB_half_float_vertex: OK
GL_ARB_imaging: OK
GL_ARB_instanced_arrays: MISSING
GL_ARB_map_buffer_range: OK
GL_ARB_matrix_palette: MISSING
GL_ARB_multisample: OK
GL_ARB_multitexture: OK
GL_ARB_occlusion_query: OK
GL_ARB_pixel_buffer_object: OK
GL_ARB_point_parameters: OK
GL_ARB_point_sprite: OK
GL_ARB_shader_objects: OK
GL_ARB_shading_language_100: OK
GL_ARB_shadow: OK
GL_ARB_shadow_ambient: MISSING
GL_ARB_texture_border_clamp: OK
GL_ARB_texture_buffer_object: OK [MISSING]
GL_ARB_texture_compression: OK
GL_ARB_texture_compression_rgtc: MISSING
GL_ARB_texture_cube_map: OK
GL_ARB_texture_env_add: OK
GL_ARB_texture_env_combine: OK
GL_ARB_texture_env_crossbar: MISSING
GL_ARB_texture_env_dot3: OK
GL_ARB_texture_float: OK
GL_ARB_texture_mirrored_repeat: OK
GL_ARB_texture_non_power_of_two: OK
GL_ARB_texture_rectangle: OK
GL_ARB_texture_rg: MISSING
GL_ARB_transpose_matrix: OK
GL_ARB_vertex_array_object: OK
GL_ARB_vertex_blend: MISSING
GL_ARB_vertex_buffer_object: OK
GL_ARB_vertex_program: OK
GL_ARB_vertex_shader: OK
GL_ARB_window_pos: OK


and this is what happens when I run nona with -g

$ nona -g -o testgpu.tif _MG_8768-_MG_8873.pto
nona: using graphics card: NVIDIA Corporation GeForce 6150/PCI/SSE2
destStart=[1900, 2303]
destEnd=[2804, 5700]
destSize=[(904, 3397)]
srcSize=[(3480, 2314)]
srcBuffer=0x7f4fd4934010
srcAlphaBuffer=0x7f4fd35cd010
destBuffer=0x7f4fd406a010
destAlphaBuffer=0x7f4fd3d7c010
destGLInternalFormat=GL_RGBA8
destGLFormat=GL_RGB
destGLType=GL_UNSIGNED_BYTE
srcGLInternalFormat=GL_RGBA8
srcGLFormat=GL_RGB
srcGLType=GL_UNSIGNED_BYTE
srcAlphaGLType=GL_UNSIGNED_BYTE
destAlphaGLType=GL_UNSIGNED_BYTE
warparound=0
needsAtanWorkaround=0
maxTextureSize=4096
Source chunks:
[(0, 0) to (3480, 2314) = (3480x2314)]
Dest chunks:
[(0, 0) to (904, 1699) = (904x1699)]
[(0, 1699) to (904, 3397) = (904x1698)]
Total GPU memory used: 132001184
Interpolator chunks:
[(0, 0) to (4, 4) = (4x4)]
[(4, 0) to (8, 4) = (4x4)]
[(8, 0) to (12, 4) = (4x4)]
[(12, 0) to (16, 4) = (4x4)]
[(16, 0) to (20, 4) = (4x4)]
[(20, 0) to (24, 4) = (4x4)]
[(24, 0) to (28, 4) = (4x4)]
[(28, 0) to (32, 4) = (4x4)]
[(0, 4) to (4, 8) = (4x4)]
[(4, 4) to (8, 8) = (4x4)]
[(8, 4) to (12, 8) = (4x4)]
[(12, 4) to (16, 8) = (4x4)]
[(16, 4) to (20, 8) = (4x4)]
[(20, 4) to (24, 8) = (4x4)]
[(24, 4) to (28, 8) = (4x4)]
[(28, 4) to (32, 8) = (4x4)]
[(0, 8) to (4, 12) = (4x4)]
[(4, 8) to (8, 12) = (4x4)]
[(8, 8) to (12, 12) = (4x4)]
[(12, 8) to (16, 12) = (4x4)]
[(16, 8) to (20, 12) = (4x4)]
[(20, 8) to (24, 12) = (4x4)]
[(24, 8) to (28, 12) = (4x4)]
[(28, 8) to (32, 12) = (4x4)]
[(0, 12) to (4, 16) = (4x4)]
[(4, 12) to (8, 16) = (4x4)]
[(8, 12) to (12, 16) = (4x4)]
[(12, 12) to (16, 16) = (4x4)]
[(16, 12) to (20, 16) = (4x4)]
[(20, 12) to (24, 16) = (4x4)]
[(24, 12) to (28, 16) = (4x4)]
[(28, 12) to (32, 16) = (4x4)]
[(0, 16) to (4, 20) = (4x4)]
[(4, 16) to (8, 20) = (4x4)]
[(8, 16) to (12, 20) = (4x4)]
[(12, 16) to (16, 20) = (4x4)]
[(16, 16) to (20, 20) = (4x4)]
[(20, 16) to (24, 20) = (4x4)]
[(24, 16) to (28, 20) = (4x4)]
[(28, 16) to (32, 20) = (4x4)]
[(0, 20) to (4, 24) = (4x4)]
[(4, 20) to (8, 24) = (4x4)]
[(8, 20) to (12, 24) = (4x4)]
[(12, 20) to (16, 24) = (4x4)]
[(16, 20) to (20, 24) = (4x4)]
[(20, 20) to (24, 24) = (4x4)]
[(24, 20) to (28, 24) = (4x4)]
[(28, 20) to (32, 24) = (4x4)]
[(0, 24) to (4, 28) = (4x4)]
[(4, 24) to (8, 28) = (4x4)]
[(8, 24) to (12, 28) = (4x4)]
[(12, 24) to (16, 28) = (4x4)]
[(16, 24) to (20, 28) = (4x4)]
[(20, 24) to (24, 28) = (4x4)]
[(24, 24) to (28, 28) = (4x4)]
[(28, 24) to (32, 28) = (4x4)]
[(0, 28) to (4, 32) = (4x4)]
[(4, 28) to (8, 32) = (4x4)]
[(8, 28) to (12, 32) = (4x4)]
[(12, 28) to (16, 32) = (4x4)]
[(16, 28) to (20, 32) = (4x4)]
[(20, 28) to (24, 32) = (4x4)]
[(24, 28) to (28, 32) = (4x4)]
[(28, 28) to (32, 32) = (4x4)]
#version 110
#extension GL_ARB_texture_rectangle : enable
uniform sampler2DRect SrcTexture;
float sinh(const in float x) { return (exp(x) - exp(-x)) / 2.0; }
float cosh(const in float x) { return (exp(x) + exp(-x)) / 2.0; }
float atan2_xge0(const in float y, const in float x) {
return atan(y, x);
}
float atan2_safe(const in float y, const in float x) {
return atan(y, x);
}
float atan_safe(const in float yx) {
return atan(yx);
}
void main(void)
{
float discardA = 1.0;
float discardB = 0.0;
vec2 src = gl_TexCoord[0].st;
src -= vec2(9012.0000000000000000, 4506.0000000000000000);

// rotate_erect(27036.000000000000000, 7570.1936456957173505)
{
src.s += 7570.1936456957173505;
float w = (abs(src.s) > 27036.000000000000000) ? 1.0 : 0.0;
float n = (src.s < 0.0) ? 0.5 : -0.5;
src.s += w * -54072.000000000000000 * ceil(src.s /
54072.000000000000000 + n);
}

// sphere_tp_erect(8605.8260828649654286)
{
float phi = src.s / 8605.8260828649654286;
float theta = -src.t / 8605.8260828649654286 +
1.5707963267948965580;
if (theta < 0.0) {
theta = -theta;
phi += 3.1415926535897931160;
}
if (theta > 3.1415926535897931160) {
theta = 3.1415926535897931160 - (theta -
3.1415926535897931160);
phi += 3.1415926535897931160;
}
float s = sin(theta);
vec2 v = vec2(s * sin(phi), cos(theta));
float r = length(v);
theta = 8605.8260828649654286 * atan2_safe(r, s * cos(phi));
src = v * (theta / r);
}

// persp_sphere(8605.8260828649654286)
{
mat3 m = mat3(-0.021348969898465505746,
-0.99861950017682066250, -0.047992867708350636646,
0.99977208476946100024, -0.021324357795218604888,
-0.0010248318628368971190,
0.0000000000000000000, -0.048003808507433361197,
0.99884715265589141264);
float r = length(src);
float theta = r / 8605.8260828649654286;
float s = 0.0;
if (r != 0.0) s = sin(theta) / r;
vec3 v = vec3(s * src.s, s * src.t, cos(theta));
vec3 u = v * m;
r = length(u.st);
theta = 0.0;
if (r != 0.0) theta = 8605.8260828649654286 * atan2_safe(r,
u.p) / r;
src = theta * u.st;
}

// rect_sphere_tp(8605.8260828649654286)
{
float r = length(src);
float theta = r / 8605.8260828649654286;
float rho = 0.0;
if (theta >= 1.5707963267948965580) rho = 1.6e16;
else if (theta == 0.0) rho = 1.0;
else rho = tan(theta) / theta;
src *= rho;
}

// resize(0.99565793672523394964, 0.99565793672523394964)
src *= vec2(0.99565793672523394964, 0.99565793672523394964);

// radial(1.0043634782470169942, 0.0000000000000000000, -0.0043634782470170496715, 0.0000000000000000000, 1157.0000000000000000, 8.7592802374617306782)
{
float r = length(src) / 1157.0000000000000000;
float scale = 1000.0;
if (r < 8.7592802374617306782) {
scale = ((0.0000000000000000000 * r + -0.0043634782470170496715) * r + 0.0000000000000000000) * r + 1.0043634782470169942;
}
src *= scale;
}

src += vec2(1736.5000000000000000, 1156.5000000000000000);

src = src * discardA + vec2(-1000.0, -1000.0) * discardB;
gl_FragColor = vec4(src.s, 0.0, 0.0, src.t);
}
#version 110
#extension GL_ARB_texture_rectangle : enable
uniform sampler2DRect CoordTexture;
uniform sampler2DRect SrcTexture;
uniform sampler2DRect AccumTexture;
uniform vec2 SrcUL;
uniform vec2 SrcLR;
uniform vec2 KernelUL;
uniform vec2 KernelWH;
float w(const in float i, const in float f) {
float c = (i < 16.000000000000000000) ? 1.0 : -1.0;
float x = c * (15.000000000000000000 - i + f);
vec2 xpi = vec2(x, x / 16.000000000000000000) * 3.1415926535897931160;
vec2 xsin = sin(xpi);
vec2 result = vec2(1.0, 1.0);
if (xpi.x != 0.0) result.x = xsin.x / xpi.x;
if (xpi.y != 0.0) result.y = xsin.y / xpi.y;
return result.x * result.y;
}
void main(void)
{
vec2 src = texture2DRect(CoordTexture, gl_TexCoord[0].st).sq;
vec4 accum = texture2DRect(AccumTexture, gl_TexCoord[0].st);

src -= SrcUL;
vec2 t = floor(src) + -14.500000000000000000;
vec2 f = fract(src);
vec2 k = vec2(0.0, 0.0);

for (float ky = 0.0; ky < 4.0000000000000000000; ky += 1.0) {
k.t = ky + KernelUL.t;
float wy = w(k.t, f.t);
for (float kx = 0.0; kx < 4.0000000000000000000; kx += 1.0) {
k.s = kx + KernelUL.s;
float wx = w(k.s, f.s);
vec2 ix = t + k;
vec4 sp = texture2DRect(SrcTexture, ix);
float weight = wx * wy * sp.a;
accum += sp * weight;
}
}

gl_FragColor = accum;
}

#version 110
#extension GL_ARB_texture_rectangle : enable
uniform sampler2DRect NormTexture;
uniform sampler2DRect CoordTexture;
uniform sampler2DRect InvLutTexture;
uniform sampler2DRect DestLutTexture;
void main(void)
{
// Normalization
vec4 n = texture2DRect(NormTexture, gl_TexCoord[0].st);
vec4 p = vec4(0.0, 0.0, 0.0, 0.0);
if (n.a >= 0.2) p = n / n.a;

// Photometric
// invLutSize = 256.00000000000000000
// pixelMax = 255.00000000000000000
// destLutSize = 1024.0000000000000000
// destExposure = 3.4434159776346751091e-05
// srcExposure = 3.4435263503927971915e-05
// whiteBalanceRed = 1.0000000000000000000
// whiteBalanceBlue = 1.0000000000000000000
p.rgb = p.rgb * 255.00000000000000000;
vec2 invR = texture2DRect(InvLutTexture, vec2(p.r, 0.0)).sq;
vec2 invG = texture2DRect(InvLutTexture, vec2(p.g, 0.0)).sq;
vec2 invB = texture2DRect(InvLutTexture, vec2(p.b, 0.0)).sq;
vec3 invX = vec3(invR.x, invG.x, invB.x);
vec3 invY = vec3(invR.y, invG.y, invB.y);
vec3 invA = fract(p.rgb);
p.rgb = mix(invX, invY, invA);
// VigCorrMode=VIGCORR_RADIAL
float vig = 1.0;
{
vec2 vigCorrCenter = vec2(1736.5000000000000000, 1156.5000000000000000);
float radiusScale=0.00047914298856256606332;
float radialVigCorrCoeff[4] = float[4](1.0000000000000000000, 0.0000000000000000000, 0.0000000000000000000, 0.0000000000000000000);
vec2 src = texture2DRect(CoordTexture, gl_TexCoord[0].st).sq;
vec2 d = src - vigCorrCenter;
d *= radiusScale;
vig = radialVigCorrCoeff[0];
float r2 = dot(d, d);
float r = r2;
vig += radialVigCorrCoeff[1] * r;
r *= r2;
vig += radialVigCorrCoeff[2] * r;
r *= r2;
vig += radialVigCorrCoeff[3] * r;
}
vec3 exposure_whitebalance = vec3(0.99996794775271302669, 0.99996794775271302669, 0.99996794775271302669);
p.rgb = (p.rgb * exposure_whitebalance) / vig;
p.rgb = p.rgb * 1023.0000000000000000;
vec2 destR = texture2DRect(DestLutTexture, vec2(p.r, 0.0)).sq;
vec2 destG = texture2DRect(DestLutTexture, vec2(p.g, 0.0)).sq;
vec2 destB = texture2DRect(DestLutTexture, vec2(p.b, 0.0)).sq;
vec3 destX = vec3(destR.x, destG.x, destB.x);
vec3 destY = vec3(destR.y, destG.y, destB.y);
vec3 destA = fract(p.rgb);
p.rgb = mix(destX, destY, destA);

gl_FragColor = p;
}

nona: normalization/photometric shader program could not be compiled.
nona: GL info log:
0(35) : error C7551: OpenGL first class arrays require #version 120
0(35) : error C7553: OpenGL array assignments require #version 120


any hints / help?

Yuv
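
A note on the `Dest chunks` lines in the log above: the 3397-row destination is split into pieces of 1699 and 1698 rows. The log does not show how nona decides how many chunks to use, so the sketch below takes the chunk count as a given input and only illustrates a near-even split that reproduces those numbers. The function name `split_evenly` is hypothetical and is not taken from the hugin source.

```python
def split_evenly(length, parts):
    """Split the range [0, length) into `parts` contiguous chunks.

    Chunk sizes differ by at most one; any remainder goes to the first chunks.
    The chunk count is an input here (assumption): the log does not reveal
    how nona chooses it.
    """
    base, extra = divmod(length, parts)
    chunks = []
    start = 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append((start, start + size))
        start += size
    return chunks


if __name__ == "__main__":
    # Rows of the 904x3397 destination from the log, split in two.
    print(split_evenly(3397, 2))
```

With a length of 3397 and two parts this yields the row ranges 0 to 1699 and 1699 to 3397, matching the two destination chunks printed by nona.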

Zoran Zorkic

Aug 5, 2009, 7:40:33 AM
to hugin and other free panoramic software


On Aug 5, 5:51 am, Yuval Levy <goo...@levy.ch> wrote:

> nona: normalization/photometric shader program could not be compiled.
> nona: GL info log:
> 0(35) : error C7551: OpenGL first class arrays require #version 120
> 0(35) : error C7553: OpenGL array assignments require #version 120
>
> any hints / help?
>
> Yuv

Funny, it does the same thing as the Windows version
(http://hugin.panotools.org/testing/hugin/nona.zip).

Using win Xp sp2 32bit, c2d 65...@3.4GHz, 4gb ram, nvidia 9800gt 512mb.
Gpu drivers: 178.24.


E:\projects\brascine\testN>nona -g -o gpu.tif "DSC_3314-DSC_3350-mk4.pto"
nona: using graphics card: NVIDIA Corporation GeForce 9800 GT/PCI/SSE2
destStart=[0, 1500]
destEnd=[10000, 3500]
destSize=[(10000, 2000)]
srcSize=[(4288, 2848)]
srcBuffer=04100020
srcAlphaBuffer=00000000
destBuffer=06400020
destAlphaBuffer=09D40020
destGLInternalFormat=GL_RGBA8
destGLFormat=GL_RGB
destGLType=GL_UNSIGNED_BYTE
srcGLInternalFormat=GL_RGBA8
srcGLFormat=GL_RGB
srcGLType=GL_UNSIGNED_BYTE
srcAlphaGLType=GL_BYTE
destAlphaGLType=GL_UNSIGNED_BYTE
warparound=0
needsAtanWorkaround=0
maxTextureSize=8192
Source chunks:
[(0, 0) to (4288, 2848) = (4288x2848)]
Dest chunks:
[(0, 0) to (2000, 2000) = (2000x2000)]
[(2000, 0) to (4000, 2000) = (2000x2000)]
[(4000, 0) to (6000, 2000) = (2000x2000)]
[(6000, 0) to (8000, 2000) = (2000x2000)]
[(8000, 0) to (10000, 2000) = (2000x2000)]
Total GPU memory used: 261485568
Interpolator chunks:
[(0, 0) to (4, 4) = (4x4)]
#version 110
#extension GL_ARB_texture_rectangle : enable
uniform sampler2DRect SrcTexture;
float sinh(const in float x) { return (exp(x) - exp(-x)) / 2.0; }
float cosh(const in float x) { return (exp(x) + exp(-x)) / 2.0; }
float atan2_xge0(const in float y, const in float x) {
return atan(y, x);
}
float atan2_safe(const in float y, const in float x) {
return atan(y, x);
}
float atan_safe(const in float yx) {
return atan(yx);
}
void main(void)
{
float discardA = 1.0;
float discardB = 0.0;
vec2 src = gl_TexCoord[0].st;
src -= vec2(5000.0000000000000000, 2500.0000000000000000);

// rotate_erect(5000.0000000000000000, 4578.8743051789161000)
{
src.s += 4578.8743051789161000;
float w = (abs(src.s) > 5000.0000000000000000) ? 1.0 : 0.0;
float n = (src.s < 0.0) ? 0.5 : -0.5;
src.s += w * -10000.000000000000000 * ceil(src.s / 10000.000000000000000 + n);
}

// sphere_tp_erect(1591.5494309189535000)
{
float phi = src.s / 1591.5494309189535000;
float theta = -src.t / 1591.5494309189535000 +
1.5707963267948966000;
if (theta < 0.0) {
theta = -theta;
phi += 3.1415926535897931000;
}
if (theta > 3.1415926535897931000) {
theta = 3.1415926535897931000 - (theta -
3.1415926535897931000);
phi += 3.1415926535897931000;
}
float s = sin(theta);
vec2 v = vec2(s * sin(phi), cos(theta));
float r = length(v);
theta = 1591.5494309189535000 * atan2_safe(r, s * cos(phi));
src = v * (theta / r);
}

// persp_sphere(1591.5494309189535000)
{
mat3 m = mat3(-0.027626354700433991000, -0.99960028450937488000, 0.0060046427656072869000,
0.99961831942295143000, -0.027625856271273803000, 0.00016594973068102922000,
0.00000000000000000000, 0.0060069354962137755000, 0.99998195820021885000);
float r = length(src);
float theta = r / 1591.5494309189535000;
float s = 0.0;
if (r != 0.0) s = sin(theta) / r;
vec3 v = vec3(s * src.s, s * src.t, cos(theta));
vec3 u = v * m;
r = length(u.st);
theta = 0.0;
if (r != 0.0) theta = 1591.5494309189535000 * atan2_safe(r,
u.p) / r;
src = theta * u.st;
}

// rect_sphere_tp(1591.5494309189535000)
{
float r = length(src);
float theta = r / 1591.5494309189535000;
float rho = 0.0;
if (theta >= 1.5707963267948966000) rho = 1.6e16;
else if (theta == 0.0) rho = 1.0;
else rho = tan(theta) / theta;
src *= rho;
}

// resize(2.0336701045386603000, 2.0336701045386603000)
src *= vec2(2.0336701045386603000, 2.0336701045386603000);

// radial(1.0260016106491627000, 0.015265566534749600000, -0.060843527862162197000, 0.019576350678249799000, 1424.0000000000000000, 1000.0000000000000000)
{
float r = length(src) / 1424.0000000000000000;
float scale = 1000.0;
if (r < 1000.0000000000000000) {
scale = ((0.019576350678249799000 * r + -0.060843527862162197000) * r + 0.015265566534749600000) * r + 1.0260016106491627000;
}
src *= scale;
}

// vert(-9.8650613876177893000)
src.t += -9.8650613876177893000;

// horiz(-53.086845233190999000)
src.s += -53.086845233190999000;

src += vec2(2143.5000000000000000, 1423.5000000000000000);

src = src * discardA + vec2(-1000.0, -1000.0) * discardB;
gl_FragColor = vec4(src.s, 0.0, 0.0, src.t);
}
#version 110
#extension GL_ARB_texture_rectangle : enable
uniform sampler2DRect CoordTexture;
uniform sampler2DRect SrcTexture;
uniform sampler2DRect AccumTexture;
uniform vec2 SrcUL;
uniform vec2 SrcLR;
uniform vec2 KernelUL;
uniform vec2 KernelWH;
float w(const in float i, const in float f) {
float A = -0.75000000000000000000;
float c = abs(i - 1.0);
float m = (i > 1.0) ? -1.0 : 1.0;
float p = c + m * f;
if (i == 1.0 || i == 2.0) {
return (( A + 2.0 )*p - ( A + 3.0 ))*p*p + 1.0;
} else {
return (( A * p - 5.0 * A ) * p + 8.0 * A ) * p - 4.0 * A;
}
}
void main(void)
{
vec2 src = texture2DRect(CoordTexture, gl_TexCoord[0].st).sq;
vec4 accum = texture2DRect(AccumTexture, gl_TexCoord[0].st);

src -= SrcUL;
vec2 t = floor(src) + -0.50000000000000000000;
// destExposure = 0.00012499863837714539000
// srcExposure = 0.00012500000593717716000
// whiteBalanceRed = 1.0000000000000000000
// whiteBalanceBlue = 1.0000000000000000000
p.rgb = p.rgb * 255.00000000000000000;
vec2 invR = texture2DRect(InvLutTexture, vec2(p.r, 0.0)).sq;
vec2 invG = texture2DRect(InvLutTexture, vec2(p.g, 0.0)).sq;
vec2 invB = texture2DRect(InvLutTexture, vec2(p.b, 0.0)).sq;
vec3 invX = vec3(invR.x, invG.x, invB.x);
vec3 invY = vec3(invR.y, invG.y, invB.y);
vec3 invA = fract(p.rgb);
p.rgb = mix(invX, invY, invA);
// VigCorrMode=VIGCORR_RADIAL
float vig = 1.0;
{
vec2 vigCorrCenter = vec2(2143.5000000000000000, 1423.5000000000000000);
float radiusScale=0.00038852865479009093000;
float radialVigCorrCoeff[4] = float[4](1.0000000000000000000, -0.12897649574525599000, 0.011528766859390600000, -0.046104517600347603000);
vec2 src = texture2DRect(CoordTexture, gl_TexCoord[0].st).sq;
vec2 d = src - vigCorrCenter;
d *= radiusScale;
vig = radialVigCorrCoeff[0];
float r2 = dot(d, d);
float r = r2;
vig += radialVigCorrCoeff[1] * r;
r *= r2;
vig += radialVigCorrCoeff[2] * r;
r *= r2;
vig += radialVigCorrCoeff[3] * r;
}
vec3 exposure_whitebalance = vec3(0.99998905952026551000, 0.99998905952026551000, 0.99998905952026551000);
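
The interpolation shader in this log defines a weight function `w(i, f)` with `A = -0.75`, which looks like the familiar four-tap cubic convolution kernel. As a rough check of what that function computes, here is a direct Python port of the shader's `w`. The names `cubic_weight` and `tap_weights` are made up for this sketch, and it only mirrors the GLSL shown in the log, not any other part of nona.

```python
A = -0.75  # same constant as `float A` in the shader's w() function


def cubic_weight(i, f):
    """Port of the shader's w(i, f): weight of tap i (0..3) at fractional offset f."""
    c = abs(i - 1.0)
    m = -1.0 if i > 1.0 else 1.0
    p = c + m * f  # distance from the sample position to tap i
    if i == 1.0 or i == 2.0:
        # Inner two taps (distance below 1).
        return ((A + 2.0) * p - (A + 3.0)) * p * p + 1.0
    # Outer two taps (distance between 1 and 2).
    return ((A * p - 5.0 * A) * p + 8.0 * A) * p - 4.0 * A


def tap_weights(f):
    """Weights of the four taps along one axis for fractional offset f."""
    return [cubic_weight(float(i), f) for i in range(4)]


if __name__ == "__main__":
    for f in (0.0, 0.25, 0.5):
        weights = tap_weights(f)
        print(f, weights, sum(weights))
```

For the offsets tried here the four weights add up to 1, which is what one would expect from an interpolation kernel that preserves constant images.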

Yuval Levy

Aug 5, 2009, 8:49:00 AM
to hugi...@googlegroups.com
Zoran Zorkic wrote:
> Funny, does the same thing as the windows version (http://
> hugin.panotools.org/testing/hugin/nona.zip).

thanks for testing, Zoran.


> nvidia 9800gt 512mb.
> Gpu drivers: 178.24.

We'll need a lot of reports like this. I was not sure whether it was just a
problem with my specific video card or a general one.

I fixed this (SVN4169). We need to provide a nona gpu binary for testing
of that (and I guess of a few other revisions in the future) on as broad
a set of GPUs as possible.
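
The thread does not say what changed in SVN4169. For context, the compiler error points at line 35 of the normalization/photometric shader, which (counting unwrapped source lines) is the `float radialVigCorrCoeff[4] = float[4](...)` declaration. Array constructors and whole-array assignment were introduced in GLSL 1.20, so a `#version 110` shader cannot use them. One possible way around this, shown purely as an illustration, is to declare the array and fill it element by element. The helper below generates such source text; `glsl110_float_array` is a hypothetical name and not part of hugin. Declaring `#version 120` instead would be another option on drivers that support it.

```python
def glsl110_float_array(name, values):
    """Return GLSL source lines that declare a float array and assign each element.

    This avoids the float[N](...) constructor, which needs #version 120.
    Values are assumed to be finite. repr(float(v)) always contains a '.' or
    an exponent, so each literal is a float constant and not an int.
    """
    lines = ["float %s[%d];" % (name, len(values))]
    for index, value in enumerate(values):
        lines.append("%s[%d] = %s;" % (name, index, repr(float(value))))
    return lines


if __name__ == "__main__":
    # Coefficients from the failing shader in the first post.
    for line in glsl110_float_array("radialVigCorrCoeff", [1.0, 0.0, 0.0, 0.0]):
        print(line)
```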

My GPU now compiles this part of the code but fails later with:

$ nona -g -o test.tif _MG_8768-_MG_8873.pto
nona: using graphics card: NVIDIA Corporation GeForce 6150/PCI/SSE2

<removed much of the verbosity>


gpu shader program compile time = 0.026
nona: GL error: Framebuffer incomplete, incomplete attachment in:
/home/yuv/src/hugin/src/hugin_base/vigra_ext/ImageTransformsGPU.cpp:700

we need more tests from more video cards to determine if it is a bug in
the code or if it is the specifics of my video card or even of my system
/ driver.

Yuv

Zoran Zorkic

Aug 5, 2009, 10:34:02 AM
to hugin and other free panoramic software
Well, for now you just need someone with an ATI card to test as only
nVidia and ATI have gpus that can deliver speedups.

I ran nona (hugin built yesterday) in Ubuntu under VMware and got this
far:
---------------------------------
$ nona -g -o nebo.tif nebo-dark.pto
nona: using graphics card: Mesa Project Software Rasterizer
destStart=[28001, 0]
destEnd=[31625, 4612]
destSize=[(3624, 4612)]
srcSize=[(4288, 2848)]
srcBuffer=0x67d23008
srcAlphaBuffer=0
destBuffer=0x64d51008
destAlphaBuffer=0x63d60008
destGLInternalFormat=GL_RGBA8
destGLFormat=GL_RGB
destGLType=GL_UNSIGNED_BYTE
srcGLInternalFormat=GL_RGBA8
srcGLFormat=GL_RGB
srcGLType=GL_UNSIGNED_BYTE
srcAlphaGLType=GL_BYTE
destAlphaGLType=GL_UNSIGNED_BYTE
warparound=0
needsAtanWorkaround=0
maxTextureSize=2048
-------------------------------------

It didn't crash, but it also didn't do much. After 2h I paused the VM.
Will let it run overnight, or try a smaller project :/

I would have tested more, but I only have nVidia GPUs in all my
computers/laptops :)

Lukáš Jirkovský

Aug 5, 2009, 11:31:26 AM
to hugi...@googlegroups.com
2009/8/5 Yuval Levy <goo...@levy.ch>:

Hi,
I'm getting exactly the same error. Graphics card is Asus nVidia 6200
LE and drivers are 190.16

Lukáš

Zoran Zorkic

Aug 5, 2009, 3:39:06 PM
to hugin and other free panoramic software
Got a friend to test on an ATI card.
XP sp3 32bit. ATI Radeon 4850 512mb.
Driver details:

Driver Packaging Version: 8.541-080923a-069992C-ATI
Catalyst® Version: 08.10
2D Driver Version: 6.14.10.6869
Direct3D Version: 6.14.10.0618
OpenGL Version: 6.14.10.8086
---------------------------

0.047
gpu dest chunk=[(0, 0) to (1312, 1857) = (1312x1857)] source chunk=[(0, 0) to (2144, 1424) = (2144x1424)] src upload = 0.032
gpu dest chunk=[(0, 0) to (1312, 1857) = (1312x1857)] source chunk=[(0, 0) to (2144, 1424) = (2144x1424)] src render = 0.015
gpu dest chunk=[(0, 0) to (1312, 1857) = (1312x1857)] source chunk=[(0, 0) to (2144, 1424) = (2144x1424)] interpolation chunk=[(0, 0) to (4, 4) = (4x4)] setup = 0
gpu dest chunk=[(0, 0) to (1312, 1857) = (1312x1857)] source chunk=[(0, 0) to (2144, 1424) = (2144x1424)] interpolation chunk=[(0, 0) to (4, 4) = (4x4)] render = 0.031
gpu dest chunk=[(0, 0) to (1312, 1857) = (1312x1857)] normalization setup = 0
gpu dest chunk=[(0, 0) to (1312, 1857) = (1312x1857)] normalization render = 0.016
gpu dest chunk=[(0, 0) to (1312, 1857) = (1312x1857)] dest rgb disassembly setup = 0
gpu dest chunk=[(0, 0) to (1312, 1857) = (1312x1857)] dest rgb disassembly render = 0
gpu dest chunk=[(0, 0) to (1312, 1857) = (1312x1857)] rgb readback = 0.547
gpu dest chunk=[(0, 0) to (1312, 1857) = (1312x1857)] dest alpha disassembly setup = 0.016
gpu dest chunk=[(0, 0) to (1312, 1857) = (1312x1857)] dest alpha disassembly render = 0
gpu dest chunk=[(0, 0) to (1312, 1857) = (1312x1857)] alpha readback = 0.203
gpu destruct time = 0
gpu total time = 1.656
destStart=[2246, 0]
destEnd=[3734, 1857]
destSize=[(1488, 1857)]
srcSize=[(2144, 1424)]
srcBuffer=087A0020
srcAlphaBuffer=00000000
destBuffer=09060020
destAlphaBuffer=0A310020
destGLInternalFormat=GL_RGBA8
destGLFormat=GL_RGB
destGLType=GL_UNSIGNED_BYTE
srcGLInternalFormat=GL_RGBA8
srcGLFormat=GL_RGB
srcGLType=GL_UNSIGNED_BYTE
srcAlphaGLType=GL_BYTE
destAlphaGLType=GL_UNSIGNED_BYTE
warparound=0
needsAtanWorkaround=1
maxTextureSize=8192
Source chunks:
[(0, 0) to (2144, 1424) = (2144x1424)]
Dest chunks:
[(0, 0) to (1488, 1857) = (1488x1857)]
Total GPU memory used: 142952896
Interpolator chunks:
[(0, 0) to (4, 4) = (4x4)]
#version 110
#extension GL_ARB_texture_rectangle : enable
uniform sampler2DRect SrcTexture;
float sinh(const in float x) { return (exp(x) - exp(-x)) / 2.0; }
float cosh(const in float x) { return (exp(x) + exp(-x)) / 2.0; }
float atan2_xge0(const in float y, const in float x) {
if (abs(y) > x) {
return sign(y) * (1.5707963267948966000 - atan(x, abs(y)));
} else {
return atan(y, x);
}
}
float atan2_safe(const in float y, const in float x) {
if (x >= 0.0) return atan2_xge0(y, x);
else return (sign(y) * 3.1415926535897931000) - atan2_xge0(y, -x);
}
float atan_safe(const in float yx) {
if (abs(yx) > 1.0) {
return sign(yx) * (1.5707963267948966000 - atan(1.0/abs(yx)));
} else {
return atan(yx);
}
}
void main(void)
{
float discardA = 1.0;
float discardB = 0.0;
vec2 src = gl_TexCoord[0].st;
src -= vec2(2350.0000000000000000, 928.50000000000000000);

// rotate_erect(5222.2222222222217000, -615.44724491426177000)
{
src.s += -615.44724491426177000;
float w = (abs(src.s) > 5222.2222222222217000) ? 1.0 : 0.0;
float n = (src.s < 0.0) ? 0.5 : -0.5;
src.s += w * -10444.444444444443000 * ceil(src.s / 10444.444444444443000 + n);
}

// sphere_tp_erect(1662.2849611820179000)
{
float phi = src.s / 1662.2849611820179000;
float theta = -src.t / 1662.2849611820179000 + 1.5707963267948966000;
if (theta < 0.0) {
theta = -theta;
phi += 3.1415926535897931000;
}
if (theta > 3.1415926535897931000) {
theta = 3.1415926535897931000 - (theta - 3.1415926535897931000);
phi += 3.1415926535897931000;
}
float s = sin(theta);
vec2 v = vec2(s * sin(phi), cos(theta));
float r = length(v);
theta = 1662.2849611820179000 * atan2_safe(r, s * cos(phi));
src = v * (theta / r);
}

// persp_sphere(1662.2849611820179000)
{
mat3 m = mat3(-0.0056712720158038649000, -0.99996699222001262000, 0.0058181736123949345000,
0.99998391820754939000, -0.0056711760223806616000, 3.2996975841580861000e-005,
0.00000000000000000000, 0.0058182671805601547000, 0.99998307374025863000);
float r = length(src);
float theta = r / 1662.2849611820179000;
float s = 0.0;
if (r != 0.0) s = sin(theta) / r;
vec3 v = vec3(s * src.s, s * src.t, cos(theta));
vec3 u = v * m;
r = length(u.st);
theta = 0.0;
if (r != 0.0) theta = 1662.2849611820179000 * atan2_safe(r, u.p) / r;
src = theta * u.st;
}

// rect_sphere_tp(1662.2849611820179000)
{
float r = length(src);
float theta = r / 1662.2849611820179000;
float rho = 0.0;
if (theta >= 1.5707963267948966000) rho = 1.6e16;
else if (theta == 0.0) rho = 1.0;
else rho = tan(theta) / theta;
src *= rho;
}

// resize(0.98018589452762850000, 0.98018589452762850000)
src *= vec2(0.98018589452762850000, 0.98018589452762850000);

// radial(1.0200684747885995000, 0.00000000000000000000, -0.020068474788599602000, 0.00000000000000000000, 712.00000000000000000, 4.1162036363733581000)
{
float r = length(src) / 712.00000000000000000;
float scale = 1000.0;
if (r < 4.1162036363733581000) {
scale = ((0.00000000000000000000 * r + -0.020068474788599602000) * r + 0.00000000000000000000) * r + 1.0200684747885995000;
}
src *= scale;
}

// vert(-8.2353767291722306000)
src.t += -8.2353767291722306000;

// horiz(-39.822211692411202000)
src.s += -39.822211692411202000;

src += vec2(1071.5000000000000000, 711.50000000000000000);

src = src * discardA + vec2(-1000.0, -1000.0) * discardB;
gl_FragColor = vec4(src.s, 0.0, 0.0, src.t);
}
nona: GL info log:
Fragment shader was successfully compiled to run on hardware.

nona: GL info log:
Fragment shader(s) linked, no vertex shader(s) defined.
nona: GL info log:
Fragment shader was successfully compiled to run on hardware.

nona: GL info log:
Fragment shader(s) linked, no vertex shader(s) defined.
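
This log reports `needsAtanWorkaround=1`, and the coordinate shader above accordingly defines `atan2_xge0` and `atan2_safe` with extra branches, where the NVIDIA logs earlier in the thread simply call `atan(y, x)`. The thread does not explain why the workaround is needed. The sketch below is a Python port of the two functions as printed, useful only for checking that they agree with a reference `atan2`. `glsl_sign` is a helper added here to mimic GLSL's `sign()`.

```python
import math


def glsl_sign(x):
    """Mimic GLSL sign(): 1.0, -1.0, or 0.0 for a zero argument."""
    if x > 0.0:
        return 1.0
    if x < 0.0:
        return -1.0
    return 0.0


def atan2_xge0(y, x):
    """Port of the shader's atan2_xge0, intended for x >= 0."""
    if abs(y) > x:
        # Steep angles: compute the complementary angle instead.
        return glsl_sign(y) * (math.pi / 2.0 - math.atan2(x, abs(y)))
    return math.atan2(y, x)


def atan2_safe(y, x):
    """Port of the shader's atan2_safe.

    Note: for y == 0 and x < 0 this returns 0.0 where math.atan2 returns pi,
    because sign(0.0) is 0.0.
    """
    if x >= 0.0:
        return atan2_xge0(y, x)
    return glsl_sign(y) * math.pi - atan2_xge0(y, -x)


if __name__ == "__main__":
    for y, x in [(1.0, 1.0), (2.0, 1.0), (1.0, -1.0), (-1.0, -1.0)]:
        print(y, x, atan2_safe(y, x), math.atan2(y, x))
```

Away from the negative x axis the port matches `math.atan2` to within rounding error.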


#version 110
#extension GL_ARB_texture_rectangle : enable
uniform sampler2DRect NormTexture;
uniform sampler2DRect CoordTexture;
uniform sampler2DRect InvLutTexture;
uniform sampler2DRect DestLutTexture;
void main(void)
{
// Normalization
vec4 n = texture2DRect(NormTexture, gl_TexCoord[0].st);
vec4 p = vec4(0.0, 0.0, 0.0, 0.0);
if (n.a >= 0.2) p = n / n.a;

// Photometric
// invLutSize = 256.00000000000000000
// pixelMax = 255.00000000000000000
// destLutSize = 1024.0000000000000000
// destExposure = 0.00012499863837714539000
// srcExposure = 0.00012434661782106230000
// whiteBalanceRed = 1.0000000000000000000
// whiteBalanceBlue = 1.0000000000000000000
p.rgb = p.rgb * 255.00000000000000000;
vec2 invR = texture2DRect(InvLutTexture, vec2(p.r, 0.0)).sq;
vec2 invG = texture2DRect(InvLutTexture, vec2(p.g, 0.0)).sq;
vec2 invB = texture2DRect(InvLutTexture, vec2(p.b, 0.0)).sq;
vec3 invX = vec3(invR.x, invG.x, invB.x);
vec3 invY = vec3(invR.y, invG.y, invB.y);
vec3 invA = fract(p.rgb);
p.rgb = mix(invX, invY, invA);
// VigCorrMode=VIGCORR_RADIAL
float vig = 1.0;
{
vec2 vigCorrCenter = vec2(1071.5000000000000000, 711.50000000000000000);
float radiusScale=0.00077705730958018185000;
float radialVigCorrCoeff[4] = float[4](1.0000000000000000000, -0.18632890308207500000, 0.15276620308634001000, -0.12052132655603000000);
vec2 src = texture2DRect(CoordTexture, gl_TexCoord[0].st).sq;
vec2 d = src - vigCorrCenter;
d *= radiusScale;
vig = radialVigCorrCoeff[0];
float r2 = dot(d, d);
float r = r2;
vig += radialVigCorrCoeff[1] * r;
r *= r2;
vig += radialVigCorrCoeff[2] * r;
r *= r2;
vig += radialVigCorrCoeff[3] * r;
}
vec3 exposure_whitebalance = vec3(1.0052435729053875000, 1.0052435729053875000, 1.0052435729053875000);
p.rgb = (p.rgb * exposure_whitebalance) / vig;
p.rgb = p.rgb * 1023.0000000000000000;
vec2 destR = texture2DRect(DestLutTexture, vec2(p.r, 0.0)).sq;
vec2 destG = texture2DRect(DestLutTexture, vec2(p.g, 0.0)).sq;
vec2 destB = texture2DRect(DestLutTexture, vec2(p.b, 0.0)).sq;
vec3 destX = vec3(destR.x, destG.x, destB.x);
vec3 destY = vec3(destR.y, destG.y, destB.y);
vec3 destA = fract(p.rgb);
p.rgb = mix(destX, destY, destA);

gl_FragColor = p;
}

nona: GL info log:
Fragment shader was successfully compiled to run on hardware.

nona: GL info log:
Fragment shader(s) linked, no vertex shader(s) defined.
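
The `VIGCORR_RADIAL` block of the photometric shader in this log evaluates a polynomial in the squared, scaled distance from `vigCorrCenter`: `vig = c0 + c1*r^2 + c2*r^4 + c3*r^6`, and the pixel value is then divided by `vig`. The following is a minimal Python sketch of that evaluation, following the shader's loop structure. The function name and argument layout are assumptions made for this example.

```python
def radial_vignetting(src, center, radius_scale, coeffs):
    """Evaluate the shader's radial vignetting factor at source position `src`.

    coeffs[k] multiplies (r^2)^k, where r is the scaled distance from `center`.
    """
    dx = (src[0] - center[0]) * radius_scale
    dy = (src[1] - center[1]) * radius_scale
    r2 = dx * dx + dy * dy  # dot(d, d) in the shader
    vig = coeffs[0]
    r = r2
    for c in coeffs[1:]:
        vig += c * r
        r *= r2
    return vig


if __name__ == "__main__":
    # Values taken from the shader printed in this log.
    center = (1071.5, 711.5)
    scale = 0.00077705730958018185
    coeffs = [1.0, -0.186328903082075, 0.15276620308634001, -0.12052132655603]
    print(radial_vignetting(center, center, scale, coeffs))      # image center
    print(radial_vignetting((0.0, 0.0), center, scale, coeffs))  # a corner
```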

Gerry Patterson

Aug 5, 2009, 10:21:55 PM
to hugi...@googlegroups.com
On Tue, Aug 4, 2009 at 10:51 PM, Yuval Levy <goo...@levy.ch> wrote:

> Hi all
>
> I just tested the latest SVN on my workstation with the following GPU
> (see diagnostic commands and output below) and it fails. Is this my GPU
> or do we have a bug in the code? This is on Ubuntu 64bit, and I've even
> tried with 9.10 (alpha2) because it has a slightly newer GLEW.
> <--snip-->
>
> nona: normalization/photometric shader program could not be compiled.
> nona: GL info log:
> 0(35) : error C7551: OpenGL first class arrays require #version 120
> 0(35) : error C7553: OpenGL array assignments require #version 120
>
> any hints / help?
>
> Yuv

Hello

I haven't been able to get this to work either.  Below is the output from my test run:

01:00.0 VGA compatible controller: nVidia Corporation NV44A [GeForce 6200] (rev a1)

OpenGL vendor string: NVIDIA Corporation
OpenGL renderer string: GeForce 6200/AGP/SSE2
OpenGL version string: 2.1.2 NVIDIA 173.14.16

OpenGL shading language version string: 1.20 NVIDIA via Cg compiler
OpenGL extensions:
GL_ARB_color_buffer_float:                                     OK
GL_ARB_depth_texture:                                          OK
GL_ARB_draw_buffers:                                           OK

GL_ARB_fragment_program:                                       OK
GL_ARB_fragment_program_shadow:                                OK
GL_ARB_fragment_shader:                                        OK
GL_ARB_half_float_pixel:                                       OK
GL_ARB_imaging:                                                OK

GL_ARB_matrix_palette:                                         MISSING
GL_ARB_multisample:                                            OK
GL_ARB_multitexture:                                           OK
GL_ARB_occlusion_query:                                        OK
GL_ARB_pixel_buffer_object:                                    OK
GL_ARB_point_parameters:                                       OK
GL_ARB_point_sprite:                                           OK
GL_ARB_shader_objects:                                         OK
GL_ARB_shading_language_100:                                   OK
GL_ARB_shadow:                                                 OK
GL_ARB_shadow_ambient:                                         MISSING
GL_ARB_texture_border_clamp:                                   OK
GL_ARB_texture_compression:                                    OK

GL_ARB_texture_cube_map:                                       OK
GL_ARB_texture_env_add:                                        OK
GL_ARB_texture_env_combine:                                    OK
GL_ARB_texture_env_crossbar:                                   MISSING
GL_ARB_texture_env_dot3:                                       OK
GL_ARB_texture_float:                                          OK
GL_ARB_texture_mirrored_repeat:                                OK
GL_ARB_texture_non_power_of_two:                               OK
GL_ARB_texture_rectangle:                                      OK
GL_ARB_transpose_matrix:                                       OK

GL_ARB_vertex_blend:                                           MISSING
GL_ARB_vertex_buffer_object:                                   OK
GL_ARB_vertex_program:                                         OK
GL_ARB_vertex_shader:                                          OK
GL_ARB_window_pos:                                             OK

$ nona -g -o testgpu.tif mesquite_tree2.pto                                                                                                                    
nona: using graphics card: NVIDIA Corporation GeForce 6200/AGP/SSE2            
destStart=[30, 1541]                                                           
destEnd=[3070, 5439]                                                           
destSize=[(3040, 3898)]                                                        
srcSize=[(2592, 3456)]                                                         
srcBuffer=0xb2e35008                                                           
srcAlphaBuffer=0                                                               
destBuffer=0xb0c4d008                                                          
destAlphaBuffer=0xb00ff008                                                     
destGLInternalFormat=GL_RGBA8                                                  
destGLFormat=GL_RGB                                                            
destGLType=GL_UNSIGNED_BYTE                                                    
srcGLInternalFormat=GL_RGBA8                                                   
srcGLFormat=GL_RGB                                                             
srcGLType=GL_UNSIGNED_BYTE                                                     
srcAlphaGLType=GL_BYTE                                                         
destAlphaGLType=GL_UNSIGNED_BYTE                                               
warparound=0                                                                   
needsAtanWorkaround=0                                                          
maxTextureSize=4096                                                            
Source chunks:                                                                 
    [(0, 0) to (2592, 3456) = (2592x3456)]                                     
Dest chunks:                                                                   
    [(0, 0) to (1520, 1949) = (1520x1949)]                                     
    [(1520, 0) to (3040, 1949) = (1520x1949)]                                  
    [(0, 1949) to (1520, 3898) = (1520x1949)]                                  
    [(1520, 1949) to (3040, 3898) = (1520x1949)]                               
Total GPU memory used: 193054784                                               
Interpolator chunks:                                                           
    [(0, 0) to (4, 4) = (4x4)]                                                 
#version 110                                                                   
#extension GL_ARB_texture_rectangle : enable                                   
uniform sampler2DRect SrcTexture;                                              
float sinh(const in float x) { return (exp(x) - exp(-x)) / 2.0; }              
float cosh(const in float x) { return (exp(x) + exp(-x)) / 2.0; }              
float atan2_xge0(const in float y, const in float x) {                         
    return atan(y, x);                                                         
}                                                                              
float atan2_safe(const in float y, const in float x) {                         
    return atan(y, x);                                                         
}                                                                              
float atan_safe(const in float yx) {                                           
    return atan(yx);                                                           
}                                                                              
void main(void)                                                                
{                                                                              
    float discardA = 1.0;                                                      
    float discardB = 0.0;                                                      
    vec2 src = gl_TexCoord[0].st;                                              
    src -= vec2(1532.0000000000000000, 2719.5000000000000000);                 

    // erect_rect(3609.1658244419895709)
    src.t = 3609.1658244419895709 * atan2_xge0(src.t, length(vec2(3609.1658244419895709, src.s)));                                                             
    src.s = 3609.1658244419895709 * atan2_safe(src.s, 3609.1658244419895709);  

    // rotate_erect(11338.528839654303738, 341.46517850647347814)
    {                                                           
        src.s += 341.46517850647347814;                         
        float w = (abs(src.s) > 11338.528839654303738) ? 1.0 : 0.0;

        float n = (src.s < 0.0) ? 0.5 : -0.5;                     
        src.s += w * -22677.057679308607476 * ceil(src.s / 22677.057679308607476 + n);                                                                         
    }                                                                          

    // sphere_tp_erect(3609.1658244419895709)
    {                                       
        float phi = src.s / 3609.1658244419895709;
        float theta = -src.t / 3609.1658244419895709 + 1.5707963267948965580;

        if (theta < 0.0) {                                                  
            theta = -theta;                                                 
            phi += 3.1415926535897931160;                                   
        }                                                                   
        if (theta > 3.1415926535897931160) {                                
            theta = 3.1415926535897931160 - (theta - 3.1415926535897931160);
            phi += 3.1415926535897931160;                                   
        }                                                                   
        float s = sin(theta);                                               
        vec2 v = vec2(s * sin(phi), cos(theta));                            
        float r = length(v);                                                
        theta = 3609.1658244419895709 * atan2_safe(r, s * cos(phi));        
        src = v * (theta / r);                                              
    }                                                                       

    // persp_sphere(3609.1658244419895709)
    {                                    
        mat3 m = mat3(0.99995112409883735172, 0.0097154758706127688356, -0.0018327416837042632709,                                                             
                      -0.0098868303045842789029, 0.98262038678561747229, -0.18536295762587515212,                                                              
                      0.0000000000000000000, 0.18537201785029791545, 0.98266841558997353179);                                                                  
        float r = length(src);                                                 
        float theta = r / 3609.1658244419895709;                               
        float s = 0.0;                                                         
        if (r != 0.0) s = sin(theta) / r;                                      
        vec3 v = vec3(s * src.s, s * src.t, cos(theta));                       
        vec3 u = v * m;                                                        
        r = length(u.st);                                                      
        theta = 0.0;                                                           
        if (r != 0.0) theta = 3609.1658244419895709 * atan2_safe(r, u.p) / r;  
        src = theta * u.st;                                                    
    }                                                                          

    // rect_sphere_tp(3609.1658244419895709)
    {                                      
        float r = length(src);             
        float theta = r / 3609.1658244419895709;

        float rho = 0.0;                       
        if (theta >= 1.5707963267948965580) rho = 1.6e16;
        else if (theta == 0.0) rho = 1.0;               
        else rho = tan(theta) / theta;                  
        src *= rho;                                     
    }                                                   

    // resize(1.0295119402367802763, 1.0295119402367802763)
    src *= vec2(1.0295119402367802763, 1.0295119402367802763);

    // radial(1.0335018091364118753, -0.052253722266097701876, 0.047922340913515902583, -0.029170427783829999679, 1296.0000000000000000, 2.3984429359381937985)
    {                                                                          
        float r = length(src) / 1296.0000000000000000;                         
        float scale = 1000.0;                                                  
        if (r < 2.3984429359381937985) {                                       
            scale = ((-0.029170427783829999679 * r + 0.047922340913515902583) * r + -0.052253722266097701876) * r + 1.0335018091364118753;                     
        }                                                                      
        src *= scale;                                                          
    }                                                                          

    // vert(28.494643239323899309)
    src.t += 28.494643239323899309;

    // horiz(-337.67985974931099236)
    src.s += -337.67985974931099236;

    src += vec2(1295.5000000000000000, 1727.5000000000000000);


    src = src * discardA + vec2(-1000.0, -1000.0) * discardB;
    gl_FragColor = vec4(src.s, 0.0, 0.0, src.t);            
}                                                           
#version 110                                                
#extension GL_ARB_texture_rectangle : enable                
uniform sampler2DRect CoordTexture;                         
uniform sampler2DRect SrcTexture;                           
uniform sampler2DRect AccumTexture;                         
uniform vec2 SrcUL;                                         
uniform vec2 SrcLR;                                         
uniform vec2 KernelUL;                                      
uniform vec2 KernelWH;                                      
float w(const in float i, const in float f) {               
    float A = -0.75000000000000000000;                      
    float c = abs(i - 1.0);                                 
    float m = (i > 1.0) ? -1.0 : 1.0;                       
    float p = c + m * f;                                    
    if (i == 1.0 || i == 2.0) {                             
        return (( A + 2.0 )*p - ( A + 3.0 ))*p*p + 1.0;     
    } else {                                                
        return (( A * p - 5.0 * A ) * p + 8.0 * A ) * p - 4.0 * A;
    }                                                            
}                                                                
void main(void)                                                  
{                                                                
    vec2 src = texture2DRect(CoordTexture, gl_TexCoord[0].st).sq;
    vec4 accum = texture2DRect(AccumTexture, gl_TexCoord[0].st); 

    src -= SrcUL;
    vec2 t = floor(src) + -0.50000000000000000000;

    vec2 f = fract(src);                         
    vec2 k = vec2(0.0, 0.0);                     

    for (float ky = 0.0; ky < 4.0000000000000000000; ky += 1.0) {
        k.t = ky + KernelUL.t;                                  
        float wy = w(k.t, f.t);                                 
        for (float kx = 0.0; kx < 4.0000000000000000000; kx += 1.0) {
            k.s = kx + KernelUL.s;                                  
            float wx = w(k.s, f.s);                                 
            vec2 ix = t + k;                                        
            vec4 sp = texture2DRect(SrcTexture, ix);                
            float weight = wx * wy * sp.a;                          
            accum += sp * weight;                                   
        }                                                           
    }                                                               

    gl_FragColor = accum;
}                       

#version 120

#extension GL_ARB_texture_rectangle : enable
uniform sampler2DRect NormTexture;         
uniform sampler2DRect CoordTexture;        
uniform sampler2DRect InvLutTexture;       
uniform sampler2DRect DestLutTexture;      
void main(void)                            
{                                          
    // Normalization                       
    vec4 n = texture2DRect(NormTexture, gl_TexCoord[0].st);
    vec4 p = vec4(0.0, 0.0, 0.0, 0.0);                    
    if (n.a >= 0.2) p = n / n.a;                          

    // Photometric
    // invLutSize = 256.00000000000000000
    // pixelMax = 255.00000000000000000
    // destLutSize = 1024.0000000000000000
    // destExposure = 0.00083051854295874060971
    // srcExposure = 0.0010884354309159897500

    // whiteBalanceRed = 1.0000000000000000000
    // whiteBalanceBlue = 1.0000000000000000000
    p.rgb = p.rgb * 255.00000000000000000;
    vec2 invR = texture2DRect(InvLutTexture, vec2(p.r, 0.0)).sq;
    vec2 invG = texture2DRect(InvLutTexture, vec2(p.g, 0.0)).sq;
    vec2 invB = texture2DRect(InvLutTexture, vec2(p.b, 0.0)).sq;
    vec3 invX = vec3(invR.x, invG.x, invB.x);
    vec3 invY = vec3(invR.y, invG.y, invB.y);
    vec3 invA = fract(p.rgb);
    p.rgb = mix(invX, invY, invA);
    // VigCorrMode=VIGCORR_RADIAL
    float vig = 1.0;
    {
        vec2 vigCorrCenter = vec2(1295.5000000000000000, 1727.5000000000000000);
        float radiusScale=0.00046296296296296298063;
        float radialVigCorrCoeff[4] = float[4](1.0000000000000000000, -0.015456798166679400208, -0.65901557955445200232, 0.42564979175476702622);

        vec2 src = texture2DRect(CoordTexture, gl_TexCoord[0].st).sq;
        vec2 d = src - vigCorrCenter;
        d *= radiusScale;
        vig = radialVigCorrCoeff[0];
        float r2 = dot(d, d);
        float r = r2;
        vig += radialVigCorrCoeff[1] * r;
        r *= r2;
        vig += radialVigCorrCoeff[2] * r;
        r *= r2;
        vig += radialVigCorrCoeff[3] * r;
    }
    vec3 exposure_whitebalance = vec3(0.76303887154776361967, 0.76303887154776361967, 0.76303887154776361967);

    p.rgb = (p.rgb * exposure_whitebalance) / vig;
    p.rgb = p.rgb * 1023.0000000000000000;
    vec2 destR = texture2DRect(DestLutTexture, vec2(p.r, 0.0)).sq;
    vec2 destG = texture2DRect(DestLutTexture, vec2(p.g, 0.0)).sq;
    vec2 destB = texture2DRect(DestLutTexture, vec2(p.b, 0.0)).sq;
    vec3 destX = vec3(destR.x, destG.x, destB.x);
    vec3 destY = vec3(destR.y, destG.y, destB.y);
    vec3 destA = fract(p.rgb);
    p.rgb = mix(destX, destY, destA);

    gl_FragColor = p;
}

gpu shader program compile time = 0.08
nona: GL error: Framebuffer incomplete, incomplete attachment in: /home/gpatters/work_area/hugin-git/myrepo/src/hugin_base/vigra_ext/ImageTransformsGPU.cpp:700


I haven't quite figured out what this means yet. This seems to be a different error from yours. I am running Kubuntu 9.04 i386 with Hugin SVN Rev 4169.
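
For readers trying to interpret the "Framebuffer incomplete, incomplete attachment" message above: it corresponds to one of the status codes that a framebuffer completeness check can return. The following is only a minimal lookup sketch, not nona's code. The numeric values are the ones defined by the framebuffer object extension; the helper name `describe_framebuffer_status` and the hint texts are assumptions made for illustration.

```python
# Status codes defined by EXT_framebuffer_object (same values in OpenGL 3.0).
FRAMEBUFFER_STATUS = {
    0x8CD5: "GL_FRAMEBUFFER_COMPLETE",
    0x8CD6: "GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT",
    0x8CD7: "GL_FRAMEBUFFER_INCOMPLETE_MISSING_ATTACHMENT",
    0x8CD9: "GL_FRAMEBUFFER_INCOMPLETE_DIMENSIONS_EXT",
    0x8CDA: "GL_FRAMEBUFFER_INCOMPLETE_FORMATS_EXT",
    0x8CDB: "GL_FRAMEBUFFER_INCOMPLETE_DRAW_BUFFER",
    0x8CDC: "GL_FRAMEBUFFER_INCOMPLETE_READ_BUFFER",
    0x8CDD: "GL_FRAMEBUFFER_UNSUPPORTED",
}

# Hypothetical hints (general guidance, not taken from the Hugin sources).
HINTS = {
    0x8CD6: "an attached texture is not usable as a render target "
            "(for example zero-sized, or a format the driver cannot render to)",
    0x8CDD: "the combination of attachment formats is not supported by this driver",
}


def describe_framebuffer_status(code):
    """Return (symbolic name, hint) for a framebuffer status code."""
    name = FRAMEBUFFER_STATUS.get(code, "unknown status 0x%04X" % code)
    hint = HINTS.get(code, "no specific hint")
    return name, hint


if __name__ == "__main__":
    print(describe_framebuffer_status(0x8CD6))
```

If this reading is right, the thing to check would be which texture format is attached at the point where the error is raised, since support for render targets varies between GPUs and drivers.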

- Gerry




Yuval Levy

Aug 5, 2009, 10:27:49 PM8/5/09
to hugi...@googlegroups.com
Gerry Patterson wrote:
> gpu shader program compile time = 0.08
> nona: GL error: Framebuffer incomplete, incomplete attachment in:
> /home/gpatters/work_area/hugin-git/myrepo/src/hugin_base/vigra_ext/ImageTransformsGPU.cpp:700
>
>
> I haven't quite figured out what this means yet. This seems to be a
> different error than yours. I am running kubuntu 9.04 i386 with Hugin SVN
> Rev 4169

Actually, this is exactly the same error I get after I thought I had
fixed the bug in Rev 4169.

With the same hardware and the same revision, though, I get a different
error in Windows, most likely because the driver there does not
support version 1.20 of the shading language:

nona: normalization/photometric shader program could not be compiled.
nona: GL info log:

(1) : error C0201: unsupported version 120
(3) : warning C7508: extension GL_ARB_texture_rectangle not supported
(3) : warning C7506: OpenGL does not define the global type sampler2DRect
(10) : warning C7506: OpenGL does not define the global function texture2DRect
(35) : error C0000: syntax error, unexpected '[', expecting '(' at token "["
(35) : error C0501: type name expected at token "["
(35) : error C1068: too much data in type constructor
(35) : error C1033: cast not allowed
(35) : error C1056: invalid initialization
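
The syntax errors reported at line 35 are consistent with the `float[4](...)` array constructor in the normalization/photometric shader, a construct that GLSL only added in version 1.20. One way to anticipate this failure is to compare the driver's shading language version string (the value `glxinfo` prints as "OpenGL shading language version string") against the `#version` a shader requests. The sketch below is an illustration only: the function names are hypothetical, and it assumes the string starts with `major.minor` using a two-digit minor, as in "1.20 NVIDIA via Cg compiler".

```python
import re


def parse_glsl_version(version_string):
    """Return (major, minor) from a shading language version string."""
    match = re.match(r"\s*(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version_string)
    return int(match.group(1)), int(match.group(2))


def supports_version_directive(version_string, directive):
    """True if the driver's reported version covers `#version <directive>`.

    Assumes a two-digit minor number, so "1.20" compares as 120.
    """
    major, minor = parse_glsl_version(version_string)
    return major * 100 + minor >= directive


if __name__ == "__main__":
    reported = "1.20 NVIDIA via Cg compiler"
    print(supports_version_directive(reported, 120))
```

A driver reporting "1.10" would fail this check for `#version 120`, which matches the "unsupported version 120" error above.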

Yuv

Yuval Levy

Aug 5, 2009, 10:32:16 PM8/5/09
to hugi...@googlegroups.com
Hi Zoran,


Zoran Zorkic wrote:
> Got a friend to test on an ATI card.

thanks for this.

<cut a long chunk of nona output>


> nona: GL info log:
> Fragment shader was successfully compiled to run on hardware.
>
> nona: GL info log:
> Fragment shader(s) linked, no vertex shader(s) defined.

I'm not sure what to make of this. Is there any image at the end of the process?
And I guess this is still with the first nona-gpu binary by Guido?

I've just built nona-gpu Rev. 4169 on Windows (yes, my tool chain in
Windows is up and running again, kind of)

It's available for download at <http://www.photopla.net/hugin/nona_4169.7z>

Looking forward to further test reports.

Yuv

Yuval Levy

Aug 5, 2009, 10:37:15 PM8/5/09
to hugi...@googlegroups.com
Zoran Zorkic wrote:
> Well, for now you just need someone with an ATI card to test as only
> nVidia and ATI have gpus that can deliver speedups.

If only it were that easy...

There are so many variations of nVidia and ATI cards, and of drivers,
out there, and plenty of factors that could influence the success or
failure of GPU stitching with nona.

So we need to test broadly (look at my test: same hardware, two
different systems, two different errors), and we need to collect and
document results so that when we make this broadly available we can give
guidance that is as precise as possible about which
hardware/driver/system combinations will work.
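
To make reports like the ones in this thread easier to compare, the console output of `nona -g` could be reduced to a few fields. The following is a rough sketch under stated assumptions: it relies only on line formats that appear in the logs posted here ("nona: using graphics card:", "maxTextureSize=", "gpu total time =", "nona: GL error:", "... could not be compiled."), and the function name and the result keys are made up for illustration.

```python
import re


def summarize_nona_log(text):
    """Pull a few comparable fields out of `nona -g` console output."""
    card = re.search(r"^nona: using graphics card: (.+?)\s*$", text, re.MULTILINE)
    max_tex = re.search(r"^maxTextureSize=(\d+)\s*$", text, re.MULTILINE)
    # One "gpu total time" line is printed per remapped image.
    totals = re.findall(r"^gpu total time = ([0-9.]+)\s*$", text, re.MULTILINE)
    errors = re.findall(
        r"^nona: (GL error: .*?|.*? could not be compiled\.)\s*$",
        text,
        re.MULTILINE,
    )
    return {
        "card": card.group(1) if card else None,
        "max_texture_size": int(max_tex.group(1)) if max_tex else None,
        "images_remapped": len(totals),
        "total_gpu_time": sum(float(t) for t in totals),
        "errors": errors,
    }


if __name__ == "__main__":
    import sys
    # Usage: nona -g -o out project.pto 2>&1 | python summarize.py
    print(summarize_nona_log(sys.stdin.read()))
```

The operating system and driver version are not part of the nona output, so they would still have to be added to each report by hand.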

Yuv

Zoran Zorkic

Aug 6, 2009, 1:12:20 AM8/6/09
to hugin and other free panoramic software


> > nona: GL info log:
> > Fragment shader was successfully compiled to run on hardware.
>
> > nona: GL info log:
> > Fragment shader(s) linked, no vertex shader(s) defined.
>
> I'm not sure what to make of this. Any image at the end of the process?
> and I guess this is still with the first nona-gpu binary by Guido?

Yup.

> I've just built nona-gpu Rev. 4169 on Windows (yes, my tool chain in
> Windows is up and running again, kind of)
>
> It's available for download at <http://www.photopla.net/hugin/nona_4169.7z>
> Looking forward for further test reports.

No problem. Glad I can help out.
4169 gives me this on my system:
----------------------------------------------------------
C:\nona-test>nona -g -o tst test-gui.pto
nona: using graphics card: NVIDIA Corporation GeForce 9800 GT/PCI/SSE2
destStart=[3394, 0]
destEnd=[4706, 1857]
destSize=[(1312, 1857)]
srcSize=[(2144, 1424)]
srcBuffer=06470020
srcAlphaBuffer=00000000
destBuffer=06D30020
destAlphaBuffer=07430020
destGLInternalFormat=GL_RGBA8
destGLFormat=GL_RGB
destGLType=GL_UNSIGNED_BYTE
srcGLInternalFormat=GL_RGBA8
srcGLFormat=GL_RGB
srcGLType=GL_UNSIGNED_BYTE
srcAlphaGLType=GL_BYTE
destAlphaGLType=GL_UNSIGNED_BYTE
warparound=0
needsAtanWorkaround=0
maxTextureSize=8192
Source chunks:
[(0, 0) to (2144, 1424) = (2144x1424)]
Dest chunks:
[(0, 0) to (1312, 1857) = (1312x1857)]
Total GPU memory used: 128572288
Interpolator chunks:
[(0, 0) to (4, 4) = (4x4)]
#version 110
#extension GL_ARB_texture_rectangle : enable
uniform sampler2DRect SrcTexture;
float sinh(const in float x) { return (exp(x) - exp(-x)) / 2.0; }
float cosh(const in float x) { return (exp(x) + exp(-x)) / 2.0; }
float atan2_xge0(const in float y, const in float x) {
return atan(y, x);
}
float atan2_safe(const in float y, const in float x) {
return atan(y, x);
}
float atan_safe(const in float yx) {
return atan(yx);
}
void main(void)
{
float discardA = 1.0;
float discardB = 0.0;
vec2 src = gl_TexCoord[0].st;
src -= vec2(2350.0000000000000000, 928.50000000000000000);

// rotate_erect(5222.2222222222217000, -1767.9272076346463000)
{
src.s += -1767.9272076346463000;
float w = (abs(src.s) > 5222.2222222222217000) ? 1.0 : 0.0;
float n = (src.s < 0.0) ? 0.5 : -0.5;
src.s += w * -10444.444444444443000 * ceil(src.s / 10444.444444444443000 + n);
}

// sphere_tp_erect(1662.2849611820179000)
{
float phi = src.s / 1662.2849611820179000;
float theta = -src.t / 1662.2849611820179000 + 1.5707963267948966000;
if (theta < 0.0) {
theta = -theta;
phi += 3.1415926535897931000;
}
if (theta > 3.1415926535897931000) {
theta = 3.1415926535897931000 - (theta - 3.1415926535897931000);
phi += 3.1415926535897931000;
}
float s = sin(theta);
vec2 v = vec2(s * sin(phi), cos(theta));
float r = length(v);
theta = 1662.2849611820179000 * atan2_safe(r, s * cos(phi));
src = v * (theta / r);
}

// persp_sphere(1662.2849611820179000)
{
mat3 m = mat3(-0.0013991153339531414000, -0.99995332803644821000, 0.0095595096691022414000,
0.99999902123766216000, -0.0013990514038320933000, 1.3374869653932957000e-005,
0.00000000000000000000, 0.0095595190255994313000, 0.99995430675406327000);
#version 120
#extension GL_ARB_texture_rectangle : enable
uniform sampler2DRect NormTexture;
uniform sampler2DRect CoordTexture;
uniform sampler2DRect InvLutTexture;
uniform sampler2DRect DestLutTexture;
void main(void)
{
// Normalization
vec4 n = texture2DRect(NormTexture, gl_TexCoord[0].st);
vec4 p = vec4(0.0, 0.0, 0.0, 0.0);
if (n.a >= 0.2) p = n / n.a;

// Photometric
// invLutSize = 256.00000000000000000
// pixelMax = 255.00000000000000000
// destLutSize = 1024.0000000000000000
// destExposure = 0.00012499863837714539000
// srcExposure = 0.00012500000593717716000
vec3 exposure_whitebalance = vec3(0.99998905952026551000, 0.99998905952026551000, 0.99998905952026551000);
p.rgb = (p.rgb * exposure_whitebalance) / vig;
p.rgb = p.rgb * 1023.0000000000000000;
vec2 destR = texture2DRect(DestLutTexture, vec2(p.r, 0.0)).sq;
vec2 destG = texture2DRect(DestLutTexture, vec2(p.g, 0.0)).sq;
vec2 destB = texture2DRect(DestLutTexture, vec2(p.b, 0.0)).sq;
vec3 destX = vec3(destR.x, destG.x, destB.x);
vec3 destY = vec3(destR.y, destG.y, destB.y);
vec3 destA = fract(p.rgb);
p.rgb = mix(destX, destY, destA);

gl_FragColor = p;
}

gpu shader program compile time = 0.172
gpu shader texture/framebuffer setup time = 0.031
gpu dest chunk=[(0, 0) to (1312, 1857) = (1312x1857)] coord image render time = 0
gpu dest chunk=[(0, 0) to (1312, 1857) = (1312x1857)] source chunk=[(0, 0) to (2144, 1424) = (2144x1424)] src upload = 0.016
gpu dest chunk=[(0, 0) to (1312, 1857) = (1312x1857)] source chunk=[(0, 0) to (2144, 1424) = (2144x1424)] src render = 0.016
gpu dest chunk=[(0, 0) to (1312, 1857) = (1312x1857)] source chunk=[(0, 0) to (2144, 1424) = (2144x1424)] interpolation chunk=[(0, 0) to (4, 4) = (4x4)] setup = 0
gpu dest chunk=[(0, 0) to (1312, 1857) = (1312x1857)] source chunk=[(0, 0) to (2144, 1424) = (2144x1424)] interpolation chunk=[(0, 0) to (4, 4) = (4x4)] render = 0.015
gpu dest chunk=[(0, 0) to (1312, 1857) = (1312x1857)] normalization setup = 0
gpu dest chunk=[(0, 0) to (1312, 1857) = (1312x1857)] normalization render = 0.016
gpu dest chunk=[(0, 0) to (1312, 1857) = (1312x1857)] dest rgb disassembly setup = 0
gpu dest chunk=[(0, 0) to (1312, 1857) = (1312x1857)] dest rgb disassembly render = 0
gpu dest chunk=[(0, 0) to (1312, 1857) = (1312x1857)] rgb readback = 0.015
gpu dest chunk=[(0, 0) to (1312, 1857) = (1312x1857)] dest alpha disassembly setup = 0
gpu dest chunk=[(0, 0) to (1312, 1857) = (1312x1857)] dest alpha disassembly render = 0
gpu dest chunk=[(0, 0) to (1312, 1857) = (1312x1857)] alpha readback = 0
gpu destruct time = 0.016
gpu total time = 0.297
destStart=[2246, 0]
destEnd=[3734, 1857]
destSize=[(1488, 1857)]
srcSize=[(2144, 1424)]
srcBuffer=06470020
srcAlphaBuffer=00000000
destBuffer=06D30020
destAlphaBuffer=07A90020
destGLInternalFormat=GL_RGBA8
destGLFormat=GL_RGB
destGLType=GL_UNSIGNED_BYTE
srcGLInternalFormat=GL_RGBA8
srcGLFormat=GL_RGB
srcGLType=GL_UNSIGNED_BYTE
srcAlphaGLType=GL_BYTE
destAlphaGLType=GL_UNSIGNED_BYTE
warparound=0
needsAtanWorkaround=0
maxTextureSize=8192
Source chunks:
[(0, 0) to (2144, 1424) = (2144x1424)]
Dest chunks:
[(0, 0) to (1488, 1857) = (1488x1857)]
Total GPU memory used: 142952896
Interpolator chunks:
[(0, 0) to (4, 4) = (4x4)]
#version 110
#extension GL_ARB_texture_rectangle : enable
uniform sampler2DRect SrcTexture;
float sinh(const in float x) { return (exp(x) - exp(-x)) / 2.0; }
float cosh(const in float x) { return (exp(x) + exp(-x)) / 2.0; }
float atan2_xge0(const in float y, const in float x) {
return atan(y, x);
}
float atan2_safe(const in float y, const in float x) {
return atan(y, x);
}
float atan_safe(const in float yx) {
#version 120
C:\nona-test>
----------------------

I'll try to get ATI results later today.

Ryan Sleevi

Aug 6, 2009, 2:00:50 AM8/6/09
to hugi...@googlegroups.com
Compiled my own x64 version (using the Glut For Win32 sources -
http://www.xmission.com/~nate/glut.html ), as I wanted to see if there was
going to be any headache on Windows. Some wonky macros, but nothing fatal.
In order to get CMake to find Glut, I had to touch my FindGLUT macro, but I
imagine that's because of the use of Glut for Win32.

I'll see what other video cards I can scrape up and see how they handle.

Is there some particular set of test data to use? Just threw some simple
stuff at it, nothing complex, but it all turned out fine.

Test System: Vista x64 (SP2)
Video Card: GeForce 8800 GTS (256 mb)
Video BIOS: 60.80.0D.00.01
Video Driver: 186.18
RAM: 8GB
Proc: C2D 6600 @ 2.40 GHz

SVN: 4169
Summary: No problems - Image was adjusted as expected

Log:

nona: using graphics card: NVIDIA Corporation GeForce 8800 GTS/PCI/SSE2
destStart=[17, 50]
destEnd=[3001, 2600]
destSize=[(2984, 2550)]
srcSize=[(4296, 2856)]
srcBuffer=0000000005DB0040
srcAlphaBuffer=0000000009DF0040
destBuffer=00000000080D0040
destAlphaBuffer=00000000096A0040


destGLInternalFormat=GL_RGBA8
destGLFormat=GL_RGB
destGLType=GL_UNSIGNED_BYTE
srcGLInternalFormat=GL_RGBA8
srcGLFormat=GL_RGB
srcGLType=GL_UNSIGNED_BYTE
srcAlphaGLType=GL_UNSIGNED_BYTE
destAlphaGLType=GL_UNSIGNED_BYTE
warparound=0
needsAtanWorkaround=0

maxTextureSize=8192
Source chunks:
[(0, 0) to (4296, 2856) = (4296x2856)]
Dest chunks:
[(0, 0) to (1492, 1275) = (1492x1275)]
[(1492, 0) to (2984, 1275) = (1492x1275)]
[(0, 1275) to (1492, 2550) = (1492x1275)]
[(1492, 1275) to (2984, 2550) = (1492x1275)]
Total GPU memory used: 181856208


Interpolator chunks:
[(0, 0) to (4, 4) = (4x4)]

#version 110
#extension GL_ARB_texture_rectangle : enable
uniform sampler2DRect SrcTexture;
float sinh(const in float x) { return (exp(x) - exp(-x)) / 2.0; }
float cosh(const in float x) { return (exp(x) + exp(-x)) / 2.0; }
float atan2_xge0(const in float y, const in float x) {
return atan(y, x);
}
float atan2_safe(const in float y, const in float x) {
return atan(y, x);
}
float atan_safe(const in float yx) {
return atan(yx);
}
void main(void)
{
float discardA = 1.0;
float discardB = 0.0;
vec2 src = gl_TexCoord[0].st;

src -= vec2(1500.0000000000000000, 1300.0000000000000000);

// rotate_erect(18000.000000000000000, -13.341048242108400000)
{
src.s += -13.341048242108400000;
float w = (abs(src.s) > 18000.000000000000000) ? 1.0 : 0.0;


float n = (src.s < 0.0) ? 0.5 : -0.5;

src.s += w * -36000.000000000000000 * ceil(src.s / 36000.000000000000000 + n);
}

// sphere_tp_erect(5729.5779513082325000)
{
float phi = src.s / 5729.5779513082325000;
float theta = -src.t / 5729.5779513082325000 + 1.5707963267948966000;


if (theta < 0.0) {
theta = -theta;

phi += 3.1415926535897931000;
}
if (theta > 3.1415926535897931000) {
theta = 3.1415926535897931000 - (theta - 3.1415926535897931000);
phi += 3.1415926535897931000;
}

float s = sin(theta);
vec2 v = vec2(s * sin(phi), cos(theta));
float r = length(v);

theta = 5729.5779513082325000 * atan2_safe(r, s * cos(phi));


src = v * (theta / r);
}

// persp_sphere(5729.5779513082325000)
{
mat3 m = mat3(0.86726875127917369000, 0.49776314112882103000, -0.0087617571429537341000,
-0.49784024852824299000, 0.86713442538204355000, -0.015263527203445419000,
0.00000000000000000000, 0.017599535531440003000, 0.99984511618003991000);
float r = length(src);
float theta = r / 5729.5779513082325000;


float s = 0.0;
if (r != 0.0) s = sin(theta) / r;
vec3 v = vec3(s * src.s, s * src.t, cos(theta));
vec3 u = v * m;
r = length(u.st);
theta = 0.0;

if (r != 0.0) theta = 5729.5779513082325000 * atan2_safe(r, u.p) / r;
src = theta * u.st;
}

// rect_sphere_tp(5729.5779513082325000)
{
float r = length(src);
float theta = r / 5729.5779513082325000;
float rho = 0.0;
if (theta >= 1.5707963267948966000) rho = 1.6e16;


else if (theta == 0.0) rho = 1.0;
else rho = tan(theta) / theta;
src *= rho;
}

// resize(1.6728386297652815000, 1.6728386297652815000)
src *= vec2(1.6728386297652815000, 1.6728386297652815000);

src += vec2(2144.5000000000000000, 1427.5000000000000000);

src = src * discardA + vec2(-1000.0, -1000.0) * discardB;
gl_FragColor = vec4(src.s, 0.0, 0.0, src.t);
}
#version 110
#extension GL_ARB_texture_rectangle : enable
uniform sampler2DRect CoordTexture;
uniform sampler2DRect SrcTexture;
uniform sampler2DRect AccumTexture;
uniform vec2 SrcUL;
uniform vec2 SrcLR;
uniform vec2 KernelUL;
uniform vec2 KernelWH;
float w(const in float i, const in float f) {

float A = -0.75000000000000000000;
float c = abs(i - 1.0);
float m = (i > 1.0) ? -1.0 : 1.0;
float p = c + m * f;
if (i == 1.0 || i == 2.0) {
return (( A + 2.0 )*p - ( A + 3.0 ))*p*p + 1.0;
} else {
return (( A * p - 5.0 * A ) * p + 8.0 * A ) * p - 4.0 * A;
}
}

void main(void)
{
vec2 src = texture2DRect(CoordTexture, gl_TexCoord[0].st).sq;
vec4 accum = texture2DRect(AccumTexture, gl_TexCoord[0].st);

src -= SrcUL;
vec2 t = floor(src) + -0.50000000000000000000;


vec2 f = fract(src);
vec2 k = vec2(0.0, 0.0);

for (float ky = 0.0; ky < 4.0000000000000000000; ky += 1.0) {
k.t = ky + KernelUL.t;
float wy = w(k.t, f.t);
for (float kx = 0.0; kx < 4.0000000000000000000; kx += 1.0) {
k.s = kx + KernelUL.s;
float wx = w(k.s, f.s);
vec2 ix = t + k;
vec4 sp = texture2DRect(SrcTexture, ix);
float weight = wx * wy * sp.a;
accum += sp * weight;
}
}

gl_FragColor = accum;
}

#version 120


#extension GL_ARB_texture_rectangle : enable
uniform sampler2DRect NormTexture;
uniform sampler2DRect CoordTexture;
uniform sampler2DRect InvLutTexture;
uniform sampler2DRect DestLutTexture;
void main(void)
{
// Normalization
vec4 n = texture2DRect(NormTexture, gl_TexCoord[0].st);
vec4 p = vec4(0.0, 0.0, 0.0, 0.0);
if (n.a >= 0.2) p = n / n.a;

// Photometric
// invLutSize = 256.00000000000000000
// pixelMax = 255.00000000000000000
// destLutSize = 1024.0000000000000000

// destExposure = 0.95663269200564105000
// srcExposure = 0.95663269262595030000


// whiteBalanceRed = 1.0000000000000000000
// whiteBalanceBlue = 1.0000000000000000000
p.rgb = p.rgb * 255.00000000000000000;
vec2 invR = texture2DRect(InvLutTexture, vec2(p.r, 0.0)).sq;
vec2 invG = texture2DRect(InvLutTexture, vec2(p.g, 0.0)).sq;
vec2 invB = texture2DRect(InvLutTexture, vec2(p.b, 0.0)).sq;
vec3 invX = vec3(invR.x, invG.x, invB.x);
vec3 invY = vec3(invR.y, invG.y, invB.y);
vec3 invA = fract(p.rgb);
p.rgb = mix(invX, invY, invA);
// VigCorrMode=VIGCORR_RADIAL
float vig = 1.0;
{

vec2 vigCorrCenter = vec2(2144.5000000000000000, 1427.5000000000000000);
float radiusScale=0.00038806915548755783000;


float radialVigCorrCoeff[4] = float[4](1.0000000000000000000, 0.00000000000000000000, 0.00000000000000000000, 0.00000000000000000000);


vec2 src = texture2DRect(CoordTexture, gl_TexCoord[0].st).sq;
vec2 d = src - vigCorrCenter;
d *= radiusScale;
vig = radialVigCorrCoeff[0];
float r2 = dot(d, d);
float r = r2;
vig += radialVigCorrCoeff[1] * r;
r *= r2;
vig += radialVigCorrCoeff[2] * r;
r *= r2;
vig += radialVigCorrCoeff[3] * r;
}

vec3 exposure_whitebalance = vec3(0.99999999935157013000,
0.99999999935157013000, 0.99999999935157013000);


p.rgb = (p.rgb * exposure_whitebalance) / vig;
p.rgb = p.rgb * 1023.0000000000000000;
vec2 destR = texture2DRect(DestLutTexture, vec2(p.r, 0.0)).sq;
vec2 destG = texture2DRect(DestLutTexture, vec2(p.g, 0.0)).sq;
vec2 destB = texture2DRect(DestLutTexture, vec2(p.b, 0.0)).sq;
vec3 destX = vec3(destR.x, destG.x, destB.x);
vec3 destY = vec3(destR.y, destG.y, destB.y);
vec3 destA = fract(p.rgb);
p.rgb = mix(destX, destY, destA);

gl_FragColor = p;
}

gpu shader program compile time = 0.046
gpu shader texture/framebuffer setup time = 0
gpu dest chunk=[(0, 0) to (1492, 1275) = (1492x1275)] coord image render
time = 0.016
gpu dest chunk=[(0, 0) to (1492, 1275) = (1492x1275)] source chunk=[(0, 0)
to (4296, 2856) = (4296x2856)] src upload = 0.062
gpu dest chunk=[(0, 0) to (1492, 1275) = (1492x1275)] source chunk=[(0, 0)
to (4296, 2856) = (4296x2856)] src alpha upload = 0.016
gpu dest chunk=[(0, 0) to (1492, 1275) = (1492x1275)] source chunk=[(0, 0)
to (4296, 2856) = (4296x2856)] src+alpha render = 0.047
gpu dest chunk=[(0, 0) to (1492, 1275) = (1492x1275)] source chunk=[(0, 0)
to (4296, 2856) = (4296x2856)] interpolation chunk=[(0, 0) to (4, 4) =
(4x4)] setup = 0.015
gpu dest chunk=[(0, 0) to (1492, 1275) = (1492x1275)] source chunk=[(0, 0)
to (4296, 2856) = (4296x2856)] interpolation chunk=[(0, 0) to (4, 4) =
(4x4)] render = 0
gpu dest chunk=[(0, 0) to (1492, 1275) = (1492x1275)] normalization setup =
0
gpu dest chunk=[(0, 0) to (1492, 1275) = (1492x1275)] normalization render =
0.016
gpu dest chunk=[(0, 0) to (1492, 1275) = (1492x1275)] dest rgb disassembly
setup = 0
gpu dest chunk=[(0, 0) to (1492, 1275) = (1492x1275)] dest rgb disassembly
render = 0
gpu dest chunk=[(0, 0) to (1492, 1275) = (1492x1275)] rgb readback = 0.016
gpu dest chunk=[(0, 0) to (1492, 1275) = (1492x1275)] dest alpha disassembly
setup = 0
gpu dest chunk=[(0, 0) to (1492, 1275) = (1492x1275)] dest alpha disassembly
render = 0
gpu dest chunk=[(0, 0) to (1492, 1275) = (1492x1275)] alpha readback = 0
gpu dest chunk=[(1492, 0) to (2984, 1275) = (1492x1275)] coord image render
time = 0
gpu dest chunk=[(1492, 0) to (2984, 1275) = (1492x1275)] source chunk=[(0,
0) to (4296, 2856) = (4296x2856)] interpolation chunk=[(0, 0) to (4, 4) =
(4x4)] setup = 0
gpu dest chunk=[(1492, 0) to (2984, 1275) = (1492x1275)] source chunk=[(0,
0) to (4296, 2856) = (4296x2856)] interpolation chunk=[(0, 0) to (4, 4) =
(4x4)] render = 0.015
gpu dest chunk=[(1492, 0) to (2984, 1275) = (1492x1275)] normalization setup
= 0
gpu dest chunk=[(1492, 0) to (2984, 1275) = (1492x1275)] normalization
render = 0
gpu dest chunk=[(1492, 0) to (2984, 1275) = (1492x1275)] dest rgb
disassembly setup = 0
gpu dest chunk=[(1492, 0) to (2984, 1275) = (1492x1275)] dest rgb
disassembly render = 0
gpu dest chunk=[(1492, 0) to (2984, 1275) = (1492x1275)] rgb readback =
0.016
gpu dest chunk=[(1492, 0) to (2984, 1275) = (1492x1275)] dest alpha
disassembly setup = 0
gpu dest chunk=[(1492, 0) to (2984, 1275) = (1492x1275)] dest alpha
disassembly render = 0
gpu dest chunk=[(1492, 0) to (2984, 1275) = (1492x1275)] alpha readback = 0
gpu dest chunk=[(0, 1275) to (1492, 2550) = (1492x1275)] coord image render
time = 0
gpu dest chunk=[(0, 1275) to (1492, 2550) = (1492x1275)] source chunk=[(0,
0) to (4296, 2856) = (4296x2856)] interpolation chunk=[(0, 0) to (4, 4) =
(4x4)] setup = 0
gpu dest chunk=[(0, 1275) to (1492, 2550) = (1492x1275)] source chunk=[(0,
0) to (4296, 2856) = (4296x2856)] interpolation chunk=[(0, 0) to (4, 4) =
(4x4)] render = 0.015
gpu dest chunk=[(0, 1275) to (1492, 2550) = (1492x1275)] normalization setup
= 0
gpu dest chunk=[(0, 1275) to (1492, 2550) = (1492x1275)] normalization
render = 0
gpu dest chunk=[(0, 1275) to (1492, 2550) = (1492x1275)] dest rgb
disassembly setup = 0
gpu dest chunk=[(0, 1275) to (1492, 2550) = (1492x1275)] dest rgb
disassembly render = 0
gpu dest chunk=[(0, 1275) to (1492, 2550) = (1492x1275)] rgb readback =
0.016
gpu dest chunk=[(0, 1275) to (1492, 2550) = (1492x1275)] dest alpha
disassembly setup = 0
gpu dest chunk=[(0, 1275) to (1492, 2550) = (1492x1275)] dest alpha
disassembly render = 0
gpu dest chunk=[(0, 1275) to (1492, 2550) = (1492x1275)] alpha readback = 0
gpu dest chunk=[(1492, 1275) to (2984, 2550) = (1492x1275)] coord image
render time = 0
gpu dest chunk=[(1492, 1275) to (2984, 2550) = (1492x1275)] source
chunk=[(0, 0) to (4296, 2856) = (4296x2856)] interpolation chunk=[(0, 0) to
(4, 4) = (4x4)] setup = 0.016
gpu dest chunk=[(1492, 1275) to (2984, 2550) = (1492x1275)] source
chunk=[(0, 0) to (4296, 2856) = (4296x2856)] interpolation chunk=[(0, 0) to
(4, 4) = (4x4)] render = 0
gpu dest chunk=[(1492, 1275) to (2984, 2550) = (1492x1275)] normalization
setup = 0
gpu dest chunk=[(1492, 1275) to (2984, 2550) = (1492x1275)] normalization
render = 0.015
gpu dest chunk=[(1492, 1275) to (2984, 2550) = (1492x1275)] dest rgb
disassembly setup = 0
gpu dest chunk=[(1492, 1275) to (2984, 2550) = (1492x1275)] dest rgb
disassembly render = 0
gpu dest chunk=[(1492, 1275) to (2984, 2550) = (1492x1275)] rgb readback = 0
gpu dest chunk=[(1492, 1275) to (2984, 2550) = (1492x1275)] dest alpha
disassembly setup = 0.016
gpu dest chunk=[(1492, 1275) to (2984, 2550) = (1492x1275)] dest alpha
disassembly render = 0
gpu dest chunk=[(1492, 1275) to (2984, 2550) = (1492x1275)] alpha readback =
0
gpu destruct time = 0.047
gpu total time = 0.39
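
A quick way to sanity-check a timing log like the one above is to add up the individual stage times and compare them with the reported total. The following is a minimal sketch, not part of nona; the function names are made up for illustration, and it assumes every record has the form "gpu <label> = <seconds>" as in the pasted output.

```python
import re

# One timing record looks like "gpu <label> = <seconds>". Labels can contain
# " = " themselves (e.g. "(0, 0) to (4, 4) = (4x4)"), so the value must be a
# number that is followed by the next "gpu " record or by the end of the text.
RECORD = re.compile(r"gpu (.+?) = ([0-9.]+)(?= gpu |$)")


def parse_timings(log_text):
    """Return (label, seconds) pairs from a pasted nona -g timing log."""
    # The logs are wrapped by the mail client, so collapse all whitespace
    # (including line breaks) into single spaces before matching.
    flat = " ".join(log_text.split())
    return [(label, float(value)) for label, value in RECORD.findall(flat)]


def stage_sum_and_total(log_text):
    """Sum every stage except the final 'total time' record."""
    timings = parse_timings(log_text)
    stages = sum(sec for label, sec in timings if label != "total time")
    total = dict(timings).get("total time")
    return stages, total


# Shortened, hypothetical sample in the same format as the log above.
sample = """gpu shader program compile time = 0.046
gpu dest chunk=[(0, 0) to (1492, 1275) = (1492x1275)] source chunk=[(0, 0)
to (4296, 2856) = (4296x2856)] src upload = 0.062
gpu dest chunk=[(0, 0) to (1492, 1275) = (1492x1275)] normalization setup =
0
gpu destruct time = 0.047
gpu total time = 0.155"""

stages, total = stage_sum_and_total(sample)
print(round(stages, 3), total)
```

For the full log above, the stage times add up to the reported total of 0.39. Most values are 0 or roughly 0.015 to 0.016, which may simply reflect a coarse timer rather than the real cost of each stage.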

Guido Kohlmeyer

Aug 6, 2009, 6:36:11 AM8/6/09
to hugi...@googlegroups.com
Dear Ryan,

I committed an updated CMakeLists.txt file to find GLUT in the SDK. I set
only the root search path of GLUT there.
In my current working environment I created a base directory "glut" in
the root of the SDK directory tree. The include file glut.h resides in
.\glut\include\GL\glut.h, which is the default layout of the sources. The
library glut32.lib resides in .\glut\Release\glut32.lib, which is the
location where the default FindGLUT from CMake 2.6.4 searches for the
library. That is all that is needed on the win32 platform. I used the same
sources from Nate Robins to build the static GLUT library (see also the
thread "Re: The way to 2009.2").

Guido

Ryan Sleevi wrote:

Yuval Levy

Aug 6, 2009, 8:05:15 AM8/6/09
to hugi...@googlegroups.com
Ryan Sleevi wrote:
> Test System: Vista x64 (SP2)
> Video Card: GeForce 8800 GTS (256 mb)
> Video BIOS: 60.80.0D.00.01
> Video Driver: 186.18
> RAM: 8GB
> Proc: C2D 6600 @ 2.40 GHz
>
> SVN: 4169
> Summary: No problems - Image was adjusted as expected

THANKS FOR THE GOOD NEWS, Ryan!

keep them coming.

Yuv

Yuval Levy

Aug 6, 2009, 8:08:00 AM8/6/09
to hugi...@googlegroups.com
Hi Zoran,

Zoran Zorkic wrote:
>> I'm not sure what to make of this. Any image at the end of the process?
>> and I guess this is still with the first nona-gpu binary by Guido?
>
> Yup.

yup = there was an image at the end of the process? Is it as expected?


> No problem. Glad I can help out.
> 4169 gives me this on my system:

I see no error message. was there a resulting image? does it look as
expected?


> I'll try to get ATI results later today.

thanks for your effort, Zoran.

Yuv

Yuval Levy

Aug 6, 2009, 8:10:53 AM8/6/09
to hugi...@googlegroups.com
Guido Kohlmeyer wrote:
> I committed an updated CMakeLists.txt file to find GLUT in the SDK.

Thank you, dear Guido and Ryan.

any update of <http://wiki.panotools.org/Hugin_SDK_(MSVC_2008)> needed?

Yuv

Guido Kohlmeyer

Aug 6, 2009, 8:24:04 AM8/6/09
to hugi...@googlegroups.com
Yuval Levy wrote:
Yes, of course. I have to add a description of how to generate the library
from the source package with MSVC 2008 EE. An updated SDK package is
mandatory too. I wanted to wait until some developers confirm that nona
works when built against the static library. Because my graphics card is
too old, I cannot run such tests myself. Based on the first impressions in
this thread, it seems that the library works, or, put the other way around,
there are situations where it works, so the nona build is not broken in
general.
I will update the SDK in a few days or hours ...

Guido

Zoran Zorkic

Aug 6, 2009, 11:35:25 AM8/6/09
to hugin and other free panoramic software


On Aug 6, 2:08 pm, Yuval Levy <goo...@levy.ch> wrote:
> Hi Zoran,
>
> Zoran Zorkic wrote:
> >> I'm not sure what to make of this. Any image at the end of the process?
> >> and I guess this is still with the first nona-gpu binary by Guido?
>
> > Yup.
>
> yup = image at the end of the process? is it as expected.

Ah, me posting before 2 cups of coffee :)
First nona-gpu binary on ATI gpu. It crashed, no image produced :)

> > No problem. Glad I can help out.
> > 4169 gives me this on my system:
>
> I see no error message. was there a resulting image? does it look as
> expected?

Crashed, no image produced.

> > I'll try to get ATI results later today.

Got results from ATI 4850 xp 32bit, nona 4169:
still crashes, no image produced.
---------------------------
G:\nona-test>nona -g -o test test-gui.pto
nona: using graphics card: ATI Technologies Inc. ATI Radeon HD 4800 Series
destStart=[3394, 0]
destEnd=[4706, 1857]
destSize=[(1312, 1857)]
srcSize=[(2144, 1424)]
srcBuffer=087B0020
srcAlphaBuffer=00000000
destBuffer=09070020
destAlphaBuffer=09770020
destGLInternalFormat=GL_RGBA8
destGLFormat=GL_RGB
destGLType=GL_UNSIGNED_BYTE
srcGLInternalFormat=GL_RGBA8
srcGLFormat=GL_RGB
srcGLType=GL_UNSIGNED_BYTE
srcAlphaGLType=GL_BYTE
destAlphaGLType=GL_UNSIGNED_BYTE
warparound=0
needsAtanWorkaround=1
maxTextureSize=8192
Source chunks:
[(0, 0) to (2144, 1424) = (2144x1424)]
Dest chunks:
[(0, 0) to (1312, 1857) = (1312x1857)]
Total GPU memory used: 128572288
Interpolator chunks:
[(0, 0) to (4, 4) = (4x4)]
#version 110
#extension GL_ARB_texture_rectangle : enable
uniform sampler2DRect SrcTexture;
float sinh(const in float x) { return (exp(x) - exp(-x)) / 2.0; }
float cosh(const in float x) { return (exp(x) + exp(-x)) / 2.0; }
float atan2_xge0(const in float y, const in float x) {
if (abs(y) > x) {
return sign(y) * (1.5707963267948966000 - atan(x, abs(y)));
} else {
return atan(y, x);
}
}
float atan2_safe(const in float y, const in float x) {
if (x >= 0.0) return atan2_xge0(y, x);
else return (sign(y) * 3.1415926535897931000) - atan2_xge0(y, -x);
}
float atan_safe(const in float yx) {
if (abs(yx) > 1.0) {
return sign(yx) * (1.5707963267948966000 - atan(1.0/abs(yx)));
} else {
nona: GL info log:
Fragment shader was successfully compiled to run on hardware.

nona: GL info log:
Fragment shader(s) linked, no vertex shader(s) defined.


nona: GL info log:
Fragment shader was successfully compiled to run on hardware.

nona: GL info log:
Fragment shader(s) linked, no vertex shader(s) defined.


nona: GL info log:
Fragment shader was successfully compiled to run on hardware.

nona: GL info log:
Fragment shader(s) linked, no vertex shader(s) defined.


gpu shader program compile time = 0.547
gpu shader texture/framebuffer setup time = 0.203
gpu dest chunk=[(0, 0) to (1312, 1857) = (1312x1857)] coord image render time = 0.078
gpu dest chunk=[(0, 0) to (1312, 1857) = (1312x1857)] source chunk=[(0, 0) to (2144, 1424) = (2144x1424)] src upload = 0.031
gpu dest chunk=[(0, 0) to (1312, 1857) = (1312x1857)] source chunk=[(0, 0) to (2144, 1424) = (2144x1424)] src render = 0.016
gpu dest chunk=[(0, 0) to (1312, 1857) = (1312x1857)] source chunk=[(0, 0) to (2144, 1424) = (2144x1424)] interpolation chunk=[(0, 0) to (4, 4) = (4x4)] setup = 0
gpu dest chunk=[(0, 0) to (1312, 1857) = (1312x1857)] source chunk=[(0, 0) to (2144, 1424) = (2144x1424)] interpolation chunk=[(0, 0) to (4, 4) = (4x4)] render = 0.016
gpu dest chunk=[(0, 0) to (1312, 1857) = (1312x1857)] normalization setup = 0.016
gpu dest chunk=[(0, 0) to (1312, 1857) = (1312x1857)] normalization render = 0.015
gpu dest chunk=[(0, 0) to (1312, 1857) = (1312x1857)] dest rgb disassembly setup = 0
gpu dest chunk=[(0, 0) to (1312, 1857) = (1312x1857)] dest rgb disassembly render = 0
gpu dest chunk=[(0, 0) to (1312, 1857) = (1312x1857)] rgb readback = 0.563
gpu dest chunk=[(0, 0) to (1312, 1857) = (1312x1857)] dest alpha disassembly setup = 0.015
gpu dest chunk=[(0, 0) to (1312, 1857) = (1312x1857)] dest alpha disassembly render = 0
gpu dest chunk=[(0, 0) to (1312, 1857) = (1312x1857)] alpha readback = 0.219
gpu destruct time = 0
gpu total time = 1.734
destStart=[2246, 0]
destEnd=[3734, 1857]
destSize=[(1488, 1857)]
srcSize=[(2144, 1424)]
srcBuffer=087B0020
srcAlphaBuffer=00000000
destBuffer=09070020
destAlphaBuffer=0A320020
destGLInternalFormat=GL_RGBA8
destGLFormat=GL_RGB
destGLType=GL_UNSIGNED_BYTE
srcGLInternalFormat=GL_RGBA8
srcGLFormat=GL_RGB
srcGLType=GL_UNSIGNED_BYTE
srcAlphaGLType=GL_BYTE
destAlphaGLType=GL_UNSIGNED_BYTE
warparound=0
needsAtanWorkaround=1
maxTextureSize=8192
Source chunks:
[(0, 0) to (2144, 1424) = (2144x1424)]
Dest chunks:
[(0, 0) to (1488, 1857) = (1488x1857)]
Total GPU memory used: 142952896
Interpolator chunks:
[(0, 0) to (4, 4) = (4x4)]
#version 110
#extension GL_ARB_texture_rectangle : enable
uniform sampler2DRect SrcTexture;
float sinh(const in float x) { return (exp(x) - exp(-x)) / 2.0; }
float cosh(const in float x) { return (exp(x) + exp(-x)) / 2.0; }
float atan2_xge0(const in float y, const in float x) {
if (abs(y) > x) {
return sign(y) * (1.5707963267948966000 - atan(x, abs(y)));
} else {
return atan(y, x);
}
}
float atan2_safe(const in float y, const in float x) {
if (x >= 0.0) return atan2_xge0(y, x);
else return (sign(y) * 3.1415926535897931000) - atan2_xge0(y, -x);
}
float atan_safe(const in float yx) {
if (abs(yx) > 1.0) {
return sign(yx) * (1.5707963267948966000 - atan(1.0/abs(yx)));
} else {
nona: GL info log:
Fragment shader was successfully compiled to run on hardware.

nona: GL info log:
Fragment shader(s) linked, no vertex shader(s) defined.


nona: GL info log:
Fragment shader was successfully compiled to run on hardware.

nona: GL info log:
Fragment shader(s) linked, no vertex shader(s) defined.


nona: GL info log:
Fragment shader was successfully compiled to run on hardware.

nona: GL info log:
Fragment shader(s) linked, no vertex shader(s) defined.

G:\nona-test>
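
With needsAtanWorkaround=1, the shader dump above routes every arctangent through atan2_xge0 / atan2_safe instead of calling atan(y, x) directly. The thread does not say why the workaround is needed; the sketch below is only a Python port of the two helper functions as they appear in the log, compared against math.atan2, so that the identity they rely on can be checked on the CPU.

```python
import math

HALF_PI = 1.5707963267948966
PI = 3.1415926535897931


def sign(v):
    # Same convention as GLSL sign(): -1, 0 or 1.
    return (v > 0) - (v < 0)


def atan2_xge0(y, x):
    # Port of the shader helper; only meant to be called with x >= 0.
    # When |y| > x the arguments are swapped and the result is mirrored
    # around pi/2, so atan only ever sees a ratio of at most 1.
    if abs(y) > x:
        return sign(y) * (HALF_PI - math.atan2(x, abs(y)))
    return math.atan2(y, x)


def atan2_safe(y, x):
    # Port of the shader helper: fold x < 0 onto the x >= 0 case.
    if x >= 0.0:
        return atan2_xge0(y, x)
    return sign(y) * PI - atan2_xge0(y, -x)


# Compare against the library atan2 on a small grid of non-zero inputs.
values = [-3.0, -1.0, -0.25, 0.25, 1.0, 3.0]
worst = max(abs(atan2_safe(y, x) - math.atan2(y, x))
            for y in values for x in values)
print(worst)
```

One difference in this port: for y == 0 and x < 0 it returns 0 where math.atan2 returns pi, because sign(0) is 0. Whether that case matters for the transforms in nona is not something this thread answers.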

Gerry Patterson

Aug 6, 2009, 6:50:01 PM8/6/09
to hugi...@googlegroups.com


I messed around a bit last night and found that if I change the format
for the coordinate texture from luminance_alpha32f to say rgba32f, I
could get Nona to run a bit further. Perhaps that format is not
supported by the driver I am using?

I know very little about gpu programming, let alone gpgpu approaches
used here.

- Gerry
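
For a sense of what the switch from luminance_alpha32f to rgba32f costs, the nominal texel sizes can be compared directly. This is a rough sketch under the assumption that each 32-bit float channel takes 4 bytes and that the driver adds no padding; the chunk size is borrowed from one of the logs earlier in the thread and is only an example.

```python
# Nominal bytes per texel: number of channels * 4 bytes per 32-bit float.
# Real drivers may store these formats differently; this is an assumption.
BYTES_PER_TEXEL = {
    "luminance_alpha32f": 2 * 4,
    "rgba32f": 4 * 4,
}


def texture_bytes(width, height, fmt):
    """Nominal size in bytes of one width x height texture."""
    return width * height * BYTES_PER_TEXEL[fmt]


# Example: one 1492x1275 destination chunk.
for fmt in BYTES_PER_TEXEL:
    print(fmt, texture_bytes(1492, 1275, fmt))
```

Under these assumptions the rgba32f variant of the coordinate texture takes twice the memory, which is worth keeping in mind if the change is meant as more than a diagnostic.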

Yuval Levy

Aug 6, 2009, 7:58:19 PM8/6/09
to hugi...@googlegroups.com
Dear Guido,

Guido Kohlmeyer wrote:
>> any update of <http://wiki.panotools.org/Hugin_SDK_(MSVC_2008)> needed?
>>
> Yes of course, I have to add the description how to generate the library
> from source package with MSVC 2008 EE. An updated SDK package is
> mandatory too.

thank you.

there are a few other things I'd suggest doing for the next edition of
the SDK:

1. autopano-sift-C:
- replace autopano-sift-C with Tom's newest build (which fixed many
memory leaks)
- *but* keep the old "generatekeys" in the same folder (Tom has now
deprecated it, but it is still useful in many different workflows)

2. update to the latest Exiftool (this is a continuum - regressions are
very rare and the continuous evolution of camera models and related
EXIF data must be tracked)

3. update exiv2 to 0.18.2

4. libpano:
- move pano13 up one folder to make it more compatible with Tom's new CMake
build for pano13. The CMake build for libpano is very new and still not
fully complete, but it is already usable and makes life *much* easier,
especially on Windows systems. This will likely require a small change
in Hugin's CMakeLists.txt.
- use the latest SVN (with the fix for the locale mangling and with
Tom's latest CMake build)

5. UnxUtils: leave them there for now, but
http://gnuwin32.sourceforge.net/ is better maintained and officially
supports Vista too. At some point the replacement should be tested, and when
the tests are positive GnuWin32 should replace UnxUtils. There also seems
to be a 64bit relative, https://sourceforge.net/projects/gnuwin64/, but I
have not looked into it.

6. Add enblend-enfuse. It is required to make the INSTALL target (and
later on the installer).

7. for each folder, it would be good to document if it is downloaded or
self-built, and the exact revision used for the build. I had to
investigate / guess some of these (e.g. autopano-sift-C does not print
the revision number in the help text).

8. a readme.txt file at the top level of the SDK would be helpful - a
simple URL to the wiki page is enough for a start.

9. nice to have: add LAPACK <http://www.netlib.org/lapack/> libraries.


> I wanted to wait until some developers approve the
> functionality of nona based on the static library. Due to the fact that
> my graphic card is too old I cannot do such tests. Based on first
> impressions in this thread it seems that the lib works or other way
> around there are situations where it works thus it is no general broken
> build of nona.

Everything is fine. Andrew has added GPU-stitching in a non-obtrusive
way. The current trunk does not break existing functionality (exception:
I have no reports yet if the new code breaks build or functionality on OSX).


> I will update the SDK in few days or hours ...

take your time (focus on robustness and quality), and thank you for the
effort.

Yuv

Yuval Levy

Aug 6, 2009, 8:21:16 PM8/6/09
to hugi...@googlegroups.com
Gerry Patterson wrote:
> I messed around a bit last night and found that if I change the format
> for the coordinate texture from luminance_alpha32f to say rgba32f, I
> could get Nona to run a bit further. Perhaps that format is not
> supported by the driver I am using?
>
> I know very little about gpu programming, let alone gpgpu approaches
> used here.

same here. Today I found out why on the same hardware I had a different
error: my windows nvidia driver was still 81.98 from January 2006. I
just updated to 190.38 and now I get the same error message as in Ubuntu:

gpu shader program compile time = 0.844


nona: GL error: Framebuffer incomplete, incomplete attachment in:
..\..\..\hugin\src\hugin_base\vigra_ext\ImageTransformsGPU.cpp:700

Ryan is so far the only one who has reported success, with his
self-built version. I wonder whether one of the pre-compiled binaries (from
Guido or from me) yields the same result. That would rule out build errors.

His video card is a GeForce 8800 GTS (256 mb) - anybody else with that
same video card who can run nona -g successfully? and other video cards?

Yuv

Ryan Sleevi

Aug 6, 2009, 11:01:12 PM8/6/09
to hugi...@googlegroups.com
> Ryan is so far the only one who has reported success, with his
> self-built version. I wonder if one of the pre-compiled (from Guido or
> from me) yield the same result. This would exclude building errors.
>
> His video card is a GeForce 8800 GTS (256 mb) - anybody else with that
> same video card who can run nona -g successfully? and other video
> cards?
>
> Yuv


Yuv,

So I did some testing with your exe
(http://www.photopla.net/hugin/nona_4169.7z ). Several projects I was able
to run without error with mine generated the following error:

nona: GL error in
..\..\..\hugin\src\hugin_base\vigra_ext\ImageTransformsGPU.cpp:743: out of
memory

However, when I re-created the simple project I'd used when testing
my version (just a simple transform), that I previously posted, I was able
to run without errors.

Since there appears to be some degree of variance depending on the
project and the settings, is there any desire to create a sample setup (or
preferably few) that test different levels of complexity?

For what it's worth, here's the results using your exe:

nona: using graphics card: NVIDIA Corporation GeForce 8800 GTS/PCI/SSE2
destStart=[0, 0]
destEnd=[3000, 2800]
destSize=[(3000, 2800)]
srcSize=[(4296, 2856)]
srcBuffer=062D0020
srcAlphaBuffer=0A610020
destBuffer=085F0020
destAlphaBuffer=09E00020


destGLInternalFormat=GL_RGBA8
destGLFormat=GL_RGB
destGLType=GL_UNSIGNED_BYTE
srcGLInternalFormat=GL_RGBA8
srcGLFormat=GL_RGB
srcGLType=GL_UNSIGNED_BYTE

srcAlphaGLType=GL_UNSIGNED_BYTE
destAlphaGLType=GL_UNSIGNED_BYTE
warparound=0
needsAtanWorkaround=0
maxTextureSize=8192

Source chunks:
[(0, 0) to (4296, 2856) = (4296x2856)]
Dest chunks:
[(0, 0) to (1500, 1400) = (1500x1400)]
[(1500, 0) to (3000, 1400) = (1500x1400)]
[(0, 1400) to (1500, 2800) = (1500x1400)]
[(1500, 1400) to (3000, 2800) = (1500x1400)]
Total GPU memory used: 190555008


Interpolator chunks:
[(0, 0) to (4, 4) = (4x4)]
#version 110
#extension GL_ARB_texture_rectangle : enable
uniform sampler2DRect SrcTexture;
float sinh(const in float x) { return (exp(x) - exp(-x)) / 2.0; }
float cosh(const in float x) { return (exp(x) + exp(-x)) / 2.0; }
float atan2_xge0(const in float y, const in float x) {
return atan(y, x);
}
float atan2_safe(const in float y, const in float x) {
return atan(y, x);
}
float atan_safe(const in float yx) {
return atan(yx);
}
void main(void)
{
float discardA = 1.0;
float discardB = 0.0;
vec2 src = gl_TexCoord[0].st;

src -= vec2(1500.0000000000000000, 1400.0000000000000000);

// rotate_erect(18000.000000000000000, -0.00000000000000000000)
{
//src.s += -0.00000000000000000000;
float w = (abs(src.s) > 18000.000000000000000) ? 1.0 : 0.0;


float n = (src.s < 0.0) ? 0.5 : -0.5;

src.s += w * -36000.000000000000000 * ceil(src.s /
36000.000000000000000 + n);
}

// sphere_tp_erect(5729.5779513082325000)
{
float phi = src.s / 5729.5779513082325000;
float theta = -src.t / 5729.5779513082325000 +
1.5707963267948966000;


if (theta < 0.0) {
theta = -theta;

phi += 3.1415926535897931000;
}
if (theta > 3.1415926535897931000) {
theta = 3.1415926535897931000 - (theta - 3.1415926535897931000);
phi += 3.1415926535897931000;
}

float s = sin(theta);
vec2 v = vec2(s * sin(phi), cos(theta));
float r = length(v);

theta = 5729.5779513082325000 * atan2_safe(r, s * cos(phi));


src = v * (theta / r);
}

// persp_sphere(5729.5779513082325000)
{
mat3 m = mat3(0.80742836807921270000, -0.58996561799899783000,
0.00000000000000000000,
0.58996561799899783000, 0.80742836807921270000,
0.00000000000000000000,
0.00000000000000000000, 0.00000000000000000000,
1.0000000000000000000);
float r = length(src);
float theta = r / 5729.5779513082325000;


float s = 0.0;
if (r != 0.0) s = sin(theta) / r;
vec3 v = vec3(s * src.s, s * src.t, cos(theta));
vec3 u = v * m;
r = length(u.st);
theta = 0.0;

if (r != 0.0) theta = 5729.5779513082325000 * atan2_safe(r, u.p) / r;
src = theta * u.st;
}

// rect_sphere_tp(5729.5779513082325000)
{
float r = length(src);
float theta = r / 5729.5779513082325000;
float rho = 0.0;
if (theta >= 1.5707963267948966000) rho = 1.6e16;
else if (theta == 0.0) rho = 1.0;
else rho = tan(theta) / theta;
src *= rho;
}

// resize(1.6728386297652815000, 1.6728386297652815000)
src *= vec2(1.6728386297652815000, 1.6728386297652815000);

src += vec2(2144.5000000000000000, 1427.5000000000000000);

src = src * discardA + vec2(-1000.0, -1000.0) * discardB;

gl_FragColor = accum;
}

// destExposure = 0.95663269200564105000
// srcExposure = 0.95663269262595030000

// whiteBalanceRed = 1.0000000000000000000
// whiteBalanceBlue = 1.0000000000000000000
p.rgb = p.rgb * 255.00000000000000000;
vec2 invR = texture2DRect(InvLutTexture, vec2(p.r, 0.0)).sq;
vec2 invG = texture2DRect(InvLutTexture, vec2(p.g, 0.0)).sq;
vec2 invB = texture2DRect(InvLutTexture, vec2(p.b, 0.0)).sq;
vec3 invX = vec3(invR.x, invG.x, invB.x);
vec3 invY = vec3(invR.y, invG.y, invB.y);
vec3 invA = fract(p.rgb);
p.rgb = mix(invX, invY, invA);
// VigCorrMode=VIGCORR_RADIAL
float vig = 1.0;
{

vec2 vigCorrCenter = vec2(2144.5000000000000000,
1427.5000000000000000);
float radiusScale=0.00038806915548755783000;


float radialVigCorrCoeff[4] = float[4](1.0000000000000000000,

0.00000000000000000000, 0.00000000000000000000, 0.00000000000000000000);


vec2 src = texture2DRect(CoordTexture, gl_TexCoord[0].st).sq;
vec2 d = src - vigCorrCenter;
d *= radiusScale;
vig = radialVigCorrCoeff[0];
float r2 = dot(d, d);
float r = r2;
vig += radialVigCorrCoeff[1] * r;
r *= r2;
vig += radialVigCorrCoeff[2] * r;
r *= r2;
vig += radialVigCorrCoeff[3] * r;
}

vec3 exposure_whitebalance = vec3(0.99999999935157013000,
0.99999999935157013000, 0.99999999935157013000);


p.rgb = (p.rgb * exposure_whitebalance) / vig;
p.rgb = p.rgb * 1023.0000000000000000;
vec2 destR = texture2DRect(DestLutTexture, vec2(p.r, 0.0)).sq;
vec2 destG = texture2DRect(DestLutTexture, vec2(p.g, 0.0)).sq;
vec2 destB = texture2DRect(DestLutTexture, vec2(p.b, 0.0)).sq;
vec3 destX = vec3(destR.x, destG.x, destB.x);
vec3 destY = vec3(destR.y, destG.y, destB.y);
vec3 destA = fract(p.rgb);
p.rgb = mix(destX, destY, destA);

gl_FragColor = p;
}

gpu shader program compile time = 0.032
gpu shader texture/framebuffer setup time = 0.015
gpu dest chunk=[(0, 0) to (1500, 1400) = (1500x1400)] coord image render
time = 0
gpu dest chunk=[(0, 0) to (1500, 1400) = (1500x1400)] source chunk=[(0, 0)
to (4296, 2856) = (4296x2856)] src upload = 0.063
gpu dest chunk=[(0, 0) to (1500, 1400) = (1500x1400)] source chunk=[(0, 0)
to (4296, 2856) = (4296x2856)] src alpha upload = 0.015
gpu dest chunk=[(0, 0) to (1500, 1400) = (1500x1400)] source chunk=[(0, 0)
to (4296, 2856) = (4296x2856)] src+alpha render = 0.047
gpu dest chunk=[(0, 0) to (1500, 1400) = (1500x1400)] source chunk=[(0, 0)
to (4296, 2856) = (4296x2856)] interpolation chunk=[(0, 0) to (4, 4) =
(4x4)] setup = 0.016
gpu dest chunk=[(0, 0) to (1500, 1400) = (1500x1400)] source chunk=[(0, 0)
to (4296, 2856) = (4296x2856)] interpolation chunk=[(0, 0) to (4, 4) =
(4x4)] render = 0
gpu dest chunk=[(0, 0) to (1500, 1400) = (1500x1400)] normalization setup =
0
gpu dest chunk=[(0, 0) to (1500, 1400) = (1500x1400)] normalization render =
0.015
gpu dest chunk=[(0, 0) to (1500, 1400) = (1500x1400)] dest rgb disassembly
setup = 0
gpu dest chunk=[(0, 0) to (1500, 1400) = (1500x1400)] dest rgb disassembly
render = 0
gpu dest chunk=[(0, 0) to (1500, 1400) = (1500x1400)] rgb readback = 0.016
gpu dest chunk=[(0, 0) to (1500, 1400) = (1500x1400)] dest alpha disassembly
setup = 0
gpu dest chunk=[(0, 0) to (1500, 1400) = (1500x1400)] dest alpha disassembly
render = 0
gpu dest chunk=[(0, 0) to (1500, 1400) = (1500x1400)] alpha readback = 0
gpu dest chunk=[(1500, 0) to (3000, 1400) = (1500x1400)] coord image render
time = 0
gpu dest chunk=[(1500, 0) to (3000, 1400) = (1500x1400)] source chunk=[(0,
0) to (4296, 2856) = (4296x2856)] interpolation chunk=[(0, 0) to (4, 4) =
(4x4)] setup = 0
gpu dest chunk=[(1500, 0) to (3000, 1400) = (1500x1400)] source chunk=[(0,
0) to (4296, 2856) = (4296x2856)] interpolation chunk=[(0, 0) to (4, 4) =
(4x4)] render = 0.015
gpu dest chunk=[(1500, 0) to (3000, 1400) = (1500x1400)] normalization setup
= 0
gpu dest chunk=[(1500, 0) to (3000, 1400) = (1500x1400)] normalization
render = 0
gpu dest chunk=[(1500, 0) to (3000, 1400) = (1500x1400)] dest rgb
disassembly setup = 0
gpu dest chunk=[(1500, 0) to (3000, 1400) = (1500x1400)] dest rgb
disassembly render = 0
gpu dest chunk=[(1500, 0) to (3000, 1400) = (1500x1400)] rgb readback =
0.016
gpu dest chunk=[(1500, 0) to (3000, 1400) = (1500x1400)] dest alpha
disassembly setup = 0
gpu dest chunk=[(1500, 0) to (3000, 1400) = (1500x1400)] dest alpha
disassembly render = 0
gpu dest chunk=[(1500, 0) to (3000, 1400) = (1500x1400)] alpha readback = 0
gpu dest chunk=[(0, 1400) to (1500, 2800) = (1500x1400)] coord image render
time = 0
gpu dest chunk=[(0, 1400) to (1500, 2800) = (1500x1400)] source chunk=[(0,
0) to (4296, 2856) = (4296x2856)] interpolation chunk=[(0, 0) to (4, 4) =
(4x4)] setup = 0
gpu dest chunk=[(0, 1400) to (1500, 2800) = (1500x1400)] source chunk=[(0,
0) to (4296, 2856) = (4296x2856)] interpolation chunk=[(0, 0) to (4, 4) =
(4x4)] render = 0.016
gpu dest chunk=[(0, 1400) to (1500, 2800) = (1500x1400)] normalization setup
= 0
gpu dest chunk=[(0, 1400) to (1500, 2800) = (1500x1400)] normalization
render = 0
gpu dest chunk=[(0, 1400) to (1500, 2800) = (1500x1400)] dest rgb
disassembly setup = 0
gpu dest chunk=[(0, 1400) to (1500, 2800) = (1500x1400)] dest rgb
disassembly render = 0
gpu dest chunk=[(0, 1400) to (1500, 2800) = (1500x1400)] rgb readback =
0.015
gpu dest chunk=[(0, 1400) to (1500, 2800) = (1500x1400)] dest alpha
disassembly setup = 0
gpu dest chunk=[(0, 1400) to (1500, 2800) = (1500x1400)] dest alpha
disassembly render = 0
gpu dest chunk=[(0, 1400) to (1500, 2800) = (1500x1400)] alpha readback = 0
gpu dest chunk=[(1500, 1400) to (3000, 2800) = (1500x1400)] coord image
render time = 0.016
gpu dest chunk=[(1500, 1400) to (3000, 2800) = (1500x1400)] source
chunk=[(0, 0) to (4296, 2856) = (4296x2856)] interpolation chunk=[(0, 0) to
(4, 4) = (4x4)] setup = 0
gpu dest chunk=[(1500, 1400) to (3000, 2800) = (1500x1400)] source
chunk=[(0, 0) to (4296, 2856) = (4296x2856)] interpolation chunk=[(0, 0) to
(4, 4) = (4x4)] render = 0
gpu dest chunk=[(1500, 1400) to (3000, 2800) = (1500x1400)] normalization
setup = 0
gpu dest chunk=[(1500, 1400) to (3000, 2800) = (1500x1400)] normalization
render = 0
gpu dest chunk=[(1500, 1400) to (3000, 2800) = (1500x1400)] dest rgb
disassembly setup = 0
gpu dest chunk=[(1500, 1400) to (3000, 2800) = (1500x1400)] dest rgb
disassembly render = 0
gpu dest chunk=[(1500, 1400) to (3000, 2800) = (1500x1400)] rgb readback =
0.015
gpu dest chunk=[(1500, 1400) to (3000, 2800) = (1500x1400)] dest alpha
disassembly setup = 0
gpu dest chunk=[(1500, 1400) to (3000, 2800) = (1500x1400)] dest alpha
disassembly render = 0
gpu dest chunk=[(1500, 1400) to (3000, 2800) = (1500x1400)] alpha readback =
0.016


gpu destruct time = 0.016

gpu total time = 0.344
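
The photometric shader in the log above (VigCorrMode=VIGCORR_RADIAL) evaluates the vignetting factor as a polynomial in the squared, scaled distance from the vignetting centre and then divides the pixel by it. The following is a small Python sketch of that arithmetic for checking values by hand; the function name is made up, and it mirrors only the "vig" block of the dumped shader, not nona's C++ code.

```python
def radial_vignetting(src, center, radius_scale, coeff):
    """Evaluate c0 + c1*r^2 + c2*r^4 + c3*r^6, like the shader's 'vig'."""
    dx = (src[0] - center[0]) * radius_scale
    dy = (src[1] - center[1]) * radius_scale
    r2 = dx * dx + dy * dy      # squared scaled radius, dot(d, d)
    vig = coeff[0]
    r = r2
    for c in coeff[1:]:
        vig += c * r            # adds c1*r^2, then c2*r^4, then c3*r^6
        r *= r2
    return vig


# Values taken from the shader dump above.
center = (2144.5, 1427.5)
radius_scale = 0.00038806915548755783
coeff = (1.0, 0.0, 0.0, 0.0)

print(radial_vignetting((0.0, 0.0), center, radius_scale, coeff))
```

With the coefficients from this project (1, 0, 0, 0) the factor is 1 everywhere, so the division by vig leaves the pixel values unchanged; a project with real vignetting parameters would exercise this path more thoroughly.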

Yuval Levy

Aug 6, 2009, 11:23:34 PM8/6/09
to hugi...@googlegroups.com
Hi Ryan, and everybody else

Ryan Sleevi wrote:
> So I did some testing with your exe
> (http://www.photopla.net/hugin/nona_4169.7z ). Several projects I was able
> to run without error with mine generated the following error:
>
> nona: GL error in
> ..\..\..\hugin\src\hugin_base\vigra_ext\ImageTransformsGPU.cpp:743: out of
> memory

mhh... is yours 64bit? or 32bit?


> Since there appears to be some degree of variance depending on the
> project and the settings, is there any desire to create a sample setup (or
> preferably few) that test different levels of complexity?

Yes, your suggestion makes sense. We need a set of typical test projects
(we should have them anyway and run them before we ship *any* version of
Hugin).

This is a task that any user (i.e. non-developer) can contribute to. We
need one volunteer to collect the test cases and organize them. And
everybody can contribute test cases.

I'm away for one week starting tomorrow morning. You don't need me for
this. There are a number of people around here who have access codes (I
actually don't even remember where I put the access code to
hugin.panotools.org).

I'd be happy to see the community put together the test cases.

Yuv
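
As a starting point for such a collection, each test case could be run once with and once without the GPU path and the outcome recorded. The sketch below assumes only the command line already shown in this thread (nona -g -o <prefix> <project.pto>); the helper names, the project list and the idea of judging a run by its return code and stderr are illustrative assumptions, not an existing Hugin tool.

```python
import subprocess


def build_command(project, output_prefix, use_gpu, nona="nona"):
    """Build the nona command line; -g enables the GPU code path."""
    cmd = [nona]
    if use_gpu:
        cmd.append("-g")
    cmd += ["-o", output_prefix, project]
    return cmd


def run_case(project, output_prefix, use_gpu):
    """Run one test project and return (return code, stderr text)."""
    result = subprocess.run(build_command(project, output_prefix, use_gpu),
                            capture_output=True, text=True)
    return result.returncode, result.stderr


def run_suite(projects):
    """Run every project with and without -g (project names are up to the
    volunteer who collects the test cases)."""
    report = {}
    for project in projects:
        for use_gpu in (False, True):
            prefix = project.rsplit(".", 1)[0] + ("_gpu" if use_gpu else "_cpu")
            report[(project, use_gpu)] = run_case(project, prefix, use_gpu)
    return report


print(build_command("test-gui.pto", "test", use_gpu=True))
```

Comparing the CPU and GPU output images would be the more meaningful check, but how to do that is left open here.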

Ryan Sleevi

Aug 6, 2009, 11:55:08 PM8/6/09
to hugi...@googlegroups.com
>
> mhh... is yours 64bit? or 32bit?
>

64-bit through and through. Using a DLL version of Glut, rather than a
static library, simply because it was convenient at the time. However, the
error seemed to suggest to me it was a GPU allocation error and not a system
allocation error. The memory usage of nona only peaked at ~100 megs when
generating that error.
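
The "Total GPU memory used" figures in the logs posted in this thread give a rough idea of how much the GPU path allocates. The sketch below is an empirical fit to the three logs above, not nona's actual accounting: the constants (44 bytes per destination-chunk pixel, 7 or 8 bytes per source pixel depending on whether a source alpha buffer is present) are inferred from the numbers and may not hold for other pixel types or interpolators.

```python
DEST_BYTES_PER_PIXEL = 44        # inferred from the logs, an assumption
SRC_BYTES_PER_PIXEL = 7          # inferred, source without alpha buffer
SRC_ALPHA_BYTES_PER_PIXEL = 1    # inferred, extra byte when alpha is present


def fitted_gpu_bytes(src_size, dest_chunk_size, has_src_alpha):
    """Estimate 'Total GPU memory used' from the image and chunk sizes."""
    src_pixels = src_size[0] * src_size[1]
    dest_pixels = dest_chunk_size[0] * dest_chunk_size[1]
    src_bpp = SRC_BYTES_PER_PIXEL
    if has_src_alpha:
        src_bpp += SRC_ALPHA_BYTES_PER_PIXEL
    return src_pixels * src_bpp + dest_pixels * DEST_BYTES_PER_PIXEL


# (source size, dest chunk size, source alpha present, total from the log)
logged = [
    ((2144, 1424), (1312, 1857), False, 128572288),
    ((2144, 1424), (1488, 1857), False, 142952896),
    ((4296, 2856), (1500, 1400), True, 190555008),
]
for src, chunk, alpha, total in logged:
    print(fitted_gpu_bytes(src, chunk, alpha), total)
```

For the simple project that ran cleanly, 190555008 bytes is about 182 MiB, which is already a large share of a 256 MB card. That would be consistent with larger projects failing with a GPU-side out-of-memory error while the process itself stays near 100 MB, though the thread does not confirm this.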

Zoran Zorkic

Aug 7, 2009, 9:33:30 AM8/7/09
to hugin and other free panoramic software


On Aug 7, 2:21 am, Yuval Levy <goo...@levy.ch> wrote:

> Ryan is so far the only one who has reported success, with his
> self-built version. I wonder if one of the pre-compiled (from Guido or
> from me) yield the same result. This would exclude building errors.
>
> His video card is a GeForce 8800 GTS (256 mb) - anybody else with that
> same video card who can run nona -g successfully? and other video cards?

I have a 9800GT which is pretty much the same card (just renamed, love
the marketing guys :).
If you can post a 32-bit windows binary, I'd love to test it out.

Guido Kohlmeyer

Aug 7, 2009, 10:03:40 AM8/7/09
to hugi...@googlegroups.com
Dear Zoran,

Zoran Zorkic wrote:

You can find one here (built by me):
http://hugin.panotools.org/testing/hugin/nona.zip

Another one can be found here (I suppose Yuval has built it)
http://www.photopla.net/hugin/nona_4169.7z

Guido

Yuval Levy

Aug 8, 2009, 9:16:58 AM8/8/09
to hugi...@googlegroups.com
Dear Guido,

Guido Kohlmeyer wrote:
> You can find one here (built by me):
> http://hugin.panotools.org/testing/hugin/nona.zip
>
> Another one can be found here (I suppose Yuval has built it)
> http://www.photopla.net/hugin/nona_4169.7z

I guess Zoran was asking Ryan to put one up. Ryan is the only one so
far to report success, with his self-built 64-bit version. If he builds a
32-bit version and Zoran finds that it works, then we would know that the issue
is not in the nona-gpu code but in the SDK on hugin.panotools.org (I've
used the SDK to build mine).

Yuv


Zoran Zorkic

Aug 8, 2009, 12:13:27 PM8/8/09
to hugin and other free panoramic software


On Aug 7, 5:23 am, Yuval Levy <goo...@levy.ch> wrote:

> >    Since there appears to be some degree of variance depending on the
> > project and the settings, is there any desire to create a sample setup (or
> > preferably few) that test different levels of complexity?
>
> Yes, your suggestion makes sense. We need a set of typical test projects
> (we anyway should have them and run them before we ship *any* version of
> Hugin).
>
> This is a task that any user (i.e. non-developer) can contribute to. We
> need one volunteer to collect the test cases and organize them. And
> everybody can contribute test cases.


I'm up for it.
But what do you propose as test cases?
What complexity?

How about:
2x5mp photos
4x5mp photos

Do we need a full spherical pano test?

Yuval Levy

Aug 8, 2009, 10:56:00 PM8/8/09
to hugi...@googlegroups.com
Hi Zoran,

Zoran Zorkic wrote:
> I'm up for it.

thanks for volunteering to put together a collection of test cases.


> But what do you propose as test cases?

see what this user community proposes. I would suggest that you set up a
framework and ask people to contribute to it. Then you can store the
contributed projects somewhere...


> What complexity?

increasing. Continuing from above: ... sort them out in terms of
complexity, put them up for download on hugin.panotools.org - each in
its own archive, linked from a Wiki page. On the wiki you could also
collect test results, i.e. put the tests in the column header and let
people add to the wiki one line for each test, with a description of
their hardware and the version of Hugin used as line header.


> How about:
> 2x5mp photos
> 4x5mp photos

that's a good start.


> Do we need a full spherical pano test?

Hugin is used for full sphericals too and I'm sure somebody will
contribute a project.


Yuv

mdw

Aug 22, 2009, 2:06:30 PM8/22/09
to hugin and other free panoramic software
Hi

I got this working using "nona version 2009.1.0.4169 built by Yuv" and
"nona: using graphics card: NVIDIA Corporation GeForce 9500 GT/PCI/
SSE2/3DNOW!".

I needed to do some tweaking though:

1) Doing "nona file.pto" only partially worked. The first image was
processed correctly, for the second image the opengl was built and
executed (I think I've seen the GPU time taken for the computation)
and then nona crashed. I just tried on another pto and nona does not
crash (I do not have the initial pto)
=> So I can only treat one image at a time to be safe.

2) I first got a message from hugin telling me that my driver was not
compatible for using the GPU and that I needed to upgrade it to the
latest version of the driver. I did that, but I can't get hugin to
add the '-g' option to the nona call.

3) I looked for an 'appropriate' make for the '.mk' file that is
generated. The Cygwin GNU make did not work. I found a Borland Make
which worked better, but not fully. I had to:
a) Change stuff like "Program Files" and "My videos" to "PROGRA~1" and
"MYVIDE~1" because the Make does not like the spaces in the names for
the paths.
b) remove the test target - it seems to be illegal in some way (I did
not look further) because the targets after the test target are not
found by Make if I do not remove it.
c) Add "-g" to the NONA macro definition.
d) execute make using "make -f file.pto.mk target.jpg" where file
and target have to be replaced with whatever is your specific case.

That will execute nona as many times as there are source images. The
speedup is very big. It normally takes about 2 minutes per image on
my Athlon 2x 3800+ - the GPU takes between 4 and 9 seconds according
to nona - saving the image takes about the same.
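
The manual edit in step 3c (adding "-g" to the NONA macro) could be scripted roughly as below. This is a minimal sketch that assumes the generated .pto.mk file defines the macro on a single line of the form `NONA=...`; that layout and the helper name `add_gpu_flag` are assumptions for illustration and are not confirmed anywhere in this thread.

```python
import re


def add_gpu_flag(makefile_text: str, macro: str = "NONA") -> str:
    """Append -g to the value of `macro` unless it is already present.

    Assumes a single-line definition such as `NONA=nona` (hypothetical layout).
    """
    # [ \t]* instead of \s* so the match can never run onto the next line.
    pattern = re.compile(
        rf"^({re.escape(macro)}[ \t]*=[ \t]*)(.*)$", re.MULTILINE
    )

    def patch(match: re.Match) -> str:
        value = match.group(2).rstrip()
        if "-g" in value.split():
            return match.group(0)  # already patched, leave untouched
        return f"{match.group(1)}{value} -g"

    return pattern.sub(patch, makefile_text)


sample = "NONA=nona\nENBLEND=enblend\n"
patched = add_gpu_flag(sample)
print(patched)
```

To use it on a real file, read the makefile into a string, pass it through `add_gpu_flag`, and write the result back. Running it twice is harmless because an existing `-g` is detected and left alone.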

Yuv

Aug 22, 2009, 6:08:34 PM8/22/09
to hugin and other free panoramic software
Hi and thanks for the report.

On Aug 22, 2:06 pm, mdw <mario.de.we...@gmail.com> wrote:
> 1) Doing "nona file.pto" only partially worked.

you mean nona -g file.pto?

> 2) I first got a message from hugin telling me that my driver was not
> compatible for using the GPU and that I needed to upgrade it

yes, it's a combination of GPU and driver. Do you recall what driver
version you had previously? We need to collect this kind of
information as it will become an FAQ. I had a driver from 2006 before
updating.


> 3) I looked for an 'appropriate' make for the '.mk' file that is
> generated.

why not the one that comes with hugin? it's in program files / hugin /
bin

> c) Add "-g" to the NONA macro definition.

yes, this is currently not implemented in the snapshots and must be
done manually.


> That will execute nona as many times as there are source images.  The
> speedup is very big.  It normally takes about 2 minutes per image on
> my Athlon 2x 3800+ - the GPU takes between 4 and 9 seconds according
> to nona - saving the image takes about the same.

sounds cool. Currently nona GPU support is a CLI / back end only
thing. Gerry has a patch to make it accessible from the Hugin GUI. I think
the next step is for him to apply it, then we're ready to start the
release cycle.

Yuv (currently traveling and with very limited access)

mdw

Aug 22, 2009, 7:49:42 PM8/22/09
to hugin and other free panoramic software
> > 1) Doing "nona file.pto" only partially worked.
>
> you mean nona -g file.pto?
>
Yes - I forgot to mention the '-g'.

> > 2) I first got a message from hugin telling me that my driver was not
> > compatible for using the GPU and that I needed to upgrade it
>
> yes, it's a combination of GPU and driver. Do you recall what driver
> version you had previously? we need to collect this kind of
> information as it will become an FAQ. I had a driver from 2006 before
> updating.

My driver was more recent than that - less than a year - I can't tell
which version it was though.
>
> > 3) I looked for an 'appropriate' make for the '.mk' file that is
> > generated.
>
> why not the one that comes with hugin? it's in program files / hugin /
> bin

It did not pop up in my 'locate' command I think. I'll try it.
>
> > c) Add "-g" to the NONA macro definition.
>
> yes, this is currently not implemented in the snapshots and must be
> done manually.
>
> > That will execute nona as many times as there are source images.  The
> > speedup is very big.  It normally takes about 2 minutes per image on
> > my Athlon 2x 3800+ - the GPU takes between 4 and 9 seconds according
> > to nona - saving the image takes about the same.
>
> sounds cool. Currently nona GPU support is a CLI / back end only
> thing. Gerry has a patch to make it accessible from the Hugin GUI. I think
> the next step is for him to apply it, then we're ready to start the
> release cycle.
Looking forward to that - it will be another timesaver compared to
modifying the makefiles (after each change) and launching them
separately.

Zoran Zorkic

Aug 23, 2009, 4:18:11 PM8/23/09
to hugin and other free panoramic software


On Aug 22, 8:06 pm, mdw <mario.de.we...@gmail.com> wrote:
> Hi
>
> I got this working using "nona version 2009.1.0.4169 built by Yuv" and
> "nona: using graphics card: NVIDIA Corporation GeForce 9500 GT/PCI/
> SSE2/3DNOW!".
>
> I needed to do some tweaking though:
>
> 1) Doing "nona file.pto" only partially worked.  The first image was
> processed correctly, for the second image the opengl was built and
> executed (I think I've seen the GPU time taken for the computation)
> and then nona crashed.  I just tried on another pto and nona does not
> crash (I do not have the initial pto)
> => So I can only treat one image at a time to be safe.

Reading your post I decided to try again, and after a few tries got it
working partially.
It is very picky about which project it'll remap without crashing.

I made a simple 4 x 3mp photo pano and it always crashed, but it remapped
26 out of 33 13mp photos from a larger project :/
I'll test it out some more and report back.


Harry van der Wolf

Aug 24, 2009, 1:01:18 PM8/24/09
to hugi...@googlegroups.com
I patched the nona part on OSX to make it compile, and that works. Running it with the -g option is failing. See the complete log below. As I'm not a programmer I do not really have a clue what I have to change in the code itself.

Someone??



nona -g -o pipo.tif 20090804-003-20090804-006.pto

nona: using graphics card: NVIDIA Corporation NVIDIA GeForce 9600M GT OpenGL Engine
destStart=[2221, 0]
destEnd=[5701, 2591]
destSize=[(3480, 2591)]
srcSize=[(3456, 2592)]
srcBuffer=0x1fb70000
srcAlphaBuffer=0
destBuffer=0x21511000
destAlphaBuffer=0x22ede000

destGLInternalFormat=GL_RGBA8
destGLFormat=GL_RGB
destGLType=GL_UNSIGNED_BYTE
srcGLInternalFormat=GL_RGBA8
srcGLFormat=GL_RGB
srcGLType=GL_UNSIGNED_BYTE
srcAlphaGLType=GL_BYTE

destAlphaGLType=GL_UNSIGNED_BYTE
warparound=0
needsAtanWorkaround=0
maxTextureSize=8192
Source chunks:
    [(0, 0) to (3456, 2592) = (3456x2592)]
Dest chunks:
    [(0, 0) to (1740, 1296) = (1740x1296)]
    [(1740, 0) to (3480, 1296) = (1740x1296)]
    [(0, 1296) to (1740, 2591) = (1740x1295)]
    [(1740, 1296) to (3480, 2591) = (1740x1295)]
Total GPU memory used: 161927424

Interpolator chunks:
    [(0, 0) to (4, 4) = (4x4)]
#version 110
#extension GL_ARB_texture_rectangle : enable
uniform sampler2DRect SrcTexture;
float sinh(const in float x) { return (exp(x) - exp(-x)) / 2.0; }
float cosh(const in float x) { return (exp(x) + exp(-x)) / 2.0; }
float atan2_xge0(const in float y, const in float x) {
    return atan(y, x);
}
float atan2_safe(const in float y, const in float x) {
    return atan(y, x);
}
float atan_safe(const in float yx) {
    return atan(yx);
}
void main(void)
{
    float discardA = 1.0;
    float discardB = 0.0;
    vec2 src = gl_TexCoord[0].st;
    src -= vec2(6662.0000000000000000, 1357.0000000000000000);

    // rotate_erect(22206.666666666667879, 2720.7326976096633189)
    {
        src.s += 2720.7326976096633189;
        float w = (abs(src.s) > 22206.666666666667879) ? 1.0 : 0.0;

        float n = (src.s < 0.0) ? 0.5 : -0.5;
        src.s += w * -44413.333333333335759 * ceil(src.s / 44413.333333333335759 + n);
    }

    // sphere_tp_erect(7068.6015391880455354)
    {
        float phi = src.s / 7068.6015391880455354;
        float theta = -src.t / 7068.6015391880455354 + 1.5707963267948965580;

        if (theta < 0.0) {
            theta = -theta;
            phi += 3.1415926535897931160;
        }
        if (theta > 3.1415926535897931160) {
            theta = 3.1415926535897931160 - (theta - 3.1415926535897931160);
            phi += 3.1415926535897931160;

        }
        float s = sin(theta);
        vec2 v = vec2(s * sin(phi), cos(theta));
        float r = length(v);
        theta = 7068.6015391880455354 * atan2_safe(r, s * cos(phi));

        src = v * (theta / r);
    }

    // persp_sphere(7068.6015391880455354)
    {
        mat3 m = mat3(0.99990060837645422520, 0.014097218534442075566, 0.00020444556445244871476,
                      -0.014098700947131314817, 0.99979547348811281804, 0.014499580141634665562,
                      0.0000000000000000000, -0.014501021421696839997, 0.99989485466109262468);
        float r = length(src);
        float theta = r / 7068.6015391880455354;

        float s = 0.0;
        if (r != 0.0) s = sin(theta) / r;
        vec3 v = vec3(s * src.s, s * src.t, cos(theta));
        vec3 u = v * m;
        r = length(u.st);
        theta = 0.0;
        if (r != 0.0) theta = 7068.6015391880455354 * atan2_safe(r, u.p) / r;

        src = theta * u.st;
    }

    // rect_sphere_tp(7068.6015391880455354)
    {
        float r = length(src);
        float theta = r / 7068.6015391880455354;
        float rho = 0.0;
        if (theta >= 1.5707963267948965580) rho = 1.6e16;

        else if (theta == 0.0) rho = 1.0;
        else rho = tan(theta) / theta;
        src *= rho;
    }

    // resize(1.0170207690535806311, 1.0170207690535806311)
    src *= vec2(1.0170207690535806311, 1.0170207690535806311);

    // radial(0.98315422436450938815, 0.0000000000000000000, 0.016845775635490601446, 0.0000000000000000000, 1296.0000000000000000, 1000.0000000000000000)
    {
        float r = length(src) / 1296.0000000000000000;
        float scale = 1000.0;
        if (r < 1000.0000000000000000) {
            scale = ((0.0000000000000000000 * r + 0.016845775635490601446) * r + 0.0000000000000000000) * r + 0.98315422436450938815;
        }
        src *= scale;
    }

    src += vec2(1727.5000000000000000, 1295.5000000000000000);
    // destExposure = 4.7257407104992449440e-05
    // srcExposure = 4.7258981791520546516e-05

    // whiteBalanceRed = 1.0000000000000000000
    // whiteBalanceBlue = 1.0000000000000000000
    p.rgb = p.rgb * 255.00000000000000000;
    vec2 invR = texture2DRect(InvLutTexture, vec2(p.r, 0.0)).sq;
    vec2 invG = texture2DRect(InvLutTexture, vec2(p.g, 0.0)).sq;
    vec2 invB = texture2DRect(InvLutTexture, vec2(p.b, 0.0)).sq;
    vec3 invX = vec3(invR.x, invG.x, invB.x);
    vec3 invY = vec3(invR.y, invG.y, invB.y);
    vec3 invA = fract(p.rgb);
    p.rgb = mix(invX, invY, invA);
    // VigCorrMode=VIGCORR_RADIAL
    float vig = 1.0;
    {
        vec2 vigCorrCenter = vec2(1727.5000000000000000, 1295.5000000000000000);
        float radiusScale=0.00046296296296296298063;
        float radialVigCorrCoeff[4] = float[4](1.0000000000000000000, 0.0000000000000000000, 0.0000000000000000000, 0.0000000000000000000);

        vec2 src = texture2DRect(CoordTexture, gl_TexCoord[0].st).sq;
        vec2 d = src - vigCorrCenter;
        d *= radiusScale;
        vig = radialVigCorrCoeff[0];
        float r2 = dot(d, d);
        float r = r2;
        vig += radialVigCorrCoeff[1] * r;
        r *= r2;
        vig += radialVigCorrCoeff[2] * r;
        r *= r2;
        vig += radialVigCorrCoeff[3] * r;
    }
    vec3 exposure_whitebalance = vec3(0.99996667963488838904, 0.99996667963488838904, 0.99996667963488838904);

    p.rgb = (p.rgb * exposure_whitebalance) / vig;
    p.rgb = p.rgb * 1023.0000000000000000;
    vec2 destR = texture2DRect(DestLutTexture, vec2(p.r, 0.0)).sq;
    vec2 destG = texture2DRect(DestLutTexture, vec2(p.g, 0.0)).sq;
    vec2 destB = texture2DRect(DestLutTexture, vec2(p.b, 0.0)).sq;
    vec3 destX = vec3(destR.x, destG.x, destB.x);
    vec3 destY = vec3(destR.y, destG.y, destB.y);
    vec3 destA = fract(p.rgb);
    p.rgb = mix(destX, destY, destA);

    gl_FragColor = p;
}

nona: normalization/photometric shader program could not be compiled.
nona: GL info log:
ERROR: 0:35: 'array of float' : constructor not supported for type
ERROR: 0:35: 'array of float' : no matching overloaded function found
ERROR: 0:35: 'radialVigCorrCoeff' : redefinition
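
The "Dest chunks" section of the log above is consistent with the 3480x2591 destination being split into a 2x2 grid of tiles. A minimal sketch of such a split follows; the function names and the even-split rule are assumptions chosen to reproduce the logged rectangles, not nona's actual chunking code.

```python
def split_axis(length, parts):
    """Split [0, length) into `parts` contiguous spans, larger spans first."""
    base, extra = divmod(length, parts)
    spans, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        spans.append((start, start + size))
        start += size
    return spans


def split_rect(width, height, nx, ny):
    """Return tiles as ((x0, y0), (x1, y1)), row by row."""
    return [
        ((x0, y0), (x1, y1))
        for (y0, y1) in split_axis(height, ny)
        for (x0, x1) in split_axis(width, nx)
    ]


chunks = split_rect(3480, 2591, 2, 2)
for (x0, y0), (x1, y1) in chunks:
    print(f"[({x0}, {y0}) to ({x1}, {y1}) = ({x1 - x0}x{y1 - y0})]")
```

The odd height (2591) is why the top row of tiles is 1296 pixels high and the bottom row 1295.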


Yuval Levy

Aug 24, 2009, 7:42:12 PM8/24/09
to hugi...@googlegroups.com
Harry van der Wolf wrote:
> I patched the nona part on OSX to make it compile, and that works. Running
> it with the -g option is failing. See the complete log below. As I'm not a
> programmer I do not really have a clue what I have to change in the code
> itself.
>
> Someone??

thanks for the work and for the log, Harry.


> nona: normalization/photometric shader program could not be compiled.
> nona: GL info log:
> ERROR: 0:35: 'array of float' : constructor not supported for type
> ERROR: 0:35: 'array of float' : no matching overloaded function found
> ERROR: 0:35: 'radialVigCorrCoeff' : redefinition

I'm no expert either, so maybe my hypothesis is completely wrong. I
would like to exclude a driver issue and the best way to find out is if
you can set up an Ubuntu partition on your IntelMac and try to run the
exact same nona project there.

I had a different error at the same stage. Failed on Windows. Passed on
Ubuntu. Updated Windows driver and it passed there too. Now on my system
it fails at a later point, on both Windows and Ubuntu.

Yuv

Tduell

Aug 24, 2009, 10:08:59 PM8/24/09
to hugin and other free panoramic software
Hullo All,

Just to add to the story on experiences using the nona-gpu option.

I have a Fedora 11 x86_64 version working (svn 4263) and it has seemed
generally OK.
I have just done a simple little project in which I cropped the output
in the fast preview window and found that the resulting stitch took no
notice of the crop boundaries.
A run with nona-gpu option turned off produced a correctly cropped
result.

Hope this helps.

Cheers,
Terry

Yuval Levy

Aug 25, 2009, 8:03:41 AM8/25/09
to hugi...@googlegroups.com
Tduell wrote:
> Hope this helps.

thanks. bug report at

<https://sourceforge.net/tracker/?func=detail&aid=2844187&group_id=77506&atid=550441>

Yuv

tennevin yves

Aug 25, 2009, 9:47:53 AM8/25/09
to hugi...@googlegroups.com
I have a few random questions about GPU use with enfuse/enblend:
I noticed a gpu option.
Is this option usable, and does it do anything in particular?

I did a few tests with my new machine:
Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz
GeForce 9500 GT.
8 gb of RAM.
Ubuntu 64 bits,
hugin svn 4233.
enblend / enfuse 3.2-staging-rev350

I processed a 12000x6000 pixel, 360x180 equirectangular
panorama (32 images: 10 at 36/0, 10 at 36/45, 10 at 36/-45, plus zenith and
nadir).
The images were already processed, so I just ran the enfuse command
twice, measuring the time and (trying) to do nothing else while it ran.

I got 309s without the gpu option used and 282s with it.

Is this linked to my cheap gpu?
There were some NaN (not a number) warnings during the processing with nona.
Is it linked to the size of the panorama?
What gains could be expected with a better gpu?

esby / Y. Tennevin
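
As a rough sanity check on the timings reported in this thread, the speedups work out as follows. This is plain arithmetic on the quoted numbers (309 s vs 282 s for enfuse; about 2 minutes vs 4 to 9 seconds of GPU time per image for nona, as reported by mdw); the helper name `speedup` is only for illustration.

```python
def speedup(baseline_s, accelerated_s):
    """Ratio of baseline time to accelerated time (higher is better)."""
    return baseline_s / accelerated_s


enfuse = speedup(309, 282)        # enfuse without vs with the gpu option
nona_slowest = speedup(120, 9)    # ~2 min CPU vs 9 s GPU per image
nona_fastest = speedup(120, 4)    # ~2 min CPU vs 4 s GPU per image

print(f"enfuse: {enfuse:.2f}x")
print(f"nona remap: {nona_slowest:.1f}x to {nona_fastest:.1f}x")
```

The nona figures cover the GPU computation only; mdw notes that saving each image takes about as long again, so the end-to-end gain per image is smaller.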

Andrew Mihal

Aug 25, 2009, 1:42:12 PM8/25/09
to hugi...@googlegroups.com
Hi,
The error means that the GLSL compiler that is built in to your
video card driver is unhappy with the syntax nona-gpu is giving it.
This is a bug in the video card driver. I checked in a possible
workaround to hugin svn. Please give it another try.

Thanks,
Andrew
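
For context on the errors in Harry's log: all three messages at 0:35 concern the radialVigCorrCoeff array, which the shader initializes with the array constructor `float[4](...)`. Array constructors were introduced in GLSL 1.20, while the shader declares `#version 110`. A form that GLSL 1.10 accepts declares the array first and assigns each element separately. The sketch below generates such a declaration as text. It is written in Python purely for illustration (nona builds its shaders in C++), the helper name is hypothetical, and it is not necessarily the workaround that was checked in.

```python
def glsl110_float_array(name, values):
    """Emit a GLSL 1.10-compatible float array: declaration plus
    one assignment per element, instead of a float[N](...) constructor."""
    lines = [f"float {name}[{len(values)}];"]
    for index, value in enumerate(values):
        # repr() of a Python float always contains '.' or an exponent,
        # so the literal is parsed as a float by GLSL as well.
        lines.append(f"{name}[{index}] = {float(value)!r};")
    return "\n".join(lines)


snippet = glsl110_float_array("radialVigCorrCoeff", [1.0, 0.0, 0.0, 0.0])
print(snippet)
```

The alternative would be to keep the constructor and raise the shader to `#version 120`, at the cost of excluding drivers that only support GLSL 1.10.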

Harry van der Wolf

Aug 25, 2009, 4:11:36 PM8/25/09
to hugi...@googlegroups.com
Hi Andrew,

I just compiled hugin 4276 on Ubuntu jaunty 32bit inside Virtualbox on a MacOSX (after the suggestion from yuv yesterday) and nona now gives a segmentation fault on Ubuntu but that might be due to the virtual part somehow "blocking" the video.

Then I svn synced to 4277 in OSX and built Hugin (needed to patch James' patch) and nona -g now gives (only last part):

gpu shader program compile time = 0.04
nona: GL error: Framebuffer incomplete, incomplete attachment in: /Users/Shared/development/hugin_related/hugin/src/hugin_base/vigra_ext/ImageTransformsGPU.cpp:705

Harry


2009/8/25 Andrew Mihal <andrew...@gmail.com>

T. Modes

Aug 27, 2009, 1:34:17 AM8/27/09
to hugin and other free panoramic software
Hi group,

I tested nona-gpu a bit further.

After updating the graphics driver, I got the following error from nona:

nona: GL info log:
Fragment shader failed to compile with the following errors:
ERROR: 0:4: 'const in' : overloaded functions must have the same
parameter qualifiers
ERROR: 0:5: 'const in' : overloaded functions must have the same
parameter qualifiers
ERROR: compilation errors. No code generated.


nona: coordinate transform shader program could not be compiled.

The following patch (changed interface of sinh and cosh) solved this
problem:

Index: src/hugin_base/vigra_ext/ImageTransformsGPU.cpp
===================================================================
--- src/hugin_base/vigra_ext/ImageTransformsGPU.cpp (revision 4282)
+++ src/hugin_base/vigra_ext/ImageTransformsGPU.cpp (working copy)
@@ -391,8 +391,8 @@
oss << "#version 110" << endl
<< "#extension GL_ARB_texture_rectangle : enable" << endl
<< "uniform sampler2DRect SrcTexture;" << endl
- << "float sinh(const in float x) { return (exp(x) - exp(-x)) / 2.0; }" << endl
- << "float cosh(const in float x) { return (exp(x) + exp(-x)) / 2.0; }" << endl;
+ << "float sinh(in float x) { return (exp(x) - exp(-x)) / 2.0; }" << endl
+ << "float cosh(in float x) { return (exp(x) + exp(-x)) / 2.0; }" << endl;

if (needsAtanWorkaround) {
oss << "float atan2_xge0(const in float y, const in float x) {" << endl

Now nona is partially running:

From inside hugin the output is OK for:
* a project with cropped tif output (n"TIFF_m c:NONE r:CROP"), with and
without a cropped output area
* a project without cropped tif output (n"TIFF_m c:NONE"), without a
cropped output area

With a project without cropped tif output but with a cropped
output area, nona crashes after the first image (the first
image is saved correctly before the crash).

Using nona from the command line (nona -g -o test project.pto) does
not work. Nona always crashes after the first image. With some
settings the first image is saved, with others only an empty file is
created.

Maybe there's a bug in the save procedure?

Thomas

PS: When enabling GPU remapping in the hugin preferences and opening a new
project, the normal preview window does not work. Hugin crashes or
I get the hugin error:
ERROR: (..\..\..\..\hugin-trunk\src\hugin1\hugin\PreviewPanel.cpp:487) PreviewPanel::updatePreview(): error during stitching: Precondition violation!
RemappedPanoImage<RemapImage,AlphaImage>::remapImage(): image sizes not consistent
(After deactivating the GPU option and reloading the project the
normal preview window is working again.)

Harry van der Wolf

Aug 27, 2009, 4:13:01 PM8/27/09
to hugi...@googlegroups.com
I "merged" the nona-gpu source into the XCode bundle. However, "nona -g" still doesn't work on OSX. (Not from the command line and not from the bundle).

Yuv

Sep 5, 2009, 8:47:13 AM9/5/09
to hugin and other free panoramic software
Hi Terry,

On Aug 24, 10:08 pm, Tduell <tdu...@iinet.net.au> wrote:
> I have a Fedora 11 x86_64 version working (svn 4263) and it has seemed
> generally OK.
> I have just done a simple little project in which I cropped the output
> in the fast preview window and found that the resulting stitch took no
> notice of the crop boundaries.
> A run with nona-gpu option turned off produced a correctly cropped
> result.

have you tested this recently too? can you confirm if it was fixed or
if it still occurs?

Yuv