Crashing Gaffer with Yeti procedural

Carlo Giesa

May 5, 2020, 1:22:01 PM
to gaffer-dev
Hi there!

I'm not sure whether this started with the switch to Arnold 6, but Gaffer crashes when I do Arnold renders that include a pgYetiArnold procedural.

Interestingly, the render itself succeeds: with full verbosity I can see Arnold complete its shutdown, and only then do I get a segmentation fault from Gaffer. The images are written out and look as expected. I will try to put together a simple test scene that I can share, and I'll keep you posted.

I also tried different versions of Gaffer, Arnold and Yeti, without success. Even the latest versions of everything show the same behavior.

Did anyone encounter this problem already?

Thanks,
Carlo

Carlo Giesa

May 5, 2020, 1:31:15 PM
to gaffer-dev
Here is a test scene to reproduce the issue. It renders out a beauty with the expected alpha, but Gaffer crashes after Arnold shuts down.

Can you reproduce the issue?

Greets,
Carlo
crashing_yeti.zip

John Haddon

May 5, 2020, 1:43:02 PM
to gaffe...@googlegroups.com
I don't have Yeti here, but from what you describe, my first guess would be a conflict between the libraries used by Yeti and those used by Gaffer. Do you get a stack trace from the crash? Can you post it here?



--
John Haddon - R&D Programmer
Image Engine
studio: +1-604-874-5634 | jo...@image-engine.com | www.image-engine.com



15 West 5th Avenue, Vancouver, BC, V5Y 1H4, Canada


Carlo Giesa

May 5, 2020, 3:13:09 PM
to gaffe...@googlegroups.com
Hi John!

Unfortunately, no stack trace. I only get a segmentation fault line followed by the gaffer command line that was executed. Is there a way to have more verbosity when running Gaffer?

Greets,
Carlo

John Haddon

May 6, 2020, 10:52:47 AM
to gaffe...@googlegroups.com
If you have GDB installed, you could try running Gaffer in a debugger. If you know the command line of the crashed render, you can prepend `env GAFFER_DEBUG=1` to it to launch via GDB. e.g. :

    `env GAFFER_DEBUG=1 gaffer execute ...`

Then when GDB has loaded, type `r` and hit enter to actually run Gaffer. When it crashes and returns to GDB, type `bt` and you should get a stacktrace.

The other thing you could try is to write out an `.ass` file from Gaffer, so you can repeat the render directly via `kick` without Gaffer involved. That might help us narrow it down a bit...

Cheers...
John

Carlo Giesa

May 6, 2020, 12:02:34 PM
to gaffe...@googlegroups.com
Hi John!

Thanks for all this information. I did a quick run following your steps and got the following stack trace:

#0  0x00007fffa54d4680 in ?? ()
#1  0x00007ffff77aabd2 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
#2  0x00007ffff77aade3 in start_thread () from /lib64/libpthread.so.0
#3  0x00007ffff6dcaead in clone () from /lib64/libc.so.6
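
A note on reading the trace above: `__nptl_deallocate_tsd` is the glibc routine that runs destructors for thread-specific data when a thread exits, and frame #0 is an address GDB could not map to any symbol. That pattern is consistent with a destructor whose code is no longer loaded, although the trace alone does not prove it. The following is a minimal Python sketch, not part of Gaffer or GDB, that parses `bt` output in this layout and flags the unresolved frames. The function name `parse_backtrace` and the regular expression are illustrative assumptions; the sample text is the trace from this post.

```python
import re

# Matches GDB `bt` lines such as:
#   #1  0x00007ffff77aabd2 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
FRAME_RE = re.compile(
    r"^#(?P<index>\d+)\s+"
    r"(?:(?P<address>0x[0-9a-fA-F]+)\s+in\s+)?"
    r"(?P<function>\S+)\s*\(.*?\)"
    r"(?:\s+from\s+(?P<library>\S+))?"
)


def parse_backtrace(text):
    """Return one dict per frame found in GDB `bt` output."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match is None:
            continue  # Not a frame line (log output, blank line, ...)
        frame = match.groupdict()
        frame["index"] = int(frame["index"])
        # GDB prints `??` when it cannot map the address to a symbol.
        frame["resolved"] = frame["function"] != "??"
        frames.append(frame)
    return frames


# The trace posted in this thread.
TRACE = """\
#0  0x00007fffa54d4680 in ?? ()
#1  0x00007ffff77aabd2 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
#2  0x00007ffff77aade3 in start_thread () from /lib64/libpthread.so.0
#3  0x00007ffff6dcaead in clone () from /lib64/libc.so.6
"""

if __name__ == "__main__":
    for frame in parse_backtrace(TRACE):
        status = "resolved" if frame["resolved"] else "UNRESOLVED"
        print(frame["index"], frame["address"], frame["function"], frame["library"], status)
```

The unresolved address is the useful piece to carry into later steps, since it can be compared against the address ranges of libraries loaded during the same run.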

In the meantime I also managed to build Gaffer myself with BUILD_TYPE=DEBUG, but that does not give any more information than what I pasted above. Maybe I did something wrong when building the debug version, although the compiler flags looked OK to me.

These are the last lines of my verbose output before the segmentation fault:

[...]
00:00:03   496MB         |  unloading 6 plugins
00:00:03   496MB         |   closing ieOutputDriver.so ...
00:00:03   496MB         |   closing pgYetiArnold.so ...
00:00:03   491MB         |   closing mvUsdArnoldProcedural60.so ...
00:00:03   491MB         |   closing alembic_proc.so ...
00:00:03   490MB         |   closing usd_proc.so ...
00:00:03   490MB         |   closing cryptomatte.so ...
00:00:03   490MB         |  unloading plugins done
00:00:03   490MB         | Arnold shutdown
Detaching after fork from child process 17857.
Detaching after fork from child process 17862.
[Thread 0x7fffacae5700 (LWP 17708) exited]
[Thread 0x7fffaf88f700 (LWP 17707) exited]
[Thread 0x7fffa7fff700 (LWP 17709) exited]

Program received signal SIGSEGV, Segmentation fault.

I will now try to export an .ass file and render it with kick.

Greets,
Carlo

Carlo Giesa

May 6, 2020, 12:14:40 PM
to gaffe...@googlegroups.com
When I export the Gaffer script to an .ass file and render it with kick, it renders fine. I just had to use a local Gaffer installation; otherwise I got the following error:

00:00:00   162MB ERROR   |    [osl] error: Could not open "/opt/software/opensource/gaffer/0.56.1.0/shaders/metal.oso.1cd9-9528-b03a-6065.tmp"

This is to be expected, since that location is not writable. I don't know whether it is also a problem when running the Arnold render directly from within Gaffer, or whether there is a way to tell kick where to compile its OSL shaders. I never saw those errors before.

Greets,
Carlo

Carlo Giesa

May 6, 2020, 12:23:38 PM
to gaffe...@googlegroups.com
I also see the following in my terminal:

Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.x86_64 cyrus-sasl-lib-2.1.26-23.el7.x86_64 glibc-2.17-260.el7_6.3.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_6.x86_64 libICE-1.0.9-9.el7.x86_64 libSM-1.2.2-2.el7.x86_64 libX11-1.6.5-2.el7.x86_64 libXau-1.0.8-2.1.el7.x86_64 libXext-1.3.3-3.el7.x86_64 libattr-2.4.46-13.el7.x86_64 libcom_err-1.42.9-13.el7.x86_64 libcurl-7.29.0-51.el7.x86_64 libgcc-4.8.5-36.el7.x86_64 libglvnd-1.0.1-0.8.git5baa1e5.el7.x86_64 libglvnd-glx-1.0.1-0.8.git5baa1e5.el7.x86_64 libidn-1.28-4.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 libssh2-1.4.3-12.el7.x86_64 libstdc++-4.8.5-36.el7.x86_64 libuuid-2.23.2-59.el7.x86_64 libxcb-1.13-1.el7.x86_64 mesa-libGLU-9.0.0-4.el7.x86_64 nspr-4.19.0-1.el7_5.x86_64 nss-3.36.0-7.1.el7_6.x86_64 nss-softokn-freebl-3.36.0-5.el7_5.x86_64 nss-util-3.36.0-1.1.el7_6.x86_64 openldap-2.4.44-21.el7_6.x86_64 openssl-libs-1.0.2k-16.el7.x86_64 pcre-8.32-17.el7.x86_64 sssd-client-1.16.2-13.el7_6.5.x86_64 zlib-1.2.7-18.el7.x86_64

Should I ask IT to install all this on my machine to have more feedback?

Greets,
Carlo

John Haddon

May 6, 2020, 12:42:00 PM
to gaffe...@googlegroups.com
Ooh, that's nicely esoteric :) A bit of googling leads to this :


This could potentially be the same thing. Are you using OSL in your render at all, either in Arnold or via OSLObject/OSLImage/OSLExpression in Gaffer? And do you know if Yeti uses OSL internally at all?

If you're using Gaffer's OSL functionality, could you try eliminating it to see if the problem goes away?



Carlo Giesa

May 6, 2020, 3:29:52 PM
to gaffe...@googlegroups.com
Hey John!

Looks like a good catch. I will check with the guys from Peregrine Labs. In my simplified example, there is no OSL used on my side, so I guess it is Yeti that does this.

Thanks a lot!
Carlo

Carlo Giesa

May 7, 2020, 7:16:09 AM
to gaffe...@googlegroups.com
Hi John!

So, I got confirmation from Colin Doncaster that no OSL code is involved in Yeti. Maybe the problem is one level below, in the boost thread_specific_ptr that is discussed in this ticket. I did a quick search in Gaffer and could not find anything. Maybe this is used in Yeti. Or am I totally wrong here?

Any other ideas?

Greets,
Carlo

John Haddon

May 8, 2020, 4:35:33 AM
to gaffer-dev
On Thursday, May 7, 2020 at 12:16:09 PM UTC+1, Carlo Giesa wrote:
> Hi John!
> So, I got confirmation from Colin Doncaster that no OSL code is involved in Yeti. Maybe the problem is one level below, in the boost thread_specific_ptr that is discussed in this ticket. I did a quick search in Gaffer and could not find anything. Maybe this is used in Yeti. Or am I totally wrong here?

Yes, that does seem plausible. I checked the places where we’re using OSL and I’m fairly sure we’re managing our OSL contexts correctly to avoid the bug. So it could well be a different use of thread_specific_ptr somewhere else.

> Any other ideas?

It might be worth running with LD_DEBUG=files, which will make the dynamic loader print out what libraries are loaded and unloaded and when. There’s a chance this could give us a clue about where the problem is.
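
To make the `LD_DEBUG=files` output easier to scan, here is a minimal Python sketch that extracts load and unload events and the address range of each loaded library, so that an address from a crash in the same run can be checked against them. It assumes the glibc line shapes shown in the comments, which should be verified against a real log; the library path and addresses in `SAMPLE_LOG` are made up for illustration and are not taken from Carlo's machine. The function names are hypothetical.

```python
import re

# Assumed glibc `LD_DEBUG=files` line shapes (check against your own log):
#   17708:  file=/path/lib.so [0];  generating link map
#   17708:    dynamic: 0x...  base: 0x...   size: 0x...
#   17708:  file=/path/lib.so [0];  destroying link map
MAP_RE = re.compile(
    r"file=(?P<path>\S+) \[\d+\];\s+(?P<action>generating|destroying) link map"
)
BASE_RE = re.compile(
    r"base:\s*0x(?P<base>[0-9a-fA-F]+)\s+size:\s*0x(?P<size>[0-9a-fA-F]+)"
)


def parse_ld_debug(text):
    """Return (events, ranges) from LD_DEBUG=files output.

    events: list of (action, path) in log order; action is "load" or "unload".
    ranges: dict mapping path -> (base, size), read from the line after a load.
    """
    events, ranges, pending = [], {}, None
    for line in text.splitlines():
        m = MAP_RE.search(line)
        if m:
            loading = m.group("action") == "generating"
            events.append(("load" if loading else "unload", m.group("path")))
            pending = m.group("path") if loading else None
            continue
        b = BASE_RE.search(line)
        if b and pending:
            ranges[pending] = (int(b.group("base"), 16), int(b.group("size"), 16))
            pending = None
    return events, ranges


def libraries_covering(address, ranges):
    """Paths whose recorded [base, base + size) range contains the address."""
    return sorted(
        path for path, (base, size) in ranges.items()
        if base <= address < base + size
    )


# Synthetic log: hypothetical path and addresses, for illustration only.
SAMPLE_LOG = """\
     17708:  file=/hypothetical/libplugin_a.so [0];  generating link map
     17708:    dynamic: 0x00007f0000201000  base: 0x00007f0000000000   size: 0x0000000000202000
     17708:      entry: 0x00007f0000001000  phdr: 0x00007f0000000040  phnum:                  7
     17708:  calling init: /hypothetical/libplugin_a.so
     17708:  calling fini: /hypothetical/libplugin_a.so [0]
     17708:  file=/hypothetical/libplugin_a.so [0];  destroying link map
"""

if __name__ == "__main__":
    events, ranges = parse_ld_debug(SAMPLE_LOG)
    for action, path in events:
        print(action, path)
    print(libraries_covering(0x7F0000001234, ranges))
```

Setting `LD_DEBUG_OUTPUT` to a file prefix alongside `LD_DEBUG=files` makes glibc write the log to a per-process file rather than the terminal, which keeps it separate from the render output. Addresses are only comparable within a single run, so the log and the crashing address need to come from the same process.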

Carlo Giesa

May 12, 2020, 3:57:04 AM
to gaffe...@googlegroups.com
Thanks John!

I will give this a try asap. I'll come back when I have some news.

Greets,
Carlo

John Haddon

May 12, 2020, 4:56:42 AM
to gaffe...@googlegroups.com
What version of Arnold are you using for this, Carlo? I ask because I've received a separate report of crashes in 6.0.2.0, due to Arnold's new `usd_proc.so` leaking TBB symbols that conflict with Gaffer. I'm wondering if this could be related...
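
One rough way to check whether a plugin exports TBB symbols is to list its defined dynamic symbols with binutils `nm` and filter for mangled names in the `tbb` namespace. The Python sketch below does that under stated assumptions: `nm` is on the `PATH`, the substring test is only a heuristic based on how the Itanium C++ ABI encodes the namespace name, and the two lines in `SAMPLE_NM` are illustrative rather than real output from `usd_proc.so`. All function names are hypothetical.

```python
import subprocess


def parse_nm_output(text):
    """Return symbol names from `nm` lines of the form `<address> <type> <name>`."""
    names = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            names.append(fields[2])
    return names


def exported_symbols(library_path):
    """List the dynamic symbols a shared library defines, using `nm`."""
    result = subprocess.run(
        ["nm", "-D", "--defined-only", library_path],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_nm_output(result.stdout)


def tbb_symbols(names):
    """Heuristic: mangled C++ names encode the namespace `tbb` as `3tbb`."""
    return [name for name in names if name.startswith("_Z") and "3tbb" in name]


# Illustrative lines only, not real output from any Arnold plugin.
SAMPLE_NM = """\
0000000000012340 T _ZN3tbb8internal23allocate_via_handler_v3Em
0000000000045670 T hypothetical_plugin_entry
"""

if __name__ == "__main__":
    print(tbb_symbols(parse_nm_output(SAMPLE_NM)))
```

Calling `tbb_symbols(exported_symbols(path))` on each plugin that Arnold loads would show which ones export such symbols. A non-empty result is only a hint that a conflict is possible, not confirmation that it causes the crash.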






Carlo Giesa

May 12, 2020, 8:49:11 AM
to gaffe...@googlegroups.com
Hey John!

I tested with Arnold 6.0.2.0 and 6.0.3.0. I have the same issue with both.

Greets,
Carlo

John Haddon

May 12, 2020, 9:48:42 AM
to gaffe...@googlegroups.com
Are you able to test with 6.0.1.0?

Carlo Giesa

May 13, 2020, 3:56:50 AM
to gaffe...@googlegroups.com
I just tested and have the same issue.
