Deadlock when rendering in 3delight with yeti using multithreading


daweis...@gmail.com

Jan 15, 2016, 1:02:29 PM
to cortexdev
We (Toonbox) have a deadlock problem when rendering a custom Cortex procedural and Yeti fur in 3delight using multithreading (the deadlock happens 90% of the time). The deadlock does not appear when rendering with a single thread, and it does not happen when the RIB file contains only Yeti fur or only the Cortex procedural. At first we suspected the Python GIL, but the Yeti developers confirmed that they use neither Python nor the Python global lock. Do you guys have any idea what could be causing this deadlock?

thanks


Andrew Kaufman

Jan 15, 2016, 1:12:48 PM
to cort...@googlegroups.com
I presume you're using a Cortex procedural written in python rather than c++? It's been a while since we've used python procedurals here (we use gaffer's SceneProcedural nowadays, which is largely c++). Even when we were using python procedurals, I don't think we ever had a situation with multiple procedural types in the rib, so we probably never hit a similar case. Back then, we had a proprietary fur system written in c++, but executed from within the python procedural (forcing a GIL release before entering python).

We've definitely used a Yeti Procedural within a Gaffer SceneProcedural via the IECore.ExternalProcedural mechanism and not had any threading issues I can remember. But like I say, that's largely c++, so probably avoiding any GIL problems.

Have you tried embedding the Yeti procedural inside the Cortex python procedural (using IECore.ExternalProcedural)? It may be that this approach releases the GIL from the Cortex side more cleanly than two separate procedurals do. Just a guess though...

Andrew


John Haddon

Jan 15, 2016, 1:45:11 PM
to cort...@googlegroups.com
Hi,
I suspect it's not a GIL problem at all, but actually a TBB initialisation bug. Here's a snippet from an email where I was debugging something similar-sounding a long while ago:

"""
Interestingly, my simplified "yeti.rib" test case still hangs indefinitely with both yeti builds. It turns out that in trying to simplify my original problem I had actually discovered an entirely different one. What appears to be happening in this case is that 3delight is loading the Yeti procedural on one thread, and on another thread loading our python procedural and importing IECore. Both IECore and Yeti link against TBB, and we seem to be hitting a deadlock as the two threads both try to initialise TBB at the same time. I've attached stack traces for the two stalled threads in case you're interested.

When rendering directly from Gaffer we don't hit this deadlock, because the Gaffer procedural loads first and then emits a call to the Yeti procedural. If we were to try to render our old Python procedurals out of Maya alongside Yeti procedurals though, then I think we could hit this issue. Since we're not planning on doing that as far as I know, I don't think this is a priority at all at this point, but I thought it was worth flagging the problem.
"""

So although it's not a GIL problem, I think Andrew's suggestion of emitting the Yeti procedural from within a Cortex procedural might actually do the job. Alternatively it's a case of trying to understand why TBB doesn't like being dlopen()ed in separate threads and fixing the problem at source. One thing that might work (but certainly wouldn't be pretty) would be to use LD_PRELOAD to make sure the TBB library is loaded before either of the procedurals is.
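
For anyone wanting to try the LD_PRELOAD route, here is a minimal Python sketch of launching a render with TBB preloaded. It is only an illustration under assumptions: the `renderdl` command name, the library path in the usage comment, and the helper names `preload_env` and `render` are not from this thread and would need adjusting for your setup.

```python
import os
import subprocess


def preload_env(tbb_path, base_env=None):
    """Return a copy of the environment with tbb_path prepended to LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    # The dynamic linker accepts colon- or space-separated entries; use a colon.
    env["LD_PRELOAD"] = tbb_path if not existing else tbb_path + ":" + existing
    return env


def render(rib_path, tbb_path, renderer="renderdl"):
    """Run the renderer on rib_path with TBB preloaded.

    The default renderer command is an assumption; pass your own if it differs.
    """
    return subprocess.call([renderer, rib_path], env=preload_env(tbb_path))


# Hypothetical usage (paths are placeholders, not taken from the thread):
# render("shot.rib", "/path/to/your/libtbb.so")
```

The preloaded library has to be the same libtbb build that both procedurals link against, otherwise the import of IECore can fail with undefined-symbol errors.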

Cheers…
John

p.s. I've attached my original stack traces in case you want to compare them to yours.


tbbInitialisationDeadlock.txt

daweis...@gmail.com

Jan 18, 2016, 10:39:17 AM
to cortexdev, jo...@image-engine.com
Thanks Andrew and John,

Do you have an example of how to emit the Yeti procedural from within a Cortex procedural? I would like to try this first.

Thanks

John Haddon

Jan 18, 2016, 11:26:20 AM
to cort...@googlegroups.com
There's an example of outputting a generic DynamicLoad procedural via Cortex/Python here :

https://github.com/ImageEngine/cortex/blob/f14c5dd9a0389a9cf193c1bc5dd8d8183d25eaf1/test/IECoreRI/Renderer.py#L628

In RenderMan all the arguments for the procedural have to be packed into a single string, which is something of a pain. In the example you'll see two ways of doing this :

1. Specify the string in full using an "ri:data" parameter.
2. Specify arbitrary parameters, which will automatically be formatted into "--name value" pairs for you.

The first one is super simple if you just want to copy/paste an existing procedural out of a RIB for testing, and the second is probably more useful if you want to programmatically generate the arguments to the procedural - it just so happens that the formatting that Cortex performs is the same as what the Yeti procedural expects...
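
To make the "--name value" packing concrete, here is a small Python sketch of that kind of formatting. It is an illustration only, not Cortex's actual implementation: the helper name `format_args`, the sorted ordering, and the parameter names in the usage comment are assumptions, and Cortex's own handling of types and quoting may differ.

```python
def format_args(params):
    """Flatten a dict of parameters into a single "--name value" string.

    Sequences are written as space-separated values after the name.
    Names are sorted so the output is deterministic (an assumption made
    for this sketch, not something the thread specifies).
    """
    parts = []
    for name in sorted(params):
        value = params[name]
        if isinstance(value, (list, tuple)):
            value = " ".join(str(v) for v in value)
        parts.append("--%s %s" % (name, value))
    return " ".join(parts)


# Hypothetical parameters, not the real Yeti argument names:
# format_args({"frame": 10, "file": "fur.fur"})
```

A string built this way is what would end up as the single argument string of the DynamicLoad procedural, the same string you could otherwise pass in full via "ri:data".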

Cheers...
John


daweis...@gmail.com

Jan 18, 2016, 5:16:23 PM
to cortexdev, jo...@image-engine.com
Thanks John, LD_PRELOAD works. I will try this one out.

daweis...@gmail.com

Jan 20, 2016, 12:44:12 PM
to cortexdev, jo...@image-engine.com
Hi John,

Actually, LD_PRELOAD is not working very well: it solved the thread deadlock problem but introduced a new one.

When I use LD_PRELOAD, the image renders, but we get this error at the beginning:

Traceback (most recent call last):
  File "<string>", line 6, in <module>
  File "./IECore/__init__.py", line 47, in <module>
    from _IECore import *
ImportError: ./Release/lib/libIECore.so: undefined symbol: _ZN3tbb10interface58internal9task_base7destroyERNS_4taskE
Traceback (most recent call last):
  File "<string>", line 1, in <module>
NameError: name 'IECoreRI' is not defined


daweis...@gmail.com

Jan 21, 2016, 9:55:22 AM
to cortexdev, jo...@image-engine.com
Sorry, this was my fault: I had set LD_PRELOAD to the wrong libtbb.so. With it set to the correct .so file the error is gone, but the deadlock has come back. Any ideas?
We are using an older version of Cortex, so we cannot use "ExternalProcedural" to run Yeti.
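
As a quick way to catch a wrong libtbb.so before rendering, here is a hedged Python sketch that checks whether a candidate library exports the symbol named in the ImportError above. The helper name `exports_symbol` and the path in the usage comment are made up for illustration; the symbol string is copied from the traceback.

```python
import ctypes


def exports_symbol(library_path, symbol):
    """Return True if the shared library at library_path exports symbol.

    Returns False if the library cannot be loaded at all. Note that this
    loads the library into the current process, so it is best run from a
    throwaway interpreter rather than inside the render.
    """
    try:
        lib = ctypes.CDLL(library_path)
    except OSError:
        return False
    # Attribute lookup on a CDLL resolves the name with the dynamic loader
    # and raises AttributeError when the symbol is missing.
    return hasattr(lib, symbol)


# Symbol taken from the ImportError earlier in this thread.
TBB_SYMBOL = "_ZN3tbb10interface58internal9task_base7destroyERNS_4taskE"

# Hypothetical usage (path is a placeholder):
# exports_symbol("/path/to/your/libtbb.so", TBB_SYMBOL)
```

A False result would suggest the library is a different TBB build from the one libIECore.so was linked against.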

john haddon

Jan 23, 2016, 5:57:24 PM
to cortexdev, jo...@image-engine.com
Perhaps you could try one of these alternatives :

1.  Set `Attribute "procedural" "integer reentrant" 0` in the RIB, to stop 3delight expanding procedurals in parallel. If you spawn further procedurals from your own, you could set this back to 1 to get your child procedurals running in parallel.
2. Use `renderer.command( "ri:archiveRecord", ... )` to emit the yeti procedural from your python procedural, as an alternative to using an ExternalProcedural. See https://github.com/ImageEngine/cortex/blob/master/test/IECoreRI/ArchiveRecord.py for an example.

Cheers...
John

Dawei Sun

Jan 26, 2016, 9:20:03 AM
to cortexdev, jo...@image-engine.com
Hi John,

Setting reentrant to 0 for both Yeti and our Cortex procedural seems to work, thanks for the suggestion.

I am not very clear on how to use renderer.command() to run the Yeti procedural. Could you please explain a little more? My understanding is that you put the Yeti procedural and its parameters in a string, create an IECoreRI.Renderer(test.rib), and use r.command( "ri:archiveRecord", stringData) to export the command to test.rib. Is that right?

thanks

John Haddon

Jan 26, 2016, 11:32:43 AM
to Dawei Sun, cortexdev
From what I remember, you don't actually need to be generating a RIB to use "ri:archiveRecord". You should be able to do `r.command( "ri:archiveRecord", ... )` from a procedural during a live render, and 3delight will parse it on the fly as if it came from a RIB...
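
As a rough illustration of the string you would emit this way, here is a Python sketch that builds the RIB text for a DynamicLoad procedural. The helper name `dynamic_load_rib`, the DSO name, and the arguments in the usage comment are hypothetical; the exact parameters that `r.command( "ri:archiveRecord", ... )` expects are shown in the ArchiveRecord.py test linked earlier in the thread and are deliberately not reproduced here.

```python
def dynamic_load_rib(dso, args, bound):
    """Build the RIB text for a DynamicLoad procedural call.

    dso   -- name of the procedural DSO (hypothetical in the usage below)
    args  -- the single argument string passed to the procedural
    bound -- six numbers: xmin xmax ymin ymax zmin zmax
    """
    if len(bound) != 6:
        raise ValueError("bound must contain exactly six values")
    bound_str = " ".join("%g" % b for b in bound)
    # Escape backslashes and double quotes so the RIB string stays well formed.
    escaped = args.replace("\\", "\\\\").replace('"', '\\"')
    return 'Procedural "DynamicLoad" [ "%s" "%s" ] [ %s ]\n' % (dso, escaped, bound_str)


# Hypothetical usage; the DSO name and arguments are placeholders:
# record = dynamic_load_rib("someYetiDso", "--frame 10", [-1, 1, -1, 1, -1, 1])
# The record string would then be handed to r.command( "ri:archiveRecord", ... ).
```

For a first test it may be simpler to copy the existing Procedural line for Yeti out of a working RIB and use that as the record verbatim.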


Dawei Sun

Jan 27, 2016, 4:07:10 PM
to cortexdev, daweis...@gmail.com, jo...@image-engine.com
Thanks John