Arnold Crash (Access Violation - code c0000005) in Gaffer-1.3.x with Bifrost Procedural and Arnold-7.2.x


Sudarshan Havale

Jul 11, 2024, 8:33:44 AM
to gaffer-dev

Hi,

We are experiencing an Arnold crash with the error `Access violation - code c0000005` when rendering a Bifrost procedural in `Gaffer-1.3.x`. The crash occurs after Arnold has written the rendered image to the file system, and the render log shows no traceback. It happens only with `Arnold-7.2.x`. However, if I kick the `sceneDescription.ass` file exported from Gaffer, it renders fine without any crash. This makes me think the issue is more specific to Gaffer than to Arnold, which is why I'm posting it here.

We are seeking help to resolve this problem. I have also tried setting ARNOLD_ADP_DISABLE=1, as suggested by Murray in an older thread, but that didn't work in our case.

I am sharing the following details in an attached zip file, to help you understand this problem.

Data:

  • Gaffer Sample Scene: ...\gafferArnoldCrash\gafferArnoldCrash.gfr
  • Sourced Ass File: ...\gafferArnoldCrash\asses\gafferArnoldCrash\basicDistribution.ass
  • Environment Variables: ...\gafferArnoldCrash\env.log

Batch Command Used:

```
J:/tools/bin/rez-2.112.0/Scripts/rez\rez.exe env rez arnold-7.2.5.3 mtoa-5.3.5.3 gaffer-1.3 bifrost-2.8.0.0 hyuu_compounds mjcg_compounds rebel_pack gaffer_deadline-1.0.0.5 -- J:/tools/bin/deadline/gaffer.bat execute -script "D:/workspace/forums/gafferArnoldCrash/gafferArnoldCrash.gfr" -nodes ArnoldRender -frames 1001-1001 >> D:\workspace\forums\gafferArnoldCrash\logs\gafferArnoldCrash_72.log
```

Logs and Proof:

  • Batch Render Crash Log: ...\gafferArnoldCrash\logs\gafferArnoldCrash_72.log
  • Proof of Rendering Fine in Older Arnold 7.1.x: ...\gafferArnoldCrash\logs\gafferArnoldCrash_71.log

Arnold Crash Report After Closing Gaffer Session Post Successful Interactive Render:

```
Date/Time: 2024-07-11 15:05:35 +05:30
Application: python.exe
Error: Access violation - code c0000005 (first/second chance not available)
Crashed Module Name: AminoPrivateDM_2_2_0.dll
Exception Address: 0x00007ffceb298a55
Exception Code: c0000005
```

This popup UI also points to the following files:

  • ...\gafferArnoldCrash\logs\dmpuserinfo.xml
  • ...\gafferArnoldCrash\logs\arnold.dmp

Kicking the Gaffer-exported sceneDescription.ass in Arnold-7.2.x:

Command Used:
```
rez env --inherited python-3 arnold-7.2.5.2 mtoa-5.3.5.2 gaffer-1.3 bifrost-2.8.0.0 hyuu_compounds mjcg_compounds rebel_pack -- kick -dp -dw -v 6 -o D:\workspace\forums\gafferArnoldCrash\logs\kick_gafferArnoldCrash.log -i D:\workspace\forums\gafferArnoldCrash\asses\gafferArnoldCrash\gafferArnoldCrash.1001.ass >> D:\workspace\forums\gafferArnoldCrash\logs\kick_gafferArnoldCrash.log
```

"...\gafferArnoldCrash\logs\kick_gafferArnoldCrash.log"


Can you please look into this and help us with possible fixes?


Thanks,

Sudarshan

gafferArnoldCrash.zip

Carlo Giesa

Jul 11, 2024, 1:53:16 PM
to gaffe...@googlegroups.com
Hi Sudarshan!

I'm sorry, but I won't be able to help you on those crashes. But this reminds me of issues that we have when rendering Yeti procedurals in Gaffer (here as well). The rendered image is written and looks correct, but on shut-down, at some point, Gaffer, Arnold or Yeti is crashing.

Our current workaround is to simply ignore segmentation faults on Gaffer render jobs that use Yeti. Of course, this is far from ideal, but it is currently the only way to move forward on our side.
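
A minimal sketch of what such a workaround could look like in a farm wrapper, assuming the wrapper knows the render command and the expected output path. The function names and the "non-empty output file" check are illustrative assumptions, not part of Gaffer, Arnold or Deadline. On Windows an access violation surfaces as exit code 0xC0000005 (sometimes reported as a signed 32-bit value); on Linux, Python's `subprocess` reports a segmentation fault as the negative signal number.

```python
import os
import signal
import subprocess

# Exit codes treated as "crashed". 0xC0000005 is the Windows access-violation
# status; it may also be reported as a signed 32-bit value.
ACCESS_VIOLATION = 0xC0000005
CRASH_CODES = {ACCESS_VIOLATION, ACCESS_VIOLATION - (1 << 32), -int(signal.SIGSEGV)}


def is_tolerable_crash(returncode, output_path):
    """Return True if the process crashed but left a non-empty output file."""
    crashed = returncode in CRASH_CODES
    has_output = os.path.isfile(output_path) and os.path.getsize(output_path) > 0
    return crashed and has_output


def run_render(command, output_path):
    """Run `command` (a list of arguments) and return an exit code for the farm.

    Hypothetical wrapper: a crash after the image was written is reported as 0,
    every other non-zero exit code is passed through unchanged.
    """
    returncode = subprocess.call(command)
    if returncode != 0 and is_tolerable_crash(returncode, output_path):
        return 0
    return returncode
```

Note that this check cannot tell a fresh image from a stale one left by an earlier attempt; a stricter version could compare the file's modification time against the job start time.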

From memory, I remember that the TBB library can be a problem when used in different libraries if different versions of it are involved. In general, with Yeti, we were able to "fix" this by juggling around with versions until we met the perfect combination. But now, we need a specific version of Yeti and Arnold and we are not able to find the perfect combination that does not crash. But I'm not sure if TBB is really the issue here.

Greets,
Carlo


Sudarshan Havale

Jul 11, 2024, 3:09:31 PM
to gaffer-dev
Thanks for the response, Carlo. Please correct me if I'm wrong, but as I mentioned, there are no crashes when we kick the .ass file exported from Gaffer. So, I'm curious why this issue only occurs when using Gaffer's execution commands.

Additionally, the Arnold support team suggests that the problem might be related to the Gaffer Python command, which I'm currently trying to validate.

ArnoldSupportTeam.PNG

Thanks,
Sudarshan



Sudarshan Havale

Jul 11, 2024, 3:14:04 PM
to gaffer-dev
Hi,

Furthermore, the described issue persists in the latest versions of Gaffer 1.4 and Arnold 7.3. :(

Thanks,
Sudarshan

Murray Stevenson

Jul 11, 2024, 5:58:44 PM
to gaffe...@googlegroups.com
Hi Sudarshan,

Sorry to hear that you're running into trouble! From first impressions, it does look like that crash is originating in Bifrost. As the Arnold support team mentions, "AminoPrivateDM_2_2_0.dll" is part of Bifrost, so this may end up being outside of Gaffer's or Arnold's direct control.


> Please correct me if I'm wrong, but as I mentioned, there are no crashes when we kick the .ass file exported from Gaffer. So, I'm curious why this issue only occurs when using Gaffer's execution commands.

As for kick not crashing, our current theory is that these sorts of crashes come down to library conflicts when procedurals run in Arnold inside the Gaffer environment, much like what Carlo has been seeing with the Yeti procedural. Running kick in isolation removes the potential for Bifrost's dependencies to conflict with Gaffer's.
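
As a rough way to look for candidates for that kind of conflict, the sketch below lists library file names that appear in more than one directory on a search path (for example two different TBB builds). It is only a name-based check under simple assumptions: the helper names are made up for illustration, it does not compare versions, and versioned Linux names such as `libtbb.so.2` are not matched.

```python
import os
from collections import defaultdict


def find_duplicate_libraries(search_dirs, extensions=(".dll", ".so", ".dylib")):
    """Map each library file name to the directories that contain a copy of it.

    Only names found in more than one directory are returned; those are
    candidates for a version clash, not proof of one.
    """
    found = defaultdict(list)
    for directory in search_dirs:
        if not os.path.isdir(directory):
            continue  # search paths often contain stale entries
        for name in sorted(os.listdir(directory)):
            if name.lower().endswith(extensions):
                found[name.lower()].append(directory)
    return {name: dirs for name, dirs in found.items() if len(dirs) > 1}


def search_dirs_from_env(variable="PATH", environ=None):
    """Split a path-list environment variable into unique directories, in order."""
    environ = os.environ if environ is None else environ
    seen, result = set(), []
    for entry in environ.get(variable, "").split(os.pathsep):
        if entry and entry not in seen:
            seen.add(entry)
            result.append(entry)
    return result
```

Run inside the same environment as the failing render, `find_duplicate_libraries(search_dirs_from_env("PATH"))` would be the Windows variant; on Linux `LD_LIBRARY_PATH` is the more relevant variable.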

One further test you could try would be to run "gaffer env kick" rather than "kick" to see if you see the same crash while running kick within the Gaffer environment. That test may not be particularly conclusive, but I'd still be interested to know the outcome.


> Additionally, the Arnold support team suggests that the problem might be related to the Gaffer Python command, which I'm currently trying to validate.

In my experience, it's often been the other way round, where the Windows Gaffer wrapper catches a crash within a renderer and reports "Error(s) running Gaffer". We actually removed that message from the Windows wrapper in Gaffer 1.4.0.0 as it was quite misleading to see "Error(s) running Gaffer" printed as a result of an Arnold/Cycles/etc crash and its removal more closely aligned the behaviour on Linux and Windows...

Cheers,

Murray

Sudarshan Havale

Jul 12, 2024, 8:41:39 AM
to gaffer-dev
Thanks Murray,

`gaffer env kick` also renders fine, without any crash at the end.

Batch Command Failed
J:/tools/bin/rez-2.112.0/Scripts/rez\rez.exe env rez arnold-7.2.5.2 mtoa-5.3.5.2 gaffer-1.2 bifrost-2.8.0.0 hyuu_compounds mjcg_compounds rebel_pack gaffer_deadline-1.0.0.5 -- J:/tools/bin/deadline/gaffer.bat execute -script "D:/workspace/forums/gafferArnoldCrash/gafferArnoldCrash.gfr" -nodes ArnoldRender -frames 1001-1001

Kick Command Successful
J:/tools/bin/rez-2.112.0/Scripts/rez\rez.exe env rez arnold-7.2.5.2 mtoa-5.3.5.2 gaffer-1.3 bifrost-2.8.0.0 hyuu_compounds mjcg_compounds rebel_pack gaffer_deadline-1.0.0.5 -- gaffer env kick -dp -dw -v 6 -o D:\workspace\forums\gafferArnoldCrash\logs\kick_gafferArnoldCrash.exr -i D:\workspace\forums\gafferArnoldCrash\asses\gafferArnoldCrash\gafferArnoldCrash.1001.ass

So we are moving on to debugging possible library conflicts like you and Carlo mentioned. 

Thanks,
Sudarshan

Sachin Shrestha

Jul 26, 2024, 2:51:41 AM
to gaffer-dev
Hi Sudarshan :-)

Is it possible for you to also run this test on Linux? You may be able to run it with a backtrace there to gain more insight. It would also be worth testing with the latest version of Bifrost if possible (and if compatible, since 2.8 is fairly old now). Also, looking at the list of compounds in the rez env: from what I remember, the hyuu compounds could contain a custom Bifrost node (not a compound, but a whole custom operator compiled using the Bifrost SDK). If that is not really being used, it's worth removing it from the env for this test, to limit Bifrost to the default factory compounds or to packs that don't add any new operators. This will help rule out any third-party issues.

I only have access to a Mac right now so can’t test your repro but will figure out something. Until then, please let me know on this thread how you get on. Also, if you haven’t reported this to the bifrost guys yet, then please let me know here so I can check with one of them.

-Sachin

Sudarshan Havale

Jul 30, 2024, 3:25:59 AM
to gaffer-dev

Hi Sachin,

Thanks for your response on this ticket, and sorry for the delay in getting back to you.

For now, wranglers are manually stopping the Deadline job from re-queuing due to this error. But I will try to reproduce this issue on Linux and see what's going on under the hood.

Also, as shown in the command below, I've already tried the latest and bare minimum factory packages, but I still see the error at the end of the render logs.

```
J:/tools/bin/rez-2.112.0/Scripts/rez\rez.exe env rez arnold-7.3 mtoa-5.4 gaffer-1.4 bifrost-2.10.0.0 -- J:/tools/bin/deadline/gaffer.bat execute -script "D:/workspace/forums/gafferArnoldCrash/gafferArnoldCrash14.gfr" -nodes LocalDispatcher -frames 1001-1001
```

I've reported this issue to Autodesk, and they forwarded it to the Arnold support team, and as you can see in my previous responses they suggested that the issue could be in Gaffer Python. I'm not sure how to validate that, but escalating this to the Bifrost team could be helpful. It would be greatly appreciated if you could assist with this.

I'll keep you updated on my progress.


Thanks, 

Sudarshan

Sachin Shrestha

Jul 31, 2024, 4:01:36 AM
to gaffe...@googlegroups.com
Hi Sudarshan,

Thanks for the details. I am not sure if you have tested this already, but maybe it is worth removing rez from the equation and doing a native Gaffer batch command test, using the batch script from your message above with only the bare minimum env vars set? The env.log is too polluted to look for any meaningful conflicts right now. This is just to ensure there are no conflicts being introduced by rez or any other env var. Also, I'm not sure if I missed a previous mention of this in the thread, but did you try a batch render from within Gaffer, and did that run successfully? From your initial zip package, the interactive log seems fine, but I'm not sure whether that is a preview render or a batch render. If the non-rez batch render for Gaffer works in a shell, then maybe it is also worth testing with a simple shell wrapper in Deadline, to rule out any convoluted clashes between Gaffer's Python, rez, Arnold, Bifrost and Deadline.
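
One possible way to set up such a stripped-down test is sketched below: build a reduced environment from an explicit allow-list and launch the command with exactly that. The helper names are hypothetical, and which variables to keep or override (for example `PATH` or Arnold's `ARNOLD_PLUGIN_PATH`) is an assumption that depends on the local install.

```python
import os
import subprocess


def minimal_environment(keep, overrides=None, environ=None):
    """Build a reduced environment containing only the variables named in `keep`.

    `overrides` adds or replaces entries, e.g. paths needed by the renderer.
    Variables listed in `keep` but not currently set are silently skipped.
    """
    environ = os.environ if environ is None else environ
    env = {name: environ[name] for name in keep if name in environ}
    env.update(overrides or {})
    return env


def run_with_environment(command, env):
    """Run `command` (a list of arguments) with exactly `env`; return its exit code."""
    return subprocess.call(command, env=env)
```

On Windows it is usually safest to keep `SYSTEMROOT` in the allow-list, since many programs fail to start without it.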

This may be a pointless wild goose chase, but it can probably be tested too: in a simple Bifrost graph, can you avoid using any Maya geo at all? Just create an empty Bifrost graph, include that in the main Gaffer scene, and check whether that also crashes. Then, in another test, use a Bifrost graph with just a Bifrost sphere or plane primitive (not Maya's, but Bifrost's built-in sphere or plane) being output to render. No scattering, etc.

Since the complete render execution happens and then reports a crash the above may not change the outcome but it is worth trying different permutations and combinations.

I'll try setting up a linux VM today to test this in a barebones environment. I don't have deadline/rez/etc. so that should also help isolate just the primary apps i.e. gaffer, arnold and bifrost. Will also try to reach out to a bifrost dev today but I suspect they would also like to see some of the above validated to ensure it is not an issue on the rez or deadline side so if we can isolate this to a simple batch or shell execution, then it will help narrow down the RCA.

-Sachin

Sachin Shrestha

Jul 31, 2024, 8:06:52 AM
to gaffe...@googlegroups.com
Also, Sudarshan, is this bifrost graph using some usd inside it? I got some USD related warnings when testing an interactive render with your .ass file. The interactive render did happen with the USD warning dialogs which I ignored but thought I should check with you. Perhaps, even better to test this with the simple bifrost example I described above. Will report what I find at my end with further tests.

-Sachin

Sachin Shrestha

Jul 31, 2024, 8:24:39 AM
to gaffer-dev
So, I am able to reproduce the same crash (on Windows) at my end in stock Gaffer/Arnold/Bifrost, so it looks like nothing from rez or Deadline is at play here. I am getting an additional backtrace in the log at my end, which I didn't see in the previous logs that were posted, so I'm copying it here in case it offers any insights. I will report this to the Bifrost/Arnold devs in case the compat issues are on their end. In the meantime, I think the kick process is an acceptable workaround until there is an official fix. It shouldn't be hard to set up with system or Python command nodes and dispatcher pre/post processes in Gaffer, as has been done in the past.
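
A rough sketch of the kick half of that workaround, assuming the .ass file has already been exported from Gaffer. The flags are the ones from the working kick command earlier in this thread; the function names are hypothetical, and wiring this into a Gaffer command node or a Deadline job is not shown.

```python
import subprocess


def build_kick_command(ass_file, output_image, verbosity=6,
                       launcher=("gaffer", "env", "kick")):
    """Assemble the kick invocation from this thread as an argument list."""
    return list(launcher) + [
        "-dp",                    # flags copied from the thread's kick command
        "-dw",
        "-v", str(verbosity),     # log verbosity
        "-o", str(output_image),  # output image path
        "-i", str(ass_file),      # input scene description
    ]


def kick(ass_file, output_image, **kwargs):
    """Run kick and return its exit code."""
    return subprocess.call(build_kick_command(ass_file, output_image, **kwargs))
```

Passing `launcher=("kick",)` gives the plain `kick` variant instead of `gaffer env kick`; both were reported to render without crashing.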

Or...you could try having the Bifrost .ass files expanded during export from within Maya. If you do that, I suggest using the .usd format instead of .ass for reduced file size as well as richer interfacing in Gaffer in case any elements need to be edited (since .usd will load natively in Gaffer), but that's secondary. I'm assuming the Bifrost .ass files are for the env, so they would be mostly static and hence won't require per-frame export. So it may be a quicker fix to use expanded .ass files than to set up the whole gaffer>kick render workflow in Deadline.

****
* Arnold 7.2.1.0 [c4d93677] windows x86_64 clang-15.0.7 oiio-2.4.1 osl-1.12.9 vdb-7.1.1 adlsdk-7.4.2.47 clmhub-3.1.1.43 rlm-14.2.5 optix-6.6.0 2023/03/23 00:57:45
* CRASHED in 0x00007ff80c648a56
* signal caught: error C0000005 -- access violation
*
* backtrace:
>> 0 0x00007ff80c648a56 [ ]
* 1 0x00007ff80c646f62 [ ]
* 2 0x00007ff80c5b6344 [ ]
* 3 0x00007ff80c5bbce3 [ ]
* 4 0x00007ff80c769f29 [ ]
* 5 0x00007ff80c60eb20 [ ]
* 6 0x00007ff80d1838fd [ ]
* 7 0x00007ff80fa96063 [ ]
* 8 0x00007ff80fa97f2c [ ]
* 9 0x00007ff80fad1fc2 [ ]
* 10 0x00007ff80faede53 [ ]
* 11 0x00007ff80faedcf2 [ ]
* 12 0x00007ff80faa7e53 [ ]
* 13 0x00007ff80faed9b2 [ ]
* 14 0x00007ff80faee4bd [ ]
* 15 0x00007ff80faccc8f [ ]
* 16 0x00007ff80fea7575 [ ]
* 17 0x00007ff80feac3fc [ ]
* 18 0x00007ff816aa7a05 [ai ] AiOutputIteratorDestroy
* 19 0x00007ff816792ac9 [ai ] Ordinal0
* 20 0x00007ff88eee0c3c [ucrtbase ] realloc
* 21 0x00007ff88ee500b0 [ucrtbase ] realloc
* 22 0x00007ff88ee3493c [ucrtbase ] realloc
* 23 0x00007ff88ee348b8 [ucrtbase ] realloc
* 24 0x00007ff88eee069c [ucrtbase ] realloc
* 25 0x00007ff81f8f4ed1 [IECoreArnold] IECore::SharedDataHolder<std::vector<unsigned char,std::allocator<unsigned char> > >::Shareable::~Shareable
* 26 0x00007ff81f8f4ff6 [IECoreArnold] IECore::SharedDataHolder<std::vector<unsigned char,std::allocator<unsigned char> > >::Shareable::~Shareable
* 27 0x00007ff892e521cc [ntdll ] ZwWaitLowEventPair
* 28 0x00007ff892d36834 [ntdll ] ZwWaitLowEventPair
* 29 0x00007ff892d46a88 [ntdll ] ZwWaitLowEventPair
* 30 0x00007ff892d5b684 [ntdll ] ZwWaitLowEventPair
* 31 0x00007ff88eeb7e9c [ucrtbase ] realloc
* 32 0x00007ff88eee0454 [ucrtbase ] realloc
* 33 0x00007ff862fea30f [python310 ] Py_Exit
* 34 0x00007ff8630107e1 [python310 ] Py_HandleSystemExit
* 35 0x00007ff86300f688 [python310 ] PyRun_SimpleFileObject
* 36 0x00007ff862e0ba5a [python310 ] PyObject_GC_IsFinalized
* 37 0x00007ff862e0c507 [python310 ] PyObject_GC_IsFinalized
* 38 0x00007ff862e0c930 [python310 ] Py_Main
* 39 0x00007ff626321230 [python ]
* 40 0x00007ff88f2909bc [KERNEL32 ] uaw_wcsrchr
* 41 0x00007ff88f227ba0 [KERNEL32 ] uaw_wcsrchr
* 42 0x00007ff892d5b8b8 [ntdll ] ZwWaitLowEventPair
*
* loaded modules:
* 0x00007ff816220000 ai
* 0x00007ff88ed70000 ucrtbase
* 0x00007ff81f8a0000 IECoreArnold
* 0x00007ff892ba0000 ntdll
* 0x00007ff862d80000 python310
* 0x00007ff626320000 python
* 0x00007ff88f1a0000 KERNEL32
****


-Sachin

Sudarshan Havale

Jul 31, 2024, 10:04:11 AM
to gaffer-dev
Hi Sachin,

Sure, the kick .ass workflow could be helpful for now, so I'll set that up.

To answer your question, the .ass file I used contains a Maya cube scattered using Bifrost nodes, and there isn't any USD data in it. I'm not sure why those USD warnings appear, but I'll try exporting another .ass and .USD using the Bifrost sphere as you suggested and test it in the factory gaffer.bat with factory Arnold and Bifrost environments. Additionally, the log details you shared look similar to my interactive render (not batch) logs from the shared zip file, in case you want to verify.

Please let me know if there's anything else that needs to be tested.


Thanks

Sachin Shrestha

Jul 31, 2024, 10:53:54 AM
to gaffe...@googlegroups.com
I would definitely test on Linux as well, as a priority, to cross-check any OS-specific lib version issues and to be able to check dependencies a bit more easily. I couldn't get anything other than Ubuntu set up, and that was not reliable when running Gaffer, so I gave up. Also, when the first Windows version of Gaffer came out, I do remember using Bifrost .ass files inside Gaffer successfully, so it may be worth testing older versions of Gaffer, Arnold and Bifrost together to see where this compat breaks. As Carlo also mentioned earlier, it's probably one of the lib versions messing this up, so if there's a version that works across all the apps, it may be worth testing. It's also a bit weird that I got the USD dialog boxes when rendering interactively even though the Bifrost .ass file didn't have anything to do with USD (I didn't install any USD stuff explicitly), so it may be worth checking USD lib versions too.

Another point that could be tested is whether the Arnold procedural multi-threading option is on or off by default. I'm not sure what it's set to for the Bifrost procedural, but it may be worth forcing it to a single thread and rendering..?

Sudarshan Havale

Aug 1, 2024, 8:43:53 AM
to gaffer-dev
Hi Sachin,

We have confirmed that it works fine on Linux with the same packages I am using on Windows. You're right about the older versions; as I mentioned in my previous responses, compatible Bifrost .ass files work fine with older Arnold versions like Arnold-7.1.x. However, the crash starts occurring with Arnold-7.2.x and later versions, regardless of whether parallel_node_init is ON or OFF.

We are comparing libraries and will keep you posted if we notice anything suspicious. I've attached all logs for your reference.

Commands Used:
```
rez env arnold-7.3 mtoa-5.4 gaffer-1.4 bifrost-2.10.0.0 -- gfr execute -script "/fs/shows/tools/pipecrew/shavale/forums/gafferArnoldCrash/gafferArnoldCrash14.gfr" -nodes LocalDispatcher -frames 1001-1001 >> /fs/shows/tools/pipecrew/shavale/forums/gafferArnoldCrash/logs/gafferArnoldCrash_linux_73.log

J:/tools/bin/rez-2.112.0/Scripts/rez\rez.exe env arnold-7.2 mtoa-5.3 gaffer-1.3 bifrost-2.8.0.0 -- gfr execute -script "J:/tools/pipecrew/shavale/forums/gafferArnoldCrash/gafferArnoldCrash13.gfr" -nodes ArnoldRender -frames 1001-1001 >> J:\tools\pipecrew\shavale\forums\gafferArnoldCrash\logs\gafferArnoldCrash_win_72.log

J:/tools/bin/rez-2.112.0/Scripts/rez\rez.exe env arnold-7.1 mtoa-5.1 gaffer-1.3 bifrost-2.4.0.0 -- gfr execute -script "J:/tools/pipecrew/shavale/forums/gafferArnoldCrash/gafferArnoldCrash13.gfr" -nodes ArnoldRender -frames 1001-1001 >> J:\tools\pipecrew\shavale\forums\gafferArnoldCrash\logs\gafferArnoldCrash_win_71.log
```

interactive_win_73.log
gafferArnoldCrash_win_73.log
gafferArnoldCrash_linux_73.log
gafferArnoldCrash_win_71.log
interactive_linux_73.log
gafferArnoldCrash_win_72.log