Hello,
I am looking for some advice on debugging a RenderMan procedural which, merely by loading an OpenVDB file, appears to cause noise artifacts in the output of PxrCryptomatte.
We are using OpenVDB 8.1.0 and RenderMan 24.3. I've stripped the procedural down to the point where it only loads a file. I am compiling both the procedural and OpenVDB with gcc 9.3 on CentOS 7.9, but I can also see the issue with gcc 6.3 and on Rocky Linux 8.6. I've also tried OpenVDB 4.1.0 and 9.1.0. I'm happy to try the latest commit on GitHub, but I'll need to do a little extra work to be able to build it.
Also important to note: this is unfortunately triggered by a proprietary asset. So far I haven't reproduced it on anything but proprietary assets. But as far as I can tell there is nothing special about it. It is 4.5G on disk, was produced by Houdini 19.0, and has 7 grids (but I forced OpenVDB to load only the one named "surface"); the problematic grid is just a normal FloatGrid level set. I'm including the vdb_print output at the end of this message in case it is helpful.
In the case I'm presenting here, there is actually nothing in the scene. I just have RenderMan invoke my procedural, I run a few lines of OpenVDB code, and stop there. The outputs I get are black images with some noise artifacts. It behaves as if some memory corruption is going on somewhere, and somehow PxrCryptomatte is the thing that is affected most often. I have seen the procedural crash before on "memory corruption" problems, but not in the particular case I am presenting here.
I systematically disabled parts of the OpenVDB code base to try to narrow down where the error might be originating. After disabling large blocks of code, I eventually got to this line of code which, when disabled, makes the problem go away.
https://github.com/AcademySoftwareFoundation/openvdb/blob/ea786c46b7a1b5158789293d9b148b379fc9914c/openvdb/openvdb/tree/LeafNode.h#L1371
It looks like `meta` comes from the result of this function. Best I can tell, it wasn't returning a nullptr or anything like that.
https://github.com/AcademySoftwareFoundation/openvdb/blob/ea786c46b7a1b5158789293d9b148b379fc9914c/openvdb/openvdb/io/Archive.cc#L917
It's important to note: I can't just disable this one line of code as a fix. I think something more fundamental is wrong, and that it can be triggered by other parts of the code base as well.
Since I suspect memory corruption, I've also run things through valgrind to see if it can detect anything. I have tried valgrind both on an equivalent simple program and directly on the procedural. The simple program shows some innocuous-looking possible leaks which seem to stem from TBB. Of course, valgrind detects a lot more when I actually run prman through it. The most relevant report is a use of uninitialized values coming from PxrCryptomatte.so, though this seems to happen whether or not OpenVDB is involved.
==37293== Conditional jump or move depends on uninitialised value(s)
==37293== at 0x29116BFB: ??? (in /local/prman/24.3.2208291/lib/plugins/PxrCryptomatte.so)
==37293== by 0x2969059F: ??? (in /local/prman/24.3.2208291/lib/plugins/PxrSampleFilterCombiner.so)
==37293== by 0x8D1FD9C: ??? (in /local/prman/24.3.2208291/lib/libprman.so)
What does seem to help is forcing TBB (2019 Update 9) to use one thread. But for whatever reason, that doesn't translate into a solution for the original procedural's source code.
tbb::task_scheduler_init tsi(1);
Any ideas on this? I don't expect anyone to be able to reproduce without an asset, but thought someone out there might have some experience integrating OpenVDB into RenderMan.
Here is the example procedural which can produce the corruption:
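extern "C" RtVoid
Subdivide2(
    RtContextHandle ctx,
    RtFloat detail,
    RtInt argc,
    RtToken const toks[],
    RtPointer const vals[])
{
    // Uncomment to limit TBB to a single thread.
    //tbb::task_scheduler_init tsi(1);

    // Register the standard OpenVDB grid and metadata types.
    openvdb::initialize();

    // Open the file; nothing else is done with it.
    openvdb::io::File f(PATH_TO_VDB);
    f.open();
}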
Here is the output of vdb_print on the asset that causes the problem:
VDB version: 8.1/224
creator: Houdini 19.0.622/GEO_VDBTranslator
Name: surface
Information about Tree:
Type: Tree_float_5_4_3
Configuration:
Root(1 x 8), Internal(8 x 32^3), Internal(1,713 x 16^3), Leaf(1,118,866 x 8^3)
Background value: 0.012
Min value: -0.012
Max value: 0.012
Number of active voxels: 309,934,076
Number of active tiles: 0
Bounding box of active voxels: [-808, -362, -728] -> [926, 3515, 909]
Dimensions of active voxels: 1735 x 3878 x 1638
Percentage of active voxels: 2.81%
Average leaf node fill ratio: 54.1%
Number of unallocated nodes: 0 (0%)
Memory footprint:
Actual: 2.290 GB
Active leaf voxels: 1.155 GB
Dense equivalent: 41.056 GB
Actual footprint is 5.58% of an equivalent dense volume
Leaf voxel footprint is 50.4% of actual footprint
Additional metadata:
class: level set
file_bbox_max: [926, 3515, 909]
file_bbox_min: [-808, -362, -728]
file_compression: blosc + active values
file_mem_bytes: 2458919028
file_voxel_count: 309934076
is_local_space: false
is_saved_as_half_float: false
name: surface
value_type: float
vector_type: invariant
Transform:
voxel size: 0.002
index to world:
[0.002, 0, 0, 0]
[0, 0.002, 0, 0]
[0, 0, 0.002, 0]
[0, 0, 0, 1]
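
As a sanity check that the statistics above are self-consistent, here is a small sketch that recomputes the derived figures from the raw counts. It assumes 4-byte float voxels, 8^3 = 512 voxels per leaf, and that vdb_print's "GB" means 2^30 bytes; the constant and function names are made up for illustration and are not part of OpenVDB.

```cpp
#include <cstdint>

// Raw counts copied from the vdb_print output above.
constexpr std::int64_t kDimX = 1735, kDimY = 3878, kDimZ = 1638;
constexpr std::int64_t kActiveVoxels = 309934076;
constexpr std::int64_t kLeafCount = 1118866;

// Assumption: each voxel is a 4-byte float (value_type: float, not half).
constexpr std::int64_t kBytesPerVoxel = 4;
// Assumption: a leaf node holds 8^3 voxels, per the Tree_float_5_4_3 layout.
constexpr std::int64_t kVoxelsPerLeaf = 8 * 8 * 8;
// Assumption: vdb_print reports "GB" as 2^30 bytes.
constexpr double kBytesPerGB = 1073741824.0;

// Number of voxels in the dense bounding box of the active region.
constexpr std::int64_t denseVoxelCount() { return kDimX * kDimY * kDimZ; }

// Memory a fully dense float volume of that size would need.
constexpr double denseEquivalentGB() {
    return static_cast<double>(denseVoxelCount() * kBytesPerVoxel) / kBytesPerGB;
}

// Memory taken by the active voxel values alone.
constexpr double activeLeafVoxelGB() {
    return static_cast<double>(kActiveVoxels * kBytesPerVoxel) / kBytesPerGB;
}

// Active voxels as a percentage of the dense bounding box.
constexpr double activeVoxelPercent() {
    return 100.0 * static_cast<double>(kActiveVoxels)
                 / static_cast<double>(denseVoxelCount());
}

// Average percentage of each leaf node that is active.
constexpr double leafFillPercent() {
    return 100.0 * static_cast<double>(kActiveVoxels)
                 / static_cast<double>(kLeafCount * kVoxelsPerLeaf);
}
```

Under those assumptions the functions give roughly 41.06 GB, 1.15 GB, 2.81% and 54.1%, which matches the reported values, so the grid statistics at least look internally consistent.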