csgDifference with implicit function?

b...@formlabs.com

May 13, 2021, 8:39:11 PM
to OpenVDB Forum
The openvdb::tools::csgDifference function takes two grids of the same type and subtracts the second from the first in-place. I want to do the same thing, but with the second argument being an implicit function, f(x,y,z)->float. Is there a built-in way to do this? I see csgDifference is based on visit2 using CsgDiffVisitor, and it looks like visit2 really needs the other argument to be an actual tree.

I'm sure I could write this myself, but it seems like there should be an easier way.
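For illustration, the kind of call I have in mind would be something like the following (a purely hypothetical overload, just to show the intent):

#include <openvdb/openvdb.h>

// Hypothetical usage -- this overload doesn't exist today; it's just what I'd
// like to be able to write: subtract an implicit function (evaluated in world
// space) from a level set grid in-place.
auto sphere = [](const openvdb::Vec3d& p) -> float {
    return static_cast<float>(p.length() - 5.0); // signed distance to a radius-5 sphere
};
// openvdb::tools::csgDifference(grid, sphere);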

Cheers,
Ben

Dan Bailey

May 17, 2021, 3:45:43 PM
to OpenVDB Forum
Hi Ben,

The CsgDiffVisitor has actually been removed in the upcoming VDB 8.1. The main problem with the visitor methods is that they are inherently single-threaded and thus pretty slow. There are still valid cases where they come in useful though. We are in the process of removing all the visitor methods from the Tree and offering these two alternatives:

* tools::visitNodesDepthFirst() in tools/NodeVisitor.h for cases where single-threaded depth-first traversal is still desirable
* tree::DynamicNodeManager in tree/NodeManager.h for cases where multi-threaded processing is desirable

For your case I would suggest the simplest solution is to discretize the implicit function into a new intermediate tree with the same resolution and topology as the target, and then to call tools::csgDifference(), which will now use the new DynamicNodeManager to perform a multi-threaded CSG difference operation, stealing data from this intermediate tree where it can.
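Untested, but a rough sketch of that approach might look something like this (assuming a FloatGrid level set 'grid' and an implicit function 'f' evaluated in world space; the helper name is made up):

#include <functional>
#include <openvdb/openvdb.h>
#include <openvdb/tools/Composite.h> // openvdb::tools::csgDifference

// Sketch only: discretize the implicit function onto the target grid's sparse
// topology, then run the standard in-place CSG difference.
void subtractImplicit(openvdb::FloatGrid& grid,
                      const std::function<float(const openvdb::Vec3d&)>& f)
{
    // Intermediate grid sharing the target's transform and sparse topology
    // (values start out at the background).
    openvdb::FloatGrid::Ptr tmp = openvdb::FloatGrid::create(grid.background());
    tmp->setTransform(grid.transform().copy());
    tmp->tree().topologyUnion(grid.tree());

    // Evaluate the implicit function only on the active voxels. For a
    // narrow-band level set these all live in leaf nodes; this loop is
    // single-threaded but could be parallelized with a tree::LeafManager.
    for (auto iter = tmp->beginValueOn(); iter; ++iter) {
        const openvdb::Vec3d xyz = tmp->transform().indexToWorld(iter.getCoord());
        iter.setValue(f(xyz));
    }

    // grid = grid - tmp, multi-threaded, stealing nodes from tmp where it can.
    openvdb::tools::csgDifference(grid, *tmp);
}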

We have discussed improving support for implicit functions in the past, but it's not something we're actively looking at solving at this time. There is one example of an implicit field in the codebase, meant only for debugging, that you can look at though:


Thanks

b...@formlabs.com

May 18, 2021, 5:42:08 AM
to OpenVDB Forum
Thanks Dan,

I have a few questions:
* Looking at visitNodesDepthFirst, and in particular the DepthFirstNodeVisitor that implements it, is there a reason it couldn't fairly easily be parallelized at each level using, e.g., tbb::task_group?

* Regarding DynamicNodeManager (and NodeManager), I'm having trouble understanding their benefits. It looks like they collect all nodes for easier traversal? That sounds like it would use O(n) memory and O(n) time to build, and like it wouldn't improve the big-O time of a pass. Am I mistaken? Does it somehow make up for that with better memory coherency or better creation of TBB tasks?

One case I'm interested in is where the grid is very sparse and the implicit function fills all of space, so discretizing the implicit function over the whole bounding box of the grid would be costly.

Thanks,
Ben

Dan Bailey

May 18, 2021, 9:59:46 PM
to OpenVDB Forum
Hi Ben,

Parallelizing the depth-first node visitor would result in the order of nodes being visited no longer being depth-first! The aim of this tool is to visit the nodes in a consistent, predictable and sequential order in cases where that's important. One of the motivating use cases, for example, was dispatching to a non-TBB thread pool to handle the multi-threading, but I've also used it to make small changes to lightweight sub-trees, where that outperforms the TBB overhead of creating and dispatching tasks. It could just as easily be used to serialize a VDB to disk though.
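For reference, a minimal usage sketch (based on the op interface described in tools/NodeVisitor.h, so treat the details as approximate):

#include <vector>
#include <openvdb/openvdb.h>
#include <openvdb/tools/NodeVisitor.h>

// Count how many nodes exist at each level of a tree, visiting them in a
// single-threaded, depth-first order.
struct NodeCountOp
{
    std::vector<size_t> counts;

    template<typename NodeT>
    void operator()(const NodeT&, size_t /*idx*/)
    {
        if (counts.size() <= NodeT::LEVEL) counts.resize(NodeT::LEVEL + 1, 0);
        ++counts[NodeT::LEVEL];
    }
};

// Usage, given some FloatGrid 'grid':
// NodeCountOp op;
// openvdb::tools::visitNodesDepthFirst(grid.constTree(), op);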

There's lots to unpack in your questions about the NodeManager / DynamicNodeManager. From an algorithm perspective, the breadth-first order of traversal is very convenient for lots of algorithms. However, in terms of performance, I think the main point is that with a correctly tuned algorithm, the time and memory requirements of the construction should be insignificant in comparison to the actual work being done, as you typically care about O(n) where n is the number of voxels, not the number of nodes.

If there is minimal work being done per node, then adjusting the grain size or preventing the algorithm from recursing all the way down to the leaf nodes would be a better trade-off for improving performance and reducing memory usage of the traversal. That's not to say that there may not be additional optimizations to be made in how this data is processed, but I doubt it would be a worthwhile return on investment. Happy to be proven wrong here though!
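To illustrate the early-termination / grain-size point, here is a rough foreachTopDown sketch (the op interface is the one documented in tree/NodeManager.h; the grain-size values are arbitrary):

#include <openvdb/openvdb.h>
#include <openvdb/tree/NodeManager.h>

// Visit only the root and internal nodes of a tree: returning false from the op
// stops the DynamicNodeManager from descending into that node's children, so
// the leaf nodes are never gathered or visited at all.
struct InternalNodesOnlyOp
{
    template<typename NodeT>
    bool operator()(NodeT& node, size_t /*idx*/) const
    {
        (void)node; // real per-node work would go here
        return NodeT::LEVEL > 1; // stop recursing at the lowest internal level
    }
};

// Usage, given some FloatGrid 'grid'; the last two arguments are the leaf and
// non-leaf grain sizes fed to the underlying TBB ranges:
// openvdb::tree::DynamicNodeManager<openvdb::FloatTree> manager(grid.tree());
// manager.foreachTopDown(InternalNodesOnlyOp(), /*threaded=*/true,
//     /*leafGrainSize=*/32, /*nonLeafGrainSize=*/1);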

As an example, we recently improved the performance of the sequential tree::activeVoxelCount() method by rewriting it to use the DynamicNodeManager. Stats on a 1 billion voxel VDB:

Original: 203 milliseconds
New: 26 milliseconds

I would think that with a fair amount of work it might be possible to trim off 2 milliseconds. :/

Regarding your algorithm, you definitely shouldn't populate the bbox of a sparse grid; that would be a bad idea. I was suggesting that you populate a grid with the same sparse topology as your target grid and evaluate the implicit function only on the active voxels of that new grid.

b...@formlabs.com

May 19, 2021, 6:48:55 AM
to OpenVDB Forum
Thanks for the reply. That makes sense that we'd want true ordered depth-first traversal.

Can you elaborate on activeVoxelCount? I think I don't grok what the NodeManagers really do: I don't understand how it could be 8x faster to traverse the tree to find all the leaf nodes and then iterate over those, as opposed to doing a tbb::parallel_reduce on all "on" children at each level of the tree. In both cases don't we have to traverse the whole tree and call onVoxelCount() on every LeafNode?

Thanks,
Ben

Greg Hurst

Jun 13, 2021, 4:20:45 PM
to OpenVDB Forum
Correct me if I'm wrong, but I think that's what the NodeManager is doing here, i.e. calling tbb::parallel_reduce on all children in a top-down fashion.

It appears that countActiveVoxels calls reduceTopDown, which calls reduce, which calls run, which calls tbb::parallel_reduce.

Dan Bailey

Jun 14, 2021, 7:52:24 PM
to OpenVDB Forum
Yeah, exactly. It does a tbb::parallel_reduce over each level of the tree in turn, meaning it does a reduction over the immediate children of the root node, then a reduction over the immediate children of those children, and so on. Each time, the array of child nodes is built in parallel using a tbb::parallel_for (and deduced from the parent array), until finally it runs a reduction over the leaf nodes. So in this example, doing each of these steps in parallel using 32 threads gives a net performance improvement of about 8x compared with doing a single-threaded traversal over the entire tree.
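To connect this to Greg's observation above, a stripped-down sketch of that kind of reduction op (not the library's actual implementation, and it ignores active tiles for simplicity) might look like:

#include <openvdb/openvdb.h>
#include <openvdb/tree/NodeManager.h>

// Count the active voxels stored in leaf nodes via DynamicNodeManager::reduceTopDown.
// Each thread works on its own copy of the op; join() merges the partial counts.
struct LeafVoxelCountOp
{
    openvdb::Index64 count = 0;

    // Root and internal nodes: nothing to count in this simplified sketch
    // (active tiles are ignored); return true so the traversal keeps descending.
    template<typename NodeT>
    bool operator()(const NodeT&, size_t /*idx*/) { return true; }

    // Leaf nodes: accumulate the number of "on" voxels.
    bool operator()(const openvdb::FloatTree::LeafNodeType& leaf, size_t /*idx*/)
    {
        count += leaf.onVoxelCount();
        return false; // leaves have no children to descend into
    }

    void join(const LeafVoxelCountOp& other) { count += other.count; }
};

// Usage, given some FloatGrid 'grid':
// openvdb::tree::DynamicNodeManager<openvdb::FloatTree> manager(grid.tree());
// LeafVoxelCountOp op;
// manager.reduceTopDown(op);
// // op.count now holds the total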

It might be helpful to look over the code here (https://github.com/AcademySoftwareFoundation/openvdb/blob/master/openvdb/openvdb/tree/NodeManager.h#L107) and compare the single-threaded and multi-threaded logic. This constructs an array of nodes at one level of the tree from a parent array by computing the size of the node array and the offsets, and then populating it in parallel.
