Hi Ben,
Parallelizing the depth-first node visitor would mean the nodes are no longer visited in depth-first order! The aim of this tool is to visit the nodes in a consistent, predictable, sequential order in cases where that's important. One of the motivating use cases was dispatching to a non-TBB thread pool to handle the multi-threading, but I've also used it to make small changes to lightweight sub-trees, where doing the work sequentially outperforms the TBB overhead of creating and dispatching tasks. It could just as easily be used to serialize a VDB to disk though.
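To make the ordering point concrete, here is a minimal sketch of a sequential depth-first visitor on a toy tree. The Node struct and all helper names are made up for illustration and are not the OpenVDB API; the only point is that every call to the operator happens on the calling thread, so the visit order is fully determined by the tree.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a tree node. This is NOT an OpenVDB type.
struct Node {
    std::string name;
    std::vector<std::unique_ptr<Node>> children;
};

// Hypothetical helper: allocate a named node with no children.
std::unique_ptr<Node> makeNode(std::string name)
{
    auto node = std::make_unique<Node>();
    node->name = std::move(name);
    return node;
}

// Pre-order depth-first visit: handle the node, then each child subtree in turn.
// All calls to op happen on the calling thread, one after another.
void visitDepthFirst(const Node& node, const std::function<void(const Node&)>& op)
{
    op(node);
    for (const auto& child : node.children) {
        visitDepthFirst(*child, op);
    }
}

// Build a small tree: root -> {a -> {a0, a1}, b -> {b0}}.
std::unique_ptr<Node> makeToyTree()
{
    auto root = makeNode("root");
    auto a = makeNode("a");
    a->children.push_back(makeNode("a0"));
    a->children.push_back(makeNode("a1"));
    auto b = makeNode("b");
    b->children.push_back(makeNode("b0"));
    root->children.push_back(std::move(a));
    root->children.push_back(std::move(b));
    return root;
}

// Record the node names in the order they are visited.
std::vector<std::string> depthFirstOrder(const Node& root)
{
    std::vector<std::string> order;
    visitDepthFirst(root, [&order](const Node& n) { order.push_back(n.name); });
    return order;
}
```

Calling depthFirstOrder(*makeToyTree()) yields root, a, a0, a1, b, b0 on every run. If the child subtrees were handed to separate threads instead, the operator calls for the a and b subtrees could interleave, which is exactly the guarantee a sequential visitor exists to keep.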
There's lots to unpack in your questions about the NodeManager / DynamicNodeManager. From an algorithm perspective, the breadth-first order of traversal is very convenient for lots of algorithms. In terms of performance, I think the main point is that with a correctly tuned algorithm, the time and memory needed to construct the node manager should be insignificant compared to the actual work being done, as you typically care about O(n) where n is the number of voxels, not the number of nodes. If there is minimal work being done per node, then adjusting the grain size or preventing the algorithm from recursing all the way down to the leaf nodes would be a better trade-off for improving performance and reducing the memory usage of the traversal. That's not to say there are no further optimizations to be made in how this data is processed, but I doubt they would be a worthwhile return on investment. Happy to be proven wrong here though!
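As a rough illustration of the two tuning knobs mentioned above, here is a small sketch that caches nodes level by level (breadth-first) with a cap on how deep it recurses, and splits one level's flat array into chunks by grain size. It uses a toy Node type and hypothetical helper names (buildLevelLists, splitIntoChunks, makeToyTree); none of this is the NodeManager implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-in for a tree node. This is NOT an OpenVDB type.
struct Node {
    std::vector<std::unique_ptr<Node>> children;
};

// One flat array of node pointers per tree level, root level first.
using LevelLists = std::vector<std::vector<const Node*>>;

// Collect nodes level by level (breadth-first), stopping after maxLevels
// levels so that the deepest levels need not be cached at all.
LevelLists buildLevelLists(const Node& root, std::size_t maxLevels)
{
    LevelLists levels;
    std::vector<const Node*> current{&root};
    while (!current.empty() && levels.size() < maxLevels) {
        std::vector<const Node*> next;
        for (const Node* node : current) {
            for (const auto& child : node->children) {
                next.push_back(child.get());
            }
        }
        levels.push_back(std::move(current));
        current = std::move(next);
    }
    return levels;
}

// Split [0, count) into half-open chunks of at most grainSize items.
// A larger grain size means fewer, bigger tasks and less scheduling overhead.
std::vector<std::pair<std::size_t, std::size_t>>
splitIntoChunks(std::size_t count, std::size_t grainSize)
{
    assert(grainSize > 0);
    std::vector<std::pair<std::size_t, std::size_t>> chunks;
    for (std::size_t begin = 0; begin < count; begin += grainSize) {
        chunks.emplace_back(begin, std::min(begin + grainSize, count));
    }
    return chunks;
}

// Hypothetical helper: a root with `fanout` children, each with `fanout` leaves.
std::unique_ptr<Node> makeToyTree(std::size_t fanout)
{
    auto root = std::make_unique<Node>();
    for (std::size_t i = 0; i < fanout; ++i) {
        auto child = std::make_unique<Node>();
        for (std::size_t j = 0; j < fanout; ++j) {
            child->children.push_back(std::make_unique<Node>());
        }
        root->children.push_back(std::move(child));
    }
    return root;
}
```

With a fanout of 4, the three level lists hold 1, 4 and 16 nodes. Passing maxLevels = 2 skips the 16-entry leaf list entirely, and splitIntoChunks(16, 16) turns the leaf level into a single task rather than many tiny ones, which is the kind of adjustment that helps when the per-node work is small.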
As an example, we recently improved the performance of the sequential tree::activeVoxelCount() method by rewriting it to use the DynamicNodeManager. Stats on a 1 billion voxel VDB:
Original: 203 milliseconds
New: 26 milliseconds
I would think that with a fair amount of work it might be possible to trim off another 2 milliseconds. :/
With regard to your algorithm, you definitely shouldn't populate the bbox of a sparse grid; that would be a bad idea. I was suggesting that you populate a grid with the same sparse topology as your target grid and evaluate the implicit function only on the active voxels of that new grid.
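Here is a minimal sketch of that idea using a toy sparse grid (a std::map keyed by voxel coordinate) rather than real OpenVDB types. The names Coord, SparseGrid, evaluateOnTopology, denseBBoxVoxelCount and sphereDistance are all hypothetical, and the sphere is just an example implicit function. In OpenVDB itself I'd expect the topology copy to go through something like topologyUnion, but treat that as a pointer rather than a recipe.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <functional>
#include <map>

// Toy stand-ins, NOT OpenVDB types: a voxel coordinate and a sparse grid that
// stores values only for its active voxels.
using Coord = std::array<int, 3>;
using SparseGrid = std::map<Coord, double>;

// Create a new grid with the same active voxels as target and evaluate the
// implicit function f only at those voxels.
SparseGrid evaluateOnTopology(const SparseGrid& target,
                              const std::function<double(const Coord&)>& f)
{
    SparseGrid result;
    for (const auto& voxel : target) {
        result[voxel.first] = f(voxel.first);
    }
    return result;
}

// Number of voxels a dense fill of the bounding box would touch, for comparison.
std::size_t denseBBoxVoxelCount(const SparseGrid& grid)
{
    if (grid.empty()) return 0;
    Coord lo = grid.begin()->first;
    Coord hi = lo;
    for (const auto& voxel : grid) {
        for (std::size_t axis = 0; axis < 3; ++axis) {
            lo[axis] = std::min(lo[axis], voxel.first[axis]);
            hi[axis] = std::max(hi[axis], voxel.first[axis]);
        }
    }
    std::size_t count = 1;
    for (std::size_t axis = 0; axis < 3; ++axis) {
        count *= static_cast<std::size_t>(hi[axis] - lo[axis] + 1);
    }
    return count;
}

// Example implicit function: signed distance to a sphere of radius 5 at the origin.
double sphereDistance(const Coord& c)
{
    const double x = c[0], y = c[1], z = c[2];
    return std::sqrt(x * x + y * y + z * z) - 5.0;
}
```

For a target with just two active voxels at (0, 0, 0) and (300, 400, 0), evaluateOnTopology calls the function twice, whereas filling the bounding box would touch 301 * 401 * 1 = 120701 voxels. The gap only gets worse as the grid gets sparser.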