Large VDBs and OOC


Michael Cleaver

Aug 17, 2020, 9:04:12 PM
to OpenVDB Forum
Hi all,

I've recently been working with some very large (~750 million voxel) grids.  I work with grids of different value types, but a Vec3I grid of this size can use about 10 GB of memory per grid, and for some operations I need to combine two grids at a time to evaluate complex expressions that reference many grids.  My concern is that some users of the app may be on laptops with only 16 GB of RAM.

After I run a ValueOnCIter over a delay-loaded grid, it ends up fully loaded into memory, including all leaf buffers.  Is there any mechanism to flush these buffers (back to their out-of-core state?) or something like that?

If I use a value accessor to request all voxels, I will also end up fully loading the grid into memory, right?

Is there any existing mechanism to iterate over a grid without having to hold the entire grid in memory?

I guess I can use the bbox constraint on loading the grid to partition my operations.  If I do this, though, I will end up with a set of VDB files on disk that must then be merged.  Will that still require the entire final result grid to fit in memory, or does the node-stealing aspect of merging allow data to remain out-of-core through the merge?

Another idea is to open each VDB multiple times: maybe iterate over one top-level node from each file, then close and re-open the file, or something like that?  I suspect this would not be ideal because the initial load of the grid has some cost too.

Any advice appreciated,

Thanks,
Mike

Michael Cleaver

Aug 31, 2020, 10:47:46 PM
to OpenVDB Forum
FYI, I did end up testing the quick hack of closing and re-opening the VDB files after reading a certain number of voxels, and it did let me work around a memory usage issue.

Given I've had no replies, I'm a little worried that this is just something I'm doing wrong.  I also forgot to mention that I'm building on Windows (mostly).

Dan Bailey

Sep 29, 2020, 6:08:36 PM
to OpenVDB Forum
Hi Mike,

I've used the out-of-core streaming trick you describe here of just dropping the buffers and re-opening the file (I believe Ken has some experience to share here too).

In answer to your question: it depends.  When you merge two grids, if two leaf node buffers have the same origin, the data will need to be brought in-core in order to merge them.  Otherwise, the leaf node can remain out-of-core even after it has been merged.

Once you have a delay-loaded VDB, another trick you can use is to split the grid into multiple grids based on the elements in the RootNode (there are now methods for stealing nodes in this way).  That means if your source grid has four RootNode elements, you'd end up with four new grids, each with one RootNode element; you can then combine these one at a time and finally combine the resulting grids, which is fast because it's mostly stealing data.  That would reduce your peak memory usage.  It's unlikely to yield much performance gain over the technique you're using, though, because file I/O will still dominate, but it's a bit cleaner to implement.

Cheers,
Dan

Michael Cleaver

Oct 5, 2020, 8:10:37 PM
to OpenVDB Forum
Hi again Dan,

Thanks again for the reply.

When you say 'dropping the buffers', do you mean there is some method on the tree/node classes that can do this?  In the end I was just throwing away the entire grid (resetting the pointer) and re-opening it, but this takes some extra time.

I'm planning to experiment soon with your root node based splitting idea to partition some of our processing as I think that will be the best solution.

Cheers,
Mike

Dan Bailey

Oct 6, 2020, 12:08:32 AM
to OpenVDB Forum
Hi Mike,

The technique I used was to create a new unallocated LeafBuffer using this constructor:

LeafBuffer(PartialCreate, const ValueType&);

And then to use the LeafNode::swap() method to swap the contents, thus dropping the contents of the value buffers.  There's no way to recover the leaf nodes you've dropped, though, so you would need to reopen the whole grid if you required the same leaf node buffer again.  However, it's useful for streaming, where you know you only want to read each leaf node once and want to keep the memory footprint as low as possible.

Let me know how you get on with the splitting. :)

Cheers,
Dan

Ken Museth

Oct 6, 2020, 5:05:59 PM
to OpenVDB Forum
Hi Michael,

I know I'm late to the "party" but here are my thoughts:

You've discovered a major limitation of the current implementation of out-of-core support in OpenVDB, namely the fact that it only performs delayed loading and no eviction.  This implies, just as you observed, that if you "touch" all the leaf nodes we effectively end up with the same memory footprint as if you had loaded everything into memory from the start!  So delayed loading is only really useful if you're not touching everything.  A trick, which sounds similar to what you outlined in the follow-up post, is to partition the domain of your grid into blocks (preferably aligned with the index domain of internal nodes, say of size (16*8)^3 = 128^3) and only load the leaf nodes in one block, perform the work, then delete the grid (which flushes all its in-memory nodes) and move on to the next block.  Obviously this is a bit of a clunky workflow, which is exactly why I'm working on a new version of the OpenVDB grid that does exactly what you're after: it performs *both* delayed loading and eviction when a user-defined memory footprint is hit.  It's not done yet (though I have a working prototype), so don't hold your breath :)  However, I promise it will come.

Cheers,
Ken

Michael Cleaver

Mar 8, 2021, 7:32:39 PM
to OpenVDB Forum
Hi Ken,

I ended up implementing this partitioning by iterating over the level-2 nodes and then using their bounding boxes; this worked very well for me and was not much effort.  In this case I was implementing a 'columns to rows' type algorithm generating tabular output, where each voxel centroid x,y,z is output with 150 values, each sourced from independent VDB files sharing equivalent topology.

I have several other algorithms that will benefit from partitioning and will roll out the improvement for those in the coming months.  Possibly Dan's LeafNode::swap() will help in some of those too, so I will also give that a go.

Thanks for taking the time to reply, and for creating such a useful library.  Apologies for taking so long to reply myself!

Mike