Hi there!
I'm surprised I never saw your work either! Feels like stuff just gets buried in the search engine sometimes.
Yeah, so the map construction itself happens on the CPU. The gains over traditional methods come from parallelizing the ray-casting, since with dense sensors that step involves a lot of overlapping information. Using knowledge of the sensor characteristics and the desired mapping resolution, the software pre-allocates GPU memory representing a cube of voxels that completely encloses the potential view. It doesn't actually pre-allocate that entire space, though: with a frustum, the view is never equivalent to the full cube, so it allocates only the amount of memory it expects to use when raycasting a single input frame from the sensor. If it ever needs to re-allocate it will, but normally it won't, and the user can tune the initial allocation to prevent those re-allocations.
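To make that concrete, here's a back-of-the-envelope sketch of how you might size that fixed allocation from the sensor model and voxel size. All the names and the scaling factor are made up for illustration, not the library's actual API:

```cpp
// Hypothetical sizing sketch: bound the GPU allocation by the worst-case
// view volume, then scale down toward the expected frustum coverage.
#include <cstddef>
#include <cmath>

struct SensorModel {
    float max_range_m;  // e.g. ~5.0 for a typical RGB-D camera
    float h_fov_rad;    // horizontal field of view
    float v_fov_rad;    // vertical field of view
};

// Worst case: a cube of side 2 * max_range centred on the sensor
// encloses every possible view direction.
std::size_t boundingCubeVoxels(const SensorModel& s, float voxel_size_m) {
    const auto side = static_cast<std::size_t>(
        std::ceil(2.0f * s.max_range_m / voxel_size_m));
    return side * side * side;
}

// The frustum only ever touches a fraction of that cube, so the initial
// allocation is scaled by a user-tunable factor and only re-grown if a
// frame actually overflows it.
std::size_t initialAllocationBytes(const SensorModel& s, float voxel_size_m,
                                   float frustum_fraction /* e.g. 0.15f */) {
    const std::size_t voxels = boundingCubeVoxels(s, voxel_size_m);
    return static_cast<std::size_t>(voxels * frustum_fraction) * sizeof(float);
}
```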
So it gets an input frame and copies it to the GPU, and the GPU then raycasts each point in the input through the 3D space we've preallocated. It does this in two stages: first it casts at the leaf-node level, mapping active leaf nodes to a buffer, then it raycasts at the voxel level to populate a sparse buffer that maps to the active voxels, with the log-odds hit/miss values calculated in parallel using atomic operations. Finally it compresses that buffer to a uint8 representation to make the copy time as small as possible, because the copies are where the bulk of the overhead is when dealing with CPU->GPU->CPU round trips.
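For a feel of what the voxel-level stage might look like, here's a heavily simplified CUDA sketch: one thread per ray, accumulating log-odds hit/miss values into a preallocated flat buffer with atomics. The uniform stepping, names, and flat indexing are placeholders; the real code uses the two-stage leaf/voxel buffers described above:

```cpp
// One thread per ray; many rays cross the same voxels, hence the atomics.
__global__ void raycastVoxels(const float3* endpoints, int num_rays,
                              float3 origin,    // sensor position, world frame
                              float3 grid_min,  // corner of the preallocated cube
                              float voxel_size, int3 grid_dim,
                              float* log_odds,  // flat voxel buffer, zeroed per frame
                              float lo_hit, float lo_miss)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= num_rays) return;

    float3 d = make_float3(endpoints[i].x - origin.x,
                           endpoints[i].y - origin.y,
                           endpoints[i].z - origin.z);
    float len = sqrtf(d.x * d.x + d.y * d.y + d.z * d.z);
    int steps = max(1, (int)(len / voxel_size));

    // March the ray; a real implementation would use a 3D DDA rather
    // than uniform sampling, but the update pattern is the same.
    for (int s = 0; s < steps; ++s) {
        float t = (s + 0.5f) / steps;
        int3 v = make_int3(
            (int)floorf((origin.x + t * d.x - grid_min.x) / voxel_size),
            (int)floorf((origin.y + t * d.y - grid_min.y) / voxel_size),
            (int)floorf((origin.z + t * d.z - grid_min.z) / voxel_size));
        if (v.x < 0 || v.y < 0 || v.z < 0 ||
            v.x >= grid_dim.x || v.y >= grid_dim.y || v.z >= grid_dim.z)
            continue;
        int idx = (v.z * grid_dim.y + v.y) * grid_dim.x + v.x;
        // Last sample along the ray is the hit, everything before it a miss.
        atomicAdd(&log_odds[idx], (s == steps - 1) ? lo_hit : lo_miss);
    }
}
```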
Once this compressed buffer is back on the CPU, a smaller index buffer and knowledge of the sensor position are used to map the voxel occupancy information in the compressed buffer to world space and update the OpenVDB grid that lives on the CPU.
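Here's roughly what that CPU-side merge could look like against a standard OpenVDB FloatGrid. The index-buffer layout and the uint8 decoding shown here are just illustrative assumptions, not the library's actual format:

```cpp
// Sketch of folding GPU results into a CPU-resident OpenVDB grid,
// assuming a FloatGrid of log-odds values.
#include <openvdb/openvdb.h>
#include <cstdint>
#include <vector>

struct VoxelUpdate {
    openvdb::Coord ijk;   // world-space voxel index from the index buffer
    std::uint8_t encoded; // quantised log-odds delta copied from the GPU
};

void applyUpdates(openvdb::FloatGrid& grid,
                  const std::vector<VoxelUpdate>& updates,
                  float quant_scale /* maps uint8 back to log-odds */)
{
    auto acc = grid.getAccessor();
    for (const auto& u : updates) {
        // Decode the compressed value and accumulate it into the grid.
        const float delta = (static_cast<float>(u.encoded) - 128.0f) * quant_scale;
        acc.setValue(u.ijk, acc.getValue(u.ijk) + delta);
    }
}
```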
The main tricks and headaches are all in how the library converts from world space to sensor space, and then moves between leaf-node and voxel space while populating the necessary buffers. All of this is done with the intent of keeping memory costs fixed, re-allocations close to zero, and memory copies between GPU and CPU as fast as possible.
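As a small example of why the leaf/voxel bookkeeping is cheap: with OpenVDB's default 8x8x8 leaf nodes, moving between voxel and leaf coordinates is just shifts and masks. The helper functions below are mine, written to illustrate the idea, not the library's:

```cpp
#include <openvdb/openvdb.h>

constexpr int kLeafLog2 = 3;                  // default leaf = 2^3 voxels per axis
constexpr int kLeafMask = (1 << kLeafLog2) - 1;

// Origin of the leaf node containing a voxel: clear the low bits.
openvdb::Coord voxelToLeafOrigin(const openvdb::Coord& v) {
    return openvdb::Coord(v.x() & ~kLeafMask,
                          v.y() & ~kLeafMask,
                          v.z() & ~kLeafMask);
}

// Linear offset of a voxel inside its 8x8x8 leaf block.
int voxelOffsetInLeaf(const openvdb::Coord& v) {
    return ((v.x() & kLeafMask) << (2 * kLeafLog2))
         + ((v.y() & kLeafMask) << kLeafLog2)
         +  (v.z() & kLeafMask);
}
```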
Obviously, because of how this is done, sparse sensors like LIDAR, where there is little overlap in the information provided by the sensor within a single frame, don't benefit from this optimization.
I have a few ideas on how to hybridize things for potential speed-ups, but the size of the potential information space means there would also need to be more memory-optimization tricks to keep the algorithm from running out of GPU memory. And since I'm currently trying to finish my PhD and that isn't a focus, it might be a while before the LIDAR optimizations ever get done.
But for edge robotics with frustum-style sensors running Jetson boards or NVIDIA GPUs (which are a big thing in my department's UAV research, which was the motivation for the library in the first place), this provides a way to process RGBD/stereo sensors into an occupancy map in real time with much less CPU overhead, freeing up cycles for planning and other tasks.
The cool thing is that performance depends only on the target mapping resolution and the range and FOV of the sensor. So if two sensors share the same range and FOV but one has twice the resolution of the other, the only thing that changes is the number of rays that need to be traced: the memory allocation stays the same, the GPU->CPU costs stay the same, and the cost of editing the map stays the same. Only the ray-casting step takes longer, and that is done as quickly and in parallel as possible.
And this is further accelerated by a set of voxelization filters I wrote for the input clouds. These also use CUDA to map the input cloud rays to voxel space, allowing really dense sensor inputs to be reduced very quickly to the bare minimum number of rays with very little information loss; the reduced rays are then raycast in parallel as normal. The core idea of such a filter can be sketched as below.
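Here's a hedged sketch of that idea: bucket every input point into the target voxel grid on the GPU and keep one representative ray per occupied voxel. Again, the names, flag array, and compaction scheme are simplified placeholders, not the actual kernels:

```cpp
// One thread per input point; the first point to claim a voxel wins,
// collapsing dense input to one ray per voxel at the mapping resolution.
__global__ void voxelFilter(const float3* points, int num_points,
                            float3 grid_min, float voxel_size, int3 grid_dim,
                            unsigned int* occupied_flag, // zeroed before launch
                            float3* kept_points, int* kept_count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= num_points) return;

    float3 p = points[i];
    int3 v = make_int3((int)floorf((p.x - grid_min.x) / voxel_size),
                       (int)floorf((p.y - grid_min.y) / voxel_size),
                       (int)floorf((p.z - grid_min.z) / voxel_size));
    if (v.x < 0 || v.y < 0 || v.z < 0 ||
        v.x >= grid_dim.x || v.y >= grid_dim.y || v.z >= grid_dim.z) return;

    int idx = (v.z * grid_dim.y + v.y) * grid_dim.x + v.x;
    // Every later point mapping to the same voxel is dropped.
    if (atomicExch(&occupied_flag[idx], 1u) == 0u) {
        int out = atomicAdd(kept_count, 1);
        kept_points[out] = p;
    }
}
```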
Anyway, I wrote all this in a hurry, so hopefully the publication does a better job of explaining what the work aims to achieve and how it does so.
Cheers!