Hi there!
I'm surprised I never saw your work either! Feels like stuff just gets buried in the search engine sometimes.
Yeah, so the map construction itself happens on the CPU. The gains over traditional methods come from parallelizing the ray-casting, since with dense sensors that step involves a lot of overlapping information. Using knowledge of the sensor characteristics and the desired mapping resolution, the software pre-allocates GPU memory representing a cube of voxels that completely encloses the potential view. It doesn't actually pre-allocate that entire space, though: with a frustum, the view is never equivalent to the full cube, so it allocates only the amount of memory it expects to use when raycasting a single input frame from the sensor. If it ever needs to re-allocate it will, but normally it won't, and the user can tune the initial allocation to prevent those re-allocations.
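To make that concrete, here's a back-of-the-envelope sketch of how you might size that fixed allocation from the sensor model and voxel size. All the names and the scaling factor are made up for illustration, not the library's actual API:

```cpp
// Hypothetical sizing sketch: bound the GPU allocation by the worst-case
// view volume, then scale down toward the expected frustum coverage.
#include <cstddef>
#include <cmath>

struct SensorModel {
    float max_range_m;  // e.g. ~5.0 for a typical RGB-D camera
    float h_fov_rad;    // horizontal field of view
    float v_fov_rad;    // vertical field of view
};

// Worst case: a cube of side 2 * max_range centred on the sensor
// encloses every possible view direction.
std::size_t boundingCubeVoxels(const SensorModel& s, float voxel_size_m) {
    const auto side = static_cast<std::size_t>(
        std::ceil(2.0f * s.max_range_m / voxel_size_m));
    return side * side * side;
}

// The frustum only ever touches a fraction of that cube, so the initial
// allocation is scaled by a user-tunable factor and only re-grown if a
// frame actually overflows it.
std::size_t initialAllocationBytes(const SensorModel& s, float voxel_size_m,
                                   float frustum_fraction /* e.g. 0.15f */) {
    const std::size_t voxels = boundingCubeVoxels(s, voxel_size_m);
    return static_cast<std::size_t>(voxels * frustum_fraction) * sizeof(float);
}
```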
So it gets an input frame and copies it to the GPU, and the GPU then raycasts each point in the input through the 3D space we've preallocated. It does this in two stages: first it casts at the leaf-node level, mapping active leaf nodes to a buffer, then it raycasts at the voxel level to populate a sparse buffer that maps to the active voxels, with the log-odds hit/miss values calculated in parallel using atomic operations. Finally it compresses that buffer to a uint8 representation to make the copy time as small as possible, because the copies are where the bulk of the overhead is when dealing with CPU->GPU->CPU round trips.
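For a feel of what the voxel-level stage might look like, here's a heavily simplified CUDA sketch: one thread per ray, accumulating log-odds hit/miss values into a preallocated flat buffer with atomics. The uniform stepping, names, and flat indexing are placeholders; the real code uses the two-stage leaf/voxel buffers described above:

```cpp
// One thread per ray; many rays cross the same voxels, hence the atomics.
__global__ void raycastVoxels(const float3* endpoints, int num_rays,
                              float3 origin,    // sensor position, world frame
                              float3 grid_min,  // corner of the preallocated cube
                              float voxel_size, int3 grid_dim,
                              float* log_odds,  // flat voxel buffer, zeroed per frame
                              float lo_hit, float lo_miss)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= num_rays) return;

    float3 d = make_float3(endpoints[i].x - origin.x,
                           endpoints[i].y - origin.y,
                           endpoints[i].z - origin.z);
    float len = sqrtf(d.x * d.x + d.y * d.y + d.z * d.z);
    int steps = max(1, (int)(len / voxel_size));

    // March the ray; a real implementation would use a 3D DDA rather
    // than uniform sampling, but the update pattern is the same.
    for (int s = 0; s < steps; ++s) {
        float t = (s + 0.5f) / steps;
        int3 v = make_int3(
            (int)floorf((origin.x + t * d.x - grid_min.x) / voxel_size),
            (int)floorf((origin.y + t * d.y - grid_min.y) / voxel_size),
            (int)floorf((origin.z + t * d.z - grid_min.z) / voxel_size));
        if (v.x < 0 || v.y < 0 || v.z < 0 ||
            v.x >= grid_dim.x || v.y >= grid_dim.y || v.z >= grid_dim.z)
            continue;
        int idx = (v.z * grid_dim.y + v.y) * grid_dim.x + v.x;
        // Last sample along the ray is the hit, everything before it a miss.
        atomicAdd(&log_odds[idx], (s == steps - 1) ? lo_hit : lo_miss);
    }
}
```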
Once this compressed buffer is back on the CPU, a smaller index buffer and knowledge of the sensor position are used to map the voxel occupancy information in the compressed buffer to world space and update the OpenVDB grid that lives on the CPU.
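Here's roughly what that CPU-side merge could look like against a standard OpenVDB FloatGrid. The index-buffer layout and the uint8 decoding shown here are just illustrative assumptions, not the library's actual format:

```cpp
// Sketch of folding GPU results into a CPU-resident OpenVDB grid,
// assuming a FloatGrid of log-odds values.
#include <openvdb/openvdb.h>
#include <cstdint>
#include <vector>

struct VoxelUpdate {
    openvdb::Coord ijk;   // world-space voxel index from the index buffer
    std::uint8_t encoded; // quantised log-odds delta copied from the GPU
};

void applyUpdates(openvdb::FloatGrid& grid,
                  const std::vector<VoxelUpdate>& updates,
                  float quant_scale /* maps uint8 back to log-odds */)
{
    auto acc = grid.getAccessor();
    for (const auto& u : updates) {
        // Decode the compressed value and accumulate it into the grid.
        const float delta = (static_cast<float>(u.encoded) - 128.0f) * quant_scale;
        acc.setValue(u.ijk, acc.getValue(u.ijk) + delta);
    }
}
```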
The main tricks and headaches are all in how the library converts from world space to sensor space, and then moves between leaf-node and voxel space while populating the necessary buffers. All of this is done with the intent of keeping memory costs fixed, re-allocations close to zero, and memory copies between GPU and CPU as fast as possible.
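As a small example of why the leaf/voxel bookkeeping is cheap: with OpenVDB's default 8x8x8 leaf nodes, moving between voxel and leaf coordinates is just shifts and masks. The helper functions below are mine, written to illustrate the idea, not the library's:

```cpp
#include <openvdb/openvdb.h>

constexpr int kLeafLog2 = 3;                  // default leaf = 2^3 voxels per axis
constexpr int kLeafMask = (1 << kLeafLog2) - 1;

// Origin of the leaf node containing a voxel: clear the low bits.
openvdb::Coord voxelToLeafOrigin(const openvdb::Coord& v) {
    return openvdb::Coord(v.x() & ~kLeafMask,
                          v.y() & ~kLeafMask,
                          v.z() & ~kLeafMask);
}

// Linear offset of a voxel inside its 8x8x8 leaf block.
int voxelOffsetInLeaf(const openvdb::Coord& v) {
    return ((v.x() & kLeafMask) << (2 * kLeafLog2))
         + ((v.y() & kLeafMask) << kLeafLog2)
         +  (v.z() & kLeafMask);
}
```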
Obviously, because of how this is done, sparse sensors like LIDAR, where there is little overlap in the information provided by the sensor within a single frame, don't benefit from this optimization.
I have a few ideas on how to hybridize things for potential speed-ups, but the size of the potential information space means there would also need to be more memory-optimization tricks to keep the algorithm from running out of GPU memory. And since I'm currently trying to finish my PhD and that isn't a focus, it might be a while before the LIDAR optimizations ever get done.
But for edge robotics with frustum-style sensors running Jetson boards or NVIDIA GPUs (which are a big thing in my department's UAV research, which was the motivation for the library in the first place), this provides a way to process RGBD/stereo sensors into an occupancy map in real time with much less CPU overhead, freeing up cycles for planning and other tasks.
The cool thing is that performance depends only on the target mapping resolution and the range and FOV of the sensor. So if two sensors share the same range and FOV but one has twice the resolution of the other, the only thing that changes is the number of rays that need to be traced: the memory allocation stays the same, the GPU->CPU costs stay the same, and the cost of editing the map stays the same. Only the ray-casting step takes longer, and that is done as quickly and in parallel as possible.
And this is further accelerated by a set of voxelization filters I wrote for the input clouds. These also use CUDA to map the input cloud rays to voxel space, allowing really dense sensor inputs to be reduced very quickly to the bare minimum number of rays with very little information loss; the reduced rays are then raycast in parallel as normal. The core idea of such a filter can be sketched as below.
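Here's a hedged sketch of that idea: bucket every input point into the target voxel grid on the GPU and keep one representative ray per occupied voxel. Again, the names, flag array, and compaction scheme are simplified placeholders, not the actual kernels:

```cpp
// One thread per input point; the first point to claim a voxel wins,
// collapsing dense input to one ray per voxel at the mapping resolution.
__global__ void voxelFilter(const float3* points, int num_points,
                            float3 grid_min, float voxel_size, int3 grid_dim,
                            unsigned int* occupied_flag, // zeroed before launch
                            float3* kept_points, int* kept_count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= num_points) return;

    float3 p = points[i];
    int3 v = make_int3((int)floorf((p.x - grid_min.x) / voxel_size),
                       (int)floorf((p.y - grid_min.y) / voxel_size),
                       (int)floorf((p.z - grid_min.z) / voxel_size));
    if (v.x < 0 || v.y < 0 || v.z < 0 ||
        v.x >= grid_dim.x || v.y >= grid_dim.y || v.z >= grid_dim.z) return;

    int idx = (v.z * grid_dim.y + v.y) * grid_dim.x + v.x;
    // Every later point mapping to the same voxel is dropped.
    if (atomicExch(&occupied_flag[idx], 1u) == 0u) {
        int out = atomicAdd(kept_count, 1);
        kept_points[out] = p;
    }
}
```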
Anyway, I wrote all this in a hurry, so hopefully the publication does a better job of explaining what the work aims to achieve and how it does so.
Cheers!