Binary Descriptors


Chris Sweeney

Aug 18, 2015, 12:19:52 PM
to theia-visi...@googlegroups.com
Is anybody actively using the binary descriptors in Theia? 

I have been thinking of removing binary descriptors entirely, as they create headaches for compiling and make the descriptor interface quite messy. However, if anybody is using them in a significant way then I can keep them.

Chris

Aaron K

Feb 9, 2018, 11:25:21 PM
to Theia Vision Library
Bringing this back from the dead. :)

I see they were yanked out quite some time ago. I am going to test out LATCH descriptors which are binary. Is everyone pretty much just using SIFT/floating point descriptors?

Maybe there is a really simple/easy modification to allow for non-floating-point descriptors to be added through a template type?

Aaron

charl...@gmail.com

Dec 31, 2018, 5:47:04 PM
to Theia Vision Library
I'm experimenting with OpenCV's ORB now and attempting to understand how to compare binary descriptors and filter out 'realistic' pairs. I am interested in creating a near-real-time solution with Theia's API. I have read that ORB is used with OpenCV's SfM framework here: https://hub.packtpub.com/exploring-structure-motion-using-opencv/ So I'm just looking to see how much faster ORB would be than SIFT or AKAZE, and at what cost in robustness (if any).
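For anyone else wondering how binary descriptors are compared: the standard metric is the Hamming distance (count of differing bits), not the L2 distance used for float descriptors like SIFT. Below is a minimal, stdlib-only sketch of that idea; in practice OpenCV's BFMatcher with cv2.NORM_HAMMING does the same thing much faster.

```python
# Binary descriptors (e.g., ORB's 256-bit strings) are compared with the
# Hamming distance: XOR the two bit strings and count the set bits.
def hamming_distance(d1: bytes, d2: bytes) -> int:
    """Number of bits that differ between two equal-length binary descriptors."""
    assert len(d1) == len(d2)
    return sum(bin(a ^ b).count("1") for a, b in zip(d1, d2))

# Brute-force matching: for each query descriptor, keep the train descriptor
# with the smallest Hamming distance (what cv2.BFMatcher(cv2.NORM_HAMMING)
# does under the hood).
def match_binary(query, train):
    matches = []
    for qi, q in enumerate(query):
        distances = [hamming_distance(q, t) for t in train]
        ti = min(range(len(train)), key=lambda i: distances[i])
        matches.append((qi, ti, distances[ti]))
    return matches
```

For example, `hamming_distance(b"\x0f", b"\xff")` is 4, since the upper four bits differ.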

My understanding is that SIFT and AKAZE provide good feature coverage of the entire image, whereas ORB will mostly find Harris corners. However, I'm curious to see what a Theia reconstruction would even look like. It may be enough (sparse) information to perform a post-process and infer surface areas, etc., but I concede I do not know.

I am still trying to learn all of this when I have time. My goal in 2019 is to learn more about floating-point and binary descriptors, and then dive into the global and incremental reconstruction estimators for posing and solving the sparse point cloud. There is so much to learn in computer vision, but I feel I am slowly getting there!

Chris Sweeney

Jan 2, 2019, 10:59:51 AM
to Charles O, Theia Vision Library
I'm not really sure what you mean about filtering out "realistic" pairs -- what does "realistic" mean here?

Note that keypoint detectors (e.g., SIFT's blob detector or Harris corners) are distinct from descriptors (e.g., SIFT or ORB). Most often the detector and descriptor come together, but they don't have to. I've experimented a bit with using both AKAZE and SIFT features in SfM -- there are more points in the final reconstruction, but I can't say I ever found it to increase robustness or accuracy.

Originally Theia had binary and float descriptors in it, but I found SIFT to perform the best every time, so I removed the binary descriptors because it simplified the code greatly. Also, many binary descriptors make compromises in order to be real-time -- ORB, for instance, is rotation-invariant but not scale-invariant (SIFT is), so it has worse performance for wide-baseline matching and loop closure.

If you're looking to increase speed without sacrificing robustness here are some tips:
  • Spend more energy on improving the matching stage (usually feature extraction is not the performance bottleneck)
  • If you need to speed up feature extraction, just find a GPU/CUDA version of SIFT to use (there are plenty on GitHub)
  • If you're using videos, look into optical flow trackers. I've used OpenCV's KLT tracker and LDOF but you could also use something more recent like FlowNet. You can run the flow tracker through the whole video to get your correspondences, then use those correspondences to estimate relative poses between keyframes for the ViewGraph with Theia
  • Kind of related, but check out the "progressive all the way" reconstruction lib for faster dense reconstructions with Theia: https://github.com/alexlocher/patw 
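To make the optical-flow tip above concrete: once a tracker has followed points through the video, you can intersect the tracks seen in two keyframes to get the correspondences a view graph needs. This is a hedged sketch; the `{track_id: {frame_index: (x, y)}}` layout is a hypothetical representation I made up for illustration, not Theia's or OpenCV's actual data structure.

```python
# Hypothetical track store from a KLT-style tracker:
# {track_id: {frame_index: (x, y)}}.
def keyframe_correspondences(tracks, kf_a, kf_b):
    """Return [((x, y) in kf_a, (x, y) in kf_b), ...] for tracks seen in both."""
    pairs = []
    for observations in tracks.values():
        if kf_a in observations and kf_b in observations:
            pairs.append((observations[kf_a], observations[kf_b]))
    return pairs

tracks = {
    0: {0: (10.0, 20.0), 5: (12.0, 21.0)},  # survives frames 0..5
    1: {0: (50.0, 60.0)},                   # lost before frame 5
}
print(keyframe_correspondences(tracks, 0, 5))
# -> [((10.0, 20.0), (12.0, 21.0))]
```

The resulting point pairs are what you would feed into relative-pose estimation between the two keyframes.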

Feel free to ping me with SfM questions. I'd highly recommend Rick Szeliski's Computer Vision book -- it's very easy to grasp and touches most of the relevant areas you'll be interested in.

Chris


charl...@gmail.com

Jan 3, 2019, 2:32:16 PM
to Theia Vision Library
Thanks for the response. Regarding the comment on 'realistic' pairs, I'm referring to pairs of matched keypoints between two images. My rudimentary understanding is that a homography is used to determine whether the matched keypoints fit a 'realistic' (physically constrained?) 3D scene. I think I'm conflating terms a bit, but the instinct is that not all ORB-matched keypoints are 'good' matches, and they should therefore be filtered to keep only the matches that solve the problem correctly. I was reading about this here: https://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html?highlight=findhomography#findhomography

I understand how detectors and descriptors are used differently. I too have found that using SIFT (dense) to generate matches into a std::vector, then appending AKAZE (sparse/normal) matches to that vector, does in fact yield more points. But I too am not sure whether this significantly improves robustness or accuracy.

Agreed 100%: I have found SIFT to be the best performer in almost all circumstances, provided the affine distortion (transition tilt, in ASIFT terms?) is not excessive. I have read a bit on the SIFT parameters and have struggled somewhat with very-high-resolution (>4K) imagery in getting features to cover all of the intricate areas of interest. Ultimately I found that AKAZE worked well for detecting keypoints on man-made intricate areas, while SIFT works well across areas with less variance, such as walls, streets, and terrain without complex vegetation.

I've got two matching strategies prototyped that I hope will improve the matching stage. One uses a computed optical flow and the theia::L2 distance to compute 'forward matches' by leveraging the optical flow map. Using a bounding box, we brute-force match the descriptors between the predicate box and the forward box, and only add them to the matches vector after a Lowe's ratio check. This seems to work pretty well, but I need to do more testing.
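For reference, the Lowe's ratio check mentioned above accepts a match only when the best candidate is clearly better than the second best. A minimal, stdlib-only sketch of the idea (the function names and the 0.8 ratio are illustrative; Lowe's paper suggests 0.8, and the brute-force search here stands in for theia::L2 / FLANN):

```python
import math

def l2(a, b):
    """Euclidean distance between two float descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ratio_test_match(query, train, ratio=0.8):
    """Keep (query_idx, train_idx) pairs that pass Lowe's ratio check."""
    matches = []
    for qi, q in enumerate(query):
        order = sorted(range(len(train)), key=lambda i: l2(q, train[i]))
        best, second = order[0], order[1]
        # Accept only if the best match is clearly better than the runner-up.
        if l2(q, train[best]) < ratio * l2(q, train[second]):
            matches.append((qi, best))
    return matches
```

Ambiguous descriptors (nearly equidistant to two candidates) are rejected, which is exactly the kind of filtering that removes unreliable matches before pose estimation.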

I've been looking at using CUDA for feature extraction, but after reading how fast CUDA can operate, I realize I need to spend more time on matching and pose estimation, etc. As you said, don't overthink it -- just use a GPU and crank out the features and keypoints, right?

I looked at progressive-all-the-way and it looks very interesting. In other work we are heavily investigating octrees and can now render hundreds of millions of points on Intel HD in 'real time' (>60 fps). There are some custom shaders as well that help with creating a natural look. This is getting me to lean more towards generating very dense point clouds and to worry less about meshing and texturing, though those are still interesting for exports to other applications that cannot render point clouds well. The octree also seems to lend itself well to compression and change detection!

Last night I started Rick Szeliski's CV book. I'm just getting through the overview and am excited to start chapter 2! Thank you for this recommendation; I'll likely be bouncing back and forth between Khan Academy (to sharpen the math skills) and this book. I will definitely reach out when I get to the point where I can ask more intelligent questions without confusing the jargon/terms.

Thanks again

Charles
