Hi Oleg and all,
I have had unexpected and highly variable results when running MGM on largely saturated 8-bit images. In particular, with cost-mode 4 I found elevations that differed from external validation data by up to 300 m in some cases. Most of these occur with xcorr-threshold set to 2 (now the default), but some remain even with a threshold of 0. I also noticed that cost-mode 4 interpolates a lot in featureless areas, even with xcorr-threshold 0, whereas cost-mode 3 does not.
Given that cost-mode 4 is ASP's default and that the documentation recommends this mode for featureless terrain (see this page), I was a bit surprised by the outcome.
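For context, my understanding (please correct me if this is wrong) is that xcorr-threshold controls a left-right consistency check: a disparity is kept only if matching back from the right image lands within that many pixels of it. Here is a minimal sketch of such a check on one scanline, assuming integer horizontal disparities; the function name `lr_consistency` and the None-for-invalid convention are hypothetical and not taken from ASP's code.

```python
def lr_consistency(disp_lr, disp_rl, threshold):
    """Hypothetical left-right consistency check on one scanline (not ASP's code).

    disp_lr[x]: disparity d such that left pixel x matches right pixel x - d.
    disp_rl[x]: disparity d such that right pixel x matches left pixel x + d.
    Returns the left disparities, with rejected pixels set to None.
    """
    out = []
    for x, d in enumerate(disp_lr):
        if d is None:
            out.append(None)
            continue
        xr = x - d  # where this left pixel lands in the right image
        if not (0 <= xr < len(disp_rl)) or disp_rl[xr] is None:
            out.append(None)  # no reverse match to compare against
            continue
        # Keep the disparity only if the reverse match agrees within the threshold.
        out.append(d if abs(d - disp_rl[xr]) <= threshold else None)
    return out
```

With a threshold of 0 only exact round trips survive, so a larger threshold keeps more matches, including possibly wrong ones.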
I went through the different references in the documentation, in particular:
- for SGM: [Hirschmuller08] Heiko Hirschmüller. Stereo processing by semiglobal matching and mutual information. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30:328–341, 2008.
- for MGM: [FDFM15] Gabriele Facciolo, Carlo De Franchis, and Enric Meinhardt. MGM: a significantly more global matching for stereovision. In Proceedings of the British Machine Vision Conference (BMVC), BMVA Press, 90–1. 2015.
- for cost-mode 3: [ZW94] Ramin Zabih and John Woodfill. Non-parametric local transforms for computing visual correspondence. In European Conference on Computer Vision, 151–158. Springer, 1994.
- for cost-mode 4: [HCW+16] Han Hu, Chongtai Chen, Bo Wu, Xiaoxia Yang, Qing Zhu, and Yulin Ding. Texture-aware dense image matching using ternary census transform. ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, 59–66, 2016.
This raised some questions about how things are implemented in ASP, which I hope you can help answer.
- in the original SGM, the regularization term contains two penalties, P1 and P2, and P2 is suggested to be adapted to the intensity gradient. How are these two values set in ASP, and what function is used to scale P2? Some studies use a simple gradient, others a Canny edge detector, and [HCW+16] suggest a new formula combining image gradient and standard deviation. Are P1 and P2 calculated the same way for all cost-modes?
- what changes between cost-mode 3 and 4? Is it only the image transform (binary vs. ternary census)? [HCW+16] also make other improvements, such as a different matching pattern (see their Fig. 3) and a different P2 calculation. Are these also implemented for cost-mode 4?
- the ternary census transform essentially adds a third state for equality (within a tolerance, see their equation 4). Is that tolerance (delta) 1 intensity level, or is it larger?
- are there any other big differences between the referenced papers and the actual implementation in ASP?
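To make the P1/P2 question concrete, here is a minimal sketch of the single-path cost aggregation from [Hirschmuller08] as I read it, with P2 divided by the intensity difference between neighbouring pixels on the path. The function names, the floor of 1 on the intensity difference and the clamp P2 >= P1 are my own assumptions; this is not necessarily how ASP does it, which is exactly what I'm asking.

```python
def adaptive_p2(i_prev, i_curr, p1, p2_base):
    # Intensity-adaptive P2 in the spirit of [Hirschmuller08]: P2 = P2' / |I_p - I_{p-r}|.
    # Assumptions (mine): the difference is floored at 1 to avoid division by zero,
    # and the result is clamped so that P2 >= P1.
    grad = max(1, abs(i_curr - i_prev))
    return max(p1, p2_base / grad)


def aggregate_path(cost, intensity, p1, p2_base):
    """Aggregate a matching cost along one left-to-right path (hypothetical helper).

    cost[x][d]: pixelwise matching cost, e.g. a census mismatch count.
    intensity[x]: image intensity at pixel x on the scanline.
    Returns L[x][d] following the SGM recurrence.
    """
    n_disp = len(cost[0])
    agg = [list(cost[0])]  # first pixel on the path: L = C
    for x in range(1, len(cost)):
        prev = agg[-1]
        prev_min = min(prev)
        p2 = adaptive_p2(intensity[x - 1], intensity[x], p1, p2_base)
        row = []
        for d in range(n_disp):
            # Same disparity (no penalty), any disparity jump (P2) ...
            candidates = [prev[d], prev_min + p2]
            # ... or a change of one disparity level (P1).
            if d > 0:
                candidates.append(prev[d - 1] + p1)
            if d < n_disp - 1:
                candidates.append(prev[d + 1] + p1)
            # Subtracting prev_min keeps values bounded without changing the argmin.
            row.append(cost[x][d] + min(candidates) - prev_min)
        agg.append(row)
    return agg
```

In this sketch a large intensity step between neighbours lowers P2 (down to P1), so disparity jumps become cheaper at edges, which is the intent of the adaptive P2 in [Hirschmuller08].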
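For the cost-mode 3 vs 4 question, this is how I picture the two transforms: a minimal sketch of a binary census as I understand [ZW94] and a generic three-state ternary census with an intensity tolerance delta, both on a square window of (hypothetical) radius 1 and compared by counting mismatching neighbour states. The function names, the window shape and the -1/0/+1 encoding are my assumptions; the exact encoding and sampling pattern in [HCW+16] (their equation 4 and Fig. 3) may differ, and I don't know which of these ASP uses.

```python
def census_binary(img, y, x, radius=1):
    """Binary census: one bit per neighbour, 1 if darker than the centre pixel."""
    centre = img[y][x]
    return tuple(
        1 if img[y + dy][x + dx] < centre else 0
        for dy in range(-radius, radius + 1)
        for dx in range(-radius, radius + 1)
        if (dy, dx) != (0, 0)
    )


def census_ternary(img, y, x, radius=1, delta=1):
    """Generic ternary census: -1 darker, 0 'equal' within +/-delta, +1 brighter.

    delta is a tolerance in intensity units (gray levels), not pixels.
    """
    centre = img[y][x]
    states = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if (dy, dx) == (0, 0):
                continue  # the centre is not compared with itself
            diff = img[y + dy][x + dx] - centre
            if diff < -delta:
                states.append(-1)
            elif diff > delta:
                states.append(1)
            else:
                states.append(0)
    return tuple(states)


def mismatch_count(a, b):
    """Matching cost: number of neighbour positions whose census states differ."""
    return sum(1 for u, v in zip(a, b) if u != v)
```

Note that on a fully saturated patch both transforms return all zeros, so any candidate match that also falls in a saturated patch gets zero cost and the disparity there is left entirely to the regularization, which may be relevant to the interpolation I see.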
If you don't have an exact answer but can point me to where this is implemented in the code, I'm happy to look into it myself.
Thanks a lot for your help!
Amaury