Technical details on the SGM/MGM and cost-mode implementations

Amaury Dehecq

Jun 27, 2024, 10:52:17 AM

to Ames Stereo Pipeline Support

Hi Oleg and all,

I have had unexpected and very different results when running MGM on 8-bit largely saturated images. In particular, I found elevations that could differ by up to 300 m compared to external validation in some cases for cost-mode 4. Most of these occur with xcorr-threshold set to 2 (the default now) but some even remain with a threshold of 0. Also I noticed that cost-mode 4 does a lot of interpolation in featureless areas, even with xcorr-threshold 0, while this is not the case for cost-mode 3.
Given that ASP's default value is cost-mode 4 and that the documentation recommends using this mode for featureless terrain (at this page), I was a bit surprised by the outcome.

I went through the different references in the documentation, in particular:

- for SGM: [Hirschmuller08] Heiko Hirschmüller. Stereo processing by semiglobal matching and mutual information. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30:328–341, 2008.

- for MGM: [FDFM15] Gabriele Facciolo, Carlo De Franchis, and Enric Meinhardt. MGM: a significantly more global matching for stereovision. In Proceedings of the British Machine Vision Conference (BMVC), BMVA Press, 90–1. 2015.

- for cost-mode 3: [ZW94] Ramin Zabih and John Woodfill. Non-parametric local transforms for computing visual correspondence. In European conference on computer vision, 151–158. Springer, 1994.
- for cost-mode 4: [HCW+16] Han Hu, Chongtai Chen, Bo Wu, Xiaoxia Yang, Qing Zhu, and Yulin Ding. Texture-aware dense image matching using ternary census transform. ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, pages 59–66, 2016.

This raised some questions on how things were implemented in ASP. I hope you can help answer some of those.
- in the original SGM, the regularization term contains two penalties named P1 and P2. Also P2 is suggested to be scaled to the gradient in image intensity. How are these two values set in ASP? And what function is used to scale P2? Some studies use a simple gradient, some a Canny-edge, and [HCW+16] suggest a new formula based on a combination of image gradient and standard deviation. Are P1 and P2 calculated the same way for all cost-modes?
- what does change between cost-mode 3 and 4? Is it only the image transform (binary or ternary census)? [HCW+16] also make other improvements like using a different pattern for matching (see their Fig 3) and a different P2 calculation. Are these also implemented for cost-mode 4?

- the ternary census transform basically adds a third term in case of equality (within tolerance, see their equation 4). Is the delta/tolerance of 1 pixel, or is it larger?

- are there any other big differences between the referenced papers and the actual implementation in ASP?

If you don't exactly have the answer but can point me to where this is implemented in the code, I'm happy to take a look as well.

Thanks a lot for your help!
Amaury

Oleg Alexandrov

Jun 27, 2024, 12:09:21 PM

to Amaury Dehecq, Ames Stereo Pipeline Support

Amaury,

Perhaps Scott will have time to answer these in detail.

My take is that one should be quite careful in saturated areas, as those are not well-defined.

The cost functions are defined here: https://github.com/visionworkbench/visionworkbench/blob/master/src/vw/Stereo/CostFunctions.h#L143

Their use is here: https://github.com/visionworkbench/visionworkbench/blob/master/src/vw/Stereo/SGM.cc. This talks about p1 and p2.

These two should be enough to see what decisions were made. Can also look at SGM.h in the same place, if need be.

Happy to look at things in more detail if the above sources are not sufficient. Let me know.

Scott McMichael

Jun 27, 2024, 1:17:28 PM

to Ames Stereo Pipeline Support

> I have had unexpected and very different results when running MGM on 8-bit largely saturated images. In particular, I found elevations that could differ by up to 300 m compared to external validation in some cases for cost-mode 4. Most of these occur with xcorr-threshold set to 2 (the default now) but some even remain with a threshold of 0. Also I noticed that cost-mode 4 does a lot of interpolation in featureless areas, even with xcorr-threshold 0, while this is not the case for cost-mode 3.
> Given that ASP's default value is cost-mode 4 and that the documentation recommends using this mode for featureless terrain (at this page), I was a bit surprised by the outcome.

That sounds similar to a lot of the IceBridge data that we processed using MGM. It is not surprising to get some bad results with data like that, but 300 m is a lot. Are these spikes in the output, or is the entire DEM shifted by that much?

> This raised some questions on how things were implemented in ASP. I hope you can help answer some of those.
> - in the original SGM, the regularization term contains two penalties named P1 and P2. Also P2 is suggested to be scaled to the gradient in image intensity. How are these two values set in ASP? And what function is used to scale P2? Some studies use a simple gradient, some a Canny-edge, and [HCW+16] suggest a new formula based on a combination of image gradient and standard deviation. Are P1 and P2 calculated the same way for all cost-modes?

Oleg pointed you to the location where P1 and P2 are set. They are scaled based on the type of census transform and kernel size but are not modified based on the image content. The values were experimentally determined and are not exposed to the user.
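
To make the role of the two penalties concrete, here is a minimal Python sketch of SGM cost aggregation along a single left-to-right path, following the recurrence in [Hirschmuller08] with P1 and P2 held constant, as described above. It is an illustration only, not the code in SGM.cc: the function name, the 1D disparity range, and the penalty values are assumptions made for the example (ASP searches in 2D and uses its own experimentally determined values).

```python
import numpy as np

def aggregate_left_to_right(cost, p1, p2):
    """Aggregate matching costs along one scanline (hypothetical helper).

    cost: array of shape (W, D), matching cost per pixel and disparity.
    p1:   penalty for a disparity change of exactly 1 between neighbors.
    p2:   penalty for any larger disparity change (constant here, i.e. not
          scaled by the image gradient).
    """
    w, d = cost.shape
    agg = np.empty((w, d), dtype=np.float64)
    agg[0] = cost[0]
    for x in range(1, w):
        prev = agg[x - 1]
        best_prev = prev.min()
        # Cost of arriving from disparity d-1 or d+1 at the previous pixel.
        from_minus = np.concatenate(([np.inf], prev[:-1])) + p1
        from_plus = np.concatenate((prev[1:], [np.inf])) + p1
        # Cost of arriving from any other disparity.
        from_jump = np.full(d, best_prev + p2)
        best = np.minimum.reduce([prev, from_minus, from_plus, from_jump])
        # Subtracting best_prev keeps the values from growing along the path.
        agg[x] = cost[x] + best - best_prev
    return agg

# Toy example: two pixels, three candidate disparities, made-up penalties.
toy_cost = np.array([[0.0, 5.0, 5.0],
                     [5.0, 0.0, 5.0]])
toy_agg = aggregate_left_to_right(toy_cost, p1=1.0, p2=4.0)
```

With constant penalties, a disparity jump costs the same in textured and textureless regions; a gradient-scaled P2 would instead make jumps cheaper at strong intensity edges.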

> - what does change between cost-mode 3 and 4? Is it only the image transform (binary or ternary census)? [HCW+16] also make other improvements like using a different pattern for matching (see their Fig 3) and a different P2 calculation. Are these also implemented for cost-mode 4?

Cost mode 3/4 decides between using binary or ternary census and adjusts P1/P2 but does not use a different matching pattern.

> - the ternary census transform basically adds a third term in case of equality (within tolerance, see their equation 4). Is the delta/tolerance of 1 pixel, or is it larger?

This is set to 5.
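
As an illustration of the difference between the two transforms, the sketch below computes binary and ternary census codes with NumPy. It is not taken from CostFunctions.h: the function names, the 3x3 window, the code values 0/1/2, and the cost that counts differing window positions are assumptions for the example; only the tolerance of 5 comes from the answer above.

```python
import numpy as np

def census_codes(img, radius=1, tol=None):
    """Per-pixel census codes over a (2*radius+1)^2 window (hypothetical helper).

    tol=None: binary census, code 1 where the neighbor is brighter than the
              center, else 0.
    tol=t:    ternary census, code 0 if neighbor < center - t,
              2 if neighbor > center + t, and 1 otherwise ("equal").
    Returns an array of shape (H, W, K) with K = (2*radius+1)^2 - 1.
    """
    img = img.astype(np.int32)  # avoid uint8 wrap-around in differences
    h, w = img.shape
    padded = np.pad(img, radius, mode="edge")
    codes = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue
            nb = padded[radius + dy:radius + dy + h,
                        radius + dx:radius + dx + w]
            diff = nb - img
            if tol is None:
                code = (diff > 0).astype(np.uint8)
            else:
                code = np.where(diff > tol, 2,
                                np.where(diff < -tol, 0, 1)).astype(np.uint8)
            codes.append(code)
    return np.stack(codes, axis=-1)

def census_cost(code_a, code_b):
    """Number of window positions whose codes differ (assumed cost)."""
    return np.count_nonzero(code_a != code_b, axis=-1)

# Toy near-saturated patch: constant 250 with one pixel at 253.
patch = np.full((5, 5), 250, dtype=np.uint8)
patch[2, 3] = 253
binary = census_codes(patch)            # binary census
ternary = census_codes(patch, tol=5)    # ternary census, tolerance 5
```

In this toy patch the intensity variation stays within the tolerance, so every ternary code is 1 and the cost between any two pixels is zero, while the binary codes still respond to the 3-count bump. Whether this effect accounts for the behavior reported in this thread is not established here.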

> - are there any other big differences between the referenced papers and the actual implementation in ASP?

I don't recall the details of the different papers, but the major features of the ASP implementation (besides the exposed options) are that it performs a 2D search instead of a 1D search and that it optimizes the disparity search range per pixel. The search range optimization can lead to errors in some situations, but the time savings are usually large.

> If you don't exactly have the answer but can point me to where this is implemented in the code, I'm happy to take a look as well.

Most of the gruesome details can be found in this file but I will do my best to answer questions without you having to look through it =)

https://github.com/visionworkbench/visionworkbench/blob/master/src/vw/Stereo/SGM.cc

It may be useful to try running the "external" SGM/MGM algorithms that ASP ships with to get a good comparison.