Scaling of CauchyLoss


Tim Pfeifer

Nov 3, 2021, 5:25:53 AM
to Ceres Solver
Hi everyone,
I'm currently looking into the theory of robust loss functions and made some simple comparisons.
While doing so, I stumbled upon the Cauchy loss that is implemented inside the Ceres framework, and I felt that a scaling factor of 2 was missing.
But from the beginning:

1. The squared loss is defined in [1] as rho(x) = x^2, with x as the unsquared error. (Please note that the Ceres documentation uses the squared error s = x^2.)
If we define
rho(x) = -log(P(x)) + const.
and assume a Gaussian distribution, then rho(x) is actually 1/2*x^2. Since this magical 1/2 is hidden inside the Ceres internals, that seems legitimate.
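(Sketch of that derivation, assuming a unit-variance Gaussian:
P(x) = 1/sqrt(2*pi) * exp(-1/2*x^2)
rho(x) = -log(P(x)) = 1/2*x^2 + const.
so the 1/2 comes directly from the exponent.)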

2. The Cauchy loss is defined in [1] as rho(x) = log(1 + x^2).
The PDF of a Cauchy distribution (with a scale parameter of 1) is
P(x) = 1/pi * 1/(x^2 + 1).
Therefore, the loss function is
rho(x) = -log(P(x)) = log(x^2 + 1) + const.
The problem here is that Ceres still applies the magical factor of 1/2, so the effectively used loss is 1/2*log(x^2 + 1).
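This is easy to check numerically. Here is a minimal, untested C++ sketch against the Ceres loss-function interface (it only assumes the public ceres::TrivialLoss and ceres::CauchyLoss classes); note that the additional 1/2 is applied by Ceres to the total cost, outside of rho:

#include <cmath>
#include <cstdio>
#include "ceres/loss_function.h"

int main() {
  // Compare TrivialLoss and CauchyLoss at the same squared residual s = x^2.
  ceres::TrivialLoss trivial;
  ceres::CauchyLoss cauchy(1.0);  // scale a = 1
  const double xs[] = {0.5, 1.0, 2.0, 4.0};
  for (double x : xs) {
    const double s = x * x;
    double rho_t[3], rho_c[3];  // rho(s), rho'(s), rho''(s)
    trivial.Evaluate(s, rho_t);
    cauchy.Evaluate(s, rho_c);
    std::printf("x=%g  trivial=%g  cauchy=%g  log(1+s)=%g\n",
                x, rho_t[0], rho_c[0], std::log(1.0 + s));
  }
  return 0;
}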

It seems that the scalings of CauchyLoss and TrivialLoss are not consistent, which can affect the optimization result if different loss functions are used within the same optimization problem.
So should the CauchyLoss be changed to 2*log(x^2 + 1) or am I missing something?

Best Regards
Tim

[1] http://ceres-solver.org/nnls_modeling.html#instances

Sameer Agarwal

Nov 4, 2021, 9:23:13 AM
to ceres-...@googlegroups.com
Tim,
I think you are right. However, at this point there is a lot of code that depends on these definitions and would break if we changed the scaling factor, so we are going to have to live with this inconsistency. I can look into adding some more documentation about this to the header and to the website.
Sameer




Tim Pfeifer

Nov 4, 2021, 11:20:18 AM
to Ceres Solver
Hi Sameer,
thanks for your quick response!
Although it's not a really satisfying answer, I can at least be sure that my understanding was right. :-)

I looked further into the implementation and things got even odder.
In the source code [1], the Cauchy loss is defined as
rho(x) = a^2 * log(1 + x^2/a^2)
where a is the scale of the Cauchy distribution.

If I derive the loss on my own, I get something different:
p(x) = 1/(pi*a) * 1/(1 + x^2/a^2)
rho(x) = -log(p(x)) = log(1 + x^2/a^2) + const.
Since I now doubted myself, I checked the literature [2, Appendix 6.8] and found exactly the equation that Ceres is using. But I can't explain where the a^2 in front of the logarithm comes from.
Can you help me with that?

I furthermore made a quick implementation in Matlab with a scale factor of a = 2, which should give a broader cost surface.
The version that I derived behaves as expected, but the one from Ceres and [2] seems tighter. I also tried to recover the PDF by normalizing exp(-rho(x)) and compared it to [3]; my implementation seems correct while the other one is not.
The figure is attached...
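For reference, the exponent can also be normalized analytically. Assuming the a^2 variant:
exp(-rho(x)) = exp(-a^2 * log(1 + x^2/a^2)) = (1 + x^2/a^2)^(-a^2)
This is proportional to a Cauchy PDF only for a = 1; for a = 2 the tails decay much faster, which would explain why the recovered PDF does not match.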

Best Regards
Tim

References:

[1] https://github.com/ceres-solver/ceres-solver/blob/31008453fe979f947e594df15a7e254d6631881b/internal/ceres/loss_function.cc#L77
[2] Hartley & Zisserman, Multiple View Geometry in Computer Vision

[Attachment: 02_RobustLoss.png]

Markus Moll

Nov 4, 2021, 12:12:37 PM
to ceres-...@googlegroups.com
Hi

On Thursday, 4 November 2021, 16:20:18 CET, 'Tim Pfeifer' via Ceres Solver wrote:
> I looked further into the implementation and things got even odder.
> In the source code [1], the Cauchy loss is defined as
> rho(x) = a^2 * log(1 + x^2/a^2)
> where $a$ is the scale of the Cauchy distribution.
>
> If I derive the loss on my own, I get something different:
> p(x) = 1/(pi*a) * 1/(1 + x^2/a^2)
> rho(x) = -log(p(x)) = log(1 + x^2/a^2) + const.
> Since I now doubted myself, I looked into literature [2, Appendix 6.8] and
> found exactly the equation that ceres is using. But I can't explain where
> the a^2 in front of the logarithm comes from.
> Can you help me with that?

In [2], the authors list the Cauchy loss under "heuristic loss functions",
suggesting that despite the name it is not directly derived from the Cauchy
distribution. Indeed, they say that the scaling factor "a" determines the
range in which the loss approximates the normal loss.

(I guess you start at x² ~ log(1 + x²) for small x², and a is introduced to
affect what "small" means: x² = a² (x/a)² ~ a² log(1 + x²/a²))
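More explicitly, using log(1 + s) = s - s^2/2 + ... for small s:
a^2 * log(1 + x^2/a^2) = x^2 - x^4/(2*a^2) + ...
so the loss matches the squared loss x^2 up to higher-order terms whenever |x| << a.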

Markus



Sameer Agarwal

Nov 4, 2021, 2:31:02 PM
to ceres-...@googlegroups.com
As Markus points out, it is better to treat the Cauchy loss as a heuristic construction.


Tim Pfeifer

Nov 8, 2021, 7:18:06 AM
to Ceres Solver
Thanks Markus,
you are right, they treat it more heuristically than I thought at first.
Other publications seem to vary a bit, but the majority use the variant with the a^2 in front.

Then even the 1/2 that I mentioned in the first message makes sense: it makes the loss locally mimic the behavior of a Gaussian when the error is small.
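Written out with the hidden 1/2 included, the cost that Ceres actually minimizes is
1/2 * a^2 * log(1 + x^2/a^2) ~ 1/2 * x^2 for |x| << a
which is exactly the Gaussian loss from my first message.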

Best Regards
Tim