Hi everyone,
I'm currently looking into the theory of robust loss functions and made some simple comparisons.
While doing so, I stumbled upon the Cauchy loss as implemented in the ceres framework, and I felt that a scaling factor of 2 was missing.
But from the beginning:
1. The squared loss is defined in [1] as rho(x) = x^2, with x being the unsquared error. (Please note that the ceres documentation uses the squared error s = x^2.)
If we define
rho(x) = -log(P(x)) + const.
and assume a Gaussian distribution, then rho(x) would actually be 1/2*x^2. Since the magical 1/2 is hidden inside the ceres internals, that seems legit.
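To make this concrete, here is a quick numeric sanity check (plain Python, not ceres code): for a unit Gaussian, -log(P(x)) differs from 1/2*x^2 only by a constant, so the two agree as loss functions.

```python
import math

def neg_log_gaussian(x):
    """-log of the unit Gaussian PDF: 1/2*x^2 + 1/2*log(2*pi)."""
    return -math.log(math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi))

for x in (0.5, 1.0, 2.0):
    # The difference from 1/2*x^2 is the same constant for every x,
    # namely 1/2*log(2*pi).
    print(x, neg_log_gaussian(x) - 0.5 * x * x)
```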
2. The Cauchy loss is defined in [1] as rho(x) = log(1 + x^2).
The PDF of a Cauchy distribution (with a scale parameter of 1) is
P(x) = 1/pi * 1/(x^2 + 1).
Therefore, the loss function is
rho(x) = -log(P(x)) = log(x^2 + 1) + const.
The problem here is that ceres still applies the magical factor of 1/2, so the loss actually used is 1/2*log(x^2 + 1).
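The mismatch can again be checked numerically (plain Python, not ceres code): unlike in the Gaussian case, the difference between -log(P(x)) for the Cauchy distribution and the effective ceres cost 1/2*log(1 + x^2) is not a constant, but grows with x.

```python
import math

def ceres_effective_cauchy(x):
    """Effective ceres Cauchy cost: rho(s) = log(1 + s) applied to
    s = x^2, with the internal factor of 1/2."""
    return 0.5 * math.log(1.0 + x * x)

def neg_log_cauchy(x):
    """-log of the Cauchy PDF with scale 1: log(1 + x^2) + log(pi)."""
    return -math.log(1.0 / (math.pi * (1.0 + x * x)))

for x in (0.5, 1.0, 2.0):
    # The difference is 1/2*log(1 + x^2) + log(pi), which depends on x,
    # so the two disagree by more than an additive constant.
    print(x, neg_log_cauchy(x) - ceres_effective_cauchy(x))
```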
It seems that the scalings of CauchyLoss and TrivialLoss are not consistent, which can affect the optimization result if different loss functions are used within the same optimization problem.
So should the CauchyLoss be changed to 2*log(1 + x^2), or am I missing something?
Best Regards
Tim
[1]
http://ceres-solver.org/nnls_modeling.html#instances