Why does changing a multi-dimensional residual term into a scalar give an incorrect solution?


Andrea Velasco

unread,
Jan 15, 2018, 11:08:02 AM1/15/18
to Ceres Solver


I have a problem that involves finding translations for a 3D point cloud that will take each point to a location that minimizes a certain error.


As I am new to Ceres and automatic differentiation methods, I built a very simplified version of the problem in order to learn how to use residual blocks properly when solving for several points. Also, the original problem requires using a 3rd party function, so I used the CostFunctionToFunctor interface for the simplified version, despite it not being necessary.


The simplified problem is as follows:


I have a 3D point cloud, and I want to find the translation [x y z] that will bring each point to the origin (0, 0, 0).


In my main AutoDiff Cost Function, I pass the x, y and z components of a point as doubles and set "translation" as the parameter to minimize. Inside the evaluator, I add the translation to the point as such:


T pointPlusTranslation[3];
pointPlusTranslation[0] = T( point_x ) + translation[0];
pointPlusTranslation[1] = T( point_y ) + translation[1];
pointPlusTranslation[2] = T( point_z ) + translation[2];


I then pass this newly translated point to a new function that evaluates the distance between the newly translated point and the origin, using this value as the error to minimize.


In the main method, I iterate through all points in my point cloud and create a new cost function for each, adding it as a residual block to the ceres problem.


Now, this works perfectly if I split the residual per dimension as follows:


error[0] = (0. - *point_x);
error[1] = (0. - *point_y);
error[2] = (0. - *point_z);

In 2 iterations it reaches the ideal translation for each point.


However, if I instead pose a single scalar value as the error as such:


error[0] = (0. - *point_x) + (0. - *point_y) + (0. - *point_z);


it also converges after 2 iterations with a very small final error, but the values are incorrect.


For example, if one point is at (0.0514317, 1.40441, -11.2916), the ideal translation should be (-0.0514317, -1.40441, 11.2916).

Instead, with this error the program yields (3.57952, 2.94071, 3.3155).


In this problem, it is feasible to split the error per point dimension, but in the more complex version, the error criterion does not allow this (it's the variance of the pixel intensity obtained by projecting the point onto several images). Regardless of whether splitting the residual is feasible for a given problem, I simply can't understand why it wouldn't work to have one scalar error. I think I am missing some fundamental understanding of how Ceres handles an error that depends on multiple variables.


I attached a file with more detailed pseudocode for this simplified problem.


Thank you for your time!


ResidualQuestionCode.cpp

Chris Sweeney

unread,
Jan 15, 2018, 11:34:02 AM1/15/18
to ceres-...@googlegroups.com

Ceres minimizes the sum of squared errors of your residual. These aren't minimizing the same costs.



Andrea Velasco

unread,
Jan 15, 2018, 12:36:57 PM1/15/18
to Ceres Solver
It's true that they are not the same costs, but

1. Shouldn't both of them prompt the minimization to find the translations that cause the distance from each point to the origin to be zero?
2. If I wanted to solve this problem with a one-dimensional error, what would be the right way to go about it? If I try to "replicate" Ceres' behavior of using the sum of squared errors as such:

error[0] = (0. - *point_x)*(0. - *point_x) + (0. - *point_y)*(0. - *point_y) + (0. - *point_z)*(0. - *point_z);

the error barely changes and it does not converge.

Thanks again.

Sameer Agarwal

unread,
Jan 15, 2018, 10:27:02 PM1/15/18
to ceres-...@googlegroups.com
Andrea,
There are two different (and interesting) questions here:

1. What is the reason not to reduce a multidimensional residual to a single scalar?

Any non-linear optimization problem with, say, an n-dimensional residual can be reduced to a one-dimensional optimization problem. For example, if you are trying to minimize |f|^2, where f is an n-dimensional residual function, you could always define a new scalar function equal to the norm of f, and give that to the non-linear least squares minimizer as a one-dimensional residual.

Mathematically you have not changed anything. The two problems have the same solution. 

But in terms of the information you are giving to the solver, you have significantly reduced what is available to it. In the case where f is an n-dimensional residual depending on d parameters, the Jacobian matrix J is n x d, containing detailed information about how each entry of the residual vector is affected by changes in the parameters. This allows us to construct the Gauss-Newton approximation J'J, which is roughly equal to the Hessian matrix near the solution, and that in turn is why we get second order convergence.

In the case where you reduce it to a one-dimensional residual by taking its norm, we are dealing with (f'f)^{1/2} as the residual, and the Jacobian of this function is a vector: just the scaled gradient. So by doing this, you have reduced the optimization algorithm to gradient descent. Not something you want to do.

2. Why are you getting a solution which has a small error but a different answer?

Now, if your optimization problem is simple enough, gradient descent will also find the solution. But two things to keep in mind: it may find another local minimum, or your problem may have a large basin of small error around the optimum.

Sameer



Andrea Velasco

unread,
Jan 18, 2018, 4:21:45 AM1/18/18
to Ceres Solver
Dear Sameer,

Thank you so much for your response. It was very helpful to realize I was focusing too much on the math and not enough on what I was feeding the solver, and I now see why three residuals work better than one for the case I described - the more residuals, the more informative their Jacobian matrix will be. In fact, I realized that if the number of equations/residual functions is less than the number of parameters we're solving for, the system is underdetermined and the Gauss-Newton iterations won't work, no?

So, for problems like this one where I can easily come up with three residuals, this is fine. However, what if I have only one error metric (and thus, one residual function) dependent on all 3 variables I'm minimizing over, with no way to separate them?

I'll explain myself: 
I have an AutoDiff functor that takes in a 3D point from a point cloud along with its associated normal, images and cameras. It solves for a translation vector with 3 components that will take the point to a new location minimizing the variance of the pixel intensity when the point is projected into each of its images.

So, the evaluating operator adds one translation component to its respective x,y,z component of the point, just like in the original problem I posted. 

The complication comes here, where I have a NumericDiff cost function that uses the CostFunctionToFunctor class to perform the error calculation:
  • Project point to all images (we can assume it correctly projects within their bounds)
  • For each image, save the intensity value of the pixel the point corresponds to
  • Calculate the variance between all these intensity values, which is what we want to reduce
  • Assign this variance directly to the error. This is where I think I'm wrong.
It converges at the first iteration with zero change in cost. While this formulation makes sense to me intuitively, I know I'm going about solving this with Ceres incorrectly.

I've tried several variations including solving for the point itself since the initialization is good, instead of an added translation with a random initialization, but the same thing happens. I do not see a way to give the solver more information, since the translations I want to minimize and the error have so many layers in between (the projection, the accumulation of intensities, the variance calculation...) as opposed to being a straightforward function with explicit dependence on my parameters, as is the case in most examples.

How would you suggest that I formulate this kind of error? Again, I apologize in advance for any fundamental theoretical or practical misunderstanding I might have.

Andrea

Andrea Velasco

unread,
Jan 18, 2018, 5:45:56 AM1/18/18
to Ceres Solver
I think I found a way to phrase my doubt better: my error function (variance) isn't directly based on the minimization parameters (translations), it's based on pixel intensity, that is, scalars! This whole projection function that takes my 3D point + 3D translation is essentially a map from R3 to R1, and all differentiable information is lost, I think. I'm looking into the BiCubicInterpolator class to get a smooth approximation of my images. I hope I'm on the right track!

Sameer Agarwal

unread,
Jan 21, 2018, 7:00:16 PM1/21/18
to ceres-...@googlegroups.com
Hi Andrea,
My replies are inline.

On Thu, Jan 18, 2018 at 1:21 AM Andrea Velasco <avgo...@gmail.com> wrote:
Dear Sameer,

Thank you so much for your response. It was very helpful to realize I was focusing too much on the math and not enough on what I was feeding the solver, and I now see why three residuals works better than one for the case I described - the more residuals, the more informative their Jacobian matrix will be. In fact, I realized that if the number of equations/residual functions is less than the amount of parameters we're solving for, the system is underdetermined and the Gauss-Newton iterations won't work, no?

We don't do pure Gauss-Newton iterations. The Levenberg-Marquardt diagonal regularization ensures that the system is full rank.
 

So, for problems like this one where I can easily come up with three residuals, this is fine. However, what if I have only one error metric (and thus, one residual function) dependent on all 3 variables i'm minimizing with no way to separate them?

That should be okay.
 

I'll explain myself: 
I have an AutoDiff functor that takes in a 3D point from a point cloud along with its associated normal, images and cameras. It minimizes a translation vector with 3 components that'll take the point to a new location that minimizes the variance in the pixel intensity when projected to each one of its images.

So, the evaluating operator adds one translation component to its respective x,y,z component of the point, just like in the original problem I posted. 

The complication comes here, where I have a NumericDiff cost function that uses the CostFunctionToFunctor class to perform the error calculation:
  • Project point to all images (we can assume it correctly projects within their bounds)
  • For each image, save the intensity value of the pixel the point corresponds to
  • Calculate the variance between all these intensity values, which is what we want to reduce
  • Assign this variance directly to the error. This is where I think I'm wrong.
If you are trying to minimize the variance, then your residual should be the pixel intensity in each image minus the mean pixel intensity across images.

 
Sameer
 

Sameer Agarwal

unread,
Jan 21, 2018, 7:02:00 PM1/21/18
to ceres-...@googlegroups.com
On Thu, Jan 18, 2018 at 2:45 AM Andrea Velasco <avgo...@gmail.com> wrote:
I think I found a way to phrase my doubt better: my error function (variance) isn't directly based on the minimization parameters (translations), it's based on pixel intensity, that is, scalars! This whole projection function that takes my 3D point + 3D translation is essentially a map from R3 to R1, and all differentiable information is lost, I think. I'm looking into the BiCubicInterpolator class to get a smooth approximation of my images. I hope I'm on the right track!

Why should going from R3 to R1 destroy differentiable structure?
I think a mathematically precise description of your optimization problem will help inform the discussion better.

Sameer
 