Correct usage for triplet loss / non-metric multidimensional scaling problem

Kip Warner

Jul 15, 2024, 2:45:28 PM7/15/24
to ceres-...@googlegroups.com
Hello Ceres Solver users,

I am experimenting with Ceres Solver (2.2.0) to try and solve a triplet
loss optimization problem for object similarity detection.

I have a bunch of triplet training examples of the form <A, P, N>,
where A is some object (the anchor) and P is a positive example, i.e.
an object more similar to A than the negative example N is. The number
of triplet training examples is not known at compile time because they
are provided by the user.

Each object has an R^d dimensional embedding that I'd like to generate
from the object's features, with a respective weight applied to each
component of the embedding vector. I am using Ceres Solver to try to
find the optimal weight vector so that, under a given distance metric,
the embeddings of each A and P are brought closer together while the
embeddings of each A and N are pushed farther apart, for as many
training examples as possible.
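
Concretely, the per-triplet cost I have in mind is the standard margin
("hinge") formulation. A minimal sketch, written in the templated style
a Ceres functor would use (the distance arguments and the margin value
are placeholders, not the actual code):

#include <cmath>

// Hinge loss for one triplet: zero once the anchor is closer to the
// positive than to the negative by at least the margin, positive otherwise...
template <typename Type>
Type TripletHingeLoss(
    const Type &AnchorToPositive,   // d(A, P) under the candidate weights
    const Type &AnchorToNegative,   // d(A, N) under the candidate weights
    const double Margin)            // e.g. 1.0
{
    using std::fmax;
    return fmax(
        AnchorToPositive - AnchorToNegative + Type(Margin),
        Type(0.0));
}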

I suspect I am doing something wrong because for around 30K triplets
Ceres Solver is taking over 20 minutes and appears to be using only a
single thread. My embedding function might well be expensive, but I
suspect that's not the primary issue. Here is the essence of how I am
currently computing the weights, which is surely incorrect:

https://pastebin.com/yzCCY7He

The first function, TripletCostFunctor::operator(), if I understand
correctly, should take the current candidate weights and sum their cost
into a single residual bucket. It does this by looping over all of the
triplets, computing the distance between the anchor's and positive's
embeddings and between the anchor's and negative's embeddings under the
candidate weights. If the anchor <--> positive distance is not
sufficiently less than the anchor <--> negative distance, there is a
cost.

The second function, WeightOptimizer::Train(), sets up the problem by
first creating a starting weight vector of unity weights (all ones). I
then tell the library I would like to auto-differentiate the
TripletCostFunctor with a single residual and add one residual block. I
also tell the library I would like my weights to be in the range
[0, 1]. I set a few other solver options, as you can see, and then tell
it to begin solving.
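
In outline, that setup looks roughly like the following sketch. To be
clear, this is not the code from the pastebin: TripletCostFunctor here
is a trivial stand-in, and the 34 weights and the [0, 1] bounds are
taken from the reports and description later in this thread.

#include <ceres/ceres.h>
#include <vector>

// Stand-in for the real functor: the real version sums a hinge loss over
// every triplet into this one scalar residual...
struct TripletCostFunctor
{
    template <typename Type>
    bool operator()(const Type *const Weights, Type *Residual) const
    {
        Residual[0] = Type(0.0);
        for(int Index = 0; Index < 34; ++Index)
            Residual[0] += Weights[Index] * Weights[Index]; // placeholder cost
        return true;
    }
};

int main()
{
    std::vector<double> Weights(34, 1.0);   // start from unity weights

    ceres::Problem Problem;
    ceres::CostFunction *Cost = new ceres::AutoDiffCostFunction<
        TripletCostFunctor,
        1,      // a single residual summing every triplet's loss
        34      // one parameter block holding all of the weights
    >(new TripletCostFunctor);
    Problem.AddResidualBlock(Cost, nullptr, Weights.data());

    // Constrain each weight to [0, 1]...
    for(int Index = 0; Index < 34; ++Index)
    {
        Problem.SetParameterLowerBound(Weights.data(), Index, 0.0);
        Problem.SetParameterUpperBound(Weights.data(), Index, 1.0);
    }

    ceres::Solver::Options Options;
    Options.num_threads = 16;
    ceres::Solver::Summary Summary;
    ceres::Solve(Options, &Problem, &Summary);
    return 0;
}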

I suspect that I am doing something wrong in how I am creating the
residual block (should there be more than one?) and storing the
aggregate triplet loss cost, and potentially misusing the library in
other ways that are causing the performance degradation. Any help is
appreciated.

PS I know I've said it before to him, but I'd like to take a moment to
recognize Sameer for his contributions in creating such a marvellous
library. It took an enormous amount of work, is well written and well
documented, and has surely aided countless people.

--
Kip Warner
OpenPGP signed/encrypted mail preferred
https://www.thevertigo.com

Sameer Agarwal

Jul 15, 2024, 2:47:18 PM7/15/24
to ceres-...@googlegroups.com
Kip,
can you post the output of Solver::Summary::FullReport for a smaller instance?
also the pastebin link does not work.
Sameer



Kip Warner

Jul 15, 2024, 3:12:41 PM7/15/24
to ceres-...@googlegroups.com
On Mon, 2024-07-15 at 11:47 -0700, 'Sameer Agarwal' via Ceres Solver
wrote:
> Kip,
> can you post the output of Solver::Summary::FullReport for a smaller
> instance? also the pastebin link does not work.

Apologies Sameer. The link should work now:

https://pastebin.com/yzCCY7He

Here's the full report:

Solver Summary (v 2.2.0-eigen-(3.4.0)-lapack-suitesparse-(7.6.1)-eigensparse)

Original Reduced
Parameter blocks 1 1
Parameters 34 34
Residual blocks 1 1
Residuals 1 1

Minimizer TRUST_REGION

Dense linear algebra library EIGEN
Trust region strategy LEVENBERG_MARQUARDT
Given Used
Linear solver DENSE_QR DENSE_QR
Threads 16 16
Linear solver ordering AUTOMATIC 1

Cost:
Initial 4.669543e+08
Final 4.572289e+08
Change 9.725384e+06

Minimizer iterations 12
Successful steps 6
Unsuccessful steps 6
Line search steps 0

Time (in seconds):
Preprocessor 0.001490

Residual only evaluation 3.331208 (12)
Line search cost evaluation 0.000000
Jacobian & residual evaluation 1220.004134 (18)
Line search gradient evaluation 814.350305
Linear solver 0.000341 (12)
Line search polynomial minimization 0.000000
Minimizer 1223.336643

Postprocessor 0.000004
Total 1223.338137

Termination: CONVERGENCE (Function tolerance reached. |cost_change|/cost: 1.520767e-08 <= 1.000000e-06)

Sameer Agarwal

Jul 15, 2024, 4:11:42 PM7/15/24
to ceres-...@googlegroups.com
From the summary report we can tell that you have one residual block,
i.e. just one cost function constructing a single scalar residual, and
that's where all the time is going.

The problem is the following loop.

for(size_t TrainingExampleIndex = 0;
    TrainingExampleIndex < TrainingExamples.size();
    ++TrainingExampleIndex)
{
    // Unpack current triplet into anchor, positive, and negative
    //  examples...
    const auto &[AnchorIndex, PositiveIndex, NegativeIndex] =
        TrainingExamples.at(TrainingExampleIndex);

    // Compute distance from the anchor to positive example...
    const Type AnchorToPositive = GetDistance(
        AnchorIndex,
        PositiveIndex,
        Weights);

    // Compute distance from the anchor to negative example...
    const Type AnchorToNegative = GetDistance(
        AnchorIndex,
        NegativeIndex,
        Weights);

    // Compute loss of this triplet...
    const Type TripletLoss = fmax(
        Type(1.0) + AnchorToPositive - AnchorToNegative,
        Type(0.0));

    // The residual is the cost of the given weights on our triplets...
    Residual[0] += TripletLoss;
}

So effectively what you are doing is residual += cost[i], and Ceres
cannot thread inside your code.

Try creating a single residual per triplet, i.e. a cost functor which
just returns the loss for that triplet. That way you will create as
many residual blocks as you have observations, and Ceres will evaluate
them using multiple threads.

This will also give Ceres a useful Jacobian; right now you are just
using a single scalar residual, so effectively you are performing
gradient descent and not getting any benefit from Levenberg-Marquardt.

Sameer




Kip Warner

Jul 15, 2024, 4:24:06 PM7/15/24
to ceres-...@googlegroups.com
On Mon, 2024-07-15 at 13:11 -0700, 'Sameer Agarwal' via Ceres Solver
wrote:
> try creating a single residual per triplet.
> which just returns the loss for that triplet. that way you will
> create as many residual blocks as your observations, and ceres will
> evaluate them using multiple threads.
> This will also give ceres a jacobian, right now you are just using a
> single scalar residual, so effectively you are performing gradient
> descent and not getting any benefit from levenberg marquardt.

If you look at L104-L115, that is where I add the cost functor and the
residual block. If I understand correctly, are you suggesting I should
take that and wrap it in a loop like so?

for(size_t TripletIndex = 0;
    TripletIndex < GetTrainingExamplesTotal();
    ++TripletIndex)
{
    // Allocate cost function such that its derivative will be automatically
    // computed via Jacobian auto-differentiation. AutoDiffCostFunction
    // takes ownership of the TripletCostFunctor and is responsible for
    // cleaning it up when the problem object destructs...
    ceres::CostFunction *CostFunctor = new ceres::AutoDiffCostFunction
    <
        TripletCostFunctor,                      /* Type of cost functor */
        1,                                       /* Number of residuals */
        FeatureWeights<double>::AllWeightsCount  /* Size of parameter block */
    >(new TripletCostFunctor(*this));

    // Set it within the problem object as the cost function...
    CeresProblem.AddResidualBlock(
        CostFunctor,
        nullptr,
        AllWeightsAsDouble1DArray.data());
}

And then change the TripletCostFunctor::operator() to get rid of its
loop so that it only computes the cost of a single triplet, assigning
to Residual[0]?

If I do that, the functor needs to know which triplet to process when
invoked. Is the recommended Ceres Solver approach via the functor's
constructor when the ceres::CostFunction is allocated to something like
the following?

ceres::CostFunction *CostFunctor = new ceres::AutoDiffCostFunction
<
    TripletCostFunctor,
    1,
    FeatureWeights<double>::AllWeightsCount
>(new TripletCostFunctor(*this, TripletIndex));
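
For reference, a minimal sketch of what such a per-triplet functor
might end up looking like (the WeightOptimizer accessors and member
names here are assumptions on my part, not the actual code):

// One functor instance per triplet; operator() computes that triplet's
// hinge loss from the candidate weights and writes the single residual...
class TripletCostFunctor
{
public:
    TripletCostFunctor(
        const WeightOptimizer &Optimizer, const size_t TripletIndex)
        : m_Optimizer(Optimizer),
          m_TripletIndex(TripletIndex)
    { }

    template <typename Type>
    bool operator()(const Type *const Weights, Type *Residual) const
    {
        // Unpack this functor's triplet...
        const auto &[AnchorIndex, PositiveIndex, NegativeIndex] =
            m_Optimizer.GetTrainingExample(m_TripletIndex);

        // Distances under the candidate weights...
        const Type AnchorToPositive =
            m_Optimizer.GetDistance(AnchorIndex, PositiveIndex, Weights);
        const Type AnchorToNegative =
            m_Optimizer.GetDistance(AnchorIndex, NegativeIndex, Weights);

        // Hinge loss for this one triplet only...
        Residual[0] = fmax(
            Type(1.0) + AnchorToPositive - AnchorToNegative,
            Type(0.0));
        return true;
    }

private:
    const WeightOptimizer  &m_Optimizer;
    const size_t            m_TripletIndex;
};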

Sameer Agarwal

Jul 15, 2024, 4:27:06 PM7/15/24
to ceres-...@googlegroups.com
yes.


Kip Warner

Jul 15, 2024, 4:56:45 PM7/15/24
to ceres-...@googlegroups.com
On Mon, 2024-07-15 at 13:26 -0700, 'Sameer Agarwal' via Ceres Solver
wrote:
> yes.

Thanks Sameer. That reduced the training time significantly, but the
resulting solution weight vector looks suspect:

Solver Summary (v 2.2.0-eigen-(3.4.0)-lapack-suitesparse-(7.6.1)-eigensparse)

Original Reduced
Parameter blocks 1 1
Parameters 34 34
Residual blocks 29488 29488
Residuals 29488 29488

Minimizer TRUST_REGION

Dense linear algebra library EIGEN
Trust region strategy LEVENBERG_MARQUARDT
Given Used
Linear solver DENSE_QR DENSE_QR
Threads 16 16
Linear solver ordering AUTOMATIC 1

Cost:
Initial 1.523965e+04
Final 1.474400e+04
Change 4.956501e+02

Minimizer iterations 6
Successful steps 6
Unsuccessful steps 0
Line search steps 0

Time (in seconds):
Preprocessor 0.002314

Residual only evaluation 0.128700 (5)
Line search cost evaluation 0.000000
Jacobian & residual evaluation 79.086017 (11)
Line search gradient evaluation 35.955970
Linear solver 0.052634 (5)
Line search polynomial minimization 0.000000
Minimizer 79.281790

Postprocessor 0.000653
Total 79.284758

Termination: CONVERGENCE (Gradient tolerance reached. Gradient max norm: 3.697043e-13 <= 1.000000e-10)


Feature weights before / after training:
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 1
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 1

Sameer Agarwal

Jul 15, 2024, 5:35:00 PM7/15/24
to ceres-...@googlegroups.com
So now it's worth looking at what your weights are saying, and how that translates into your cost.
At this point this is not a code/performance problem but rather a modeling issue. Note that your cost has not changed much. It is worth taking a closer look at the max that you are taking in your cost function.
Sameer




Kip Warner

Jul 15, 2024, 10:19:04 PM7/15/24
to ceres-...@googlegroups.com
On Mon, 2024-07-15 at 14:34 -0700, 'Sameer Agarwal' via Ceres Solver
wrote:
> so now its worth looking at what your weights are saying, and how
> that translates into your cost.
> at this point this is not a code/performance problem but rather a
> modeling issue. Note that your cost has not changed much. It is worth
> taking a closer look at the max that you are taking in your cost
> function.

Thanks for your patience Sameer.

I've spent the day fiddling with it and I found a problem in one of the
scenarios where I use cosine similarity as the distance metric. Since
that ranges over [-1, 1], it would break the fmax() in the cost
function. What I'm doing now on L55 should hopefully be more sane: it
uses cosine distance (which ranges over [0, 2]) instead, where a
smaller number means more similar.

https://pastebin.com/yzCCY7He
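
(For clarity, the change boils down to something like the following;
GetCosineSimilarity is a stand-in name for the similarity computation,
which itself is unchanged:)

// Cosine similarity lies in [-1, 1]; cosine distance 1 - similarity lies
// in [0, 2], so a smaller value now consistently means "more similar"...
const Type AnchorToPositiveCosineDistance =
    Type(1.0) - GetCosineSimilarity(AnchorIndex, PositiveIndex, Weights);
const Type AnchorToNegativeCosineDistance =
    Type(1.0) - GetCosineSimilarity(AnchorIndex, NegativeIndex, Weights);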

Except there's still something wrong:

Solver Summary (v 2.2.0-eigen-(3.4.0)-lapack-suitesparse-(7.6.1)-eigensparse)

Original Reduced
Parameter blocks 1 1
Parameters 34 34
Residual blocks 28614 28614
Residuals 28614 28614

Minimizer TRUST_REGION

Dense linear algebra library EIGEN
Trust region strategy LEVENBERG_MARQUARDT
Given Used
Linear solver DENSE_QR DENSE_QR
Threads 16 16
Linear solver ordering AUTOMATIC 1

Cost:
Initial 1.378852e+04
Final 7.640724e+03
Change 6.147797e+03

Minimizer iterations 30
Successful steps 26
Unsuccessful steps 4
Line search steps 44

Time (in seconds):
Preprocessor 0.002781

Residual only evaluation 0.866667 (30)
Line search cost evaluation 0.000000
Jacobian & residual evaluation 766.859171 (96)
Line search gradient evaluation 559.301355
Linear solver 0.317060 (30)
Line search polynomial minimization 0.001110
Minimizer 768.117835

Postprocessor 0.000481
Total 768.121098

Termination: CONVERGENCE (Function tolerance reached. |cost_change|/cost: 9.697896e-07 <= 1.000000e-06)


Feature weights before / after training:
1 0
1 0
1 0
1 0
1 0.815256
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0.772392
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 1
1 0
1 0.854761
1 0
1 1
1 0

There are three things that stand out. The first is that the cost was
reduced from 13788.5 to 7640.72 after 29 iterations. That's good.

The second is that the resulting weight vector shown above looks funny,
with mostly zero components. That might still have been a
mathematically sensible solution, if it weren't for the third issue.

After the optimizer completes, I compute the accuracy of the solution
by applying it to a test set of triplets (L179) that were not trained
on. The number of predicates that held true was zero. So something is
clearly wrong and I'm stumped again.

Kip Warner

Jul 17, 2024, 12:31:36 AM7/17/24
to ceres-...@googlegroups.com
On Mon, 2024-07-15 at 14:34 -0700, 'Sameer Agarwal' via Ceres Solver
wrote:
> so now its worth looking at what your weights are saying, and how
> that translates into your cost.
> at this point this is not a code/performance problem but rather a
> modeling issue. Note that your cost has not changed much. It is worth
> taking a closer look at the max that you are taking in your cost
> function.

Sameer, I think I may have found the problem after many more hours. The
gradients are apparently not being computed correctly. I've now enabled
check_gradients in the Solver::Options on L134:

https://pastebin.com/yzCCY7He

When I run with that flag I see immediately after iteration 0:

E20240716 21:01:23.947016 2398748
gradient_checking_cost_function.cc:137] Gradient error detected.
Terminating solver.

This happens regardless of whether I constrain the solution with
SetParameterLowerBound / SetParameterUpperBound or not.

Since my CostFunction is an AutoDiffCostFunction, should the optimizer
not compute the partial derivatives for each component of the solution
vector automatically? The cost functor is simple, at L19-L64. I've
looked at the fmax() and I don't see an issue with how it's computing
the cost, unless I'm missing something.

I have not compiled my own Ceres Solver, but am using the vanilla build
(2.2.0) that ships with Ubuntu Noble (amd64).

Sameer Agarwal

Jul 18, 2024, 11:05:22 PM7/18/24
to ceres-...@googlegroups.com
There is nothing wrong with ceres. The problem is the fmax.

    const Type TripletLoss = fmax(
        (AnchorToPositiveCosineDistance) - (AnchorToNegativeCosineDistance) +
            AlphaMargin,
        Type(0.0));

    The problem is that if the first term in the max is less than 0, then
    your loss is always zero, which is a constant function with zero
    derivative.

    What you are trying to do is

    sum_i [max(f_i - g_i, 0)]^2

    and your parameters move to the point where f_i - g_i <= 0, your
    derivatives go to zero, and the optimizer believes it has found a
    local minimum.

    Have a look at your triplets: is it the case that most or all of them
    are triggering the max with 0?

    Sameer
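
    To see this flat region concretely, here is a small illustration using
    ceres::Jet directly (the Jet type is what AutoDiffCostFunction
    evaluates the functor with; this is only an illustration of the
    behaviour, not code from the problem):

    #include <ceres/jet.h>
    #include <iostream>

    int main()
    {
        // A one-dimensional jet carries a value plus a derivative with
        // respect to a single variable...
        using J = ceres::Jet<double, 1>;

        // x = -0.3, with dx/dx = 1...
        const J x(-0.3, 0);

        // Hinge: max(x, 0). For x < 0 we are in the flat region...
        const J Hinge = fmax(x, J(0.0));

        // Prints "0 0": both the value and the derivative are zero, so the
        // optimizer sees no gradient pointing out of this region...
        std::cout << Hinge.a << " " << Hinge.v[0] << std::endl;
        return 0;
    }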



    Kip Warner

    Jul 19, 2024, 1:05:30 AM7/19/24
    to ceres-...@googlegroups.com
    On Thu, 2024-07-18 at 20:05 -0700, 'Sameer Agarwal' via Ceres Solver
    wrote:
    > There is nothing wrong with ceres. The problem is the fmax.

    Hey Sameer. Literally a minute before your email arrived, I realized
    the function might not be differentiating correctly because, as a
    piecewise function, it's not smooth. If that is the case then the
    gradient check error is correct.

    > const Type TripletLoss = fmax(
    >     (AnchorToPositiveCosineDistance) - (AnchorToNegativeCosineDistance) +
    >         AlphaMargin,
    >     Type(0.0));
    >
    > The problem is that if the first term in the max is less than 0, then
    > your loss is always zero, which is a constant function with zero
    > derivative.
    >
    > what are you trying to do is
    >
    > sum_i [max(f_i - g_i, 0)]^2

    Yes, but with the alpha / margin term in there too.

    > and your parameters move to the point where f_i - g_i <= 0, and your
    > derivatives go to zero, and the optimizer believes it has found a
    > local minimum.
    >
    > have a look at your triplets, is it the case, that most or all of
    > them are triggering the max with 0?

    If you scroll down to L235, I've added logging within the cost functor
    for when it is instantiated with the double type. The resulting log
    was over 100 MB for several thousand triplets, so I truncated it to
    only include training for three of them:

    https://pastebin.com/yzCCY7He

    The logging format is:

    Triplet(AnchorID, PositiveID, NegativeID) \
    AP=cosine_distance_between_anchor_and_positive \
    AN=cosine_distance_between_anchor_and_negative \
    arguments and value of fmax

    e.g.

    ...
    Triplet=(47,49,37) AP=0.0162818 AN=0.00593436 max=(0.0162818 - 0.00593436 + 1, 0) = 1.01035
    ...

    Interestingly, in neither the truncated log nor the complete one did I
    see any of the fmax values evaluate to zero. They all seemed to be
    close to 1.

    Sameer Agarwal

    Jul 19, 2024, 1:09:49 AM7/19/24
    to ceres-...@googlegroups.com

    In that case what is your cosine weight computation like?



    Kip Warner

    Jul 19, 2024, 1:20:24 AM7/19/24
    to ceres-...@googlegroups.com
    On Thu, 2024-07-18 at 22:09 -0700, 'Sameer Agarwal' via Ceres Solver
    wrote:
    > In that case what is your cosine weight computation like?

    As in how cosine similarity is computed on the object embeddings, or
    how the embeddings themselves are computed from the weights? If the
    former, it's just the ratio of the two embeddings' dot product to the
    product of their norms, computed with the help of libeigen. See L70:

    https://pastebin.com/yzCCY7He

    If you're asking about the embeddings themselves I could have it log
    the values of the embeddings at each iteration and see if we can
    observe anything interesting?

    Kip Warner

    Jul 19, 2024, 9:35:43 PM7/19/24
    to ceres-...@googlegroups.com
    On Thu, 2024-07-18 at 22:20 -0700, Kip Warner wrote:
    > If you're asking about the embeddings themselves I could have it log
    > the values of the embeddings at each iteration and see if we can
    > observe anything interesting?

    Sameer, addendum to yesterday's post. I've logged the three embeddings
    for an anchor, positive, and negative for each iteration of the cost
    functor here:

    https://pastebin.com/fmRWfaXS

    I chose to log just one triplet, otherwise the log would be massive.

    You can see the components of the three embeddings changing with each
    iteration, but I don't see anything unusual.

    Kip Warner

    Jul 19, 2024, 10:48:48 PM7/19/24
    to ceres-...@googlegroups.com
    On Fri, 2024-07-19 at 18:35 -0700, Kip Warner wrote:
    > Sameer, addendum to yesterday's post. I've logged the three
    > embeddings
    > for an anchor, positive, and negative for each iteration of the cost
    > functor here:
    >
    >    https://pastebin.com/fmRWfaXS
    >
    > I selected to log just one triplet otherwise the log would be
    > massive.
    >
    > You can see the components of the three embeddings changing with each
    > iteration, but I don't see anything unusual.

    Complete gzip-compressed log uploaded below. Warning: it's 89 MB and
    will decompress to 1.3 GB if loaded in your browser.

    https://shorturl.at/NTbSV

    Kip Warner

    Jul 21, 2024, 5:18:55 PM7/21/24
    to ceres-...@googlegroups.com
    On Thu, 2024-07-18 at 20:05 -0700, 'Sameer Agarwal' via Ceres Solver
    wrote:
    > There is nothing wrong with ceres. The problem is the fmax.

    I experimented today and yesterday with getting rid of the fmax
    entirely by replacing:

        const Type TripletLoss = fmax(
            AnchorToPositiveCosineDistance -
            AnchorToNegativeCosineDistance +
            Type(1.0),
            Type(0.0));

    With:

        const Type TripletLoss =
            AnchorToPositiveCosineDistance -
            AnchorToNegativeCosineDistance +
            Type(2.0);

    Since cosine distance (1 - cosine_similarity) is always in the range
    [0, 2], the expression above should always be >= 0. However, Ceres
    Solver is still not able to compute the derivative (same gradient
    error). I tried squaring the TripletLoss before assigning it to the
    residual and still hit the same problem.

    Kip Warner

    Jul 24, 2024, 12:09:46 AM7/24/24
    to ceres-...@googlegroups.com
    Sameer,

    I've managed to get some more clues on what's going on. I didn't
    realize Solver::Summary::message contained a potentially useful log,
    even if convergence failed prematurely when Options::check_gradients
    was set and it failed to autodifferentiate.

    This is what I managed to uncover within it, partially truncated to
    keep within pastebin's 512KB limit:

    https://pastebin.com/eBDyp0xf

    It looks as though most of the Jacobian entries are being
    differentiated correctly, if I'm reading that right, except for one or
    sometimes two of them, and not always the same ones. Is this helpful
    at all?

    Sameer Agarwal

    Jul 28, 2024, 11:39:30 AM7/28/24
    to ceres-...@googlegroups.com
    Your GetDistance function is not differentiable.
    Do not return a constant if the values are the same. 

    Currently it is:

        template <typename Type>
        Type GetDistanceImplementation(
            const ObjectEmbedding<Type> &FirstEmbedding,
            const ObjectEmbedding<Type> &SecondEmbedding)
        {
            // If they're both the same vector, then it has a perfect cosine
            //  similarity with itself and we can skip all the fancy math...
            if(FirstEmbedding.GetValues() == SecondEmbedding.GetValues())
                return Type(1.0);

            // Otherwise compute the numerator dot product...
            const Type DotProduct =
                FirstEmbedding.GetValues().dot(SecondEmbedding.GetValues());

            // Compute the denominator which is the product of the two vectors' norms...
            const Type NormProduct =
                FirstEmbedding.GetValues().norm() *
                SecondEmbedding.GetValues().norm();

            // Compute cosine similarity which is the ratio of the two...
            const Type CosineSimilarity = DotProduct / NormProduct;

            // Return similarity to caller...
            return CosineSimilarity;
        }

    It should be:

        template <typename Type>
        Type GetDistanceImplementation(
            const ObjectEmbedding<Type> &FirstEmbedding,
            const ObjectEmbedding<Type> &SecondEmbedding)
        {
            // Compute the numerator dot product...
            const Type DotProduct =
                FirstEmbedding.GetValues().dot(SecondEmbedding.GetValues());

            // Compute the denominator which is the product of the two vectors' norms...
            const Type NormProduct =
                FirstEmbedding.GetValues().norm() *
                SecondEmbedding.GetValues().norm();

            // Compute cosine similarity which is the ratio of the two...
            const Type CosineSimilarity = DotProduct / NormProduct;

            // Return similarity to caller...
            return CosineSimilarity;
        }


    Sameer Agarwal

    Jul 28, 2024, 11:43:59 AM7/28/24
    to ceres-...@googlegroups.com
    Your code also has way too many layers for what is a simple
    computation. It will be faster and simpler to debug this if you have a
    single functor which does everything in one function, so I can see
    what is going on.

    Kip Warner

    Jul 28, 2024, 12:26:28 PM7/28/24
    to ceres-...@googlegroups.com
    On Sun, 2024-07-28 at 08:39 -0700, 'Sameer Agarwal' via Ceres Solver
    wrote:
    > Your GetDistance function is not differentiable.
    > Do not return a constant if the values are the same. 

    Hey Sameer,

    I tried removing the two lines checking whether the two vectors are
    the same, and unfortunately it still doesn't differentiate:

    https://pastebin.com/eBDyp0xf

    > Your code also has way too many layers to do a simple computation. It
    > will be faster and simpler debug this if you have a simple functor
    > which does everything in that one function so I can see what is going
    > on.

    I agree. I will try and put together a minimal for you later today.

    Kip Warner

    Jul 29, 2024, 9:01:07 PM7/29/24
    to ceres-...@googlegroups.com
    On Sun, 2024-07-28 at 08:43 -0700, 'Sameer Agarwal' via Ceres Solver
    wrote:
    > Your code also has way too many layers to do a simple computation. It
    > will be faster and simpler debug this if you have a simple functor
    > which does everything in that one function so I can see what is going
    > on. 

    Hey Sameer. I've prepared a minimal for you that you can compile and
    run:

    https://pastebin.com/Xa91vVTR

    Note that the training data for the sake of this minimal is randomly
    generated and I don't know how that might affect autodifferentiation or
    convergence efforts.

    Sameer Agarwal

    Aug 18, 2024, 5:28:51 PM8/18/24
    to ceres-...@googlegroups.com
    This code is fine as far as I can tell.
    With random data I would not expect the solver to make much progress.
    The gradient checking failure you are getting is likely because of the non-differentiable residual which comes from the fmax in the functor. 
    The default number of iterations is not enough; you should increase it
    and run with real data to see how it does.
    Sameer
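
    (For reference, the relevant knobs are on ceres::Solver::Options; a
    sketch of a fragment you would place before calling ceres::Solve(),
    with purely illustrative values:)

    ceres::Solver::Options Options;
    Options.max_num_iterations = 1000;  // illustrative, not a recommendation
    Options.num_threads =
        static_cast<int>(std::thread::hardware_concurrency());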



    Kip Warner

    Aug 18, 2024, 7:35:56 PM8/18/24
    to ceres-...@googlegroups.com
    On Sun, 2024-08-18 at 14:28 -0700, 'Sameer Agarwal' via Ceres Solver
    wrote:
    > This code is fine as far as I can tell.

    Glad to hear that (kind of - since it still doesn't work).

    > With random data I would not expect the solver to make much progress.

    I'd imagine it may not be able to converge with random data, but the
    minimal should at the very least autodifferentiate?

    > The gradient checking failure you are getting is likely because of
    > the non-differentiable residual which comes from the fmax in the
    > functor. 

    Is that still a problem if we increase the number of iterations or are
    you saying I need to replace the fmax? If the latter I don't know how.

    > The default number of iterations are not enough you should increase
    > the number of iterations and run with real data to see how it does.

    To increase the number of iterations, do you mean to increase
    Solver::Options::max_num_iterations from the default of 150 to
    something higher? If so, I've tried 500, 1500, and 15000 and it still
    gives the same "Gradient Error detected!"

    Kip Warner

    Aug 18, 2024, 7:52:46 PM8/18/24
    to ceres-...@googlegroups.com
    On Sun, 2024-08-18 at 16:35 -0700, Kip Warner wrote:
    > To increase the number of iterations, do you mean to increase
    > Solver::Options::max_num_iterations from the default of 150 to
    > something higher? If so, I've tried 500, 1500, and 15000 and it still
    > gives the same "Gradient Error detected!"

    I tried just now on real data. I set max_num_iterations to 150,000.
    Still getting the same "Gradient Error detected!" My usage of the
    library seems to be forever jinxed.

    Sameer Agarwal

    Aug 18, 2024, 8:02:25 PM8/18/24
    to Kip Warner, ceres-...@googlegroups.com
    On Sun, Aug 18, 2024 at 4:35 PM Kip Warner <k...@thevertigo.com> wrote:
    On Sun, 2024-08-18 at 14:28 -0700, 'Sameer Agarwal' via Ceres Solver
    wrote:
    > This code is fine as far as I can tell.

    Glad to hear that (kind of - since it still doesn't work).

    > With random data I would not expect the solver to make much progress.

    I'd imagine it may not be able converge with random data, but the
    minimal should at the very least autodifferentiate?

    And it does, but the fmax makes it a non-differentiable function, so
    when you have a data point close to that point of
    non-differentiability, the automatic derivatives will not match the
    numeric derivatives used to check them, and the checker will indicate
    an error.
     
    > The gradient checking failure you are getting is likely because of
    > the non-differentiable residual which comes from the fmax in the
    > functor. 

    Is that still a problem if we increase the number of iterations or are
    you saying I need to replace the fmax? If the latter I don't know how.

    This has nothing to do with the number of iterations. What I am saying
    is that it is not something you need to worry about in the current
    formulation.
    That said, it is possible you need to change your formulation to get
    better solutions, or that Ceres Solver is not a great fit for this
    formulation.

     
    > The default number of iterations are not enough you should increase
    > the number of iterations and run with real data to see how it does.

    To increase the number of iterations, do you mean to increase
    Solver::Options::max_num_iterations from the default of 150 to
    something higher? If so, I've tried 500, 1500, and 15000 and it still
    gives the same "Gradient Error detected!"

    Increasing the number of iterations will give you a better solution;
    it does nothing to change gradient checking.
    Sameer

    Kip Warner

    Aug 20, 2024, 7:28:27 PM8/20/24
    to ceres-...@googlegroups.com
    On Sun, 2024-08-18 at 17:02 -0700, Sameer Agarwal wrote:
    > and it does, but the fmax makes it a non-differentiable function so
    > when you have a data point which is close to that point of non-
    > differentiability the automatic derivatives will not match the
    > numeric derivatives use to check them and it will indicate an error. 

    Got it. In other words it's a red herring.

    > This has nothing to do with number of iterations. What I am saying
    > is, that it is not something you need to worry about in this current
    > formulation. 
    > That said, it is possible you need to change your formulation to get
    > better solutions or that Ceres Solver is not a great fit for this
    > formulation.

    Thanks Sameer. By disabling the gradient checking and increasing the
    number of iterations, it does report convergence. But bear in mind
    this is on randomly generated data, so I don't expect the solution to
    be optimal. Here is an updated minimal:

    https://pastebin.com/Xa91vVTR

    A few questions for you:

    (1) What I've done is after convergence test the accuracy of the
    solution by checking how many triplet predicates from an independent
    test set (not the set used to train) still hold as true on L497-553.

    Note that there is an alpha margin used in the cost functor at L306-313
    to ensure the positive and negative embeddings have at least some
    minimum arbitrary space between them when training. But after
    convergence to check the solution's accuracy I don't think I need to
    ensure that the margin is still respected. Only that the distance
    between the anchor and positive is less than the anchor and the
    negative on L551. Is that reasonable?
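
    (The accuracy check described in (1) amounts to roughly the following
    sketch; GetDistance, TestTriplets, and TrainedWeights are stand-ins
    for whatever the minimal actually uses, not the real code:)

    // Fraction of held-out triplets whose predicate d(A, P) < d(A, N)
    // still holds under the trained weights. No margin is applied here;
    // only the ordering matters...
    size_t Held = 0;
    for(const auto &[AnchorIndex, PositiveIndex, NegativeIndex] : TestTriplets)
    {
        const double AnchorToPositive =
            GetDistance(AnchorIndex, PositiveIndex, TrainedWeights);
        const double AnchorToNegative =
            GetDistance(AnchorIndex, NegativeIndex, TrainedWeights);

        if(AnchorToPositive < AnchorToNegative)
            ++Held;
    }
    const double Accuracy =
        100.0 * static_cast<double>(Held) /
        static_cast<double>(TestTriplets.size());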

    (2) I've set the maximum number of iterations to 500,000. The number of
    iterations necessary to find a good solution I'm assuming will vary
    widely across different problem domains and formulations. Is that
    correct?

    (3) When I test on real data it took over three hours to converge using
    all logical cores using Solver::Options::num_threads set to
    thread::hardware_concurrency(). Are there any options for Ceres Solver
    you would recommend I experiment with to reduce that time?

    Cost:
    Initial 1.987583e+03
    Final 1.385421e+03
    Change 6.021622e+02

    Minimizer iterations 10232
    Successful steps 10222
    Unsuccessful steps 10

    Time (in seconds):
    Preprocessor 0.001859

    Residual only evaluation 45.426662 (10232)
    Jacobian & residual evaluation 11811.069377 (10222)
    Linear solver 15.100255 (10232)
    Minimizer 11875.343345

    Postprocessor 0.000121
    Total 11875.345325

    Termination: CONVERGENCE (Function tolerance reached. |cost_change|/cost: 3.994556e-07 <= 1.000000e-06)

    (4) Based on the above time spent in the Jacobian & residual
    evaluation, would it make sense to leverage a GPU and set
    Solver::Options::dense_linear_algebra_library_type to ceres::CUDA? Or
    would the amount of time moving data from CPU / RAM to the GPU make
    that pointless?

    (5) You can see that I've disabled bounds constraints on L420-436. I
    don't see why they'd be necessary in this problem. Would you agree?

    > Increasing the number of iterations will give you a better solution
    > it does nothing to change gradient checking.

    Understood. Which leads me to (6), is there anything else you would
    suggest that might increase the quality of the solution? I tried
    reducing the size of Solver::Options::function_tolerance from the
    default 1e-6 to 1e-12, but to my surprise that reduced the accuracy. As
    I understand it, that is the ratio of the current iteration's cost
    delta and the remaining cost.

    If I set Dimensions to 20 on L36, despite using random data, the
    accuracy is still 56.12 %.

    Sameer Agarwal

    Aug 21, 2024, 3:38:57 PM8/21/24
    to Kip Warner, ceres-...@googlegroups.com
    Yes (regarding your question (1)).
     

    (2) I've set the maximum number of iterations to 500,000. The number of
    iterations necessary to find a good solution I'm assuming will vary
    widely across different problem domains and formulations. Is that
    correct?

    yes but I suspect you are not going to need half a million iterations.  
     

    (3) When I test on real data it took over three hours to converge using
    all logical cores using Solver::Options::num_threads set to
    thread::hardware_concurrency(). Are there any options for Ceres Solver
    you would recommend I experiment with to reduce that time?

    you should not need to run the solver for so long. I doubt the solution is changing much after some thousands of iterations.
    If it is, then you have problems with the formulation.
     

       Cost:
       Initial                          1.987583e+03
       Final                            1.385421e+03
       Change                           6.021622e+02

       Minimizer iterations                    10232
       Successful steps                        10222
       Unsuccessful steps                         10

       Time (in seconds):
       Preprocessor                         0.001859

         Residual only evaluation          45.426662 (10232)
         Jacobian & residual evaluation 11811.069377 (10222)
         Linear solver                     15.100255 (10232)
       Minimizer                        11875.343345

       Postprocessor                        0.000121
       Total                            11875.345325

       Termination:                      CONVERGENCE (Function tolerance reached. |cost_change|/cost: 3.994556e-07 <= 1.000000e-06)

    (4) Based on the above time spent in the Jacobian & residual
    evaluation, would it make sense to leverage a GPU and set
    Solver::Options::dense_linear_algebra_library_type to ceres::CUDA? Or
    would the amount of time moving data from CPU / RAM to the GPU make
    that pointless?
     
    No. Your problem is tiny: you are not spending any time in the linear
    solver. All your time is in the Jacobian evaluation, so I would focus
    on optimizing that.
    Further, your objective function is easily analytically
    differentiable, and that should speed up your Jacobian computation.
    But all that comes after you start getting solutions.

    (5) You can see that I've disabled bounds constraints on L420-436. I
    don't see why they'd be necessary in this problem. Would you agree?

    you want the weights to be positive no?
     

    > Increasing the number of iterations will give you a better solution
    > it does nothing to change gradient checking.

    Understood. Which leads me to (6), is there anything else you would
    suggest that might increase the quality of the solution? I tried
    reducing the size of Solver::Options::function_tolerance from the
    default 1e-6 to 1e-12, but to my surprise that reduced the accuracy. As
    I understand it, that is the ratio of the current iteration's cost
    delta and the remaining cost.

    function_tolerance is a convergence hack. Are you sure you were
    comparing the same data in both cases when you reduced the tolerance?
    It should not lead to a worse solution.

    Sameer

    Kip Warner

    Aug 24, 2024, 8:49:05 PM8/24/24
    to Sameer Agarwal, ceres-...@googlegroups.com
    On Wed, 2024-08-21 at 12:38 -0700, Sameer Agarwal wrote:
    > no. Your problem is tiny. you are not spending any time in the linear
    > solver. All your time is in the Jacobian evaluation. I would focus on
    > optimizing it.

    Thanks Sameer. Any tips on that? I had suspected a GPU wouldn't do
    anything here.

    > Further your objective function is easily analytically
    > differentiable, that should speed up your jacobian computation. But
    > all that comes after you start getting solutions.

    Sorry, I don't follow. Do you mean the objective function is easily
    autodifferentiated by the optimizer, but that it should speed up as
    the optimizer gets closer to convergence? I don't know what I could do
    to speed up the Jacobian computation.

    > > (5) You can see that I've disabled bounds constraints on L420-436.
    > > I don't see why they'd be necessary in this problem. Would you
    > > agree?
    >
    > you want the weights to be positive no?

    I don't think it should in theory matter what value the weights are,
    provided they create useful embeddings that respect as many of the
    triplet predicates as possible. Or is there a benefit to having them
    bounds constrained to being positive / some range?

    > >
    > > Understood. Which leads me to (6), is there anything else you would
    > > suggest that might increase the quality of the solution? I tried
    > > reducing the size of Solver::Options::function_tolerance from the
    > > default 1e-6 to 1e-12, but to my surprise that reduced the
    > > accuracy. As I understand it, that is the ratio of the current
    > > iteration's cost delta and the remaining cost.
    >
    > fuction_tolerance is a convergence hack. Are you sure you were
    > comparing the same data in both cases when you reduced the tolerance?
    > it should not lead to a worse solution.

    You might be right. I was trying to reproduce this in a minimal for you
    but having difficulty. It may be that I had changed something else that
    was the real reason why the accuracy varied.

    Sameer Agarwal

    Aug 24, 2024, 8:53:33 PM8/24/24
    to Kip Warner, ceres-...@googlegroups.com


    On Sat, Aug 24, 2024, 5:49 PM Kip Warner <k...@thevertigo.com> wrote:
    On Wed, 2024-08-21 at 12:38 -0700, Sameer Agarwal wrote:
    > no. Your problem is tiny. you are not spending any time in the linear
    > solver. All your time is in the Jacobian evaluation. I would focus on
    > optimizing it.

    Thanks Sameer. Any tips on that? I had suspected a GPU wouldn't do
    anything here.

    > Further your objective function is easily analytically
    > differentiable, that should speed up your jacobian computation. But
    > all that comes after you start getting solutions.

    Sorry, I don't follow. Do you mean the objective function is easily
    autodifferentiating by the optimizer, but that it should speed up as
    the optimizer gets closer to convergence? I don't know what I could do
    to speed up the Jacobian computation?

    What I mean is compute the derivatives by hand rather than using automatic differentiation.
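
    (In Ceres terms, computing the derivatives by hand means subclassing
    ceres::SizedCostFunction and filling in the Jacobian yourself. A bare
    skeleton is sketched below; the residual and derivative expressions
    are placeholders, not the actual triplet-loss math, and the 34 comes
    from the parameter count in the reports earlier in this thread:)

    #include <ceres/ceres.h>

    // One residual, one parameter block of 34 weights...
    class TripletAnalyticCost : public ceres::SizedCostFunction<1, 34>
    {
    public:
        bool Evaluate(
            double const *const *Parameters,
            double *Residuals,
            double **Jacobians) const override
        {
            const double *Weights = Parameters[0];
            (void) Weights;

            // Placeholder: the triplet loss for this functor's triplet
            // would be computed here from the weights...
            Residuals[0] = 0.0;

            // The solver may pass null when it only needs residuals...
            if(Jacobians != nullptr && Jacobians[0] != nullptr)
            {
                for(int Index = 0; Index < 34; ++Index)
                    Jacobians[0][Index] = 0.0;  // d(residual) / d(weight[Index])
            }

            return true;
        }
    };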


    > > (5) You can see that I've disabled bounds constraints on L420-436.
    > > I don't see why they'd be necessary in this problem. Would you
    > > agree?
    >
    > you want the weights to be positive no?

    I don't think it should in theory matter what value the weights are,
    provided they create useful embeddings that respect as many of the
    triplet predicates as possible. Or is there a benefit to having them
    bounds constrained to being positive / some range?

    This is a matter for you to decide, based on how you want to think
    about the underlying model.

    Sameer 

    Kip Warner

    Aug 24, 2024, 9:50:28 PM8/24/24
    to Sameer Agarwal, ceres-...@googlegroups.com
    On Sat, 2024-08-24 at 17:53 -0700, Sameer Agarwal wrote:
    > What I mean is compute the derivatives by hand rather than using
    > automatic differentiation.

    I'd have to dust off my first-year differential calculus textbook.

    But assuming I manage to hand-compute the derivatives, are you saying
    that would solve the performance bottleneck?

    I had just assumed the optimizer computed the derivatives only once at
    the beginning and thereafter kept feeding them candidate values on
    each iteration. But if I understand correctly, the Jacobian that the
    optimizer computes every iteration _is_ that derivative computation,
    redone each time?

    Sameer Agarwal

    Aug 24, 2024, 9:52:12 PM8/24/24
    to Kip Warner, ceres-...@googlegroups.com

    I am saying that it will be faster since the function is essentially linear.

    Kip Warner

    Aug 24, 2024, 9:53:13 PM8/24/24
    to Sameer Agarwal, ceres-...@googlegroups.com
    On Sat, 2024-08-24 at 18:51 -0700, Sameer Agarwal wrote:
    > I am saying that it will be faster since the function is essentially
    > linear.

    Ok. I will experiment when I can.

    Kip Warner

    Aug 28, 2024, 8:27:58 PM8/28/24
    to ceres-...@googlegroups.com, Sameer Agarwal
    On Sun, 2024-08-18 at 14:28 -0700, 'Sameer Agarwal' via Ceres Solver
    wrote:
    > This code is fine as far as I can tell.
    > With random data I would not expect the solver to make much progress.

    Hey Sameer,

    I've been experimenting with real data and unfortunately, although it
    does converge, there are some obvious problems.

    It appears to take too many iterations, with each one barely decreasing
    the cost. After convergence the change in cost is not very much.

    But most importantly, the total number of predicates that still hold
    true after training seems to average around 50 %, which is the same as
    assigning totally random weights when creating the object embeddings.

    As a test run I used 144 objects and 407 learning example triplets
    that refer to those objects. The 407 examples were divided into 325
    used for actually training with the optimizer, with the remaining 82
    held out as a test set used after training (an 80 / 20 split).

    Cost:
    Initial 1.606899e+00
    Final 1.543389e+00
    Change 6.351000e-02

    Minimizer iterations 524
    Successful steps 516
    Unsuccessful steps 8

    Time (in seconds):
    Preprocessor 0.001421

    Residual only evaluation 0.236995 (524)
    Jacobian & residual evaluation 42.372609 (516)
    Linear solver 0.077782 (524)
    Minimizer 42.731210

    Postprocessor 0.000009
    Total 42.732640

    Termination: CONVERGENCE (Function tolerance reached. |cost_change|/cost: 1.302798e-07 <= 1.000000e-06)

    Test set accuracy: 62.1951 %

    Could the problem be with the cost functor, perhaps? I cannot think of
    a creative mathematical way to replace the fmax() while still
    achieving the same end, but I'm open to suggestions. Perhaps there is
    some way to express it as a polynomial.

    Sameer Agarwal

    Aug 28, 2024, 8:29:07 PM8/28/24
    to Kip Warner, ceres-...@googlegroups.com

    At this point this is out of my area of expertise.

    Kip Warner

    Sep 22, 2024, 10:27:53 PM9/22/24
    to ceres-...@googlegroups.com
    On Sun, 2024-08-18 at 14:28 -0700, 'Sameer Agarwal' via Ceres Solver
    wrote:
    > The gradient checking failure you are getting is likely because of
    > the non-differentiable residual which comes from the fmax in the
    > functor. 

    Hey Sameer,

    I've had a chance to do some more thinking and experimentation on the
    subject of the non-differentiability of the fmax() function in my cost
    functor. I've attempted to substitute that call with a smooth
    approximation of max() that should be differentiable at L316:

    https://pastebin.com/sv4LKpsg

    The replacement is the Smooth Maximum Unit (SMU) described below. I
    selected an epsilon value of 0.5:

    https://en.wikipedia.org/wiki/Smooth_maximum#Smooth_maximum_unit
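
    (The basic form described there is built on a smoothed absolute value,
    so the substitution presumably looks something like the sketch below;
    the exact parameterization follows the linked article, and this is
    illustrative rather than the actual change at L316:)

    #include <cmath>

    // Smooth approximation of max(a, b): as Epsilon approaches zero this
    // approaches the exact max, but it stays differentiable everywhere...
    template <typename Type>
    Type SmoothMax(const Type &a, const Type &b, const double Epsilon)
    {
        using std::sqrt;
        return ((a + b) + sqrt((a - b) * (a - b) + Type(Epsilon))) / Type(2.0);
    }

    // Usage inside the functor, replacing the original fmax(..., Type(0.0))...
    // const Type TripletLoss = SmoothMax(
    //     AnchorToPositiveCosineDistance - AnchorToNegativeCosineDistance +
    //         AlphaMargin,
    //     Type(0.0),
    //     0.5);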

    If I re-enable gradient checking, I note that it fails again. If I
    disable it I note that the cost briefly goes up for the first few
    iterations before going down (but only a little).

    I've tried 0.1, 50, and 500 for epsilon and it doesn't seem to make any
    difference.

    Shouldn't the cost functor at least autodifferentiate correctly now?

    Sameer Agarwal

    Sep 24, 2024, 1:05:52 PM9/24/24
    to ceres-...@googlegroups.com
    Gradient checking is based on numerical differentiation. Depending on the curvature of the function it can produce false positives. 
    It is worth taking a look at the values at which it is complaining, to
    see what derivative you are getting and what is expected.
    Sameer
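
    (For reference, the knobs that control this check live on
    ceres::Solver::Options; a sketch, with an arbitrarily loosened
    tolerance:)

    ceres::Solver::Options Options;
    Options.check_gradients = true;

    // Loosen the relative tolerance used when comparing automatic and
    // numeric derivatives, to cut down on false positives near the kink...
    Options.gradient_check_relative_precision = 1e-6;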


    Kip Warner

    Sep 24, 2024, 1:09:13 PM9/24/24
    to ceres-...@googlegroups.com, Sameer Agarwal
    On Tue, 2024-09-24 at 10:05 -0700, Sameer Agarwal wrote:
    > Gradient checking is based on numerical differentiation. Depending on
    > the curvature of the function it can produce false positives. 
    > It is worth taking a look at the values at which it is complaining to
    > see what the derivative you are getting and what should be expected.

    Thanks Sameer. I will do some more experimentation.