Correct usage for triplet loss / non-metric multidimensional scaling problem

Kip Warner

Jul 15, 2024, 2:45:28 PM7/15/24
to ceres-...@googlegroups.com
Hello Ceres Solver users,

I am experimenting with Ceres Solver (2.2.0) to try and solve a triplet
loss optimization problem for object similarity detection.

I have a bunch of triplet training examples of the form <A, P, N>,
where A is some object (the anchor) and P is a positive example, i.e.
an object more similar to A than the negative example N is. The number
of triplet training examples is not known at compile time because they
are provided by the user.

Each object has an R^d dimensional embedding that I'd like to generate
from the object's features, with a respective weight applied to each
component of the embedding vector. I am using Ceres Solver to try to
find the optimal weight vector so that, under a given distance metric,
the embeddings of each A and P are brought closer together while the
embeddings of each A and N are pushed farther apart, for as many
training examples as possible.
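
Concretely, the per-triplet cost I have in mind is the standard margin
("hinge") formulation. A minimal sketch, written in the templated style
a Ceres functor would use (the distance arguments and the margin value
are placeholders, not the actual code):

#include <cmath>

// Hinge loss for one triplet: zero once the anchor is closer to the
// positive than to the negative by at least the margin, positive otherwise...
template <typename Type>
Type TripletHingeLoss(
    const Type &AnchorToPositive,   // d(A, P) under the candidate weights
    const Type &AnchorToNegative,   // d(A, N) under the candidate weights
    const double Margin)            // e.g. 1.0
{
    using std::fmax;
    return fmax(
        AnchorToPositive - AnchorToNegative + Type(Margin),
        Type(0.0));
}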

I suspect I am doing something wrong because for around 30K triplets
Ceres Solver is taking over 20 minutes and appears to be using only a
single thread. My embedding function might well be expensive, but I
suspect that's not the primary issue. Here is the essence of how I am
currently computing the weights, which is surely incorrect:

https://pastebin.com/yzCCY7He

The first function, TripletCostFunctor::operator(), if I understand
correctly, should take the current candidate weights and sum their cost
into a single residual bucket. It does this by looping over all of the
triplets, computing the distance between the anchor's and positive's
embeddings and between the anchor's and negative's embeddings under the
candidate weights. If the anchor <--> positive distance is not
sufficiently less than the anchor <--> negative distance, there is a
cost.

The second function, WeightOptimizer::Train(), sets up the problem by
first creating a starting weight vector of unity weights (all ones). I
then tell the library I would like to auto-differentiate the
TripletCostFunctor with a single residual and add one residual block. I
also tell the library I would like my weights to be in the range
[0, 1]. I set a few other solver options, as you can see, and then tell
it to begin solving.
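
In outline, that setup looks roughly like the following sketch. To be
clear, this is not the code from the pastebin: TripletCostFunctor here
is a trivial stand-in, and the 34 weights and the [0, 1] bounds are
taken from the reports and description later in this thread.

#include <ceres/ceres.h>
#include <vector>

// Stand-in for the real functor: the real version sums a hinge loss over
// every triplet into this one scalar residual...
struct TripletCostFunctor
{
    template <typename Type>
    bool operator()(const Type *const Weights, Type *Residual) const
    {
        Residual[0] = Type(0.0);
        for(int Index = 0; Index < 34; ++Index)
            Residual[0] += Weights[Index] * Weights[Index]; // placeholder cost
        return true;
    }
};

int main()
{
    std::vector<double> Weights(34, 1.0);   // start from unity weights

    ceres::Problem Problem;
    ceres::CostFunction *Cost = new ceres::AutoDiffCostFunction<
        TripletCostFunctor,
        1,      // a single residual summing every triplet's loss
        34      // one parameter block holding all of the weights
    >(new TripletCostFunctor);
    Problem.AddResidualBlock(Cost, nullptr, Weights.data());

    // Constrain each weight to [0, 1]...
    for(int Index = 0; Index < 34; ++Index)
    {
        Problem.SetParameterLowerBound(Weights.data(), Index, 0.0);
        Problem.SetParameterUpperBound(Weights.data(), Index, 1.0);
    }

    ceres::Solver::Options Options;
    Options.num_threads = 16;
    ceres::Solver::Summary Summary;
    ceres::Solve(Options, &Problem, &Summary);
    return 0;
}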

I suspect that I am doing something wrong in how I am creating the
residual block (should there be more than one?) and storing the
aggregate triplet loss cost, and potentially misusing the library in
other ways that are causing the performance degradation. Any help is
appreciated.

PS I know I've said it before to him, but I'd like to take a moment to
recognize Sameer for his contributions in creating such a marvellous
library. It took an enormous amount of work, is well written and well
documented, and has surely aided countless people.

--
Kip Warner
OpenPGP signed/encrypted mail preferred
https://www.thevertigo.com

Sameer Agarwal

Jul 15, 2024, 2:47:18 PM7/15/24
to ceres-...@googlegroups.com
Kip,
can you post the output of Solver::Summary::FullReport for a smaller instance?
also the pastebin link does not work.
Sameer



Kip Warner

Jul 15, 2024, 3:12:41 PM7/15/24
to ceres-...@googlegroups.com
On Mon, 2024-07-15 at 11:47 -0700, 'Sameer Agarwal' via Ceres Solver
wrote:
> Kip,
> can you post the output of Solver::Summary::FullReport for a smaller
> instance? also the pastebin link does not work.

Apologies Sameer. The link should work now:

https://pastebin.com/yzCCY7He

Here's the full report:

Solver Summary (v 2.2.0-eigen-(3.4.0)-lapack-suitesparse-(7.6.1)-eigensparse)

Original Reduced
Parameter blocks 1 1
Parameters 34 34
Residual blocks 1 1
Residuals 1 1

Minimizer TRUST_REGION

Dense linear algebra library EIGEN
Trust region strategy LEVENBERG_MARQUARDT
Given Used
Linear solver DENSE_QR DENSE_QR
Threads 16 16
Linear solver ordering AUTOMATIC 1

Cost:
Initial 4.669543e+08
Final 4.572289e+08
Change 9.725384e+06

Minimizer iterations 12
Successful steps 6
Unsuccessful steps 6
Line search steps 0

Time (in seconds):
Preprocessor 0.001490

Residual only evaluation 3.331208 (12)
Line search cost evaluation 0.000000
Jacobian & residual evaluation 1220.004134 (18)
Line search gradient evaluation 814.350305
Linear solver 0.000341 (12)
Line search polynomial minimization 0.000000
Minimizer 1223.336643

Postprocessor 0.000004
Total 1223.338137

Termination: CONVERGENCE (Function tolerance reached. |cost_change|/cost: 1.520767e-08 <= 1.000000e-06)

Sameer Agarwal

Jul 15, 2024, 4:11:42 PM7/15/24
to ceres-...@googlegroups.com
From the summary report we can tell that you have one residual block,
i.e. just one cost function constructing a single scalar residual, and
that's where all the time is going.

The problem is the following loop.

for(size_t TrainingExampleIndex = 0;
    TrainingExampleIndex < TrainingExamples.size();
    ++TrainingExampleIndex)
{
    // Unpack current triplet into anchor, positive, and negative
    //  examples...
    const auto &[AnchorIndex, PositiveIndex, NegativeIndex] =
        TrainingExamples.at(TrainingExampleIndex);

    // Compute distance from the anchor to positive example...
    const Type AnchorToPositive = GetDistance(
        AnchorIndex,
        PositiveIndex,
        Weights);

    // Compute distance from the anchor to negative example...
    const Type AnchorToNegative = GetDistance(
        AnchorIndex,
        NegativeIndex,
        Weights);

    // Compute loss of this triplet...
    const Type TripletLoss = fmax(
        Type(1.0) + AnchorToPositive - AnchorToNegative,
        Type(0.0));

    // The residual is the cost of the given weights on our triplets...
    Residual[0] += TripletLoss;
}

So effectively what you are doing is residual += cost[i], and Ceres
cannot thread inside your code.

Try creating a single residual per triplet, i.e. a cost functor which
just returns the loss for that triplet. That way you will create as
many residual blocks as you have observations, and Ceres will evaluate
them using multiple threads.

This will also give Ceres a useful Jacobian; right now you are just
using a single scalar residual, so effectively you are performing
gradient descent and not getting any benefit from Levenberg-Marquardt.

Sameer




Kip Warner

Jul 15, 2024, 4:24:06 PM7/15/24
to ceres-...@googlegroups.com
On Mon, 2024-07-15 at 13:11 -0700, 'Sameer Agarwal' via Ceres Solver
wrote:
> try creating a single residual per triplet.
> which just returns the loss for that triplet. that way you will
> create as many residual blocks as your observations, and ceres will
> evaluate them using multiple threads.
> This will also give ceres a jacobian, right now you are just using a
> single scalar residual, so effectively you are performing gradient
> descent and not getting any benefit from levenberg marquardt.

If you look at L104-L115, that is where I add the cost functor and the
residual block. If I understand correctly, are you suggesting I should
take that and wrap it in a loop like so?

for(size_t TripletIndex = 0;
    TripletIndex < GetTrainingExamplesTotal();
    ++TripletIndex)
{
    // Allocate cost function such that its derivative will be automatically
    // computed via Jacobian auto-differentiation. AutoDiffCostFunction
    // takes ownership of the TripletCostFunctor and is responsible for
    // cleaning it up when the problem object destructs...
    ceres::CostFunction *CostFunctor = new ceres::AutoDiffCostFunction
    <
        TripletCostFunctor,                      /* Type of cost functor */
        1,                                       /* Number of residuals */
        FeatureWeights<double>::AllWeightsCount  /* Size of parameter block */
    >(new TripletCostFunctor(*this));

    // Set it within the problem object as the cost function...
    CeresProblem.AddResidualBlock(
        CostFunctor,
        nullptr,
        AllWeightsAsDouble1DArray.data());
}

And then change the TripletCostFunctor::operator() to get rid of its
loop so that it only computes the cost of a single triplet, assigning
to Residual[0]?

If I do that, the functor needs to know which triplet to process when
invoked. Is the recommended Ceres Solver approach via the functor's
constructor when the ceres::CostFunction is allocated to something like
the following?

ceres::CostFunction *CostFunctor = new ceres::AutoDiffCostFunction
<
    TripletCostFunctor,
    1,
    FeatureWeights<double>::AllWeightsCount
>(new TripletCostFunctor(*this, TripletIndex));
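
For reference, a minimal sketch of what such a per-triplet functor
might end up looking like (the WeightOptimizer accessors and member
names here are assumptions on my part, not the actual code):

// One functor instance per triplet; operator() computes that triplet's
// hinge loss from the candidate weights and writes the single residual...
class TripletCostFunctor
{
public:
    TripletCostFunctor(
        const WeightOptimizer &Optimizer, const size_t TripletIndex)
        : m_Optimizer(Optimizer),
          m_TripletIndex(TripletIndex)
    { }

    template <typename Type>
    bool operator()(const Type *const Weights, Type *Residual) const
    {
        // Unpack this functor's triplet...
        const auto &[AnchorIndex, PositiveIndex, NegativeIndex] =
            m_Optimizer.GetTrainingExample(m_TripletIndex);

        // Distances under the candidate weights...
        const Type AnchorToPositive =
            m_Optimizer.GetDistance(AnchorIndex, PositiveIndex, Weights);
        const Type AnchorToNegative =
            m_Optimizer.GetDistance(AnchorIndex, NegativeIndex, Weights);

        // Hinge loss for this one triplet only...
        Residual[0] = fmax(
            Type(1.0) + AnchorToPositive - AnchorToNegative,
            Type(0.0));
        return true;
    }

private:
    const WeightOptimizer  &m_Optimizer;
    const size_t            m_TripletIndex;
};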

Sameer Agarwal

Jul 15, 2024, 4:27:06 PM7/15/24
to ceres-...@googlegroups.com
yes.


Kip Warner

Jul 15, 2024, 4:56:45 PM7/15/24
to ceres-...@googlegroups.com
On Mon, 2024-07-15 at 13:26 -0700, 'Sameer Agarwal' via Ceres Solver
wrote:
> yes.

Thanks Sameer. That reduced the training time significantly, but the
resulting solution weight vector looks suspect:

Solver Summary (v 2.2.0-eigen-(3.4.0)-lapack-suitesparse-(7.6.1)-eigensparse)

Original Reduced
Parameter blocks 1 1
Parameters 34 34
Residual blocks 29488 29488
Residuals 29488 29488

Minimizer TRUST_REGION

Dense linear algebra library EIGEN
Trust region strategy LEVENBERG_MARQUARDT
Given Used
Linear solver DENSE_QR DENSE_QR
Threads 16 16
Linear solver ordering AUTOMATIC 1

Cost:
Initial 1.523965e+04
Final 1.474400e+04
Change 4.956501e+02

Minimizer iterations 6
Successful steps 6
Unsuccessful steps 0
Line search steps 0

Time (in seconds):
Preprocessor 0.002314

Residual only evaluation 0.128700 (5)
Line search cost evaluation 0.000000
Jacobian & residual evaluation 79.086017 (11)
Line search gradient evaluation 35.955970
Linear solver 0.052634 (5)
Line search polynomial minimization 0.000000
Minimizer 79.281790

Postprocessor 0.000653
Total 79.284758

Termination: CONVERGENCE (Gradient tolerance reached. Gradient max norm: 3.697043e-13 <= 1.000000e-10)


Feature weights before / after training:
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 1
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 1

Sameer Agarwal

Jul 15, 2024, 5:35:00 PM7/15/24
to ceres-...@googlegroups.com
So now it's worth looking at what your weights are saying, and how that translates into your cost.
At this point this is not a code/performance problem but rather a modeling issue. Note that your cost has not changed much. It is worth taking a closer look at the max that you are taking in your cost function.
Sameer




Kip Warner

Jul 15, 2024, 10:19:04 PM7/15/24
to ceres-...@googlegroups.com
On Mon, 2024-07-15 at 14:34 -0700, 'Sameer Agarwal' via Ceres Solver
wrote:
> so now its worth looking at what your weights are saying, and how
> that translates into your cost.
> at this point this is not a code/performance problem but rather a
> modeling issue. Note that your cost has not changed much. It is worth
> taking a closer look at the max that you are taking in your cost
> function.

Thanks for your patience Sameer.

I've spent the day fiddling with it and I found a problem in one of the
scenarios where I use cosine similarity as the distance metric. Since
that ranges over [-1, 1], it would break the fmax() in the cost
function. What I'm doing now on L55 should hopefully be more sane: it
uses cosine distance (which ranges over [0, 2]) instead, where a
smaller number means more similar.

https://pastebin.com/yzCCY7He
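
(For clarity, the change boils down to something like the following;
GetCosineSimilarity is a stand-in name for the similarity computation,
which itself is unchanged:)

// Cosine similarity lies in [-1, 1]; cosine distance 1 - similarity lies
// in [0, 2], so a smaller value now consistently means "more similar"...
const Type AnchorToPositiveCosineDistance =
    Type(1.0) - GetCosineSimilarity(AnchorIndex, PositiveIndex, Weights);
const Type AnchorToNegativeCosineDistance =
    Type(1.0) - GetCosineSimilarity(AnchorIndex, NegativeIndex, Weights);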

Except there's still something wrong:

Solver Summary (v 2.2.0-eigen-(3.4.0)-lapack-suitesparse-(7.6.1)-eigensparse)

Original Reduced
Parameter blocks 1 1
Parameters 34 34
Residual blocks 28614 28614
Residuals 28614 28614

Minimizer TRUST_REGION

Dense linear algebra library EIGEN
Trust region strategy LEVENBERG_MARQUARDT
Given Used
Linear solver DENSE_QR DENSE_QR
Threads 16 16
Linear solver ordering AUTOMATIC 1

Cost:
Initial 1.378852e+04
Final 7.640724e+03
Change 6.147797e+03

Minimizer iterations 30
Successful steps 26
Unsuccessful steps 4
Line search steps 44

Time (in seconds):
Preprocessor 0.002781

Residual only evaluation 0.866667 (30)
Line search cost evaluation 0.000000
Jacobian & residual evaluation 766.859171 (96)
Line search gradient evaluation 559.301355
Linear solver 0.317060 (30)
Line search polynomial minimization 0.001110
Minimizer 768.117835

Postprocessor 0.000481
Total 768.121098

Termination: CONVERGENCE (Function tolerance reached. |cost_change|/cost: 9.697896e-07 <= 1.000000e-06)


Feature weights before / after training:
1 0
1 0
1 0
1 0
1 0.815256
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0.772392
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 1
1 0
1 0.854761
1 0
1 1
1 0

There are three things that stand out. The first is that the cost was
reduced from 13788.5 to 7640.72 after 29 iterations. That's good.

The second is that the resulting weight vector shown above looks funny,
with mostly zero components. That might still have been a
mathematically sensible solution, if it weren't for the third issue.

After the optimizer completes, I compute the accuracy of the solution
by applying it to a test set of triplets (L179) that were not trained
on. The number of predicates that held true was zero. So something is
clearly wrong and I'm stumped again.

Kip Warner

Jul 17, 2024, 12:31:36 AM7/17/24
to ceres-...@googlegroups.com
On Mon, 2024-07-15 at 14:34 -0700, 'Sameer Agarwal' via Ceres Solver
wrote:
> so now its worth looking at what your weights are saying, and how
> that translates into your cost.
> at this point this is not a code/performance problem but rather a
> modeling issue. Note that your cost has not changed much. It is worth
> taking a closer look at the max that you are taking in your cost
> function.

Sameer, I think I may have found the problem after many more hours. The
gradients are apparently not being computed correctly. I've now enabled
check_gradients in the Solver::Options on L134:

https://pastebin.com/yzCCY7He

When I run with that flag I see immediately after iteration 0:

E20240716 21:01:23.947016 2398748
gradient_checking_cost_function.cc:137] Gradient error detected.
Terminating solver.

This happens regardless of whether I constrain the solution with
SetParameterLowerBound / SetParameterUpperBound or not.

Since my CostFunction is an AutoDiffCostFunction, should the optimizer
not compute the partial derivatives for each component of the solution
vector automatically? The cost functor is simple, at L19-L64. I've
looked at the fmax() and I don't see an issue with how it's computing
the cost, unless I'm missing something.

I have not compiled my own Ceres Solver, but am using the vanilla build
(2.2.0) that ships with Ubuntu Noble (amd64).

Sameer Agarwal

Jul 18, 2024, 11:05:22 PM7/18/24
to ceres-...@googlegroups.com
There is nothing wrong with ceres. The problem is the fmax.

    const Type TripletLoss = fmax(
        (AnchorToPositiveCosineDistance) - (AnchorToNegativeCosineDistance) +
            AlphaMargin,
        Type(0.0));

    The problem is that if the first term in the max is less than 0, then
    your loss is always zero, which is a constant function with zero
    derivative.

    What you are trying to do is

    sum_i [max(f_i - g_i, 0)]^2

    and your parameters move to the point where f_i - g_i <= 0, your
    derivatives go to zero, and the optimizer believes it has found a
    local minimum.

    Have a look at your triplets: is it the case that most or all of them
    are triggering the max with 0?

    Sameer
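
    To see this flat region concretely, here is a small illustration using
    ceres::Jet directly (the Jet type is what AutoDiffCostFunction
    evaluates the functor with; this is only an illustration of the
    behaviour, not code from the problem):

    #include <ceres/jet.h>
    #include <iostream>

    int main()
    {
        // A one-dimensional jet carries a value plus a derivative with
        // respect to a single variable...
        using J = ceres::Jet<double, 1>;

        // x = -0.3, with dx/dx = 1...
        const J x(-0.3, 0);

        // Hinge: max(x, 0). For x < 0 we are in the flat region...
        const J Hinge = fmax(x, J(0.0));

        // Prints "0 0": both the value and the derivative are zero, so the
        // optimizer sees no gradient pointing out of this region...
        std::cout << Hinge.a << " " << Hinge.v[0] << std::endl;
        return 0;
    }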



    Kip Warner

    Jul 19, 2024, 1:05:30 AM7/19/24
    to ceres-...@googlegroups.com
    On Thu, 2024-07-18 at 20:05 -0700, 'Sameer Agarwal' via Ceres Solver
    wrote:
    > There is nothing wrong with ceres. The problem is the fmax.

    Hey Sameer. Literally a minute before your email arrived, I realized
    the function might not be differentiating correctly because, as a
    piecewise function, it's not smooth. If that is the case then the
    gradient check error is correct.

    > const Type TripletLoss = fmax(
    >     (AnchorToPositiveCosineDistance) - (AnchorToNegativeCosineDistance) +
    >         AlphaMargin,
    >     Type(0.0));
    >
    > The problem is that if the first term in the max is less than 0, then
    > your loss is always zero, which is a constant function with zero
    > derivative.
    >
    > what are you trying to do is
    >
    > sum_i [max(f_i - g_i, 0)]^2

    Yes, but with the alpha / margin term in there too.

    > and your parameters move to the point where f_i - g_i <= 0, and your
    > derivatives go to zero, and the optimizer believes it has found a
    > local minimum.
    >
    > have a look at your triplets, is it the case, that most or all of
    > them are triggering the max with 0?

    If you scroll down to L235, I've added logging within the cost functor
    for when it is instantiated with the double type. The resulting log
    was over 100 MB for several thousand triplets, so I truncated it to
    only include training for three of them:

    https://pastebin.com/yzCCY7He

    The logging format is:

    Triplet(AnchorID, PositiveID, NegativeID) \
    AP=cosine_distance_between_anchor_and_positive \
    AN=cosine_distance_between_anchor_and_negative \
    arguments and value of fmax

    e.g.

    ...
    Triplet=(47,49,37) AP=0.0162818 AN=0.00593436 max=(0.0162818 - 0.00593436 + 1, 0) = 1.01035
    ...

    Interestingly, in neither the truncated log nor the complete one did I
    see any of the fmax values evaluate to zero. They all seemed to be
    close to 1.

    Sameer Agarwal

    Jul 19, 2024, 1:09:49 AM7/19/24
    to ceres-...@googlegroups.com

    In that case what is your cosine weight computation like?



    Kip Warner

    Jul 19, 2024, 1:20:24 AM7/19/24
    to ceres-...@googlegroups.com
    On Thu, 2024-07-18 at 22:09 -0700, 'Sameer Agarwal' via Ceres Solver
    wrote:
    > In that case what is your cosine weight computation like?

    As in how cosine similarity is computed on the object embeddings, or
    how the embeddings themselves are computed from the weights? If the
    former, it's just the ratio of the two embeddings' dot product to the
    product of their norms, computed with the help of libeigen. See L70:

    https://pastebin.com/yzCCY7He

    If you're asking about the embeddings themselves I could have it log
    the values of the embeddings at each iteration and see if we can
    observe anything interesting?

    Kip Warner

    Jul 19, 2024, 9:35:43 PM7/19/24
    to ceres-...@googlegroups.com
    On Thu, 2024-07-18 at 22:20 -0700, Kip Warner wrote:
    > If you're asking about the embeddings themselves I could have it log
    > the values of the embeddings at each iteration and see if we can
    > observe anything interesting?

    Sameer, addendum to yesterday's post. I've logged the three embeddings
    for an anchor, positive, and negative for each iteration of the cost
    functor here:

    https://pastebin.com/fmRWfaXS

    I chose to log just one triplet, otherwise the log would be massive.

    You can see the components of the three embeddings changing with each
    iteration, but I don't see anything unusual.

    Kip Warner

    Jul 19, 2024, 10:48:48 PM7/19/24
    to ceres-...@googlegroups.com
    On Fri, 2024-07-19 at 18:35 -0700, Kip Warner wrote:
    > Sameer, addendum to yesterday's post. I've logged the three
    > embeddings
    > for an anchor, positive, and negative for each iteration of the cost
    > functor here:
    >
    >    https://pastebin.com/fmRWfaXS
    >
    > I selected to log just one triplet otherwise the log would be
    > massive.
    >
    > You can see the components of the three embeddings changing with each
    > iteration, but I don't see anything unusual.

    Complete gzip-compressed log uploaded below. Warning: it's 89 MB and
    will decompress to 1.3 GB if loaded in your browser.

    https://shorturl.at/NTbSV

    Kip Warner

    Jul 21, 2024, 5:18:55 PM7/21/24
    to ceres-...@googlegroups.com
    On Thu, 2024-07-18 at 20:05 -0700, 'Sameer Agarwal' via Ceres Solver
    wrote:
    > There is nothing wrong with ceres. The problem is the fmax.

    I experimented today and yesterday with getting rid of the fmax
    entirely by replacing:

        const Type TripletLoss = fmax(
            AnchorToPositiveCosineDistance -
            AnchorToNegativeCosineDistance +
            Type(1.0),
            Type(0.0));

    With:

        const Type TripletLoss =
            AnchorToPositiveCosineDistance -
            AnchorToNegativeCosineDistance +
            Type(2.0);

    Since cosine distance (1 - cosine_similarity) is always in the range
    [0, 2], the expression above should always be >= 0. However, Ceres
    Solver is still not able to compute the derivative (same gradient
    error). I tried squaring the TripletLoss before assigning it to the
    residual and still hit the same problem.

    Kip Warner

    Jul 24, 2024, 12:09:46 AM7/24/24
    to ceres-...@googlegroups.com
    Sameer,

    I've managed to get some more clues on what's going on. I didn't
    realize Solver::Summary::message contained a potentially useful log,
    even if convergence failed prematurely when Options::check_gradients
    was set and it failed to autodifferentiate.

    This is what I managed to uncover within it, partially truncated to
    keep within pastebin's 512KB limit:

    https://pastebin.com/eBDyp0xf

    It looks as though most of the Jacobian entries are being
    differentiated correctly, if I'm reading that right, except for one or
    sometimes two of them, and not always the same ones. Is this helpful
    at all?

    Sameer Agarwal

    Jul 28, 2024, 11:39:30 AM7/28/24
    to ceres-...@googlegroups.com
    Your GetDistance function is not differentiable.
    Do not return a constant if the values are the same. 

    Currently it is:

        template <typename Type>
        Type GetDistanceImplementation(
            const ObjectEmbedding<Type> &FirstEmbedding,
            const ObjectEmbedding<Type> &SecondEmbedding)
        {
            // If they're both the same vector, then it has a perfect cosine
            //  similarity with itself and we can skip all the fancy math...
            if(FirstEmbedding.GetValues() == SecondEmbedding.GetValues())
                return Type(1.0);

            // Otherwise compute the numerator dot product...
            const Type DotProduct =
                FirstEmbedding.GetValues().dot(SecondEmbedding.GetValues());

            // Compute the denominator which is the product of the two vectors' norms...
            const Type NormProduct =
                FirstEmbedding.GetValues().norm() *
                SecondEmbedding.GetValues().norm();

            // Compute cosine similarity which is the ratio of the two...
            const Type CosineSimilarity = DotProduct / NormProduct;

            // Return similarity to caller...
            return CosineSimilarity;
        }

    It should be:

        template <typename Type>
        Type GetDistanceImplementation(
            const ObjectEmbedding<Type> &FirstEmbedding,
            const ObjectEmbedding<Type> &SecondEmbedding)
        {
            // Compute the numerator dot product...
            const Type DotProduct =
                FirstEmbedding.GetValues().dot(SecondEmbedding.GetValues());

            // Compute the denominator which is the product of the two vectors' norms...
            const Type NormProduct =
                FirstEmbedding.GetValues().norm() *
                SecondEmbedding.GetValues().norm();

            // Compute cosine similarity which is the ratio of the two...
            const Type CosineSimilarity = DotProduct / NormProduct;

            // Return similarity to caller...
            return CosineSimilarity;
        }


    Sameer Agarwal

    Jul 28, 2024, 11:43:59 AM7/28/24
    to ceres-...@googlegroups.com
    Your code also has way too many layers for what is a simple
    computation. It will be faster and simpler to debug this if you have a
    single functor which does everything in one function, so I can see
    what is going on.

    Kip Warner

    Jul 28, 2024, 12:26:28 PM7/28/24
    to ceres-...@googlegroups.com
    On Sun, 2024-07-28 at 08:39 -0700, 'Sameer Agarwal' via Ceres Solver
    wrote:
    > Your GetDistance function is not differentiable.
    > Do not return a constant if the values are the same. 

    Hey Sameer,

    I tried removing the two lines checking whether the two vectors are
    the same, and unfortunately it still doesn't differentiate:

    https://pastebin.com/eBDyp0xf

    > Your code also has way too many layers to do a simple computation. It
    > will be faster and simpler debug this if you have a simple functor
    > which does everything in that one function so I can see what is going
    > on.

    I agree. I will try and put together a minimal for you later today.

    Kip Warner

    Jul 29, 2024, 9:01:07 PM7/29/24
    to ceres-...@googlegroups.com
    On Sun, 2024-07-28 at 08:43 -0700, 'Sameer Agarwal' via Ceres Solver
    wrote:
    > Your code also has way too many layers to do a simple computation. It
    > will be faster and simpler debug this if you have a simple functor
    > which does everything in that one function so I can see what is going
    > on. 

    Hey Sameer. I've prepared a minimal for you that you can compile and
    run:

    https://pastebin.com/Xa91vVTR

    Note that the training data for the sake of this minimal is randomly
    generated and I don't know how that might affect autodifferentiation or
    convergence efforts.

    Sameer Agarwal

    Aug 18, 2024, 5:28:51 PM8/18/24
    to ceres-...@googlegroups.com
    This code is fine as far as I can tell.
    With random data I would not expect the solver to make much progress.
    The gradient checking failure you are getting is likely because of the non-differentiable residual which comes from the fmax in the functor. 
    The default number of iterations is not enough; you should increase it
    and run with real data to see how it does.
    Sameer
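
    (For reference, the relevant knobs are on ceres::Solver::Options; a
    sketch of a fragment you would place before calling ceres::Solve(),
    with purely illustrative values:)

    ceres::Solver::Options Options;
    Options.max_num_iterations = 1000;  // illustrative, not a recommendation
    Options.num_threads =
        static_cast<int>(std::thread::hardware_concurrency());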



    Kip Warner

    Aug 18, 2024, 7:35:56 PM8/18/24
    to ceres-...@googlegroups.com
    On Sun, 2024-08-18 at 14:28 -0700, 'Sameer Agarwal' via Ceres Solver
    wrote:
    > This code is fine as far as I can tell.

    Glad to hear that (kind of - since it still doesn't work).

    > With random data I would not expect the solver to make much progress.

    I'd imagine it may not be able to converge with random data, but the
    minimal should at the very least autodifferentiate?

    > The gradient checking failure you are getting is likely because of
    > the non-differentiable residual which comes from the fmax in the
    > functor. 

    Is that still a problem if we increase the number of iterations or are
    you saying I need to replace the fmax? If the latter I don't know how.

    > The default number of iterations are not enough you should increase
    > the number of iterations and run with real data to see how it does.

    To increase the number of iterations, do you mean to increase
    Solver::Options::max_num_iterations from the default of 150 to
    something higher? If so, I've tried 500, 1500, and 15000 and it still
    gives the same "Gradient Error detected!"

    Kip Warner

    Aug 18, 2024, 7:52:46 PM8/18/24
    to ceres-...@googlegroups.com
    On Sun, 2024-08-18 at 16:35 -0700, Kip Warner wrote:
    > To increase the number of iterations, do you mean to increase
    > Solver::Options::max_num_iterations from the default of 150 to
    > something higher? If so, I've tried 500, 1500, and 15000 and it still
    > gives the same "Gradient Error detected!"

    I tried just now on real data. I set max_num_iterations to 150,000.
    Still getting the same "Gradient Error detected!" My usage of the
    library seems to be forever jinxed.

    Sameer Agarwal

    Aug 18, 2024, 8:02:25 PM8/18/24
    to Kip Warner, ceres-...@googlegroups.com
    On Sun, Aug 18, 2024 at 4:35 PM Kip Warner <k...@thevertigo.com> wrote:
    On Sun, 2024-08-18 at 14:28 -0700, 'Sameer Agarwal' via Ceres Solver
    wrote:
    > This code is fine as far as I can tell.

    Glad to hear that (kind of - since it still doesn't work).

    > With random data I would not expect the solver to make much progress.

    I'd imagine it may not be able converge with random data, but the
    minimal should at the very least autodifferentiate?

    And it does, but the fmax makes it a non-differentiable function, so
    when you have a data point close to that point of
    non-differentiability, the automatic derivatives will not match the
    numeric derivatives used to check them, and the checker will indicate
    an error.
     
    > The gradient checking failure you are getting is likely because of
    > the non-differentiable residual which comes from the fmax in the
    > functor. 

    Is that still a problem if we increase the number of iterations or are
    you saying I need to replace the fmax? If the latter I don't know how.

    This has nothing to do with the number of iterations. What I am saying
    is that it is not something you need to worry about in the current
    formulation.
    That said, it is possible you need to change your formulation to get
    better solutions, or that Ceres Solver is not a great fit for this
    formulation.

     
    > The default number of iterations are not enough you should increase
    > the number of iterations and run with real data to see how it does.

    To increase the number of iterations, do you mean to increase
    Solver::Options::max_num_iterations from the default of 150 to
    something higher? If so, I've tried 500, 1500, and 15000 and it still
    gives the same "Gradient Error detected!"

    Increasing the number of iterations will give you a better solution;
    it does nothing to change gradient checking.
    Sameer

    Kip Warner

    Aug 20, 2024, 7:28:27 PM8/20/24
    to ceres-...@googlegroups.com
    On Sun, 2024-08-18 at 17:02 -0700, Sameer Agarwal wrote:
    > and it does, but the fmax makes it a non-differentiable function so
    > when you have a data point which is close to that point of non-
    > differentiability the automatic derivatives will not match the
    > numeric derivatives use to check them and it will indicate an error. 

    Got it. In other words it's a red herring.

    > This has nothing to do with number of iterations. What I am saying
    > is, that it is not something you need to worry about in this current
    > formulation. 
    > That said, it is possible you need to change your formulation to get
    > better solutions or that Ceres Solver is not a great fit for this
    > formulation.

    Thanks Sameer. By disabling the gradient checking and increasing the
    number of iterations, it does report convergence. But bear in mind
    this is on randomly generated data, so I don't expect the solution to
    be optimal. Here is an updated minimal:

    https://pastebin.com/Xa91vVTR

    A few questions for you:

    (1) What I've done is after convergence test the accuracy of the
    solution by checking how many triplet predicates from an independent
    test set (not the set used to train) still hold as true on L497-553.

    Note that there is an alpha margin used in the cost functor at L306-313
    to ensure the positive and negative embeddings have at least some
    minimum arbitrary space between them when training. But after
    convergence to check the solution's accuracy I don't think I need to
    ensure that the margin is still respected. Only that the distance
    between the anchor and positive is less than the anchor and the
    negative on L551. Is that reasonable?
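
    (The accuracy check described in (1) amounts to roughly the following
    sketch; GetDistance, TestTriplets, and TrainedWeights are stand-ins
    for whatever the minimal actually uses, not the real code:)

    // Fraction of held-out triplets whose predicate d(A, P) < d(A, N)
    // still holds under the trained weights. No margin is applied here;
    // only the ordering matters...
    size_t Held = 0;
    for(const auto &[AnchorIndex, PositiveIndex, NegativeIndex] : TestTriplets)
    {
        const double AnchorToPositive =
            GetDistance(AnchorIndex, PositiveIndex, TrainedWeights);
        const double AnchorToNegative =
            GetDistance(AnchorIndex, NegativeIndex, TrainedWeights);

        if(AnchorToPositive < AnchorToNegative)
            ++Held;
    }
    const double Accuracy =
        100.0 * static_cast<double>(Held) /
        static_cast<double>(TestTriplets.size());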

    (2) I've set the maximum number of iterations to 500,000. The number of
    iterations necessary to find a good solution I'm assuming will vary
    widely across different problem domains and formulations. Is that
    correct?

    (3) When I test on real data it took over three hours to converge using
    all logical cores using Solver::Options::num_threads set to
    thread::hardware_concurrency(). Are there any options for Ceres Solver
    you would recommend I experiment with to reduce that time?

    Cost:
    Initial 1.987583e+03
    Final 1.385421e+03
    Change 6.021622e+02

    Minimizer iterations 10232
    Successful steps 10222
    Unsuccessful steps 10

    Time (in seconds):
    Preprocessor 0.001859

    Residual only evaluation 45.426662 (10232)
    Jacobian & residual evaluation 11811.069377 (10222)
    Linear solver 15.100255 (10232)
    Minimizer 11875.343345

    Postprocessor 0.000121
    Total 11875.345325

    Termination: CONVERGENCE (Function tolerance reached. |cost_change|/cost: 3.994556e-07 <= 1.000000e-06)

    (4) Based on the above time spent in the Jacobian & residual
    evaluation, would it make sense to leverage a GPU and set
    Solver::Options::dense_linear_algebra_library_type to ceres::CUDA? Or
    would the amount of time moving data from CPU / RAM to the GPU make
    that pointless?

    (5) You can see that I've disabled bounds constraints on L420-436. I
    don't see why they'd be necessary in this problem. Would you agree?

    > Increasing the number of iterations will give you a better solution
    > it does nothing to change gradient checking.

    Understood. Which leads me to (6), is there anything else you would
    suggest that might increase the quality of the solution? I tried
    reducing the size of Solver::Options::function_tolerance from the
    default 1e-6 to 1e-12, but to my surprise that reduced the accuracy. As
    I understand it, that is the ratio of the current iteration's cost
    delta and the remaining cost.

    If I set Dimensions to 20 on L36, despite using random data, the
    accuracy is still 56.12 %.

    Sameer Agarwal

    Aug 21, 2024, 3:38:57 PM8/21/24
    to Kip Warner, ceres-...@googlegroups.com
    Yes (regarding your question (1)).
     

    (2) I've set the maximum number of iterations to 500,000. The number of
    iterations necessary to find a good solution I'm assuming will vary
    widely across different problem domains and formulations. Is that
    correct?

    yes but I suspect you are not going to need half a million iterations.  
     

    (3) When I test on real data it took over three hours to converge using
    all logical cores using Solver::Options::num_threads set to
    thread::hardware_concurrency(). Are there any options for Ceres Solver
    you would recommend I experiment with to reduce that time?

    you should not need to run the solver for so long. I doubt the solution is changing much after some thousands of iterations.
    If it is, then you have problems with the formulation.
     

       Cost:
       Initial                          1.987583e+03
       Final                            1.385421e+03
       Change                           6.021622e+02

       Minimizer iterations                    10232
       Successful steps                        10222
       Unsuccessful steps                         10

       Time (in seconds):
       Preprocessor                         0.001859

         Residual only evaluation          45.426662 (10232)
         Jacobian & residual evaluation 11811.069377 (10222)
         Linear solver                     15.100255 (10232)
       Minimizer                        11875.343345

       Postprocessor                        0.000121
       Total                            11875.345325

       Termination:                      CONVERGENCE (Function tolerance reached. |cost_change|/cost: 3.994556e-07 <= 1.000000e-06)

    (4) Based on the above time spent in the Jacobian & residual
    evaluation, would it make sense to leverage a GPU and set
    Solver::Options::dense_linear_algebra_library_type to ceres::CUDA? Or
    would the amount of time moving data from CPU / RAM to the GPU make
    that pointless?
     
    No. Your problem is tiny: you are not spending any time in the linear
    solver. All your time is in the Jacobian evaluation, so I would focus
    on optimizing that.
    Further, your objective function is easily analytically
    differentiable, and that should speed up your Jacobian computation.
    But all that comes after you start getting solutions.

    (5) You can see that I've disabled bounds constraints on L420-436. I
    don't see why they'd be necessary in this problem. Would you agree?

    you want the weights to be positive no?
     

    > Increasing the number of iterations will give you a better solution
    > it does nothing to change gradient checking.

    Understood. Which leads me to (6), is there anything else you would
    suggest that might increase the quality of the solution? I tried
    reducing the size of Solver::Options::function_tolerance from the
    default 1e-6 to 1e-12, but to my surprise that reduced the accuracy. As
    I understand it, that is the ratio of the current iteration's cost
    delta and the remaining cost.

    function_tolerance is a convergence hack. Are you sure you were
    comparing the same data in both cases when you reduced the tolerance?
    It should not lead to a worse solution.

    Sameer

    Kip Warner

    Aug 24, 2024, 8:49:05 PM8/24/24
    to Sameer Agarwal, ceres-...@googlegroups.com
    On Wed, 2024-08-21 at 12:38 -0700, Sameer Agarwal wrote:
    > no. Your problem is tiny. you are not spending any time in the linear
    > solver. All your time is in the Jacobian evaluation. I would focus on
    > optimizing it.

    Thanks Sameer. Any tips on that? I had suspected a GPU wouldn't do
    anything here.

    > Further your objective function is easily analytically
    > differentiable, that should speed up your jacobian computation. But
    > all that comes after you start getting solutions.

    Sorry, I don't follow. Do you mean the objective function is easily
    autodifferentiated by the optimizer, but that it should speed up as
    the optimizer gets closer to convergence? I don't know what I could do
    to speed up the Jacobian computation.

    > > (5) You can see that I've disabled bounds constraints on L420-436.
    > > I don't see why they'd be necessary in this problem. Would you
    > > agree?
    >
    > you want the weights to be positive no?

    I don't think it should in theory matter what value the weights are,
    provided they create useful embeddings that respect as many of the
    triplet predicates as possible. Or is there a benefit to having them
    bounds constrained to being positive / some range?

    > >
    > > Understood. Which leads me to (6), is there anything else you would
    > > suggest that might increase the quality of the solution? I tried
    > > reducing the size of Solver::Options::function_tolerance from the
    > > default 1e-6 to 1e-12, but to my surprise that reduced the
    > > accuracy. As I understand it, that is the ratio of the current
    > > iteration's cost delta and the remaining cost.
    >
    > fuction_tolerance is a convergence hack. Are you sure you were
    > comparing the same data in both cases when you reduced the tolerance?
    > it should not lead to a worse solution.

    You might be right. I was trying to reproduce this in a minimal for you
    but having difficulty. It may be that I had changed something else that
    was the real reason why the accuracy varied.

    Sameer Agarwal

    Aug 24, 2024, 8:53:33 PM8/24/24
    to Kip Warner, ceres-...@googlegroups.com


    On Sat, Aug 24, 2024, 5:49 PM Kip Warner <k...@thevertigo.com> wrote:
    On Wed, 2024-08-21 at 12:38 -0700, Sameer Agarwal wrote:
    > no. Your problem is tiny. you are not spending any time in the linear
    > solver. All your time is in the Jacobian evaluation. I would focus on
    > optimizing it.

    Thanks Sameer. Any tips on that? I had suspected a GPU wouldn't do
    anything here.

    > Further your objective function is easily analytically
    > differentiable, that should speed up your jacobian computation. But
    > all that comes after you start getting solutions.

    Sorry, I don't follow. Do you mean the objective function is easily
    autodifferentiating by the optimizer, but that it should speed up as
    the optimizer gets closer to convergence? I don't know what I could do
    to speed up the Jacobian computation?

    What I mean is compute the derivatives by hand rather than using automatic differentiation.
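
    (In Ceres terms, computing the derivatives by hand means subclassing
    ceres::SizedCostFunction and filling in the Jacobian yourself. A bare
    skeleton is sketched below; the residual and derivative expressions
    are placeholders, not the actual triplet-loss math, and the 34 comes
    from the parameter count in the reports earlier in this thread:)

    #include <ceres/ceres.h>

    // One residual, one parameter block of 34 weights...
    class TripletAnalyticCost : public ceres::SizedCostFunction<1, 34>
    {
    public:
        bool Evaluate(
            double const *const *Parameters,
            double *Residuals,
            double **Jacobians) const override
        {
            const double *Weights = Parameters[0];
            (void) Weights;

            // Placeholder: the triplet loss for this functor's triplet
            // would be computed here from the weights...
            Residuals[0] = 0.0;

            // The solver may pass null when it only needs residuals...
            if(Jacobians != nullptr && Jacobians[0] != nullptr)
            {
                for(int Index = 0; Index < 34; ++Index)
                    Jacobians[0][Index] = 0.0;  // d(residual) / d(weight[Index])
            }

            return true;
        }
    };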


    > > (5) You can see that I've disabled bounds constraints on L420-436.
    > > I don't see why they'd be necessary in this problem. Would you
    > > agree?
    >
    > you want the weights to be positive no?

    I don't think it should in theory matter what value the weights are,
    provided they create useful embeddings that respect as many of the
    triplet predicates as possible. Or is there a benefit to having them
    bounds constrained to being positive / some range?

    This is a matter for you to decide, based on how you want to think
    about the underlying model.

    Sameer 

    Kip Warner

    Aug 24, 2024, 9:50:28 PM8/24/24
    to Sameer Agarwal, ceres-...@googlegroups.com
    On Sat, 2024-08-24 at 17:53 -0700, Sameer Agarwal wrote:
    > What I mean is compute the derivatives by hand rather than using
    > automatic differentiation.

    I'd have to dust off my first-year differential calculus textbook.

    But assuming I manage to hand-compute the derivatives, are you saying
    that would solve the performance bottleneck?

    I had just assumed the optimizer computed the derivatives only once at
    the beginning and thereafter kept feeding them candidate values on
    each iteration. But if I understand correctly, the Jacobian that the
    optimizer computes every iteration _is_ that derivative computation,
    redone each time?

    Sameer Agarwal

    Aug 24, 2024, 9:52:12 PM8/24/24
    to Kip Warner, ceres-...@googlegroups.com

    I am saying that it will be faster since the function is essentially linear.

    Kip Warner

    Aug 24, 2024, 9:53:13 PM8/24/24
    to Sameer Agarwal, ceres-...@googlegroups.com
    On Sat, 2024-08-24 at 18:51 -0700, Sameer Agarwal wrote:
    > I am saying that it will be faster since the function is essentially
    > linear.

    Ok. I will experiment when I can.

    Kip Warner

    Aug 28, 2024, 8:27:58 PM8/28/24
    to ceres-...@googlegroups.com, Sameer Agarwal
    On Sun, 2024-08-18 at 14:28 -0700, 'Sameer Agarwal' via Ceres Solver
    wrote:
    > This code is fine as far as I can tell.
    > With random data I would not expect the solver to make much progress.

    Hey Sameer,

    I've been experimenting with real data and unfortunately, although it
    does converge, there are some obvious problems.

    It appears to take too many iterations, with each one barely decreasing
    the cost. After convergence the change in cost is not very much.

    But most importantly, the total number of predicates that still hold
    true after training seems to average around 50 %, which is the same as
    assigning totally random weights when creating the object embeddings.

    As a test run I used 144 objects and 407 learning example triplets
    that refer to those objects. The 407 examples were divided into 325
    used for actually training with the optimizer, with the remaining 82
    held out as a test set used after training (an 80 / 20 split).

    Cost:
    Initial 1.606899e+00
    Final 1.543389e+00
    Change 6.351000e-02

    Minimizer iterations 524
    Successful steps 516
    Unsuccessful steps 8

    Time (in seconds):
    Preprocessor 0.001421

    Residual only evaluation 0.236995 (524)
    Jacobian & residual evaluation 42.372609 (516)
    Linear solver 0.077782 (524)
    Minimizer 42.731210

    Postprocessor 0.000009
    Total 42.732640

    Termination: CONVERGENCE (Function tolerance reached. |cost_change|/cost: 1.302798e-07 <= 1.000000e-06)

    Test set accuracy: 62.1951 %

    Could the problem be with the cost functor, perhaps? I cannot think of
    a creative mathematical way to replace the fmax() while still
    achieving the same end, but I'm open to suggestions. Perhaps there is
    some way to express it as a polynomial.

    Sameer Agarwal

    Aug 28, 2024, 8:29:07 PM8/28/24
    to Kip Warner, ceres-...@googlegroups.com

    At this point this is out of my area of expertise.

    Kip Warner

    Sep 22, 2024, 10:27:53 PM9/22/24
    to ceres-...@googlegroups.com
    On Sun, 2024-08-18 at 14:28 -0700, 'Sameer Agarwal' via Ceres Solver
    wrote:
    > The gradient checking failure you are getting is likely because of
    > the non-differentiable residual which comes from the fmax in the
    > functor. 

    Hey Sameer,

    I've had a chance to do some more thinking and experimentation on the
    subject of the non-differentiability of the fmax() function in my cost
    functor. I've attempted to substitute that call with a smooth
    approximation of max() that should be differentiable at L316:

    https://pastebin.com/sv4LKpsg

    The replacement is the Smooth Maximum Unit (SMU) described below. I
    selected an epsilon value of 0.5:

    https://en.wikipedia.org/wiki/Smooth_maximum#Smooth_maximum_unit
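
    (The basic form described there is built on a smoothed absolute value,
    so the substitution presumably looks something like the sketch below;
    the exact parameterization follows the linked article, and this is
    illustrative rather than the actual change at L316:)

    #include <cmath>

    // Smooth approximation of max(a, b): as Epsilon approaches zero this
    // approaches the exact max, but it stays differentiable everywhere...
    template <typename Type>
    Type SmoothMax(const Type &a, const Type &b, const double Epsilon)
    {
        using std::sqrt;
        return ((a + b) + sqrt((a - b) * (a - b) + Type(Epsilon))) / Type(2.0);
    }

    // Usage inside the functor, replacing the original fmax(..., Type(0.0))...
    // const Type TripletLoss = SmoothMax(
    //     AnchorToPositiveCosineDistance - AnchorToNegativeCosineDistance +
    //         AlphaMargin,
    //     Type(0.0),
    //     0.5);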

    If I re-enable gradient checking, I note that it fails again. If I
    disable it I note that the cost briefly goes up for the first few
    iterations before going down (but only a little).

    I've tried 0.1, 50, and 500 for epsilon and it doesn't seem to make any
    difference.

    Shouldn't the cost functor at least autodifferentiate correctly now?

    Sameer Agarwal

    Sep 24, 2024, 1:05:52 PM9/24/24
    to ceres-...@googlegroups.com
    Gradient checking is based on numerical differentiation. Depending on the curvature of the function it can produce false positives. 
    It is worth taking a look at the values at which it is complaining, to
    see what derivative you are getting and what is expected.
    Sameer
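
    (For reference, the knobs that control this check live on
    ceres::Solver::Options; a sketch, with an arbitrarily loosened
    tolerance:)

    ceres::Solver::Options Options;
    Options.check_gradients = true;

    // Loosen the relative tolerance used when comparing automatic and
    // numeric derivatives, to cut down on false positives near the kink...
    Options.gradient_check_relative_precision = 1e-6;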


    Kip Warner

    Sep 24, 2024, 1:09:13 PM9/24/24
    to ceres-...@googlegroups.com, Sameer Agarwal
    On Tue, 2024-09-24 at 10:05 -0700, Sameer Agarwal wrote:
    > Gradient checking is based on numerical differentiation. Depending on
    > the curvature of the function it can produce false positives. 
    > It is worth taking a look at the values at which it is complaining to
    > see what the derivative you are getting and what should be expected.

    Thanks Sameer. I will do some more experimentation.