
Least squares similarity transformation


Kaba

Jun 11, 2008, 3:21:08 PM
Hi,

I am trying to derive a formula for the least squares similarity
transformation between two paired point sets.

A similarity, in the geometric sense, is a composition of rotation, scaling
and translation. Below is my derivation thus far. Where I get stuck is
that the equations for the partial derivatives of the function I am trying to
minimize do not seem to fit into matrix form. But I have seen a derivation
that was solved with linear algebra.

How do I proceed?

Least squares similarity transformation in 2D
---------------------------------------------

Given are two point sets P = {p_i}, Q = {q_i}
with n points in each. Find the "best"
similarity transformation (rotation + scale + translation)
that approximately maps each p_i to q_i.

Notation
--------

p_i = [p_i_x, p_i_y]^T
q_i = [q_i_x, q_i_y]^T

p = [p_1, ..., p_n]
q = [q_1, ..., q_n]

R = [[cos(r), -sin(r)], [sin(r), cos(r)]]

t = [t_x, t_y]^T

Formal problem statement
------------------------

Let

f(p; s, r, t) = sRp + t

Find the parameters s, r and t such that

sum_i |q_i - f(p_i; s, r, t)|^2 is minimized

Solution
--------

Let

g(s, r, t)
= sum_i |q_i - f(p_i; s, r, t)|^2
= sum_i |q_i - sRp_i - t|^2
= sum_i (q_i - sRp_i - t)^T (q_i - sRp_i - t)
= sum_i (q_i - t)^T (q_i - t) - 2s p_i^T R^T (q_i - t) + s^2 p_i^T p_i
= sum_i q_i^T q_i - 2q_i^T t + t^T t - 2s p_i^T R^T q_i + 2s p_i^T R^T t
+ s^2 p_i^T p_i

In the above derivation note that R^T R = I.

The minimum is at the position which satisfies
(grad for gradient)

grad(g)(s, r, t) = 0

<=>

{
(dg/ds)(s, r, t) = 0
(dg/dr)(s, r, t) = 0
grad_t(g)(s, r, t) = 0
}

Recall the following differentiation rules:

grad_v(u^T v) = u^T
grad_v(v^T M v) = v^T (M + M^T)
grad_v(v^T v) = grad_v(v^T I v) = v^T (I + I^T) = 2v^T
d(u^T M(r) v)/dr = u^T (dM/dr)(r) v (the dM/dr means "element-wise")

Then

(dg/ds)(s, r, t)
= sum_i -2 p_i^T R^T q_i + 2 p_i^T R^T t + 2s p_i^T p_i

(dg/dr)(s, r, t)
= sum_i 2s p_i^T S^T (t - q_i)

where
S = dR/dr = [[-sin(r), -cos(r)], [cos(r), -sin(r)]]

grad_t(g)(s, r, t)
= sum_i -2 q_i^T + 2 t^T + 2s p_i^T R^T

The next step: equate to zero and find out s, r and t. But how?
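
In the meantime, the objective can also be minimized numerically, which
gives something to check any closed form against. A minimal Python
sketch (the sample points are arbitrary and the helper name is made up;
this is not part of the derivation above):

import numpy as np
from scipy.optimize import minimize

def objective(params, p, q):
    # params = (s, r, t_x, t_y); p, q are (n, 2) arrays of points stored
    # as rows, matching f(p) = s R p + t above
    s, r, tx, ty = params
    R = np.array([[np.cos(r), -np.sin(r)],
                  [np.sin(r),  np.cos(r)]])
    residual = q - (s * (R @ p.T).T + np.array([tx, ty]))
    return np.sum(residual ** 2)

p = np.array([[0.0, 2.0], [0.0, 0.0], [1.0, 0.0]])
q = np.array([[2.0, 3.0], [0.0, 1.0], [1.0, 0.0]])
best = minimize(objective, x0=[1.0, 0.0, 0.0, 0.0], args=(p, q))
s, r, tx, ty = best.x   # least squares scale, angle and translation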

--
http://kaba.hilvi.org

Rob Johnson

Jun 17, 2008, 8:50:05 PM
In article <MPG.22ba521aa...@news.cc.tut.fi>,

I have adapted the result from

<http://www.whim.org/nebula/math/affineregress.html>

to cover this case, which is to find a least squares conformal map of
the form Q = P M + R. Unfortunately, this requires performing a polar
decomposition of a matrix. Fortunately, most CAS can perform polar
decompositions of matrices.

Conformality adds the constraint that M M^T = t^2 I for some scalar t.

  t^2 I = M M^T

  2t dt I = d(M M^T) = dM M^T + M dM^T = (dM M^T)^T + dM M^T

which implies that

  dM M^T = dA + t dt I

  dM = (dA + t dt I) M / t^2    (where dA is skew-symmetric)

As in the article cited above, taking the first difference of the
least squares condition yields

  0 = sum_{j=1}^n < P_j M + R - Q_j , P_j dM + dR >

As shown in the cited article, letting dM = 0, we get the condition

  Pbar M + R = Qbar    [1]

where

  Pbar = (1/n) sum_{j=1}^n P_j    and    Qbar = (1/n) sum_{j=1}^n Q_j

Letting dR = 0, we get

  0 = sum_{j=1}^n < P_j M + R - Q_j , P_j dM >

    = (1/t) sum_{j=1}^n < P_j M + R - Q_j , P_j (dA + t dt I) M >

    = dt sum_{j=1}^n < P_j M + R - Q_j , P_j M >

      + (1/t) sum_{j=1}^n < R - Q_j , P_j dA M >

We didn't forget a P_j M in the last sum; since dA is skew-symmetric,
we have < P_j M , P_j dA M > = 0.

Letting dA = 0, we get the condition

  0 = sum_{j=1}^n < P_j M + R - Q_j , P_j M >

We can eliminate R using [1]:

  0 = sum_{j=1}^n < ( P_j - Pbar ) M - ( Q_j - Qbar ) , ( P_j - Pbar ) M >    [2]

Letting dt = 0, we are left with

  0 = sum_{j=1}^n < R - Q_j , P_j dA M >

for any skew-symmetric dA. Similar to [7] in the cited article, this
implies

  0 = sum_{j=1}^n P_j^T ( R - Q_j ) M^T - M ( R - Q_j )^T P_j

which, through simple manipulation, remembering M M^T = t^2 I and
using [1], yields

  0 = sum_{j=1}^n ( ( P_j - Pbar ) M )^T ( Q_j - Qbar ) - ( Q_j - Qbar )^T ( P_j - Pbar ) M

which requires that

  sum_{j=1}^n ( Q_j - Qbar )^T ( P_j - Pbar ) M

be symmetric. If we define

  S = sum_{j=1}^n ( Q - Qbar )^T ( P - Pbar )

then we need to have S M be symmetric. Let S = U D be the polar
decomposition of S where U is orthonormal and D is symmetric positive
definite. If M = t U^T, then S M = t U D U^T, which is symmetric.

Here is the method we have developed
====================================

First, compute the centroids

  Pbar = (1/n) sum_{j=1}^n P_j    and    Qbar = (1/n) sum_{j=1}^n Q_j

Next, compute the matrix

  S = sum_{j=1}^n ( Q - Qbar )^T ( P - Pbar )

Let S = U D be the polar decomposition of S where U is orthonormal
and D is symmetric positive definite. Next compute t by

  t sum_{j=1}^n | P_j - Pbar |^2
    = sum_{j=1}^n < ( Q_j - Qbar ) U , P_j - Pbar >

Finally, we have the desired conformal map Q = P M + R where

  M = t U^T

and

  R = Qbar - Pbar M
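
For readers who want to try the procedure above in code, here is a small
numpy sketch of it (the function name and the use of scipy's polar routine
are illustrative choices, not taken from the cited page; points are stored
as rows to match the Q = P M + R convention):

import numpy as np
from scipy.linalg import polar

def fit_conformal(P, Q):
    # Least squares fit of Q ~ P M + R with M = t U^T, U orthonormal
    Pbar, Qbar = P.mean(axis=0), Q.mean(axis=0)
    Pc, Qc = P - Pbar, Q - Qbar
    S = Qc.T @ Pc                      # sum_j (Q_j - Qbar)^T (P_j - Pbar)
    U, D = polar(S)                    # S = U D, U orthonormal, D symmetric psd
    t = np.sum((Qc @ U) * Pc) / np.sum(Pc * Pc)
    M = t * U.T
    R = Qbar - Pbar @ M
    return M, R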

Rob Johnson <r...@trash.whim.org>
take out the trash before replying
to view any ASCII art, display article in a monospaced font

Rob Johnson

Jun 17, 2008, 9:43:17 PM
In article <2008061...@whim.org>,

I wrote:
>   S = sum_{j=1}^n ( Q - Qbar )^T ( P - Pbar )

I left out the indices in the sum; the last part of the post should be

Rob Johnson

Jun 17, 2008, 9:52:45 PM
In article <2008061...@whim.org>,
I wrote:
>   S = sum_{j=1}^n ( Q - Qbar )^T ( P - Pbar )

Once again, I left out the indices in the sum, the last part of the
post should be

Here is the method we have developed
====================================

First, compute the centroids

  Pbar = (1/n) sum_{j=1}^n P_j    and    Qbar = (1/n) sum_{j=1}^n Q_j

Next, compute the matrix

  S = sum_{j=1}^n ( Q_j - Qbar )^T ( P_j - Pbar )

Rob Johnson

Jun 18, 2008, 3:44:16 PM
>I am trying to derive a formula for the least squares similarity
>transformation between two paired point sets.
>
>A similarity, in the geometric sense, is a composition of rotation, scaling
>and translation. Below is my derivation thus far. Where I get stuck is
>that the equations for the partial derivatives of the function I am trying to
>minimize do not seem to fit into matrix form. But I have seen a derivation
>that was solved with linear algebra.
>
>How do I proceed?

In addition to the previous thread, I have also added a page to my
website with the solution to this problem:

<http://www.whim.org/nebula/math/conformalregress.html>

I hope it is of some help.

Kaba

Jun 18, 2008, 5:56:22 PM
Rob Johnson wrote:
> In addition to the previous thread, I have also added a page to my
> website with the solution to this problem:
>
> <http://www.whim.org/nebula/math/conformalregress.html>
>
> I hope it is of some help.

Thank you very much, this is exactly what I am looking for.

Could you explain a little bit more where the second equation comes
from?

I am having trouble with the use of variations. Are dM and dR
differentials?

--
http://kaba.hilvi.org

Rob Johnson

Jun 18, 2008, 7:55:18 PM
In article <MPG.22c3b100b...@news.cc.tut.fi>,

dM and dR are rates of change: dM can be any allowable rate of change
of M, and dR can be any allowable rate of change of R. There is no
restriction on dR since there is no restriction on R. There is a
restriction on dM since M is supposed to be a uniformly scaled
isometry; that is M M^T = r^2 I.

To make things more conventional, you could replace dM by dM/dt and
dR by dR/dt. Looking at them this way, I guess you could call them
differentials.

Equation [2] is simply the rate of change of equation [1], where we
have |x|^2 = <x,x>.

Han de Bruijn

Jun 19, 2008, 3:22:31 AM
Rob Johnson wrote:

Sorry for jumping in so late, but I find the above of interest, though
I didn't keep up with the thread. What _is_ a conformal transformation,
and how does that compare with the requirement in your writeup that:

M M^T = r^2 I ? E.g. what is r ?

Han de Bruijn

Rob Johnson

Jun 19, 2008, 6:03:31 AM
In article <649bc$485a0937$82a1e228$33...@news2.tudelft.nl>,

Literally, conformal means "same shape".

"In mathematics, a conformal map is a function which preserves angles"
<http://en.wikipedia.org/wiki/Conformal_map>

"Conformal map projections preserve angles locally"
<http://en.wikipedia.org/wiki/Conformal_map_projection#Conformal>

A conformal affine transformation preserves angles, but not
necessarily size. That means it may translate (R), and it may rotate
and scale uniformly (M M^T = r^2 I). This last condition means that

  | (x M + R) - (y M + R) |^2

    = < (x - y) M , (x - y) M >

    = < (x - y) M M^T , x - y >

    = r^2 < x - y , x - y >

    = r^2 | x - y |^2

Thus, the map x -> x M + R scales all distances by r, and therefore
preserves shape and angles.
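
A tiny numerical illustration of the distance-scaling property (the angle
and scale below are arbitrary values chosen only for the check):

import numpy as np

r, theta = 2.0, 0.7
M = r * np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])   # M M^T = r^2 I
x = np.array([1.0, 2.0])
y = np.array([-0.5, 3.0])
print(np.linalg.norm(x @ M - y @ M))   # matches r * |x - y|
print(r * np.linalg.norm(x - y))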

Han de Bruijn

Jun 19, 2008, 8:59:38 AM
Rob Johnson wrote:

Well, the confusion comes from the fact that I've learned some about
conformal mappings in the context of complex function theory. And in
that context, they are non-linear, in general. Thanks for explaining!

I have another puzzle, though. Suppose there are given two triangles,
with their respective coordinates, in the plane. Is there a way then
to establish whether these two triangles may be approximately similar?

Or perhaps: is such a thing already the subject of the current thread?

Han de Bruijn

Rob Johnson

Jun 19, 2008, 1:43:04 PM
In article <b58c5$485a583a$82a1e228$15...@news1.tudelft.nl>,

In R^2, these conformal linear maps can be represented by the linear
map f(z) = a + b z or the anti-conformal conj(a + b z).

>I have another puzzle, though. Suppose there are given two triangles,
>with their respective coordinates, in the plane. Is there a way then
>to establish whether these two triangles may be approximately similar?
>
>Or perhaps: is such a thing already the subject of the current thread?

One could run the vertices through the algorithm and compute the
square residue. For example,

P = { (0,2) , (0,0) , (1,0) }

Q = { (2,3) , (0,1) , (1,0) }

  Pbar = (1/3) [ 1  2 ]        Qbar = (1/3) [ 3  4 ]

(writing 2x2 matrices as lists of rows)

  S = [ 2  3 ]^T [ 0  2 ] + [ 0  1 ]^T [ 0  0 ] + [ 1  0 ]^T [ 1  0 ]
        - (3/9) [ 3  4 ]^T [ 1  2 ]

    = [[ 0, 4 ], [ 0, 6 ]] + [[ 0, 0 ], [ 0, 0 ]] + [[ 1, 0 ], [ 0, 0 ]]
        - (1/3) [[ 3, 6 ], [ 4, 8 ]]

    = (1/3) [[ 0, 6 ], [ -4, 10 ]]

    = ( (1/sqrt(2)) [[ 1, 1 ], [ -1, 1 ]] ) ( (2 sqrt(2)/3) [[ 1, -1 ], [ -1, 4 ]] )

  U = (1/sqrt(2)) [[ 1, 1 ], [ -1, 1 ]]

  r ( 4 + 0 + 1 - 3 (5/9) )
    = (1/sqrt(2)) ( <(-1,5),(0,2)> + <(-1,1),(0,0)> + <(1,1),(1,0)> - (3/9) <(-1,7),(1,2)> )

  r (10/3) = (1/sqrt(2)) ( 10 + 0 + 1 - (1/3) 13 )

  r = sqrt(2)

  M = sqrt(2) ( (1/sqrt(2)) [[ 1, 1 ], [ -1, 1 ]] )^T = [[ 1, -1 ], [ 1, 1 ]]

  R = (1/3) ( [ 3  4 ] - [ 1  2 ] [[ 1, -1 ], [ 1, 1 ]] ) = [ 0  1 ]

The map is

  Q = P [[ 1, -1 ], [ 1, 1 ]] + [ 0  1 ]

The fit is exact; that is, there is no square residue.
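
As a quick numerical check of these numbers (nothing here beyond the
values worked out above):

import numpy as np

P = np.array([[0.0, 2.0], [0.0, 0.0], [1.0, 0.0]])
Q = np.array([[2.0, 3.0], [0.0, 1.0], [1.0, 0.0]])
M = np.array([[1.0, -1.0], [1.0, 1.0]])
R = np.array([0.0, 1.0])
print(P @ M + R)                     # reproduces Q exactly
print(np.allclose(P @ M + R, Q))     # True, so the square residue is zero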

Rob Johnson

Jun 19, 2008, 8:56:36 PM
In article <b58c5$485a583a$82a1e228$15...@news1.tudelft.nl>,

I was thinking of how to measure the quality of the fit, and I came
up with a decent metric. To expand upon a point I was making before,
the square residue is the quantity we are trying to minimize, so it
makes sense to use that as a measure of fit. However, it would be
nice to assess the fit regardless of the scale of the destination
shape. To this end, it seems proper to define the similarity error
to be

  (1/r^2) var( P M + R - Q )

    = (1/r^2) (1/n) sum_{j=1}^n | P_j M + R - Q_j |^2    [1]

This removes the dependence on the scale of the destination shape,
but leaves us with the scale of the source shape. To remove the
dependence on the scale of the source shape, we should divide by

  var(P) = (1/n) sum_{j=1}^n | P_j - Pbar |^2    [2]

So, it seems that a good measure of fit would be

  (1/r^2) var( P M + R - Q ) / var(P)    [3]

This can be simplified a bit using our earlier results.

  var( P M + R - Q )

    = (1/n) sum_{j=1}^n | ( P_j - Pbar ) M - ( Q_j - Qbar ) |^2

    = (1/n) sum_{j=1}^n < ( P_j - Pbar ) M - ( Q_j - Qbar ) , ( P_j - Pbar ) M - ( Q_j - Qbar ) >

    = r^2 ( (1/n) sum_{j=1}^n | P_j - Pbar |^2 ) + (1/n) sum_{j=1}^n | Q_j - Qbar |^2

      - (2/n) sum_{j=1}^n < ( P_j - Pbar ) M , Q_j - Qbar >    [4]

If we look back at the equation we used to compute r, we see that
since M = r U^T, we have

  r^2 ( (1/n) sum_{j=1}^n | P_j - Pbar |^2 )
    = (1/n) sum_{j=1}^n < ( P_j - Pbar ) M , Q_j - Qbar >    [5]

Combining [4] and [5], we see that

  var( P M + R - Q ) = var(Q) - r^2 var(P)    [6]

Putting together [3] and [6], we get the measure of fit to be

  (1/r^2) ( var(Q) / var(P) ) - 1    [7]

So if we know the vertex correspondence, we can simply use [7]. If
we don't, then we can perform the regression for the n! permutations
of the destination vertices and choose the one which has the smallest
measure of fit.
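
In code, [7] takes only a few lines once M is known. A sketch (the
function name is illustrative, with r^2 read off from M M^T = r^2 I):

import numpy as np

def similarity_error(P, Q, M):
    # measure [7]: (1/r^2) var(Q)/var(P) - 1; zero for an exact fit
    r2 = (M @ M.T)[0, 0]
    varP = np.mean(np.sum((P - P.mean(0)) ** 2, axis=1))
    varQ = np.mean(np.sum((Q - Q.mean(0)) ** 2, axis=1))
    return varQ / (r2 * varP) - 1.0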

Ray Koopman

Jun 20, 2008, 3:16:44 AM
On Jun 19, 5:56 pm, r...@trash.whim.org (Rob Johnson) wrote:

The method you present is known in psychometrics as the Schonemann-
Carroll procedure. For a discussion of measuring fit, see

Lingoes, J.C. & Schonemann, P.H. Alternative measures of fit for the
Schonemann-Carroll matrix fitting algorithm. Psychometrika, 1974, 39,
423-427.

Kaba

Jun 20, 2008, 5:46:49 AM
Rob Johnson wrote:
> So if we know the vertex correspondence, we can simply use [7]. If
> we don't, then we can perform the regression for the n! permutations
> of the destination vertices and choose the one which has the smallest
> measure of fit.

A side note:
Finding a best similarity (= conformal affine) transformation (or affine
or rigid or..) without vertex correspondence falls under the field of
point pattern matching. There one can find fast algorithms for this
task. However, they are usually restricted to 2d or 3d. As an example:

http://www.cs.mcgill.ca/~cs644/Godfried/2005/Fall/mbouch/

--
http://kaba.hilvi.org

Rob Johnson

Jun 20, 2008, 2:08:00 PM
In article <3e3b3ed3-d3d4-4fc9...@q27g2000prf.googlegroups.com>,

Thank you very much for that reference. I had not had any luck in
finding references on this subject. Now that I know some proper
keywords, I have found several references. I see from this reference
and other recursive references,

Schonemann, Peter H. & Carroll, Robert M. Fitting One Matrix To
Another Under Choice Of A Central Dilation And A Rigid Motion.
Psychometrika 1970, 35, 245-255.

Schonemann, Peter H. A Generalized Solution Of The Orthogonal
Procrustes Problem. Psychometrika 1966, 31, 1-10.

that this is also known as the Orthogonal Procrustes Problem. I
also notice that some of the papers use Singular Value Decomposition
rather than Polar Decomposition. Although Mathematica performs Polar
Decomposition, my HP calculator only does SVD. Fortunately, it turns
out it is pretty easy to perform a Polar Decomposition using SVD.
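
For reference, the conversion is short: if S = W diag(sig) V^T is the SVD,
then U = W V^T is orthonormal and D = V diag(sig) V^T is symmetric positive
semi-definite, with S = U D. A sketch (variable names are illustrative):

import numpy as np

def polar_from_svd(S):
    W, sig, Vt = np.linalg.svd(S)
    U = W @ Vt                         # orthonormal factor
    D = Vt.T @ np.diag(sig) @ Vt       # symmetric psd factor, S = U @ D
    return U, D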

Rob Johnson

Jun 20, 2008, 2:41:45 PM
In article <MPG.22c5a9007...@news.cc.tut.fi>,

Thanks. I will look into that link.

Ray Koopman

Jun 20, 2008, 2:50:24 PM
On Jun 20, 11:08 am, r...@trash.whim.org (Rob Johnson) wrote:
> [...] Although Mathematica performs Polar Decomposition,
> my HP calculator only does SVD. Fortunately, it turns out
> it is pretty easy to perform a Polar Decomposition using SVD.

Which is a good thing, because in version 6.0 Wolfram dropped
PolarDecomposition from the LinearAlgebra`MatrixManipulation`
package. Now you have to do it using SVD.

Rob Johnson

Jun 20, 2008, 11:00:43 PM
In article <30da5827-73f0-42ea...@c19g2000prf.googlegroups.com>,

Thanks for the update. I have added the way to convert from Singular
Value Decomposition to Polar Decomposition to the page cited earlier:
<http://www.whim.org/nebula/math/conformalregress.html>.

Ray Koopman

Jun 21, 2008, 12:19:08 AM
On Jun 20, 8:00 pm, r...@trash.whim.org (Rob Johnson) wrote:
>
> Thanks for the update. I have added the way to convert from Singular
> Value Decomposition to Polar Decomposition to the page cited earlier:
> <http://www.whim.org/nebula/math/conformalregress.html>.

Some time ago I ran into a similar problem (in > 2 dimensions) which
had the additional requirement that the transformation be a "proper"
rotation, with determinant = +1. The best I could suggest if the SVD-
based approach gave a matrix whose determinant was -1 was to minimize
the error by a Jacobi-like sequence of single-plane rotations. What
would you do in this situation?

Han de Bruijn

Jun 23, 2008, 3:37:57 AM
Ray Koopman wrote:

With my little project 'Sunk into an Idea' I couldn't think of anything
better than a general affine transformation, and hoped it wouldn't distort
too much. But what I really meant is a combination of rotation, scaling
and skewing. I mean: the ways in which handwritten characters can vary.
Same question: what would you do in this situation? Reference at:

http://hdebruijn.soo.dto.tudelft.nl/jaar2008/index.htm

Han de Bruijn

Rob Johnson

Jun 23, 2008, 5:16:27 PM
In article <0fd1049b-10ee-4762...@j1g2000prb.googlegroups.com>,


In <http://www.whim.org/nebula/math/conformalregress.html> the least
squares conformal map is computed. I will refer to this as [CR].

If the M computed in [CR] has negative determinant, then we must
reflect the result of P M + R across a hyperplane. The hyperplane
that would least affect the squares minimized in [CR], would be the
hyperplane which has the minimum square distance to the { Q_j }.
The unit normal to this hyperplane is the unit W that minimizes

  sum_{j=1}^n < Q_j - Qbar , W >^2    [1]

Thus we need to find the W so that

  0 = sum_{j=1}^n < Q_j - Qbar , W > < Q_j - Qbar , dW >    [2]

for all dW under the condition that |W| = 1, that is

  0 = sum_{j=1}^n < W , dW >    [3]

Using orthogonality, we see that we need to have

  k W = sum_{j=1}^n < Q_j - Qbar , W > ( Q_j - Qbar )    [4]

That is, W is a unit eigenvector of

  N = sum_{j=1}^n ( Q_j - Qbar )^T ( Q_j - Qbar )    [5]
j=1

Note that [1] equals

  W N W^T    [6]

and since W minimizes [1], W must be the eigenvector of N whose
eigenvalue has the smallest absolute value. Thus, the reflection
matrix is

  I - 2 W^T W    [7]

Since we can still rescale, define

  M' = r' M ( I - 2 W^T W )    [8]

from which it follows that

  M' M'^T = r'^2 M M^T = ( r' r )^2 I    [9]

Similar to [11] in [CR], varying the scale parameter alone yields
a critical point when

  0 = sum_{j=1}^n < ( P_j - Pbar ) M' - ( Q_j - Qbar ) , ( P_j - Pbar ) M' >    [10]
Therefore, we can compute r' with

  r' sum_{j=1}^n | ( P_j - Pbar ) M |^2

    = sum_{j=1}^n < Q_j - Qbar , ( P_j - Pbar ) M ( I - 2 W^T W ) >    [11]

Then, using [11] in [CR] and [11] above, we get

  ((1 - r')/2) sum_{j=1}^n | ( P_j - Pbar ) M |^2

    = sum_{j=1}^n < Q_j - Qbar , W > < ( P_j - Pbar ) M , W >    [12]

and since the translation must still obey [5] from [CR], we need

  R' = Qbar - Pbar M'    [13]

Summary
-------
Compute M from [CR].

Compute

  N = sum_{j=1}^n ( Q_j - Qbar )^T ( Q_j - Qbar )

Let W be the unit eigenvector of N whose eigenvalue has the smallest
absolute value. Then compute r' from

  ((1 - r')/2) sum_{j=1}^n | ( P_j - Pbar ) M |^2

    = sum_{j=1}^n < Q_j - Qbar , W > < ( P_j - Pbar ) M , W >

Then compute M' as

  M' = r' M ( I - 2 W^T W )

Finally, calculate R' using

  R' = Qbar - Pbar M'

That should get us close.

Rob Johnson

Jun 23, 2008, 5:17:47 PM
In article <3dd14$485f52d7$82a1e228$47...@news2.tudelft.nl>,

Han de Bruijn <Han.de...@DTO.TUDelft.NL> wrote:

Have you taken a look at the least squares regression for general
affine maps that I mentioned earlier in this thread:

<http://www.whim.org/nebula/math/affineregress.html>

The method of solution there requires only inversion rather than
polar or singular value decomposition.

Ray Koopman

Jun 24, 2008, 2:04:00 AM
On Jun 23, 2:16 pm, r...@trash.whim.org (Rob Johnson) wrote:
> In article <0fd1049b-10ee-4762-9d94-9289d69bd...@j1g2000prb.googlegroups.com>,

Thanks, Rob.

Ray Koopman

Han de Bruijn

Jun 24, 2008, 4:33:59 AM
Rob Johnson wrote:

You're great! Thanks!

Han de Bruijn

Kaba

Jun 24, 2008, 3:55:36 PM
Rob Johnson wrote:
> >Could you explain a little bit more where the second equation comes
> >from?
> >
> To make things more conventional, you could replace dM by dM/dt and
> dR by dR/dt. Looking at them this way, I guess you could call them
> differentials.
>
> Equation [2] is simply the rate of change of equation [1], where we
> have |x|^2 = <x,x>.

Rob,

I am still having difficulties deriving [2] from [1].
What I am trying to do is to take the error formula [1] and find its partial
derivatives with respect to the components M_uv of M, one at a time.
But this gets very messy at the component level, particularly when I try
to find the partial derivative with respect to M_uv of:
sum_j p_j^T M^T M p_j

Maybe you have some tricks to avoid going into the component level?

How did you derive [2]?

These two pages are great! The 2-dimensional art and all:) I wonder if
the derivations could be further simplified by the following.

For the affine transformation:
* first consider the problem of finding a least squares linear
transformation between two point sets
* then take along the translation and show that the midpoint must map to
the midpoint, and use that to reduce to the linear case

Similarly for the conformal affine transformation:
* first consider the problem of finding a least squares orthogonal
transformation between two point sets
* take along translation and reduce to the previous case

Well, not sure if it's worth the trouble.

--
http://kaba.hilvi.org

Rob Johnson

Jun 24, 2008, 4:31:08 PM
In article <MPG.22cb7db53...@news.cc.tut.fi>,

Kaba <no...@here.com> wrote:
>Rob Johnson wrote:
>> >Could you explain a little bit more where the second equation comes
>> >from?
>> >
>> To make things more conventional, you could replace dM by dM/dt and
>> dR by dR/dt. Looking at them this way, I guess you could call them
>> differentials.
>>
>> Equation [2] is simply the rate of change of equation [1], where we
>> have |x|^2 = <x,x>.
>
>Rob,
>
>I am still having difficulties deriving [2] from [1].
>What I try to do is to take the error formula [1] and find it's partial
>derivatives with respect to M_uv, the components of M, one at a time.
>But this gets very messy at the component level, particularly when I try
>to find the partial derivative of M_uv for:
>sum_j p_j^T M^T M p_j
>
>Maybe you have some tricks to avoid going into the component level?
>
>How did you derive [2]?

Just taking the variation of [1] (and dividing by 2). Remember, the
variation is the rate of change of the matrix M and the vector R.

  sum_{j=1}^n | P_j M + R - Q_j |^2

    = sum_{j=1}^n < P_j M + R - Q_j , P_j M + R - Q_j >

The P_j and Q_j are fixed, so the only things that will change in the
variation are M and R. Let dM be the rate of change of M and dR be
the rate of change of R, then the rate of change of the square error
is

  sum_{j=1}^n ( < P_j dM + dR , P_j M + R - Q_j > + < P_j M + R - Q_j , P_j dM + dR > )

    = 2 sum_{j=1}^n < P_j M + R - Q_j , P_j dM + dR >

Now, since we are minimizing the square error, this variation must
be 0:

  0 = sum_{j=1}^n < P_j M + R - Q_j , P_j dM + dR >

which is [2].

>These two pages are great! The 2-dimensional art and all:) I wonder if
>the derivations were further simplified by the following.
>
>For the affine transformation:
>* first consider the problem of finding a least squares linear
>transformation between two point sets
>* then take along the translation and show that the midpoint must map to
>midpoint and use that to reduce to the linear case
>
>Similarly for the conformal affine transformation:
>* first consider the problem of finding a least squares orthogonal
>transformation between two point sets
>* take along translation and reduce to the previous case
>
>Well, not sure if it's worth the trouble.

Since [2] must hold for all allowable dM and dR, and one allowable dM
is 0, we have [3] for any allowable dR

  0 = sum_{j=1}^n < P_j M + R - Q_j , dR >

There are no restrictions on dR, so [3] has to hold for any dR, that
is, [ 1 0 0 0 ... 0 ], [ 0 1 0 0 ... 0 ], [ 0 0 1 0 ... 0 ], etc.
This means that no matter against which component we test

  0 = sum_{j=1}^n ( P_j M + R - Q_j )

it is true, so it must be true for all components, and therefore for
the whole vector. Dividing by n yields

  Pbar M + R = Qbar

which is [5]. This is used to eliminate R from most equations,
reducing the number of unknowns, and to allow us to compute R later,
after we know M.

I hope that helps to make things clearer.
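
A quick numerical check of this variation step (all values below are
random test data, purely for illustration): the derivative at eps = 0 of
sum_j |P_j (M + eps dM) + (R + eps dR) - Q_j|^2 should equal
2 sum_j < P_j M + R - Q_j , P_j dM + dR >.

import numpy as np

rng = np.random.default_rng(0)
P, Q = rng.normal(size=(5, 2)), rng.normal(size=(5, 2))
M, R = rng.normal(size=(2, 2)), rng.normal(size=2)
dM, dR = rng.normal(size=(2, 2)), rng.normal(size=2)

def sq_err(M, R):
    return np.sum((P @ M + R - Q) ** 2)

eps = 1e-6
numeric = (sq_err(M + eps * dM, R + eps * dR)
           - sq_err(M - eps * dM, R - eps * dR)) / (2 * eps)
analytic = 2 * np.sum((P @ M + R - Q) * (P @ dM + dR))
print(numeric, analytic)   # the two values agree to rounding error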

Kaba

Jun 24, 2008, 4:45:55 PM
Rob Johnson wrote:
> >How did you derive [2]?
>
> Just taking the variation of [1] (and dividing by 2). Remember, the
> variation is the rate of change of the matrix M and the vector R.

Hmm.. Rate of change with respect to what?

I tried w.r.t. the components of M (n^2 equations) and R (n equations),
one at a time.

Could you give the definition of dM and dR formally?

Is variation a formally defined term? I have never heard it before..

Thanks:)

--
http://kaba.hilvi.org

Rob Johnson

Jun 24, 2008, 8:30:00 PM
In article <MPG.22cb8982b...@news.cc.tut.fi>,

A variation is a generalization of a derivative. Most commonly used
in the calculus of variations, variations are used in a wide variety
of optimization problems. They are usually denoted by a Greek letter
delta rather than the Roman 'd', but as we cannot use Greek letters
in a text based forum, I use d or D where they can be used without
confusion, and '&' or '%' when circumstances demand. Below, I will
use '&'.

<http://en.wikipedia.org/wiki/First_variation> defines a variation
in a way that explicitly displays the variation, but also makes the
notation a bit more cumbersome. In their definition of &J, I would
simply write &y for h. &y, the variation of y, is usually not fully
specified, but restricted in some way. In our case, &M is a matrix
which represents an n^2 dimensional direction in which M is being
perturbed. &R is an n dimensional direction in which R is being
perturbed. There is no restriction on &R, but we need M M^T to be
a multiple of the identity matrix. To see how this restricts &M,
we note that

  M M^T = r^2 I

As you can see from the wikipedia definition, & should behave just
like a derivative. Thus, we get

  &M M^T + M &M^T = &M M^T + ( &M M^T )^T = 2 r &r I

This last equation implies that

  &M M^T = &A + r &r I

where &A is skew-symmetric. That is, if we know the sum of a matrix
and its transpose, we know the matrix modulo a skew-symmetric matrix.

Now since M M^T = r^2 I, we also have M^T M = r^2 I, thus we get

&M = ( &A + r &r I ) M/r^2

&M is restricted to matrices of this form (a skew-symmetric matrix
plus a multiple of the identity all times M).

So the short answer to "what is a variation" is that it is a
directional derivative where the "direction" is unspecified but
partially restricted. Depending on the operand of the functional
being varied, the direction can be a function, a matrix, a vector,
or simply a scalar.
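
A small numerical check of the restriction on &M (the numbers are
arbitrary test values): for &M = ( &A + r &r I ) M / r^2 with &A
skew-symmetric, the derivative of (M + eps &M)(M + eps &M)^T at eps = 0
comes out to 2 r &r I, so M M^T stays a multiple of I to first order.

import numpy as np

r, theta, dr = 1.5, 0.3, 0.25
M = r * np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])    # M M^T = r^2 I
dA = np.array([[0.0, 0.8], [-0.8, 0.0]])               # skew-symmetric
dM = (dA + r * dr * np.eye(2)) @ M / r ** 2
eps = 1e-6
deriv = ((M + eps * dM) @ (M + eps * dM).T
         - (M - eps * dM) @ (M - eps * dM).T) / (2 * eps)
print(deriv)                     # approximately 2 r dr I
print(2 * r * dr * np.eye(2))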

Han de Bruijn

Jun 25, 2008, 3:32:43 AM
Rob Johnson wrote:

> A variation is a generalization of a derivative. Most commonly used
> in the calculus of variations, variations are used in a wide variety
> of optimization problems. They are usually denoted by a Greek letter
> delta rather than the Roman 'd', but as we cannot use Greek letters
> in a text based forum, I use d or D where they can be used without
> confusion, and '&' or '%' when circumstances demand. Below, I will
> use '&'.
>
> <http://en.wikipedia.org/wiki/First_variation> defines a variation

I've always avoided any calculus of variations and replaced it by common
differentiation. In my limited experience this has always been possible.
The question is: is such a replacement _always_ possible, and is a variation
really a generalization of a derivative or just another way to formulate
the very same thing?

Han de Bruijn

Rob Johnson

Jun 25, 2008, 5:19:26 AM
In article <c2f84$4861f49a$82a1e228$11...@news2.tudelft.nl>,

Han de Bruijn <Han.de...@DTO.TUDelft.NL> wrote:

In the case of finite dimensional operands as we have here, it is
more of a notational simplification. The same could be done with
common differentiation. However, when the operands are functions,
such as in the isoperimetric problem, when computing geodesics, etc.,
I don't think it is possible to do the same thing with common
differentiation.

Kaba

Jun 25, 2008, 5:52:11 AM
In article <2008062...@whim.org>, r...@trash.whim.org says...

> >Is variation a formally defined term? I have never heard it before..
>
> A variation is a generalization of a derivative. Most commonly used
> in the calculus of variations, variations are used in a wide variety
> of optimization problems. They are usually denoted by a Greek letter

--x8--

Thank you, just what I needed. Btw, should I think of the calculus of
variations as just another small tool in the toolbox, or as something
larger, on which several books have been written?

Rob Johnson

Jun 25, 2008, 4:47:53 PM

<http://www.google.com/search?q=calculus+of+variations+books>

Kaba

Jun 25, 2008, 5:06:33 PM
Rob Johnson wrote:
> >Thank you, just what I needed. Btw. should I think of calculus of
> >variations as a just another small tool in the toolbox, or as something
> >larger, from which there are several books written?
>
> <http://www.google.com/search?q=calculus+of+variations+books>

On second thought, I should have found that myself. Thanks for all your
time!

--
http://kaba.hilvi.org

Rob Johnson

Jun 25, 2008, 5:20:15 PM
In article <MPG.22ccdfcb2...@news.cc.tut.fi>,

I should have included the Amazon search: <http://tinyurl.com/5duton>

Ray Koopman

Jun 26, 2008, 7:31:57 PM
On Jun 23, 2:16 pm, r...@trash.whim.org (Rob Johnson) wrote:
> In article <0fd1049b-10ee-4762-9d94-9289d69bd...@j1g2000prb.googlegroups.com>,
>

Close, but no cigar! In the simplified version of the pure rotation
problem (no shifting or rescaling allowed), where we're trying to
minimize tr E'E with E = Q - PM, M'M = I, and |M| = 1, I get
better results when |U| = -1 by taking w in [7] to be the right
singular vector associated with the smallest singular value of
Q + PM, instead of just the Q that [5] indicates. But even
that does not do as well as a sequence of single-plane rotations.

Rob Johnson

Jun 29, 2008, 7:50:32 AM
In article <aa31493e-7cb1-4031...@v26g2000prm.googlegroups.com>,

Ray Koopman <koo...@sfu.ca> wrote:
>On Jun 23, 2:16 pm, r...@trash.whim.org (Rob Johnson) wrote:
>> In article <0fd1049b-10ee-4762-9d94-9289d69bd...@j1g2000prb.googlegroups.com>,
>>
>> Ray Koopman <koop...@sfu.ca> wrote:
>> >On Jun 20, 8:00 pm, r...@trash.whim.org (Rob Johnson) wrote:
>>
>> >> Thanks for the update. I have added the way to convert from Singular
>> >> Value Decomposition to Polar Decomposition to the page cited earlier:
>> >> <http://www.whim.org/nebula/math/conformalregress.html>.
>>
>> >Some time ago I ran into a similar problem (in > 2 dimensions) which
>> >had the additional requirement that the transformation be a "proper"
>> >rotation, with determinant = +1. The best I could suggest if the SVD-
>> >based approach gave a matrix whose determinant was -1 was to minimize
>> >the error by a Jacobi-like sequence of single-plane rotations. What
>> >would you do in this situation?
>>
>> In <http://www.whim.org/nebula/math/conformalregress.html> the least
>> squares conformal map is computed. I will refer to this as [CR].
>>
>> If the M computed in [CR] has negative determinant, then we must
>> reflect the result of P M + R across a hyperplane. The hyperplane
>> that would least affect the squares minimized in [CR], would be the
>> hyperplane which has the minimum square distance to the { Q_j }.
>> The unit normal to this hyperplane is the unit W that minimizes

[snip specious argument]

>> That should get us close.
>

>Close, but no cigar! In the simplified version of the pure rotation
>problem (no shifting or rescaling allowed), where we're trying to
>minimize tr E'E with E = Q - PM, M'M = I, and |M| = 1, I get
>better results when |U| = -1 by taking w in [7] to be the right
>singular vector associated with the smallest singular value of
>Q + PM, instead of just the Q that [5] indicates. But even
>that does not do as well as a sequence of single-plane rotations.

The assumption that we must flip across a hyperplane was wrong. It
looks good in cases where the data is close to that hyperplane, but
there is no reason it should be best in all cases. Most of all, no
least squares condition was achieved.

Also, in the original solution, Polar Decomposition is much too
restrictive. It requires the radial part to be symmetric positive
semi-definite, but all that is needed is symmetry. I was deceived by
the uniqueness of Polar Decomposition; I didn't realize that I might
need to look further. Let us look closer at Polar Decomposition.

Polar Decomposition is only unique if we require the radial part to
be symmetric positive semi-definite. If we require only symmetry,
then there are 2^k choices for the radial (symmetric) part, where k
is the rank of the matrix being decomposed.

Let us return to where we define

  S = sum_{j=1}^n ( Q_j - Qbar )^T ( P_j - Pbar )    [1]

and we need to find an M so that M M^T = r^2 I and S M is symmetric.

Finding an orthonormal eigenbasis of S S^T, we can find a unitary U
and diagonal D so that

  S S^T = U D^2 U^T    [2]

Let E be the pseudoinverse of D. Pseudoinverses of diagonal matrices
are easy to compute: invert the non-zero elements and leave the zero
elements alone. If D is non-singular, then E is the inverse of D.
The important identity is

D^2 E = D [3]

Using [2], [3], and the fact that U is unitary, we can write the
symmetric matrix U D U^T as

  S ( S^T U E U^T )    [4]

where the parenthesized factor is unitary.

Thus, we can set

  V = S^T U E U^T    [5]

and

M = r V [6]

However, by individually negating the non-zero diagonal elements of
D, there are 2^k matrices that satisfy [2] where k is the rank of S.
The sum of squares is minimized when we use the one that maximizes

  sum_{j=1}^n < Q_j - Qbar , ( P_j - Pbar ) V >    [7]

I have updated the page at

<http://www.whim.org/nebula/math/conformalregress.html>

to include this change. This should allow us to find the best
solution whether positive or negative determinant.

Rob Johnson

Jul 2, 2008, 2:13:56 AM
In article <2008062...@whim.org>,

The proper decomposition to use is the Singular Value Decomposition.
I have changed too much of the argument to copy here. I have updated
the page at

<http://www.whim.org/nebula/math/conformalregress.html>

The update covers how to compute the least squares for the case where
we want positive determinant and also for the case where we do not
want a variable scale.
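
For completeness, here is a sketch of how the SVD-based computation can go
when a proper rotation (positive determinant) is required; it uses the
standard trick of flipping the sign attached to the smallest singular
value. This is an illustration of the idea, not a copy of the updated page:

import numpy as np

def fit_conformal_proper(P, Q):
    # Least squares Q ~ P M + R with M = r * (rotation), det(rotation) = +1
    Pbar, Qbar = P.mean(axis=0), Q.mean(axis=0)
    Pc, Qc = P - Pbar, Q - Qbar
    S = Qc.T @ Pc                       # sum_j (Q_j - Qbar)^T (P_j - Pbar)
    W, sig, Vt = np.linalg.svd(S)
    signs = np.ones(S.shape[0])
    if np.linalg.det(W @ Vt) < 0:       # force det(U) = +1
        signs[-1] = -1.0
    U = W @ np.diag(signs) @ Vt
    r = np.sum(sig * signs) / np.sum(Pc * Pc)
    M = r * U.T
    R = Qbar - Pbar @ M
    return M, R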

Ray Koopman

Jul 2, 2008, 9:35:31 PM
On Jul 1, 11:13 pm, r...@trash.whim.org (Rob Johnson) wrote:
> In article <20080626.223...@whim.org>,
> Rob Johnson <r...@trash.whim.org> wrote:
>> In article <aa31493e-7cb1-4031-9b3e-6d12d3b97...@v26g2000prm.googlegroups.com>,

Yes, that's what was needed. Now it works. Thanks.
