Need some help with variational calculus: First variation

Luca

Jan 13, 2011, 5:57:57 PM

Hello,

I have asked this question on another forum where the equations are
rendered. Please see here:

http://physicsforums.com/showthread.php?t=463278

I need a bit of help with finding the first variation of an expression
that I have come up with in my problem. I am working on an image
processing algorithm where I minimise a cost function using a
conjugate gradient optimisation scheme and hence need to calculate the
gradient of the cost function. I never took variational calculus in
school, so am having a bit of trouble.

So, my cost function is as follows:

[tex]f(\varphi) = \int [T(x) - S(u(\nabla\varphi))]^{2} dx[/tex]

Here, 'x' are just the 3-D spatial coordinates and the expression
above translates to the sum of square difference between an image (T)
and a transformed image (S). The transformation map is given by
[tex]u(\nabla\varphi)[/tex].

So what I tried to do was compute the first variation for this
expression with respect to [tex]\varphi[/tex]. So, my attempt is as
follows. I looked up some information that I could find on the
internet regarding this and did the following:

Let [tex]A = T(x) - S(u(\nabla\varphi))[/tex]; then, introducing a
perturbation in [tex]\varphi[/tex], we have:

[tex]f(\varphi + \delta\varphi) = \int[A + \delta\varphi]^{2} dx[/tex]

Now expanding, I have:

[tex]f(\varphi + \delta\varphi) = \int A^{2} dx + 2\int A\,\delta\varphi\, dx + \int(\delta\varphi)^{2} dx[/tex]

which is the same as

[tex]f(\varphi + \delta\varphi) - f(\varphi) = 2\int A\,\delta\varphi\, dx + \int(\delta\varphi)^{2} dx[/tex]

Now, this is where I am stuck. I need to basically get the expression:

[tex]f(\varphi + \delta\varphi) / \delta\varphi[/tex], right? How can
I go from the expression above to that step?

I would really appreciate your help on this. I have been stuck for a
while now...

Cheers,
Luca

James Burns

Jan 13, 2011, 7:04:49 PM

Luca wrote:
> Hello,
>
> I have asked this question on another forum where the equations are
> rendered. Please see here:
>
> http://physicsforums.com/showthread.php?t=463278

I found this very helpful. I had a hard time reading the raw TeX.
That may not be true of others, though.

>
> I need a bit of help with finding the first variation of an expression
> that I have come up with in my problem. I am working on an image
> processing algorithm where I minimise a cost function using a
> conjugate gradient optimisation scheme and hence need to calculate the
> gradient of the cost function. I never took variational calculus in
> school, so am having a bit of trouble.
>
> So, my cost function is as follows:
>
> [tex]f(\varphi) = \int [T(x) - S(u(\nabla\varphi))]^{2} dx[/tex]
>
> Here, 'x' are just the 3-D spatial coordinates and the expression
> above translates to the sum of square difference between an image (T)
> and a transformed image (S). The transformation map is given by
> [tex]u(\nabla\varphi)[/tex].
>
> So what I tried to do was compute the first variation for this
> expression with respect to [tex]\varphi[/tex]. So, my attempt is as
> follows. I looked up some information that I could find on the
> internet regarding this and did the following:
>
> Let [tex]A = T(x) - S(u(\nabla\varphi))[/tex]; then, introducing a
> perturbation in [tex]\varphi[/tex], we have:

Here is the beginning of your problem. A() is a function of
phi and various derivatives of phi. When you vary phi, the derivatives
of phi need their own delta phi' (sorry about the notation).
Then you can expand, as you did, and collect terms. All of
the terms that are first-order in delta phi or any of its derivatives
are what you want. (Zeroth order cancels, higher orders are neglected.)

You say (below) that you basically want to get to the
expression f(g + dg)/dg. Nope. What you want is
int {big mess, no deltas}*dg dx
where dg is a quick and dirty "delta phi", of course.

The essential point of calculus of variations is that
if that integral is zero for every variation function dg,
then the big mess in curly brackets is identically zero.

What you have after expanding the "delta's" above is
an integral with BigMess1*dg + BigMess2*(d(dg)/dx) + etc.
You get rid of the derivatives of "delta's" by integrating
by parts. This may introduce boundary conditions that
need to be satisfied. As in, for

INT [mess*D(dg)] dx = [mess*dg] - INT [D(mess)*dg] dx

"mess*dg" should = zero evaluated at the integral's limits.

I've got to go now, but maybe this is a start. I expect
you'll get other help, including correction to my "help"
if I goofed anywhere.
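
As a hedged illustration of the integrate-by-parts step, take a generic one-dimensional integrand F(x, g, g') on an interval [a, b]. Here F, a and b are hypothetical placeholders and not Luca's cost function. Under that assumption the first-order expansion and the integration by parts read:

```latex
\delta f
  = \int_a^b \left( \frac{\partial F}{\partial g}\,\delta g
      + \frac{\partial F}{\partial g'}\,\frac{d(\delta g)}{dx} \right) dx
  = \left[ \frac{\partial F}{\partial g'}\,\delta g \right]_a^b
    + \int_a^b \left( \frac{\partial F}{\partial g}
      - \frac{d}{dx}\frac{\partial F}{\partial g'} \right) \delta g \, dx
```

If the bracketed boundary term vanishes, the factor multiplying delta g in the last integral plays the role of the "big mess", and requiring it to be zero is the usual Euler-Lagrange condition.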

Jim Burns

Luca

Jan 13, 2011, 7:12:46 PM

Oh wow... I guess I missed a lot. I will need to rethink this. I am
not sure I understood all of your explanation but I will brood over it
tonight and try to figure it out in the morning.

Many thanks for replying,

Luca

Ray Vickson

Jan 14, 2011, 12:23:40 PM

On Jan 13, 2:57 pm, Luca <luca.pampar...@gmail.com> wrote:
> Hello,
>
> I have asked this question on another forum where the equations are
> rendered. Please see here:
>
> http://physicsforums.com/showthread.php?t=463278
>
> I need a bit of help with finding the first variation of an expression
> that I have come up with in my problem. I am working on an image
> processing algorithm where I minimise a cost function using a
> conjugate gradient optimisation scheme and hence need to calculate the
> gradient of the cost function. I never took variational calculus in
> school, so am having a bit of trouble.

Just go to the library and take out a book on the subject. Looking at
the first couple of chapters should help a lot.

>
> So, my cost function is as follows:
>
> [tex]f(\varphi) = \int [T(x) - S(u(\nabla\varphi))]^{2} dx[/tex]
>
> Here, 'x' are just the 3-D spatial coordinates and the expression
> above translates to the sum of square difference between an image (T)
> and a transformed image (S). The transformation map is given by
> [tex]u(\nabla\varphi)[/tex].
>
> So what I tried to do was compute the first variation for this
> expression with respect to [tex]\varphi[/tex]. So, my attempt is as
> follows. I looked up some information that I could find on the
> internet regarding this and did the following:
>
> Let [tex]A = T(x) - S(u(\nabla\varphi))[/tex]; then, introducing a
> perturbation in [tex]\varphi[/tex], we have:
>
> [tex]f(\varphi + \delta\varphi) = \int[A + \delta\varphi]^{2} dx[/tex]

This looks wrong. I'll use p instead of varphi and e instead of
epsilon. Let h be a "perturbation function", so the new argument of f
is p + e*h. Here, e is a small parameter that we will ultimately
allow to --> 0. We have
f(p+e*h) = int [T(x) - S(u*grad(p + e*h))]^2 dx.

Now, the nature of the expressions you get subsequently depends on a
lot of information you have not told us. Is u*grad(p) a scalar product
of a vector u and the vector grad(p)? That is, S is a univariate
function? Or is u a scalar, so that S is a multivariate function? I am
going to ASSUME u*grad(p) is a scalar product and S is univariate. For
clarity, I will write the scalar product of two vectors 'a' and 'b' as
<a,b>. Then, to first order in e,
f(p + e*h) = int [T(x) - S(<u,grad(p)> + e*<u,grad(h)>)]^2 dx
= int { [T(x) - S(<u,grad(p)>)]^2
  - 2*e*[T(x) - S(<u,grad(p)>)]*S'(<u,grad(p)>)*<u,grad(h)> } dx
= f(p) - 2*e*int [T(x) - S(<u,grad(p)>)]*S'(<u,grad(p)>)*<u,grad(h)> dx,
so the first variation (the coefficient of e) is
delta f(p;h) = - 2*int [T(x) -
S(<u,grad(p)>)]*S'(<u,grad(p)>)*<u,grad(h)> dx. This is essentially
the directional derivative of f(p) in the direction h.

If you had some other meaning in mind for S and for u*grad(p), you can
just change the necessary parts of the above derivation.
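
A formula like this can be sanity-checked numerically before it goes into a conjugate gradient routine. The sketch below is only an illustration under the scalar-product assumption: it works in one dimension, so <u,grad(p)> reduces to u*p', and the choices of T, S, u, p and h are hypothetical stand-ins, not taken from the actual image problem. It compares the first-variation formula with a central finite difference of f along h.

```python
import numpy as np

# Hypothetical 1-D stand-ins; none of these come from the thread.
x = np.linspace(0.0, 1.0, 201)
dx = x[1] - x[0]
u = 0.7                      # the "vector" u is a plain scalar in 1-D
T = np.cos(2.0 * x)          # stand-in for the target image T(x)
S = np.tanh                  # stand-in for a univariate S


def dS(q):
    """Derivative S'(q) of the stand-in S = tanh."""
    return 1.0 / np.cosh(q) ** 2


def grad(g):
    """Discrete derivative of samples g on the uniform grid x."""
    return np.gradient(g, dx)


def f(p):
    """Discretised cost: sum of [T - S(u * p')]^2 times dx."""
    r = T - S(u * grad(p))
    return np.sum(r ** 2) * dx


def first_variation(p, h):
    """Discretised -2 * int [T - S(q)] * S'(q) * u * h' dx with q = u * p'."""
    q = u * grad(p)
    return -2.0 * np.sum((T - S(q)) * dS(q) * u * grad(h)) * dx


p = np.sin(3.0 * x)          # hypothetical current estimate of phi
h = x * (1.0 - x)            # hypothetical perturbation direction
eps = 1e-5

# Central finite difference of f along h, to compare with the formula.
fd = (f(p + eps * h) - f(p - eps * h)) / (2.0 * eps)
analytic = first_variation(p, h)
print(fd, analytic)
```

The two printed numbers should agree to several digits. A mismatch would usually point to a sign or chain-rule slip in the derivation.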

>
> Now expanding, I have:
>
> [tex]f(\varphi + \delta\varphi) = \int A^{2} dx + 2\int A\,\delta\varphi\, dx + \int(\delta\varphi)^{2} dx[/tex]
>
> which is the same as
>
> [tex]f(\varphi + \delta\varphi) - f(\varphi) = 2\int A\,\delta\varphi\, dx + \int(\delta\varphi)^{2} dx[/tex]
>
> Now, this is where I am stuck. I need to basically get the expression:
>
> [tex]f(\varphi + \delta\varphi) / \delta\varphi[/tex], right?

No. delta \varphi is a _function_ and f is a functionAL, not a
function. You need to look at directional derivatives, such as the one
I derived above. Generally speaking, a notion such as "derivative"
must be re-defined and used a bit differently when you are looking at
functionals rather than functions.
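
For reference, the directional (Gateaux) derivative of a functional is usually defined as below. This is the standard textbook definition, written with epsilon and h in the roles of the e and h used earlier, and is not specific to this problem:

```latex
\delta f(\varphi; h)
  = \lim_{\epsilon \to 0} \frac{f(\varphi + \epsilon h) - f(\varphi)}{\epsilon}
  = \left. \frac{d}{d\epsilon}\, f(\varphi + \epsilon h) \right|_{\epsilon = 0}
```

The division is by the scalar epsilon, never by the function delta phi.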

R.G. Vickson

Luca

Jan 16, 2011, 5:28:05 AM

u is a functional here. So, it is not a scalar product but a
transformation function (a spatial mapping function). So, in my image
processing example, T(x) gives the intensity in image 'T' at spatial
position x, and S(u(grad(x))) gives the intensity at position
u(grad(x)), where u(grad(x)) is a mapping functional.

Luca

Jan 16, 2011, 5:33:14 AM

Sorry, I meant u(grad(phi)) rather than u(grad(x)).

Lynne Vickson

Jan 16, 2011, 6:52:21 PM

> > > f(p+e*h) = int [T(x) - S(u*grad(p + e*h))]^2 dx.

>
> > > Now, the nature of the expressions you get subsequently depend on a
> > > lot of information you have not told us. Is u*grad(p) a scalar product
> > > of a vector u and the vector grad(p)?
>
> > u is a functional here. So, it is not a scalar product but a
> > transformation function (a spatial mapping function). So, in my image
> > processing example, T(x) gives the intensity in image 'T' at spatial
> > position x, and S(u(grad(x))) gives the intensity at position
> > u(grad(x)), where u(grad(x)) is a mapping functional.
>
> Sorry, I meant u(grad(phi)) rather than u(grad(x)).

So, you have f(p) = int [T(x) - S(u(grad(p)))]^2 dx. The integrand at p
+ e*h is [T(x) - S(u(w + e*z))]^2, where I have written w = grad(p) and
z = grad(h). Now, to first order in small |e| we have u(w + e*z) =
u(w) + e*<z,U(w)>, where U = grad(u) is the gradient of u with respect
to its argument w. Thus, S[u(w+e*z)] = S[u(w)] +
e*S'[u(w)]*<U(w),z>, so the integrand = [T(x)-S(u(w))]^2 - 2*[T(x) -
S(u(w))]*e*S'[u(w)]*<U(w),z>, hence the first variation of f is
Df(p;h) = -2*int [T(x) - S(u(w))]*S'(u(w))*<U(w),z> dx. I am assuming
that phi (= p) is a function of x (you have not said so), in which
case we should write w(x) and z(x).
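
The first-order expansion u(w + e*z) = u(w) + e*<z,U(w)> is the multivariate Taylor expansion of a scalar-valued u. As a rough numerical illustration, the sketch below uses a made-up smooth function u of a 3-vector, with U coded by hand as its gradient, and checks that the remainder shrinks like e^2. The particular u, w and z are assumptions chosen only for the demonstration.

```python
import numpy as np


def u(w):
    """Hypothetical smooth scalar function of a 3-vector w."""
    return np.sin(w[0]) + w[1] * w[2] ** 2


def U(w):
    """Gradient of the hypothetical u with respect to w, derived by hand."""
    return np.array([np.cos(w[0]), w[2] ** 2, 2.0 * w[1] * w[2]])


w = np.array([0.3, -1.2, 0.8])   # stand-in for grad(p) at one point x
z = np.array([1.0, 0.5, -2.0])   # stand-in for grad(h) at the same point


def remainder(e):
    """Size of u(w + e*z) minus its first-order approximation."""
    return abs(u(w + e * z) - (u(w) + e * np.dot(z, U(w))))


errors = [remainder(e) for e in (1e-1, 1e-2, 1e-3)]
print(errors)
```

Each tenfold reduction in e should cut the remainder by roughly a factor of one hundred, which is what "to first order in small |e|" means in practice.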

We can say a bit more: at a minimum, the first variation must = 0 for
all perturbations h. Therefore, if we write h = H(x1) (function of x1
alone, not of x2, x3, ..., xn) we have z = grad h = (H'(x1),
0,0,...,0), hence <U(w(x)),z> = U_1(x)*H'(x1), where U_1(x) denotes the
first component of U(w(x)). Then Df(p;h) = -2*int
[T(x)-S(u(w))]*S'(u(w))*U_1(x)*H'(x1) dx = 0, for any function H(x1).
Depending on boundary conditions, etc., we may be able to simplify
further: if the integral is over all R^n, then we need that
int_{x2,x3,...,xn} [T(x)-S(u(w(x)))]*S'(u(w(x)))*U_1(x) dx2 dx3 ...
dxn = 0 for all x1. You get similar conditions by choosing h = H(x2)
or h = H(x3), ..., h = H(xn). I don't know if these conditions
simplify further; I have my doubts about that.

R.G. Vickson

Luca

Jan 17, 2011, 6:11:35 AM

Many thanks for your reply again!

> So, you have f(p) = int [T(x) - S(u(grad(p)))]^2 dx. The integrand at p
> + e*h is [T(x) - S(u(w + e*z))]^2, where I have written w = grad(p) and
> z = grad(h). Now, to first order in small |e| we have u(w + e*z) =
> u(w) + e*<z,U(w)>, where U = grad(u).

Could you explain this step? How did you get:

u(w + e*z) = u(w) + e*<z,U(w)>

Many thanks again. I really appreciate your help.

Luca

Jan 17, 2011, 11:43:14 AM

Because I was thinking that maybe through the use of Taylor series,
one could write something like:

u(w+e*z) = u(w) + u'(w)*<e, z> where z = grad(h)

But I am guessing that is not correct...

Luca