Although "sgn(x)" as you call it is not continuous,
it does have some nice properties (e.g. having
bounded total variation).
The function abs(x) = |x| is continuous and differentiable
everywhere except at x = 0.
The theory of integration for functions like "sgn(x)"
works well and, up to an additive constant, the
indefinite integral of "sgn(x)" is abs(x). So the
Fundamental Thm. of Calculus is in good shape here!
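To spell out the integration claim (a worked check added for clarity, with the lower limit 0 chosen only for convenience), integrating sgn case by case gives:

```latex
\int_0^x \operatorname{sgn}(t)\,dt =
\begin{cases}
\displaystyle\int_0^x 1\,dt = x, & x \ge 0,\\[6pt]
\displaystyle\int_0^x (-1)\,dt = -x, & x < 0,
\end{cases}
\qquad\text{so}\qquad
\int_0^x \operatorname{sgn}(t)\,dt = |x| .
```

The value assigned to sgn(0) plays no role here, since changing the integrand at a single point does not change the integral.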
hope this helps, chip
Perhaps because it only has a problem on a set of measure zero.
As a first approximation you can often replace things by the average of
their right and left continuous versions. Magically the usual version of sgn
appears.
On Nov 24, 8:15 pm, Gordon Sande <g.sa...@worldnet.att.net> wrote:
> On 2009-11-24 21:00:09 -0400, aruzinsky <aruzin...@general-cathexis.com> said:
>
> > Despite widespread pretense that d |x| / dx = sgn(x) for -oo<x<oo,
> > d |x| / dx is undefined at x = 0. How do so many people get away with
> > this?
>
> Perhaps because it only has a problem on a set of measure zero.
>
> As a first approximation you can often replace things by the average of
> their right and left continuous versions. Magically the usual version of sgn
> appears.
Not when sgn(x) is used in iterative methods where the sought solution
and non-solutions occur at sets of xi = 0. Such an iterative process
can stick at a wrong set of xi = 0. For example, see
http://www.duke.edu/~sf59/SRfinal.pdf , Eq. 22 which is falsely
alleged to solve Eq. 21. I say "falsely" because, if steepest descent
worked reliably here, it would be used to solve general Linear
Programming problems.
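A toy illustration of this kind of stalling, as a sketch only: the objective f(x, y) = |x| + 3|x - y| is made up for the example and is not Eq. 21 or Eq. 22 of the cited paper, and the names `pseudo_grad` and `descend` are hypothetical. The formal gradient is built with sgn(0) = 0, and a simple backtracking search looks for any decrease along its negative.

```python
def sgn(v):
    # Usual convention: sgn(0) = 0.
    return (v > 0) - (v < 0)

def f(x, y):
    # Made-up nonsmooth convex objective; its minimum is f(0, 0) = 0.
    return abs(x) + 3.0 * abs(x - y)

def pseudo_grad(x, y):
    # Formal gradient obtained by writing d|v|/dv = sgn(v) everywhere.
    s1, s2 = sgn(x), sgn(x - y)
    return (s1 + 3.0 * s2, -3.0 * s2)

def descend(x, y, iters=50):
    for _ in range(iters):
        gx, gy = pseudo_grad(x, y)
        t = 1.0
        # Backtracking: halve the step until f strictly decreases.
        while t > 1e-12 and f(x - t * gx, y - t * gy) >= f(x, y):
            t *= 0.5
        if t <= 1e-12:
            break  # no decrease found along the negative formal gradient
        x, y = x - t * gx, y - t * gy
    return x, y

stuck = descend(1.0, 1.0)
```

Starting on the kink x = y at (1, 1), the formal gradient is (1, 0), and f(1 - t, 1) = 1 + 2t for 0 <= t <= 1, so every trial step increases f. The iteration stays at (1, 1) with f = 1 although the minimum is f(0, 0) = 0. Note that the formal gradient is not zero at the stall point; the search direction is simply not a descent direction there.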
For similar reasons, I suspect that a discrete iterative method
arising from the PDE, http://www.general-cathexis.com/images/TVPDE.png,
is invalid, but I am uncertain whether grad(ui) = 0 occurs at
solutions and/or non-solutions.
You seem to have changed the question. Nonsmooth minimization has lots of
problems and there are many papers for various variations. Probably there
will be many more to come as there are many ways to not be smooth.
How did I change the question? Do you have anything to say about
http://www.general-cathexis.com/images/TVPDE.png ?
It is the result of a very rigorous field of math called
Distribution Theory, invented by Schwartz. DT is also known as
Generalized Function Theory, as named by Lighthill. I recall that
Lighthill's book was understandable in my 1st year grad school
Applied Math.
DT justifies, in a rigorous way, the cavalier use of derivatives of
the unit step function.
There are enough keywords in the 2nd paragraph to get a serious
search underway.
Hope this helps.
Greg
That is the Dirac delta, not absolute value. I wouldn't cavalierly
use Dirac deltas in steepest descent, either.
I have no problems with reading
\frac{\partial u(x,t)}{\partial t} = \mbox{div}\left( \frac{\nabla u}{\|\nabla u\|} \right)
as long as boundary and initial conditions guarantee that u has no
stationary points in the region. but even then, the equation could make quite well sense
if meant in the weak (L2) sense. (that means the author intended it as shorthand
notation)
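For concreteness, here is a minimal 1D sketch of one common way to step such an equation numerically. Everything specific in it is an assumption made for illustration and is not taken from the linked image or from any paper in this thread: unit grid spacing, zero-flux boundaries, an explicit Euler step, the hypothetical name `tv_flow_step`, and above all the small `eps` under the square root, which sidesteps the division by zero at stationary points by replacing ||grad u|| with sqrt(|grad u|^2 + eps^2).

```python
import math

def tv_flow_step(u, dt, eps):
    """One explicit Euler step of u_t = d/dx( u_x / sqrt(u_x^2 + eps^2) )
    on a unit-spaced 1D grid with zero flux at both ends (all assumed)."""
    n = len(u)
    # Regularized flux at the interface between cells i and i+1.
    flux = []
    for i in range(n - 1):
        d = u[i + 1] - u[i]
        flux.append(d / math.sqrt(d * d + eps * eps))
    new_u = []
    for i in range(n):
        right = flux[i] if i < n - 1 else 0.0  # zero flux at right boundary
        left = flux[i - 1] if i > 0 else 0.0   # zero flux at left boundary
        new_u.append(u[i] + dt * (right - left))
    return new_u

u0 = [0.0, 0.0, 1.0, 1.0]
u1 = tv_flow_step(u0, dt=0.01, eps=0.1)
```

With these choices the step conserves the sum of u and leaves constant data unchanged. An explicit step like this is only stable for dt small relative to eps, which is one reason the unregularized equation (eps = 0) needs more care.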
as others also I have problems with your question:
for the theory of PDE jump discontinuities pose no problems as long as
you stay in L2 or even distribution theory.
for a practical solution, this is quite different:
first order hyperbolic pde's can be solved also numerically with jump discontinuities
in the initial and boundary conditions as long as you use appropriate
solvers, e.g. Riemann solvers and as long as you can satisfy stability criteria
but I cannot see what role a gradient method for linear systems solution should
play here. do you think about an implicit discretization scheme? and if, why at all?
In the finite dimensional scheme, if you have a function minimization with
a nonsmooth function, then indeed you never should use gradient based methods
since almost for sure they will stick up at a nonoptimal point. but there
are meanwhile (even publicly available) methods for dealing with
the minimization of Lipschitz continuous functions (like the abs(.))
see
http://plato.asu.edu/guide.html
and if nothing works, you might think about smoothing methods
for abs(x) approx= (log(2)+log(1+cosh(a*x)))/a for a "large"
or even sign(x) approx = tanh(a*x)
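A small sketch of the two smoothings just mentioned, assuming a > 0. The names `smooth_abs` and `smooth_sign` are hypothetical. The abs formula is rewritten in an algebraically equivalent form, |x| + 2*log(1 + exp(-a*|x|))/a, because cosh(a*x) overflows in floating point for large a*x.

```python
import math

def smooth_abs(x, a):
    """Smooth approximation (log(2) + log(1 + cosh(a*x))) / a of abs(x),
    evaluated in an overflow-safe equivalent form. Assumes a > 0."""
    z = abs(a * x)
    return (z + 2.0 * math.log1p(math.exp(-z))) / a

def smooth_sign(x, a):
    """Smooth approximation tanh(a*x) of sign(x). Assumes a > 0."""
    return math.tanh(a * x)

# Direct evaluation of the formula as posted, for comparison at a
# moderate argument where cosh does not overflow.
naive = (math.log(2) + math.log(1 + math.cosh(2.0 * 1.5))) / 2.0
```

The approximation error of `smooth_abs` is largest at x = 0, where it equals 2*log(2)/a, and it decays exponentially in a*|x| away from zero.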
hth
peter
Why do you assume I am familiar with this notation?
> as long as boundary and initial conditions guarantee that u has no
> stationary points in the region. but even then, the equation could make quite well sense
> if meant in the weak (L2) sense. (that means the author intended it as shorthand
> notation)
> as others also I have problems with your question:
> for the theory of PDE jump discontinuities pose no problems as long as
> you stay in L2 or even distribution theory.
> for a practical solution, this is quite different:
> first order hyperbolic pde's can be solved also numerically with jump discontinuities
> in the initial and boundary conditions as long as you use appropriate
> solvers, e.g. Riemann solvers and as long as you can satisfy stability criteria
> but I cannot see what role a gradient method for linear systems solution should
> play here. do you think about an implicit discretization scheme? and if, why at all?
> In the finite dimensional scheme, if you have a function minimization with
> a nonsmooth function, then indeed you never should use gradient based methods
> since almost for sure they will stick up at a nonoptimal point.
That last part is sort of what I already said. The question whether
the cause of failure is discontinuous or undefined derivatives seems
moot.
> but there
> are meanwhile (even publicly available) methods for dealing with
> the minimization of Lipschitz continuous functions (like the abs(.))
> see http://plato.asu.edu/guide.html
> and if nothing works, you might think about smoothing methods
> for abs(x) approx= (log(2)+log(1+cosh(a*x)))/a for a "large"
> or even sign(x) approx = tanh(a*x)
>
You have wrongly assumed that this is my problem. It is not my
problem, i.e., I already know how to do proper L1 norm minimization,
and I find your insinuation otherwise offensive. The question that you
failed to address was, "How do so many people get away with this?" In
fact, http://www.duke.edu/~sf59/SRfinal.pdf , and many other papers
are wrong because "if you have a function minimization with a
nonsmooth function, then indeed you never should use gradient based
methods since almost for sure they will stick up at a nonoptimal
point."
You and others who are aware of this either are
1. unaware of such published mistakes.
or
2. remiss as referees by passing such mistakes in peer reviewed
papers.
or
3. failed to write letters to publishers correcting mistakes after the
papers are published.
Why, it is almost as if someone published 2 + 2 = 5, and then many
other authors copied and published that mistake while people who knew
better allowed it to happen. Just think how much social chaos that
would cause.
Now that you know, why don't you write a letter to the publisher?
And, seek out similar papers and write letters to their publishers?
You missed the point.
sgn(x) = 2*u(x) - 1
D( sgn(x) )/Dx = 2*d(x)
abs(x) = x*sgn(x)
D( abs(x) )/Dx = 1*sgn(x) + x*2*d(x)
= sgn(x)
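The last step relies on x*d(x) = 0, which the lines above use without stating. A sketch of why it holds in the distributional sense, writing \delta for d(x) and \varphi for an arbitrary smooth test function (standard notation assumed here, not used in the post):

```latex
\langle x\,\delta,\ \varphi \rangle
  = \langle \delta,\ x\,\varphi \rangle
  = \bigl( x\,\varphi(x) \bigr)\big|_{x=0}
  = 0
\quad\Longrightarrow\quad
x\,\delta(x) = 0,
\qquad
\frac{d}{dx}\,|x| = \operatorname{sgn}(x) + 2x\,\delta(x) = \operatorname{sgn}(x).
```

This is an identity between distributions, i.e. it holds after integrating against test functions. By itself it does not assign a pointwise value to the derivative at x = 0.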
> I wouldn't cavalierly use Dirac deltas in steepest descent,
> either.
Correct; neither cavalierly nor civilianly.
However, the magician astounds all:
He reaches in a hat full of PDEs and absolute values
and pulls out a steepest descent.
Amazing!
Hope this helps.
Greg
>You have wrongly assumed that this is my problem. It is not my
>problem, i.e., I already know how to do proper L1 norm minimization,
>and I find your insinuation otherwise offensive. The question that you
>failed to address was, "How do so many people get away with this?" In
>fact, http://www.duke.edu/~sf59/SRfinal.pdf , and many other papers
>are wrong because "if you have a function minimization with a
>nonsmooth function, then indeed you never should use gradient based
>methods since almost for sure they will stick up at a nonoptimal
>point."
>
>You and others who are aware of this either are
>
>1. unaware of such published mistakes.
>
>or
>
>2. remiss as referees by passing such mistakes in peer reviewed
>papers.
>
>or
>
>3. failed to write letters to publishers correcting mistakes after the
>papers are published.
>
>Why, it is almost as if someone published 2 + 2 = 5, and then many
>other authors copied and published that mistake while people who knew
>better allowed it to happen. Just think how much social chaos that
>would cause.
>
>Now that you know, why don't you write a letter to the publisher?
>And, seek out similar papers and write letters to their publishers?
>
Aaaah, now I see the sense behind your original question, but you should
have written in your initial post what you wrote here.
Concerning the paper you pointed to: yes, there are errors
(the gradient of the l1-norm is indeed quite funny)
but due to the regularization terms used this seemingly had no severe influence.
(although I found the results there not quite impressive)
well, maybe I am too happy to read only journals with a high quality refereeing
process and detect errors of this level not too often.
If you feel disappointed by such, then you should not address this to this
group which cannot hinder this.
you have two possibilities:
1. write to the authors and ask for explanation.
maybe you get no answer.
2. write a paper "remark on the paper.... of ... " there you point out
the error and show that you can do better.
if this is really better, then such a paper will be published,
possibly after the editor did contact the authors of the wrong
statement giving them a chance for correction (which results in another paper,
and indeed, there are too many papers around)
you should be assured that as a referee and associate editor I do my best
to avoid the publication of nonsense, and this takes a lot of my time.
best wishes
p. spellucci
There is no such word as "civilianly", but I will civilly tell you that
sgn(x) can be replaced in your equations by any of an infinite number of
functions f(x) satisfying
f(x) = -1, x<0
f(x) = 1, x>0
f(x) = C, x=0, -oo<C<oo
and, yet, using different C would likely affect the stationary points
in the aforementioned steepest descent. And, let's make it perfectly
clear that I am the least magician like here.
How "seemingly"? There are no exact minimum L1 norm results to
compare. For all you know, the author's false solutions are MORE
visually appealing than the exact solutions. But, then he would have
a very hard time explaining it, huh?
> (although I found the results there not quite impressive)
>
> well, maybe I am too happy to read only journals with a high quality refereeing
> process and detect errors of this level not too often.
Your greater concern should be correcting mistakes in applied
research, whereby those mistakes are likely to manifest in your
everyday life, e.g., cars and appliances, not to mention global
warming, and bite you in the ass.
> If you feel disappointed by such, then you should not address this to this
> group which cannot hinder this.
Why can't you hinder this? This is only a specific instance of a more
general problem.
> you have two possibilities:
> 1. write to the authors and ask for explanation.
> maybe you get no answer.
Done; no answer.
> 2. write a paper "remark on the paper.... of ... " there you point out
> the error and show that you can do better.
> if this is really better, then such a paper will be published,
> possibly after the editor did contact the authors of the wrong
> statement giving them a chance for correction (which results in another paper,
> and indeed, there are too many papers around)
My methods are commercially proprietary so I can't show that I can do
it better.
> you should be assured that as a referee and associate editor I do my best
> to avoid the publication of nonsense, and this takes a lot of my time.
> best wishes
> p. spellucci
I have only refereed in IEEE Trans. Acoust., Speech, and Signal
Processing and had no problems with the editors. However, most of the
other referees I've encountered were sloppy and unethical. Typically,
authors who previously published in the journal were asked to
referee. The most prolific authors/referees spent most of their free
time rewriting minor variations of the same papers and submitting them
to different journals at the same time. Then, as referees, they
delayed the publication process by about 2 years and were sloppy. On
the publication end, their chance of success was unfairly increased.
That is why many prolific authors are evil, i.e., cause social chaos.
You may mean -1 <= C <= 1.
However, even then it is not true.
It depends on the application. For example, the Fourier Series
representation of any piecewise smooth function converges, at each
jump, to the average of the right and left hand limits.
I cannot think of any uniform approximator for which this is not true.
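A quick numerical check of the Fourier statement for sgn itself, as a sketch: it uses the standard series of sgn(x) on (-pi, pi), namely (4/pi) * sum over k >= 0 of sin((2k+1)x)/(2k+1). The function name `sgn_partial_sum` is hypothetical.

```python
import math

def sgn_partial_sum(x, n_terms):
    """Partial sum of the Fourier series of sgn(x) on (-pi, pi):
    (4/pi) * sum_{k=0}^{n_terms-1} sin((2k+1)*x) / (2k+1)."""
    total = 0.0
    for k in range(n_terms):
        m = 2 * k + 1
        total += math.sin(m * x) / m
    return 4.0 / math.pi * total

at_jump = sgn_partial_sum(0.0, 200)          # every term is sin(0) = 0
away = sgn_partial_sum(math.pi / 2, 2000)    # approaches sgn(pi/2) = 1
```

At the jump every partial sum is exactly 0, the average of the one-sided limits -1 and +1, while away from the jump the sums approach the function value.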
> and, yet, using different C would likely affect the stationary points
> in the aforementioned steepest descent. And, let's make it perfectly
> clear that I am the least magician like here.
Magicians perform sleight of hand, not magic.
Transforming a thread with a title concerning PDEs and absolute
values to one concerning the inappropriate use of steepest descent
to minimize a discontinuous objective is a pretty good trick.
Don't sell yourself short.
Greg
No, I don't.
f(x) = 2*u(x) - 1 for x!=0
f(x) = 2*u(x) - 1 + C for x=0, -oo<C<oo
D( f(x) )/Dx = 2*d(x)
abs(x) = x*f(x) for x!=0
abs(x) = x*(f(x) + C) for x = 0, -oo<C<oo
D( abs(x) )/Dx = 1*f(x) + x*2*d(x)
= f(x)
As I said before, I believe ( I took the course 45 years ago!)
both Schwartz and Lighthill require the values of piecewise
continuous functions to be evaluated as the average of the
left and right hand limits. Therefore, your 2nd equation
should be replaced by
f(x) = (-1+1)/2 for x = 0.
Hope this helps,
Greg
One should not aggressively assert irrelevant arbitrary definitions,
especially from 45 year old memory.
If you look at http://en.wikipedia.org/wiki/Gradient_descent,
"Gradient descent is based on the observation that if the real-valued
function is defined and differentiable in a neighborhood of a
point ...". If you believe that abs(x) is defined and differentiable
at x = 0, what would be your rationale to not use abs(x) in steepest
descent (You previously implied that you wouldn't by calling the
author a "magician")?
One should not aggressively denigrate relevant information just
because you cannot understand it.
> If you look at http://en.wikipedia.org/wiki/Gradient_descent,
> "Gradient descent is based on the observation that if the real-valued
> function is defined and differentiable in a neighborhood of a
> point ...". If you believe that abs(x) is defined and differentiable
> at x = 0, what would be your rationale to not use abs(x) in steepest
> descent (You previously implied that you wouldn't by calling the
> author a "magician")?
If you look at http://groups.google.com/group/sci.math.num-analysis
you will see that my posts refer to the OP
START QUOTE
Despite widespread pretense that d |x| / dx = sgn(x) for -oo<x<oo,
d |x| / dx is undefined at x = 0. How do so many people get away
with this?
END QUOTE
I have given you references that rigorously justify the extension
of differentiation to include d |x| / dx = sgn(x). Hence, I have
answered your OP with relevant information.
Your subsequent magical escalation to the assumption that I would use
or recommend the extension to include numerical optimization via
steepest descent is self-generated delusion.
Greg Heath
You are a liar in stating I made the assumption that you would "use or
recommend the extension to include numerical optimization via
steepest descent" because I clearly asked you why you would not:
> > If you believe that abs(x) is defined and differentiable
> > at x = 0, what would be your rationale to not use abs(x) in steepest
> > descent (You previously implied that you wouldn't by calling the
> > author a "magician")?
which you have failed to answer.
Go back and read my posts more carefully. If you need help in
understanding them, ask your 3rd grade teacher because I am
outta here.
G. Heath
Such an emotional display over semantics indicates that you are
insane. A sane person wouldn't care so much that I called steepest
descent a "PDE". Apparently, in your mind, my usage is a travesty
much worse than you lying "Hope this helps" in your posts.