
Fisher Information - Frieden unification Of Physics


james mchugh

Oct 5, 1999

In article <37f7fae4...@news.btinternet.com>, james....@btinternet.com
says...
>
>In regard to Roy Frieden's book "Physics from Fisher Information", have
>there been any further developments in this field, or any criticism of
>his theory by others?

any comments anyone?


Christopher Tong

Oct 6, 1999

There is a book review by TWB Kibble in Contemporary Physics,
vol. 40, p. 357 (1999). The upshot of the review is that there
are many valid
applications, especially in statistical mechanics, but trying
to derive "all" of physics from "the principle of extreme
physical information" is too ambitious and "misguided".

tonyC

Oct 7, 1999

In article <7td9db$jvj$3...@willow.cc.kcl.ac.uk>, james....@kcl.ac.uk
(james mchugh) wrote:
> In article <37f7fae4...@news.btinternet.com>, james....@btinternet.com
> says...
> >
> >In regard to Roy Frieden's book "Physics from Fisher Information", have
> >there been any further developments in this field, or any criticism of
> >his theory by others?
>
> any comments anyone?
>
>

Well, I can't afford the book just for play, but what I've seen on the
net [very little of substance btw] I can only wonder why this universal
panacea, touted as the TOE/master theory/problem solver to unify all
physics it seems, has been around a pretty long time, and has gotten
almost no pop press. The little mentioned about the basic premise seems
to talk of statistical theory and "expectation"-like values etc etc. I
wonder if it's all simply some advanced stat approach clothed in new
garb language, and a very small dash of new age philosophy/metaphysics
to justify general hypothesis tests and problem solving. 'Course it's
not fair of me to say this w/o reading the book, but I am just
speculating based on why more physicists haven't come out on this more
widely.

....tonyC




Chris Hillman

Oct 9, 1999

On 6 Oct 1999, Christopher Tong wrote:

> There is a book review by TWB Kibble in Contemporary Physics,
> vol. 40, p. 357 (1999). The upshot of the review is that there
> are many valid
> applications, especially in statistical mechanics, but trying
> to derive "all" of physics from "the principle of extreme
> physical information" is too ambitious and "misguided".

Just thought I'd add that a much older information theoretic approach to
statistical mechanics, due to Jaynes, has the advantage of being
nonparametric (doesn't assume any model) and IMO has a clearer theoretical
rationale for the interpretation of the "entropies" defined in that
theory.

Chris Hillman

Home Page: http://www.math.washington.edu/~hillman/personal.html

Barry Adams

Oct 10, 1999

On 7 Oct 1999 13:40:45 GMT, tonyC <gk...@my-deja.com> wrote:

[Text reformatted to longer lines]

>Well, I can't afford the book just for play

I could; since leaving university and entering the bad old world of
work I've been eating up popularisations and deeper books about physics.
(I'm looking forward to reading Julian Barbour's The End of Time.)
Frieden's book is not deserving of some of the bad press it gets. It's
certainly sound physics, and Frieden spends a lot of time rederiving
known physics using the EPI principle. In particular he rederives the
Maxwell, Dirac, Klein-Gordon and Schroedinger equations, plus the
Einstein field equations, some statistical physics, and finally the
free-wave Wheeler-DeWitt equation (the quantum equation for
gravitational waves). Any book that covers that much physics in 300
pages has got to be worth recommending to third year physics
students, even if the principle doesn't turn out to be useful for
research (and I think it might).

>but what I've seen on the net [very little of substance btw]
>I can only wonder why this universal panacea, touted as the
>TOE/master theory/problem solver to unify all physics it seems

That isn't really what it is. What it is is a quick and clear way of
cooking up Lagrangians for particular problems; it's not going to
instantly give us the TOE. Someone still has to describe a model
of reality and say what degrees of freedom a field or whatever
has. But you can then feed that model into Frieden's method and
pop out a Lagrangian or a field equation for the model (and not
in an ad hoc way). As an aside, it might be interesting to see how
Frieden's method compares/contrasts or works with the renormalization
group.

>has been around a pretty long time,
>and has gotten almost no pop press.

I know a lot of the maths (like the Cramer-Rao inequalities) has
been around for a while. I think the Principle of Extreme Physical
Information (EPI) is new to Frieden (I could be wrong there). But like,
say, chaos, a theory can bubble around for a bit before it becomes
well known.

>The little mentioned about
>the basic premise seems to talk of statistical theory and "expectation"-
>like values etc etc. I wonder if it's all simply some advanced stat
>approach clothed in new garb language

It's certainly similar to statistical mechanics, and at one point he
derives EPI from statistical mechanics, while at another from
the Fokker-Planck Equations.

>and a very small dash of new age philosophy/metaphysics

He's a bit more instrumentalist than is to my taste, but that
viewpoint does fit his method.

> to justify general hypothesis tests and problem solving.
> 'Course it's not fair of me to say this w/o reading the book, but I am
> just speculating based on why more physicists haven't come out on
> this more widely.

I don't know either; maybe the bad press stopped them from reading
it, or they have misgivings that they can't justify well enough to come
out with. I've seen nothing that shows it to be deeply wrong, or
completely uninteresting, so I'd have to ask a few physicists to
take the time to read and digest the book.

Barry Adams


Stephen Paul King

Oct 11, 1999

[Moderator's note: Quoted text trimmed and moved to top. -P.H.]

tonyC <gk...@my-deja.com> wrote:

>In article <7td9db$jvj$3...@willow.cc.kcl.ac.uk>, james....@kcl.ac.uk
>(james mchugh) wrote:
>> In article <37f7fae4...@news.btinternet.com>, james....@btinternet.com
>> says...
>> >
>> >In regard to Roy Frieden's book "Physics from Fisher Information", have
>> >there been any further developments in this field, or any criticism of
>> >his theory by others?
>>
>> any comments anyone?

Dear Tony,

Do you have inter-library loan services available? Get the
book from the library. I bought the book and read it; Frieden does
not use metaphysics to justify his work! Perhaps it is the fact that
Frieden is not part of the "elite" and not using the doctrinal
"standard" paradigm that his work is so easily dismissed!

Kind regards,

Stephen

http://members.home.net/stephenk1/outlaw/outlaw.html


Chris Hillman

Oct 11, 1999

On 7 Oct 1999, tonyC wrote:

> > >In regard to Roy Frieden's book "Physics from Fisher Information", have
> > >there been any further developments in this field, or any criticism of
> > >his theory by others?
> >

> Well, I can't afford the book just for play, but what I've seen on
> the net [very little of substance btw] I can only wonder why this
> universal panacea, touted as the TOE/master theory/problem solver
> to unify all physics it seems, has been around a pretty long time,
> and has gotten almost no pop press.

Actually, it's gotten a -lot- of pop press. The actual scientists have
been much more sceptical.

There was a lot of (critical) discussion of this book in
sci.physics.research some months back, IIRC. You might try to find those
posts using DejaNews.

Vesselin G Gueorguiev

Oct 12, 1999

Barry Adams wrote:

> [...] rederiving known physics
> using the EPI principle. In particular he rederives the
> Maxwell, Dirac, Klein-Gordon and Schroedinger equations, plus the
> Einstein field equations, some statistical physics, and finally the
> free-wave Wheeler-DeWitt equation (the quantum equation for
> gravitational waves). Any book that covers that much physics in 300
> pages has got to be worth recommending to third year physics
> students, even if the principle doesn't turn out to be useful for
> research (and I think it might).

Sounds interesting, really. If nothing else, at least it's a chance to
take a different angle of view and look at these things.

[....]

> I don't know either, maybe the bad press stopped them from reading
> it, or they have misgiving that they can't justify well enough to come
> out with. I've seen nothing that shows it to be deeply wrong, or
> completely uninteresting, so i'd have to ask a few physicists to
> take the time to read and digest the book.

I don't know about the book, but there are a few articles in Phys. Rev., so I
guess some people have spent time studying these ideas. I have not read them;
I just saved the references. Maybe some day I will get the chance to
read them.

"Lagrangians of physics and the game of Fisher-information transfer",
B.R.Frieden & B.H.Soffer,Phys. Rev. E52, 2274-2286 (1995)

"Foundation for Fisher-information-based derivations of physical laws",
B.R.Frieden & W.J.Cocke,Phys. Rev. E54, 257-260 (1996)

"Limitation on entropy increase imposed by Fisher information",
B. Nikolov and B. R. Frieden,Phys. Rev. E 49, 4815-4820 (1994)

"Spectral 1/f noise derived from extremized physical information",
B.R.Frieden & R.J.Hughes, Phys. Rev. E49, 2644-2649 (1994)

"Fisher information, disorder, and the equilibrium distributions of physics",
B. R. Frieden, Phys. Rev. A 41, 4265-4276 (1990)


Chris Hillman

Oct 15, 1999

On 10 Oct 1999, Barry Adams wrote:

> Frieden's book is not deserving of some of the bad press it gets. It's
> certainly sound physics, and Frieden spends a lot of time rederiving
> known physics using the EPI principle. In particular he rederives the
> Maxwell, Dirac, Klein-Gordon and Schroedinger equations,
> plus the Einstein field equations, some statistical physics and finally
> the free-wave Wheeler-DeWitt equation (the quantum equation for
> gravitational waves).

[snip]

> What it is is a quick and clear way of cooking up Lagrangians for
> particular problems,

Everyone who has visited my entropy pages knows that I am a big fan of
information theory and the nonparametric statistics inspired by Jaynes's
principle of maximal entropy, which has been spun off into such
generalizations or variations as the principle of minimal discrimination
or the principle of minimal description length. So no-one would be more
delighted than I if Frieden has really discovered what he claims:

1. a coherent -method- for guessing (or even -computing-?!!) the right
Lagrangian for a given theory or problem,

2. a profound explanation for why the most useful Lagrangians in physics
tend to share certain characteristics (IIRC, he means a quadratic term),
in terms of information theory or at least the related idea of Fisher
information.

When I first heard about his work, I rushed to download the five or six
papers I could find freely available on the Internet, eager to learn more about
this intriguing work. Unfortunately, I quickly found that none of these
papers seemed to contain a clear explanation of exactly what the alleged
"method" and "explanation" really are. Even worse, nowhere could I find a
convincing statement of just what the alleged "principle" -is-. I had the
impression that the alleged "method" is in fact ad hoc, essentially no
better than the current wisdom that "Lagrangians are good for you", plus
somewhat ad hoc methods for writing down the "right" one for each
situation. At the time, I expressed my disappointment and my suspicion
that there really wasn't much to Frieden's work.

If my (unmet) demand that Frieden be able to explain his idea clearly and
simply seems excessive, permit me to say that I speak as someone who has
had many ideas and can generally boil them down to a paragraph or two. And
I think history shows (particularly the example of Shannon and Jaynes)
that the best ideas are the simplest, although the most unexpected and
original, and that these are precisely the ideas which can be expressed
most succinctly. Particularly if you are allowed to use mathematical
equations.

But maybe Frieden and his coauthors are simply inexperienced writers.
Hence my question: can you summarize in a paragraph or two what the
"method" and the "explanation" boil down to?

> Any book that covers that much physics in 300 pages, has got to be
> worth recommending to third year physics students,

I don't think that's necessarily true. If Frieden has a single coherent
idea which third year physics students can use to derive the right
Lagrangians for all those subjects you listed as -homework problems-, you
might have a point. But if his idea is not coherent or even worse, is
comparatively vapid but draws attention away from the really deep stuff in
those subjects, I think it could be actively -harmful- to teach it.
Superficial "unifications" are as inadvisable in pedagogy as in science.
Average students may not realize they are being had (perhaps acceptable)
but the best students will sense something is wrong, and lose confidence
in their instructor, their textbook, and their school.

Coming back to "it's certainly sound physics", I don't think that's really the
issue. I haven't heard anyone say that Frieden made sign errors in
equation (10.2) or that he badly misunderstood the meaning of the
Wheeler-DeWitt equation. I think the issue is whether EPI really means
much of anything. I don't think the issue is even whether EPI could be a
"false messiah" (if that were a danger, people like me would probably be
very intrigued and spend months or years studying this stuff, instead of
putting it aside after a quick sniff).

If my unfavorable impressions from looking over Frieden's papers are
wrong, though, I'd like you to convince me of that.

> I know a lot of the Maths (like the Cramer-Rao Inequalities) has
> been around for a while.

The Fisher information involves parametric statistics. Roughly speaking
that means assuming particular multiparameter families of densities, and
finding the one which best fits the data, perhaps using some maximal
likelihood criterion if the problem is underdetermined. This is perhaps
not unlike the minisuperspaces John Baez has been discussing. The Shannon
entropy leads to nonparametric (Bayesian) statistical applications, i.e.
no particular family of densities is assumed (although some "prior" may be;
each style of statistics has its own characteristic flaws). Here is a
question for John, if he gets around to studying Frieden's book: can the
Fisher information in Frieden's work be replaced by Shannon information,
and does this then lead to a superspace?

IOW, if there's anything to Frieden's work, my hunch is that we should
have the following analogies:

   parametric statistics      <->   nonparametric statistics
   maximal likelihood(?)      <->   maximal entropy
   minisuperspaces            <->   superspace
   Fisher information         <->   Rokhlin metric?
   Riemannian geometry        <->   Finsler geometry?

Re the last line: I believe the original idea of statistical manifolds
(part of parametric statistics) appeared in a short paper by Bradley
Efron of Stanford, and there is at least one book by Amari on this stuff.
IIRC, the Fisher information matrix appears as the metric tensor, and
then the Gaussian curvature (for a two-parameter model) is some kind of
entropy. On the right side, the Rokhlin metric is

H(A/B) + H(B/A)

(sum of conditional entropies) which is a true metric if you mod out by
zero entropy stuff, much like what you do in forming L^p spaces.

> I think the Principle of Extreme Physical Information (EPI)

Yes, yes, but what -is- the EPI?! I can summarize Jaynes' PME (principle
of maximal entropy) in one paragraph (ask me!), but why is it that no one
seems able to sum up EPI in a paragraph or two? Prove me wrong and I'd be
really happy.

> It's certainly similar to statistical mechanics, and at one point he
> derives EPI from statistical mechanics, while at another from the
> Fokker-Planck Equations.

But I thought EPI was supposed to explain all these Lagrangians in other
subjects, i.e. to be more fundamental than traditional physics?



> I don't know either, maybe the bad press

Actually, I hear the book reviews (by nonscientists) were raves and that
the book is selling well (probably to nonscientists). The only "bad
press" I know about are the negative comments by myself and Nathan Urban
and a few others who actually looked up some of Frieden's papers and
couldn't get any quick sense of what EPI is all about.

> stopped them from reading it, or they have misgiving that they can't
> justify well enough to come out with.

I for one have been forthcoming with my reservations from the day I first
heard about this stuff, the same day I downloaded those papers, the same
day I posted what I think were well-justified reservations.

Incidentally, I wasn't just having a bad day that day, because once or
twice since I have taken a second look and each time I came away with the
same sense that this work is not very coherent.

> I've seen nothing that shows it to be deeply wrong, or completely
> uninteresting, so i'd have to ask a few physicists to take the time to
> read and digest the book.

Well, you've already read the book, and I take it you have some
mathematical and physics background. I have a lot of background in
information theory and math (much less in physics), including a full year
graduate course in functional analysis. So I ask again: having read the
book, can you summarize in a paragraph or two, in language I can
understand, what the "method" and the "explanation" boil down to? Can you
tell me in one sentence just what the EPI -is-?

Barry Adams

Oct 18, 1999

On 15 Oct 1999 13:34:36 GMT, Chris Hillman
<hil...@math.washington.edu> wrote:

>Well, you've already read the book, and I take it you have some
>mathematical and physics background. I have alot of background in
>information theory and math (much less in physics), including a full year
>graduate course in functional analysis. So I ask again: having read the
>book, can you summarize in a paragraph or two, in language I can
>understand, what the "method" and the "explanation" boil down to? Can you
>tell me in one sentence just what the EPI -is-?

OK, I'll have a shot; it's a bit more than a sentence, though. You
seem to know Fisher information already, but I'll briefly cover it.
(This is all taken from his book, BTW.) If we do an experiment to
measure some value T (normally written Theta, but ASCII is lacking in
Greek letters), we may consider our noisy result, y, as containing some
noise x:

y = T + x

The Fisher information, I, is

/
I = | (dp(x)/dx)^2 / p(x) dx
/

where p(x) is the probability density function for getting a noise
value x. The mean square error e^2 in our estimate <T(y)> of
the real value of T obeys

e^2 I >= 1

I is a measure of how much information we may obtain from
a measurement of a system. Frieden introduces J, the bound
information, another functional, describing how much information was
originally in the system. J is where most of the model dependent parts
of the equation go in.
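
[For concreteness, here is a tiny numerical sketch, mine and not from the
book, of the two quantities just defined: the shift model y = T + x with
Gaussian noise of width sigma, and the naive (hypothetical) estimator
T_hat = y, which is unbiased. For Gaussian noise the bound e^2 I >= 1 is
met with equality, and the numbers below come out that way.]

import numpy as np

sigma, T_true = 0.7, 3.0

# Fisher information of the noise density p(x) = N(0, sigma^2),
# I = Int (dp/dx)^2 / p dx, done as a crude Riemann sum on a grid.
x = np.linspace(-10 * sigma, 10 * sigma, 200001)
p = np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
I = np.sum(np.gradient(p, x)**2 / p) * (x[1] - x[0])   # analytically 1/sigma^2

# Mean square error e^2 of the naive estimate T_hat = y.
rng = np.random.default_rng(0)
y = T_true + rng.normal(0.0, sigma, size=1_000_000)
e2 = np.mean((y - T_true)**2)                          # analytically sigma^2

print(I, e2, e2 * I)   # e2 * I comes out very close to 1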

Using probability amplitudes q, q^2(x) = p(x), I may be written

/
I = | ( dq(x)/dx ) ^2 dx
/

(x is generally a four vector).

The EPI principle is

functional delta (I - J) = 0

which may be solved using the usual Euler-Lagrange method.
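
[To see that Euler-Lagrange step in the simplest possible setting, here
is a toy sketch of mine, not one of Frieden's examples: take the Fisher
information density (dq/dx)^2, a hypothetical quadratic bound-information
density w^2 q^2, vary I - J, and read off a wave-type equation.]

from sympy import Function, symbols
from sympy.calculus.euler import euler_equations

x, w = symbols('x w', real=True)
q = Function('q')

# Density of I - J for this toy model: (dq/dx)^2 - w^2 q^2.
info_lagrangian = q(x).diff(x)**2 - w**2 * q(x)**2
print(euler_equations(info_lagrangian, q(x), x))
# -> [Eq(-2*w**2*q(x) - 2*Derivative(q(x), (x, 2)), 0)], i.e. q'' + w^2 q = 0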

Since the information measured should be some or all of the
bound information,

I = kJ    where 0 <= k <= 1.

The hard part is in finding J for the problem. J is a scalar
functional of all the physical aspects of the problem. Typically
Frieden writes J in some general form. For instance, for
classical electrodynamics, with a current vector j and
charge density rho, he writes

/
J = 4c | d^3 r dt Sum E_n J_n (q,j,rho)
/

He additionally always requires one or more equations
describing the invariance of the problem. So for example
in classical electrodynamics he solves the above to obtain,

d'Alembertian q_n = - k E_n q_n^(2k-1) G_n(j,rho)

Then he introduces the Lorentz invariance of the amplitudes q:

   (1/c) dq_4/dt + Sum(n=1 to 3) dq_n/dx_n = 0

He takes the equations and after several pages of manipulation he
obtains the usual wave equation for the problem.

Barry Adams

Chris Hillman

Oct 18, 1999

On 15 Oct 1999, Vesselin G Gueorguiev wrote:

> Chris Hillman wrote:

> > I can summarize Jaynes' PME (principle of maximal entropy) in one
> > paragraph

> Please do so.

It often happens that we are confronted with a large set of hypotheses,
all of which are consistent with a few known facts. In such situations,
it is very helpful to have a general and theoretically well motivated
criterion for choosing the "simplest" hypothesis of all those consistent
with the data. The Principle of Maximal Entropy is the best known of many
such criteria, and one of the most general. To take the simplest
nontrivial example, suppose we have a known function f:X -> R, where X is
finite, and that we also know the mean

a = <f> = sum_{j=1}^n p(x_j) f(x_j)

with respect to an -unknown- probability density p: X -> [0,1]. (That is,
sum_j p(x_j) = 1, and p >= 0 everywhere.) In this case, the data gives
only one constraint on p, and we need some additional criterion to pick
out the "best" hypothesis (density) consistent with this constraint.
The PME says that we should use that p which maximizes the entropy

H(p) = -sum_{j=1}^n p(x_j) log p(x_j)

among all those p which obey the constraint

a = sum_{j=1}^n p(x_j) f(x_j)

The intuition behind this principle is that we should make the least
possible -additional assumptions- about p, beyond the one fact we know,
the mean of f with respect to p. It turns out there is always a unique
maximal entropy measure, and it can be readily computed by the classical
method of Lagrange multipliers, so the PME is computationally effective
and gives a unique "answer". In many situations it has a kind of magical
power; for example it is the best way to very easily -derive- all the
classical probability densities studied in one year introductory
probability theory courses (and yes, undergraduates can do this as easy
homework exercises), by assuming various moments (the mean is only the
first moment; the PME applies when there are any finite number of
constraints, e.g. various moments for various known functions). The
original application was the beautiful derivation by Jaynes from the PME
of the Gibbs density used in statistical mechanics (the example given
above).
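
[A minimal numerical sketch of the above, with my own toy numbers rather
than anything of Jaynes's: on the six-point space X = {0,...,5} with
f(x_j) = j and the single constraint <f> = 1.5, the PME answer has the
Gibbs form p_j = exp(-lambda f_j)/Z, with lambda fixed by the constraint,
found here by bisection rather than by Lagrange multipliers done by hand.]

import numpy as np

f = np.arange(6, dtype=float)   # f(x_j) = j
a = 1.5                         # the one known fact: <f> = a

def mean_at(lam):
    w = np.exp(-lam * f)
    p = w / w.sum()
    return p @ f

# <f> decreases monotonically as lambda increases, so bisect.
lo, hi = -10.0, 10.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if mean_at(mid) > a else (lo, mid)
lam = 0.5 * (lo + hi)

p = np.exp(-lam * f); p /= p.sum()
print("p =", np.round(p, 4), " <f> =", round(p @ f, 3),
      " H =", round(-(p * np.log(p)).sum(), 3))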

> PROBABILITY THEORY -- THE LOGIC OF SCIENCE
> http://omega.math.albany.edu:8008/JaynesBook.html
>
> E. T. Jaynes, Just go and take a look at it, very good!
> Probability Distributions, Decision theory, Orthodox Statistics,
> Physical Measurements, Regressions, Time series Analysis,
> Spectrum/Shape Analysis, Statistical Mechanics ...

Yes, there is a link to this book, which was unfortunately left incomplete
at the time of Jaynes's death (and which was clearly intended as a
passionate plea for the virtues of Bayesian analysis in general and PME in
particular, in a huge variety of fields), on my entropy pages. Most of the key
chapters are complete and very clearly written, including the introduction
to PME, which more or less renders my description above redundant. Another
good introduction to the PME may be found in the undergraduate textbook by
Thomas & Cover, Elements of Information Theory. See also the books of J.
N. Kapur on maximal entropy.

Cf. also the principle of minimal description length (see preprints by Li
and Vitanyi and Gacs on LANL), which is a formulation of PME in the
context of algorithmic information theory. (AIT and classical Shannonian
theory are in some sense equally general, since AIT involves a universal
probability and thus Shannonian theory applies within AIT. On the other
hand, AIT explicitly applies to any universal Turing machine, so in some
sense these principles are all about as general as one can have within
"ordinary mathematics"*).

Chris Hillman

Home Page: http://www.math.washington.edu/~hillman/personal.html

*Logicians study "weird mathematics" in which there may be more general
notions of "sets" or "numbers", for instance, and I am pretty sure this
can lead to a distinct notion of a computation. Cf. too the computation
theory of Smale. No doubt John Baez can comment about this point.

Chris Hillman

Oct 19, 1999

On 18 Oct 1999, Barry Adams wrote:

> Fisher information, I is
>
> /
> I = | (dp(x)/dx)^2 / p(x) dx
> /
>
> where p(x) is the probablity density functional for getting a noise
> value x.

OK, in my post I was referring to the -discrete- version of Fisher's
information (N.B.: thinking of this guy with "counting measure" as the
discrete version might be misleading). Both types of Fisher information
are covered in Cover & Thomas, Elements of Information Theory, Wiley,
1991, IIRC.

Also, I was referring to the Fisher information for a multiparameter
model, which is a matrix involving partial derivatives. I think your guy
is the "continuous" analog (possibly, as I said, a misleading analogy) of
the Fisher information for a one parameter model, i.e. a scalar. I
haven't looked at Cover and Thomas in a while, so I'm not sure I remember
exactly how this goes.
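
[For what it's worth, here is a quick sketch of the multiparameter object
I mean; the example is my own, not taken from Cover & Thomas. For the
two-parameter Gaussian model N(mu, sigma^2) the Fisher information matrix
is diag(1/sigma^2, 2/sigma^2), and it can be estimated as the covariance
of the score vector, i.e. the gradient of the log-likelihood.]

import numpy as np

mu, sigma = 1.0, 2.0
rng = np.random.default_rng(1)
x = rng.normal(mu, sigma, size=1_000_000)

# Score vector: gradient of log p(x; mu, sigma) in the two parameters.
score = np.stack([(x - mu) / sigma**2,
                  (x - mu)**2 / sigma**3 - 1.0 / sigma])

fisher_matrix = score @ score.T / x.size   # E[score score^T]
print(np.round(fisher_matrix, 3))          # approx [[0.25, 0], [0, 0.5]]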

> The mean square error e^2 in our estimates of <T(y)> of
> the real value of T obeys
>
> e^2 I >= 1

A special case of the Cramer-Rao inequality. I don't see where the 1 on
the right hand side comes from, though. Presumably he proves this
inequality?



> I is a measure of how much information we may obtain from a
> measurement of a system.

This is one of the points where I want to ask Frieden to stop and back up
a bit. I have the impression that this interpretation involves a
"measurement" in the sense of quantum mechanics. This notion of
"measurement" is less general, I maintain, than analogous notions in those
parts of decision theory, measurement theory, estimation theory, etc.,
which are based on classical information theory, but I don't want to get
into an argument about this with anyone who hasn't studied (at least)
Thomas & Cover.

Another point: there is an excellent theoretical rationale for the
"information" interpretation of Shannon information (see Shannon 1948,
available free on my entropy pages, for a very clear explanation, and see
my own preprints on entropy, including "An Entropy Primer" and "A Formal
Theory of Information", for generalizations). AFAIK, the theoretical
rationale for the "information" interpretation of Fisher information is
not as clear or as general, although the Cramer-Rao inequality (the
version in Cover and Thomas) helps. If I'm wrong about this, I'd welcome
correction, of course.

I'm not saying the "information" interpretation of I is unreasonable---
I'm trying (perhaps not very successfully) to say that AFAIK, using Fisher
information amounts to making some assumptions which Frieden has not
(AFAIK) made explicit.

> Frieden introduces J, the bound information, another functional,
> describing how much information was originally in the system. J is
> where most of the model dependent parts of the equation go in.

Hmm... well, I guess it's a good sign that he (Frieden) is aware of the
model dependence. Cf. my remarks about parametric versus nonparametric
statistics.



> Using probability amplitudes q, q^2(x) = p(x), I may be written
>
> /
> I = | ( dq(x)/dx ) ^2 dx
> /
>
> (x is generally a four vector).
>
> The EPI principle is
>
> functional delta (I - J) = 0

I.e. first variation of the integral vanishes?

Others might be wondering: why not I + J? IJ? I can answer that myself:
I - J is supposed to be something like the "free" part of the Fisher
information. Or even better, see comment below.



> which may be solved using the usual Euler-Lagrange method.

I think this must mean that the variation of the integral to some order
vanishes (there are n-th order generalizations of the usual Euler-Lagrange
equation as treated for instance in Landau & Lifschitz, Mechanics, which
can be proven the same way, by further integration by parts, etc.).

Anyway, we are making progress because you have reminded me that the
quantity J is -exactly- where I had a big problem with Frieden's papers.
(My feeling is that the Fisher versus Shannon thing is a lesser issue.)
I have the impression (and apparently so did Nathan and John) that J is
the place where this method becomes rather "fuzzy", and that maybe Frieden
has replaced a "bag o' tricks" for picking a Lagrangian with a "bag o'
tricks" for picking J. And Lagrangians have been intensively studied for
a long time and have many deep connections to other things which are also
known to be very important. Whereas none of us (I presume) know what J
"means".

Another point: this I - J has the form

quadratic stuff - right stuff (to make things come out)

and looks awfully like kinetic minus potential energy. Which sounds
familiar :-)

Does Frieden offer a succinct English-language rationale for why we should
expect this integral is stationary under small perturbations? Why we
should indeed regard it as the "general law of physics"?

As a rule of thumb, in Jaynes's approach to statistical mechanics (which
employs the theoretically preferable Shannon information), 1/(kT) times an
energy is an analogous information quantity (and this "reciprocal absolute
temperature" is a natural variable, not absolute temperature). I.e. in the
Legendre duality which naturally arises in that approach, entropy as a
function of internal energy is dual to "free information" as a function of
reciprocal temperature. This free information is 1/(kT) times the free
energy.
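
[Spelled out in symbols -- my own summary of the standard Gibbs/Jaynes
bookkeeping, with Boltzmann's constant absorbed so S is in nats:

  Z(\beta) = \sum_i e^{-\beta E_i}, \qquad
  p_i = e^{-\beta E_i}/Z(\beta), \qquad \beta = 1/(kT),

  S(U) = \beta U + \ln Z(\beta), \qquad
  \beta F(\beta) = -\ln Z(\beta) = \beta U - S(U),

so S as a function of U and beta F (the "free information") as a function
of beta are Legendre transforms of one another.]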

Incidentally, Pesin turns this around, -starts- with an extremely general
Legendre duality and uses it to define topological entropy, metric
entropy, etc., by analogy with Caratheodory's construction of Hausdorff
dimension. Again as a rule of thumb, Shannon entropies (and related
quantities like Kolmogorov-Sinai entropy or metric entropy and topological
entropy) are the Hausdorff dimension of some fractal set. For this, see
the recent book by Pesin, Dimension Theory.

Anyway, this brings us back to my guess that Frieden is working up a
Fisher information analog of Jaynes' approach. The good part about this
idea is that Fisher information has a quadratic nature, so to speak. The
bad part is that Shannon information involves no hidden assumptions (e.g.
existence of a parameterized model) and has a theoretically better
grounded interpretation (IMO) as "information", as compared with Fisher
information.



> Since the information measured should be some or all of the
> bound information,
>
> I = kJ    where 0 <= k <= 1.

This is the kind of thing which makes me go bug-eyed. Why can't Frieden
just say 0 <= J <= I? And is that an interesting little theorem, part of
the definition of J, a mathematical triviality, or what?

In the analogous Shannon theory, the conditional entropy obeys

0 <= H(A/B) <= H(A)

This is necessary for the "information" interpretation of the difference

I(A,B) = H(A) - H(A/B) = H(B) - H(B/A)

but it's an interesting little lemma in Shannon's approach. Note that the
Rokhlin metric I mentioned in my previous post is

d(A,B) = H(A/B) + H(B/A)

which I was guessing might be a nonparametric analogue of the Fisher
information for multiparameter models.
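
[A short numerical sanity check of the quantities above, on a toy joint
distribution of my own choosing: the two one-sided differences give the
same mutual information, and the Rokhlin combination is just the sum of
the two conditional entropies.]

import numpy as np

p = np.array([[0.4, 0.1],
              [0.2, 0.3]])                    # joint pmf; rows = A, cols = B

def H(q):
    q = q[q > 0]
    return -(q * np.log2(q)).sum()

H_AB, H_A, H_B = H(p), H(p.sum(axis=1)), H(p.sum(axis=0))
H_A_given_B, H_B_given_A = H_AB - H_B, H_AB - H_A

print(H_A - H_A_given_B, H_B - H_B_given_A)   # same number: I(A,B)
print(H_A_given_B + H_B_given_A)              # Rokhlin distance d(A,B)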

Also, it seems that H(A), not I(A,B) is analogous to Fisher "information"
in Frieden's approach, and that perhaps I-J is analogous to

H(A) - H(A/B)

If so, it would be interesting if there is some "factor ordering" whereby
one can figure out the analog of

H(B) - H(B/A)

and it is not the same, in Frieden's approach. The "symmetry" of
Shannon's information is necessary to the straightforward information
interpretation, but whole books have been written on "quantum entropies",
defined in terms of operator algebras, whose definition involves factor
ordering problems. The short story is that (last time I checked) no one
knew the "right" generalization of von Neumann's "entropy operator", which
would be involved in Connes' programme, for instance.

This raises the question of connections between Frieden's approach and
operator algebras.



> The hard part is in finding J for the problem.

Aha.

> J is a scalar functional of all the physical aspects of the problem.

Sounds rather like a Lagrangian :-) Or rather (see above) an information
analogue of potential energy.

Let me throw this thought out for discussion: by Occam's razor, the
Lagrangian approach is intrinsically superior because it requires one
scalar function, not two.

Also, if the Fisher information used by Frieden is always a scalar, I
think he is in effect restricting himself to one parameter models (at
least with respect to the "information" interpretation of I; I have the
feeling that -different- notions of "parameterized model" might be
involved for I and J; if so this is a further complication in his
approach; if not, it is an unexplained limitation in the type of models
considered).

> Typically Frieden writes J in some general form. For instance for
> classical electrodynamics, with a current vector j and charge density
> rho, he writes
>
> /
> J = 4c | d^3 r dt Sum E_n J_n (q,j,rho)
> /

Spherically or cylindrically symmetric expanding wave? (r dt?)

And what is the method whereby he picks this particular J? What goes
wrong if you pick a different J? How do you know which one will work,
other than trial and error?

I think we'll all grant this much, though: it's always interesting to have
a different approach to deriving the fundamental equations of physics, and
maybe further work in this area will uncover connections between this J
and other things in math/physics. Nontrivial connections like that would
make Frieden's approach look much more interesting, I think. For
instance, I itself wouldn't be as interesting to me if I didn't know about
the more general form of the Cramer-Rao inequality given in Cover and Thomas.

Vesselin G Gueorguiev

Oct 20, 1999

Chris Hillman wrote:
>
> On 15 Oct 1999, Vesselin G Gueorguiev wrote:
>
> > Chris Hillman wrote:
>
> > > I can summarize Jaynes' PME (principle of maximal entropy) in one
> > > paragraph
>
> > Please do so.

[...]

> H(p) = -sum_{j=1}^n p(x_j) log p(x_j)
>
> among all those p which obey the constraint
>
> a = sum_{j=1}^n p(x_j) f(x_j)
>

[...]

Thanks. This is so simple and clear! I would like to apply it some day
when I have a chance, and thus have an excuse to learn more about Jaynes' PME.
I just hope to remember it as a really good way of deriving p(x).
Any chance of getting Schroedinger's equation somehow? I mean the
Hamilton-Jacobi equivalent in terms of probability p(x) and phase S(x).


Joachim Draeger

Oct 20, 1999

>Cf. also the principle of minimal description length (see preprints by Li
>and Vitanyi and Gacs on LANL), which is a formulation of PME in the
>context of algorithmic information theory. (AIT and classical Shannonian
>theory are in some sense equally general, since AIT involves a universal
>probability and thus Shannonian theory applies within AIT. On the other
>hand, AIT explicitly applies to any universal Turing machine, so in some
>sense these principles are all about as general as one can have within
>"ordinary mathematics"*).
>
>*Logicians study "weird mathematics" in which there may be more general
>notions of "sets" or "numbers", for instance, and I am pretty sure this
>can lead to a distinct notion of a computation. Cf. too the computation
>theory of Smale. No doubt John Baez can comment about this point.

Any theory of computation is related to some 'description' of the allowed
computational procedures. It is hard to give a formal definition here (except in
category theory, of course :-)), because there are so many different
approaches. You can use grammars, automata, machines, mathematical
theories, algebraic definitions and so on. Commonly, you have to define
the elementary computation steps and the way how these single steps are
related to each other (Sometimes the latter is omitted; in this case, your
computer 'program' always consists of only one single step). A single step
can be both changing a single bit and solving a PDE system.

You see the point. What a specific computer can do depends on the
underlying model of computation. A function may be computable in one
model and not in another. Be aware that computation depends
not only on the underlying version of logic and set theory, but also
on the set of applicable single computational steps. Some years ago
there was an interesting article about this topic in Scientific American.
The examples of computers in this article include e.g. the soap
bubble computer for calculating minimal surfaces.

Joachim


ar...@csstupc28.cs.nyu.edu

Oct 22, 1999

Chris Hillman <hil...@math.washington.edu> writes:

> the mean of f with respect to p. It turns out there is always a unique
> maximal entropy measure, and it can be readily computed by the classical
> method of Lagrange multipliers, so the PME is computationally effective
> and gives a unique "answer". In many situations it has a kind of magical

Just to pick a minor nit: I don't think the existence is actually
guaranteed. I read Cover + Thomas quite some time ago, but I vaguely
remember that they give an example where the maximum entropy is never
attained, though the supremum of the possible entropy values would
still be given by what you get from Lagrange multipliers.

-- Archi


Barry Adams

Oct 22, 1999

On Tue, 19 Oct 1999 02:08:18 GMT, Chris Hillman
<hil...@math.washington.edu> wrote:

(snip)

>> The mean square error e^2 in our estimates of <T(y)> of
>> the real value of T obeys
>>
>> e^2 I >= 1
>
>A special case of the Cramer-Rao inequality. I don't see where the 1 on
>the right hand side comes from, though. Presumably he proves this
>inequality?

Yes, as well as referencing Van Trees and Thomas & Cover.

>
>> I is a measure of how much information we may obtain from a
>> measurement of a system.
>
>This is one of the points where I want to ask Frieden to stop and back up
>a bit. I have the impression that this interpretation involves a
>"measurement" in the sense of quantum mechanics. This notion of
>"measurement" is less general, I maintain, than analogous notions in those
>parts of decision theory, measurement theory, estimation theory, etc.,
>which are based on classical information theory,

Frieden's use of measurement here is entirely based on classical
measurement theory. This is true even when he's deriving quantum
wave equations. He does, for completeness, give derivations of EPI based on
quantum mechanics, but the main derivation is based on classical
measurement theory. At the end of the book he uses EPI to derive the
Feynman-Mensky and Bohm-von Neumann quantum wave
equations in the presence of measurements.

>Another point: there is an excellent theoretical rationale for the
>"information" interpretation of Shannon information (see Shannon 1948,
>available free on my entropy pages, for a very clear explanation, and see
>my own preprints on entropy, including "An Entropy Primer" and "A Formal
>Theory of Information" for generalizations. AFAIK, the theoretical
>rationale for the "information" interpretation of Fisher information is
>not as clear or as general, although the Cramer-Rao inequality (the
>version in Cover and Thomas) helps. If I'm wrong about this, I'd welcome
>correction, of course.

Frieden goes into quite a bit of detail comparing Fisher information
and Shannon information. He shows Fisher information is also
additive for mutually isolated systems. He also proves a theorem
analogous to the Boltzmann H-theorem, that

dI(t) / dt <= 0

(provided the system obeys a Fokker-Planck equation).

I.e., Fisher information may be used as an entropy. He even
defines a Fisher temperature of a parameter being measured, analogous
to the temperature from the usual Boltzmann entropy.
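
[A quick numerical illustration of the monotonicity -- mine, not
Frieden's proof: for pure diffusion, the simplest Fokker-Planck equation,
a two-Gaussian mixture spreads as sigma^2(t) = sigma0^2 + 2 D t, and the
Fisher information of the evolving density falls steadily.]

import numpy as np

def fisher(p, x):
    return np.sum(np.gradient(p, x)**2 / p) * (x[1] - x[0])

def mixture(x, t, D=0.1, s0=0.5):
    var = s0**2 + 2 * D * t                       # heat-kernel spreading
    def g(m):
        return np.exp(-(x - m)**2 / (2*var)) / np.sqrt(2*np.pi*var)
    return 0.5 * g(-1.5) + 0.5 * g(+1.5)

x = np.linspace(-12, 12, 48001)
for t in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(f"t = {t:3.1f}   I = {fisher(mixture(x, t), x):.4f}")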



>I'm not saying the "information" interpretation of I is unreasonable---
>I'm trying (perhaps not very successfully) to say that AFAIK, using Fisher
>information amounts to making some assumptions which Frieden has not
>(AFAIK) made explicit.

Frieden seems pretty rigorous and complete to my (admittedly not too
thorough) reading. It's pretty easy to find holes in my brief
summaries. If you want to find holes in his work, you'd better buy the
book. I suppose under copyright law I can post you up to 10% of the
book scanned to computer, for research purposes only.

>> The EPI principle is
>>
>> functional delta (I - J) = 0
>
>I.e. first variation of the integral vanishes?

Yes. This is the variation in I and J as functionals of q(x) or p(x),
the amplitude or probability of obtaining the noise value x in the
measurement.

>I think this must mean that the variation of the integral to some order
>vanishes (there are n-th order generalizations of the usual Euler-Lagrange
>equation as treated for instance in Landau & Lifschitz, Mechanics, which
>can be proven the same way, by further integration by parts, etc.).

Frieden uses only first-order variational derivatives, and the usual
Euler-Lagrange equation.

> Whereas none of us (I presume) know what J
>"means".

This may or may not help, but Frieden makes the following identification:

If I is analogous to the Kullback-Leibler cross entropy between
p(x) and p(x + delta x), then J is analogous to the system entropy
H_B of a system, which we must increase by at least dH after a measurement
has extracted dH of Shannon information from the system, as in Brillouin's
work on Szilard's engine.
J is supposed to be the information stored inside the physical
object we are measuring.
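
[That identification at least has the right local behaviour: for a small
shift delta x, the Kullback-Leibler divergence between p(x) and
p(x + delta x) grows like (1/2) I (delta x)^2, with I the Fisher
information of the shift family. A quick check on a density of my own
choosing, nothing from the book:]

import numpy as np

x = np.linspace(-15, 15, 120001)
h = x[1] - x[0]

def p_of(x):
    def g(m, s):
        return np.exp(-(x - m)**2 / (2*s*s)) / (s * np.sqrt(2*np.pi))
    return 0.7 * g(-1.0, 0.8) + 0.3 * g(2.0, 1.3)

p = p_of(x)
I = np.sum(np.gradient(p, x)**2 / p) * h    # Fisher information of the shift family

dx = 0.01
kl = np.sum(p * np.log(p / p_of(x + dx))) * h

print(kl, 0.5 * I * dx**2)   # the two numbers nearly coincide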

>Another point: this I - J has the form
>
> quadratic stuff - right stuff (to make things come out)
>
>and looks awfully like kinetic minus potential energy. Which sounds
>familiar :-)

Indeed; in fact Frieden often calls I - J the informational Lagrangian.

>Does Frieden offer a succint English language rationale for why we should
>expect this integral is stationary under small perturbations? Why we
>should indeed regard it as the "general law of physics"?

The simplest form would be that a good theory of nature should
conserve information. When a measurement of a system has gained
information delta I, the system has lost information delta J.
Frieden also derives EPI from an optical model, where I[ psi(x) ] is
the information of an image of an object with wavefunction phi(mu),
whose information is J = (1/h-bar^2) < mu^2 >, and shows it holds
whenever the coordinates x and mu are connected by a unitary
transformation.
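
[Here's a small numerical check of that last statement in the simplest
case I could think of; this is my own construction, not the book's optics
derivation, and it is really just Parseval's theorem in units with
h-bar = 1: for a real normalized psi(x), the position-space quantity
4 Int (dpsi/dx)^2 dx equals 4 <k^2> computed from the Fourier-conjugate
amplitude, which is the mechanism being invoked.]

import numpy as np

N = 2**14
x = np.linspace(-40.0, 40.0, N, endpoint=False)
dx = x[1] - x[0]

sigma = 1.7
psi = (np.pi * sigma**2) ** -0.25 * np.exp(-x**2 / (2 * sigma**2))

# Position side: 4 * Int (dpsi/dx)^2 dx.
I_x = 4 * np.sum(np.gradient(psi, x)**2) * dx

# Conjugate side: 4 * <k^2> under |phi(k)|^2, phi the Fourier transform of psi.
k = 2 * np.pi * np.fft.fftfreq(N, d=dx)
phi = np.fft.fft(psi) * dx / np.sqrt(2 * np.pi)
I_k = 4 * np.sum(k**2 * np.abs(phi)**2) * (k[1] - k[0])

print(I_x, I_k, 2 / sigma**2)   # all three agree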

>> Since the information measured should be some or all of the
>> bound information,
>>
>> I = kJ    where 0 <= k <= 1.
>
>This is the kind of thing which makes me go bug-eyed. Why can't Frieden
>just say 0 <= J <= I? And is that an interesting little theorem, part of
>the definition of J, a mathematical triviality, or what?

So he can use the constant k (which is called the efficiency value)
in his derivations and find its value at the end of the derivations.
In fact, for quantum derivations he finds k = 1, but for classical
electrodynamics and gravity he gets k = 1/2, where the loss of
information is due to ignored quantum effects.



>> J is a scalar functional of all the physical aspects of the problem.
>
>Sounds rather like a Lagrangian :-) Or rather (see above) an information
>analogue of potential energy.

Sounds fair.

>Let me throw this thought out for discussion: by Occam's razor, the
>Lagrangian approach is intrinsically superior because it requires one
>scalar function, not two.

I'd use a more pragmatic form of Occam's razor: use whatever
gets you the answer quicker. And in an even more pragmatic vein, why not
try something newer? If the old methods have been done to death, the new
ones are more likely to gain knowledge.

>Also, if the Fisher information used by Frieden is always a scalar, I
>think he is in effect restricting himself to one parameter models (at
>least with respect to the "information" interpretation of I;

I is a scalar functional; I don't see why he can't generalise it as far
as needed. For instance, he writes I[ phi(x,y,z,ict) ] as a scalar
functional of a wavefunction which is itself an N-dimensional vector,
when he derives the Dirac equation.

>> Typically Frieden writes J in some general form. For instance for
>> classical electrodynamics, with a current vector j and charge density
>> rho, he writes
>>
>> /
>> J = 4c | d^3 r dt Sum E_n J_n (q,j,rho)
>> /
>
>Spherically or cylindrically symmetric expanding wave? (r dt?)

Damn ASCII; it's an integral over all space and time (why not only over
the light cone?), in whatever coordinates you like. The E_n are constants to
be found. J_n is a function of the noise amplitudes q, the current j and
the charge density rho, also to be found.

>And what is the method whereby he picks this particular J?

It's a pretty general form: J depends on additive local
functions of the available inputs. How much more generally could
you write it?

> What goes
>wrong if you pick a different J? How do you know which one will work,
>other than trial and error?

Pass. If J isn't written generally enough, there's no way to get the
correct answer. If it's written even more generally, maybe it would
give a wrong answer, but more likely it would just be too difficult to
solve for any answer.

Barry Adams


Chris Hillman

Oct 22, 1999

On 20 Oct 1999, Vesselin G Gueorguiev wrote:

> Thanks. This is so simple and clear! I would like to apply it some day
> when I have a chance, and thus have an excuse to learn more about Jaynes' PME.
> I just hope to remember it as a really good way of deriving p(x).

Yes.

> Any chance of getting Schroedinger's equation somehow? I mean the
> Hamilton-Jacobi equivalent in terms of probability p(x) and phase S(x).

Shannon's ideas have turned out to be incredibly versatile, with
applications far more varied than any one person could possibly hope to
keep up with. I can't give you good references, but there are certainly
relations with wavelet analysis, Fourier analysis, versions of the
uncertainty principle, and quite possibly Schroedinger's equation. You
can look for Folland's book on harmonic analysis and DiBenedetto et al.'s
book on wavelets for some of this stuff.

Chris Hillman

Oct 22, 1999

On Fri, 22 Oct 1999, Barry Adams wrote:

> Frieden's use of measurement here is entirely based on classical
> measurement theory.

I suspect we aren't using that word the same way. I'm still confused, but
not yet intrigued enough to buy the book.

> Frieden goes into quite a bit of detail comparing Fisher information
> and Shannon information. He shows Fisher information is also
> additive for mutually isolated systems.

That's not really going very far in terms of formal properties. See my
eprint "A Formal Theory of Information". (For John: disjoint unions or
non-interacting systems aren't interesting; interacting systems are.)

> He also proves a thoerem
> analogous to the Boltzmann H-theorem, that
>
> dI(t) / dt <= 0
>
> (Providing the system obeys a Fokker-Planck equation).

At least the sign of the inequality looks right to me, although I guess
most other readers will think you have it going the wrong way. So,
according to Frieden,

0 <= J <= I    and    dI/dt <= 0

(Draw a sketch to see what this implies!)



> I.E. that Fisher information may be used as an entropy.

See my "Formal Theory" and compare Shannon 1948 to understand why I'd say
Frieden has done no such thing. There are several more things you need to
prove. So far, according to you, he's proven one and a quarter of the
three things I listed, the third of which is often quite hard to prove and
which you haven't even mentioned.

> He even defines a Fisher temperature of a parameter being measured
> analoguous to temperature from the usual Boltzmann entropy.

I'm still underwhelmed. This is pretty much guaranteed to happen in
almost any situation, if you think about it the right way. The
construction of Pesin explains why.



> If you want to find holes in his work, you better buy the book.

I don't have time or money for that right now. Maybe John will read it; I
am pretty sure he is a much faster reader than most of us.

> > Whereas none of us (I presume) know what J "means".
>
> This may or may not help, but Frieden makes the identification
>
> If I is analogous to the Kullback-Leibler cross entropy between
> p(x) and p(x + delta x),

AKA "discrimination" (Kullback's preference--- see his book), "divergence"
(my preference--- see Cover & Thomas, the chapter on the "theory of
types"), "relative entropy", "Kullback-Liebler entropy", etc. This
quantity has been rediscovered and renamed more times than any other
quantity I have ever encountered. One paper lists about twenty synonyms.
Arghghgh!

> then J is analogous to the system entropy H_B of a system, which we must
> increase by at least dH after a measurement has extracted dH of Shannon
> information from the system, as in Brillouin's work on Szilard's engine.

Uh oh! I am convinced that Brillouin's analysis was wrong as wrong can
be. I read that book a long time ago as an undergraduate and thought that
the second half was basically entirely wrong, based upon a
misinterpretation of a mathematical triviality. There is no such thing as
"negentropy"! Is Charles Bennett in the house?

> J is supposed to be the information stored inside the physical object
> we are measuring.

Well, I'll go on record predicting that when the dust settles, this will
not be the accepted interpretation of Frieden's J.



> >Another point: this I - J has the form
> >
> > quadratic stuff - right stuff (to make things come out)
> >
> >and looks awfully like kinetic minus potential energy. Which sounds
> >familiar :-)
>
> Indeed; in fact Frieden often calls I - J the informational Lagrangian.

Which is of course exactly what I was getting at. It would help if he
would say at the beginning of his papers something like this:

"As a rule of thumb, 1/(kT) times an energy quantity is an entropy, for
instance in Jaynes' formalism. Lagrangians are well known to be a
unifying principle in physics, and sometimes take the form kinetic minus
potential energy. It is therefore natural to seek an information
theoretic analogue. However, Shannon's entropies are not conducive to
obtaining a "kinetic" term with a quadratic nature. On the other hand, as
a rule of thumb, the parametric statistical estimation analogue of the
Shannon entropy, which is used in nonparametric statistical estimation, is
the Fisher information, which -does- have a quadratic form. This suggests
trying to reformulate the Lagrangian approach to physics in terms of
Fisher information I (taken as the information analogue of kinetic energy)
and a new quantity J (taken as the information analogue of potential
energy)."

Such a paragraph would show right away that the idea is perfectly natural,
but that it also has an obvious built-in flaw (parametric rather than
nonparametric statistics, i.e. hidden assumptions), and also would show
that the real work in creating such a theory is figuring out what J should
be and what the proper interpretation of I and J should be in physics.
Unfortunately, it looks to me like Frieden has a good idea, but has messed
up the real work, the interpretation part. In particular, he has a big
job to show that his approach is really conceptually superior to the tried
and true Lagrangian approach, which has two mysteries:

1. How do you find the "right" Lagrangian?,

2. Why does Nature adore stationary Lagrangian integrals?

whereas it seems to me Frieden's idea leaves us with three mysteries:

1. What is the right interpretation of I, J? (I don't think he's got
it right yet, from your description).

2. How do you find the "right" J?

3. Why does Nature adore stationary free-information integrals?

> The simplest form would be that a good theory of nature should
> conserve information. When a measurement of a system has gained
> information delta I, the system has lost information delta J.

Am I the only one who's a bit underwhelmed here? Maybe I know too much
about classical information theory and too little about physics, but
basically, "information" is a very supple concept and you can twist it to
fit almost any preconceived notion. Shannon's concept is spectacularly
"right" (we know that because his ideas have so many and such powerful
applications, for instance to designing devices which work far, far
better than anyone could have dreamed had we not had his theory to
tell us how high to aim), but IMO most of the "spin-off"'s are pretty
vapid. "A good theory of nature should conserve information" seems vapid
to me.

IMO, when it comes to entropies, one way to tell "the real McCoy" is to
check out the formal properties. I think I give a very good rationale in
"Formal Theory" for why particular formal properties are particularly
desirable if you want to have an "information" interpretation. (Although
readers of that paper will see why in the end I concluded I probably
overlooked a fourth axiom. A sin of omission which I may correct, time
permitting, some day.)

> >> Since the information measured should be some or all of the
> >> bound information,
> >> I = kJ    where 0 <= k <= 1.
>
> > This is the
> > kind of thing which makes me go bug-eyed. Why can't Frieden just say
> > 0 <= J <= I? And is that an interesting little theorem, part of the
> > definition of J, a mathematical triviality, or what?
>
> So he can use the constant k (which is called the efficiency value)
> in his derivations and find its value at the end of the derivations.
> In fact, for quantum derivations he finds k = 1, but for classical
> electrodynamics and gravity he gets k = 1/2, where the loss of
> information is due to ignored quantum effects.

You didn't answer my question. Probably because Frieden didn't address
this.

> >Sounds rather like a Lagrangian :-) Or rather (see above) an information
> >analogue of potential energy.
>
> Sound fair.

Yes. And why couldn't he be up front about what he is up to? It is after
all a very natural idea, and the fact that it has at least one obvious
built-in flaw should not be a deterrent, since -every- physical theory
(known to me) has at least one built-in flaw. It's good to know what the
flaws are, so that you know in advance what you should watch out for!



> >Let me throw this thought out for discussion: by Occam's razor, the
> >Lagrangian approach is intrinsically superior because it requires one
> >scalar function, not two.
>
> I'd use a more pragmatic form of Occam's razor: use whatever
> gets you the answer quicker. And in an even more pragmatic vein, why not
> try something newer? If the old methods have been done to death, the new
> ones are more likely to gain knowledge.

I think the old pros will probably disagree, to some extent. Deep
connections beget more deep connections, as a rule. For this reason, a
concept like the Lagrangian which is already known to have deep
connections to all kinds of other mathematical ideas (John could explain
them much better than I, so I won't even attempt to list these) is always
going to be out "ahead", and a new idea will have to work twice as hard
just to catch up, so to speak.

Which is not to say that new ideas should be ignored as a matter of
principle, of course, just that in such a highly developed subject as
mathematical physics, unless you can right away provide at least -one-
indisputable -deep- connection to something known to be of central
importance, the old pros will probably be unimpressed. See below for a
statement of something which would impress -me- (a young semi-pro).



> I is a scalar functional; I don't see why he can't generalise it as far
> as needed. For instance, he writes I[ phi(x,y,z,ict) ]

Ritual groan at "ict".

> Damn ASCII; it's an integral over all space and time (why not only over
> the light cone?),

You tell me. Past light cone sounds good for electrodynamics.

> in whatever coordinates you like. The E_n are constants to
> be found. J_n is a function of the noise amplitudes q, the current j and
> the charge density rho, also to be found.

"Noise amplitudes" sounds suspicious to me. Whenever I see people
throwing this around in an "information theory", I begin to suspect that
if I did enough work it would turn out that "noise" would better be called
"utterly meaningless fudge factor required to make something look vaguely
plausible". I don't know, of course, if that is really the case here.
What I'm trying to say is, I see (bad) papers all the time where the
author simply calls something an "entropy" on the basis of some more or less
vague formal analogy, never noticing that his "entropy" does not in fact
behave like an information theoretic quantity at all. The early
literature on alleged ecological applications of Shannon entropy provides
many blatant examples of this practice of "proof by naming". In fact,
before you can legitimately call something "information" or "entropy" or
"noise", I maintain that you have some mathematical work to do. You can
prove some formal properties, for instance (not just additivity of
noninteracting systems).
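
For what it's worth, here is the kind of check I mean, as a little
Python sketch (the two distributions are made up purely for
illustration): for independent ("noninteracting") systems, anything
deserving the name entropy had better be additive.

    import numpy as np

    def shannon_entropy(p):
        # Shannon entropy (in bits) of a discrete distribution;
        # by convention 0 log 0 = 0, so zero cells are dropped.
        p = np.asarray(p, dtype=float)
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    # Two arbitrary, independent discrete distributions.
    p = np.array([0.5, 0.25, 0.25])
    q = np.array([0.1, 0.2, 0.3, 0.4])

    joint = np.outer(p, q)    # joint law of the noninteracting pair

    # Additivity: H(joint) should equal H(p) + H(q).
    print(shannon_entropy(joint), shannon_entropy(p) + shannon_entropy(q))

The arithmetic is trivial; the point is that a quantity should be made
to pass tests like this (and stronger ones) before the word "entropy"
is earned.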



> >And what is the method whereby he picks this particular J?
>
> It's a pretty general form: J depends on additive local
> functions of the available inputs. How much more general could
> you write it?

You didn't answer the question. Probably because Frieden didn't address
it either.



> >What goes
> >wrong if you pick a different J? How do you know which one will work,
> >other than trial and error?
>
> Pass. If J isn't written generally enough, there's no way to get the
> correct answer. If it's written even more generally, maybe it would
> give a wrong answer, but more likely it would just be too difficult to
> solve for any answer.

Pass acceptable, but I hope you appreciate why I still have a whole bunch
of important questions:

1. not clear what the correct interpretations of J and I-J are

2. not clear how to write down the right J for a given theory

3. not clear how parameters of statistical models enter in
(they must be there somewhere, because we are talking Fisher)

4. not clear how this stuff is related to statistical manifolds
   (Bradley Efron, Amari, etc.)

5. not clear to me how this is related to the Kullback-Leibler
   divergence D(A||B), although maybe Frieden does explain this
   (a standard second-order link is sketched just after this list)

6. not clear to me how these ideas are related to measurement theory,
classical or otherwise.
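
On question 5, the standard second-order link (textbook material, not
necessarily the route Frieden takes) is that Fisher information is the
curvature of the Kullback-Leibler divergence in the parameter:
D( p_theta || p_(theta+eps) ) ~ (eps^2/2) I(theta) for small eps.
A minimal Python sketch, using a unit-variance Gaussian shifted in its
mean, for which I(theta) = 1:

    import numpy as np

    x = np.linspace(-10.0, 10.0, 200001)   # grid for the numerical integral
    dx = x[1] - x[0]

    def gauss(mu):
        return np.exp(-0.5 * (x - mu)**2) / np.sqrt(2.0 * np.pi)

    def kl(p, q):
        # Kullback-Leibler divergence D(p||q), crude numerical integration.
        return np.sum(p * np.log(p / q)) * dx

    for eps in (0.1, 0.01, 0.001):
        d = kl(gauss(0.0), gauss(eps))
        print(eps, 2.0 * d / eps**2)        # -> I(theta) = 1 as eps -> 0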

The good news is that at least I think I understand the basic idea, and
agree that it is a very natural thing to try. I even agree that the
obvious built-in flaws might be something one could live with. But while
I would love to see fundamental physics reformulated in terms of
"information-theoretic principles" (I am a long-time and devoted fan of
information theory---- just visit my entropy pages!), I am not yet
convinced that Frieden's approach is the right way.

The only thing I've heard that sounds "right" in all you've said is that
the Fisher information is indeed quadratic in nature and therefore it is
indeed the obvious choice to try to substitute for kinetic energy in the
traditional Lagrangian approach. An obvious point, once someone says it
like this :-/
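
To make that concrete, here is the identity I have in mind, checked
numerically for a Gaussian location family (a sketch only; any smooth
density would do): writing p = q^2, the Fisher information about a
location parameter is I = int (p')^2/p dx = 4 int (q')^2 dx, which has
exactly the look of a "gradient squared", kinetic-energy-like term.

    import numpy as np

    sigma = 2.0
    x = np.linspace(-20.0, 20.0, 160001)
    dx = x[1] - x[0]

    p = np.exp(-0.5 * (x / sigma)**2) / (sigma * np.sqrt(2.0 * np.pi))
    q = np.sqrt(p)                          # real "probability amplitude"

    dp = np.gradient(p, dx)
    dq = np.gradient(q, dx)

    I_from_p = np.sum(dp**2 / p) * dx       # int (p')^2 / p dx
    I_from_q = 4.0 * np.sum(dq**2) * dx     # 4 int (q')^2 dx

    # Both should agree with the exact value 1/sigma^2 for a Gaussian.
    print(I_from_p, I_from_q, 1.0 / sigma**2)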

It seems to me at this point that at best Frieden has partially succeeded
in coming up with a Fisher-information reformulation of the usual
Lagrangian approach, but apparently he has not given a precise dictionary
enabling one to pass back and forth freely between the two methods. Now
-that- would be useful, because then you would always have two slightly
different ways to think about a given problem. And of course such a
dictionary would carry all the deep connections already known for the
classical "energy-like" Lagrangian approach into this
"Fisher-information-like" Lagrangian approach. So such a dictionary,
stated as a mathematical theorem, would really get the attention of a lot
of people, including me.

Hopefully some physics grad student will run out and set things to right!

Barry Adams

unread,
Oct 23, 1999, 3:00:00 AM10/23/99
to
On 22 Oct 1999 11:22:08 -0500, Chris Hillman
<hil...@math.washington.edu> wrote:

[snip]

I'm not going to push Frieden on you any further; I got you to the
point where you think it might be somewhat interesting for some
researcher to go into in more detail, and that's as much as I can ask
of anybody with anything else to do. I'll try to answer the questions
you didn't think I answered and ask a bit more about some
digressions you made.

>Uh oh! I am convinced that Brillouin's analysis was wrong as wrong can
>be. I read that book a long time ago as an undergraduate and thought that
>the second half was basically entirely wrong, based upon a
>misinterpretation of a mathematical triviality. There is no such thing as
>"negentropy"! Is Charles Bennett in the house?

You don't believe his resolution of the Maxwell's Demon/Szilard's
Engine problem? Or is it something else that was wrong? I'd be
interested to see your position on Maxwell's Demon if the first is the
case.

[snip]

>In particular, he has a big
>job to show that his approach is really conceptually superior to the tried
>and true Lagrangian approach, which has two mysteries:
>
> 1. How do you find the "right" Lagrangian?,
>
> 2. Why does Nature adore stationary Lagrangian integrals?

I thought Feynman's path integral approach solved 2,
i.e. a system takes every path with equal amplitude up to a phase, and
the phase is proportional to the action of the path (the integral of the
Lagrangian along it). Paths that don't extremize the action cancel out.
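
Here's a toy version of that cancellation I knocked together (my own
illustration in Python, assuming a one-parameter family of fluctuations
a about the classical path with a quadratic action S = S_cl + c a^2;
nothing from Frieden or Feynman directly):

    import numpy as np

    hbar = 1.0
    c = 5.0                # curvature of the action about the classical path
    a = np.linspace(-40.0, 40.0, 800001)   # fluctuation amplitudes
    da = a[1] - a[0]

    phase = np.exp(1j * c * a**2 / hbar)   # relative contribution of each path

    # Add up the contributions from |a| <= A for growing A.  Once A passes
    # the first "stationary zone" (|a| of order sqrt(pi*hbar/c)) the extra
    # paths add almost nothing: their phases cancel.
    for A in (0.5, 1.0, 2.0, 5.0, 20.0, 40.0):
        total = np.sum(phase[np.abs(a) <= A]) * da
        print(A, abs(total))

    # The full (Fresnel) integral has magnitude sqrt(pi*hbar/c).
    print("limit:", np.sqrt(np.pi * hbar / c))

Past the first stationary zone the running total barely moves: the
far-away paths are busy cancelling each other out.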

>whereas it seems to me Frieden's idea leaves us with three mysteries:
>
> 1. What is the right interpretation of I, J? (I don't think he's got
> it right yet, from your description).
>
> 2. How do you find the "right" J?

One thing to be clear about is that Frieden doesn't just guess J. He
writes it in a general form and then solves for an exact form using
axioms 1 and 3 and an invariance principle of the q's. See below.
So you're getting a lot of work done for you that would otherwise have
to be done by trial and error with the Lagrangian method.

> 3. Why does Nature adore stationary free-information integrals?

Frieden shows this follows in a quantum optics demonstration. It
may well follow more generally from a path integral approach as
above.

> "A good theory of nature should conserve information" seems vapid
>to me.

Hey, I was paraphrasing different sections from all over the book
into a few lines. Not surprising it seems vapid. The part which best
expresses the point is the section on game theory, where he considers the
EPI method as a game played between an observer and nature, for the
prize of information. The EPI solution is the saddle point or fixed
point the payout tends to.

>> >> Since the information measured should be some or all of the
>> >> bound information.
>> >> I = kJ where ( 0<= k <= 1)

>> > This is the
>> > kind of thing which makes me go bug-eyed. Why can't Frieden just say
>> > 0 <= J <= I? And is that an interesting little theorem, part of the
>> > definition of J, a mathematical triviality, or what?

>> So he can use the constant k (which is called the efficiency value)
>> in his derivations and find its value at the end of the derivations.
>> In fact for quantum derivations he finds k = 1. But for classical
>> electrodynamics and gravity he gets k = 1/2, where the loss of
>> information is due to ignored quantum effects.

>You didn't answer my question. Probably because Frieden didn't address
>this.

I didn't answer where the I = kJ (0<= k <=1) comes from. Ok.

J >= I comes from Brillouin's result that the system entropy increases
in a measurement, carried over to the Fisher information approach;
since you didn't seem to agree with Brillouin above, you still won't
like this. Frieden says k may be considered a constant for a
particular physical situation, because J's functional form depends
on the physical parameters of the scenario while I's does not. In fact
he further postulates that if i_n(X) and j_n(X) are local
information densities for I and J at a space-time point (four-vector)
X, i.e.
    I = int_(all space-time) Sum_n i_n(X)                        (2)
and similarly for J;

Then i_n and j_n obey on the microscopic level,

i_n(x) - k j_n(x) (3)

and he takes this as an additional axiom. This is his axiom 3 for the
method,

axiom 1 being first variation (I-J) = 0.

axiom 2 being (2) above.

This makes more sense to me than the global condition I-kJ = 0 only,
since who is around to check that the integrals over all space-time
match? Of course it's a much more powerful axiom, and like any
axiom it can only be justified by itself, or its results, being
consistent with experiment. I should have made these axioms clear
earlier.

>> integral over all space and time (why not only in the

>> light cone?),

>You tell me. Past light cone sounds good for electrodynamics.

Look again at the form of the integral,

    J = 4c int d^3r int dt Sum_n E_n J_n( q(x), j(x), rho(x) )   (4)

E_n an unknown constant
J_n an unknown information density

There are enough free factors here that the functions J_n may be
made to cancel outside the light cone later, once he has
imposed (at a later step) the Lorentz invariance of the q(x)'s.

>> whatever coordinates you like. E_n are constants to
>> be found. J_n is a function of the noise amplitudes q, the current j and
>> the charge rho, also to be found.

>"Noise amplitudes" sounds suspicious to me.

The q(x) are the same ones as in the definition of I, i.e. the
probability amplitudes of a noise result being found in a measurement,
and are central to Frieden's methodology. It's the q's he first solves
for; he then introduces an invariance condition on the q's (Lorentz
invariance in this case) to find the wave equations. In the quantum
derivations the q's become the wave functions and are complex valued,
whereas here, for a classical derivation, they stay real valued.

[On the general form of J used for his electrodynamics derivation (4)]



>> >And what is the method whereby he picks this particular J?
>>
> >> It's a pretty general form: J depends on additive local
> >> functions of the available inputs. How much more general could
> >> you write it?

>You didn't answer the question. Probably because Frieden didn't address
>it either.

Actually looking into it, J is as general as possible to be consistent
with axiom 2, without introducing any extra source terms or
measurables (i.e. changing the problem). That's what you get if you
ask the monkey instead of the organ grinder :-)

Barry Adams


Chris Hillman

unread,
Oct 24, 1999, 3:00:00 AM10/24/99
to
On 23 Oct 1999, Barry Adams wrote:

> >Uh oh! I am convinced that Brillouin's analysis was wrong as wrong can
> >be. I read that book a long time ago as an undergraduate and thought that
> >the second half was basically entirely wrong, based upon a
> >misinterpretation of a mathematical triviality. There is no such thing as
> >"negentropy"! Is Charles Bennett in the house?
>
> You don't believe his resolution of the Maxwell's Demon/Szilard's
> Engine problem? Or is it something else that was wrong. I'd be
> interested to see your position on Maxwell's Demon if the first is the
> case.

This is a huge topic. I don't have time or energy to get into it. But I
can give a reference which I am sure you will find thought provoking.
Recently in a post to some newsgroup on some topic (details forgotten) I
provided a long list of books which included a recent collection of
articles by Wheeler, Caves, Bennett, Davies, and most of the other leading
physicists who have written about the asymmetry of time. I -think- this
post might actually have been in the "General Relativity" thread some
weeks back. Anyway, this book is the place to start reading, I think.

> >In particular, he has a big
> >job to show that his approach is really conceptually superior to the tried
> >and true Lagrangian approach, which has two mysteries:
> >
> > 1. How do you find the "right" Lagrangian?,
> >
> > 2. Why does Nature adore stationary Lagrangian integrals?
>
> I thought Feynman's path integral approach solved 2,
> i.e. a system takes every path with equal amplitude up to a phase, and
> the phase is proportional to the action of the path (the integral of the
> Lagrangian along it). Paths that don't extremize the action cancel out.

OK, I guess that is true. No doubt someone else can jump in and point out the
new question which is surely raised by the path integral approach. I can
think of one myself:

2'. How can one make Feynman's approach completely rigorous?

I know many claims have been made by various people to have done this, but
AFAIK the mathematicians found none of them satisfactory, last time I
checked (eight years ago). John Baez could probably say what the current
state of affairs is.



> >whereas it seems to me Frieden's idea leaves us with three mysteries:
> >
> > 1. What is the right interpretation of I, J? (I don't think he's got
> > it right yet, from your description).
> >
> > 2. How do you find the "right" J?
>
> One thing to be clear about is that Frieden doesn't just guess J. He
> writes it in a general form and then solves for an exact form using
> axioms 1 and 3 and an invariance principle of the q's. See below.
> So you're getting a lot of work done for you that would otherwise have
> to be done by trial and error with the Lagrangian method.
>
> > 3. Why does Nature adore stationary free-information integrals?
>
> Frieden shows this follows in a quantum optics demonstration. It may
> well follow more generally from a path integral approach as above.

Which brings us back to my question 2'.

> Hey, I was paraphrasing different sections from all over the book
> into a few lines. Not surprising it seems vapid. The part which best
> expresses the point is the section on game theory, where he considers the
> EPI method as a game played between an observer and nature, for the
> prize of information. The EPI solution is the saddle point or fixed
> point the payout tends to.

OK, that sounds interesting.



> I didn't answer where the I = kJ (0<= k <=1) comes from. Ok.
>
> J >= I,

Whoops! So it is 0 <= I <= J, not 0 <= J <= I? That doesn't sound good.

> comes from Brillouin's result that the system entropy increases in a
> measurement, carried over to the Fisher information approach; since
> you didn't seem to agree with Brillouin above, you still won't like
> this.

Indeed, I have a hunch this is where Frieden might have introduced a
genuine error into his work. I have enough experience with trying to fix
Brillouin's negentropy goof to guess that if you do things correctly, you
find that 0 <= J <= I, so that I-J is nonnegative, as a good "entropy"
must be.

> Frieden says k may be considered a constant for a particular physical
> situation, because J's functional form depends on the physical
> parameters of the scenario while I's does not.

Hmm... does anyone else understand this?

> In fact he further postulates that if i_n(X) and j_n(X) are
> local information densities for I and J at a space-time point
> (four-vector) X, i.e.
>     I = int_(all space-time) Sum_n i_n(X)                      (2)
>
> and similarly for J;
>
> Then i_n and j_n obey on the microscopic level,
>
> i_n(x) - k j_n(x) (3)

That's neither an equation nor an inequality, so you left something
crucial out, I think :-/



> and he takes this as an additional axiom. This is his axiom 3 for the
> method,
>
> axiom 1 being first variation (I-J) = 0.
>
> axiom 2 being (2) above.
>
> This makes more sense to me than the global condition I-kJ = 0 only,

I -think- you are saying that Frieden defines local i(x), j(x). Indeed,
this is analogous to one approach to (Shannonian) entropies in ergodic
theory. I -think- you are saying that Frieden assumes that

   i(x) = k j(x) for some constant k, 0 <= k <= 1

which implies, I guess, that I = k J, and finally he also assumes that the
first variation of I-J (a negative quantity, right?) vanishes.

One way to start explaining this would be (in analogy with Landau and
Lifschitz, Mechanics) to start with examples where J vanishes identically,
i.e. something like the Lagrangian of a free particle.
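
To indicate the shape of such a "free" calculation (my own illustrative
sketch of the variational machinery only, not a claim about what
Frieden actually does), take a single real amplitude q(x), set J = 0,
and extremize a one-dimensional Fisher-type functional subject only to
normalization. In LaTeX, for precision:

    I[q] = 4 \int \left( \frac{dq}{dx} \right)^{2} dx ,
    \qquad \int q(x)^{2} \, dx = 1 ,

    \delta \left( I[q] - \lambda \int q^{2} \, dx \right) = 0
    \;\Longrightarrow\;
    8 \frac{d^{2}q}{dx^{2}} + 2 \lambda q = 0
    \;\Longrightarrow\;
    \frac{d^{2}q}{dx^{2}} = -\frac{\lambda}{4}\, q ,

i.e. a free "wave equation" for the amplitude, with lambda pinned down
by the normalization and boundary conditions --- formally the same step
one takes with a free-particle Lagrangian in Landau and Lifschitz.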

> There are enough free factors here that the functions J_n may be
> made to cancel outside the light cone later, once he has
> imposed (at a later step) the Lorentz invariance of the q(x)'s.

Your integrals consistently appear messed up in my newsreader. It would be
better if you adopted some notation like int_(lower)^(upper).



> Actually looking into it, J is as general as possible to be consistent
> with axiom 2, without introducing any extra source terms or
> measurables (i.e. changing the problem). That's what you get if you
> ask the monkey instead of the organ grinder :-)

Clearly, in an ideal world, I'd split into two copies, one of whom would
rush out and study Frieden's book. My strong impression from what you
have said is that the basic idea of looking for an information theoretic
analogue of Lagrangians, with Fisher information standing in for the
quadratic "kinetic energy" stuff, is a good one, but that Frieden went
-irretrievably wrong- when he read Brillouin instead of Bennett.

I suspect from what you have said that his method is deeply flawed in the
foundations (this is not inconsistent with his partial success at
recovering various bits of theoretical physics, although I think it is
telling that he has not recovered the -field equation- of gtr, for
instance, much less given the dictionary I demanded).

The good news is that if one starts with the partial dictionary which
sounds right,

    Fisher information  <-->  kinetic energy (or other quadratic
                              term in the Lagrangian)

and then tries to develop a precise dictionary using the ideas of Bennett,
and probably also using known connections to Shannonian information theory
such as the Cramer-Rao inequality (that's another thing Frieden did which
sounds right, except he apparently didn't use the general form) and to
discrimination (aka cross-entropy, etc.), using specific examples from
Landau and Lifschitz on the right hand side as a guide, then one could
come up with something really interesting, perhaps even a complete
dictionary between "information-theoretic" and "energetic" Lagrangian
approaches. I think many readers will agree that constructing a precise
dictionary like this would be a natural goal which follows from the basic
idea of Frieden.
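
As a concrete instance of the Cramer-Rao remark (a sketch, nothing
Frieden-specific; the Gaussian model and the numbers are just for
illustration): for n draws from N(theta, sigma^2) the per-sample Fisher
information is 1/sigma^2, and the sample mean attains the bound
Var >= 1/(n I_1) = sigma^2/n.

    import numpy as np

    rng = np.random.default_rng(0)
    theta, sigma, n, trials = 3.0, 2.0, 25, 100000

    # Repeat the "experiment" many times; estimate theta by the sample mean.
    data = rng.normal(theta, sigma, size=(trials, n))
    estimates = data.mean(axis=1)

    print("empirical variance of the estimator:", estimates.var())
    print("Cramer-Rao bound sigma^2/n:", sigma**2 / n)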

Unfortunately, from what you say, I think some physicists who might be
unaware of the critiques of Brillouin's work, which appears to have been
used in an essential way by Frieden, will be led down the wrong path as I
strongly suspect Frieden was. So maybe we have here a false Messiah after
all, although I would predict that the people who contributed to the book
I mentioned have already noticed the suspicious features of Frieden's work
(as you described it) which I have pointed out. Incidentally, has anyone
seen any book reviews of Frieden's book in reputable mathematical or
statistical or physical journals?

So I think fixing up Frieden's idea to make it really work is a fabulous
problem for someone like me; I just don't have time to work on it :-(
Hopefully someone else will take the time to study:

 1. Cover & Thomas, and then parts of the book by Kullback,

 2. parts of the books by Walters and Petersen (for ergodic theory and
    entropy),

 3. part of the book by Pesin (for the generalized Caratheodory
    construction of entropies as dimension--- I maintain that any
    "genuine" entropy is the fractal dimension of an "interesting" set,
    but you'd have to read all this stuff to know why),

 4. the physics book I mentioned, esp. the articles by Bennett and Caves,

 5. the book by Folland on harmonic analysis, for some more connections
    between Shannonian entropy and harmonic analysis,

 6. the most recent book by J. N. Kapur on maximal entropy methods, plus
    his short article in which he derives all the standard distributions
    of classical (parametric) mathematical statistics using the method of
    minimal discrimination (aka cross-entropy)--- a one-line example of
    the idea follows just after this list,

 7. the Springer book on maximal entropy methods, and

 8. something on wavelets and their connections with Shannonian
    information theory.
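
To give the flavour of item 6 (my own throwaway example, not taken from
Kapur): fixing only the mean on [0, infinity) and maximizing the entropy
already forces the exponential distribution. In LaTeX:

    \max_{p} \; -\int_{0}^{\infty} p(x) \ln p(x) \, dx
    \quad \text{subject to} \quad
    \int_{0}^{\infty} p(x)\,dx = 1 , \qquad
    \int_{0}^{\infty} x\,p(x)\,dx = \mu ;

    \delta \!\left[ -\int p\ln p \;-\; \lambda_{0}\!\int p
       \;-\; \lambda_{1}\!\int x\,p \right] = 0
    \;\Longrightarrow\; \ln p(x) = -1 - \lambda_{0} - \lambda_{1} x
    \;\Longrightarrow\; p(x) = \frac{1}{\mu}\, e^{-x/\mu} .

Each of the standard parametric families comes out of some such pair of
constraints, which is exactly the sort of bridge between statistics and
physics-style variational principles that a Frieden-type dictionary
would have to formalize.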

THEN one would have what I think is the right background for fixing up
Frieden's natural and probably good idea (the partial dictionary above),
where "fixing" means providing a dictionary. So that for instance one
should be able to translate Hilbert's derivation of the Einstein field
equation directly into a Fisher information theoretic formulation which
also derives the Einstein field equation. At the same time, in
constructing the dictionary, I think you'd be forced to find interesting
connections with Shannonian theory and Jaynes's maximal entropy method. Of
course, a good starting point would be trying to construct the dictionary
for the specific examples given in the book by Landau and Lifschitz,
starting with the free particles.

I hope I have convinced at least some readers that something much better
and more interesting than what Frieden has apparently done so far remains
to be discovered!

ca31...@bestweb.net

unread,
Oct 25, 1999, 3:00:00 AM10/25/99
to
Here's a link to Shannon's "A Mathematical Theory of Communication"
http://cm.bell-labs.com/cm/ms/what/shannonday/paper.html
kindly posted by someone in another NG.

Barry Adams

unread,
Oct 26, 1999, 3:00:00 AM10/26/99
to
On Sun, 24 Oct 1999 17:25:37 GMT, Chris Hillman
<hil...@math.washington.edu> wrote:

>> Frieden says k may be considered a constant for a particular physical
>> situation, because J's functional form depends on the physical
>> parameters of the scenario while I's does not.
>
>Hmm... does anyone else understand this?

Actually not even I do; his axioms are rigorous but his justification
for them sometimes seems vague. If k were a field or a variable
the approach would give very different answers, and it isn't clear
what principle stops k from being variable.

>I -think- you are saying that Frieden defines local i(x), j(x). Indeed,
>this is analogous to one approach to (Shannonian) entropies in ergodic
>theory. I -think- you are saying that Frieden assumes that
>
> i(x) = k j(x) for some constant k, 0 <= k <= 1
>
>which implies, I guess, that I = k J, and finally he also assumes that the
>first variation of I-J (a negative quantity, right?) vanishes.

Yes, exactly; glad to see you understood past my typo.

>Clearly, in an ideal world, I'd split into two copies, one of whom would
>rush out and study Frieden's book.

According to Julian Barbour, David Deutsch and of course Hugh
Everett, you do!

> My strong impression from what you
>have said is that the basic idea of looking for an information theoretic
>analogue of Lagrangians, with Fisher information standing in for the
>quadratic "kinetic energy" stuff, is a good one, but that Frieden went
>-irretrievably wrong- when he read Brillouin instead of Bennett.

Could someone explain to me the differences between Brillouin's and
Bennett's approaches?

Barry Adams

