Message from discussion Density Estimation

Newsgroups: sci.stat.math, sci.stat.consult
From: cl...@tukey.Stanford.EDU ( )
Date: Tue, 14 Dec 93 02:04:39 GMT
Local: Mon, Dec 13 1993 9:04 pm
Subject: Re: Density Estimation

Herman Rubin writes:
>In article <1993Dec8.022208.9...@EE.Stanford.EDU> cl...@tukey.Stanford.EDU ( ) writes:
>>Herman Rubin writes:

>>And what do armed nuclear weapons have to do with density estimation?

>The intent was to indicate both the magnitude of the danger and the
>seriousness of the problem.  Knowing the mean and variance of the
>distribution of X may or may not be of value, and the one who just
>plugs into a data analysis program is unlikely to know the difference.
>Density estimation is not of much value unless it is going to be used
>for something.  And in this field, very little has been done by looking
>at the PROBLEM; finding a solution is much closer to vacuous theory
>than facing the real problem.

Really? I can't speak for those I don't know, but as far as my department
at Bell Labs is concerned, this statement is plainly wrong, and there are
several of us here who have worked in the field of local fitting. Our work
has been almost entirely motivated by real engineering and manufacturing
problems for which other existing methods have been found to be unsatisfactory.

>>Sorry, but these are just cheap copouts. It is YOUR responsibility,
>>as the one promoting this parametric approach, to provide the tools and
>>methods to convert the vague ideas one has in practice into formal loss
>>functions, sensible models and prior distributions for the parameters.

>These are not copouts, any more than it is a copout if a drug company
>does not know how to come up with a quick cure for AIDS.

So, someone walks into your office with a dataset, and says `I want to
estimate the density'. You say `Go away, come up with a large parametric
model, an estimate (from no data) of the prior density on the parameter
space, and a global loss function, then come back'.

For the sake of the story, assume the client comes back with the requested
items a few days later. You say `The estimate is argmin_{\hat f} E_{\pi} L(\hat f, f).
If you wish to compute that, consult a numerical analyst'.

Is this really the complete role you see for a statistician?
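
Just so there is no confusion about what that formula demands, here is the
whole prescription worked out in the most trivial setting I can think of:
one unknown mean, a discretized prior, squared-error loss. Every choice
below (the model, the flat prior, the loss, even the language) is an
illustration of mine, invented for the example; producing those choices is
precisely the work being pushed onto the client.

import numpy as np

# Toy version of `argmin E_pi L(fhat, f)': one unknown mean, a discretized
# flat prior, and squared-error loss.  All of these are illustrative
# assumptions of mine, which is of course the whole point.
rng = np.random.default_rng(0)
data = rng.normal(loc=1.3, scale=1.0, size=50)        # the client's data

theta = np.linspace(-5.0, 5.0, 1001)                  # discretized parameter space
prior = np.full_like(theta, 1.0 / theta.size)         # the `elicited' prior

# posterior pi(theta | data) on the grid, for a N(theta, 1) model
loglik = np.array([-0.5 * np.sum((data - t) ** 2) for t in theta])
post = prior * np.exp(loglik - loglik.max())
post /= post.sum()

# posterior expected loss of each candidate estimate; its minimizer is the
# formal `answer' (for squared error, just the posterior mean)
risk = np.array([np.sum(post * (theta - a) ** 2) for a in theta])
theta_hat = theta[np.argmin(risk)]
print(theta_hat, np.sum(post * theta))                # the two agree closely

For squared-error loss the argmin is of course just the posterior mean; the
point is that nothing in the prescription tells the client where the model,
the prior or the loss are supposed to come from.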

>>As I've said in previous posts, coming up with sensible choices is
>>unreasonable in many applications; no single loss function can adequately
>>capture the performance of a function estimate. Local smoothness of an
>>unknown density is often a reasonable assumption to make. Converting this
>>into a global parametric model and prior distribution is difficult. If you
>>wish to promote a large scale parametric approach, this is YOUR problem.

>What is local smoothness?  There are hundreds of ways of expressing it.

A question, and a reasonable one at that! A welcome change from your usual
attempts to portray posters as stupid and to dismiss others' work out of hand.

By local smoothness, I mean existence of a good polynomial approximation
over the smoothing window, which can be measured in a number of ways; for
example, by maximum error of the approximation. I work with log(density)
rather than the density. Formal specification of bounds allows us to,
for example, study what type of structures will be missed through systematic
bias of the estimate, and can be used as a basis for deriving `optimal'
bandwidths. I would never expect a user to specify such a bound (any more
than I would expect them to specify priors or global loss functions).
I will use their questions of interest to determine reasonable bandwidths
for the problem, and at least initially try several fits to show the
difference. Density Estimation is not about plugging in global priors,
loss functions or bandwidths and taking the first answer that comes out.
It is about finding a reasonable solution to real problems.
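
To make this concrete, here is a rough one-dimensional sketch of the sort of
fit I have in mind: a low-degree polynomial fitted to log(density) over a
kernel window by maximizing a locally weighted likelihood. The particular
choices below (Gaussian kernel, degree 2, crude quadrature for the
normalizing integral, the function names) are mine, for illustration only;
this is not a description of any production implementation.

import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize

def _poly(a, v):
    # evaluate the local polynomial sum_k a[k] * v**k (v may be scalar or array)
    return sum(c * v ** k for k, c in enumerate(a))

def local_logdensity(x0, data, h, degree=2):
    # Local-likelihood fit of a degree-`degree` polynomial to log(density)
    # over a Gaussian kernel window of bandwidth h; returns the density
    # estimate exp(a[0]) at the fitting point x0.
    n = len(data)
    v = data - x0
    w = np.exp(-0.5 * (v / h) ** 2)                   # kernel weights

    def neg_local_loglik(a):
        fit = np.sum(w * _poly(a, v))
        # the integral keeps the fitted log-density properly normalized over
        # the window; crude quadrature over +/- 4 bandwidths is enough here
        integrand = lambda u: np.exp(-0.5 * ((u - x0) / h) ** 2) * np.exp(_poly(a, u - x0))
        integral, _ = quad(integrand, x0 - 4 * h, x0 + 4 * h)
        return -(fit - n * integral)

    a0 = np.zeros(degree + 1)
    # start from a flat log-density at roughly the kernel-density level
    a0[0] = np.log(max(w.sum() / (n * h * np.sqrt(2 * np.pi)), 1e-300))
    res = minimize(neg_local_loglik, a0, method="Nelder-Mead")
    return np.exp(res.x[0])

The bandwidth h plays exactly the role described above: it fixes the window
over which the polynomial approximation to log(density) has to be good.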

There is of course no reason to restrict ourselves to polynomial approximations;
if there is prior information about singularities or heavy tails and estimating
these features is of interest, then other forms of local approximation may be
appropriate. In some applications I am fortunate enough to have such
information, and use it. In the vast majority of applications, I do not have
such information, and in such cases a local polynomial approximation will
be just as useful as any other.

Under the most widely studied asymptotics (bandwidth -> 0) the analogous
assumptions are existence of derivatives of appropriate order, with perhaps
boundedness or continuity conditions depending on the precise results being
proved.

>In most "real world" density situations, the density is infinitely
>differentiable, usually even analytic.

Questionable, and I wouldn't place too much faith in estimators that rely
on this assumption. Probably not very relevant to our discussion.

>  This by itself is useless for
>inference.  If you would read the literature, you would know that I
>have suggested classes of infinite-parametric models--the prior distribution
>problem is another matter indeed, but there are some negative results.

I have read hundreds of papers in density estimation and related fields
over the past few years, and would be glad to add yours if only you would
provide the references. I have been asking for specific details of your
methods throughout this thread, and requested you to send (p)reprints over
two weeks ago.

Until you provide references, you can hardly direct others to read the
literature.

>These negative results are of some importance, in that they point out
>that certain types of simplifications do not work.  I also have some
>preliminary results on robust procedures here; they should beat any
>of the kernel procedures.

`they should'. Good scientific argument. I also claim my procedures beat
kernel procedures, particularly with regard to moderate tail performance.

>>In the real world, we don't get to brush aside the difficult parts as
>>`someone else's problem'; `will be done in future work' and all the similar
>>excuses used all too frequently by some in academics.

>As I said before, the pharmaceutical company cannot solve the problem of
>treating the disease without input from the biologist.  The term "panacea"
>is the alchemical term for the compound which could.  The statistician
>is no more capable of finding this than the alchemist was, and for much
>the same reason.

Prior distributions, and (at least for some problems) models and loss
functions are mathematical abstractions of little direct interest to
anyone other than the statistician. It is the job of the statistician
to become sufficiently involved in and knowledgeable about the problem that
informed decisions can be made about such items as needed, and, if informed
decisions cannot be made, to try other approaches. I have
no idea as to how to construct meaningful prior densities in possibly
hundreds or thousands of dimensions; until you can provide some guidance
as to how I should do so, your approach (or any other that requires me
to do so) can at best be considered of passing theoretical interest.

Leaving a biologist or engineer to make these decisions would in my view
make for a singularly bad statistician.

>>Using function+derivative at 3 or 4 points in each dimension gives as
>>much or more information than just the function on 5 or 6 points. Poor
>>specification? Maybe, but you will need at least as many parameters as
>>I need points to get a reasonable specification.

>You might be able to get one decimal place that way, but I doubt it.
>But even 4 points in each of 8 dimensions, function and derivative,
>will give more than 100,000 numbers, and I have seen nothing in the
>numerical analysis literature which indicates that even 3-dimensional
>interpolation is a reasonable problem.  In one dimension, both high
>degree and many points are relatively cheap, but not in more.

In numerical analysis one is usually interested in far higher degrees
of accuracy than in statistics. Since 9*4^8 = 589824, the original 100000
data points can't tell us much more than that anyway. And your trying to model the
situation with "a 1000-parameter model, or even 10000" will provide about as much
information as I get from a 2^8 grid of points. Less, if I use local fitting.

I will also point out that I have never claimed my procedures (or anyone
else's) will do anything spectacular for the original problem; I never
even responded to that post. At best, one might find well separated
clusters or heavy skewness. I responded, Herman, to your ignorant
criticisms of the whole field of density estimation, which were not confined
to this problem. Retreating behind this problem is just running for cover, of
which you will find very little.

>>If your global fitting involves global optimization (as do all global
>>procedures I'm aware of; you have provided no evidence to suggest otherwise)
>>then a large number of parameters makes the computational burden absolutely
>>untenable.

>This is exactly what I dispute.  Numerical methods are not that difficult
>if one understands the problem, but it is an art, often not easily done in
>a package.

If the problems are `not that difficult', why are you having so much
difficulty producing anything specific? What special features of your
procedure enable significant speedup over standard methods of global
optimization?

Local fitting is linear in the number of points fitted directly, and therefore
linear in the number of parameters. Do you have global optimization procedures
that are better than linear in the number of parameters?
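
To spell that out in terms of the toy sketch earlier in this post (again my
own illustration): evaluating the estimate on a grid is one small,
fixed-size local optimization per grid point, and nothing else, so the work
grows linearly with the number of evaluation points.

import numpy as np

# Uses the local_logdensity sketch defined earlier in this post.  The loop
# below is one independent, fixed-size local fit per evaluation point, so
# its cost is linear in len(grid).
rng = np.random.default_rng(1)
data = rng.normal(size=500)
grid = np.linspace(-3.0, 3.0, 50)
fhat = [local_logdensity(x0, data, h=0.6) for x0 in grid]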

>>Finally, if the user wishes to visualize the density estimate, one can
>>only look at lower dimensional cross sections and/or projections. With
>>a local approach, one just computes the views of most interest. With a
>>global approach, you get nothing until you've fitted everything.

>It depends on what one wants to visualize.  Possibly you want those; I
>have encountered situations where what one wants is the behavior in
>the moderate tails, and this is certainly not possible by crude methods.

Oh. You're resorting to calling other people's work `crude methods' again.
Say something intelligent. Show me precisely in what ways your methods are
better than mine in the moderate tails. To obtain good estimates in the tails,
it is surely obvious that one needs some form of localization, to prevent bias
from being induced by incorrect specification elsewhere?

>>These issues have been thrashed out in detail in the literature, as you
>>would quickly realize if you bothered to read any of the references I've
>>given.

>Engineering literature or numerical analysis literature?

The best place to start computing local fits is the paper by Cleveland
and Grosse. This was in the statistics literature. Do I sense a problem?

>>>I do not have enough information about the original problem to produce
>>>an algorithm.  It is questionable whether there even IS enough information
>>>to produce a simple algorithm.

>>Then what is the basis for your claims of computational superiority?
>>Your claims for superiority of your approach and abuse of existing methods
>>for density estimation have stretched far beyond the original 8 dimensional
>>problem; I fail to see why you are now trying to run for cover behind this
>>problem.

>The knowledge of problems for which plugging into computer packages does
>not work, plus a modicum of knowledge of numerical methods, which is not
>covered by the standard texts.

So what are you saying, apart from spouting off as to how you think you are
so much smarter than everyone else? Anyone who wants to estimate a density
has to write their own code from scratch? Are we allowed to use high level
languages, or does this have to be done in machine code?

And how does this answer the question:
"Then what is the basis for your claims of computational superiority?"

You have still not provided a scrap of evidence that you have ever implemented
anything.

>of great importance.  Judging from the little stated in the problem, the
>low-dimensional sections are likely to miss the important points.

As opposed to failing to visualize the fit at all, which will miss just about
everything.

You included the following early in your first post on this topic:

>In my opinion, this is not even the case for density
>estimation in *1* dimension; there are ad hoc procedures, and for some of
>them their properties have been studied, but none from a good analysis of
>the problem.

Now, stop trying to run for cover behind the 8 dimensional problem, and
start trying to justify your claims. You certainly haven't shown your
approach offers anything special for the 8-d problem, or any other problem.

>>Statistics in fact has very little to do with choosing between models.
>>In industry, the problems are to improve quality and productivity.
>>Complicated problems, no simple model and considerable advances have
>>been brought about through data analysis; often with no more than some simple
>>data visualization. Saying this is more likely to do damage than to bring
>>understanding simply falls flat in the face of evidence.

>If one is doing quality control, or operating a factory, that is the case.

Hey! We agree on something.

>If one is carrying out a scientific investigation, this is not the case.
>In a moderately well-tuned production facility, one may be looking at
>small perturbations, and low degree polynomials can be good fits, or
>it may be the case that most variables do not matter, or one of a
>number of other low-dimensional alternatives.

Local fitting does not restrict one to using local polynomials. In some
of my applications, I don't use local polynomials. There are procedures
for both global and local variable selection; procedures such as MARS do
the latter in the regression problem. There are semiparametric approaches and
conditionally parametric approaches for use where some variables are more
important than others; one can also smooth with different bandwidths in
different directions. You are hardly raising anything original here.
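
To take just the last item: smoothing with a different bandwidth in each
direction is nothing more exotic than a product kernel. A minimal sketch
(my own illustrative code, not anyone's production software):

import numpy as np

def product_kernel_density(x0, data, bandwidths):
    # Gaussian product-kernel density estimate at the point x0, with a
    # separate bandwidth for each coordinate (the columns of `data`).
    h = np.asarray(bandwidths, dtype=float)
    z = (data - x0) / h            # standardize each coordinate by its own bandwidth
    kernel = np.exp(-0.5 * np.sum(z ** 2, axis=1))
    norm = (2.0 * np.pi) ** (data.shape[1] / 2.0) * np.prod(h)
    return kernel.sum() / (len(data) * norm)

# e.g. product_kernel_density(np.zeros(3), X, bandwidths=[0.5, 0.5, 3.0]) for
# an (n, 3) data array X smooths heavily in the third direction if that
# variable is believed to matter little.

Variables believed to be unimportant simply get large bandwidths, which is
the smoothing analogue of dropping them from the model.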

>  In higher dimensional
>problems, linear approximations are often used over ranges where linearity
>is overly restrictive, or other methods claimed to be robust, but which are
>only robust under massive assumptions, usually unstated, are produced.

And others propose using "a 1000-parameter model, or even 10000", which is
inadequate except under massive assumptions. Often in high dimensions
there's insufficient data for anything more than a coarse fit.