Newsgroups: sci.stat.math, sci.stat.consult
From: cl...@tukey.Stanford.EDU ( )
Date: Tue, 14 Dec 93 02:04:39 GMT
Local: Mon, Dec 13 1993 9:04 pm
Subject: Re: Density Estimation
Herman Rubin writes:
>In article <1993Dec8.022208.9...@EE.Stanford.EDU> cl...@tukey.Stanford.EDU ( ) writes:
>>Herman Rubin writes:
>>And what do armed nuclear weapons have to do with density estimation?
>The intent was to indicate both the magnitude of the danger and the

Really? I can't speak for those I don't know, but as far as my department
at Bell Labs is concerned, this statement is plainly wrong, and there are
several of us here who have worked in the field of local fitting. Our work
has been almost entirely motivated by real engineering and manufacturing
problems for which other existing methods have been found to be
unsatisfactory.

>>Sorry, but these are just cheap cough outs. It is YOUR responsibility,
>>as the one promoting this parametric approach, to provide the tools and
>>methods to convert the vague ideas one has in practice into formal loss
>>functions, sensible models and prior distributions for the parameters.
>These are not copouts, any more than it is a copout if a drug company

So, someone walks into your office with a dataset, and says `I want to
estimate the density'. You say `Go away, come up with a large parametric
model, an estimate (from no data) of the prior density on the parameter
space, and a global loss function, then come back'.

For the sake of the story, assume the client comes back with the requested

Is this really the complete role you see for a statistician?

>>As I've said in previous posts, coming up with sensible choices is
>>unreasonable in many applications; no single loss function can adequately
>>capture the performance of a function estimate. Local smoothness of an
>>unknown density is often a reasonable assumption to make. Converting this
>>into a global parametric model and prior distribution is difficult. If you
>>wish to promote a large scale parametric approach, this is YOUR problem.
>What is local smoothness? There are hundreds of ways of expressing it.

A question, and a reasonable one at that! A welcome change from your usual
attempts to portray posters as stupid and describing others' work as
`just plain bad'.

By local smoothness, I mean existence of a good polynomial approximation

There is of course no reason to restrict ourselves to polynomial approximations;

Under the most widely studied asymptotics (bandwidth -> 0) the analogous

>In most "real world" density situations, the density is infinitely
>differentiable, usually even analytic.

Questionable, and I wouldn't place too much faith in estimators that rely
on this assumption. Probably not very relevant to our discussion.

> This by itself is useless for
>inference. If you would read the literature, you would know that I
>have suggested classes of infinite-parametric models--the prior distribution
>problem is another matter indeed, but there are some negative results.

I have read hundreds of papers in density estimation and related fields
over the past few years, and would be glad to add yours if only you would
provide the references. I have been asking for specific details of your
methods throughout this thread, and requested you to send (p)reprints over
two weeks ago.

Until you provide references, you can hardly direct others to read the

>These negative results are of some importance, in that they point out
>that certain types of simplifications do not work. I also have some
>preliminary results on robust procedures here; they should beat any
>of the kernel procedures.

`they should'. Good scientific argument. I also claim my procedures beat
kernel procedures, particularly with regard to moderate tail performance.

>>In the real world, we don't get to brush aside the difficult parts as
>>`someone else's problem'; `will be done in future work' and all the similar
>>excuses used all too frequently by some in academics.
>As I said before, the pharmaceutical company cannot solve the problem of

Prior distributions, and (at least for some problems) models and loss
functions are mathematical abstractions of little direct interest to anyone
other than the statistician. It is the job of the statistician to become
sufficiently involved and knowledgeable about the problem, so that informed
decisions can be made about such items as needed, and if informed decisions
cannot be made, then to try other approaches.

I have no idea as to how to construct meaningful prior densities in possibly
hundreds or thousands of dimensions; until you can provide some guidance as
to how I should do so, your approach (or any other that requires me to do so)
can at best be considered of passing theoretical interest.

Leaving a biologist or engineer to make these decisions would in my view

>>Using function+derivative at 3 or 4 points in each dimension gives as
>>much or more information than just the function on 5 or 6 points. Poor
>>specification? Maybe, but you will need at least as many parameters as
>>I need points to get a reasonable specification.
>You might be able to get one decimal place that way, but I doubt it.

In numerical analysis one is usually interested in far higher degrees
of accuracy than in statistics. Since 9*4^8=589824, the original 100000
data points can't tell us much more. And your trying to model the situation
with "a 1000-parameter model, or even 10000", will provide about as much
information as I get from a 2^8 grid of points. Less, if I use local
quadratic fitting.

I will also point out that I have never claimed my procedures (or anyone

>>If your global fitting involves global optimization (as do all global
>>procedures I'm aware of; you have provided no evidence to suggest otherwise)
>>then a large number of parameters makes the computational burden absolutely
>>untenable.
>This is exactly what I dispute. Numerical methods are not that difficult

If the problems are `not that difficult', why are you having so much
difficulty producing anything specific? What special features of your
procedure enable significant speedup over standard methods of global
optimization?

Local fitting is linear in the number of points fitted directly, and therefore

>>Finally, if the user wishes to visualize the density estimate, one can
>>only look at lower dimensional cross sections and/or projections. With
>>a local approach, one just computes the views of most interest. With a
>>global approach, you get nothing until you've fitted everything.
>It depends on what one wants to visualize. Possibly you want those; I

Oh. You're resorting to calling other people's work `crude methods' again.
Say something intelligent. Show me precisely in what ways your methods are
better than mine in the moderate tails. To obtain good estimates in the
tails, it is surely obvious one needs some form of localization to prevent
bias being induced by incorrect specification elsewhere?

>>These issues have been thrashed out in detail in the literature, as you
>>would quickly realize if you bothered to read any of the references I've
>>given.
>Engineering literature or numerical analysis literature?

The best place to start computing local fits is the paper by Cleveland
and Grosse. This was in the statistics literature. Do I sense a problem?

>>>I do not have enough information about the original problem to produce
>>>an algorithm. It is questionable whether there even IS enough information
>>>to produce a simple algorithm.
>>Then what is the basis for your claims of computational superiority?
>The knowledge of problems for which plugging into computer packages does

So what are you saying, apart from spouting off as to how you think you are
so much smarter than everyone else? Anyone who wants to estimate a density
has to write their own code from scratch? Are we allowed to use high level
languages, or does this have to be done in machine code?

And how does this answer the question:

You have still not provided a scrap of evidence that you have ever implemented

>When someone asks for an 8-dimensional object, the assumptions are already
>of great importance. Judging from the little stated in the problem, the
>low-dimensional sections are likely to miss the important points.

As opposed to failing to visualize the fit at all, which will miss just about
everything. You included early in your first post in this topic:

>In my opinion, this is not even the case for density
>estimation in *1* dimension; there are ad hoc procedures, and for some of
>them their properties have been studied, but none from a good analysis of
>the problem.

Now, stop trying to run for cover behind the 8 dimensional problem, and
start trying to justify your claims. You certainly haven't shown your
approach offers anything special for the 8-d problem, or any other problem.

>>Statistics in fact has very little to do with choosing between models.
>>In industry, the problems are to improve quality and productivity.
>>Complicated problems, no simple model and considerable advances have
>>been brought about through data analysis; often with no more than some simple
>>data visualization. Saying this is more likely to do damage than to bring
>>understanding simply falls flat in the face of evidence.
>If one is doing quality control, or operating a factory, that is the case.

Hey! We agree on something.

>If one is carrying out a scientific investigation, this is not the case.
>In a moderately well-tuned production facility, one may be looking at
>small perturbations, and low degree polynomials can be good fits, or
>it may be the case that most variables do not matter, or one of a
>number of other low-dimensional alternatives.

Local fitting does not restrict one to using local polynomials. In some
of my applications, I don't use local polynomials. There are procedures
for both global and local variable selection; procedures such as MARS do
the latter in the regression problem. There are semiparametric approaches
and conditionally parametric approaches for use where some variables are
more important than others; one can also smooth with different bandwidths
in different directions. You are hardly raising anything original here.

> In higher dimensional
>problems, linear approximations are often used over ranges where linearity
>is overly restrictive, or other methods claimed to be robust, but which are
>only robust under massive assumptions, usually unstated, are produced.

And others propose using "a 1000-parameter model, or even 10000", which is
inadequate except under massive assumptions. Often in high dimensions
there's insufficient data for anything more than a coarse fit.
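
The counting argument in the post (9*4^8=589824 against 100000 data points, and a 2^8 grid against a 1000- or 10000-parameter model) can be checked with a few lines of Python. This is only a sketch under assumptions the post does not state: it reads the factor 9 as one function value plus eight first partial derivatives in 8 dimensions, and it takes "local quadratic fitting" to mean a full quadratic polynomial at each grid point. The function names are hypothetical.

```python
from math import comb

DIM = 8  # dimension of the density discussed in the thread


def grid_points(points_per_axis: int, dim: int = DIM) -> int:
    """Number of points on a regular grid with the same count on each axis."""
    return points_per_axis ** dim


def local_poly_coeffs(degree: int, dim: int = DIM) -> int:
    """Number of monomials of total degree <= `degree` in `dim` variables.

    Assumption: a local fit of this degree yields this many numbers per point.
    """
    return comb(dim + degree, degree)


# Value plus gradient (assumed reading of the factor 9) on a 4^8 grid.
fine_total = local_poly_coeffs(1) * grid_points(4)

# Function values only on a 2^8 grid.
coarse_values = grid_points(2)

# Full local quadratic (assumed meaning) on the same 2^8 grid.
coarse_quadratic = local_poly_coeffs(2) * grid_points(2)

print("value+gradient on 4^8 grid:", fine_total)
print("values on 2^8 grid:", coarse_values)
print("local quadratic on 2^8 grid:", coarse_quadratic)
```

Under these assumptions the 4^8 grid already carries more numbers than the 100000 observations, while a local quadratic on the 2^8 grid carries a number of coefficients of the same order as the 10000-parameter model quoted in the post.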