Yes, there's a good part of my code I wouldn't have written had I been
aware of astroML earlier.
> It sounds like you looked at the second one already. My thought on the
> matter is that the Bayesian approach is very nuanced, and you must make sure
> your model fits what you know about the data (for example, what distribution
> do you expect outliers to be drawn from?) For this reason, the approach
> we've taken is to give examples of approaches on toy data rather than to
> provide a python function that does it automatically. Any such function
> would be very likely to be misused.
I'm generally opposed to the philosophy that "likely to be misused"
means code shouldn't be distributed (or simplified). I like your
examples well enough, though, and can probably use them without much
difficulty.
However, on this specific point - identifying outliers - is there a
more general approach? The two approaches you illustrate from Hogg's
paper are both good & powerful, but is there any reason you couldn't
identify outliers with an uninformative prior, under the assertion
that "outlier" means "probably not described by the model"?
So instead of the Bernoulli prior, you'd just start with a
DiscreteUniform distribution that only allows two values. And
perhaps set the likelihood such that if the variable is marked
"Outlier", its likelihood is replaced with the likelihood of another
data point sampled from the observations... basically, this becomes a
filter to remove the least-likely data points. That's a rather
complicated setup, though, so perhaps the answer is 'no, there's
nothing more general'.
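To make the "outlier = probably not described by the model" idea concrete, here's a rough numpy sketch of the marginalized version: a two-component mixture where each point is either drawn from the fitted model or from a deliberately broad outlier density. The function name, the 50/50 prior (the two-state uniform prior above), and the wide-Gaussian outlier density are all my own illustrative choices, not anything from astroML or Hogg's paper:

```python
import numpy as np

def outlier_probability(y, y_model, sigma, outlier_width, q=0.5):
    """Posterior probability that each point is an outlier.

    Two-component mixture: the fitted model (Gaussian errors, width
    `sigma`) vs. a broad outlier density (here a wide Gaussian of
    width `outlier_width` -- this stands in for the poorly-known
    parent distribution of the outliers).  `q` is the prior outlier
    probability; q=0.5 corresponds to the two-state uniform prior.
    """
    # Likelihood of each point under the model
    p_in = np.exp(-0.5 * ((y - y_model) / sigma) ** 2) \
           / (sigma * np.sqrt(2 * np.pi))
    # Likelihood under the broad outlier density
    p_out = np.exp(-0.5 * ((y - y_model) / outlier_width) ** 2) \
            / (outlier_width * np.sqrt(2 * np.pi))
    # Posterior odds of the "Outlier" state, marginalizing the flag
    return q * p_out / (q * p_out + (1 - q) * p_in)
```

With sigma=1 and outlier_width=10, a point sitting on the model comes out with probability well below 0.5 and a point 10 sigma away comes out essentially 1 -- which is just the "least-likely data points" filter, but with the flag marginalized out instead of sampled.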
My problem involves a data set where the parent distribution of the
outliers is very poorly understood.
--
Adam