What methods do we want in astropy.stats?

180 views
Skip to first unread message

Christoph Deil

unread,
Jan 8, 2013, 2:43:03 AM1/8/13
to astro...@googlegroups.com
Dear all,

the astropy.stats package is planned to hold statistical functions or algorithms used in astronomy and astropy.

At the moment it only contains one function: sigmaclip.

I would like to add some commonly used methods in X- and gamma-ray astronomy for Poisson count statistics.
Steve Crawford wants to add methods for robust statistics.

What other common astronomy statistical methods do we want (that are not already available in numpy, scipy, or maybe scikit-learn, scikit-image, statsmodels)?

Please add to the whishlist on this wiki page:
https://github.com/astropy/astropy/wiki/What-methods-do-we-want-in-astropy.stats%3F
Or simply reply here if you prefer.
Fell free to propose methods even if you don't have time to implement them yourself.

Once we know what we want, the next step will be an API proposal.

Best wishes,
Christoph

Erik Tollerud

unread,
Jan 9, 2013, 11:54:07 AM1/9/13
to astro...@googlegroups.com
I completely agree astropy.stats needs more fleshing out, and it's a
great idea to get a sense of what people would like to see there.

That said, it will be easy to go overboard with this - I don't think
we want to put a ton of stuff in here that's "niche"-specific. My
understanding of the intent for astropy.stats is that it be used for
reasonably general stuff - e.g., robust statistics used in astronomy
fit the bill perfectly, but specific techniques only used in X-ray or
gamma-ray astronomy may not be something we want to put in the astropy
core. These sort of things might well fit better in a "high-energy
astro" affiliated package or something, perhaps along with X-ray
spectrum classes or similar. That way it isn't chained to the astropy
core development cycle.

I'd still love to see this wiki page get used to "brainstorm," but
just keep in mind that we might not want it all inthe core when it
actually starts getting coded up.
--
Erik

Tom Aldcroft

unread,
Jan 9, 2013, 12:30:08 PM1/9/13
to astro...@googlegroups.com
On Wed, Jan 9, 2013 at 11:54 AM, Erik Tollerud <erik.t...@gmail.com> wrote:
> I completely agree astropy.stats needs more fleshing out, and it's a
> great idea to get a sense of what people would like to see there.
>
> That said, it will be easy to go overboard with this - I don't think
> we want to put a ton of stuff in here that's "niche"-specific. My
> understanding of the intent for astropy.stats is that it be used for
> reasonably general stuff - e.g., robust statistics used in astronomy
> fit the bill perfectly, but specific techniques only used in X-ray or
> gamma-ray astronomy may not be something we want to put in the astropy
> core. These sort of things might well fit better in a "high-energy
> astro" affiliated package or something, perhaps along with X-ray
> spectrum classes or similar. That way it isn't chained to the astropy
> core development cycle.

It's worth noting that the initial functions that Christoph proposed
are really just source significance and sensitivity in the Poisson (as
opposed to Gaussian) regime. They are not highly specialized, and
have been around for 30 years, so there is not a big worry about the
release cycle here. Certainly the language used in the initial API
does make them look like gamma-ray routines, but Christoph already
agreed that this would be changed. Poisson and low-count statistics
are obviously useful throughout astronomy, particularly in analyzing
sample results, so we should not consider them as niche-specific.

- Tom

Christoph Deil

unread,
Jan 10, 2013, 4:13:54 AM1/10/13
to astro...@googlegroups.com

On Jan 9, 2013, at 6:30 PM, Tom Aldcroft <aldc...@head.cfa.harvard.edu> wrote:

> On Wed, Jan 9, 2013 at 11:54 AM, Erik Tollerud <erik.t...@gmail.com> wrote:
>> I completely agree astropy.stats needs more fleshing out, and it's a
>> great idea to get a sense of what people would like to see there.
>>
>> That said, it will be easy to go overboard with this - I don't think
>> we want to put a ton of stuff in here that's "niche"-specific. My
>> understanding of the intent for astropy.stats is that it be used for
>> reasonably general stuff - e.g., robust statistics used in astronomy
>> fit the bill perfectly, but specific techniques only used in X-ray or
>> gamma-ray astronomy may not be something we want to put in the astropy
>> core.


Erik, I agree that some of the methods I mention for "Poisson count statistics" at
https://github.com/astropy/astropy/wiki/What-methods-do-we-want-in-astropy.stats%3F
might not be general enough for the astropy core (like Feldman&Cousins), but as Tom pointed out, some (probably are.
I think it will be easier to discuss what belongs in the core after I implement these functions,
they will all be standalone and easy to put in astropy.stats or an affiliated package.

I saw that you put principal component analysis (PCA) on the wish list for astropy.stats:
https://github.com/astropy/astropy/wiki/What-methods-do-we-want-in-astropy.stats%3F

This brings up the question if we want to duplicate methods that are already available in the other
scientific python packages. E.g. PCA is available in scikit-learn:
http://scikit-learn.org/stable/modules/decomposition.html#principal-component-analysis-pca
My feeling is that as long as PCA is not needed in other parts of the astropy core (is it?),
we should not duplicate such methods.
There's tons of great data analysis methods in scikit-learn, scikit-image, pandas, …
and users that want them can simply install these packages.

Overall I get the feeling there aren't that many astronomy-specific stats methods, e.g. most (all?) at
http://astroml.github.com/modules/classes.html
I wouldn't put in astropy.stats.

Question to non-photon-counting (e.g. radio, optical … i.e. not Poisson stats as for X-ray or gamma-ray) astronomers:
Are there some general methods you use to compute errors, significance, sensitivity, upper limits?
Can you implement them in one line using numpy and scipy.stats or would it be worth coding them up in astropy.stats?

Erik Tollerud

unread,
Jan 11, 2013, 5:33:39 PM1/11/13
to astro...@googlegroups.com
> Erik, I agree that some of the methods I mention for "Poisson count statistics" at
> https://github.com/astropy/astropy/wiki/What-methods-do-we-want-in-astropy.stats%3F
> might not be general enough for the astropy core (like Feldman&Cousins), but as Tom pointed out, some (probably are.
> I think it will be easier to discuss what belongs in the core after I implement these functions,
> they will all be standalone and easy to put in astropy.stats or an affiliated package.

Sounds good!

> I saw that you put principal component analysis (PCA) on the wish list for astropy.stats:
> https://github.com/astropy/astropy/wiki/What-methods-do-we-want-in-astropy.stats%3F
>
> This brings up the question if we want to duplicate methods that are already available in the other
> scientific python packages. E.g. PCA is available in scikit-learn:
> http://scikit-learn.org/stable/modules/decomposition.html#principal-component-analysis-pca
> My feeling is that as long as PCA is not needed in other parts of the astropy core (is it?),
> we should not duplicate such methods.
> There's tons of great data analysis methods in scikit-learn, scikit-image, pandas, …
> and users that want them can simply install these packages.
>
> Overall I get the feeling there aren't that many astronomy-specific stats methods, e.g. most (all?) at
> http://astroml.github.com/modules/classes.html
> I wouldn't put in astropy.stats.


I see your point that we don't want to be copying a lot of
functionality that we can find just as well elsewhere. (thanks for
pointing out the PCA stuff in scikits-learn... I hadn't seen that
until now.)

That said, I guess my attitude for astropy.stats is not just that the
*functionality* be astronomy-specific, but also the language and
conventions. That is, if I, as an average astronomer, look at a
dataset and say "I want to do PCA on this", I shouldn't necessarily
have to undertand how to install and use scikits-learn. But I admit
that argument could be made on just about anything, so maybe you're
right this shouldn't be included unless/until it's needed in other
parts of astropy.

Anyone else have thoughts on this? Is PCA familiar to enough
astronomers that it should be included here?


> Question to non-photon-counting (e.g. radio, optical … i.e. not Poisson stats as for X-ray or gamma-ray) astronomers:
> Are there some general methods you use to compute errors, significance, sensitivity, upper limits?
> Can you implement them in one line using numpy and scipy.stats or would it be worth coding them up in astropy.stats?

Arguably optical/NIR is still in the photon-counting regime, even if
we're sometimes too lazy to do it properly :)

There are some standard sensitivity/SNR calculations relevant for CCDs
that I imagine would be useful both in imaging and spectroscopic
situations. But it's not clear what goes in astropy.stats vs
photutils (or similar).

Also, there's completeness calculations pertaining to redshift
surveys. But a lot of that will probably use similar methods as the
X-ray/gamma Poisson stuff.



>> It's worth noting that the initial functions that Christoph proposed
>> are really just source significance and sensitivity in the Poisson (as
>> opposed to Gaussian) regime. They are not highly specialized, and
>> have been around for 30 years, so there is not a big worry about the
>> release cycle here. Certainly the language used in the initial API
>> does make them look like gamma-ray routines, but Christoph already
>> agreed that this would be changed. Poisson and low-count statistics
>> are obviously useful throughout astronomy, particularly in analyzing
>> sample results, so we should not consider them as niche-specific.

Fair enough - it may be fine once the code is all laid out. I just
wanted to point out that astropy.stats is an area where I could easily
see out-of-control feature creep if we aren't a bit cautious.



>>
>> - Tom
>>
>>>
>>> I'd still love to see this wiki page get used to "brainstorm," but
>>> just keep in mind that we might not want it all inthe core when it
>>> actually starts getting coded up.
>>>
>>> On Tue, Jan 8, 2013 at 2:43 AM, Christoph Deil
>>> <deil.ch...@googlemail.com> wrote:
>>>> Dear all,
>>>>
>>>> the astropy.stats package is planned to hold statistical functions or algorithms used in astronomy and astropy.
>>>>
>>>> At the moment it only contains one function: sigmaclip.
>>>>
>>>> I would like to add some commonly used methods in X- and gamma-ray astronomy for Poisson count statistics.
>>>> Steve Crawford wants to add methods for robust statistics.
>>>>
>>>> What other common astronomy statistical methods do we want (that are not already available in numpy, scipy, or maybe scikit-learn, scikit-image, statsmodels)?
>>>>
>>>> Please add to the whishlist on this wiki page:
>>>> https://github.com/astropy/astropy/wiki/What-methods-do-we-want-in-astropy.stats%3F
>>>> Or simply reply here if you prefer.
>>>> Fell free to propose methods even if you don't have time to implement them yourself.
>>>>
>>>> Once we know what we want, the next step will be an API proposal.
>>>>
>>>> Best wishes,
>>>> Christoph
>>>
>>>
>>>
>>> --
>>> Erik
>



--
Erik

Steve Crawford

unread,
Jan 11, 2013, 7:19:13 PM1/11/13
to astro...@googlegroups.com

It's always best to list everything now.   At the very least, people can then sort them out and decide what to include or not include and also as  a resource where other things might live.     It might be good to have some very common things listed in a manner to indicate that they exist elsewhere and that people should look for them there if they want to use them.  

I'm not sure if this is where we want to include something like commonly used fitting functions (ie Schechter function) that are also used as well as statistical tests.   

Christoph Deil

unread,
Jan 12, 2013, 6:30:54 AM1/12/13
to astro...@googlegroups.com
On Jan 12, 2013, at 1:19 AM, Steve Crawford <craw...@saao.ac.za> wrote:


It's always best to list everything now.   At the very least, people can then sort them out and decide what to include or not include and also as  a resource where other things might live.     It might be good to have some very common things listed in a manner to indicate that they exist elsewhere and that people should look for them there if they want to use them.  

I'm not sure if this is where we want to include something like commonly used fitting functions (ie Schechter function) that are also used as well as statistical tests.   

Common fitting functions (=models, e.g. polynomial, powerlaw, Gaussian) should go in astropy.models:
https://github.com/astropy/astropy/pull/493 Models sub package by Nadia

Domain-specific models should go in affiliated packages (e.g. XSpec models).
(Side note: for my own work I need a package for broad-band synchrotron and Inverse Compton emission models as well as Pion-decay from populations of cosmic rays interacting with magnetic fields and gas. If someone on this list has code for this and would like to start an affiliated package, I'd be very interested to help make this happen.)

Sherpa has the Schechter function in the "common" section at  http://cxc.harvard.edu/sherpa/models/index.html
but which models are common enough to go into astropy.models is a discussion for another day, it doesn't concern this thread on astropy.stats.

Steve, I'm not sure what exactly what statistical tests methods you have in mind?

I guess computing a significance using the Li & Ma formula or functions like scipy.stats.f_value or scipy.stats.chisqprob could fall in that category.
These I would say belong in astropy.stats, so if there is something that is not available in scipy.stats or already mentioned on 
please add specific things that might be useful for astronomers.

But computing an error or 90% confidence interval or parameter limit for a model fitted to data could also be considered a statistical test, right?
In that case the implementation (like e.g. profile likelihood) is typically a complex iterative optimisation algorithm, such as e.g. MINOS or Rolke:
In that case, where an optimiser has to be called, I'm not sure we want this in scipy.stats, I propose we discuss this particular issue here:
Do you or anyone have a reference or maybe even a link to an implementation?
We can sort out what belongs in astropy.stats or photutils later, please put it on the wish list for now:


Also, there's completeness calculations pertaining to redshift
surveys.  But a lot of that will probably use similar methods as the
X-ray/gamma Poisson stuff.

Again, please add a bullet and a link on the wish list for now.




>> It's worth noting that the initial functions that Christoph proposed
>> are really just source significance and sensitivity in the Poisson (as
>> opposed to Gaussian) regime.  They are not highly specialized, and
>> have been around for 30 years, so there is not a big worry about the
>> release cycle here.  Certainly the language used in the initial API
>> does make them look like gamma-ray routines, but Christoph already
>> agreed that this would be changed.  Poisson and low-count statistics
>> are obviously useful throughout astronomy, particularly in analyzing
>> sample results, so we should not consider them as niche-specific.

Fair enough - it may be fine once the code is all laid out.  I just
wanted to point out that astropy.stats is an area where I could easily
see out-of-control feature creep if we aren't a bit cautious.

Yes.
I'll update the wiki page to reflect this discussion and stress this point.

Steve Crawford

unread,
Jan 12, 2013, 7:59:43 PM1/12/13
to astro...@googlegroups.com
> Steve, I'm not sure what exactly what statistical tests methods you
> have in mind?

I've added some stuff.  My feeling is we should probably create a list of commonly used methods that astronomers use (and just note if they are done somewhere else).   We can then decide what we want actually want or need in astropy, but then we at least have a very useful resource for how to do statistics in python. 

As for duplication, I think it has to be made on a case by case basis, but in addition, we do want tools that are commonly used by astronomers in this package so that it is a little bit of one stop shopping.   But I think there will be always a trade off between what is domain specific and what should be really going into scipy or other stats packages.  My feeling is that if other people can make use of it outside of your domain, it would be useful to put here.   In addition, I think if it is something that won't cause too much bloat, it isn't a problem having a duplicate.   We don't know what will continue to be supported and what won't be (although we can take some educated guesses). 

However, I think the list we are creating is probably a great resource not only for the package but for astronomers interested in using python for statistics.   I've tried to add a few more pages as well, but I'd encourage people to add things that are  missing.

Demitri Muna

unread,
Jan 13, 2013, 1:27:18 PM1/13/13
to astro...@googlegroups.com

On Jan 7, 2013, at 11:43 PM, Christoph Deil <deil.ch...@googlemail.com> wrote:

What other common astronomy statistical methods do we want (that are not already available in numpy, scipy, or maybe scikit-learn, scikit-image, statsmodels)?

I would strongly suggest that this question be raised on the (user) astropy mailing list.

Cheers,
Demitri


_________________________________________
Demitri Muna

Center for Cosmology and Particle Physics
New York University








Erik Tollerud

unread,
Jan 14, 2013, 4:55:18 AM1/14/13
to astro...@googlegroups.com
+1 to demitri's suggestion. The larger the audience the better, for this one.
--
Erik

Christoph Deil

unread,
Jan 15, 2013, 2:48:06 AM1/15/13
to astro...@googlegroups.com
I've posted this question on the astropy user mailing list:
Reply all
Reply to author
Forward
0 new messages