I'm curious whether anyone on the list is working on data mining right now
and, if so, what they are working on. Anyone care to share?
I thought that I would mention a specific instance of data mining that
worked out well for me and my collaborator, Geoff Clayton. We were
searching for R CrB candidate lightcurves in the MACHO databases
containing LMC, SMC, and Bulge data. Needless to say, there was no
lightcurve flag that said "I'm a previously unrecognized R CrB star". So
our trick was to define statistics and narrow down the range of
combinations of such statistics to produce a list of candidate stars
which could then be examined by eye in a finite amount of time. Skewness
was one such statistic: many (but not all!) R CrB stars spend a lot of time
at maximum light, so their fainter-than-maximum points all fall on one side
of the bulk of the points. That is also true of eclipsing binaries, of
course, so one needs to try combinations of statistics. The statistics
also need to be quick to evaluate (or be worth the trouble of computing for
all stars once and for all!). Most eclipsing binaries would be shorter-period
and so would have many minima during our 8 years of data-taking. The
very long-period eclipsing binaries are interesting in their own right,
so we didn't mind finding those as R CrB "contaminants"!
This all resulted in papers like:
http://adsabs.harvard.edu/abs/2005AJ....130.2293Z
http://adsabs.harvard.edu/abs/2001ApJ...554..298A
http://adsabs.harvard.edu/abs/1996ApJ...470..583A
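The skewness-plus-second-statistic idea described above can be sketched roughly as follows. This is only an illustration, not the statistics or cuts actually used on the MACHO data: the function names, the thresholds (`min_skew`, `max_excursions`, `depth`) and the synthetic lightcurves are all hypothetical. Because magnitudes grow as a star fades, an R CrB decline shows up as a tail toward larger values (positive skewness); counting separate faint excursions then separates a few long fades from the many short eclipses of a short-period binary.

```python
import numpy as np

def skewness(mag):
    """Population (biased) sample skewness; returns 0.0 for a constant series."""
    x = np.asarray(mag, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d ** 2)
    if m2 == 0.0:
        return 0.0
    return float(np.mean(d ** 3) / m2 ** 1.5)

def count_faint_excursions(mag, threshold):
    """Count separate runs of consecutive points fainter than `threshold`.

    Magnitudes increase as a star fades, so "fainter" means mag > threshold.
    Assumes the points are already sorted by time.
    """
    faint = np.asarray(mag, dtype=float) > threshold
    # A run starts at a faint point whose predecessor is not faint (or at index 0).
    previous = np.concatenate(([False], faint[:-1]))
    return int(np.sum(faint & ~previous))

def is_rcrb_candidate(mag, min_skew=1.0, max_excursions=5, depth=1.0):
    """Hypothetical cut: strongly skewed toward faint values, yet with only a few
    separate fading events (a short-period eclipsing binary produces many)."""
    mag = np.asarray(mag, dtype=float)
    if skewness(mag) < min_skew:
        return False
    threshold = np.median(mag) + depth  # "well below maximum light" (hypothetical)
    return count_faint_excursions(mag, threshold) <= max_excursions

# Synthetic lightcurves, one point per day for ~8 years (made up for illustration).
t = np.arange(3000)
rcrb = np.full(t.size, 12.0)
rcrb[1000:1200] += 4.0                      # one deep, long decline
rcrb[2500:2600] += 3.0                      # a second, shallower decline
eb = 12.0 + np.where(t % 30 < 2, 1.5, 0.0)  # 2-day eclipses every 30 days

print(is_rcrb_candidate(rcrb), is_rcrb_candidate(eb))  # True False
```

Both synthetic stars pass the skewness cut; only the excursion count rejects the eclipsing binary, which is the point of combining statistics.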
I would say that one of the current impediments to making better use of
existing stellar photometry databases is the lack of supplemental,
easily defined summary (or "aggregate") statistics such as skewness and
kurtosis. If the database is "live", these would need to be recalculated
every so often (although a yearly set of such indices could be easily
searched for changes, too!)
For folks who work with PostgreSQL, I have defined aggregate functions
(just modifying existing code for the standard deviation calculation!)
that I would be happy to provide to interested parties.
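PostgreSQL's `CREATE AGGREGATE` builds a user-defined aggregate from a state-transition function (`SFUNC`, with state type `STYPE`) and an optional final function (`FINALFUNC`). The functions offered above are not shown here, so the following is only a Python sketch of that same split, assuming a state made of the row count and running power sums; `MomentState`, `moment_accum`, `skewness_final` and `kurtosis_final` are hypothetical names. One caveat of raw power sums: they lose precision when the mean magnitude is large compared with the scatter, so subtracting a reference magnitude first can help.

```python
import math
from dataclasses import dataclass
from functools import reduce

@dataclass
class MomentState:
    """Running aggregate state: count and power sums of x (hypothetical layout)."""
    n: int = 0
    s1: float = 0.0
    s2: float = 0.0
    s3: float = 0.0
    s4: float = 0.0

def moment_accum(state, x):
    """State-transition step (the SFUNC role): fold one value into the sums."""
    state.n += 1
    state.s1 += x
    state.s2 += x ** 2
    state.s3 += x ** 3
    state.s4 += x ** 4
    return state

def _central_moments(state):
    """Population central moments m2, m3, m4 recovered from the raw power sums."""
    mu = state.s1 / state.n
    e2, e3, e4 = state.s2 / state.n, state.s3 / state.n, state.s4 / state.n
    m2 = e2 - mu ** 2
    m3 = e3 - 3 * mu * e2 + 2 * mu ** 3
    m4 = e4 - 4 * mu * e3 + 6 * mu ** 2 * e2 - 3 * mu ** 4
    return m2, m3, m4

def _usable(state):
    """False for empty input or input that is constant up to rounding error."""
    return state.n > 0 and _central_moments(state)[0] > 1e-12 * state.s2 / state.n

def skewness_final(state):
    """Final step (the FINALFUNC role); None plays the part of SQL NULL."""
    if not _usable(state):
        return None
    m2, m3, _ = _central_moments(state)
    return m3 / m2 ** 1.5

def kurtosis_final(state):
    """Excess kurtosis (0 for a Gaussian), or None for empty/constant input."""
    if not _usable(state):
        return None
    m2, _, m4 = _central_moments(state)
    return m4 / m2 ** 2 - 3.0

# Usage: fold the rows one at a time, as the database would, then finalize.
mags = [12.0, 12.1, 11.9, 12.0, 15.5, 12.05]
state = reduce(moment_accum, mags, MomentState())
print(skewness_final(state), kurtosis_final(state))
```

Keeping the state to a few sums means it can be updated row by row without revisiting earlier data.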
Finally, I would remark that one of the things I noticed with the MACHO
data was how easy it was to classify many stars simply by eyeballing
the lightcurve over the entire duration, that is, before doing
any time-series analysis at all. The way the points fill the space is quite
distinctive: you can tell a sawtooth curve from a sinusoid, from a
"bump" pulsator, even from a multimode star. All this without any
time-consuming time-series analysis. This, of course, is just a
restatement of my previous paragraphs: skewness and kurtosis
tell you a lot, especially when combined with amplitude, color, ... and
other numbers that likely *do* exist in the summary statistics already.
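A small numerical illustration of how the distribution of magnitude values alone (ignoring time order) separates shapes, using idealized synthetic curves that are hypothetical, not real stars. A sinusoid should come out near excess kurtosis -1.5, a linear sawtooth near -1.2 (its values are uniformly distributed), and an eclipse-like curve with strong positive skewness and large positive kurtosis. Note that a linear sawtooth has near-zero skewness no matter how lopsided its rise and decline are, so here it is the kurtosis, not the skewness, that tells it from a sinusoid.

```python
import numpy as np

def shape_stats(mag):
    """Return (skewness, excess kurtosis) of the magnitude values, ignoring time order."""
    x = np.asarray(mag, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d ** 2)
    return float(np.mean(d ** 3) / m2 ** 1.5), float(np.mean(d ** 4) / m2 ** 2 - 3.0)

# Phases evenly covering one cycle, standing in for many well-sampled cycles.
n = 100_000
phase = (np.arange(n) + 0.5) / n

sinusoid = np.sin(2 * np.pi * phase)
# Sawtooth in magnitude: fast brightening over 10% of the cycle, slow fading over 90%.
sawtooth = np.where(phase < 0.1, 1.0 - phase / 0.1, (phase - 0.1) / 0.9)
# Eclipse-like: constant, 1 mag fainter for 5% of the cycle.
eclipse = np.where(phase < 0.05, 1.0, 0.0)

for name, curve in [("sinusoid", sinusoid), ("sawtooth", sawtooth), ("eclipse", eclipse)]:
    skew, kurt = shape_stats(curve)
    print(f"{name:9s} skew={skew:+.2f} excess kurtosis={kurt:+.2f}")
```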
Enough for now!
Doug
Patrick
Nothing in particular really, just looking at lots and lots of light curves.
Patrick
If I may be so bold as to bring this thread back on track...
I am in the midst of taking a look at three stars - LX Cyg, W Dra, and
BH Cru - for fundamental period changes over time.
This is not groundbreaking research by any means. Matt Templeton has
exactly this sort of paper in the pipeline for T UMi, and he, Lee Ann
Willson, & Grant Foster published such a paper on RU Vul last year.
However, by using the AID data for these stars and tools such as
TS, WWZ, and CLEANest, I am hoping to gain some much-needed
experience in statistical data analysis and to make a tiny contribution
that will lead to something published.
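The tools named above are not reproduced here. As a rough illustration of what a search for period changes looks for, the sketch below slides a fixed time window along the data and records the peak of a plain discrete Fourier transform power spectrum in each window. WWZ instead fits a time-localized weighted wavelet, so treat this as a simplified stand-in, not the same method. The window length, step, frequency grid and the synthetic signal (a period that jumps from 400 d to 360 d) are all hypothetical.

```python
import numpy as np

def dft_power(t, mag, freqs):
    """Plain discrete Fourier transform power of unevenly sampled data (mean removed)."""
    y = mag - mag.mean()
    kernel = np.exp(-2j * np.pi * np.outer(freqs, t))  # shape (n_freqs, n_points)
    return np.abs(kernel @ y) ** 2 / len(t) ** 2

def windowed_periods(t, mag, freqs, window, step, min_points=10):
    """Peak period in successive time windows: a crude time-frequency scan."""
    results = []
    start = t.min()
    while start + window <= t.max():
        sel = (t >= start) & (t < start + window)
        if sel.sum() >= min_points:
            power = dft_power(t[sel], mag[sel], freqs)
            results.append((start + window / 2, 1.0 / freqs[np.argmax(power)]))
        start += step
    return results

# Made-up test signal: period 400 d that jumps to 360 d halfway through.
t = np.arange(0.0, 20000.0, 5.0)
mag = 8.0 + 2.0 * np.where(t < 10000, np.sin(2 * np.pi * t / 400), np.sin(2 * np.pi * t / 360))
freqs = np.linspace(1 / 500, 1 / 300, 2000)

res = windowed_periods(t, mag, freqs, window=4000.0, step=2000.0)
for center, period in res:
    print(f"JD offset {center:7.0f}: peak period {period:6.1f} d")
```

With real AID data the window should span several cycles, and seasonal gaps and uneven observer coverage will add spurious peaks that a plain DFT does nothing to suppress.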
About a year ago I attempted to help Aaron with the data analysis of his
study of BZ UMa superhumps. I did a good bit of work, but never found
the superhumps. Obviously Aaron did and so I'm also going through his
paper, and my analysis documentation, to see "where I went wrong."
From time to time I may post my progress as an example of a blind man
stumbling in the dark. So, if you see me walking around all bloody from
running into things, bandages would be appreciated. :-)
--
Richard "Doc" KINNE, BA, MSc., AMAAS
<kinnerc @ gmail . com> (sent from Linus)