How do I get from here to there?

Kevin Nuckolls

Oct 4, 2010, 3:07:01 PM
to incanter
I've been lurking and reading this group for a while now, mostly learning by osmosis. While this discussion does not strictly deal with Incanter, I believe it is germane to the developers and researchers that form its community.

I was struck by this post[1] that I saw earlier today. It certainly summed up my personal feelings on the matter of where I need to go as far as my professional development is concerned. My colleagues and I certainly fall within the "Danger Zone" as far as data science is concerned. I'm competent enough with SQL, R, and Hadoop to perform the analysis and data transformation needed to support our current products. FWIW, we have a decent amount of spatial data that mostly gets exposed in the form of online slippy maps. Business has been rolling in and we've been dealing with more and more scaling and optimization work. I enjoy that kind of work a great deal, but long term I certainly feel like I would be happiest doing more exploratory data analysis.

What bothers me is the nagging feeling that there are gems of information in our datasets that we should certainly be capitalizing on: trends, demographic analysis, things like that. There's no doubt that we could hire a statistician to help perform this kind of analysis, but would someone like that be able to gel with my development team? The real ideal would be to find someone, or become someone, who can wear both hats.

So really the question becomes, how do I get from here to there? Graduate school seems like the most obvious choice. Topology, Machine Learning, Statistics, Time-series analysis, Econometrics. These all appear to be things that require a little more rigor to fully comprehend. Certainly most of them don't seem like things you might pick up by osmosis in the workplace. Are there any specific graduate programs that you would suggest I look into? To prepare for such programs, are there any books that would be a good starting place for someone with just a math minor?

Thanks,
-Kevin



David Edgar Liebke

Oct 4, 2010, 3:16:29 PM
to inca...@googlegroups.com
Not a cheap option, but one I've been looking into is Stanford's Data
Mining and Applications online program:

http://scpd.stanford.edu/public/category/courseCategoryCertificateProfile.do?method=load&certificateId=1209602#searchResults

You can take classes taught by Tibshirani and Friedman, authors of one
of my favorite books, Elements of Statistical Learning:

http://www-stat.stanford.edu/~tibs/ElemStatLearn/

David

Kevin Nuckolls

Oct 4, 2010, 5:23:17 PM
to Incanter
Both of those are excellent resources. The thought of completing a
certificate is a lot less daunting than a master's degree. Is Stanford
the go-to place for statistics / applied mathematics these days, or are
there other programs that are well regarded in statistical circles?

On Oct 4, 2:16 pm, David Edgar Liebke <lie...@gmail.com> wrote:
> Not a cheap option, but one I've been looking into is Stanford's Data
> Mining and Applications online program:
>
> http://scpd.stanford.edu/public/category/courseCategoryCertificatePro...

Ahmed Fasih

Oct 6, 2010, 11:20:42 AM
to Incanter
Hiya.

The worst thing to do is post something that's uninformative
(technical term :), but I wanted to tack on my own question to this
community.

David's approach, of migrating to the schools where the authors of
your favorite papers and books are, is the way to go. There are plenty
of such folks that are made famous by videolectures.net and iTunes U
that I'm aware of: Michael Jordan (lot of nonparametric Bayesian
statistics, Berkeley), Larry Carin (ditto, lots of applied govt work,
Duke), David Blei (ditto, inventor of LDA for document analysis, now
at Princeton), Alan Willsky and John Fisher (statistical signal
processing and statistical inference, MIT), as well as numerous others
who have made major contributions in their own practical fields
(radar, finance, geology, etc.).

But straight grad school is a very risky proposition, and as you say,
a certificate would be much more attractive to a hacker practitioner
with domain knowledge than taking real analysis classes while finding
a chunk of knowledge for your thesis/dissertation. We sometimes forget
that the primary task of professors who take on graduate students is
to create the next generation of professors for the academy, so
seeking a PhD usually increases the odds of getting into any of these
world-renowned programs.

Stanford is making a great entrepreneurial gamble by offering
certifications, and the fact that David is considering it is very
encouraging (I'm getting a PhD in statistical signal processing, and I
don't know how straight data mining works). I'd be curious to know if
anyone else is offering such programs, or if there are grad programs
that focus on practical applications of statistical learning (since
the standard formula for a stats degree in most schools is lots of
classic math and stats, with a couple of applied/programming courses
in R or Matlab---forget Hadoop/Mahout at these).

Which brings me to my question. How does someone from the "traditional
research" section (math & domain expertise), with a good handle on
basic programming of large, complicated algorithms (e.g., in Matlab or
Python, maybe Clojure :P), get to the next level of hacker skill, of
cloud/cluster computing for complicated linear algebra/mathy
algorithms on large datasets with Hadoop, or however? It seems the
barriers are somewhat high (my game plan, from studying Incanter, is:
step 1, learn Java), but I'm perfectly willing to be told they're not,
in which case I'll redouble my efforts to learn how to do so.

Thanks, and best of luck, and apologies for my off-topic-ness.
Ahmed