Reproducible Research, Modern Workflows, and Other Best Practices


humphr...@gmail.com

unread,
Sep 8, 2014, 11:14:48 AM9/8/14
to ismir2014-unco...@ismir.net
So ... there's a lot here, but I'll try to keep this concise.

Nearing the end of my dissertation / doctoral studies, I'm in that interesting position where one inevitably looks back and asks, "man, what do I wish I didn't have to learn the hard way?" It turns out that this list is already heart-breakingly long, and mostly consists of the three points named in the title:
  • Reproducible research -- shared data, shared code, distributed collaboration, planning for project lifespans, software reuse / reusability, data archival, etc.
  • Modern workflows, tools, and technologies -- how does one train an online learning algorithm with more data than fits in memory, hm??
  • Other best practices -- code reviews, unit tests, and all that stuff "real" developers do
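The out-of-core question in the second bullet has a standard answer worth sketching: stream the data in chunks and update the model incrementally, so no more than one chunk is ever in memory. A minimal sketch with a toy linear model and simulated chunks (in practice the chunks would be read from disk shards; all names here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])

def data_chunks(n_chunks=200, chunk_size=64):
    """Simulate streaming chunks of (X, y) that never coexist in memory."""
    for _ in range(n_chunks):
        X = rng.normal(size=(chunk_size, 2))
        y = X @ true_w + 0.01 * rng.normal(size=chunk_size)
        yield X, y

# Mini-batch SGD: one gradient step per chunk, then the chunk is discarded.
w = np.zeros(2)
lr = 0.05
for X, y in data_chunks():
    grad = 2.0 * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
    w -= lr * grad
```

The same pattern (a generator yielding batches, plus any model with an incremental update rule) scales to data of arbitrary size.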
These things are obvious once you know them, but it's a perspective I didn't have coming from an electrical engineering background; and, given the interdisciplinary nature of the ISMIR community, I'm certain many others are or were in the same position. There are a lot of practices we (and I) should be following that we just don't know about, and I would love to leverage the wisdom of the crowd to fix this.

Simply put, there is a growing body of MIR research that demands two different skill sets -- scientific rigor and software development -- and I would absolutely love to compile all the know-how about the latter in one place. It could serve as a welcome guide / manual for MIR researchers, regardless of academic status, and would be an invaluable resource to pass on and update over time.

Me-from-five-years-ago wants this desperately. I'm sure past-you does too.

dan....@gmail.com

unread,
Sep 8, 2014, 10:00:34 PM9/8/14
to ismir2014-unco...@ismir.net
The thing that really strikes me is that this is true of so many different fields now: it's not just MIR where you develop and test ideas by writing software and crunching data.

Although I've been doing research for several decades, I only started seriously using source control within the past year (I too have an EE background).  I think our current graduate education system is doing a very poor job at delivering this absolutely crucial combination of skills -- i.e., the scientific method combined with professional-quality software development.  I think a great many questionable results have been published (including some I have intimate knowledge of...), but now we no longer have the excuse that we don't know better: the tools are out there.

I think git + software design + testing/validation should be a necessary research methods course for almost everyone I can think of.  Brian McFee and I tried to run such a course at Columbia last spring, but I'm still not sure how to get the key points across - they make so much more sense *after* you've experienced all the pitfalls first hand.
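To make the "testing/validation" piece concrete, here's the kind of minimal unit test such a course might teach. The helper and its names are invented for illustration; the degenerate constant-column case is exactly the sort of silent divide-by-zero that skews results unnoticed without a test:

```python
import numpy as np

def standardize(X, eps=1e-8):
    """Return X with zero-mean columns, unit variance where defined.

    Constant columns have zero standard deviation; the eps guard
    prevents a silent divide-by-zero producing NaN/inf features.
    """
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / np.maximum(sigma, eps)

def test_standardize():
    X = np.array([[1.0, 5.0], [3.0, 5.0], [5.0, 5.0]])  # second column constant
    Z = standardize(X)
    assert np.allclose(Z.mean(axis=0), 0.0)   # columns are centered
    assert np.allclose(Z[:, 0].std(), 1.0)    # unit variance where defined
    assert np.isfinite(Z).all()               # constant column: no NaN/inf

test_standardize()
```

A handful of assertions like these, run on every commit, is a cheap down payment on trusting your own published numbers.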

  DAn.

Eric Humphrey

unread,
Sep 9, 2014, 10:10:30 AM9/9/14
to ismir2014-unco...@ismir.net
Hm, that final comment is the real catch, huh ... perhaps a strategy could be to design an exercise (or several) where students struggle / suffer in a controlled environment and can quickly be shown a remedy? Project one: work together without version control or other tools; project two: now use these things; compare and contrast.

Failure and struggle are certainly great ways to learn; the trick is to cultivate a time and place where the risk / cost is minimized, i.e. we shouldn't be figuring this out on the job, so to speak.

mattjame...@gmail.com

unread,
Sep 10, 2014, 10:34:07 PM9/10/14
to ismir2014-unco...@ismir.net
Hi - excited to be part of the "unconference"!

Way back at NEMISIG 2013 (https://files.nyu.edu/onc202/public/NEMISIG13/) there was discussion of promoting reproducible research by awarding 'stars' or something similar to accepted papers that also published code. I really liked this idea. It could also be made weaker or stronger, ranging from mere kudos or special mentions in the proceedings to certificates and cash awards (budget permitting). Stronger still, we could make reproducibility an explicit criterion in the review process, but I'm personally against this, as some research institutions do not permit the sharing of code.

Matt 

Eric Humphrey

unread,
Oct 3, 2014, 12:20:14 PM10/3/14
to ismir2014-unco...@ismir.net
fwiw, this twitter conversation is relevant too: https://twitter.com/julian_urbano/status/518049834344148992

(apologies for not paying attention to which account this pulled up as...)