I think (the lack of) interactive computation is one of the big
shortcomings of journal articles -- articles which use data to draw
a conclusion, anyway.
The NSF and others are pouring a lot of money into projects designed
to preserve scientific data. They seem to be unaware of the shortcomings
of the current approach. That approach is to host data on a centralized
website, with a DOI or some other unique identifier, in a specified
format, referenced from the published paper. In theory everything is perfect
because the reader can see where to get the data, download it, and then
examine it using their preferred tools.
But it is not perfect. Here is an actual use case which happened to me.
The published paper included some summary statistics about some
numbers. It does not matter what the numbers represent. What matters
is that the paper did not include a histogram of the data, which I
considered to be a serious omission.
So, I went to the repository and downloaded the data, which was a plain
text file with one decimal number on each line. My preferred way to
examine such data is in Mathematica. So I had to change the format
into a comma separated list delimited by curly brackets, and I had
to put "mynameforthelist =" at the beginning. Then I read that file
into Mathematica and I had the data in a named list.
For me it took maybe 15 seconds to add all the commas and put in
the curly brackets, because I know how to use an editor. But even so,
there was a lot of tedium and jumping through hoops. (If Mathematica
has a way to directly read in such a file and interpret it as a list,
please do not tell me because I already have a method that works for
me and is faster than looking up that documentation.)
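For concreteness, the reformatting step I did by hand could be scripted
roughly as follows. This is only a sketch in Python, not what I actually
did: the function name to_mathematica_list is made up for illustration,
and it assumes every non-blank line holds one plain decimal number with
no exponent notation.

```python
def to_mathematica_list(text, name="mynameforthelist"):
    """Turn one-number-per-line text into 'name = {a, b, c}'.

    Assumes each non-blank line is a plain decimal number; the lines
    are copied through as text and are not validated.
    """
    # Keep the non-blank lines, stripped of surrounding whitespace.
    values = [line.strip() for line in text.splitlines() if line.strip()]
    # Join with commas and wrap in curly brackets, as in the manual edit.
    return name + " = {" + ", ".join(values) + "}"


if __name__ == "__main__":
    sample = "1.5\n2.25\n\n3\n"
    print(to_mathematica_list(sample))
```

The point is not that this is hard, but that every reader has to do
some version of it before they can even look at the data.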
I have encountered other examples where the data format was difficult
to understand or difficult to parse. What I describe above is the
best possible case within the current approach. And as Alex noted,
there is a barrier to going through those steps, particularly if the
reader is a domain expert but is not technically savvy.
It would have been better if the paper had a Sage/R cell loaded with the
data, so I could just process it immediately. It took me 5 seconds
to find that hist() is how you make a histogram in R.
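To illustrate what I mean by a cell "loaded with the data", here is a
rough sketch of what such a cell might contain. It is written in Python
(Sage is built on Python) and uses only the standard library. The names
parse_numbers, histogram_counts and text_histogram, the bin width, and
the sample data are all made up for illustration; a real cell would
more likely call a plotting function.

```python
import math
from collections import Counter


def parse_numbers(text):
    """Read one decimal number per line, skipping blank lines."""
    return [float(line) for line in text.splitlines() if line.strip()]


def histogram_counts(values, bin_width):
    """Count the values in each bin [k*w, (k+1)*w), keyed by left edge.

    Bins that contain no values are omitted.
    """
    counts = Counter(math.floor(v / bin_width) for v in values)
    return {k * bin_width: counts[k] for k in sorted(counts)}


def text_histogram(values, bin_width):
    """Render the bin counts as rows of '#' characters."""
    rows = []
    for left, n in histogram_counts(values, bin_width).items():
        rows.append(f"{left:8.2f} | {'#' * n}")
    return "\n".join(rows)


if __name__ == "__main__":
    # Hypothetical data standing in for the repository file.
    data = parse_numbers("0.1\n0.4\n1.2\n2.5\n2.9\n")
    print(text_histogram(data, bin_width=1.0))
```

With the data already in the cell, the reader changes the bin width and
reruns, with no download or reformatting step in between.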