Negative results


mduke

May 18, 2011, 10:26:27 AM
to Beyond the PDF, ma...@figshare.com
I was reminded of the discussion on publishing negative results at
the meeting in January - I just heard of this crowd-sourcing-style
service (apologies if this is old news):

http://figshare.com/
Scientific publishing as it stands is an inefficient way to do science
on a global scale. A lot of time and money is being wasted by groups
around the world duplicating research that has already been carried
out. FigShare allows you to share all of your data, negative results
and unpublished figures. In doing this, other researchers will not
duplicate the work, but instead may publish with your previously
wasted figures, or offer collaboration opportunities and feedback on
preprint figures.

Monica

David Shotton

May 18, 2011, 10:50:12 AM
to beyond-...@googlegroups.com
Nice idea.  However, it seems to me that without additional entry of appropriate metadata describing the (negative) results, this will all be wasted effort. 

The only bit of biological data submitted by anyone I know - by Rod Page at http://figshare.com/figures/index.php/Geophylogeny - failed to resolve to a viewable figure. Another, picked at random - http://figshare.com/figures/index.php/Frappe_K9_pops - opened (using Firefox 4) in the Zoho viewer window in an unintelligible manner. Not very promising.

Do you know the people behind FigShare? 

David
--

Dr David Shotton                                                       david....@zoo.ox.ac.uk
Reader in Image Bioinformatics

Image Bioinformatics Research Group                                 http://ibrg.zoo.ox.ac.uk
Department of Zoology, University of Oxford                  tel: +44-(0)1865-271193
South Parks Road, Oxford OX1 3PS, UK                    fax: +44-(0)1865-310447

c...@cameronneylon.net

May 18, 2011, 1:37:35 PM
to beyond-...@googlegroups.com
The person behind figshare is Mark Hahnel, a PhD student at Imperial. I like the concept behind what he's done: that a single figure plus figure legend is useful enough to be worth publishing in some form, and that it approaches the minimum granularity of useful, human-readable pieces of science. That said, there's a lot more work to be done, and Mark is just doing what he can as he has the time. I think many of the earliest uploads suffered from memory issues and ended up broken. I think it's a great example both of what can be done relatively easily and of why it isn't easy to build something sufficiently robust to be generally useful.

Cheers

Cameron

Gully Burns

May 18, 2011, 2:18:39 PM
to beyond-...@googlegroups.com
This is terrific. Really nice work. I agree with the notion of adding semantics to the FigShare content. Those semantics could well be a nice illustration of the 'nanopublication' idea, since individual figures are usually designed to illustrate a specific relation, data point or correlation.
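
To make that concrete, here is a minimal sketch (in Python, using rdflib) of a figure-level assertion plus its provenance; the URIs and the EX vocabulary are invented for illustration, not an actual FigShare or nanopublication schema:

```python
# A minimal sketch of a figure-level 'nanopublication': one asserted
# relation plus provenance. All URIs and the EX vocabulary are invented
# for illustration; real nanopublications use agreed ontologies.
from rdflib import Graph, Literal, Namespace, URIRef

EX = Namespace("http://example.org/vocab/")
g = Graph()

figure = URIRef("http://example.org/figures/42")
# The single relation the figure was designed to illustrate:
g.add((figure, EX.asserts,
       Literal("Gene X is upregulated in tissue Y after treatment Z")))
# Provenance: who made the claim, and from which data it derives.
g.add((figure, EX.author, URIRef("http://example.org/people/jane-doe")))
g.add((figure, EX.derivedFrom, URIRef("http://example.org/data/experiment-7")))

print(g.serialize(format="turtle"))
```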

I like this a lot.

Gul

Waard, Anita de A (ELS-AMS)

May 18, 2011, 9:37:33 PM
to beyond-...@googlegroups.com, beyond-...@googlegroups.com
Apart from the missing context (experimental and conceptual) and the lack of narrative, I don't understand how, without 'ideal semantic search engines', this model combats inefficiency: how would anyone process such a multiplicity of figures, if we already have such a problem processing the flood of papers? Doesn't this just make more stuff we collectively have to wade through?

Anita de Waard
Disruptive Technologies Director
Elsevier Labs

Elsevier B.V. Registered Office: Radarweg 29, 1043 NX Amsterdam, The Netherlands, Registration No. 33156677 (The Netherlands)

Phillip Lord

May 19, 2011, 3:08:47 AM
to beyond-...@googlegroups.com

I think that it is important to remember that the lack of negative data
is not merely inefficient, but fatal. People die as a result, and in
significant numbers.

I think that the situation where we have more stuff to wade through is
surely better than the situation where we do not have stuff at all, or
worse, where we only have stuff with a positive spin to wade through.
This is the situation that we are in at the moment.

Having said that, I agree with your point about the lack of narrative.
Additionally, I'd be worried about repudiability and about establishing
authorship; it's these sorts of concerns which push me toward the blog
as a better tool. I've generally been thinking about blogs as a tool for
formal publication of information; I now blog all my grants and papers,
whether accepted or not. But to get around the problem of negative data
we really need to use them as lab books. I know that others on this list
have done work on this.

As for processing the flood of papers; well, I am a bioinformatician.
Over the last decade, I have been drowned in floods, washed away in
tsunamis or, my personal favorite meteorological metaphor, crashed into
the infoberg of data more times than I care to remember. Having also
spent four months of my life eking out 2 kbp with four-lane radioactive
DNA sequencing, I can personally vouch that too much data is a great
place to be.

Phil


--
Phillip Lord, Phone: +44 (0) 191 222 7827
Lecturer in Bioinformatics, Email: philli...@newcastle.ac.uk
School of Computing Science, http://homepages.cs.ncl.ac.uk/phillip.lord
Room 914 Claremont Tower, skype: russet_apples
Newcastle University, msn: m...@russet.org.uk
NE1 7RU twitter: phillord

anita bandrowski

May 19, 2011, 11:50:27 AM
to beyond-...@googlegroups.com
Thanks, Phillip,

You have some wonderful points. I have to second the notion that drowning in a tsunami of data is a much better place to be than commissioning a human drug trial because negative data from animal studies was not available, and then having to cancel the trial quickly (as was expressed to me by an NIH director recently). That is extremely costly and, one might argue, more inefficient than having to wade through the vast marshland of data.

I am still a big proponent of the data paper: a paper that simply reports on the data that was gathered. A data paper should include all positive and negative results, structured into a common exchange format. These formats should be available for many experiments (where large community databases exist), and the journals should have enough clout to facilitate the process of submitting reasonable datasets to databases.

In my own experience of scaling Mount Microarray, it is often of interest that a particular gene did not change as a result of some experimental manipulation. In these sorts of experiments it would be of high interest to see all the data described, computed and analyzed the same way. Unfortunately, papers often include a supplemental table (at times in JPG format) with a ton of numbers and no statistical measures telling you whether those numbers reached any significance threshold. The fact that Aquaporin 4 is upregulated in the nucleus accumbens after heroin treatment may be of value to the study of some rare disease in the future, but we will never know if the data are not available.
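
As a rough sketch of what such a record might look like (the field names are invented for illustration; a real submission would use a community format such as MAGE-TAB for microarrays), the point is that a gene that did not change is reported with the same structure and statistics as one that did:

```python
# Sketch of a structured result record for a data paper. The schema is
# hypothetical; the point is that negative results (significant = False)
# are first-class entries rather than being silently dropped.
import json

records = [
    {"gene": "Aqp4", "tissue": "nucleus accumbens", "treatment": "heroin",
     "log2_fold_change": 1.8, "p_value": 0.003, "significant": True},
    # The negative result carries exactly the same structure:
    {"gene": "Gfap", "tissue": "nucleus accumbens", "treatment": "heroin",
     "log2_fold_change": 0.1, "p_value": 0.62, "significant": False},
]
print(json.dumps(records, indent=2))
```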

Regards,
anita
--
Anita Bandrowski, Ph.D.
NIF Project Lead
UCSD 858-822-3629
http://neuinfo.org
9500 Gilman Dr. #0446
La Jolla, CA 92093-0446

Waard, Anita de A (ELS-AMS)

May 19, 2011, 12:40:59 PM
to beyond-...@googlegroups.com
Dear Anita, Philip, all -

Yes, I share your enthusiasm for data-based papers - it was, of course, one of the main drivers behind Beyond the PDF. I think there are two things:

1) There seems to be a lot of useful work to be done in integrating data with papers in a better way - possibly by, in some way, publishing the data (in a proprietary data store, or a place like Dataverse http://thedata.org/) and 'wrapping' a paper around a collection of datasets (see e.g. http://precedings.nature.com/documents/4742/version/1). But the whole point, in the end, is to make sense of it all, in some eventually cognitive way (the difference between data mining and e.g. 'figure mining' is that there are no standards for what the figures represent or mean - so e.g. feature extraction or summarisation is very hard!). I think that curated data centres (I'm enclosing a small list of some references) do a much better job in structuring the data as it comes in, which can eventually save everyone time and effort, and lead to better science. I do agree there also needs to be a place for negative and plain 'uninteresting' results there. If these data centres could also curate and annotate the experimental design and the methods used to generate the data, we'd really be able to compare experiments (a long-term goal of Gully's KEfED work, for instance).

2) However, I'm sceptical that publishing single figures, instead of papers containing a number of figures, would improve this process. The figshare blurb says:
>>>>> Scientific publishing as it stands is an inefficient way to do science
>>>>> on a global scale. A lot of time and money is being wasted by groups
>>>>> around the world duplicating research that has already been carried
>>>>> out. FigShare allows you to share all of your data, negative results
>>>>> and unpublished figures. In doing this, other researchers will not
>>>>> duplicate the work, but instead may publish with your previously
>>>>> wasted figures...

I seriously doubt that just cutting papers into smaller components would lead to people duplicating other research less.


Best,

- Anita.

Some relevant references:

Data-centric initiatives and reports:

NSF:
- NSF 07-28, Cyberinfrastructure Vision for 21st Century Discovery: http://www.nsf.gov/pubs/2007/nsf0728/index.jsp
- DataWeb
- Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century: http://www.nsf.gov/pubs/2006/nsf0648/NSF-06-48_5.pdf
- NSF Data sharing policy: http://www.nsf.gov/pubs/gpg/nsf04_23/6.jsp - see point I

EU FP7:
- Digital Libraries Initiative on Scientific and Scholarly Information: http://ec.europa.eu/information_society/activities/digital_libraries/scientific/index_en.htm
- Commission Recommendation on the digitisation and online accessibility of cultural material and digital preservation: http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:32006H0585:EN:HTML

DFG (Deutsche Forschungsgemeinschaft):
- Recommendations of the Commission on Professional Self Regulation in Science http://www.dfg.de/en/dfg_profile/statutory_bodies/ombudsman/index.html

ICSTI: International Council for Scientific and Technical Information:
- 2009 Conference Managing Data for Science: http://www.icsti2009.org/02-program-abs_e.shtml
CODATA: International Council for Science: Committee on Data for Science and Technology:
- With ERPANET: Paper on The Selection, Appraisal and Retention of Digital Scientific Data: http://www.ariadne.ac.uk/issue39/erpanet-rpt/
- With GEO, GEO Data Sharing Principles Implementation: http://www.codata.org/GEOSS/index.html
PARSE.insight Group (Permanent Access to the Records of Science in Europe):
- Survey on use and needs of Research Data: http://www.parse-insight.eu/downloads/PARSE-Insight_D3-4_SurveyReport_final_hq.pdf
DataCite: consortium focused on improving the scholarly infrastructure around datasets
NRC BRDI: National Research Council, Board on Research Data and Information http://sites.nationalacademies.org/PGA/brdi/index.htm:
- Reports include ‘Open Access and the Public Domain in Digital Data and Information for Science: Proceedings of an International Symposium (2004)’ http://sites.nationalacademies.org/PGA/brdi/PGA_047287
Conference on Preservation of Digital Objects (iPres):
- Conference report on the Preservation of Digital Objects - http://escholarship.org/uc/item/14h35961
Tools and Platforms:

eSciDoc: https://www.escidoc.org/JSPWiki/en/ContentModel
- eSciDoc is a system targeted at research organizations, universities, institutes, and companies interested in eScience-aware knowledge and information management that enables you to publish, visualize, manage, and work with data artifacts (or objects). Objects include both publication data and research data across disciplines.
- eSciDoc addresses aspects of data reliability, data quality, data curation, and long-term preservation. It covers the whole lifecycle of objects and supports semantic relations between objects.
Domain-specific Data Centers:
- PANGAEA: Publishing Network for Geoscientific & Environmental Data, http://www.pangaea.de/
- CISL Research Data Archive: http://dss.ucar.edu/
- DPHEP: ICFA Study Group on Data Preservation and Long Term Analysis in High Energy Physics, http://dphep.org/


Anita de Waard
Disruptive Technologies Director, Elsevier Labs
http://elsatglabs.com/labs/anita/
a.de...@elsevier.com


Paul Groth

May 19, 2011, 12:57:18 PM
to beyond-...@googlegroups.com, Hahnel, Mark
Hi Anita,

I just want to mention a point that Cameron brought up - whether or not
you think FigShare will work or is good, I think it's pretty cool that
Mark (cc'd on this email) is, on his own, doing something to try to
improve science. It's a good example of how science infrastructure can
begin to have web 2.0-style rapid innovation cycles.

regards,
Paul


Waard, Anita de A (ELS-AMS)

May 19, 2011, 1:14:30 PM
to beyond-...@googlegroups.com, Hahnel, Mark
Absolutely! I think it is a very cool project - I am just sceptical about this way of sharing results being more efficient. See also, e.g., http://en.wikiversity.org/wiki/User:OpenScientist/Open_grant_writing_-_Encyclopaedia_of_original_research and the 'collaborative blog' for some very original, grass-roots work from Daniel Mietchen - another development that enables content creation. Lots of cool things here as well, which could be combined with data publishing...

Best,

- Anita.


Daniel Mietchen

May 20, 2011, 9:34:39 PM
to beyond-...@googlegroups.com
Thanks for bringing this up, Anita.

That project is, in essence, an attempt to get funding to put into
practice the vision of "Science as a wiki" that I outlined in my
BtPDF talk, perhaps with a spin that others have described as a
"GitHub for science". All of the grant drafting takes place in the
open, and we would very much welcome input from people on this list.

Thanks and cheers,

Daniel

--
http://www.google.com/profiles/daniel.mietchen

cameron...@stfc.ac.uk

May 21, 2011, 3:52:23 AM
to beyond-...@googlegroups.com, m.hah...@imperial.ac.uk
OK, so I'll have a shot at explaining why I think Mark is heading down exactly the right track with Figshare.

First, some grounding points of opinion that are probably important to lay out where I'm coming from:

* If we want to get more of the data that currently goes unpublished "out there" and available, then the publication routes need to be _extremely_ low friction.

* IMO, integrating things with papers is a strategic dead end** (but see the caveat below). We first need to _dis-integrate_ the paper before we start putting it back together again.

* There is a large untapped demand, particularly from the biological sciences, for _human readable_ data and representations of that data on the web, which is generally accessed through a Google or Google image search.

So I'm interested in what Mark is doing from a number of perspectives. First, he is exploring the smallest useful piece of research output, primarily from the perspective of a human reader and discoverer. I think he's on the right track here in focussing on figures, because they represent the smallest coherent piece of the stories that Anita talks about. This might be wrong, but I think it's an issue worth exploring. What's more, he's connecting metadata (the figure legends plus some semantic information surfaced through the Semantic MediaWiki platform) to an image (which is surfaced via Google image search very effectively) and, through that image, in many cases to the data behind the graph (in the form of a spreadsheet in most cases, though note that he is working to support lots of "standard", and therefore reasonably semantically defined, data file types).
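
As a rough sketch, the record behind such a figure page might look something like the following; all field names and URLs are hypothetical, not FigShare's actual data model:

```python
# Hypothetical sketch of a figure page record: the legend and tags make
# the image discoverable (e.g. via image search), and the record points
# back at the data file behind the graph.
figure_record = {
    "image_url": "http://example.org/figures/growth_curve.png",
    "legend": "Growth of strain A vs strain B over 48 h in minimal media.",
    "tags": ["growth curve", "strain A", "minimal media"],
    "data_url": "http://example.org/data/growth_curve.xls",  # spreadsheet behind the plot
}
print(figure_record["legend"])
```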

So this connects data (which we can imagine being easy to make more semantically readable via tools like Google's Dataset Publishing Language or OData connected with appropriate vocabularies, Google Refine, enhancements to Excel, advances in instrumental data formats via tools like AnIML...) to a readily understood search and discovery method ("I know I want a graph that refers to this term/gene name/unit") via image search or similar. I'm much more confident that this kind of bottom-up approach, leveraging the work that big corporations are doing to improve the consumer web, will make progress than trying to agree on standardised formats/vocabularies etc., at least for the kind of science I do.

Both approaches are important, and we can work to make them meet in the middle, particularly if our data collection and annotation systems can be standardised to capture, archive, and consume data through standard views. But if we're going to make progress into the mainstream of experimental science, I think we need really low barriers to participation. The message of the consumer web is that more data, even messy data, is better. You don't improve Google by keeping pages off the web; in fact it's all the trash that helps keep those search algorithms finely honed. I think there's a lot of mileage in getting what we can up there with the lowest barriers possible and letting the search giants go to work on helping us make sense of it. And that will only happen if it's very easy for people to upload something that they are already creating. What excites me is my intuition that these figures, along with a little bit of descriptive text, might hit a sweet spot, if we can just get enough of a reward in there to encourage people to get that data up; and that it's a rapid route to improving discoverability and searchability. It's not more efficient today, but it has the potential to provide the substrate of content that will make it more efficient in the future.

I should also admit that I very rarely read papers these days. I'm almost always looking for a single graph, a single table, or the link to the data file, which is the thing I _really_ want. I'm not sure how unusual that behaviour is, although my guess is it's much more common than people might think.

Cheers

Cameron

** I think this work is tremendously tactically important because it does two things: it expands people's minds beyond the constrictions that the printed page has created, and it helps guide us on the interchange and format standards that will be needed to manage the pieces. But it worries me that we seem to worry more about the container than the actual substance of the pieces in it.


Lewis

May 21, 2011, 9:14:43 AM
to Beyond the PDF
A while ago I received an invitation from The All Results Journals: Biol,
a journal focused on publishing negative results:

http://www.arjournals.com/ojs/index.php?journal=Biol&page=information&op=authors

It is published by a non-profit organization formed by scientists (the
email I received was sent by David Alcantara, managing editor of the
journal) and it is getting good people on the board. What surprised me
is the concept of 'total open access' they claim, where nobody has to
pay either to publish or to read the articles.
A nice idea too, isn't it?
BTW, they are publishing a really interesting parallel blog
(blog.arjournals.com).
Lewis

Phillip Lord

May 23, 2011, 5:18:22 AM
to beyond-...@googlegroups.com

"Waard, Anita de A (ELS-AMS)" <A.de...@elsevier.com> writes:
> I think that curated data centres (I'm enclosing a small list of some
> references) do a much better job in structuring the data as it comes
> in, which can eventually save everyone time and effort, and lead to
> better science.

Well, curated data centres are great. But expensive. So they are only
useful where there is a lot of data of one sort coming in on a regular
basis.

It also raises the question: what do the curated data centres curate if
the data is not available in the first place? The curated data centres
simply end up curating a publication bias.


Phil

Phillip Lord

May 23, 2011, 5:14:13 AM
to beyond-...@googlegroups.com

anita bandrowski <aband...@ucsd.edu> writes:
> I am still a big proponent of the data paper: a paper that simply reports
> on the data that was gathered. A data paper should include all positive
> and negative results, structured into a common exchange format.

I think that the idea of a common exchange format is a good one; having
said this, as someone who has been involved in defining a number of
these, I know that there is a lot of resistance to their use. I think
that they are nice to have, but not essential.

It is possible to extract structured data from unstructured text (even
if it's hard and expensive). It's not possible to extract anything from
no data.
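
As a toy illustration of that first point (the sentence and the pattern here are invented), even a crude regular expression can pull a structured triple out of free text - but no technique at all can recover data that was never made available:

```python
# Toy example: crude extraction of a (gene, direction, tissue) triple
# from an invented free-text sentence. Real text mining is far harder,
# but it is at least possible; extraction from absent data is not.
import re

sentence = ("Aquaporin 4 was upregulated in the nucleus accumbens "
            "after heroin treatment.")
pattern = (r"(?P<gene>[\w ]+?) was (?P<direction>up|down)regulated "
           r"in the (?P<tissue>[\w ]+?) after")
match = re.search(pattern, sentence)
if match:
    print(match.groupdict())
    # -> {'gene': 'Aquaporin 4', 'direction': 'up', 'tissue': 'nucleus accumbens'}
```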


> These formats should be available for many experiments (where large
> community databases exist) and the journals should have enough clout to
> facilitate the process of submitting reasonable datasets to databases.
>
> In my own experience of scaling Mount Microarray, it is often of
> interest that a particular gene did not change as a result of some
> experimental manipulation.

Indeed. I've done a reasonable amount of work on protein-protein
interactions. I carry a deep fear that, ultimately, if you look hard
enough with enough techniques, sooner or later you will be able to
establish an interaction between any two proteins, making the data
totally without meaning. Part of this fear comes from the absence of
negative data.

Phil

Phillip Lord

May 23, 2011, 5:22:05 AM
to beyond-...@googlegroups.com
Lewis <lewisb...@gmail.com> writes:
> A while ago I received an invitation from The All Results Journals: Biol,
> a journal focused on publishing negative results:
>
> http://www.arjournals.com/ojs/index.php?journal=Biol&page=information&op=authors
>
> It is published by a non-profit organization formed by scientists (the
> email I received was sent by David Alcantara, managing editor of the
> journal) and it is getting good people on the board. What surprised me
> is the concept of 'total open access' they claim, where nobody has to
> pay either to publish or to read the articles.
> A nice idea too, isn't it?

It is a good idea. As Cameron says, publishing has to be extremely low
friction, which includes low cost. My own experience with
http://knowledgeblog.org suggests to me that we should be aiming for
publication costs of 5-10 dollars a paper, rather than 500, and that
this is very possible.

Phil

Waard, Anita de A (ELS-AMS)

May 25, 2011, 11:59:02 AM
to beyond-...@googlegroups.com, m.hah...@imperial.ac.uk
Dear Cameron, Philip, Anita, all,

I find this a fascinating discussion; in particular, it is an intriguing thought that you would want to solve the data deluge by creating, well, an order of magnitude more data. I'd like to take a shot at teasing out at least two separate topics in the emails sent so far:

1. Producing more data leads to less data deluge?

I did not think that this could be at all a logical solution to the problem, until I saw this phenomenal demonstration of the Open Lab Notebook project by Jean-Claude Bradley at the ACS (in a session that was chaired by Peter Murray-Rust, I believe?) - http://www.softconference.com/ACSchem/player.asp?PVQ=HGFE&fVQ=FIDGII&hVQ= - which finally made me see the Open Data light. In this project, for one thing, problems are outsourced; he discusses the Open Notebook Science challenge (of which Cameron was a judge), and efforts to combine e.g. solubility data.

This is an incredibly powerful demonstration, and it gives one the sense of truly watching a new and better way of sharing scientific results. The question, however, is whether it might work in areas other than chemistry (or perhaps astronomy). Two things seem to be needed for this to work:
a. A willingness and interest in sharing data, and in working with other people's data
b. A knowledge framework where the results of different experiments can easily be integrated.
Both seem to be present in chemistry: the participants seem happy to share their solubility and other data, and e.g. the wiki pages show that it is easy to add data to what is known about a certain substance or molecule.

It seems that a field like astronomy, for example, also qualifies: as we heard Michael Kurtz tell us at Beyond the PDF, astronomers share data from telescopes, and can store the knowledge they concatenate in various databases.

I wonder, though, whether it also works in the life sciences. What I have learned about biology (but am ready to stand corrected!) is that the quality of a researcher, a group, and a specific paper is largely dependent on the quality of the work he or she can do in the lab - doing careful manipulations of living systems and recording the effects of these manipulations. It seems the lesser willingness of biologists to, first, share their data and, second, use someone else's has to do with a domain-specific appreciation for the craft of experimentation; making the data is an integral part of doing science.

But if that is not the case, and biologists are willing to share their data, the question remains: what is the best way to publish this data? So we get to point 2:

2. The figure as the smallest publishable unit

If I read the previous emails correctly, there are four arguments for publishing figures individually, rather than integrated within papers:
a) They are easier to publish
b) They are easier to digest
c) They are the only thing you read in a paper anyway
d) They can represent failed experiments.

I’ll try to offer some counter-arguments to each of these.

a) The question is whether it is indeed so hard to publish.

Anita Bandrowski argues for the 'data paper' - basically a data set with some language pertaining to models etc. around it. It seems that in principle it is fully possible to publish these; e.g. Dataverse (thedata.org) allows any scientist to publish any level or granularity of data, provided they enter some basic metadata. A large number of open access journals publish papers at any level of maturity, provided they can be somewhat checked; of course, any author can (and many do) put anything on any blog or site without any peer review at all. So there are two issues here: reputable journals do not encourage data papers, and, connected to that, publishing 'just' data does not offer the author any credit. Both have to do with the system of scientific evaluation and validation. In summary, the point is not whether it is easy or hard to publish, but whether the types of publication that get you credit should change.

b) and c) are, in my view, two sides of the same coin: the argument goes that since a reader only looks at the main conclusions and figures anyway, why even write the rest of the paper?

This, I think, is a misunderstanding of the way and speed at which our brains process knowledge. Yes - a specialist can assess the value and results of a paper in a few seconds flat. But there is evidence from the psychology literature (e.g. work by Dillon, Kintsch, and others - happy to dig up some references) that it is precisely the linguistic context within which these results are given that allows our brains to cut to the chase so quickly. The essential element here is context: if we know what part of what argument a figure is supporting, our brains can spend most of their time (and register with our consciousness) thinking about the figure. However, if the figure is devoid of context, we spend all our time looking for it, as many failed experiments in 'snippetising' books written to be read in a linear fashion have shown.

So, if there is a way to provide the proper knowledge context for a figure, sure, just publish the figure. But is there really a way to do that quicker than a paper does?

d) Need to publish negative data.

I totally agree: it would be useful to have access to people's negative results. The question is whether this is most effectively done by publishing single bits of negative results: again, we need the conceptual and experimental context for a result to be able to assess its importance. If systems can be devised to oversee, combine, and merge datasets, and results can be added directly to such a system, it would be great to include negative results as well; perhaps, in chemistry, this is possible. Of course, such systems would have great validity even if they only held positive results. But the argument from the first point still applies: we need a collective conceptual framework, and the willingness to share and use others' data.

In summary, I think there is great value in building systems that publish, share and collectively make sense of data. To interpret a single figure, however, a great deal of context is still needed; although you can very well imagine e.g. linking to or pulling in a single figure from another paper, I haven't seen more effective ways of transmitting context than the research article - yet. But I would love to see one, and be proven wrong.

Best, and looking forward to continuing this discussion,

Anita

Phillip Lord

May 26, 2011, 5:31:25 AM
to beyond-...@googlegroups.com, m.hah...@imperial.ac.uk
"Waard, Anita de A (ELS-AMS)" <A.de...@elsevier.com> writes:
> I wonder, though, whether it also works in the life sciences. What I have
> learned about biology (but am ready to stand corrected!) is that the
> quality of a researcher, a group, and a specific paper is largely
> dependent on the quality of the work he or she can do in the lab - doing
> careful manipulations of living systems and recording the effects of
> these manipulations. It seems the lesser willingness of biologists to,
> first, share their data and, second, use someone else's has to do with a
> domain-specific appreciation for the craft of experimentation; making the
> data is an integral part of doing science.


Biology is like any discipline. Some people are happy to share their
data, some are not. In general, though, I would say the situation in
biology is far better than in chemistry. Large quantities of some fairly
pivotal data types are freely available and use well-understood forms of
representation. In chemistry, a lot of the basic data is controlled by
proprietary interests with overlapping, and often incompatible, license
conditions. Medicine is halfway in between -- getting free access to
drug information can still be hard. This is without considering the
knotty issue of patient data, which is a different ball-game altogether.

All domains think that they are exceptional. A neuroscientist (a domain
in which I have no experimental background, and where everyone thinks I
am a computer scientist) told me once that they couldn't share data,
because they had to get publications or they wouldn't get funding.
Obviously, as a computer scientist, every time I run out of cash, I just
have to tell the government and they give me more, no questions asked.

> a) The question is whether it is indeed so hard to publish.
>

> any author can (and many do) put anything on any blog or site without
> any peer review at all.

It is, indeed, very easy to publish your data on a blog. At least, this
is true if you know what a blog is. If you know how to use it. If it
even occurs to you to use one in the first place.

There is a difference between perception and reality, I think. Most
scientists don't realise that it is very easy to publish.

> In summary, the point is not whether it is easy or hard to publish,
> but whether the types of publication that get you credit should change.

This is a very valid point, though. Nowadays, I publish on my blog, then
secondarily publish elsewhere, purely for the purpose of collecting
brownie points for my next promotion (assuming I ever get one).


> So, if there is a way to provide the proper knowledge context for a figure,
> sure, just publish the figure. But is there really a way to do that quicker
> than a paper does?

Personally, I think that the (short) paper is the way to go. This is the
reason that I think we need to focus on reducing the cost and effort of
publication by several orders of magnitude. In this day and age,
the cost of publication should be no greater than the cost of authoring.


> d) Need to publish negative data.
>

> I haven’t seen more effective ways of transmitting context than the research
> article – yet. But I would love to see one, and be proven wrong.

Personally, I tend to agree with this. But, I think that we can tweak
the process and the paper to streamline it. In recent years, I've
noticed that as a result of regular blogging I tend to hyperlink more
and use many fewer scientific cliches. Increasingly, the wading through
large quantities of bad English that is necessary by me, in order to
ascertain the quality of the papers, which have been sent by others to
the author of this post, for review, has been shown to irritate.

I'd like a journal which enforced a maximum reading age of 12 for
articles, rather than a page limit. That would be a bigger advance than
anything else we could achieve.

Phil
