Does a Data Journal Make Sense?


Phil B

Jun 28, 2011, 12:10:37 PM
to Beyond the PDF
Hi:

I found the announcement yesterday of a new high profile open access
journal somewhat discouraging (see http://www.hhmi.org/news/20110627.html).
Do not get me wrong, support for open access is of course welcome, but
that this is the best a group of top scientists could come up with to
improve scholarship leaves me wondering. It is my opinion we need to
coax these folks toward bolder thinking through initiatives that they
can see make sense, if not initially then soon. In my opinion the SMA
application that came from the workshop in Jan. is a great example.
With that under way I started to contemplate what else could be done.

After much discussion with a number of people I have concluded that a
logical next step is to consider starting a high visibility data
journal for the reasons I outline below in a brief rationale. I am
considering starting to chase funds for such an endeavor and before I
do so I would very much appreciate the thoughts of this group on
whether this is a good idea, whether it conflicts with other efforts
etc. I am less concerned about the technology to be used at this stage
and more concerned about whether this would move scholarship
in the right direction. Your counsel would be much appreciated.

Best../Phil B.

A Proposal for a Data Journal
Philip E Bourne PhD June 24, 2011

Executive Summary

After thinking long and hard since the Beyond the PDF workshop and
talking at length to scientists, publishers, librarians, funders and
others vested in bettering scholarship, I have concluded we need a bold
yet doable step that will engage your typical faculty member in the
process of change. Without the broad, active participation of scientists
at large we will continue to effect change at a glacial pace and in just
a small number of disciplines.

Participation requires reward and other perceived gains in a
competitive workplace. A Data Journal can provide that reward and
gain. The scientist receives a true citation that can be used in
traditional journals for their dataset. These data are preserved and
the scientist is in compliance with emerging data sharing policies
being put in place by the funding agencies. Think of it as PLoS ONE
for data – an open access place for data that are sound, but not
necessarily expected to have high impact on science. If that sounds
less than promising, recall the beginnings of PLoS ONE, which was
initially considered below par, but proved otherwise and became the
world's largest journal, with lots of value.

Scientists need an open access place for data since there are many
datasets that do not fit well within existing repositories or are not
suitable for inclusion in a publication (the so-called long-tail
problem). With a lightweight documented calling interface (API), apps
can easily be developed by the community to operate on subsets of the
data (including reviewing the data) in the same way that apps are
built to maximize the utility of a piece of hardware or a corpus of
literature, as is the case with something like Elsevier’s SciVerse.
The more data are published this way, the more useful the data journal
becomes, which in turn encourages more data publication. The whole is
greater than the sum of the parts – not true of a typical journal.
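
To make the lightweight calling interface concrete, here is a minimal
sketch (in Python, using Flask) of the kind of read-only API such a
journal might expose. The endpoint names, record fields and in-memory
store are purely hypothetical assumptions, not an existing service.

# Minimal sketch of a hypothetical read-only Data Journal API.
# Endpoint names and record fields are illustrative assumptions only.
from flask import Flask, jsonify, abort

app = Flask(__name__)

# Stand-in for the backend curation repository.
DATASETS = {
    "d000001": {
        "title": "Example long-tail dataset",
        "depositor": "A. Researcher",
        "doi": "10.9999/example.d000001",  # hypothetical DOI
        "records": [{"id": 1, "value": 3.14}, {"id": 2, "value": 2.72}],
    }
}

@app.route("/datasets/<dsid>")
def get_metadata(dsid):
    """Return citation-level metadata for one dataset."""
    ds = DATASETS.get(dsid) or abort(404)
    return jsonify({k: ds[k] for k in ("title", "depositor", "doi")})

@app.route("/datasets/<dsid>/records")
def get_records(dsid):
    """Return the records themselves, so community apps can operate on subsets."""
    ds = DATASETS.get(dsid) or abort(404)
    return jsonify(ds["records"])

if __name__ == "__main__":
    app.run()  # e.g. GET http://localhost:5000/datasets/d000001

A community app (a reviewer's validation script, a visualization tool, a
discipline-specific browser) would then need only the URL and the
documented fields, not access to the underlying database.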

A number of pieces will be critical to a successful data journal.
Understanding the needs of a scientific community and getting its
feedback will keep the journal relevant; this is an ongoing activity
that is most effectively achieved by establishing an appropriate social
infrastructure. Providing a simple, bullet-proof rights framework will
ensure that needless barriers to sharing and preservation are
minimized. Usage and impact metrics should be enabled by citation
conventions backed up by unique persistent identifiers for datasets
and researchers themselves. Some concept of “quality”, if not peer
review, also needs to be defined and piloted.

The impact on scientific data analysis could be profound. The devil is
in the details – how can this be made versatile enough to embrace
multiple types of data, retrieved in a way that is comfortable for
users in the intended discipline? This is a problem that does not seem
to be well solved outside of the database community, so database
developers need to be part of the solution. At the same time, it will
be important that the effort not try to run before it can walk: solid,
forward-thinking infrastructure first (definitely not “build it and
they will come”, but rather “build it with them so that they might
come”), with enriched application development later.

The objective proposed here is to prototype such a database (but call
it a journal) in one year. The prototype development requires willing
data contributors, database developers, publishers, software engineers
and others engaged in digital scholarship. The process involves
requirements gathering from scientists, a workshop to define the
technical requirements and the subsequent prototyping (possibly using
an existing system) with a goal to launch a Data Journal in the second
year.

You are receiving this document because in some way you could
contribute to this effort. I hope you will. Comments welcome at any
time.

Problem Statement
Traditional scholarly communication is struggling to adapt to the
Internet era. One notable shortcoming (drawing from the biosciences,
but more generally applicable) is that large amounts of digital data
are being generated, often from high-throughput experiments, which may
or may not be associated with a publication, and subsequently lost.
True, some of that data finds its way into existing repositories that
I would characterize as follows:

1. Established repositories with stable funding where the incentive to
deposit comes from a prerequisite defined by the journal accepting a
publication associated with the data. Examples are the Protein Data
Bank (PDB), GenBank, and PharmGKB. These are functioning well, but
are expensive to operate and there is no business model beyond
continued federal funding. As a result some of these resources have
closed (e.g., the Arabidopsis database) or are under threat.

2. Repositories that contain data specifically related to a
publication, but for which no recognized domain specific repository
exists. Dryad is an example of such a repository. Publishers have
associated themselves with these resources since it relieves them of
the burden of managing the data as a large supplement to the paper,
which many publishers are ill-equipped to do. These resources are just
emerging and seem destined to continue to rely on federal funding or
buy-in from the publishers; in other words, there is no established
business model to provide sustainability at this time.

3. Institutional repositories that manage the data produced by
researchers at a given institution. At this time their success is far
from assured – what is the incentive for a data producer to put data
there? What is the business model for their sustainability? How do
institutional repositories relate to each other? There are exceptions
(e.g., the California Digital Library) but even there I would say
there is a failure to achieve broad buy-in by faculty. There need to
be faculty who have seen a significant gain and who champion such
developments.

4. Grass roots efforts, e.g., Sage Bionetworks, which are pushing an
open data model and are usually domain specific, e.g., disease
modeling.

In reality, much of the data generated – most of which falls into the
long tail, the very large number of small datasets not handled well by
these repositories – is lost. While much of these data could easily be
reproduced and many datasets will never be in demand, there is also
valuable data being lost and no business model for sustaining such
data. Efforts like DataCite are a step in the right direction but a
DOI alone is not enough to incentivize the scientific community at
large.

What is the Solution and Why Will it Work?
The solution is to establish a data journal. What does this mean and
why will it work?

The major impediment for many researchers in preserving their data is
incentive. Funding agencies, e.g., NIH, NSF are providing incentive
through mandatory data sharing policies, but at this time there is
confusion surrounding these policies and no indication of whether and
how they will be enforced, or indeed where to store the data to conform to
the policies!

Perhaps more important, there is no reward for depositing data, except
in the case of 1 above, but that covers only a small amount of the
data generated. A data journal provides that reward through a citation
(e.g., hypothetically PLoS Data 2011 6:3 d000001) that can be
included on resumes, promotion files etc. Associated with the citation
will be a DOI (from DataCite or elsewhere) that provides a definitive
resolvable reference to that dataset. Initially the value of that
citation will be limited, but over time we anticipate it will be
accepted by aggregation services, e.g., Thomson Reuters ISI and
PubMed.
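
As a sketch of what would sit behind such a citation, the record below
shows the minimal fields needed to render a journal-style reference and
resolve the DOI. The journal title, DOI prefix and field names are
assumptions for illustration only, not an agreed schema.

# Sketch of the minimal record behind a hypothetical dataset citation.
# The journal title, DOI prefix and field names are assumptions.
dataset_record = {
    "journal": "PLoS Data",             # hypothetical journal title
    "year": 2011,
    "volume": 6,
    "issue": 3,
    "article_id": "d000001",
    "doi": "10.9999/plosdata.d000001",  # would be minted via DataCite or similar
    "creators": ["A. Researcher"],
    "title": "Example long-tail dataset",
}

def format_citation(rec):
    """Render a journal-style citation that can go into a reference list."""
    return (f"{', '.join(rec['creators'])} ({rec['year']}). {rec['title']}. "
            f"{rec['journal']} {rec['volume']}:{rec['issue']} {rec['article_id']}. "
            f"doi:{rec['doi']}")

print(format_citation(dataset_record))
# -> A. Researcher (2011). Example long-tail dataset. PLoS Data 6:3 d000001. doi:10.9999/plosdata.d000001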

Another reason a Data Journal will be successful is the corresponding
demise of the research article. Already many research articles do
little more than report on a set of data; a Data Journal simply
acknowledges that fact. The goal of a Data Journal is to drive the
notion that one day (it should have been yesterday!) a research
article with zero citations (beyond self-citation) will be less valuable
than a citable dataset that has been downloaded by many researchers
worldwide (bibliometrics on access to the data and commentary by users
of the data would be maintained from day 1).

How Will This Work?
Notwithstanding the provision of a reward, the barrier to “publishing”
with the Data Journal must be kept very low, at least initially. This
implies collecting only a minimal amount of metadata to provide
provenance and to identify the dataset uniquely. Over time, based on
demand from scientists to publish in the Data Journal, the
requirements can be ramped up, but also offset by increased
automation. Here are three suggested phases of development:

• Phase I (one year) – deploy the backend curation repository to
appropriately store, retrieve, search and audit datasets. Develop
simple upload mechanisms for small and large datasets. Define a simple
review and data integrity system to validate the depositions. See
whether the journal gets any “data papers” submitted. This could be
based on an existing infrastructure at the California Digital Library
(CDL).

• Phase II (one year) – If Phase I is successful, start to accumulate
more metadata on entries through automated extraction based on data
format and data type (a sketch of such extraction follows this list).
An example of a popular data format would be MS Excel spreadsheets
(coincidentally, by June 2012 CDL will have released an open-source
Excel “add-in” designed to make spreadsheet data easier to publish,
share, and preserve); an example of a data type would be biological
sequence data, which has one or two well-structured forms and does not
belong in any existing repository.

• Phase III (ongoing) – expand the capabilities established in Phase II
to visually respond to common and established data formats and data
types. Data visualized in a traditional journal is static; here the
data can be brought to life, often using tools developed by the
communities themselves and collated by the Data Journal. If successful,
this will become an accepted form of publishing, depositors/authors
will treat it like any other journal, and the merger between database
and journal, predicted six years ago, will be realized.
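
As flagged in Phase II above, the automated metadata extraction could
begin very simply. The sketch below (using the openpyxl library; the
file name and choice of fields are assumptions) pulls sheet names,
column headers and row counts from a deposited Excel spreadsheet as
first-pass metadata; richer, type-specific extraction could be layered
on later.

# Sketch of first-pass automated metadata extraction from an Excel deposit (Phase II).
# File name and field choices are illustrative assumptions.
from openpyxl import load_workbook

def extract_basic_metadata(path):
    """Return sheet names, column headers and row counts for a deposited spreadsheet."""
    wb = load_workbook(path, read_only=True)
    meta = {"sheets": []}
    for ws in wb.worksheets:
        header = [cell.value for cell in next(ws.iter_rows(min_row=1, max_row=1))]
        meta["sheets"].append({
            "name": ws.title,
            "columns": header,
            "rows": ws.max_row - 1,  # data rows below the header
        })
    return meta

if __name__ == "__main__":
    print(extract_basic_metadata("deposited_dataset.xlsx"))  # hypothetical upload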

Where Will This Be Done and By Whom?
This remains open at this time, but one option is a partnership with
PLoS or F1000 as the publisher and the University of California as a
backend provider. The California Digital Library could provide data
repository, curation, citation, and publishing services (with DataONE
and DataCite) as well as long experience with aggregators and
publishers.

As a faculty advocate and database person I am willing to put
significant effort into this, but it must be a broad team effort.

How and Why Will It Be Funded?
I have thought long and hard about this since the very successful
“Beyond the PDF Workshop” (BtPDF), which we hosted at UCSD in January
2011. I have become convinced that to precipitate change in the way we
manage scholarship in the digital era we need an exemplar that can
touch your average scientist, who sees a need for change, but will not
be part of it without a reward. A Data Journal is such a reward –
citation and preservation of an important part of their scholarship.
With great concern about preservation and open accessibility of
digital data from funders, publishers and scientists, the time would
seem right to take this step. The intent is to approach funders who
are vested in the ideals of BtPDF to seed such a Data Journal
initiative.

To start we will initiate a joint call with supporters of BtPDF – the
CDL, the National Cancer Institute (NCI), The Gordon and Betty Moore
Foundation (GBMF), The Doris Duke Foundation, The Alfred P. Sloan
Foundation, Sage Bionetworks, Science Commons and Microsoft with the
goal of obtaining short-term seed funding for Phase I. Later, the
emerging team of like-minded folks will form and seek funding from NSF
etc. for Phase II, and a sustained open access business model for Phase
III. That is, for Phase III researchers would put a line item in their
research budgets to cover the cost of preservation.
For-profits will be encouraged to pay small amounts to upload data no
longer of proprietary value and will be compensated with limited
advertising.

Acknowledgements
Thus far thanks to Lee Dirks (Microsoft), Josh Greenberg (Sloan
Foundation), Tony Hey (Microsoft), John Kunze (CDL), David Lipman
(NLM), Chris Mentzel (GBMF), Mark Patterson (PLoS), Brian
Schottlaender (UCSD Library) for discussions in coming to the
conclusions presented here.


Rebholz

Jun 28, 2011, 12:29:23 PM
to beyond-...@googlegroups.com, Phil B
Hi,

yep, not much innovation here. The setup is pretty much standard except:
* only one round of revision to ensure fast publication
* this also means rejecting a lot of the work, since it does not seem
worth a second revision round (as in some conferences)
* not much re-editing of the paper or additional work => fast, but
likely to reject good work simply because it is not yet ready

No title yet, just quite broadly: new, top-tier, open access journal
for biomedical and life sciences research.

I guess the innovative step is: how to get top-quality publications
without having to go through several revision cycles. Not sure how
this would work.

I wonder how much time the funders want to give the journal to end up
in the top tier. This could take a while, since open access is now
already established through other journals.

For the data journal:
* sounds like a good and interesting idea
* it makes sense to work together with those guys http://www.datacite.org/
* giving the different data collection types a definition /
specification seems to be mandatory
* it would be important to think about the fact that the data also
requires revision => could be tedious
* working together with different journals to collect / organise the
supplementary data could be an interesting service.

This is what comes to my mind.

-drs-

--
Dietrich Rebholz-Schuhmann, MD, PhD - Research Group Leader
EBI, Wellcome Trust Genome Campus, Hinxton CB10 1SD (UK)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -
TM support:www.ebi.ac.uk/Rebholz-svr | tm-su...@ebi.ac.uk
ISMB/ECCB 2011: http://www.iscb.org/ismbeccb2011
Twitter: jbiomedsem

Paul Groth

Jun 28, 2011, 12:34:50 PM
to beyond-...@googlegroups.com
Phil,

I think this would be fantastic. I also like the bootstrap approach you
outlined. One should require the minimal amount of information to make
it easy to add data.

This would also be a good start to helping grow a community of
journals for publishing both data and experiments (e.g. Open Research
Computation [1])

regards,
Paul

[1] http://www.openresearchcomputation.com/

--
Dr. Paul Groth (p.t....@vu.nl)
http://www.few.vu.nl/~pgroth/
Knowledge Representation & Reasoning Group
Artificial Intelligence Section
Department of Computer Science
VU University Amsterdam

Tim Clark

Jun 28, 2011, 1:24:52 PM
to beyond-...@googlegroups.com
Hi Phil,

I have a few basic thoughts relating to your idea for a Data Journal.

1 - Different domains of study have differing requirements and approaches regarding data, e.g. biology, social sciences, astronomy, medical (HIPAA), psychology...

2 - Technical keys to the problem may include common metadata specifications, stable identifiers, robust provenance, and reliability of the core data storage facilities.

3 - Social keys to the problem certainly include recruitment / motivation / reward for participation, and more broadly, incorporating a data journal into the workflow / communications ecosystem of the ordinary scientist.

4 - It would probably be mistaken to try to develop a one-size-fits-all complete metadata model for all domains at the outset. Any metadata model should be very basic but "regionally extensible".

5 - There are multiple formats that have been proposed for stable data identifiers, and these differences relate among other things to:
-- different existing repositories for the data
-- different opinions and approaches to standard identifiers
-- different data citation models

6 - Fundamentally, "stable identifiers" are also a form of metadata - so-called "surrogate keys" in database parlance.

7 - A data repository is really only useful if its datasets are connected via data citation to the methods used to generate them and the interpretations given them by their originators.

I would particularly support creating a repository *pre-aligned with a journal*, for example PLOS itself, with a really useful citation mechanism for existing journals to opt into, requiring it of their contributors. This ensures that your prototype application answers point 7 above.

At the same time, one cannot expect that such a repository would be the only one or would have the "one ring" citation model. So it would also be important to provide for interoperability with other models and other repositories.

I would enjoy discussing this idea further one on one or in a small group.

The Dataverse Network (http://thedata.org) people have a very interesting open source software framework which might well be adaptable in an endeavor such as you propose. Merce Crosas, the Dataverse PM, and I have had a number of discussions on this topic and I am certain she also would be happy to chat further. Anita De Waard and I have discussed similar ideas.

Also recommend looking at some of the outputs of a workshop organized by Merce and her colleagues during May this year, http://projects.iq.harvard.edu/datacitation_workshop/

Best

Tim


Tim Clark

Jun 28, 2011, 1:37:40 PM
to beyond-...@googlegroups.com
PS Phil, in email below where I said "...PLOS itself..." I meant to say "PLOS One itself". 


Konrad Hinsen

Jun 28, 2011, 3:58:57 PM
to beyond-...@googlegroups.com
On 28 Jun 2011, at 18:10, Phil B wrote:

> I found the announcement yesterday of a new high profile open access
> journal somewhat discouraging (see http://www.hhmi.org/news/20110627.html).

Me too, but then it's not in my domain anyway.

> After much discussion with a number of people I have concluded that a
> logical next step is to consider starting a high visibility data
> journal for the reasons I outline below in a brief rationale. I am
> considering starting to chase funds for such an endeavor and before I
> do so I would very much appreciate the thoughts of this group on
> whether this is a good idea, whether it conflicts with other efforts

This is actually pretty similar to what I recently suggested in reply
to a proposal by Anita to start an "executable journal". The idea
looks good to me and you have come up with good ideas concerning many
of the organizational aspects.

One topic you do not mention is refereeing. What would your data
journal do to ensure at the very least that the submitted data is what
it claims to be, and is usable by someone other than the submitting
authors? Without some minimal reviewing procedure, some people are
likely to deposit junk data in order to get a citable publication.

Konrad.
--
---------------------------------------------------------------------
Konrad Hinsen
Centre de Biophysique Moléculaire, CNRS Orléans
Synchrotron Soleil - Division Expériences
Saint Aubin - BP 48
91192 Gif sur Yvette Cedex, France
Tel. +33-1 69 35 97 15
E-Mail: research at khinsen dot fastmail dot net
---------------------------------------------------------------------

Jakob Voss

Jun 28, 2011, 5:02:35 PM
to beyond-...@googlegroups.com
Hi Phil,


Technically we should compare a Data Journal with three existing
approaches:


1. Document Repositories like arXiv.
2. Data registries like CKAN
3. Source code repositories like github


I'd favour the third, but scientists will more likely be familiar with
the first. In my opinion it is crucial to be familiar with these
different approaches so as to know the strengths and weaknesses of each
variant. If you add peer review to the journal, you will soon limit
its scope to some fields of research, but I think you cannot avoid some
topical focus anyway. Only experts in a domain can judge data and
metadata.


Jakob

--
Verbundzentrale des GBV (VZG)
Digitale Bibliothek - Jakob Voß
Platz der Goettinger Sieben 1
37073 Goettingen - Germany
+49 (0)551 39-10242
http://www.gbv.de
jakob...@gbv.de

Peter Murray-Rust

Jun 28, 2011, 6:22:34 PM
to beyond-...@googlegroups.com


In conjunction with IUCr we have started to explore a "data journal" based on the data from Acta Cryst E, using WordPress as the rendering engine and aiming at HTML5 as the carrier. We have also started to explore this in computational chemistry. The main problem, as always, is people, not technology.

The main virtue - and it's a small but important one - of the Wellcome/HHMI/MP journal (and I have only read the press release) is that it tries to regain control of the scholarly publishing process from the organised publishing sector, which tends to stifle and ossify progress. Whether it will succeed depends on whether academics will give up self-centered publication for real communication. Which I doubt.

I would be delighted to be part of exploring data journals. I think they might come from a bottom-up revolution aiming to democratise science, because publishing data is about changing values and will face enormous opposition from established beneficiaries of the concentration on "the publication" (whether PDF or anything else). I would prefer us to publish things that work, that do things, that can be used. Software, data, protocols, etc. For these, I'm afraid we have to build reward systems. We shouldn't have to, but it seems essential.


P.


--
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069

Daniel Mietchen

Jun 28, 2011, 7:45:40 PM
to beyond-...@googlegroups.com
Hi Phil,

I share your sentiment regarding the HHMI initiative and enjoyed
reading your alternative proposal.

While you have sketched out the technical hurdles and approaches in
quite some detail, I am missing a complement on the cultural side: as
PMR just pointed out, the key to cultural change is community
engagement, and this works better where the community already is than
in new venues. There are reasons why several scope-limited PLoS
journals were started before PLoS ONE, and many of them were cultural.
I think many of these would also apply to the launch of PLoS DATA,
with technical and legal issues further complicating the matter.

One way to go about this would be to try to get existing journals to
adopt a policy to add a statement on data sharing to each article they
publish, just as it is now common practice to have a statement on
potential conflicts of interest. Such data sharing statements are
already in effect at some journals [1] while under consideration at
others [2], and the little cultural change they require could be
achievable in a fair number of communities that do not yet have
mandatory data sharing [3]. If the statement is suitably implemented
(e.g. automatically generated on the basis of responses collected by a
standardized form), some basic metadata generation (and its flagging
[4]) could also be automated, so there would be little effort to be
expended on the side of authors or publishers.
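
A standardized statement generator of that kind could be tiny. The
sketch below is a hypothetical example (the form fields and wording are
assumptions, not an existing tool): it turns structured form responses
into a human-readable data sharing statement plus a machine-readable
record that could later feed a reward or reuse-metrics system.

# Sketch of a standardized data-sharing-statement generator (hypothetical form fields).
def data_sharing_statement(responses):
    """Turn submission-form responses into a statement plus machine-readable metadata."""
    if responses["shared"]:
        text = ("The data underlying this article are available from "
                f"{responses['repository']} under identifier {responses['identifier']}.")
    else:
        text = ("The data underlying this article are not shared for the "
                f"following reason: {responses['reason']}.")
    return {"statement": text, "metadata": dict(responses)}

# Illustrative responses from a submission form.
example = {"shared": True, "repository": "Dryad",
           "identifier": "doi:10.9999/example", "reason": None}
print(data_sharing_statement(example)["statement"])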

First results of such an approach (i.e. a standardized statement
generator, and journals using it) could realistically be expected on
the scale of months, thereby raising community awareness of the issue
already during your stage I of technical development. Other
community-based options - acting on different time scales - would be
getting involved with the standardization of technical aspects of data
sharing for scope-limited data journals [5], or even a SCOAP3
approach, which would also address the business model issue [6]. Given
a wider community involvement of such kind, significant community
participation in the technical development should be more easily
achievable than via a purely technical approach.

Note that a reward system is not required for anything thus far, but
these data statements (if machine readable) could serve as a basis for
a reward system that could later be expanded (e.g. to include reuse
metrics) as the technical development proceeds.

Thanks again for this stimulating proposal - I wish you could have
submitted it to a "Journal of Research Proposals" [7] for appropriate
reward.

Cheers,


Daniel

[1] http://dx.doi.org/10.1136/bmj.c564
[2] http://blogs.openaccesscentral.com/blogs/bmcblog/entry/dear_scientist_help_us_to
[3] http://www.datadryad.org/jdap
[4] http://onsclaims.wikispaces.com/
[5] http://www.gbif.org/communications/news-and-events/showsingle/article/new-incentive-for-biodiversity-data-publishing/
[6] http://scoap3.org/
[7] http://evomri.net/?page_id=36

john wilbanks

Jun 28, 2011, 8:26:29 PM
to beyond-...@googlegroups.com
Just to make sure everyone on the list knows, there are already several
"data" journals - we have collected some requirements and some links to
existing efforts in Jonathan Rees's "Recommendations for independent
scholarly publication of data sets" (available at
http://sciencecommons.org/wp-content/uploads/datapaperpaper.pdf ) -
indeed, Ecological Archives has been publishing data for 11 years.
There's a lot to learn there.

More recently I have been watching the Journal of Network Biology from
the inside, and although there are technical challenges, the social
challenges of what it means to "peer review" a data set make the
technology look simple. And the odds are good that the first pass on
what review means will be wrong and will need to be incrementally improved
for a while, or perhaps even thrown out for something that is less wrong
and upon which incremental innovation can occur.

jtw

--
John Wilbanks
VP for Science
Creative Commons
web: http://creativecommons.org/science
blog: http://scienceblogs.com/commonknowledge
twitter: @wilbanks

Philip Bourne

Jun 28, 2011, 8:32:40 PM
to beyond-...@googlegroups.com, Phil B
Hi:

First a confession - what I am really suggesting is not a Data Journal but a database, but my faculty (my metric for success) neither respect nor understand databases. So let's call it a journal (which they revere) even though it is a database underneath. It's only a white lie :) Now to your points:

I agree that there needs to be a way to cite the data, and working with DataCite or others makes perfect sense, but in my view it is not enough to entice your average scientist. You need to be able to cite the dataset also with a journal-like citation (journal name, year, volume etc.) and include that in the references to a paper. Even then it will not be given much credence in the beginning, but it could grow in perceived value in the same way that the perceived value of PLoS ONE articles has changed. What drove the change in the perception of ONE, in my opinion, was the impact factor and article-level metrics showing the value of the papers therein. Data-level metrics are a start to showing that. You can't ignore the relative value of a paper cited only by the original authors vs a dataset that is downloaded by 100 people and cited by 10 in the papers resulting from its use. In time (I dream) data citations and paper citations will become indistinguishable.

The issues of data types, metadata extraction, validation etc. are huge of course, but are often solved in part by the respective labs that generate the data, albeit for their own needs (unless managed by a community database). Therein lies the beauty (I dream again): these tools now become part of the solution in a shared space and are themselves open.

None of this precludes relationships with journals and the articles that say more about the data than is in the data journal itself. In fact, publishers should be relieved, since this takes the burden of trying to handle data off their hands. What makes what I am thinking different from something like Dryad, which is doing this with published articles, is that there does not have to be a publication now or ever.

Cheers../Phil B


Philip Bourne

Jun 28, 2011, 8:33:59 PM
to beyond-...@googlegroups.com
Agreed.. cheers../Phil B


Philip Bourne

Jun 28, 2011, 8:52:57 PM
to beyond-...@googlegroups.com
Thanks Tim..

re 1 - true, domains are very different; capturing a few might be enough to start change... Science and Nature cross disciplines, so may a Data Journal

re 2 - agreed - but many of these problems are well addressed by this wonderful community and only need be applied

re 3 - yes, this is the big hurdle in my view - but wait till you have to prove your data sharing to get your next grant - it will be a cakewalk at that point.

re 4 - agreed

re 5,6 - agreed, so who wins? Maybe it will be the one that delivers useful data to a customer - okay, a circular argument since you have to identify the data, but my point is a focus on data sharing first. If the PDB (with its own identifiers) and GenBank (with its own) said we have a new system that covers both datatypes, chances are it would be adopted (i.e., cited in other places including papers).

re 7 - true of immature datatypes; is it true of, say, SNPs (a more mature datatype)? Yes in theory, but in practice not so sure.. this requires discussion for sure.

I don't claim anything re the idea - others, as you note, have thought about this deeply - that the idea recurs tells us something positive; the trick is to go from idea to resource to my fellow faculty clamoring for data citations.

Lots to chat about.. cheers../Phil

John A. Kunze

Jun 29, 2011, 12:11:33 AM
to beyond-...@googlegroups.com
To put a finer point on it, Jonathan's article elegantly describes the
emerging genre of the "data paper" as it is practiced in a number of
disciplines, and which has been appearing in special sections of sundry
journals. Data journals are rarer, and journals-as-databases even rarer.

-John

David Shotton

Jun 29, 2011, 2:32:53 AM
to beyond-...@googlegroups.com, Phil B, Lyubomir Penev
Folk will, I hope, be interested to look at the Data Publishing Policies and Guidelines for Biodiversity Data recently published by Pensoft Journals (http://www.pensoft.net/journals/), which specializes in publishing biodiversity and biological systematics papers, and which has taken the lead in promoting the publication of datasets with DOIs.

Penev L, Mietchen D, Chavan V, Hagedorn G, Remsen D, Smith V, Shotton D (2011). Pensoft Data Publishing Policies and Guidelines for Biodiversity Data. Pensoft Publishers, http://www.pensoft.net/J_FILES/Pensoft_Data_Publishing_Policies_and_Guidelines.pdf.

David



On 28/06/2011 17:10, Phil B wrote:
).
Do not get me wrong, support for open access is of course welcome, but
that this is the best a group of top scientists could come up with to
improve scholarship leaves me wondering.  It is my opinion we need to
coax these folks to greater thoughts through initiatives that they can
see make sense, even if not initially, but soon. In my opinion the SMA
application that came from the workshop in Jan. is a great example.
With that under way I started to contemplate what else could be done.

After much discussion with a number of people I have concluded that a
logical next step is to consider starting a high visibility data
journal for the reasons I outline below in a brief rationale. I am
considering starting to chase funds for such an endeavor and before I
do so I would very much appreciate the thoughts of this group on
whether this is a good idea, whether it conflicts with other efforts
etc. I am less concerned about the technology to be used at this stage
and more concerned about whether this would move improved scholarship
in the right direction? Your council would be much appreciated.

Best../Phil B.

A Proposal for a Data Journal
Philip E Bourne PhD June 24, 2011

Executive Summary

After thinking long and hard post the Beyond the PDF workshop  and
talking at length to scientists, publishers, librarians, funders and
others vested in bettering scholarship I have concluded we need a bold
yet doable step which will engage your typical faculty member in the
process of change. Without broad active participation of scientists at
large we will continue to invoke change at a glacial pace and in just
a small number of disciplines.

Participation requires reward and other perceived gains in a
competitive workplace. A Data Journal  can provide that reward and
gain. The scientist receives a true citation that can be used in
traditional journals for their dataset. These data are preserved and
the scientist is in compliance with emerging data sharing policies
being put in place by the funding agencies.  Think of it as PLoS ONE
for data – an open access place for data that are sound, but not
necessarily expected to have high impact on science. If that sounds
less than promising, recall the beginnings of PLoS ONE, which was
initially considered below par, but proved otherwise and became the
worlds largest journal with lots of value.

Scientists need an open access place for data since there are many
datasets that do not fit well within existing repositories or are not
suitable for inclusion in a publication  (the so-called long-tail
problem). With a lightweight documented calling interface (API), apps
can easily be developed by the community to operate on subsets of the
data (including reviewing the data) in the same way that apps are
built to maximize the utility of a piece of hardware or a corpus of
literature, as is the case with something like Elsevier's SciVerse.
The more data are published this way, the more useful the data journal
becomes, which in turn encourages more data publication. The sum of
the parts are greater than the whole – not true of a typical journal.

A number of pieces will be critical to a successful data journal.
Understanding the needs of a scientific community and getting its
feedback will keep the journal relevant; this is an ongoing activity
that is effectively achieved by establishing an appropriate social
infrastructure. Providing a simple, bullet-proof rights framework will
ensure that needless barriers to sharing and preservation are
minimized. Usage and impact metrics should be enabled by citation
conventions backed up by unique persistent identifiers for datasets
and researchers themselves. Some concept of "quality", if not peer
review, also needs to be defined and piloted.

The impact on scientific data analysis could be profound. The devil is
in the details – how can this be made versatile enough to embrace
multiple types of data, retrieved in a way that is comfortable to the
intended discipline of users? This is a problem that does not seem to
be well solved outside of the database community. Thus database
developers need to be part of the solution. At the same time, it will
be important that the effort not try to run before it can walk: solid,
forward-thinking infrastructure first (definitely not "build it and
they will come", but rather build it with them so that they might
come), with enriched application development later.

The objective proposed here is to prototype such a database (but call
it a journal) in one year. The prototype development requires willing
data contributors, database developers, publishers, software engineers
and others engaged in digital scholarship. The process involves
requirements gathering from scientists, a workshop to define the
technical requirements and the subsequent prototyping (possibly using
an existing system) with a goal to launch a Data Journal in the second
year.

You are receiving this document because in some way you could
contribute to this effort. I hope you will. Comments welcome at any
time.

Problem Statement
Traditional scholarly communication is struggling to adapt to the
Internet era. One notable shortcoming (drawing from the biosciences,
but more generally applicable) is that large amounts of digital data
are being generated, often from high-throughput experiments, which may
or may not be associated with a publication, and subsequently lost.
True, some of that data finds its way into existing repositories that
I would characterize as follows:

1.      Established repositories with stable funding where the incentive to
deposit comes from a prerequisite defined by the journal accepting a
publication associated with the data. Examples are the Protein Data
Bank (PDB), GenBank, and PharmGKB – these are functioning well, but
are expensive to operate and there is no business model beyond
continued federal funding. As a result some of these resources have
closed (e.g., the Arabidopsis database) or are under threat.

2.      Repositories that contain data specifically related to a
publication, but for which no recognized domain specific repository
exists. Dryad is an example of such a repository. Publishers have
associated themselves with these resources since it relieves them of the burden
of managing the data as a large supplement to the paper, which many
publishers are ill-equipped to do. These resources are just emerging
and seem destined to also continue to rely on federal funding or buy-
in from the publishers, in other words, there is no established
business model to provide sustainability at this time.

3.      Institutional repositories that manage the data produced by
researchers at a given institution. At this time their success is far
from assured – what is the incentive for a data producer to put data
there? What is the business model for their sustainability? How do
institutional repositories relate to each other? There are exceptions
(e.g., the California Digital Library) but even there I would say
there is a failure to achieve broad buy-in by faculty. There need to
be faculty who have seen a significant gain and who champion such
developments.

4.      Grass roots efforts, e.g., Sage Bionetworks, which are pushing an
open data model and are usually domain specific, e.g., disease
modeling.

In reality, much of the data generated, most of which conforms to the
long tail – the very large number of small datasets not handled well
by these repositories – is lost. While much of these data could
easily be reproduced and many will never be in demand, there is also
valuable data being lost and no business model for sustaining such
data. Efforts like DataCite are a step in the right direction, but a
DOI alone is not enough to incentivize the scientific community at
large.

What is the Solution and Why Will it Work?
The solution is to establish a data journal. What does this mean and
why will it work?

The major impediment for many researchers in preserving their data is
incentive. Funding agencies, e.g., NIH and NSF, are providing incentive
through mandatory data sharing policies, but at this time there is
confusion surrounding these policies and no indication of whether and
how they will be enforced, or indeed where to store the data to
conform to the policies!

Perhaps more important, there is no reward for depositing data, except
in the case of 1 above, but that covers only a small amount of the
data generated. A data journal provides that reward through a citation
(e.g., hypothetically  PLoS  Data 2011 6:3 d000001) that can be
included on resumes, promotion files etc. Associated with the citation
will be a DOI (from DataCite or elsewhere) that provides a definitive
resolvable reference to that dataset. Initially the value of that
citation will be limited, but over time we anticipate it will be
accepted by aggregation services, e.g., Thomson Reuters ISI and
PubMed.
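
To make the mechanics concrete, here is a minimal sketch (not a design
commitment) of what such a machine-readable citation record might look
like and how its DOI would resolve; the journal name and article number
are the hypothetical ones used above, and the DOI prefix is a
placeholder, not a registered identifier.

# Minimal sketch of a dataset citation record with a resolvable DOI.
# The journal, article id, and DOI below are hypothetical placeholders.
from urllib.request import Request, urlopen

record = {
    "authors": ["A. Researcher", "B. Collaborator"],
    "title": "Example long-tail dataset",
    "journal": "PLoS Data",            # hypothetical journal name from the text above
    "year": 2011, "volume": 6, "issue": 3,
    "article_id": "d000001",
    "doi": "10.5072/plosdata.d000001", # placeholder prefix, not a registered DOI
}

def format_citation(r: dict) -> str:
    """Render a conventional, resume-ready citation string for the dataset."""
    return (f"{', '.join(r['authors'])} ({r['year']}). {r['title']}. "
            f"{r['journal']} {r['volume']}:{r['issue']} {r['article_id']}. "
            f"doi:{r['doi']}")

def resolve(doi: str) -> str:
    """A registered DOI resolves through the global proxy at doi.org."""
    req = Request(f"https://doi.org/{doi}", method="HEAD")
    with urlopen(req) as resp:          # follows redirects to the landing page
        return resp.geturl()

print(format_citation(record))          # resolve() would work only for a real DOI

The point is simply that the citation carries everything a traditional
reference carries, plus an identifier that machines can dereference.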

Another reason a Data Journal will be successful is the corresponding
demise of the research article. Already many research articles do
little else than report on a set of data; a Data Journal simply
acknowledges the fact. The goal of a Data Journal is to drive the
notion that one day (it should have been yesterday!) a research
article with zero citations (beyond self citing) will be less valuable
than a citable dataset that has been downloaded by many researchers
worldwide (bibliometrics on access to the data and commentary by users
of the data would be maintained from day 1).

How Will This Work?
Notwithstanding the provision of a reward, the barrier to �publishing�
with the Data Journal must be kept very low, at least initially. This
implies collecting only a minimal amount of metadata to provide
provenance and to identify the dataset uniquely. Over time, based on
demand from scientists to publish in the Data Journal, the
requirements can be ramped up, but also offset by increased
automation. Here are three suggested phases of development:

•       Phase I (one year) – deploy the backend curation repository to
appropriately store, retrieve, search and audit datasets. Develop
simple upload mechanisms for small and large datasets. Define a simple
review and data integrity system to validate the depositions. See
whether the journal gets any "data papers" submitted. This could be
based on an existing infrastructure at the California Digital Library
(CDL).

•       Phase II (one year) – If Phase I is successful, start to accumulate
more metadata on entries through automated extraction based on data
format and data type. An example of a popular data format will be MS
Excel spreadsheets (coincidentally by June 2012 CDL will have released
an open-source Excel "add-in" designed to make spreadsheet data easier
to publish, share, and preserve); an example of a data type would be
biological sequence data, which has 1-2 well-structured forms and which
does not belong in any existing repository.

•       Phase III (ongoing) – expand the capabilities established in Phase II
to visually respond to common and established data formats and data
types. Data visualized in a traditional journal is static. Here the
data can be brought to life, often using tools developed by the
communities themselves and collated by the Data Journal. If successful
this will become an accepted form of publishing, the depositors/authors
will treat it like any other journal, and the merger between database
and journal, predicted 6 years ago, will be realized.
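
As promised above, here is a minimal sketch, assuming nothing about the
eventual platform or its schema, of the kind of Phase I deposition
record this implies: just enough metadata for provenance, unique
identification, and a basic integrity check. All field names are
illustrative, and the checksum stands in for whatever data-integrity
mechanism the prototype actually adopts.

# Illustrative Phase I deposition record: provenance, a unique id, and a
# checksum for integrity. Field names are hypothetical, not a schema.
import hashlib, json, uuid
from datetime import datetime, timezone
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Checksum used here as a simple stand-in for a data-integrity check."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def make_deposit_record(path: Path, depositor: str, description: str) -> dict:
    return {
        "internal_id": str(uuid.uuid4()),   # unique before any DOI is minted
        "depositor": depositor,
        "description": description,
        "filename": path.name,
        "size_bytes": path.stat().st_size,
        "sha256": sha256_of(path),
        "deposited": datetime.now(timezone.utc).isoformat(),
        "doi": None,                        # assigned later, e.g. via DataCite
    }

if __name__ == "__main__":
    demo = Path("example_dataset.csv")      # toy long-tail dataset for the demo
    demo.write_text("sample,value\n1,0.42\n2,0.37\n")
    record = make_deposit_record(
        demo, depositor="A. Researcher",
        description="Small dataset with no obvious home in an existing repository")
    print(json.dumps(record, indent=2))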

Where Will This Be Done and By Whom?
This remains open at this time, but one option is a partnership with
PLoS or F1000 as a publisher and the University of California as a
backend provider. The California Digital Library could provide data
repository, curation, citation, and publishing services (with DataONE
and DataCite) as well as long experience with aggregators and
publishers.

As a faculty advocate and database person I am willing to put
significant effort into this, but it must be a broad team effort.

How and Why Will It Be Funded?
I have thought long and hard about this since the very successful
"Beyond the PDF Workshop" (BtPDF), which we hosted at UCSD in January
2011. I have become convinced that to precipitate change in the way we
manage scholarship in the digital era we need an exemplar that can
touch your average scientist, who sees a need for change, but will not
be part of it without a reward. A Data Journal is such a reward –
citation and preservation of an important part of their scholarship.
With great concern about preservation and open accessibility of
digital data from funders, publishers and scientists, the time would
seem right to take this step. The intent is to approach funders who
are vested in the ideals of BtPDF to seed such a Data Journal
initiative.

To start we will initiate a joint call with supporters of BtPDF – the
CDL, the National Cancer Institute (NCI), The Gordon and Betty Moore
Foundation (GBMF), The Doris Duke Foundation, The Alfred P. Sloan
Foundation, Sage Bionetworks, Science Commons and Microsoft – with the
goal of obtaining short term seed funding for Phase I. Later, the
emerging team of like-minded folks would form and seek funding from
NSF etc. for Phase II, and a sustained open access business model
would support Phase III. That is, for Phase III researchers would put
a line item in their research budgets to cover the cost of
preservation. For-profits will be encouraged to pay small amounts to
upload data no longer of proprietary value and will be compensated
with limited advertising.

Acknowledgements
Thus far thanks to Lee Dirks (Microsoft), Josh Greenberg (Sloan
Foundation), Tony Hey (Microsoft), John Kunze (CDL), David Lipman
(NLM), Chris Mentzel (GBMF), Mark Patterson (PLoS), Brian
Schottlaender (UCSD Library) for discussions in coming to the
conclusions presented here.


--

Dr David Shotton                                            david....@zoo.ox.ac.uk
Reader in Image Bioinformatics

Image Bioinformatics Research Group                         http://ibrg.zoo.ox.ac.uk
Department of Zoology, University of Oxford                 tel: +44-(0)1865-271193
South Parks Road, Oxford OX1 3PS, UK                        fax: +44-(0)1865-310447

in...@pensoft.net

unread,
Jun 29, 2011, 2:59:55 AM6/29/11
to David Shotton, beyond-...@googlegroups.com, Phil B
Dear Phil,

We are planning to launch our Biodiversity Data Journal within a year's time and develop a special online platform for it. I would be happy to discuss and cooperate in launching similar kinds of data journals.

Best regards,
Lyubomir
Pensoft Publishers
-- 
Dr Lyubomir Penev
Managing Director
Pensoft Publishers
13a Geo Milev Street
1111 Sofia, Bulgaria
Tel +359-2-8704281
Fax +359-2-8704282
www.pensoft.net
www.pensoft.net/journals/zookeys
www.pensoft.net/journals/phytokeys
in...@pensoft.net

Konrad Hinsen

unread,
Jun 29, 2011, 3:44:39 AM6/29/11
to beyond-...@googlegroups.com
On 29 Jun, 2011, at 24:22 , Peter Murray-Rust wrote:

> I would be delighted to be part of exploring data journals. I think they will/might come from a bottom-up revolution aiming to democratise science. Because publishing data is about changing values, it will face enormous opposition from established beneficiaries of the concentration on "the publication" (whether PDF or anything else). I would prefer us to publish things that work, that do things, that can be used. Software, data, protocols, etc. For these, I'm afraid we have to build reward systems. We shouldn't have to, but it seems essential.

I am less pessimistic about this. I believe that the scientific reward system (citations) will slowly evolve to include data citations and software citations, as soon as they become as easy to handle as today's paper citations. A data journal whose contents can be referenced exactly like today's journals could clearly help there. Once it is referenced by the big citation databases, data is part of "the system".

Another point worth investigating is the search for natural allies in this process. One group of candidates is big instruments such as synchrotrons. Their output is raw data, so they have an interest in data being highly valued. Since they are big, they are very visible and well-known to science policy makers. Suppose that big instruments would start to make all data collected on their site available after a one-year period, except if the experimenters pay a fee for exclusive access. We'd have tons of published data very quickly. But this does require an integration into today's reward system, i.e. citations.

Konrad.
--
---------------------------------------------------------------------
Konrad Hinsen
Centre de Biophysique Moléculaire, CNRS Orléans
Synchrotron Soleil - Division Expériences
Saint Aubin - BP 48
91192 Gif sur Yvette Cedex, France
Tel. +33-1 69 35 97 15

E-Mail: research AT khinsen DOT fastmail DOT net
---------------------------------------------------------------------

cameron...@stfc.ac.uk

unread,
Jun 29, 2011, 4:10:26 AM6/29/11
to beyond-...@googlegroups.com
Hi All

No time to write anything substantive but just wanted to make two points.

Every organization and their dog is setting up something that broadly fits into the category of data journal or repository at the moment. BMC are looking at it, F1000 are building something, Dryad is operational, DataCite are doing great work, other publishers are working on journals, and others in the thread have mentioned other things. Rather than add to this, I think it would be more valuable to try to pull these efforts together to ensure that they are complementary, and not just going to compete so that nothing gets achieved. The one thing all of these efforts have in common is precisely the 'giving data a journal-like citation' game.

Secondly, the question of evaluation and reward is key in my view. The citation is not enough, nor is a DOI, but both are tremendously useful steps. (As an aside, I'm with Phil on wishing we weren't being driven into using DOIs, but they seem to solve the key social problem so I will live with the potential downstream technical considerations.) The key is making these citations matter. I have had lots of recent productive discussions with major funders that indicate that they desperately want to make the publication of data matter to the people they fund. The direction is there, the technical means are largely in hand; the challenge lies in making those meet in the middle.

So I'd like to offer an alternative approach, at least as a straw man. We cheat the system. We know funders care. We know we can make the citation game work technically. So we don't tell other people, just let them observe as we work the system to our advantage. Present the slam-dunk data management plan. Show the citation stats for your data on your CV. These are dog whistles to members of our club when sitting on panels, and will be seen and noted by key people at funding agencies. It won't initially have much of an effect at institutional levels, but I think we can start to have it make a difference at funder level over the next 12-24 months. I'm not saying to hide what is going on, but actually to create a visible 'old boys club' that others will then want to join. I think we can expect to see data citation and publication explicitly included in CV or biosketch requirements for many key funders over the next year or so. If we are ready to exploit that ourselves then that may send a much stronger message out to the wider community than 'please use and cite my data' and 'here is yet another journal-like thing to think about'.

As I say, a bit of a straw man, but I'm interested to see people's response. I will try to write up a more extensive response to the Wellcome/Hughes/MPG announcement. I was actually at the meeting and you can bet I was arguing for more radical steps, but I think they may have actually got the balance about right for the aims they have. Reading between the lines I think some of the more radical technical suggestions might make it through, although this will depend a lot on the conservatism of the EiC. I've written one post on it (most recent one at http://cameronneylon.net) and will try to write another to capture some of the discussion when I have a moment.

Cheers

Cameron


cameron...@stfc.ac.uk

unread,
Jun 29, 2011, 4:21:36 AM6/29/11
to beyond-...@googlegroups.com

On 29 Jun 2011, at 08:45, "Konrad Hinsen" <rese...@khinsen.fastmail.net<mailto:rese...@khinsen.fastmail.net>> wrote:

Another point worth investigating is the search for natural allies in this process. One group of candidates is big instruments such as synchrotrons. Their output is raw data, so they have an interest in data being highly valued. Since they are big, they are very visible and well-known to science policy makers. Suppose that big instruments would start to make all data collected on their site available after a one-year period, except if the experimenters pay a fee for exclusive access. We'd have tons of published data very quickly. But this does require an integration into today's reward system, i.e. citations.

We are actually doing a series of things along these lines at ISIS (the UK's national neutron source). The technical infrastructure is going in to make the raw data published in some sense, although this is not terribly useful in its own right, and there is a move towards making things available on some timeframe. At the same time I am trying to work on capturing more of the processed data, at least for some types of experiments, in a way that can be co-published with or at least accessible from the same places as the raw data will be. There is a lot of concern in our organization about how to reward data sharing and also how to give credit for code sharing and development. The thinking isn't yet as sophisticated as this group's, but the direction of travel is right.

Cheers

Cameron

Phillip Lord

unread,
Jun 29, 2011, 6:33:12 AM6/29/11
to beyond-...@googlegroups.com

I think that you are being overly negative about this. I think that a
data journal should be a journal. Let me explain.

Different disciplines have very different requirements for databases and
data repositories, both in terms of the data and the metadata. The one
thing in common is that they do need metadata.

As you know, in biology the development of minimal information standards
has become a bit of a cottage industry. At its broadest, a minimal
information standard is basically a document describing a dataset. In
short, it's a publication.

So, to me, this is what a data journal would be: a collection of
reports about the existence of a dataset, using structure and
organisation where appropriate data standards exist, and using free
text where they do not.
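
To make that concrete, here is one possible shape for such a report, a
sketch only: structured fields where a community standard exists (MIAME
is used purely as an example), free text where it does not, and a link
to the digital object itself. The URIs and platform name are
hypothetical.

# Sketch of a "data paper": structured metadata where a standard exists,
# free text where it does not, plus a pointer to the digital object.
# All identifiers and URIs below are hypothetical.
data_paper = {
    "title": "Transcript abundance measurements under condition X",
    "authors": ["A. Researcher"],
    "dataset_uri": "https://example.org/datasets/12345",  # the digital object's home
    "structured": {   # backed by a minimal information standard, here MIAME
        "standard": "MIAME",
        "organism": "Mus musculus",
        "platform": "hypothetical-array-v2",
        "sample_count": 24,
    },
    "free_text": (
        "Free-text description of how the samples were collected and why the "
        "dataset exists; no analysis, thesis, or 'In recent years...' required."
    ),
}

for section, content in data_paper.items():
    print(f"{section}: {content}")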

If you want an equivalent example, I have released various bits of code
that I have written over the years. When I want to refer to something
that I have written, I generally send the URI to the *documentation*
and not a tarball or version control system, which is where the actual
digital object is. Obviously, the documentation will contain a link to
the digital object.

The key difference from the current system is that the paper would not
require an experiment, would not require a thesis or analysis of the
results; it would not need a significant introduction, nor would the
paper have to start with the phrase "In recent years...".

Phil Lord

Waard, Anita de A (ELS-AMS)

unread,
Jun 30, 2011, 10:47:01 AM6/30/11
to beyond-...@googlegroups.com
Very much agree with Cameron and others about connecting to existing efforts; just wanted to add Astronomy as a 'natural ally' - I am sure Mike Kurtz has a much better insight re. data citations in astronomy, but this list is quite impressive already: http://www.astro.caltech.edu/~pls/astronomy/archives.html

Also, here's a copy-paste from the Beyond the PDF website list of links on Data Citation:

* The Australian National Data Service has a nice page on data citation awareness: http://ands.org.au/guides/data-citation-awareness.html
* The challenges with tracking dataset reuse today, based on DOIs and paper-oriented tools: http://researchremix.wordpress.com/2010/11/09/tracking-dataset-citations-using-common-citation-tracking-tools-doesnt-work/
* Gary King on data sharing http://gking.harvard.edu/projects/repl.shtml
* UNF:
o To ensure that the data set can be used for replication, one recommendation is that in addition to a handle, DOI, or other identifier, a universal numerical fingerprint (UNF) is used. UNF was proposed in Micah Altman, Gary King, 2007. "A Proposed Standard for the Scholarly Citation of Quantitative Data", D-Lib 13(3/4) http://www.dlib.org/dlib/march07/altman/03altman.html
o Example: http://thedata.org/citation/standard
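
For readers unfamiliar with the idea, the sketch below is a toy
stand-in, not the actual UNF algorithm (which normalizes values
according to the published specification cited above): put the data
into a canonical form, then hash it, so that a citation pins the exact
content of the dataset rather than just its location.

# Toy content fingerprint: canonicalize numeric rows (fixed precision and
# order), then hash. Illustrates the idea behind UNF; not the real algorithm.
import hashlib

def toy_fingerprint(rows) -> str:
    canonical = "\n".join(
        ",".join(format(float(v), ".9e") for v in row) for row in rows
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

print(toy_fingerprint([[1, 2.5], [3.14159, 0]]))   # a stable fingerprint for the data
print(toy_fingerprint([[1, 2.50], [3.14159, 0]]))  # numerically equal -> same fingerprint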

Best,

- Anita.

Anita de Waard
Disruptive Technologies Director, Elsevier Labs
http://elsatglabs.com/labs/anita/
a.de...@elsevier.com

-----Original Message-----
From: beyond-...@googlegroups.com on behalf of cameron...@stfc.ac.uk
Sent: Wed 6/29/2011 4:21
To: beyond-...@googlegroups.com
Subject: Re: Does a Data Journal Make Sense?

Cheers


Waard, Anita de A (ELS-AMS)

unread,
Jun 30, 2011, 1:28:03 PM6/30/11
to beyond-...@googlegroups.com
This blog (Michael (Kurtz), is this you, or someone from your group? I can't find an author attribution) http://anopisthographs.wordpress.com/ offers an interesting answer to the question, for astronomy: 'is there a difference, from a bibliometric point of view, between articles that link to data and articles that do not?'

The answer is: 'for this data set, articles with data links acquired 20% more citations (compared to articles without these links).'

This might be a useful data point in trying to convince the community at large that data citations are important.

- Anita.

Anita de Waard
Disruptive Technologies Director, Elsevier Labs
http://elsatglabs.com/labs/anita/
a.de...@elsevier.com

-----Original Message-----
From: beyond-...@googlegroups.com on behalf of cameron...@stfc.ac.uk
Sent: Wed 6/29/2011 4:10
To: beyond-...@googlegroups.com
Subject: Re: Does a Data Journal Make Sense?

Hi All

Cheers

Cameron


Michael Kurtz

unread,
Jun 30, 2011, 2:23:13 PM6/30/11
to beyond-...@googlegroups.com
Hi All,

The list Anita sent is nice; more impressive perhaps is that it is now
already 8 years old. Most of the links do, indeed, still work. New
efforts are at www.usvao.org and www.ivoa.net. The new ADS stuff is
at adsabs.org/ui

Cheers,

Michael

On Thu, Jun 30, 2011 at 10:47 AM, Waard, Anita de A (ELS-AMS)
<A.de...@elsevier.com> wrote:
> Very much agree with Cameron and others about connecting to existing efforts; just wanted to add Astronomy as a 'natural ally' - I am sure Mike Kurtz has a much better insight re. data citations in astronomy, but but this list is quite impressive already: http://www.astro.caltech.edu/~pls/astronomy/archives.html


--
Dr. Michael J. Kurtz
Harvard-Smithsonian Center for Astrophysics
60 Garden Street
Cambridge, MA 02138
USA
ku...@cfa.harvard.edu
+1 617 495 7434
www.cfa.harvard.edu/~kurtz

Michael Kurtz

unread,
Jun 30, 2011, 2:24:32 PM6/30/11
to beyond-...@googlegroups.com
Edwin Henneken, who is indeed part of the ADS staff, writes this blog.
He will be writing this up soon.

Cheers,

Michael

--

Philip Bourne

unread,
Jul 14, 2011, 4:59:40 PM7/14/11
to beyond-...@googlegroups.com
Hi: I wanted to thank everyone for their responses on the question of a Data Journal that I posted. That has led to many useful discussions and I append my own synopsis of the important points of those discussions. In summary I would say:

There is much going on in this space that is noteworthy. Perhaps it is part of the natural evolution of a thousand flowers blooming, but certainly there is little coordination around these efforts. Many have technical solutions that seem promising, but not the data content to prove it; a few do. Many publishers see this as an emergent part of their business, but what will emerge is not necessarily clear and they are moving slowly.

For my own part I am attending a couple of meetings in August which will be rife with discussions around data journals and I will decide what, if anything, I can contribute after that. One fun thought is that we could have Beyond the PDF 2 in San Diego next Jan with the goal of defining exactly what the group can collectively contribute to a working data journal, either one that is out there, emerging, or a new one defined by the group. 

Summary of points surrounding a data journal
 
Relationships
- Relationship to Dryad
- Relationship to DataCite
- Relationship to DataVerse
- Relationship to Document Repositories like arXiv.
- Relationship to Data registries like CKAN
- Relationship to Source code repositories like github
- Relationship to big data producers

Learn from existing data journals
- Ecological archives
- Journal of Network Biology
- Biodiversity data journal (forthcoming)
- Astronomy

General Issues
- Different domains have different visions
- What constitutes the minimum metadata and how to represent it
- Need stable identifiers
- Versioning
- Does the data need to be reviewed and, if so, how?
- Provenance
- Stability and longevity of the resource
- Reward system

- A number of emergent efforts from publishers
- NPG
- F1000
- PLoS

Models forward
- Work with a single institutional repository
- Work to support multiple journals in a given domain

Visions
- could support methods that operate on the data
- could then include workflows
- both part of an executable journal

Best../Phil

pfeiff

unread,
Jul 15, 2011, 2:21:10 PM7/15/11
to Beyond the PDF
Hi Phil,

On 14 Jul., 16:59, Philip Bourne <bou...@sdsc.edu> wrote:
> Hi: I wanted to thank everyone for their responses on the question of
> a Data Journal that I posted.

I was made aware of this discussion only today. May I still suggest
some additions?

First, you may consider two additional "existing data journals":
- Earth System Science Data (first article in 2009)
- GigaScience (first article in 2011?)

ESSD is a data journal but definitely not a database or repository.
Rather, it relies on suitable repositories to exist, which might be
a different one for each dataset published through it. We require
each dataset to be openly available and persistently available
(meaning: having a persistent identifier - preferably a DOI -
pointing to "eternally" unmodified data).

I also see two additional "general issues":

- quality: The single most important contribution of the scientific
journal to the advancement of science has been to build a
(sufficiently) *reliable* corpus of knowledge. A data journal must do
what it can to provide the same service in relation to data.

ESSD is trying to couple reward to quality - as is "tradition".

- preservation: I would much appreciate executable elements (e.g.,
data browsers) for data articles. But I would expect even more serious
problems in preserving those, compared to preserving the data
themselves.

Journals need to balance the short-term goal of providing a maximum of
insight to each "reader" in the shortest possible time against the
provision of the long-term record of science.

best,

Hans
http://www.awi.de/People/show?pfeiff
http://www.earth-system-science-data.net/

mduke

unread,
Jul 20, 2011, 12:28:21 PM7/20/11
to Beyond the PDF

Here is the announcement for GigaScience for those that have not yet
seen it:

Gigascience
<http://www.gigasciencejournal.com/>

Now accepting submissions

Editor-in-Chief:
Laurie Goodman PhD

GigaScience, a new international open access journal published by
BioMed Central, is now accepting submissions. GigaScience will publish
articles relating to biological and biomedical "big-data" studies, and
will provide a forum for dealing with the difficulties of handling
large-scale data from all areas of the life sciences. GigaScience is
co-published by BGI Shenzhen, one of the world's largest
data-producing and analysis centers, and will utilize their expertise,
infrastructure, and cloud-computing resources to host and handle all
associated large-scale datasets.
<http://www.biomedcentral.com/>
<http://www.gigasciencejournal.com/manuscript>
<http://www.genomics.cn/en/index.php>

GigaScience aims to revolutionize data dissemination, organization,
understanding and use, and to increase transparency and
reproducibility of research, with an emphasis on data quality and
utility over subjective assessments of immediate impact. To achieve
its goals, GigaScience has developed a novel publishing format that
integrates manuscript publication with a database that will provide
DOI assignment to every dataset, creating ease of access and data
permanency, as well as the long-needed ability to cite data producers
and track datasets.

Our scope covers not just 'omic' type data and the fields of
high-throughput biology currently serviced by large public
repositories, but also the growing range of more difficult-to-access
data, such as imaging, neuroscience, ecology, cohort data, systems
biology, and other new types of large-scale sharable data.

In addition to articles covering the creation of large-scale data
sets, the journal welcomes submissions of original research based on
data reuse and on cloud-based software and methods that can be
included in the database for immediate community testing. The journal
will also publish reviews and commentaries focused on current issues
relevant to data-handling and sharing.

GigaScience offers authors immediate publication on acceptance and
unrestricted space for an unlimited number of figures, tables, and
supplementary material at no extra cost beyond the standard
article-processing charge.

Supporting the open data movement, to enable future access and
analyses, we require that all supporting data and source code be
publicly available. All articles, and associated data, published in
GigaScience will be open access - freely available to all, with the
copyright retained by the authors. All journals published by BioMed
Central are indexed in PubMed and full text articles are deposited in
PubMed Central, ensuring high visibility of research and compliance
with the NIH Public Access Policy and the Wellcome Trust Open Access
Policy.

Thanks to generous support from the BGI, authors of articles accepted
for publication in GigaScience in the journal's first year of
publication will benefit from all the advantages of open access and
will not be required to pay an article processing charge.

Please submit your manuscript via our online submission system. For
more information about the journal, contact us at
edit...@gigasciencejournal.com or visit our instructions for
authors.
<http://www.gigasciencejournal.com/manuscript>
<http://www.gigasciencejournal.com/authors/instructions>

We look forward to seeing the exciting work going on in your
laboratories and to speaking with you about issues that you feel may
need special handling by our editorial staff.

With kind regards,

Laurie Goodman, PhD
Editor-in-Chief, GigaScience

--------------------------------------------------------------------------------

Editor-in-Chief

Dr Laurie Goodman (USA)

Editorial Board

Prof Stephan Beck (UK)
Dr Alvis Brazma (UK)
Prof Ann-Shyn Chiang (Taiwan)
Dr Richard Durbin (UK)
Dr Paul Flicek (UK)
Dr Robert Hanner (Canada)
Dr Yoshihide Hayashizaki (Japan)
Dr Henning Hermjakob (UK)
Dr Paul Horton (Japan)
Prof Gary King (USA)
Dr Donald G Moerman (Canada)
Prof Karen Nelson (USA)
Prof Stephen J O'Brien (USA)
Dr Hanchuan Peng (USA)
Dr Ming Qi (China)
Dr Susanna-Assunta Sansone (UK)
Dr Michael C Schatz (USA)
Prof David C Schwartz (USA)
Prof Sumio Sugano (Japan)
Dr Thomas Wachtler (Germany)
Dr Jun Wang (China)
Dr Marie Zins (France)