Decision on PeerJ submission, with new Revised version for your review


Tim Clark

Jan 3, 2015, 3:38:36 PM
to idm...@force11.org, Joan Starr
Dear Colleagues

On Dec 25 I received the attached decision letter on our PeerJ submission, "Achieving human and machine accessibility of cited data in scholarly publications".  As you'll see it requested major revisions, including a significant amount of additional background. 

I went ahead and made the requested revisions, except for a few cases. The paper has increased in length as a result - from seven pages to fifteen. 

I've attached a PDF of the revised article for your review and comment, along with a draft of the "Response to Reviewers".  Please have a look and send any comments to me promptly. 

There is one particular request by Reviewer #3 to which I did not respond, mainly because I wanted your assistance.  That is item 12 under "D. Reviewer 3 Comments" in the Response, where the reviewer requests additional explanation about Data Access Methods.   I would really appreciate someone in the group expanding this section a bit. 

If possible I'd like to get this re-submitted by January 7 at the latest.  So please send your comments - if any - asap.

Thanks and Happy New Year to all. 

Tim
---------------------------------------------
Tim Clark, Ph.D.
Assistant Professor of Neurology, Harvard Medical School
Director, Biomedical Informatics Core, Massachusetts General Hospital
co-Director, Data and Statistics Core, Massachusetts Alzheimer Disease Research Center
website: http://mindinformatics.org
mobile: +1 617-947-7098 fax: +1 617-213-5418

Revisions letter accessibility PeerJ.doc
achieving-human-machine-73.pdf

Ivan Herman

Jan 4, 2015, 10:24:03 AM
to Tim Clark, idm...@force11.org, Joan Starr
Tim,

first of all: wow:-) A deep bow towards Boston (oops, sorry, Cambridge:-) for the work you have put into this!

I have made some minor (editorial) comments; I've attached an annotated copy of the PDF file. I hope you can read it properly. I have only one slightly more substantial comment (also in the text): on the top of page 11, you write:

• Creator Identifier(s): ORCiD ID(s)6 of the individual creator(s).

I have lots of sympathy for ORCiD, but I wonder whether we are in a position to put a stake in the ground and _require_ ORCiD. There are other possible identifier schemes that publishers are discussing (e.g., ISNI); others favour a distributed approach whereby authors (and individuals) use HTTP URIs to identify themselves (WebIDs). I think the jury is still out on this. Just referring to unique identification of authors, and using ORCiD as a good example, might be a better approach.

As for the Data Access Method comment: I must admit I do not really understand the comment: 'Point 3 follows a linked data model that is library applicable but might not translate to other stakeholders.' The third option, i.e., using a <link> element in an HTML file, is not substantially different from points 1 or 2 in my view; all three aim at the same issue: finding an associated reference when accessing a specific link.

What is indeed not really clear from the text, now that I re-read it, is what the goal of this section is in this specific context. Is it

- I dereference the URI, I get a landing page in HTML, and I want to have an automatic way to access the metadata?
- I dereference the URI, I get a landing page in HTML, and I want to have an automatic way to access the _real_ data content itself?

My feeling is that we are talking about the former; if so, this should be clearly stated in the section: I may need a way to access, say, the JSON version of the metadata based on the URI. And for that, I need a way to access that data itself. I wonder whether, just by making the goals a bit clearer, we would not answer the comment. (Actually, switching the order between this section and the previous one, i.e., just talking about these mechanisms in relation to the 'content encoding of landing pages', may also help, possibly not even keeping it as a separate section.) That being said: I am not even 100% sure that this section is strictly necessary for the paper. It goes into a level of technical detail that may be unnecessary. I.e., another way of answering the comment is: nuke the section! :-)
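
To make the first goal concrete, here is a minimal sketch of the <link> approach, under some assumptions: the landing page advertises its machine-readable metadata with <link rel="alternate"> elements, and a client reads them to find, say, the JSON version. The page content, the example.org URLs, and the names `MetadataLinkFinder` and `find_metadata_links` are all hypothetical; this is not taken from the paper.

```python
from html.parser import HTMLParser


class MetadataLinkFinder(HTMLParser):
    """Collect <link rel="alternate"> elements found in a landing page."""

    def __init__(self):
        super().__init__()
        self.links = {}  # media type -> href

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        # Keep only alternate representations that declare a type and a target.
        if "alternate" in rels and attr.get("type") and attr.get("href"):
            self.links[attr["type"]] = attr["href"]


def find_metadata_links(html):
    """Return a dict mapping media types to metadata URLs."""
    finder = MetadataLinkFinder()
    finder.feed(html)
    return finder.links


# Hypothetical landing page; the URLs are placeholders.
LANDING_PAGE = """
<html><head>
  <title>Example dataset</title>
  <link rel="alternate" type="application/json" href="https://example.org/dataset/123.json">
  <link rel="alternate" type="text/turtle" href="https://example.org/dataset/123.ttl">
  <link rel="stylesheet" href="/style.css">
</head><body><h1>Example dataset</h1></body></html>
"""

links = find_metadata_links(LANDING_PAGE)
print(links["application/json"])
```

The stylesheet link is ignored because its rel value is not "alternate"; only the two metadata representations are returned.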

If we decide to keep the text, some more comments came to my mind

- Item #1: in fact, it does not necessarily require a 'webmaster', but may require special web server knowledge as well as privileges (in Apache one might be able to use the '.var' mechanism to set up content negotiation; if the user has the right to do so, that may be enough, without being a webmaster). I would rather say 'requires more than average web knowledge and possibly privileges'.

- Item #1: it may be worth emphasizing that this makes it possible to provide the same content in different formats, with the end user setting a priority. I.e., the same metadata may be there in Turtle, RDF/XML, and JSON, and the user decides which one to choose.

- Item #2: the same comment as for item #1 above in terms of webmaster. People who have the right to provide, e.g., a PHP script on a web site may also implement this; no need for a webmaster position.

- Item #3: I think the example is unnecessary or, more exactly, either we have an example for all three bullet points or for none of the three.
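
The user-set priority mentioned under item #1 can be sketched as follows. This is a simplified illustration of HTTP content negotiation, assuming the client ranks formats with q-values in its Accept header and the server holds the same metadata in Turtle, RDF/XML, and JSON. The function names are hypothetical, and the parsing deliberately ignores media-type parameters other than q.

```python
def parse_accept(header):
    """Return a list of (media_range, q) pairs from an Accept header value."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_range, q))
    return prefs


def quality(media_type, prefs):
    """Quality for one media type, taken from the most specific matching range."""
    main = media_type.split("/")[0]
    best_specificity, best_q = -1, 0.0
    for media_range, q in prefs:
        if media_range == media_type:
            specificity = 2
        elif media_range == main + "/*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def choose_format(accept_header, available):
    """Pick the available media type the client ranks highest, or None."""
    prefs = parse_accept(accept_header)
    best, best_q = None, 0.0
    for media_type in available:
        q = quality(media_type, prefs)
        if q > best_q:  # ties keep the server's own ordering
            best, best_q = media_type, q
    return best


AVAILABLE = ["text/turtle", "application/rdf+xml", "application/json"]
print(choose_format("application/json;q=0.9, text/turtle", AVAILABLE))
```

A real server would normally delegate this to its built-in negotiation support rather than parse the header by hand; the sketch only shows the decision being made.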

I am around tomorrow (Monday) and Tuesday if you need more help.

Thanks and a happy new year to you (and everybody else!)

Cheers

Ivan
>> Begin forwarded message:
>>
>> From: PeerJ <peer....@peerj.com>
>> Subject: Decision on your PeerJ submission: "Achieving human and machine accessibility of cited data in scholarly publications" (#2014:12:3509:0:0:REVIEW)
>> Reply-To: PeerJ <peer....@peerj.com>
>> To: Tim Clark <tim_...@harvard.edu>
>> Date: December 25, 2014 at 1:11:20 PM EST
>>
>> PeerJ
>> Thank you for your submission to PeerJ. I am writing to inform you that in my opinion as the Academic Editor for your article, your manuscript "Achieving human and machine accessibility of cited data in scholarly publications" (#2014:12:3509:0:0:REVIEW) requires a number of major revisions before we could accept it for publication.
>>
>> The comments supplied by the reviewers on this revision are pasted below. My comments are as follows:
>>
>> Editor's comments
>>
>> Regarding concerns expressed by reviewers 1 and 2, I will start by clarifying that I have been asked to consider this submission to be within scope for PeerJ, so questions of appropriateness have not been considered in this review.
>>
>> I am very supportive of the spirit and the goals of this work, and of the Force11 effort in general. However, I share Reviewer 3's concerns about the readability and clarity of this paper.
>>
>> As written, this submission does not provide enough context about the goals of the Force11 effort, the need for human and machine accessibility of data, and the specific mechanisms being discussed. As someone who has followed these efforts, and who is sympathetic to the goals, I found myself a bit befuddled by the content of the JDDCP principles (why not cite all 8?) and some of the acronyms in table 1 (NBN? N2T ARK?). I fear that readers who are less familiar with these topics would be thoroughly confused.
>>
>> Given that this paper is trying to argue for a set of practices that would involve a change of practice for many potentially recalcitrant investigators, I suggest the addition of some additional introductory material that would more clearly express the need for this sort of data description, the existing landscape, and the potential solutions. A strong and clear description that would convince readers that this sort of description is both possible and not unduly burdensome would be most effective for meeting the Force11 goals. I fear that the paper as is would befuddle readers and hinder realization of these goals.
>>
>> All reviewers provided useful feedback - I suggest accounting for their concerns. Defining acronyms and providing the complete JDDCP definitions would be particularly useful. I also identified two questions that I would like to see discussed:
>>
>> 1. Regarding machine accessibility, is REST the only possible approach? Some repositories might, for example, prefer SPARQL access to triple stores - would that not be considered accessible? Some discussion might help.
>>
>> 2. Regarding descriptions of software, is it reasonable to discuss URIs for software tools?
>>
>> Please be aware that we consider these revisions to be major, and your revised manuscript will probably have to be re-reviewed.
>>
>> If you are willing to undertake these changes, please submit your revised manuscript (with any rebuttal information*) to the journal within 60 days.
>>
>> * Resubmission checklist:
>>
>> When resubmitting, in addition to any revised files (e.g. a clean manuscript version, figures, tables, which you will add to the "Primary Files" upload section), please also provide the following two items:
>>
>> • A rebuttal Letter: A single document where you address all the Editor and reviewers' suggestions or requirements, point-by-point.
>> • A 'Tracked Changes' version of your manuscript: A document that shows the tracking of the revisions made to the manuscript. You can also choose to simply highlight or mark in bold the changes if you prefer.
>>
>> Accepted formats for the rebuttal letter and tracked changes document are: docx (preferred), doc, or PDF.
>>
>> As you previously uploaded a single manuscript file for your initial submission you will need to upload any primary high resolution image and table files separately if you have not already done so.
>>
>> Harry Hochheiser
>> Academic Editor for PeerJ
>>
>> Reviewer Comments
>>
>> Reviewer 1 (Anonymous)
>>
>> Basic reporting
>>
>> There are some confusing aspects to Table 1. Please clarify how the HTTP(s) and PURL URI identifier schemes meet JDDCP criteria if they ‘fail’ upon object removal; this does not appear to agree with Principle 6. How can these ‘achieve persistence’ if they may not persist?
>>
>> Further attention to clarity in the text, with an eye to removing possible ambiguities and redundancies, would make the article read better. I urge the authors to consider issues such as the following:
>> 1. Page 2, paragraph 2: use of parentheses
>> 2. Page 2, paragraph 3, sentence 2: what does ‘It’ refer to?
>> 3. Page 2, paragraph 3: introduce the acronym DCIG before using it later on the same page
>> 4. Throughout the manuscript: minor errors in spelling, punctuation, and sentence structure
>> 5. Page 3, paragraph 1, sentence 1: is ‘has’ the correct word here? Perhaps ‘reflects’ or ‘demonstrates’, etc?
>> 6. Page 3, paragraph 4: is ‘vend’ the correct word to use?
>> 7. Page 5: a bullet point related to #3 refers to “b”; do the authors mean “2”?
>> 8. Page 6, paragraph 3: do the authors mean ‘as a draft of NISO JATS version 1.1’, as ‘1.1d2’ denotes the second draft?
>>
>> Experimental design
>>
>> Many of these stated criteria for this area are not relevant to the article, which does not describe primary research in the Biological Sciences, Medical Sciences, or Health Sciences. However, the subject matter of the article (guidelines/proposed methods for improving access, citation, and deposition of data related to scholarly publications) is certainly applicable to Biological, Medical, and Health sciences. I leave it to the Academic Editor to determine whether the article is appropriate for this publication.
>>
>> Validity of the findings
>>
>> Please see my comments under Experimental Design above.
>>
>> Comments for the author
>>
>> The guidelines are generally clear and contain sufficient detail for implementation. The manuscript is well-organized in presenting them.
>>
>> Reviewer 2 (Tim Vines)
>>
>> Basic reporting
>>
>> This paper is a short note that provides practical guidance on the implementation of the standards for data deposition and discoverability formulated in the Joint Declaration of Data Citation Principles. The writing is generally clear, although a bit choppy in places. Please see the attached pdf for specific suggestions.
>>
>> I found it hard to identify the intended audience for the paper. I suspect that the technical details will make most sense to web designers or database engineers that are making decisions on the design and appearance of data entries and the scholarly work that cites those entries. However, the introduction seems to be aimed at a broader audience, perhaps those in publishing, libraries or research institutions that are encountering the Force11 initiative for the first time and need to be convinced of the need for standardizing data citation and hosting practices. Even if that’s not the intent, it would benefit the article to add a little more detail and explanation throughout, and to strenuously avoid acronyms and other terms that outsiders (like me) may find opaque. A more accessible article would likely reach a broader audience, which can only be a good thing.
>>
>> Experimental design
>>
>> There is no original primary research being presented here – the work of choosing these particular standards while rejecting others has clearly gone on beforehand, and the paper presents the conclusions of that work.
>>
>> Validity of the findings
>>
>> I am not well placed to comment on whether the guidance presented here is valid or the best possible practice, particularly because there is no detail on how the presented solution was decided upon.
>>
>> Comments for the author
>>
>> I'm supportive of this paper, as it's important to have a published version of record that others can point to when considering data citation issues. However, the paper is not obviously within the Biological, Medical or Health Sciences, and seems closest to Computer Science. Moreover, the paper does not present ‘research’ as such, as all the evaluation of various options (and the process behind that) is not presented. The article may therefore not be a good fit to PeerJ’s remit, and this is the reason behind my 'reject' recommendation. The final decision on suitability is, of course, up to the editor.
>>
>> Annotated manuscript
>>
>> The reviewer has provided feedback as annotations on the manuscript PDF.
>>
>> Reviewer 3 (Anonymous)
>>
>> Basic reporting
>>
>> Although this is intended as a brief piece to provide operational guidance, as it will be published as a journal article rather than a technical report, I suggest providing additional background and explanation throughout the paper. Consider that the relevant audience may be more than just repository managers who are highly proficient with technical jargon, but also other related stakeholders e.g., in managerial or other advisory positions. Additional explanation and clarification would make this paper more accessible and appropriate for the publication venue.
>>
>> Title:
>> The title is broad considering the specific focus of the article. The current title reflects the overall goal of the Force11 data citation principles, rather than the specific points addressed by this article.
>>
>> Introduction:
>> Provide some background on Force11 as not all readers may be intimately familiar with the organization. Why are they trustworthy? What is their mission? Why the JDDCP, given the existence of other guidelines for data citation? How does this relate to other guidelines for data citation and metadata? Consider that the reader might appreciate a listing of all 8 principles in the introduction for additional context.
>>
>> You state that the JDDCP “deliberately” does not provide implementation guidelines. Why not?
>>
>> Don’t leave me hanging: what are some other specific implementation issues that we can expect to be addressed in the future?
>>
>> Provide parenthetical acronym for Data Citation Implementation Group (DCIG) the first time it is used.
>>
>> What is Machine Accessibility?:
>> Only a very cursory definition/description of machine accessibility is provided, yet guidelines for machine accessibility are the main point of the article. Some additional description with attention to reasons for the importance of machine accessibility would be appreciated. Consider a brief description of RESTful Web services: although this is a standard for accessing functions, for others to use them they would need documentation. Is provision of documentation a best practice?
>>
>> Unique Identification:
>> What do you mean by “long term commitment to persistence”?
>>
>> If the criteria in Table 1 are important, why are they not introduced and discussed in the main body of the text? Is there a particular recommendation on any of these criteria?
>>
>> Landing pages:
>> “First, as ‘mandated’ in the JDDCP” – consider word choice here; ‘mandated’ may be too strong.
>>
>> Sentence order in the first paragraph here does not flow appropriately. After the “First” point the sentence explaining more about metadata should follow. After the “Second” the sentence explaining credential validation should appear.
>>
>> “Landing pages should combine human-readable and machine-readable information on a selection of the following items” – What selection should I choose from the list? Are these all optional items?
>>
>> “Explanatory or contextual information” – Should the documentation be part of the landing page or a separate document?
>>
>> “Dataset descriptions” – Need better definition of what “description” is in this context.
>>
>> Regarding persistence and data availability, and the persistence of metadata beyond de-accessioning: should this be parallel to journal articles? Why is data different?
>>
>> Minimum acceptable information on landing pages:
>> Under 2, the 6th bullet, is there a particular ISO standard that can be referenced?
>>
>> Best practices for dataset description:
>> What kind of description are we talking about here exactly? Why is it safe to say that a standard that has only been very recently released is already widely used and settled? Do you anticipate the release of any additional domain specific standards?
>>
>> Data access methods:
>> Consider providing here some additional explanation. Points 1 and 2 are generically applicable. Point 3 follows a linked data model that is library applicable but might not translate to other stakeholders.
>>
>> Persistence guarantee:
>> Can you make the relationship to persistent identifiers more explicit? This section feels somewhat overreaching. Is this a little much for citation practices? Or is this really a trusted repository issue?
>>
>> Additional comments:
>> Are there any existing examples out there that already meet these criteria that you can share?
>>
>> References:
>> In an article on citation standards, it is imperative that the reference list be correctly formatted and provide all necessary information to easily retrieve the listed documents. Check the author guidelines and make certain that both in-text and full citations in the reference list are done appropriately. For example, provide URLs and access dates for technical reports. It sends a mixed message to promote higher standards for data citation than document citation.
>>
>> Should the JDDCP itself be included in the reference list?
>>
>> Experimental design
>>
>> Not applicable
>>
>> Validity of the findings
>>
>> Not applicable
>>
>> Comments for the author
>>
>> No comments - covered above.
>>
>> © 2014, PeerJ, Inc. PO Box 614 Corte Madera, CA 94976, USA
>>


----
Ivan Herman, W3C
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
ORCID ID: http://orcid.org/0000-0003-0782-2704



signature.asc
achieving-human-machine-73.pdf

Tim Clark

Jan 4, 2015, 12:12:57 PM
to Dr Ivan Herman, idm...@force11.org, Joan Starr, Joe Hourcle
Ivan, 

Thanks very much for your comments and notes!  Yes, I would very much like some more help to finish this, from you and whoever else can join in.  

How would Tuesday Jan 6 at 11am Boston time (16:00 UTC) work for you?

We can use my concall line at 

US: +1 800-501-8979
Intl: +1 30 32 489 662 
Code: 7688522

If nobody else  joins we can shift over to Skype :-)

Best

Tim



Joan Starr

Jan 4, 2015, 12:49:01 PM
to Ivan Herman, Tim Clark, idm...@force11.org
I echo Ivan's "wow" and raise it one. :) Tim, you are amazing. Thanks so much for your hard work on this, and thanks Ivan for your thoughtful comments. I promise to review all this and join the Tuesday morning call. So, please use Skype, okay?

Thanks!
Joan

Joan Starr
EZID Service Manager
California Digital Library
University of California Office of the President
wk: 510-987-0469
twitter: joan_starr
http://orcid.org/0000-0002-7285-027X


Ruth Ellen Duerr

Jan 4, 2015, 12:53:38 PM
to Tim Clark, Dr Ivan Herman, idm...@force11.org, Joan Starr, Joe Hourcle
Hi Tim,

I plan on taking a look at this tomorrow (well ok, I started already but have too many things due tomorrow to finish it today).  I do have comments already.  

For example, there seems to be this pervasive idea that citing your sources is something that has never been done before; but that simply isn’t true.  The issue is that some time around the late 50’s - early 60’s or so, data was no longer in library books which were routinely cited as a normal course of business!  

For example, here at NSIDC we, as the World Data Center - A for glaciology, have all of the data holdings related to glaciology from the International Geophysical Year (1957-58) (the WDC-B and C in Cambridge England and over in China have duplicate copies of all of these data).  So what are these data?  They are huge books consisting of maybe 20 pages of descriptive text about how the data was measured, the instruments calibrated, where the data was taken, etc. (i.e., what we would now call metadata) followed by several hundred pages of data tables (or weather charts or maps or what have you - the point being that it was data).  OK, so some of the books have chapters where each chapter starts off with a few pages of descriptive text, followed by a whole bunch of data; but the point holds that if you used that data in your own work, you would have most certainly cited it.

Yes, different disciplines have handled this differently over time.  For example, as late as the 1970’s I published my Astronomy data in the Publications of the Astronomical Society of the Pacific, since that was the purpose of that journal at the time.  In that case, the “data” were marked up copies of the objective prism plates that I acquired indicating the locations of stars with bright Hydrogen lines.

But the point is that the science community used to do this, stopped doing this with the advent of digital data and needs to get back to its old practices…  The very first paragraph of the new abstract doesn’t reflect this reality.

I also echo Joe’s concern re DCAT which has the problem that it only allows one distribution per data set which is so not the case in the Earth Sciences where it is extremely common to offer scientifically equivalent data in several formats.  What we do here is to add the FRBR concept of a representation between the DCAT dataset concept and the distribution concept to accommodate that.  We are also moving to using the Data Conservancy packaging format which combines BagIt with OAI-ORE to reflect all those various relationships.
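
A minimal sketch of the layering described above, assuming a dataset that owns several format-specific representations, each of which can have its own distributions. The class names and the sample record are hypothetical illustrations; they are not DCAT or FRBR vocabulary terms and not NSIDC's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Distribution:
    """One downloadable file or service endpoint."""
    access_url: str


@dataclass
class Representation:
    """A format-specific rendering of the dataset (the middle layer)."""
    media_type: str
    distributions: List[Distribution] = field(default_factory=list)


@dataclass
class Dataset:
    """The citable dataset, independent of any one format."""
    title: str
    representations: List[Representation] = field(default_factory=list)

    def formats(self):
        """List the media types in which the dataset is offered."""
        return [r.media_type for r in self.representations]


# Hypothetical record: the same data offered in two scientifically equivalent formats.
dataset = Dataset(
    title="Example sea ice extent",
    representations=[
        Representation(
            media_type="application/x-netcdf",
            distributions=[
                Distribution("https://example.org/data/ice.nc"),
                Distribution("ftp://example.org/data/ice.nc"),
            ],
        ),
        Representation(
            media_type="image/tiff",
            distributions=[Distribution("https://example.org/data/ice.tif")],
        ),
    ],
)
print(dataset.formats())
```

A citation would point at the dataset, while each representation groups the access points for one format.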

I also echo Ivan’s concerns re ORCIDs, etc.

I do note that Joe and I are attending the ESIP Federation meeting next week and that technically at 11 am we should be listening to a Panel on User Needs Related to Food Resilience, moderated by Brad Doorn, NASA:
- Molly Brown, NASA/SMAP Early Adopter Program
- Gary Eilerts, Famine Early Warning Systems Network, US Agency for International Development (USAID)
- Liangzhi You, International Food Policy Research Institute
- John Bolten, NASA (GEOGLAM)
- Josh Liebermann, OGC Agriculture & Climate WG


While Joe may be entirely uninterested (???) I must admit that I am not…

Ruth

Haak, Laurel

Jan 4, 2015, 1:04:08 PM
to Joan Starr, Ivan Herman, Tim Clark, idm...@force11.org
Tim, thanks for your hard work on this, and over winter break no less. I've read through the comments and the text and agree with your changes. Per Ivan's comment about ORCID, I am OK with adjusting the reference as he suggests, though I do want to note that ORCID identifiers are persistent URIs.

Happy New Year to all!

Cheers,

-Laure

Tim Clark

Jan 4, 2015, 1:40:01 PM
to Ruth Ellen Duerr, Dr Ivan Herman, idm...@force11.org, Joan Starr, Joe Hourcle
Hi Ruth,

Thanks for your comments.  Can you suggest some specific changes to the text for your three points (citing sources, DCAT and ORCID)?  

Best

Tim

Ruth Ellen Duerr

Jan 4, 2015, 2:20:52 PM
to Tim Clark, Dr Ivan Herman, idm...@force11.org, Joan Starr, Joe Hourcle
Yup - that's the plan... - Ruth


Joe Hourcle

Jan 4, 2015, 5:42:43 PM
to Tim Clark, idm...@force11.org, Joan Starr

On Jan 3, 2015, at 3:38 PM, Tim Clark wrote:

> Dear Colleagues
>
> On Dec 25 I received the attached decision letter on our PeerJ submission, "Achieving human and machine accessibility of cited data in scholarly publications". As you'll see it requested major revisions, including a significant amount of additional background.
>
> I went ahead and made the requested revisions, except for a few cases. The paper has increased in length as a result - from seven pages to fifteen.
>
> I've attached a PDF of the revised article for your review and comment, along with a draft of the "Response to Reviewers". Please have a look and send any comments to me promptly.


Citation for SOAP (page 5). I don't know which is best ... the original, or the current version. I'll go with the recent version, based on the XML schema reference:

Gudgin, M., Hadley, M., Mendelsohn, N., Moreau, J.-J., Nielsen, H.F., Karmarkar, A., and Lafon, Y. (2007). SOAP Version 1.2 Part 1: Messaging Framework (Second Edition). W3C Recommendation, 27 April 2007. http://www.w3.org/TR/soap12-part1/


On the REST vs. SOAP approach: as for 'simpler', SOAP actually tends to be faster & easier to set up because of WSDL (Web Services Description Language), which allows you to easily create stubs for the procedure calls. Document/literal SOAP (or preferably, doc/lit wrapped) does away with the overhead of variable type declarations that you have with RPC/encoded SOAP. The reason to select REST in this particular case is that the focus in REST is on the documents being stored, whereas in SOAP the focus is on the methods/procedures (if it's RPC/encoded) or on the messages (if it's document/literal).

I'd always vote against taking the 'contemporary' argument ... after all, hovercraft are much newer than cars, and they can go more places, so why are we all still using cars? Or heaven forbid, bicycles? Or walking?


So, proposed revision:

RESTful Web services are recommended over the other major Web service approach, SOAP interfaces (/cite SOAP), due to our focus on the documents being served and their content. REST allows for the content to be returned in multiple formats (see Data Access Methods for details).
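
A rough sketch of the difference in focus, under stated assumptions: in the REST case the URI names the document and the Accept header names the format, while in the SOAP case the message body names an operation and the dataset is just a parameter. The dataset URI, the `urn:example:repository` namespace, and the `GetDatasetMetadata` operation are hypothetical; only the SOAP 1.2 envelope namespace is real. Nothing is sent over the network, and the SOAP side is shown in its procedure-call style only.

```python
import urllib.request
import xml.etree.ElementTree as ET

SOAP_ENV = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace
REPO_NS = "urn:example:repository"                    # hypothetical service namespace
DATASET_URI = "https://example.org/dataset/123"       # placeholder URI


def rest_request(uri, media_type):
    """REST: the URI identifies the document; Accept selects its format."""
    return urllib.request.Request(uri, headers={"Accept": media_type}, method="GET")


def soap_envelope(dataset_id):
    """SOAP: the body names an operation; the dataset id is an argument."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{REPO_NS}}}GetDatasetMetadata")  # hypothetical operation
    ET.SubElement(call, f"{{{REPO_NS}}}datasetId").text = dataset_id
    return ET.tostring(envelope, encoding="unicode")


req = rest_request(DATASET_URI, "application/json")
print(req.get_method(), req.full_url, req.get_header("Accept"))
print(soap_envelope("123"))
```

Asking for the same document in another format only changes the Accept value in the REST request, which is the property the proposed revision relies on.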


... you can still reference JSON if you want.




> There is one particular request by Reviewer #3 to which I did not respond, mainly because I wanted your assistance. That is item 12 under "D. Reviewer 3 Comments" in the Response, where the reviewer requests additional explanation about Data Access Methods. I would really appreciate someone in the group expanding this section a bit.


>> Data access methods:
>> Consider providing here some additional explanation. Points 1 and 2 are generically applicable. Point 3 follows a linked data model that is library applicable but might not translate to other stakeholders.

Conveniently, I've been working on the poster for ESIP next week, so I've actually fleshed out some of the advantages / disadvantages of the 3 methods. See attached.
(although the poster was done when I didn't know that there was a new version, so may need to be revised).

I do agree with Ivan's earlier comments that we may need to rework this to explain that this is 'Metadata Access Methods' or 'Landing Page Access Methods'.


...


And for what I sent Joan & Tim yesterday, the quick synopsis:

We recommend DCAT, but DCAT doesn't actually have fields for the metadata that we're requiring (no Creator or Version).

DCAT lacks any concept of people other than 'Publisher', so would fail Principle #2 (Credit and Attribution).


...

I still need to go through the full reviewer comments, so might have more to add.

-Joe

ps. and yes, I care about food security & such ... I'll have to see if I can find a good place to make a phone call from. I can't remember if the teleconferencing stuff allows us to type messages or not, but I might be able to pull that off w/ some headphones if nothing else. (But I might have to bail a touch early, as I'm meeting up with some folks at noon a couple blocks away from the conference.)





2015_ESIP_DataCitation_draft3.pdf

Tim Clark

Jan 4, 2015, 5:47:12 PM
to Joe Hourcle, idm...@force11.org, Joan Starr
Thanks Joe.


<2015_ESIP_DataCitation_draft3.pdf>









If possible I'd like to get this re-submitted by January 7 at the latest.  So please send your comments - if any - asap.

Thanks and Happy New Year to all.

Tim
---------------------------------------------
Tim Clark, Ph.D.
Assistant Professor of Neurology, Harvard Medical School
Director, Biomedical Informatics Core, Massachusetts General Hospital
co-Director, Data and Statistics Core, Massachusetts Alzheimer Disease Research Center
website: http://mindinformatics.org
mobile: +1 617-947-7098 fax: +1 617-213-5418


Data access methods:
Consider providing here some additional explanation. Points 1 and 2 are generically applicable. Point 3 follows a linked data model that is library applicable but might not translate to other stakeholders.

Persistence guarantee:
Can you make the relationship to persistent identifiers more explicit? This section feels somewhat overreaching. Is this a little much for citation practices? Or is this really a trusted repository issue?

Additional comments:
Are there any existing examples out there that already meet these criteria that you can share?

References:
In an article on citation standards, it is imperative that the reference list be correctly formatted and provide all necessary information to easily retrieve the listed documents. Check the author guidelines and make certain that both in-text and full citations in the reference list are done appropriately. For example, provide URLs and access dates for technical reports. It sends a mixed message to promote higher standards for data citation than document citation.

Should the JDDCP itself be included in the reference list?

Experimental design

Not applicable

Validity of the findings

Not applicable

Comments for the author

No comments - covered above.


Joan Starr

Jan 4, 2015, 6:49:16 PM
to Tim Clark, idm...@force11.org

OK, I grabbed some time to read through the doc and comments. Have added my comments to the version that Ivan commented. I appreciate the comments others have contributed as well and have tried not to duplicate.

--Joan

 


achieving-human-machine-73-js.pdf

Ivan Herman

Jan 5, 2015, 1:03:01 AM
to Tim Clark, idm...@force11.org, Joan Starr, Joe Hourcle

> On 04 Jan 2015, at 18:12 , Tim Clark <tim_...@harvard.edu> wrote:
>
> Ivan,
>
> Thanks very much for your comments and notes! Yes, I would very much like some more help to finish this, from you and whoever else can join in.
>
> How would Tuesday Jan 6 at 11am Boston time (16:00 UTC) work for you?
>
> We can use my concall line at
>
> US: +1 800-501-8979
> Intl: +1 30 32 489 662
> Code: 7688522
>
> If nobody else joins we can shift over to Skype :-)

It seems that there are more people to join, let us use this.

Ivan
signature.asc

Ivan Herman

Jan 5, 2015, 3:04:17 AM
to Joan Starr, Tim Clark, idm...@force11.org

> On 04 Jan 2015, at 18:48 , Joan Starr <Joan....@ucop.edu> wrote:
>
> I echo Ivan's "wow" and raise it one. :) Tim, you are amazing. Thanks so much for your hard work on this, and thanks Ivan for your thoughtful comments. I promise to review all this and join the Tuesday morning call. So, please use Skype, okay?

Tim offered a concall as an alternative:

US: +1 800-501-8979
Intl: +1 30 32 489 662
Code: 7688522

wouldn't that be better in place of skype? I have bad experiences in using skype for several people at a time...

Ivan
signature.asc

Ivan Herman

Jan 5, 2015, 3:10:20 AM
to Joe Hourcle, Tim Clark, idm...@force11.org, Joan Starr
Joe, Tim,

I like the section on access methods on the slide. It is concise and much clearer than what is in the current text; I guess essentially taking it over would be a good idea.

Talk to you tomorrow!

Ivan
signature.asc

Joan Starr

Jan 5, 2015, 9:47:01 AM
to Ivan Herman, Tim Clark, idm...@force11.org
Okay sure--that sounds fine to me.
--Joan


Mercè Crosas

Jan 5, 2015, 10:11:23 AM
to Joan Starr, Ivan Herman, Tim Clark, idm...@force11.org
Tim,

Thanks for all the hard work on this. I'll join the call tomorrow morning at 11am.

Merce


Mercè Crosas, Ph.D.
Director of Data Science, IQSS
Harvard University

Arthur Smith

Jan 5, 2015, 10:45:16 AM
to idm...@force11.org
I'm just getting to this after the holidays... Thanks to Tim for all the
hard work! I'll definitely try to join the call tomorrow.

I have a concern that the new version is excessively long and wordy. The
additional explanation is good, but it seems overdone (and a bit
melodramatic) to me. For example the "Background" and "Why cite data?"
sections seem duplicative - both discuss motivations regarding fraud and
erroneous or flawed treatment of data, as well as reuse and accessibility.

I also question some of the historical background - for instance
crystallographic data was required (or at least "recommended") by IUCr
journals as of January 1992 to be provided in CIF format for
publication, which I think predates (at least slightly) requirements for
genetics data. See
http://www.iucr.org/__data/assets/pdf_file/0019/22618/cifguide.pdf - on
the other hand genetics is certainly a great example of where being open
about the data and the support of public databases seems to have had a
huge positive effect on the field and the growth is indeed very impressive.

I haven't had a chance to read the whole thing in detail, but those were
a couple of first impressions.

Arthur Smith

Tim Clark

Jan 5, 2015, 10:52:36 AM
to Arthur Smith, idm...@force11.org
Hi Arthur -

Well I take your point about the drama, and on re-read I agree.  So we can cut out the details on Obokata.  And thanks, we can surely add the crystallographic data background.

As for the wordiness - well please read the reviewer comments.  They requested it.  If you find certain sections can be made more terse, please suggest specific wording. 

Best

Tim



---------------------------------------------
Tim Clark, Ph.D.
Assistant Professor of Neurology, Harvard Medical School
Director, Biomedical Informatics Core, Massachusetts General Hospital
co-Director, Data and Statistics Core, Massachusetts Alzheimer Disease Research Center
website: http://mindinformatics.org
mobile: +1 617-947-7098 fax: +1 617-213-5418

Arthur Smith

Jan 5, 2015, 10:56:53 AM
to Tim Clark, idm...@force11.org
Thanks - I'll try to send some more concrete suggestions later today.

  Arthur

Melissa Haendel

Jan 5, 2015, 11:37:55 AM
to Arthur Smith, Tim Clark, <idmeta@force11.org>
Thanks for all the hard work !! I'm working on a round of edits now. 

I tend to agree though - now it's too long. Will try to streamline a bit. I know we are running out of time, but perhaps if we are not doing this on Google Docs we should take turns? Otherwise it becomes somewhat of a bear to keep track of edits, especially if we are editing on a PDF.

Cheers,
Melissa



Dr. Melissa Haendel

Assistant Professor
Ontology Development Group, OHSU Library
Department of Medical Informatics and Epidemiology
Oregon Health & Science University
hae...@ohsu.edu
skype: melissa.haendel




Mercè Crosas

Jan 5, 2015, 11:56:28 AM
to Melissa Haendel, Arthur Smith, Tim Clark, <idmeta@force11.org>
Or should we just create a Google doc for the revisions?

Merce


Mercè Crosas, Ph.D.
Director of Data Science, IQSS
Harvard University

Tim Clark

Jan 5, 2015, 12:01:08 PM
to Mercè Crosas, Melissa Haendel, Arthur Smith, <idmeta@force11.org>
No way, this is all in LaTeX.

You can do collaborative edits using Overleaf (aka “writelatex”), but I’d like to limit the edits so it doesn’t go haywire. 

If you want to edit this text, please let the list know you are doing so and what section - let’s not do a free-for-all.


Tim

Tim Clark

Jan 5, 2015, 12:03:43 PM
to Mercè Crosas, Melissa Haendel, Arthur Smith, <idmeta@force11.org>
PS please also be sure you have carefully read the editor & reviewer comments before hacking the text!

Mercè Crosas

Jan 5, 2015, 12:05:56 PM
to Tim Clark, Melissa Haendel, Arthur Smith, <idmeta@force11.org>
Sounds good, Tim. You are completely right - we can't do this at this point, and need to limit the edits. As of now, I'm NOT planning to change any text, I'm just reading it and reviewing for tomorrow's call.

Thanks,
Merce


Mercè Crosas, Ph.D.
Director of Data Science, IQSS
Harvard University

Melissa Haendel

Jan 5, 2015, 12:06:15 PM
to Tim Clark, Mercè Crosas, Arthur Smith, <idmeta@force11.org>
I'm using PDF edit tool now…


Tim Clark

Jan 5, 2015, 12:09:58 PM
to Melissa Haendel, Mercè Crosas, Arthur Smith, <idmeta@force11.org>
Just responded to Arthur’s comments (except as to length) by

- removing the “suicide / lab closure” details re: Obokata (but they are real) and
- adding mention of crystallographic data to the historical section

I will try to incorporate  something like Joe’s text from the poster next …

Tim

Joe Hourcle

Jan 5, 2015, 12:12:30 PM
to Tim Clark, idm...@force11.org

On Jan 5, 2015, at 12:09 PM, Tim Clark wrote:

> Just responded to Arthur’s comments (except as to length) by
>
> - removing the “suicide / lab closure” details re: Obokata (but they are real) and
> - adding mention of crystallographic data to the historical section
>
> I will try to incorporate something like Joe’s text from the poster next …

I can volunteer to do that section, if you want to work on something else.

-Joe




Tim Clark

Jan 5, 2015, 12:12:56 PM
to Joe Hourcle, idm...@force11.org
Okay have at it!

Michel Dumontier

Jan 5, 2015, 1:01:33 PM
to Ruth Ellen Duerr, Tim Clark, Dr Ivan Herman, idm...@force11.org, Joan Starr, Joe Hourcle
Ruth / Joe,

> I also echo Joe’s concern re DCAT which has the problem that it only allows one distribution per data set

where does it say that?
 
m.

Joe Hourcle

Jan 5, 2015, 1:26:09 PM
to Michel Dumontier, Ruth Ellen Duerr, Tim Clark, Dr Ivan Herman, idm...@force11.org, Joan Starr
Um ... that wasn't my objection. My issue was that it doesn't support versions (other than a modified date) or acknowledging roles other than publisher, and has no concept of 'creator' for citation. Although the DCAT documentation mentions Dublin Core and FOAF, it doesn't appear to use DC directly, as it defines duplicate fields (publisher, title, etc.).

...

I think DCAT *does* support multiple distributions although their ER diagrams don't show cardinality. I've highlighted the part that I think suggested a one-to-many relationship w/ underscores:

http://www.w3.org/TR/vocab-dcat/#class-distribution

RDF class: dcat:Distribution

Definition: Represents a specific available form of a dataset. Each dataset _might_be_available_ in _different_forms_, these forms might represent different formats of the dataset or different endpoints. Examples of distributions include a downloadable CSV file, an API or an RSS feed

Usage note: This represents a general availability of a dataset it implies no information about the actual access method of the data, i.e. whether it is a direct download, API, or some through Web page. The use of dcat:downloadURL property indicates directly downloadable distributions.


I'd have to dig deeper to see if it handles highly available datasets via duplicating accessURL or downloadURL within a given Distribution class.

-Joe

Michel Dumontier

Jan 5, 2015, 1:32:51 PM
to Joe Hourcle, Ruth Ellen Duerr, Tim Clark, Dr Ivan Herman, idm...@force11.org, Joan Starr
Hi Joe,
We have used dcat:distribution to point to more than one file.
Another way of doing it is partitioning the dataset using dct:hasPart
or void:subset, and pointing to the download urls.
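
For anyone following along, here is a rough sketch of the first approach (one dataset pointing at several distributions via dcat:distribution). It is only an illustration, not taken from anyone's actual records: the example.org URIs, the title, the file names, and the helper name dataset_to_turtle are all made up, and it just builds Turtle text with the Python standard library.

```python
# Illustrative sketch only: all example.org URIs and names below are hypothetical.
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def dataset_to_turtle(dataset_uri, title, download_urls):
    """Return Turtle text for one dcat:Dataset with one dcat:Distribution per URL.

    Assumes `title` contains no double quotes (no escaping is done here).
    """
    lines = [f"@prefix {prefix}: <{ns}> ." for prefix, ns in PREFIXES.items()]
    lines.append("")

    # Mint one (hypothetical) distribution URI per downloadable file.
    dist_uris = [f"{dataset_uri}/dist/{i}" for i in range(1, len(download_urls) + 1)]
    dist_list = ", ".join(f"<{uri}>" for uri in dist_uris)

    # The dataset lists every distribution as a value of dcat:distribution.
    lines.append(f"<{dataset_uri}> a dcat:Dataset ;")
    lines.append(f'    dct:title "{title}" ;')
    lines.append(f"    dcat:distribution {dist_list} .")

    # Each distribution carries its own direct download link.
    for dist_uri, url in zip(dist_uris, download_urls):
        lines.append("")
        lines.append(f"<{dist_uri}> a dcat:Distribution ;")
        lines.append(f"    dcat:downloadURL <{url}> .")
    return "\n".join(lines)


if __name__ == "__main__":
    print(dataset_to_turtle(
        "http://example.org/dataset/1",
        "Example dataset",
        ["http://example.org/files/data.csv", "http://example.org/files/data.nc"],
    ))
```

The dct:hasPart / void:subset alternative would instead describe each partition as its own dataset and attach the download URLs there.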

as far as versioning goes - we proposed a solution to this in the
HCLS note [1].

m.

[1] http://htmlpreview.github.io/?https://github.com/joejimbo/HCLSDatasetDescriptions/blob/master/Overview.html#datasetdescriptionlevels



Michel Dumontier
Associate Professor of Medicine (Biomedical Informatics), Stanford University
Chair, W3C Semantic Web for Health Care and the Life Sciences Interest Group
http://dumontierlab.com

Ruth Ellen Duerr

Jan 5, 2015, 2:19:48 PM
to Michel Dumontier, Tim Clark, Dr Ivan Herman, idm...@force11.org, Joan Starr, Joe Hourcle
OK, I confess that I misstated myself… to some extent at least. We have the issue where multiple representations of the same dataset may be available: one form of the data set may be organized as a time series per pixel (over the entire earth, where the time series can be many decades long) and the other as a sequence of 2.4 million and growing data files (individually web accessible even!), each containing the data for some roughly 500x500 km region of the earth at a point in time (well, a 5-minute duration in time). Moreover, there may be several different types of web services available for each representation. So we need one-to-many multiplicity in many places. Like Michel, we extend DCAT as needed to accommodate these more complicated situations… though apparently with different sources for the extensions!

Ruth

Arthur Smith

Jan 5, 2015, 4:06:32 PM
to Tim Clark, Mercè Crosas, Melissa Haendel, <idmeta@force11.org>
Hi Tim - sorry, I should have emailed that I was making a few minor edits - mostly grammar issues near the beginning. Overleaf does show you a full history of changes (the little "clock" icon), though sometimes that disappears on me, so it may not be entirely reliable.

Changes were:
* "...target audience consists _of_ publishers ..." ("of" was missing in the instance of this phrasing in the abstract)

* "Data are now to be considered ..." ("now" seemed redundant - I would generally avoid it in a published article as reader and writer always have different "now"s, though I see we have some "Today"s etc as well... Maybe it's ok as the time context is clear)

* Replaced a left double-quote (") with `` due to latex formatting style (otherwise " always turns into a right double-quote), but there are a lot of these - will PeerJ fix this with copy editors?

* "Data transparency and open presentation, while a central notion of _key to the reproducibility central to the_ scientific method" (most substantive change - reproducibility is the notion from the standard understanding of scientific method, not directly data transparency, I would think).

There are still a number of issues I have with the introductory sections but I don't want to change anything major via editing the file. One other concern was the repeated mention of "post-publication peer review" - that might be a red flag to some people, and really isn't a central concern of the paper. Anyway, I won't do any more edits for now, but will send some further notes and suggestions later.

   Arthur

Arthur Smith

Jan 5, 2015, 5:41:26 PM
to Tim Clark, Mercè Crosas, Melissa Haendel, <idmeta@force11.org>
Ok, here are my thoughts on significant rearrangements/possible shortening. Note that Reviewer 2 recommended to "add a little more detail and explanation throughout, and to strenuously avoid acronyms and other terms that outsiders (like me) may find opaque" and Reviewer 3 asked for "additional background and explanation throughout the paper", so we definitely need some of this...

* I think we should get to the JDDCP principles as soon as possible in this document; therefore I propose:
   - 1. Relocate the "Why cite data?" subsection entirely (without the subsection heading) within the introductory "Background" subsection, after the 3rd paragraph (before the one that starts "Reports from leading scholarly ...") - it seems to me it fits there; the resulting Background could be shortened a bit and still serve well. Also can we use "open validation" as mentioned in the Conclusion section, rather than "post-publication peer review"?
   - 2. move the "Purpose of this document" bit to after the "Implementation questions arising from the JDDCP", essentially I would suggest merging that with the "IDMETA group's recommendations are presented in the remainder of this article" bit. If more of a purpose statement is needed earlier it could go in the abstract.

* In the section on "What is machine accessibility?" we should probably say something about W3C standards for Linked Open Data (I think another reason for preferring REST over SOAP). Note that the SOAP citation link is broken in the PDF. I would also completely remove the "Clearly, machine accessibility is also ..." paragraph; I don't see that it adds anything.

* There are a few places where LaTeX is breaking the edges of the page - one of which is the PubMed identifier URL. Do we really need to be that explicit here? I guess it's ok...

* In the discussion on persistence, it seems to me it would be helpful to be a little more explicit about cases that require some diligence on the part of owners of data, aside from "de-accessioning" - that is:
 - data may be renamed or relocated (but with the same owner); persistence commitment means ensuring the persistent id points to the new location
 - data may become the responsibility of a new owner; the persistent id responsibility, landing page, etc. should be taken up by that new owner
 - data may no longer be made available individually but as part of some larger collection, which needs to be handled somehow (what do we recommend here?)...
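
To make the first two cases concrete, here is a toy sketch of what the persistence commitment amounts to in practice. It is an assumption-laden illustration, not a description of how DOI, ARK, or any real resolver works: the class name PidRegistry, its methods, and the identifiers and URLs are all hypothetical. The third case (data folded into a larger collection) is left out because we have not settled on a recommendation.

```python
class PidRegistry:
    """Toy registry mapping a persistent identifier to its current landing page.

    Hypothetical illustration only; real resolvers have their own APIs and rules.
    """

    def __init__(self):
        self._landing_page = {}  # pid -> current landing-page URL
        self._owner = {}         # pid -> party currently responsible for the data

    def register(self, pid, url, owner):
        self._landing_page[pid] = url
        self._owner[pid] = owner

    def relocate(self, pid, new_url):
        # Case 1: same owner, data renamed or moved. The identifier stays the same;
        # only the target it resolves to is updated.
        self._landing_page[pid] = new_url

    def transfer(self, pid, new_owner, new_url):
        # Case 2: a new owner takes over both the data and the duty to keep
        # the identifier resolving to a landing page.
        self._owner[pid] = new_owner
        self._landing_page[pid] = new_url

    def resolve(self, pid):
        return self._landing_page[pid]

    def owner(self, pid):
        return self._owner[pid]


if __name__ == "__main__":
    registry = PidRegistry()
    registry.register("pid:example/123", "http://repo-a.example.org/ds/123", "Repository A")
    registry.relocate("pid:example/123", "http://repo-a.example.org/archive/123")
    registry.transfer("pid:example/123", "Repository B", "http://repo-b.example.org/ds/123")
    print(registry.resolve("pid:example/123"), registry.owner("pid:example/123"))
```

The point is only that citations keep using the same identifier throughout; whoever currently owns the data is responsible for updating where it resolves.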

* Table 1 and Table 2 seem to have distinct definitions of "Resolution URI" (4th column of Table 1) and "Resolution services" (column 2 in Table 2) - this is, at the least, confusing. I'd suggest removing column 2 of Table 2 into Table 1, replacing the "Resolution URI" column. That way Table 2 is essentially about persistence, Table 1 about resolution.

* Reviewer 3 didn't like the word "mandatory" in the Landing pages section. Maybe it's the combination of the word "should" with that term. Perhaps it can all be rephrased as conditions for being a good JDDCP implementation? Would "compliant" or "required" be better than "mandatory"?

* In general the major "Five recommendations for achieving machine accessibility" section is hard to follow - can we number the 5? I count 6 right now: "Unique Identification", "Landing pages", "Content encoding on landing pages", "Best practices for dataset description", "Data access methods", and "Persistence guarantees".

* The "strongly encourage authors to publish preferentially only with journals which implement"... bit in the "Implementation" section seems a bit too strong, especially as it will take time for these things to actually be implemented...

* The second paragraph in Conclusions seems a duplicate of item 2. in "Implementation", I would just remove it.

   I think the editor and some reviewers also wanted acronyms to be defined on first use in the form "Data Citation Implementation Group (DCIG)" (i.e. acronym in parentheses immediately after the spelled-out name); this doesn't seem to have been done yet. And there are still some issues with commas and semicolons...

Anyway, if there's a consensus on any of this I wouldn't mind doing the editing, but obviously we need to discuss it and any other concerns. Thanks!

   Arthur

Tim Clark

Jan 5, 2015, 9:11:20 PM
to Arthur Smith, Mercè Crosas, Melissa Haendel, <idmeta@force11.org>
Hi Arthur,

thanks for your edits, I am (sorry) good with all of them except this one

* "Data transparency and open presentation, while a central notion of _key to the reproducibility central to the_ scientific method" (most substantive change - reproducibility is the notion from the standard understanding of scientific method, not directly data transparency, I would think).

To my understanding, both data (and methods) transparency and reproducibility are central to the scientific method, but transparency, now and in the past, has probably been paramount.

See the attached very fine paper by Shapin.  

Tim

Shapin 1984 - Pump & Circumstance.pdf

Melissa Haendel

Jan 5, 2015, 9:24:36 PM
to Tim Clark, Arthur Smith, Mercè Crosas, <idmeta@force11.org>
Hi all,
here are some edits from me. I didn't quite make it to the end, ran out of time today. But hopefully helpful. 
Cheers,
Melissa

achieving-human-machine-73-js-mh.pdf