last call for revision - IDMETA PeerJ article resubmission


Tim Clark

Jan 7, 2015, 10:37:11 PM
to <idmeta@force11.org>, Joan Starr
achieving-human-machine-88.pdf

Ivan Herman

Jan 8, 2015, 1:26:07 AM
to Tim Clark, <idmeta@force11.org>, Joan Starr
Tim,

I have some minor editorial nit-picking below; I just add it here instead of going through version hell again with annotated PDF files flying around. But that is just editorial; otherwise I think we should ship it!

Thanks

Ivan

Section on 'What is machine accessibility?'

- ”software system[s] designed to support interoperable machine-to-machine interaction over a network”Haas and Brown (2004)

space is missing before Haas and Brown

- Fielding and others(Fielding (2000) .

space is missing after 'others', and there is an extra space after (2000)

Section on 'HTTP URIs'

- such as http or ftp or textttmailto

textttmailto should be mailto: in the font used for code

Section on 'Best practices for dataset description'

- Creator Identifier(s) (e.g. ORCiD6, or other unique identifier of the individual creator(s)

brackets are not balanced (or is it acceptable in English styling? I do not know...)

Appendix A

- we recommend web linking (Nottingham (2010).

same comments as above on unbalanced brackets

- <link href=" uri-to-an-alternate" rel="alternate" type="application/xml" title="title">

there is an extra space before uri-to-an-alternate which really shows up on the PDF file:-(


> On 08 Jan 2015, at 04:37 , Tim Clark <tim_...@harvard.edu> wrote:
>
>
> To unsubscribe from this group and stop receiving emails from it, send an email to idmeta+un...@force11.org.
> <achieving-human-machine-88.pdf>
>
> ---------------------------------------------
> Tim Clark, Ph.D.
> Assistant Professor of Neurology, Harvard Medical School
> Director, Biomedical Informatics Core, Massachusetts General Hospital
> co-Director, Data and Statistics Core, Massachusetts Alzheimer Disease Research Center
> website: http://mindinformatics.org
> mobile: +1 617-947-7098 fax: +1 617-213-5418
>


----
Ivan Herman, W3C
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
ORCID ID: http://orcid.org/0000-0003-0782-2704





Joe Hourcle

Jan 8, 2015, 6:05:40 AM
to Ivan Herman, Tim Clark, <idmeta@force11.org>, Joan Starr


I'm taking all of Ivan's comments to be cosmetic, so I went and fixed them all.

Personally, I think the style guide should put the parens around the citation automatically, but maybe that's asking for too much.


... and I also fixed all of the cases where text extended into the right margin, and a citation that had the authors wrapped in {{ ... }}, so it wasn't even trying to parse out last names.

-Joe

Ivan Herman

Jan 8, 2015, 6:18:54 AM
to Joe Hourcle, Tim Clark, <idmeta@force11.org>, Joan Starr

> On 08 Jan 2015, at 12:06 , Joe Hourcle <one...@annoying.org> wrote:
>
>
>
> I'm taking all of Ivan's comments to be cosmetic,

They indeed are! Thanks

Ivan

Arthur Smith

Jan 8, 2015, 10:50:55 AM
to Joe Hourcle, Ivan Herman, Tim Clark, <idmeta@force11.org>, Joan Starr
I've read through in detail the pdf Tim sent around earlier - I hope
that was the latest?

Thanks for all the work on this, it's looking really good!

I have some comments of quite varying significance, I'll try to put the
more important ones first...

1. There is actually some "data" our paper is based on - namely the
discussion on this mailing list and the various background documents
that have been created. We do link back to the
"datacitationimplementation" page in the text, but we don't CITE it (in
the bibliography section). should we? Also more relevant are the "Team 2
discussions" and documents which we don't directly reference at all. How
many of the JDDCP principles does a citation to that page meet? How many
of the implementation recommendations? I think there is some value in
showing we believe in what we are talking about, even if this isn't the
typical sort of research data we think of...

2. In the abstract and on p. 4 (paragraph after JDDCP) and possibly
elsewhere we describe our "main target audience" - that's ok, but the
paper does have significant messages for ordinary publishing researchers
- namely to place their data in conforming repositories and to cite it
there in their articles. I think we want to make sure not to turn off
the casual reader who needs to hear that message by conveying that the
paper doesn't have anything for them. Not sure precisely on how to word
that though. In particular the paragraph on p. 4 seemed to be worded
awkwardly though it does explain the need. Maybe a re-think of that and
promotion of some text to the abstract would help.

3. The word "mandatory" still appears on p. 8 (Items marked
"conditional" are mandatory ...) - presumably should be "recommended" to
match the changes in the following itemized list

4. On p. 6 the reference to Table 2 mentions "resolution services", and
the caption for Table 2 on p. 9 also mentions "resolution", but this was
removed from the table due to duplication with Table 1 and to focus on
persistence.

5. I like the Background and Why cite data sections as structured now, I
think this is a much improved introductory section. However, it seemed
to me that the subtitle "Why cite data?" might be more directly tied to
what our paper is about by changing it to "Why make cited data
accessible?" That is what the section seems to be about ("data is
available from the authors upon request" is sort of a citation of their
data, but it's not making it easily accessible...) - the only other
change there would be to modify the last paragraph "Citing data ..." to
"Making cited data available ...".

6. On p. 6 on Unique identification, second paragraph, is there a
contradiction between "use any ... widely (and currently) used by a
community" vs "Best practice is to choose a scheme that is
cross-discipline"? Can a community be cross-discipline?

7. On p. 16 first paragraph in "Serving landing pages: linking to the
data" there seems to be a circular statement in "the landing page should
reference the data in the landing page" - or am I missing something?

8. p. 4 DCIG #3 "Common Repository Applications Program Interfaces" - at
the least I think there shouldn't be an 's' on Applications. API usually
translates to "application programming interface" I believe - see this
wikipedia page
http://en.wikipedia.org/wiki/Application_programming_interface

9. In the abstract there's still an "emerging" in the first sentence - I
think that was expunged in the text earlier?

10. Why is the 'P' capitalized for Principles in the JDDCP section
(title and first paragraph, but not last paragraph)?

11. p. 2 I believe JDDCP and DCIG abbreviations should be given directly
after their introduction (i.e. "The Joint Declaration of Data Citation
Principles (JDDCP) (Data Citation Synthesis Group (2014)) ...")

12. I don't believe FORCE11 is explained anywhere? Also CODATA? Given
the publication venue, these are probably not familiar to most readers...

13. There are a lot of comma issues... Overuse of commas in most places
(for example p. 2 last paragraph "The recommendations outlined here were
developed as part of a community process[,] by participants representing
a ..." That comma shouldn't be there. Lots of other examples. Also, I
think Oxford commas are generally preferred? So in lists of more than 2
items a comma before the and (for instance first sentence on p. 2, "An
underlying requirement for verification, reproducibility[] and
reusability ..." - [] should have a comma. If there's a consensus on
comma style I wouldn't mind going through to fix these in overleaf...

Arthur

Tim Clark

Jan 8, 2015, 12:06:51 PM
to Arthur Smith, <idmeta@force11.org>
Thanks for your detailed remarks, Arthur. I agree with most of them and have implemented them. Please see below. I rely on you to do the commas.

Begin forwarded message:

Date: January 8, 2015 at 10:50:54 AM EST
From: Arthur Smith <aps...@aps.org>
To: Joe Hourcle <one...@annoying.org>, Ivan Herman <iv...@w3.org>
Subject: Re: [idmeta] last call for revision - IDMETA PeerJ article resubmission

I've read through in detail the pdf Tim sent around earlier - I hope that was the latest?

Thanks for all the work on this, it's looking really good!

I have some comments of quite varying significance, I'll try to put the more important ones first...

1. There is actually some "data" our paper is based on - namely the discussion on this mailing list and the various background documents that have been created. We do link back to the "datacitationimplementation" page in the text, but we don't CITE it (in the bibliography section). should we? Also more relevant are the "Team 2 discussions" and documents which we don't directly reference at all. How many of the JDDCP principles does a citation to that page meet? How many of the implementation recommendations? I think there is some value in showing we believe in what we are talking about, even if this isn't the typical sort of research data we think of...

The way to do this, if we want to do it, would be to deposit all of our documents and discussion text in a compliant or in-the-direction-of-compliant repository, such as Dataverse. Then cite them as data. This is a lot of work that I am not willing to undertake, but if someone feels strongly enough about it to get it done by the end of today, I am okay with citing data in a repository. It would be really counter to what we are saying to just cite a web page.

2. In the abstract and on p. 4 (paragraph after JDDCP) and possibly elsewhere we describe our "main target audience" - that's ok, but the paper does have significant messages for ordinary publishing researchers - namely to place their data in conforming repositories and to cite it there in their articles. I think we want to make sure not to turn off the casual reader who needs to hear that message by conveying that the paper doesn't have anything for them. Not sure precisely on how to word that though. In particular the paragraph on p. 4 seemed to be worded awkwardly though it does explain the need. Maybe a re-think of that and promotion of some text to the abstract would help.

I tweaked the abstract and the page 4 bit.

3. The word "mandatory" still appears on p. 8 (Items marked "conditional" are mandatory ...) - presumably should be "recommended" to match the changes in the following itemized list


Fixed.

4. On p. 6 the reference to Table 2 mentions "resolution services", and the caption for Table 2 on p. 9 also mentions "resolution", but this was removed from the table due to duplication with Table 1 and to focus on persistence.

Fixed.


5. I like the Background and Why cite data sections as structured now, I think this is a much improved introductory section. However, it seemed to me that the subtitle "Why cite data?" might be more directly tied to what our paper is about by changing it to "Why make cited data accessible?" That is what the section seems to be about ("data is available from the authors upon request" is sort of a citation of their data, but it's not making it easily accessible...) - the only other change there would be to modify the last paragraph "Citing data ..." to "Making cited data available ...".

This seems very logical, but I think it presumes the audience already knows about and believes in data citation. To my mind, that needs to be put on the table and explained first. So I would like to keep this phrasing as it is.


6. On p. 6 on Unique identification, second paragraph, is there a contradiction between "use any ... widely (and currently) used by a community" vs "Best practice is to choose a scheme that is cross-discipline"? Can a community be cross-discipline?

I think what is intended is that cross-discipline is a plus given the foregoing. I have changed the text accordingly.


7. On p. 16 first paragraph in "Serving landing pages: linking to the data" there seems to be a circular statement in "the landing page should reference the data in the landing page" - or am I missing something?

Changed text to read "The data being described should not be served via this description URI.  Instead, the landing page data descriptions should reference the data."


8. p. 4 DCIG #3 "Common Repository Applications Program Interfaces" - at the least I think there shouldn't be an 's' on Applications. API usually translates to "application programming interface" I believe - see this wikipedia page http://en.wikipedia.org/wiki/Application_programming_interface

Fixed.

9. In the abstract there's still an "emerging" in the first sentence - I think that was expunged in the text earlier?

Fixed.


10. Why is the 'P' capitalized for Principles in the JDDCP section (title and first paragraph, but not last paragraph)?

I have now capitalized all occurrences of "Principles".


11. p. 2 I believe JDDCP and DCIG abbreviations should be given directly after their introduction (i.e. "The Joint Declaration of Data Citation Principles (JDDCP) (Data Citation Synthesis Group (2014)) ...")

Fixed.


12. I don't believe FORCE11 is explained anywhere? Also CODATA? Given the publication venue, these are probably not familiar to most readers...

I have put the long name of the CODATA-ICSTI Task Group on Data Citation Standards and Practices in the footnote listing the participants. I think that more explanation than that starts to require footnotes to footnotes.


13. There are a lot of comma issues... Overuse of commas in most places (for example p. 2 last paragraph "The recommendations outlined here were developed as part of a community process[,] by participants representing a ..." That comma shouldn't be there. Lots of other examples. Also, I think Oxford commas are generally preferred? So in lists of more than 2 items a comma before the and (for instance first sentence on p. 2, "An underlying requirement for verification, reproducibility[] and reusability ..." - [] should have a comma. If there's a consensus on comma style I wouldn't mind going through to fix these in overleaf...

Please feel free to fix the commas. Can you have it done as soon as possible, and by the end of today at the latest, please?

Arthur Smith

Jan 8, 2015, 12:12:52 PM
to Tim Clark, <idmeta@force11.org>
Ok, I should have some time around 3 pm Eastern to look at this again, so I plan to edit in Overleaf then, if that's ok?

   Arthur

Arthur Smith

Jan 8, 2015, 4:09:41 PM
to Tim Clark, <idmeta@force11.org>
Ok, I'm done with commas...

I also fixed a few of what I thought were typos and did one reordering of a sentence - listed below:

* "This [actually] was one of the first classes of data, ..." - removed "actually" (same word used again later in the same paragraph ...)
* "[However at this] time, sequence data could ..." - changed to "At that time" as it's referring to 1990s.
* "[But today,] the data volumes ..." - changed to "Today the data volumes ..."
* Before the explanatory "The JDDCP is the latest development..." there was a stray word "Two" at the end of the previous paragraph, which I removed.
* "means access [to data and metadata stored in a robust repository], by well-documented Web services ..." - moved the "to data and metadata" bit to later in the sentence
* "Access to the data itself should be indicated through the [appropriate] DCAT fields \texttt{accessURL} or \texttt{downloadURL} as appropriate for the data." - removed first (redundant) "appropriate"
* last section - "... this may [requiring] building systems to ... " - changed to "require"

I think that's all I did aside from adding and removing commas. Hope it looks ok!


   Arthur


Arthur Smith

Jan 8, 2015, 4:17:35 PM
to Tim Clark, <idmeta@force11.org>
One other thought, on point 1 of my earlier comments and Tim's response...


On 1/8/15, 12:06 PM, Tim Clark wrote:
Date: January 8, 2015 at 10:50:54 AM EST
From: Arthur Smith <aps...@aps.org>


1. There is actually some "data" our paper is based on - namely the discussion on this mailing list and the various background documents that have been created. We do link back to the "datacitationimplementation" page in the text, but we don't CITE it (in the bibliography section). should we? Also more relevant are the "Team 2 discussions" and documents which we don't directly reference at all. How many of the JDDCP principles does a citation to that page meet? How many of the implementation recommendations? I think there is some value in showing we believe in what we are talking about, even if this isn't the typical sort of research data we think of...

The way to do this, if we want to do it, would be to deposit all of our documents and discussion text in a compliant or in-the-direction-of-compliant repository, such as Dataverse. Then cite them as data. This is a lot of work that I am not willing to undertake, but if someone feels strongly enough about it to get it done by the end of today, I am okay with citing data in a repository. It would be really counter to what we are saying to just cite a web page.

Should we maybe run this by the whole DCIG or FORCE11 for thoughts on a policy here? If we would like to think about recommending the documents and discussions be considered as data to be archived, it would be good to be consistent among the various working groups.

   Arthur

Joan Starr

Jan 8, 2015, 4:22:37 PM
to Arthur Smith, Tim Clark, <idmeta@force11.org>

Ugh—“running by” the whole group would take days and days. I think maybe we are getting too deep into the weeds here.

What do others think?

--Joan


Tim Clark

Jan 8, 2015, 4:33:55 PM
to Joan Starr, Arthur Smith, <idmeta@force11.org>
I tend toward the "weeds" view of this. With all respect to Arthur for suggesting it, I think it is a bridge too far at this time. It also commits us to implicitly "recommending" a repository by depositing our material there.

Arthur Smith

Jan 8, 2015, 4:36:59 PM
to Tim Clark, Joan Starr, <idmeta@force11.org>
Well, let's not do it today, but maybe keep it in mind...

   Arthur

Ivan Herman

Jan 9, 2015, 12:34:11 AM
to Joan Starr, Arthur Smith, Tim Clark, <idmeta@force11.org>

> On 08 Jan 2015, at 22:22 , Joan Starr <Joan....@ucop.edu> wrote:
>
> Ugh—“running by” the whole group would take days and days. I think maybe we are getting too deep into the weeds here.
> What do others think?

I think we should ship it.

"Perfect is the enemy of good"[1]

ivan

[1] https://en.wikipedia.org/wiki/Perfect_is_the_enemy_of_good



Joe Hourcle

Jan 9, 2015, 5:45:15 AM
to Arthur Smith, <idmeta@force11.org>

On Jan 8, 2015, at 4:15 PM, Arthur Smith wrote:

> One other thought, on point 1 of my earlier comments and Tim's response...
>
> On 1/8/15, 12:06 PM, Tim Clark wrote:
>>
>>> *Date: *January 8, 2015 at 10:50:54 AM EST
>>> *From: *Arthur Smith <aps...@aps.org <mailto:aps...@aps.org>>
>>>
>>>
>>> 1. There is actually some "data" our paper is based on - namely the discussion on this mailing list and the various background documents that have been created. We do link back to the "datacitationimplementation" page in the text, but we don't CITE it (in the bibliography section). should we? Also more relevant are the "Team 2 discussions" and documents which we don't directly reference at all. How many of the JDDCP principles does a citation to that page meet? How many of the implementation recommendations? I think there is some value in showing we believe in what we are talking about, even if this isn't the typical sort of research data we think of...
>>>
>> The way to do this, if we want to do it, would be to deposit all of our documents and discussion text in a compliant or in-the-direction-of-compliant repository, such as Dataverse. Then cite them as data. This is a lot of work that I am not willing to undertake, but if someone feels strongly enough about it to get it done by the end of today, I am okay with citing data in a repository. It would be really counter to what we are saying to just cite a web page.
>
> Should we maybe run this by the whole DCIG or FORCE11 for thoughts on a policy here? If we would like to think about recommending the documents and discussions be considered as data to be archived, it would be good to be consistent among the various working groups.


FORCE11 is broader than just data citation, so definitely a no on that one. If you run it by the DCIG, you should also run it by the DCDG, as publishing is a form of dissemination.

... but if we were going to do that, we should've started the process a whole lot sooner. (at the very least, when we had the first draft submitted to PeerJ)


As for the 'cite the data' ... I could see how you could consider discussions to be data, but I would consider them to fall more into the 'analysis' stage, and not so much 'input'. If our goal was reproducibility (and not transparency), then we should consider all inputs as data, and we have managed to cite a lot of documents that have gone to influence our thinking.

What's difficult to do is cite every document that led to each person's tacit knowledge leading up to the project -- my knowledge of HTTP comes from having managed webservers since 1995. Do I cite my experience managing fark.com and the website for a MUD?

If our goal is 100% transparency ... well, then we should have recorded every telecon, and archived those plus the mailing lists, and any e-mailed side-conversations.
(and in the future, set up a mailbox that's not automatically re-distributed, that people could cc: when having side conversations in e-mail, so it gets captured)

...



So, as some of that is just way beyond where we are now, my recommendations would be:

* Acknowledge people who contributed, so that someone can track down our biographies or CVs to get information on the composition of the group, our backgrounds, and our likely skills.
  * I believe that we've done so

* Cite reports derived from meetings that have significantly influenced our discussions.
  * For me, I would need to cite:
    * BRDI 'For Attribution' (done)
    * The principles (currently a footnote)
    * The DataCite recommendations
    * Duerr et al., "On the utility of identification schemes for digital earth science data", http://dx.doi.org/10.1007/s12145-011-0083-6

(this list could end up getting *huge*, though, as we all have such diverse backgrounds)

* Include as 'data' any documents that were generated as part of our analysis.
  * that would be the various Google Docs that we all contributed to
    * the attempt to crosswalk the different schemas
    * the analysis of different identifiers
      * This is summarized in a table in the paper, but were there ones that we decided specifically *not* to mention?
    * maybe the list that we built up of different potential techniques to use for serving machine-readable landing pages
      * Appendix A doesn't mention the ones we rejected: microformats, XML + XSLT for humans.


I could mitigate some of that last one by adding to the appendix something like:

We recommend HTML for the default target of the URI, to ensure that humans will get a form that is usable to them. Many sites have their identifiers resolve to XML with an XSLT document to transform the information to something human readable. We recommend against the XML+XSLT approach as the primary record due to the moves by authoring tools, CMSes and web browsers to drop support for XSLT.

Footnotes:

https://forums.adobe.com/thread/1236664
http://umbraco.com/follow-us/blog-archive/2011/11/10/saying-goodbye-to-an-old-friend
https://groups.google.com/a/chromium.org/forum/#!topic/blink-dev/zIg2KC7PyH0


-Joe



ps. And on the need for cross-discipline approaches to track citation ... Springer lists Ruth & Bob's identifiers paper as having 3 citations ... but Google Scholar says 23. I don't know if this is just peer-reviewed articles vs. something more liberal, a sign that Springer's mining sucks, or evidence to support altmetrics:

http://link.springer.com/article/10.1007/s12145-011-0083-6
http://scholar.google.com/scholar?oi=bibs&hl=en&cites=11978380528053409863&as_sdt=5






