Release 4.9 and File DOIs


Sherry Lake

Jun 22, 2018, 3:10:54 PM
to Dataverse Users Community
Before I create an issue, I wanted to make sure I understand what File DOIs mean in 4.9.

I see Harvard is at 4.9. So looking at a dataset I have there: https://doi.org/10.7910/DVN/29213

it has 14 files. So I assumed that when the Dataverse version upgraded to 4.9 that each of my files in that dataset would get a DOI...

BUT,

on the File pages, all files have the same DOI as the dataset. https://dataverse.harvard.edu/file.xhtml?fileId=2539984&version=1.2
(has the filename in the citation, but the DOI is the same as the dataset).

Am I missing something about File DOIs, or has Harvard not run a script to create file DOIs for existing files?

Thanks.
Sherry

Philip Durbin

Jun 22, 2018, 3:15:28 PM
to dataverse...@googlegroups.com
You're right. We haven't run the script to mint DOIs for older files yet. Files created and published in the last day or so have DOIs.

The step in the release notes is "Run the retroactive file PID registration script or register all file PID endpoint": https://github.com/IQSS/dataverse/releases/tag/v4.9
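A minimal sketch of the "register all file PID endpoint" step, written in Python for illustration. It assumes the endpoint lives at /api/admin/registerDataFileAll (inferred from the registerDataFileAll method in Admin.java mentioned later in this thread), is reached with a plain GET, and is called from the Dataverse server itself, since admin endpoints are usually restricted to localhost. Check the v4.9 release notes for the exact invocation before relying on it.

```python
import urllib.request


def register_all_file_pids_url(base_url: str) -> str:
    """Build the (assumed) URL of the bulk file-PID registration endpoint."""
    return base_url.rstrip("/") + "/api/admin/registerDataFileAll"


def register_all_file_pids(base_url: str = "http://localhost:8080") -> str:
    """Call the endpoint with a GET and return the raw response body.

    Assumption: admin endpoints are only reachable from localhost, so this
    is meant to be run on the Dataverse server itself.
    """
    with urllib.request.urlopen(register_all_file_pids_url(base_url)) as resp:
        return resp.read().decode("utf-8")
```

Running register_all_file_pids() on the server would trigger the retroactive registration; the URL builder is kept separate so it can be checked without a live installation.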

I hope this helps,

Phil

--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-community+unsub...@googlegroups.com.
To post to this group, send email to dataverse-community@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/1188ab48-7595-48e7-b08d-db6becdc3338%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.




Condon, Kevin M

Jun 22, 2018, 3:17:41 PM
to Dataverse Users Community


Hi Sherry,


We have not yet run a script to retroactively register DOIs for existing files. We plan to do so over the next few days.


Thanks for paying attention!


Regards,


Kevin



Jim Myers

Jun 23, 2018, 9:02:10 AM
to Dataverse Users Community
A couple notes:

In terms of display, having a file DOI doesn't appear to change the Datafile page text that says "This file is part of "Best Practices in Data Collection and Management Workshop". If you use this file, please cite the dataset:" and the citation box that gives the dataset DOI and filename. On QDR's test system, we do see the File Persistent ID show up farther down the page at the top of the metadata tab, but the citation block is the same.

W.r.t. the script - we've run the script and it appears to create DOIs for draft files as well as published ones. It just loops through DataFileServiceBean.findAll()'s results. To match the way new publications work, I think this API call should filter out the datafiles with no publication date. 
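To make the suggested filter concrete, here is a small Python sketch of selecting only published files without a PID from a list of datafiles. The DataFile class and its field names are hypothetical stand-ins for what DataFileServiceBean.findAll() returns, not the real Dataverse entity; the point is only the selection rule (publication date set, no PID yet).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass
class DataFile:
    # Hypothetical stand-in for a datafile row; field names are illustrative.
    file_id: int
    publication_date: Optional[datetime]  # None for draft/unpublished files
    has_pid: bool = False


def files_needing_pid(files: Iterable[DataFile]) -> List[DataFile]:
    """Keep only published files (publication date set) that lack a PID,
    so drafts are skipped the same way they are for new publications."""
    return [f for f in files if f.publication_date is not None and not f.has_pid]
```

With this rule, a draft file is left alone until it is published, at which point the normal publication path would mint its DOI.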

(I also see an issue with older datafile entries in our database that have no storageidentifier (in the dvobject table): they don't get a DOI when the script is run. I haven't tracked down how we got these, whether they are real datafiles that show up somewhere or just dead entries, or whether we missed some update step that should have generated a storageidentifier, but it looks like findAll() doesn't find them.)

-- Jim

Philip Durbin

Jun 23, 2018, 12:06:33 PM
to dataverse...@googlegroups.com
Hi Jim, I'm not quite sure what you're saying about the citation, but I think I see what you mean about "registerDataFileAll", though I haven't tested it myself: https://github.com/IQSS/dataverse/blob/v4.9/src/main/java/edu/harvard/iq/dataverse/api/Admin.java#L1040

Please do go ahead and create multiple GitHub issues if you're so inclined. I'm not sure I follow the storageidentifier issue either. Thanks for the feedback!

Phil



Jim Myers

Jun 23, 2018, 2:04:54 PM
to Dataverse Users Community

Phil,

This is what a 4.9 datafile page looks like (image below) after I've generated a DOI via the script. You can see the file DOI down in the metadata (we used the random INDEPENDENT style with a shoulder), but the citation in the blue box suggests citing the dataset and does not include the datafile DOI. So I think Sherry's just looking at the citation block and not seeing a file DOI. (When I first ran 4.9, I also thought file DOIs weren't being generated, because I expected the file DOI to be in the citation box, but it isn't.) Overall, I don't think this part is a bug per se, and it is probably reasonable to ask people to cite the dataset, but I wonder if the file DOI should also go in the citation text, e.g. "Myers, James. 2018. "testing draft files". Qualitative Data Repository. https://doi.org/10.5072/FK2G6NAI9. QDR Main Collection. DRAFT VERSION; age matrix.pdf (https://doi.org/10.5072/FK2G6NAI9/ZJZMHZ)". If that makes sense, it's probably an easy update, but we may need more discussion if others want a direct file citation. I'll add an issue.
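The proposed format simply appends the filename and file DOI to the existing dataset citation. A minimal Python sketch of that string construction, using Jim's example values; the function name is hypothetical and this is not how Dataverse builds citations internally.

```python
def dataset_citation_with_file_pid(dataset_citation: str, filename: str,
                                   file_doi_url: str) -> str:
    """Append '; <filename> (<file DOI URL>)' to a dataset citation,
    following the example format proposed in this message."""
    return f"{dataset_citation}; {filename} ({file_doi_url})"
```

The dataset DOI stays in the citation and the file DOI is added only as a parenthetical, which is why this variant carries two PIDs.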


You can also see in the image that I've ended up with a DOI for an unpublished/draft datafile. I'll go ahead and add an issue (and probably a quick fix) for that.


-- Jim 





Durand, Gustavo

Jun 23, 2018, 2:41:24 PM
to dataverse...@googlegroups.com
Jim,

The original plan had been to have both PIDs in the citation; however, it was Sebastian from QDR who pointed out:

"About the citation: I do think having two DOIs is unwieldly -- e.g. most reference managers are not going to be able to produce such citations, and they're going to be quite long."

You can see https://github.com/IQSS/dataverse/issues/2438 for the full discussion.

If it makes sense to revisit this, we can, of course. But just note that this was a deliberate decision based on feedback from the community.

Gustavo


On Sat, Jun 23, 2018 at 2:05 PM Jim Myers <qqm...@hotmail.com> wrote:

Phil,

This is what a 4.9 datafile page looks like (image below) after I've generated a DOI via the script. You can see the file DOI down in the metadata (we used the random INDEPENDENT style with a shoulder), but the citation in the blue box suggests citing the dataset and does not include the datafile DOI. So I think Sherry's just looking at the citation block and not seeing a file DOI (I thought file DOIs weren't being generated when I first ran 4.9 - I thought the file DOI would be in the citation box, but it isn't.) Overall, I don't think this part is a bug per se and I think it is probably reasonable to ask people to cite the dataset, but I wonder if the file DOI should also go in the citation text, e.g. "Myers, James. 2018. "testing draft files". Qualitative Data Repository. https://doi.org/10.5072/FK2G6NAI9. QDR Main Collection. DRAFT VERSION; age matrix.pdf (https://doi.org/10.5072/FK2G6NAI9/ZJZMHZ)". If that makes sense, it's probably an easy update but we may need more discussion if others want a direct file citation). I'll add an issue.


You can also see in the image that I've ended up with a DOI for an unpublished/draft datafile. I'll go ahead and add an issue (and probably a quick fix) for that.


-- Jim 




On Saturday, June 23, 2018 at 12:06:33 PM UTC-4, Philip Durbin wrote:
Hi Jim, I'm not quite sure what you're saying about citation but I guess I see what you mean about "registerDataFileAll" but I haven't tested it myself: https://github.com/IQSS/dataverse/blob/v4.9/src/main/java/edu/harvard/iq/dataverse/api/Admin.java#L1040

Please do go ahead and create multiple GitHub issues if you're so inclined. I'm not sure I follow the storageidentifier issue either. Thanks for the feedback!

Phil

On Sat, Jun 23, 2018 at 9:02 AM, Jim Myers <qqm...@hotmail.com> wrote:
A couple notes:

In terms of display, having a file DOI doesn't appear to change the Datafile page text that says "This file is part of "Best Practices in Data Collection and Management Workshop". If you use this file, please cite the dataset:" and the citation box that gives the dataset DOI and filename. On QDR's test system, we do see the File Persistent ID show up farther down the page at the top of the metadata tab, but the citation block is the same.

W.r.t. the script - we've run the script and it appears to create DOIs for draft files as well as published ones. It just loops through DataFileServiceBean.findAll()'s results. To match the way new publications work, I think this API call should filter out the datafiles with no publication date. 

(I also see an issue with older datafile entries in our database where there is no storageidentifier (dvobject table) - they don't get a DOI when the script is run. I haven't tracked down how we got these or whether they are real datafiles  that show up somewhere or just dead entries, or whether we missed some update step that should have generated a storageidentifier or not, but it looks like findAll() doesn't find them.)

-- Jim

--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-commu...@googlegroups.com.
To post to this group, send email to dataverse...@googlegroups.com.

--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-commu...@googlegroups.com.
To post to this group, send email to dataverse...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/04c3787c-7725-4b8a-bf3d-a36c17a96856%40googlegroups.com.

Sebastian Karcher

Jun 23, 2018, 4:35:35 PM
to dataverse...@googlegroups.com
I still agree with what I said on the ticket, but figured we'd use the file-level DOI for file citations. To me that's the point of having them.




--
Sebastian Karcher, PhD
www.sebastiankarcher.com

Jim Myers

Jun 24, 2018, 12:32:50 PM
to Dataverse Users Community
Thanks - I wasn't aware of the history.  Given that, if others find it confusing to know a file DOI exists and to not see it in the top part of the page, perhaps a direct file citation (as Sebastian suggests) and/or a text change around the citation (e.g. "This file can be referenced via https://doi.org/file-doi, but citing the file as part of the dataset is recommended:" or similar) might be better choices. If QDR ends up wanting a change from the current display, it shouldn't be too hard to make this configurable if there isn't consensus on one choice.

-- Jim

Sebastian Karcher

Jun 25, 2018, 12:44:35 PM
to Dataverse Users Community
Having looked at this, I still think that not including the file DOI in the recommended citation for files is a loss. My understanding was that one of the purposes of the file-level PIDs was to fulfill the "granular citation" recommendation of the Data Citation Implementation Pilot (DCIP), and when we talk to users, that's how they want to use them (e.g. by stably linking to individual data files from footnotes).

Given that dataset and file DOIs are (I assume) connected via DataCite's "isPartOf" relation, I don't think it should be a problem to not reference the dataset DOI in the citation, but I could be wrong on that. I could also see two separate citations:
- To cite this specific data file, please use
- To cite the whole dataset, please use

I don't know of many repositories that have file-level DOIs. Here's a Dryad data file landing page:

They display the file-level DOI visibly at the top but only include the dataset one in the recommended citation. Might be worth asking Todd or someone else at Dryad how they're thinking about this. Might also be a good idea to check with the DCIP group on views on this.

Mercè Crosas

Jun 25, 2018, 2:44:56 PM
to dataverse...@googlegroups.com
I know we went back and forth about what to display for the citation of a data file because it's not straightforward: we don't yet have a well-established best practice across repositories, so we are helping define the standard. Looking at it again now that it has been put into practice, and reading Sebastian's comments, I think that the clearest and most comprehensive way to show the citations on the file page is to do what Sebastian proposes:

- To cite this specific data file, please use (citation with file DOI)
- To cite the whole dataset, please use (citation with dataset DOI)

Merce



----------
Mercè Crosas, Ph.D., Chief Data Science and Technology Officer, IQSS, Harvard University


Durand, Gustavo

Jun 25, 2018, 3:18:23 PM
to dataverse...@googlegroups.com
That makes sense to me as well. First (of hopefully few) question:

What is the exact format for the file citation?

When we first looked at this, in issue #2438, we thought it should be a full citation for the file, that is, use the filename as the title, but we were unsure what the "author" should be (since the "author" of the file might not be the author of the dataset). That challenge is what led to the idea of the current citation but with two PIDs, and then, per the above, we dropped one PID.


So what I'm now thinking is that the file citation is the same citation we now show for the file (with the dataset author and title), and just the PID changed.

What are everyone's thoughts on that?
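A rough Python sketch of this proposal combined with the two-citation display: the file citation is the current file-page citation (dataset author and title plus filename) with only the PID swapped for the file DOI, shown next to the plain dataset citation. The CitationParts class, the helper names, and the exact field order are hypothetical, loosely modeled on a Dataverse dataset citation rather than taken from the codebase.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CitationParts:
    # Hypothetical container for the pieces of a citation; names are illustrative.
    authors: str
    year: int
    title: str        # dataset title
    repository: str
    version: str


def _citation(p: CitationParts, doi_url: str) -> str:
    # Assumed field order, loosely modeled on a Dataverse dataset citation.
    return f'{p.authors}, {p.year}, "{p.title}", {doi_url}, {p.repository}, {p.version}'


def file_page_citations(p: CitationParts, filename: str,
                        dataset_doi_url: str, file_doi_url: str) -> List[Tuple[str, str]]:
    """Return (label, citation) pairs for a file page: the file citation
    reuses the dataset citation text with the file DOI, plus the filename."""
    return [
        ("To cite this specific data file, please use",
         f"{_citation(p, file_doi_url)}; {filename}"),
        ("To cite the whole dataset, please use",
         _citation(p, dataset_doi_url)),
    ]
```

The design choice here is that nothing but the PID (and the trailing filename) differs between the two, which sidesteps the question of who the file's "author" is.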



Sonia Barbosa

Jun 25, 2018, 3:22:57 PM
to dataverse...@googlegroups.com
I'm with you on what I was expecting, Gustavo: 

"So what I'm now thinking is that the file citation is the same citation we now show for the file (with the dataset author and title), and just the PID changed."



Sebastian Karcher

Jun 25, 2018, 3:51:02 PM
to dataverse...@googlegroups.com
We just talked about this in a meeting, and the favorite here would follow the model of other citations of a part of a whole (like a book chapter or an article in a journal), i.e. filename first, followed by the dataset, which I understand was the original plan. So, looking at https://demo.dataverse.org/file.xhtml?fileId=29368&version=3.0

something like:
Gautier (non-SU), Julian, 2018, "Screen Shot 2018-05-23 at 11.39.32 AM.png", Testing behavior of "In Review" message, V3, Demo Dataverse, https://doi.org/10.5072/FK2/LCEK7V/<filedoiextension>

I'm not sure this is completely clear, but I think it makes the most sense given how citation styles work in general. For example, you'd cite a chapter in an edited book (in APA style) as
Erichsen, J. T., & Woodhouse, J. M. (2012). Human and Animal Vision. In Machine Vision Handbook (pp. 89–115). Springer, London. https://doi.org/10.1007/978-1-84996-169-1_3 

where the DOI refers to the chapter, not the book (which has a separate DOI: 10.1007/978-1-84996-169-1).
But as Merce says -- there are no clear established rules for this at all, so I'd recommend reaching out a bit, both to the Dataverse community & maybe to some other experts before making a final call -- I think it should be easy to get some feedback quickly.

(and sorry for not seeing this earlier, I missed the last bit of the discussion on the citation format for files or I would have chimed in before).
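A minimal Python sketch of the chapter-style ("part of a whole") format proposed above: the filename takes the place of the chapter title, the dataset title plays the role of the book, and the file DOI comes last. The function name and parameter order are hypothetical; the expected output is Sebastian's example, with the file DOI suffix left as the same placeholder.

```python
def part_of_whole_citation(authors: str, year: int, filename: str,
                           dataset_title: str, version: str,
                           repository: str, file_doi_url: str) -> str:
    """Chapter-style file citation: filename first (like a chapter title),
    then the dataset it belongs to, ending with the file-level DOI."""
    return (f'{authors}, {year}, "{filename}", {dataset_title}, '
            f'{version}, {repository}, {file_doi_url}')
```

Only the file DOI appears here; the link back to the dataset would rely on the dataset title in the text and the DataCite relation between the two DOIs.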


Philipp at UiT

Jun 26, 2018, 1:56:24 AM
to Dataverse Users Community
I agree with Sebastian's proposal; cf. Merce's summary:

- To cite this specific data file, please use (citation with file DOI)
- To cite the whole dataset, please use (citation with dataset DOI)

Also, I think the file reference should be built up similarly to a book chapter reference, i.e. mentioning the dataset the file is a part of. I can't see any need to make data citation unnecessarily different from literature citation.

Best,
Philipp

Sherry Lake

Jun 26, 2018, 12:01:52 PM
to Dataverse Users Community
Great community conversation!!

I agree with Sebastian & Philipp on how I think Datafiles should be cited.

--
Sherry

Eugene Barsky

Jun 27, 2018, 12:13:58 PM
to Dataverse Users Community
I am also in agreement with:


- To cite this specific data file, please use (study citation with file DOI)
- To cite the whole dataset, please use (study citation with dataset DOI)


Eugene

Sonia Barbosa

Jun 27, 2018, 12:20:14 PM
to dataverse...@googlegroups.com
Agreed


Philipp at UiT

Sep 24, 2018, 6:08:43 AM
to Dataverse Users Community
I'm about to mint DOIs for the first datasets in our archive that have handles only. During this work I noticed how new file DOIs from Dataverse are represented in DataCite Fabrica. As you see from the screenshot below, file DOIs are marked with "Dataset" in the same way as dataset DOIs are. I guess this is because the Dataverse default value of the metadata field @type is "dataset"? This field is mapped onto the field ResourceType in the DataCite Metadata Schema 4.0. Should we add "Data file" here, so that the value of ResourceType at file level would be "Data/Data file"?

Best,
Philipp

[Screenshot: FileDOI.png]
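One way to read this suggestion, sketched in Python with the standard library's ElementTree: keep resourceTypeGeneral="Dataset" (a controlled value) for both levels and put "Data file" in the free-text content of resourceType for files only. Whether DataCite Fabrica would then show the distinction this way is an assumption, and the function is hypothetical; the DataCite namespace and surrounding template are omitted.

```python
import xml.etree.ElementTree as ET


def resource_type_element(is_file: bool) -> ET.Element:
    """Build a DataCite <resourceType> element.

    resourceTypeGeneral stays "Dataset" for datasets and files alike; only
    file records get the free-text label "Data file" (the label proposed
    here, not a DataCite-controlled value). Datasets keep an empty element.
    """
    el = ET.Element("resourceType", {"resourceTypeGeneral": "Dataset"})
    if is_file:
        el.text = "Data file"
    return el


# Example: ET.tostring(resource_type_element(True), encoding="unicode")
```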


Philip Durbin

Sep 24, 2018, 9:11:47 AM
to dataverse...@googlegroups.com
I'm pretty sure "Dataset" is coming from <resourceType resourceTypeGeneral="Dataset"/> at https://github.com/IQSS/dataverse/blob/v4.9.2/src/main/resources/edu/harvard/iq/dataverse/datacite_metadata_template.xml#L12 which is referenced from https://github.com/IQSS/dataverse/blob/v4.9.2/src/main/java/edu/harvard/iq/dataverse/DOIDataCiteRegisterService.java#L279 . As you can see, it's hard-coded to "Dataset". You're saying that for files it should be something other than "Dataset", right? "File" or whatever. If so, can you please open a GitHub issue about this? We recently worked on this part of the code at https://github.com/IQSS/dataverse/pull/4795 for https://github.com/IQSS/dataverse/issues/4782 if you'd like to take a look.

Thanks!

Phil


julian...@g.harvard.edu

Sep 24, 2018, 11:34:17 AM
to Dataverse Users Community
It looks like this would be helpful so that in the DataCite Fabrica UI you can see when an object is a dataset and when it's a file. Is that right?

From what I understand, resourceTypeGeneral requires a term from a controlled vocabulary, and Dataset is one of the terms, defined in the appendix as "Data file or files". The others in DataCite 4.0 are:

Audiovisual, Collection, Event, Image, InteractiveResource, Model, PhysicalObject, Service, Software, Sound, Text, Workflow, Other

I don't know if there's any functional drawback to using "Dataset" to label both datasets and files, as we define them. But I think we'd have to ping the folks at DataCite.
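Because resourceTypeGeneral is a controlled vocabulary, a value like "Data file" could not go there directly. A small Python sketch of checking a value against the DataCite 4.0 terms listed above; the constant and function names are hypothetical.

```python
# resourceTypeGeneral terms for DataCite Metadata Schema 4.0, as listed above.
DATACITE_40_RESOURCE_TYPES = frozenset({
    "Audiovisual", "Collection", "Dataset", "Event", "Image",
    "InteractiveResource", "Model", "PhysicalObject", "Service",
    "Software", "Sound", "Text", "Workflow", "Other",
})


def check_resource_type_general(value: str) -> str:
    """Return value unchanged if it is a DataCite 4.0 term, else raise."""
    if value not in DATACITE_40_RESOURCE_TYPES:
        raise ValueError(f"{value!r} is not a DataCite 4.0 resourceTypeGeneral value")
    return value
```

This is why a file-versus-dataset distinction would have to live in the free-text resourceType content, or be raised with DataCite, rather than in resourceTypeGeneral itself.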

Philipp at UiT

Sep 24, 2018, 11:52:04 AM
to Dataverse Users Community
I just created a GitHub issue on this matter; cf. #5086.

Best,
Philipp