Draft datasets being used in scientific publications


Emma Walton

Jul 1, 2025, 6:22:03 AM
to Dataverse Users Community
Hi,

I'm working in the management group of DataverseNO - Norway's national Dataverse installation, operated by UiT. Some of our partner institutions report that some datasets remain in draft status for a long time without ever being submitted for review, and a bit of investigating has even revealed that the (inactive) DOIs of these draft datasets are being used in scientific publications (a bit of a whoopsie on the journal's behalf, but it happens...). The depositors in these cases claim not to have understood that the dataset needed to undergo curation before it could be published.

This is a minority of cases, but has anyone else experienced this issue? I don't think better guidance will solve it, as the guidance is not read in the first place. Perhaps one solution could be that when a depositor creates a draft dataset, a pop-up window appears reminding them that this is a draft dataset and that they need to submit it for review before it can be published?

Thanks in advance for any thoughts/experiences you may have had on the matter!

Emma 

Philip Durbin

Jul 1, 2025, 3:55:08 PM
to dataverse...@googlegroups.com
Hi Emma,

I'm reminded of https://github.com/IQSS/dataverse/issues/7675 from a while ago where we said this:

"We have a few spots in the application where we've been asked to add an additional nudge to get people to publish so their changes can actually take effect. The curation team is finding that some content is unexpectedly left unpublished or not submitted for review as expected. This is part of a larger effort around these workflows, but some text changes are a good first step."

This led to the following message being added, for example, when someone creates a dataset:

"Info – This draft version needs to be published. When ready for sharing, please publish it so that others can see these changes."

However, I'm sure there is more we could do in the Dataverse software itself. At one point we created some mockups (see attached) in which a to-do list would be shown to the dataset author to remind them to perform important steps such as reviewing (and possibly changing) terms of use or file permissions. Last on that to-do list was a reminder to publish (or to submit for review). The to-do list was never implemented, but we still talk about it now and then.

All this is to say we're open to ideas! Please keep the questions and suggestions coming! I'd love to hear how others deal with this situation of datasets being left in draft.

Thanks,

Phil

p.s. You're right. Sadly, guidance is not always read! 😄



Screenshot 2025-07-01 at 3.42.40 PM.png

Julian Gautier

Jul 1, 2025, 5:01:13 PM
to Dataverse Users Community
Wow, those conversations in https://github.com/IQSS/dataverse/issues/7675 and related issues are a blast from the past!

Back then I was tracking the number of drafts created in the "Root" collection of Harvard Dataverse, since those datasets are usually published by folks who are new to Dataverse and maybe even new to publishing datasets in general. And I was tracking how long those datasets had been unpublished, as one way to measure how effective the changes we made were, like the messaging changes Phil pointed out.

Emma, do you think doing this sort of measurement would make sense for DataverseNO? I think it might depend on how many datasets are created where the depositors are expected to curate and publish their datasets, as opposed to repository staff and other experts.

Along with the status messages and the to-do list, the redesign of that stack of buttons on the dataset page included plans to make the Publish or Submit for Review button more prominent for people who were able to submit or publish the dataset, as a kind of call to action.

Looking back now, I think the redesign was eventually scaled back mostly because folks weren't sure it was worth the effort. For example, is it worth the effort to make the buttons different colors based on which type of account is looking at the page? I think the sort of measuring I mentioned, and other kinds of research, can help us see whether the changes we make are working and if and when it's worth it to try other things.

Emma Walton

Jul 2, 2025, 5:05:15 AM
to dataverse...@googlegroups.com
Hi guys,

Thanks for your quick replies here and for sharing the history of the discussion and developments so far to address this issue. I'll take this forward to the management group and we'll discuss how to proceed.

In DataverseNO, all datasets must undergo curation, which, depending on the quality of the dataset, can take some time. Perhaps any eventual new pop-up or other visualisation for our users could also help convey that message, i.e. that the dataset cannot be published instantaneously (important to bear in mind if an open link to the dataset is required for, say, time-sensitive project reporting).

Anyway, thanks again! 

James Myers

Jul 2, 2025, 11:46:41 AM
to dataverse...@googlegroups.com

Emma,

FWIW: The dcpidreport.py script at https://github.com/gdcc/dataverse-recipes/pull/26, which we use at QDR, might be useful for you. It queries DataCite to find failed DOI resolutions, which often occur because someone has posted a draft DOI somewhere. (There is a second script in that draft PR, but it requires some changes to Dataverse to run, so it is not yet an option.)
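As a rough illustration of the failed-resolution idea (this is not Jim's dcpidreport.py, just a minimal stdlib sketch, and the example DOI below is made up): a Dataverse draft dataset has only a "draft" DOI at DataCite, so the doi.org Handle proxy returns 404 for it until the dataset is published.

```python
# Minimal sketch: check whether a DOI is publicly resolvable by asking the
# doi.org Handle proxy REST API. Draft-dataset DOIs are not yet registered
# with the Handle system, so the proxy answers 404 for them.
from urllib import request, error


def classify_status(http_status: int) -> str:
    """Map an HTTP status from the Handle proxy to a human-readable verdict."""
    if http_status == 200:
        return "resolvable"
    if http_status == 404:
        return "not registered (possibly a draft DOI)"
    return f"unexpected status {http_status}"


def check_doi(doi: str) -> str:
    """Look up one DOI, e.g. check_doi('10.18710/ABCDEF') (hypothetical)."""
    url = "https://doi.org/api/handles/" + doi
    try:
        with request.urlopen(url, timeout=10) as resp:
            return classify_status(resp.getcode())
    except error.HTTPError as e:
        return classify_status(e.code)


# Usage (performs a network call, so not run here):
#   verdict = check_doi("10.18710/ABCDEF")  # hypothetical identifier
```

Unlike the DataCite resolution report, this only tells you whether a given DOI resolves; you would still need a list of candidate DOIs to feed it.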

 

-Jim

Philipp Conzett

Jul 3, 2025, 12:47:38 AM
to Dataverse Users Community
Thank you all for your input!

@Jim: Thanks for sharing the script. Is my understanding correct that the script will return a list of all non-resolvable DOIs starting with the prefix used by a given Dataverse installation, e.g. 10.18710 for DataverseNO? So there would be no need to first create a list of DOIs of unpublished datasets in the installation?

Best,
Philipp

James Myers

Jul 3, 2025, 7:21:40 AM
to dataverse...@googlegroups.com

Philipp,

It doesn't require finding unpublished datasets, but, looking again, it may only return the top ten. DataCite also has a page, https://stats.datacite.org/resolutions.html#tab-resolution-report, where, if you choose monthly and then find your account, you can see the top ten failed resolutions as well. I think the script is just picking up those same ten entries. (I guess another advantage of internal checking through Dataverse is that it would be complete.)

 

Also note that there is a request at https://github.com/datacite/datacite-suggestions/discussions/105 for DataCite to provide complete stats instead of just the top ten. It focuses on successful resolutions, but it might be worthwhile to add a note there that seeing all failures would be helpful as well.

 

-- Jim

Dieuwertje Bloemen

Jul 4, 2025, 12:11:35 PM
to Dataverse Users Community
We have similar issues, both from people forgetting to press "Submit for Review" and from people forgetting they even created a draft in the first place. We currently do a clean-up action once a year for all drafts older than 12 months, where we email the owners to ask whether they wish to publish (and how to do so) or whether we can delete the drafts. This is also an attempt to prevent people from using Dataverse as storage without publishing. Generally this results either in deletion or in people publishing the dataset. I think maybe something similar could also make sense in the software itself: an installation could enable reminder emails for unpublished dataset drafts (or all drafts, if preferred) after a chosen period of time. Something an installation can configure (turn on or off, which kinds of drafts, and after what time interval).
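For what it's worth, the selection step of such a clean-up could be sketched like this (a minimal illustration, assuming the draft records have already been exported somehow, e.g. from the installation's database; the record shape and the DOIs are made up):

```python
# Sketch: given draft-dataset records, pick the ones older than a
# configurable interval so reminder emails can be sent to their owners.
from datetime import datetime, timedelta, timezone


def stale_drafts(drafts, older_than_days=365, now=None):
    """Return the drafts whose 'created' timestamp is older than the interval."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=older_than_days)
    return [d for d in drafts if datetime.fromisoformat(d["created"]) < cutoff]


# Hypothetical exported records (ISO 8601 creation timestamps):
drafts = [
    {"doi": "10.18710/AAA111", "created": "2023-05-01T00:00:00+00:00"},
    {"doi": "10.18710/BBB222", "created": "2025-06-15T00:00:00+00:00"},
]
now = datetime(2025, 7, 4, tzinfo=timezone.utc)
old = stale_drafts(drafts, older_than_days=365, now=now)
# With this data, only the 2023 draft is older than 12 months.
```

The interval and the set of drafts considered would be exactly the knobs an installation configures in the reminder-email feature described above.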

Emma Walton

Sep 11, 2025, 8:46:41 AM
to dataverse...@googlegroups.com
Hi Dieuwertje,
I'm just picking up the thread here again after the summer break.
Thanks for your suggestion. I think it's actually much better than my original one, as all current reminders about a dataset being in draft are inside the system (so a depositor would have to be logged in and working with the draft anyway to see them).
I have created a feature request for this here: https://github.com/IQSS/dataverse/issues/11812
Hopefully, if it's implemented, this could reduce the workload on collection managers like yourself with regard to your yearly clean-up action!
