Recently can not publish dataset with file (with Datacite)


Thanh Thanh Le

Jan 3, 2019, 2:49:48 AM
to Dataverse Users Community
Hello everyone,

Firstly, Happy New Year to everybody! ^^

Has anyone else had problems publishing a dataset that contains a file?
Recently we have been unable to publish a dataset once a file has been added: the operation does not complete, and no particular error appears (in the UI or the logs).
We run Dataverse 4.9.2 on our test environment and 4.9.1 in production, both connected to DataCite, and the problem occurs in both environments.


Thanks in advance,

Best regards,

Thanh Thanh

Philip Durbin

Jan 3, 2019, 7:25:11 AM
to dataverse...@googlegroups.com
Hi Thanh Thanh,

Yes, unfortunately I've seen this on test environments (my laptop and two test servers) and wrote a little bit about it at https://github.com/IQSS/dataverse/issues/5393#issuecomment-447441742

For my laptop and one of my test servers ("phoenix", see https://github.com/IQSS/dataverse/issues/5409 ) I've switched over to a new "fake" persistent ID provider that's only available as of Dataverse 4.10: http://guides.dataverse.org/en/4.10/developers/dev-environment.html#configure-your-development-environment-for-publishing

I'll probably move my other test server over to the fake PID provider as well but it's only a solution for test environments, not for production. I didn't know production installations were affected! Can you please create a GitHub issue about this?

Thanks,

Phil


--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-commu...@googlegroups.com.
To post to this group, send email to dataverse...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/f0663875-67f6-477e-b05b-117e9cbe040f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Thanh Thanh Le

Jan 3, 2019, 9:13:58 AM
to Dataverse Users Community
Hi Phil,

Thanks for your quick answer.
If I understand correctly, the DataCite services for the test prefix have been having problems recently, and the workaround with the FAKE provider is only available from Dataverse 4.10?

I re-checked the problem on production, and this is what I found:
- One new dataset (with files) was published 2 days ago.
- We still cannot publish one dataset containing an 8.8 MB zip file (the issue first appeared on 20 December 2018).

So I cannot say for sure whether the problem exists in production.
Maybe the DataCite services being unavailable on 20-21 December made something go wrong for just this dataset (so that it can no longer be re-published)?

Many thanks!!!

Best regards,
Thanh Thanh


Don Sizemore

Jan 3, 2019, 9:29:19 AM
to dataverse...@googlegroups.com
Hello,

To my understanding, Dataverse handles the identifier reservation, not DataCite, so DataCite being temporarily unavailable shouldn't affect unpublished datasets or files once the service comes back.
If you have access to the database, is the publicationdate field empty for the datasets or files in question?

Donald

Thanh Thanh Le

Jan 3, 2019, 9:59:55 AM
to Dataverse Users Community
Hi Donald,

I've just checked in the database (dvobject table, publicationdate column):
+ for the dataset: the date of first publication (this dataset already has two major versions and one minor version)
+ for the related file: nothing
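
The check described above can be sketched roughly as follows. This is only an illustration: an in-memory SQLite database stands in for the real Dataverse PostgreSQL database, the `dtype` column and its values are assumptions about the `dvobject` schema, the rows are invented sample data, and `unpublished_files` is a hypothetical helper.

```python
import sqlite3

# Stand-in for the Dataverse database: only the dvobject columns discussed
# in this thread are modelled, and the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dvobject (id INTEGER PRIMARY KEY, dtype TEXT, publicationdate TEXT)"
)
conn.executemany(
    "INSERT INTO dvobject (id, dtype, publicationdate) VALUES (?, ?, ?)",
    [
        (10, "Dataset", "2018-11-05 10:00:00"),   # dataset with a first publication date
        (11, "DataFile", "2018-11-05 10:00:00"),  # file published with an earlier version
        (12, "DataFile", None),                   # newly added file, never published
    ],
)


def unpublished_files(connection):
    """Return the ids of DataFile rows whose publicationdate is still NULL."""
    rows = connection.execute(
        "SELECT id FROM dvobject "
        "WHERE dtype = 'DataFile' AND publicationdate IS NULL "
        "ORDER BY id"
    )
    return [row[0] for row in rows]


print(unpublished_files(conn))
```

A file row with an empty publicationdate next to a dataset row that has one matches the situation reported here: the dataset was published earlier, but the version containing the new file was not.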

Thx in advance,

Thanh Thanh

danny...@g.harvard.edu

Jan 3, 2019, 10:02:47 AM
to Dataverse Users Community
Hi everyone,

When we got back from break yesterday, we started investigating some reported issues with publishing on Harvard's Dataverse. Some Datasets publish but others do not. 

We think this is limited to the Harvard installation, but this thread leads me to believe other installations using DataCite may be seeing issues as well. I'll update this as we uncover more information.

- Danny

James Myers

Jan 3, 2019, 2:52:17 PM
to dataverse...@googlegroups.com

All,

An update on this issue based on some community debugging…

 

It appears that DataCite made a change during their maintenance period that exposed a minor bug in Dataverse.

What’s Happening/Symptoms: When checking whether newly minted IDs for files already exist at the provider, Dataverse was accidentally sending a query about “null” rather than the new ID. Previously DataCite had responded (we assume) with a 404/not found response and publication proceeded (although the check would have missed the case where the DOI was already registered at DataCite but was not yet in the Dataverse database).

After their update, they started sending a 200 response with the first page of a list of 16M DOIs. That is interpreted by Dataverse as meaning the new file ID already exists and it enters a loop to keep generating new IDs. That ~infinite loop either fails with a timeout after several minutes with no log message, or causes DataCite to bog down and respond with a 502 gateway timeout – which also causes publication to fail but does show an exception in the log.
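
The failure mode described above can be sketched in a few lines of Python. This is an illustrative simulation only, not Dataverse's actual code: the function names, the sample DOIs, the two fake providers, and the `max_attempts` cap (added so the sketch terminates instead of looping indefinitely) are all hypothetical.

```python
import itertools


def exists_at_provider(identifier, status_for):
    """Mimic the check: any 200 response is read as 'already registered'."""
    return status_for(identifier) == 200


def mint_file_id(generate_id, status_for, send_null_bug, max_attempts=5):
    """Try to mint an unused file ID, giving up after max_attempts tries."""
    for attempt in range(1, max_attempts + 1):
        candidate = generate_id()
        # The bug: the lookup asks about "null" instead of the candidate ID.
        queried = "null" if send_null_bug else candidate
        if not exists_at_provider(queried, status_for):
            return candidate, attempt
    return None, max_attempts


def old_provider(identifier):
    """Before the change: nothing is registered, so every lookup is a 404."""
    return 404


def new_provider(identifier):
    """After the change: a query about "null" returns 200 with a DOI listing."""
    return 200 if identifier == "null" else 404


def make_generator():
    """Produce made-up sample DOIs: FILE1, FILE2, ..."""
    counter = itertools.count(1)
    return lambda: f"10.5072/FK2/FILE{next(counter)}"


print(mint_file_id(make_generator(), old_provider, send_null_bug=True))
print(mint_file_id(make_generator(), new_provider, send_null_bug=True))
print(mint_file_id(make_generator(), new_provider, send_null_bug=False))
```

With the old provider behaviour the bug stays hidden because the first candidate is accepted. With the new behaviour every candidate looks taken, so the buggy path never returns an ID, while sending the real ID succeeds on the first attempt.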

Who/what’s affected: 

Assuming this is the only issue, it only affects installations configured to use file IDs and it only affects the publication of versions including new files (v1.0 or later versions where a file has been added), which probably helps explain why it appears intermittent.

Fix/work-around:

In https://github.com/IQSS/dataverse/issues/5427 and the associated https://github.com/IQSS/dataverse/pull/5428, there’s a fix that Dataverse should be able to roll into a 4.10.1 release – timing TBD @ IQSS. The fix sends the real DOI to DataCite (or other provider), fixing the bug, and allowing publication to succeed.

In the meantime, and for 4.9.x versions, the only work-around so far is to disable file DOIs until one updates to a 4.10.x version with this fix. (The same /api/admin/registerDataFileAll call that was used for the initial conversion to file DOIs should pick up any files published while file DOIs were turned off.)
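
As a rough sketch of that work-around, the two admin API calls could be prepared like this in Python. Several things here are assumptions rather than facts from this message: the base URL (the admin API is assumed to be reachable on localhost), the use of the `:FilePIDsEnabled` database setting to turn file DOIs off (it requires a release that has the setting), the HTTP method for `registerDataFileAll`, and the helper function names. The requests are only built, not sent.

```python
from urllib.request import Request

# Assumption: the admin API is called from the Dataverse server itself.
BASE_URL = "http://localhost:8080"


def disable_file_pids_request(base_url=BASE_URL):
    """Build (but do not send) a request setting :FilePIDsEnabled to false."""
    return Request(
        f"{base_url}/api/admin/settings/:FilePIDsEnabled",
        data=b"false",
        method="PUT",
    )


def register_all_files_request(base_url=BASE_URL):
    """Build (but do not send) the later call that registers file DOIs."""
    return Request(f"{base_url}/api/admin/registerDataFileAll", method="GET")


for req in (disable_file_pids_request(), register_all_files_request()):
    print(req.get_method(), req.full_url)
```

To actually issue a request, pass it to `urllib.request.urlopen` from a host that is allowed to reach the admin API. The second call would only be made after upgrading to a version with the fix and turning file DOIs back on.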

(I plan to report this to DataCite, but it is only partly their issue, so I don’t expect anything quick from their end that would fix things for 4.9.x installations…)

Other issues:

If anyone is seeing a publishing problem that doesn’t fit this picture, please create a new issue.

-- Jim

Durand, Gustavo

Jan 3, 2019, 2:54:26 PM
to dataverse...@googlegroups.com
Hi All,

Update on this:

We believe we have discovered the issue (Thanks Jim and Leonid!) and have a pull request in progress. Once this PR has been reviewed, tested, and merged, we will cut a 4.10.1 release.

Short version:
DataCite made a change during their maintenance period that exposed a minor bug in Dataverse and how we handled File PIDs.
Longer version:

While you're waiting for 4.10.1, we suggest you turn off File PIDs. Once the release is made, the same /api/admin/registerDataFileAll call that was used for the initial conversion to file DOIs should pick up any files published while File PIDs were turned off.

Thanks,
Gustavo


Thanh Thanh Le

Jan 4, 2019, 3:37:38 AM
to Dataverse Users Community
Hi all,

Thanks for all the updates and useful information!

However, we're using Dataverse 4.9.1 in production and, if I am correct, the option to turn off file registration is not present there (we can add the option with the value false, but it is not taken into account).
What can we do about this bug in that case?

Thanks in advance,
Thanh Thanh

Don Sizemore

Jan 4, 2019, 8:46:36 AM
to dataverse...@googlegroups.com
Hello,

If you can step-upgrade to 4.9.3 you should gain the :FilePIDsEnabled setting. 4.9.3 requires an in-place re-index but otherwise the 4.9 point releases are fairly painless upgrades.

Donald

James Myers

Jan 9, 2019, 7:46:50 AM
to dataverse...@googlegroups.com

For those not watching the IQSS/Dataverse list:

 

DataCite has fixed the issue on their end. I’ve verified that this allows QDR’s v4.9.4 installation to publish again. With that, the only effect of the bug in Dataverse (fixed with v4.10.1) is that Dataverse would not catch the situation where the DOI being created for a file during publication duplicates one already at DataCite but not created by Dataverse. This can only occur if you have multiple Dataverse instances using the same authority/shoulder or have some other system creating DOIs with that authority/shoulder.  (Dataverse will catch the case where the DOI matches one already created in that Dataverse and will generate a different one).

 

-- Jim

 


Philip Durbin

Jan 9, 2019, 9:15:23 AM
to dataverse...@googlegroups.com
Hi Thanh Thanh,

I just wanted to make sure you saw Jim's post on this thread which explains that three hours ago DataCite fixed a bug that might help you publish even without upgrading Dataverse. So please try again. :)


Here's the announcement of the DataCite fix from three hours ago (thanks, Martin!): https://github.com/IQSS/dataverse/issues/5427#issuecomment-452664052

This is not to discourage you from upgrading, of course. Don replied on this thread saying that the 4.9.x upgrades are "fairly painless" :) Please see https://groups.google.com/d/msg/dataverse-community/WJ6sTgKfNI4/3XowxxiTDQAJ

I hope this helps,

Phil

Thanh Thanh Le

Jan 9, 2019, 10:26:12 AM
to dataverse...@googlegroups.com
Hi all,

Thanks for all the updates.
I was looking for an alternative solution because we cannot upgrade to 4.10.1 immediately, nor to 4.9.4 because of its bug with the French locale.

This is really great news for us.
Again, thanks for all the help.

Best regards,
Thanh Thanh
