OAI-PMH endpoint validation issue


Nicholas Woodward

Nov 20, 2019, 12:44:36 PM
to Dataverse Users Community
Hi,
We're trying to validate the OAI-PMH endpoint of our Dataverse (version 4.17) with the Open Archives Initiative validator, and it's failing for an odd reason.

We have two sets: the default one and another set up, per the instructions, with the query "dsPersistentId:doi:/XX.XXXXX". In the Dashboard, both contain the same number of datasets.

The only validation failure occurs when the OAI feed returns a record (https://dataverse.tdl.org/oai?verb=GetRecord&identifier=doi%3A10%2E18738/T8/1Z6NGR&metadataPrefix=oai_dc) whose datestamp is 2019-07-16: that record isn't listed when requesting records for that exact datestamp (https://dataverse.tdl.org/oai?verb=ListRecords&from=2019-07-16T07:00:04Z&until=2019-07-16T07:00:04Z&metadataPrefix=oai_dc).

Yet if the same time-window query also specifies the dsPersistentId-based set (TDR), the record is returned (https://dataverse.tdl.org/oai?verb=ListRecords&set=TDR&from=2019-07-16T07:00:04Z&until=2019-07-16T07:00:04Z&metadataPrefix=oai_dc).

So the same record appears to have two different datestamps depending on which set it is queried through (default or dsPersistentId). Has anyone else run into this? Is there anything else we need to do to fix this one validation failure? We've tried re-running the exports for both sets in the Dashboard, but the datestamp difference persists.
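The failing check can be reproduced outside the validator. Below is a minimal Python sketch, assuming standard OAI-PMH 2.0 response XML: it reads the record's datestamp via GetRecord, then issues ListRecords with `from` and `until` both set to that datestamp, optionally restricted to a set. The function names are hypothetical; only the verbs, parameters, and endpoint come from the URLs above.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Standard OAI-PMH 2.0 namespace; BASE_URL is the endpoint from this thread.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
BASE_URL = "https://dataverse.tdl.org/oai"


def build_url(base, **params):
    """Build a request URL; 'from_' maps to 'from' (a Python keyword). None values are dropped."""
    query = {("from" if k == "from_" else k): v for k, v in params.items() if v is not None}
    return base + "?" + urllib.parse.urlencode(query)


def record_datestamp(getrecord_xml):
    """Return the header datestamp from a GetRecord response."""
    root = ET.fromstring(getrecord_xml)
    return root.findtext("oai:GetRecord/oai:record/oai:header/oai:datestamp", namespaces=OAI_NS)


def listed_identifiers(listrecords_xml):
    """Return record identifiers in a ListRecords response (empty on a noRecordsMatch error)."""
    root = ET.fromstring(listrecords_xml)
    headers = root.iterfind("oai:ListRecords/oai:record/oai:header", namespaces=OAI_NS)
    return [h.findtext("oai:identifier", namespaces=OAI_NS) for h in headers]


def fetch(url):
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def check(identifier, set_spec=None, base=BASE_URL):
    """Mimic the validator: list records with from == until == the record's own datestamp."""
    stamp = record_datestamp(fetch(build_url(
        base, verb="GetRecord", identifier=identifier, metadataPrefix="oai_dc")))
    listing = fetch(build_url(
        base, verb="ListRecords", from_=stamp, until=stamp, set=set_spec, metadataPrefix="oai_dc"))
    return identifier in listed_identifiers(listing)
```

If the behavior described above still holds, `check("doi:10.18738/T8/1Z6NGR")` would return False while `check("doi:10.18738/T8/1Z6NGR", set_spec="TDR")` would return True.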


Thanks,
Nick

Philip Durbin

Nov 21, 2019, 8:41:54 AM
to dataverse...@googlegroups.com
Hi Nick,

What's weird is that the dataset you mentioned has only one version, published in Feb 2019, but, as you say, OAI-PMH reports a datestamp of July 2019:

- July 2019: https://dataverse.tdl.org/oai?verb=GetRecord&identifier=doi:10.18738/T8/1Z6NGR&metadataPrefix=oai_dc

That said, can you please share the details of the validation error? Does it have to do with dates?

I tried to get the details myself at https://www.openarchives.org/Register/ValidateSite, but judging from the attached screenshot, all I did was trigger an email to TDL support. Sorry!

Thanks,

Phil

--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-commu...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/f7a41fa9-a79f-426f-b822-67d81ab841f5%40googlegroups.com.


--
Screen Shot 2019-11-21 at 8.36.23 AM.png

Nicholas Woodward

Nov 21, 2019, 12:37:31 PM
to Dataverse Users Community
Hi Phil,
Thanks for your response. You're right: the validation process sends an email and generates a key that is temporarily valid on their site, so I believe the log URL from your request will stay valid a little while longer: http://www.openarchives.org/Register/ValidateSite?log=WD0QVLS8. In case it has expired, here is the log for the single failure:


Checking GetRecord response

PASS responseDate has correct format: 2019-11-21T14:03:02Z
PASS Datestamp in GetRecord response (2019-07-16T07:00:04Z) has the correct form for seconds granularity.
PASS Datestamp in GetRecord response (2019-07-16T07:00:04Z) matched the seconds granularity specified in the Identify response.
PASS Expected setSpec was returned in the response

Checking ListRecords response

PASS responseDate has correct format: 2019-11-21T14:03:02Z
PASS Response is well formed
FAIL ListRecords response gave a noRecordsMatch error when it should have included at least the record with identifier doi:10.18738/T8/1Z6NGR. The from and until parameters of the request were set to the datestamp of this record (2019-07-16T07:00:04Z). The from and until parameters are inclusive, see protocol spec section 2.7.1. The message included in the error response was: 'No matches for the query'
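The rule the validator cites is that `from` and `until` are both inclusive, so a request whose bounds equal a record's own datestamp must return that record. A minimal Python sketch of that comparison, assuming seconds granularity as declared in the Identify response above (day-granularity requests are deliberately not handled); the function names are hypothetical and this is not Dataverse's implementation:

```python
from datetime import datetime, timezone

# Seconds granularity, matching what the Identify response declares per the log above.
SECONDS_FMT = "%Y-%m-%dT%H:%M:%SZ"


def parse_stamp(stamp):
    """Parse a seconds-granularity OAI-PMH UTC datestamp."""
    return datetime.strptime(stamp, SECONDS_FMT).replace(tzinfo=timezone.utc)


def in_window(datestamp, from_=None, until=None):
    """Selective-harvesting test with inclusive bounds (OAI-PMH 2.0, section 2.7.1)."""
    stamp = parse_stamp(datestamp)
    if from_ is not None and stamp < parse_stamp(from_):
        return False
    if until is not None and stamp > parse_stamp(until):
        return False
    return True


# The validator's probe: from == until == the record's own datestamp.
probe = "2019-07-16T07:00:04Z"
print(in_window(probe, from_=probe, until=probe))  # True
```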


Thanks,
Nick





Philip Durbin

Nov 21, 2019, 4:34:40 PM
to dataverse...@googlegroups.com
Thanks! I grabbed a screenshot (attached) from that link so we can study it later. It's so close to passing!

Unfortunately, I'm not very familiar with Dataverse's harvesting feature or the code behind it. I took a quick look at this class, but I'm not sure what's going on: https://github.com/IQSS/dataverse/blob/v4.18.1/src/main/java/edu/harvard/iq/dataverse/harvest/server/xoai/XlistRecordsHandler.java

At this point, I'd suggest emailing sup...@dataverse.org to create a ticket so that someone with more familiarity with harvesting can take a look. Then we can summarize for the list once we get to the bottom of it.

My only other thought at the moment is that other Dataverse installations could try running https://www.openarchives.org/Register/ValidateSite to see whether they pass or fail.

Sorry I couldn't help more.

Phil

OAI-PMH_Data_Provider_Validation_and_Registration_-_2019-11-21_16.25.58.png

Sherry Lake

Nov 22, 2019, 11:03:10 AM
to dataverse...@googlegroups.com
I tried running the validator (via the ValidateSite URL mentioned above) and immediately got an error about our "adminEmail":

FAIL adminEmail 'Libra Data Support ' looks bogus. This must be corrected to a valid email address before tests can continue
FAIL ABORT: looks like bogus adminEmail, aborting.

Here's what our Dataverse db has:
":SystemEmail":"Libra Data Support <li...@virginia.edu>"

That value works for Dataverse itself, but it evidently breaks the OAI-PMH validator.

FYI: the Dataverse Configuration Guide gives an example that includes display text along with the address, but the validator doesn't seem to accept that text.
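A value like the one above is an RFC 5322 address with a display name, while the validator wants just the address. Here is a minimal Python sketch that strips the display name with the standard library's `email.utils.parseaddr`; the address is hypothetical, and the `looks_bare` regex is a rough heuristic of my own, not the validator's actual rule.

```python
import re
from email.utils import parseaddr

# Rough heuristic only; this is NOT the OAI validator's actual check.
BARE_ADDRESS = re.compile(r"[^@\s<>]+@[^@\s<>]+\.[^@\s<>]+")


def bare_address(system_email):
    """Drop an RFC 5322 display name: 'Name <user@host>' -> 'user@host'."""
    _display_name, address = parseaddr(system_email)
    return address


def looks_bare(value):
    """True if the value is only an address, with no display name or angle brackets."""
    return BARE_ADDRESS.fullmatch(value.strip()) is not None


value = "Libra Data Support <support@example.edu>"  # hypothetical address
print(looks_bare(value))                # False
print(bare_address(value))              # support@example.edu
print(looks_bare(bare_address(value)))  # True
```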

I'll change our :SystemEmail and try again.

--
Sherry

Philip Durbin

Nov 22, 2019, 11:26:12 AM
to dataverse...@googlegroups.com
Yeah, unfortunately, that :SystemEmail problem is a known issue. Please see "adminEmail field invalid in OAI-PMH Identify response" at https://github.com/IQSS/dataverse/issues/3619

Please do report back after you try again!


Nicholas Woodward

Nov 22, 2019, 12:41:43 PM
to Dataverse Users Community
Thanks for taking a look, Phil. I posted the same message to sup...@dataverse.org to create a ticket. We're happy to help troubleshoot the issue in any way.

Nick


Durand, Gustavo

Nov 22, 2019, 4:13:24 PM
to dataverse...@googlegroups.com
Hi Nick,

Thanks for posting this.

A little background: 
When we first added harvesting (back) to Dataverse 4, we saw some validation failures on our own production server and fixed them over time. Whether the issues you're seeing stem from differences in your metadata or from a new bug is something we'll have to investigate, so thanks for opening that RT ticket (we'll likely open a GitHub issue so we can track progress on it).

(At the same time, hopefully we can look at addressing Sherry's issue https://github.com/IQSS/dataverse/issues/3619 as well.)

Do note that while the validator does find spec compliance issues (which we want to fix), in practice most clients should be able to harvest from you successfully.

Thanks,
Gustavo


Nicholas Woodward

Dec 2, 2019, 2:38:40 PM
to Dataverse Users Community
Hi Gustavo,
Thanks for the information. Passing validation would be nice, but as you said, folks can harvest all of the sets and records just fine, so we're not too worried about it for now. I'll watch for the GitHub issue and keep an eye on it.

Thanks,
Nick



Sherry Lake

Dec 2, 2019, 2:56:32 PM
to Dataverse Users Community
I changed our ":SystemEmail" to contain only the email address, removing the display name. However, our OAI-PMH Identify response still shows the original ":SystemEmail".
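For reference, :SystemEmail is normally set through Dataverse's admin settings API (a PUT to `/api/admin/settings/:SystemEmail` with the value as the request body, as in the Installation Guide's curl examples). A minimal Python sketch, assuming the admin API is reachable on `localhost:8080` on the Dataverse host; the address is hypothetical.

```python
import urllib.request

# Assumption: the admin API listens on localhost:8080, as in the Installation Guide examples.
BASE = "http://localhost:8080"


def set_system_email_request(address, base=BASE):
    """Build a PUT to /api/admin/settings/:SystemEmail with the bare address as the body."""
    url = f"{base}/api/admin/settings/:SystemEmail"
    return urllib.request.Request(url, data=address.encode("utf-8"), method="PUT")


req = set_system_email_request("support@example.edu")  # hypothetical address
print(req.get_method(), req.full_url)

# To actually apply the setting (run on the Dataverse host):
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```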

Do I need to disable/enable our Harvesting Server? 

Clients are able to harvest from us. I was just doing this exercise to help with the original problem/question.


--
Sherry



Philip Durbin

Dec 2, 2019, 3:17:50 PM
to dataverse...@googlegroups.com
Hmm, I remember something about there also being a JVM option for system email: https://github.com/IQSS/dataverse/issues/4210#issuecomment-338686521

But I'm not sure whether it's necessary to change that too. And maybe you need to restart Glassfish?
 


Sherry Lake

Dec 2, 2019, 3:51:04 PM
to dataverse...@googlegroups.com
There's no system email set in our JVM options, but restarting Glassfish updated the email shown by the OAI-PMH endpoint.
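Before re-running the validator, it can help to confirm what the Identify response now reports. A minimal Python sketch, assuming standard OAI-PMH 2.0 Identify XML; the function names and the example host are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET
from email.utils import parseaddr

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def admin_emails(identify_xml):
    """Return every adminEmail value in an OAI-PMH Identify response."""
    root = ET.fromstring(identify_xml)
    return [(e.text or "").strip() for e in root.iterfind("oai:Identify/oai:adminEmail", OAI_NS)]


def has_display_name(value):
    """True if the value carries a display name, e.g. 'Name <user@host>'."""
    display_name, address = parseaddr(value)
    return bool(display_name) or address != value


def fetch_identify(base_url):
    """Fetch the Identify response from an OAI-PMH base URL."""
    with urllib.request.urlopen(base_url + "?verb=Identify") as resp:
        return resp.read()


# Usage (hypothetical host):
# for email in admin_emails(fetch_identify("https://dataverse.example.edu/oai")):
#     print(email, "still has a display name" if has_display_name(email) else "ok")
```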

Now to try validation again.
