OAI Support Question

Amber Leahey

Oct 19, 2016, 11:54:00 AM
to Dataverse Users Community
Hi Dataverse, 

I'm loving the new Dashboard for harvesting; it is really easy to use! We figured out pretty quickly that you have to wait one day for the indexing (?) to occur before the OAI set endpoints work. We are replacing all of our OAI sets since the upgrade and are now testing support for the different OAI actions. The one that is probably most useful for our community is "ListRecords", but unfortunately it doesn't seem to work.

Here is a summary of our testing to date:
All of these work:

Philip Durbin

Oct 19, 2016, 12:04:14 PM
to dataverse...@googlegroups.com
Hi Amber, I'm out of my depth here with regard to OAI-PMH and harvesting. Can you please email sup...@dataverse.org? Once we have an answer we can summarize for the list. (Others, please remind us if we forget!) Obviously, if anyone on the list just knows the answer, please jump in!

--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-community+unsub...@googlegroups.com.
To post to this group, send email to dataverse-community@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/e6aca70a-51f8-4450-9744-a554f8edbf54%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Philip Durbin

Oct 19, 2016, 4:35:53 PM
to dataverse...@googlegroups.com
Thanks for opening https://help.hmdc.harvard.edu/Ticket/Display.html?id=242659 , Amber. Again, once we know more, someone will summarize for this list.

danny...@g.harvard.edu

Oct 21, 2016, 12:05:02 PM
to Dataverse Users Community, philip...@harvard.edu
Amber - thanks for the comment on how easy Harvesting is to use - I've shared this feedback with the team! :)

Kevin wrote a response to your questions earlier today and I wanted to include it here for the community:

"With respect to ListRecords, this is a known limitation of our initial OAI support. It is being addressed in the next release, 4.6, in this ticket: https://github.com/IQSS/dataverse/issues/3307.

Regarding your second question, how to create a set that includes all datasets, you can search on the persistent id namespace used by your instance. We have two namespaces at our site and use this syntax: dsPersistentId:"hdl:1902.1" OR dsPersistentId:"doi:10.7910" . Please be aware that harvesting from a Dataverse 4 instance without specifying a set will also retrieve all available content."

Cheers,

Danny
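The set definition described in Kevin's response can be assembled programmatically when an instance uses several persistent ID namespaces. The following is only a minimal sketch: the `dsPersistentId` field and the two namespaces come from the message above, while the helper name `build_set_query` is hypothetical and not part of Dataverse.

```python
def build_set_query(namespaces):
    """Join persistent-ID namespaces into one set definition query.

    Each namespace becomes a dsPersistentId:"..." term, and the terms
    are combined with OR, matching the syntax quoted in the thread.
    """
    if not namespaces:
        raise ValueError("at least one namespace is required")
    return " OR ".join(f'dsPersistentId:"{ns}"' for ns in namespaces)


# The two namespaces mentioned in the thread.
query = build_set_query(["hdl:1902.1", "doi:10.7910"])
print(query)
```

The resulting string would be pasted into the set definition field when creating the OAI set; an instance with a single namespace would pass a one-element list.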

flavio castro

Nov 13, 2019, 8:44:11 PM
to Dataverse Users Community

Good evening Phil.
We are having problems with harvesting. We are on version 4.5.1 and plan to upgrade soon, but I am not able to get the configuration working.
The command result is:

curl -s -H "X-Dataverse-key: $API_TOKEN" http://localhost:8080/api/harvest/clients


{
  "status": "OK",
  "data": {
    "harvestingClients": [
      {
        "nickName": "dsPersistentId",
        "dataverseAlias": "unifesp",
        "type": "oai",
        "harvestUrl": "https://repositoriodedados.unifesp.br/oai",
        "archiveUrl": "https://repositoriodedados.unifesp.br",
        "archiveDescription": "This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data.",
        "metadataFormat": "oai_ddi",
        "set": "dsPersistentId",
        "schedule": "none",
        "status": "inActive",
        "lastHarvest": "Wed Nov 13 22:26:14 BRT 2019",
        "lastResult": "SUCCESS",
        "lastSuccessful": "Wed Nov 13 22:26:14 BRT 2019",
        "lastNonEmpty": "N/A",
        "lastDatasetsHarvested": "N/A",
        "lastDatasetsDeleted": "N/A",
        "lastDatasetsFailed": "N/A"
      }
    ]
  }
}

Captura de tela 2019-11-13 22.34.10.png
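When comparing several harvesting clients, it can help to reduce the JSON returned by `/api/harvest/clients` to a few fields. The sketch below assumes the response has the shape shown in the message above; the sample is an abbreviated copy of that output, and `summarize_clients` is a hypothetical helper name.

```python
import json

# Abbreviated copy of the response posted in the thread (some fields dropped).
RESPONSE = (
    '{"status":"OK","data":{"harvestingClients":[{"nickName":"dsPersistentId",'
    '"dataverseAlias":"unifesp","type":"oai",'
    '"harvestUrl":"https://repositoriodedados.unifesp.br/oai",'
    '"metadataFormat":"oai_ddi","set":"dsPersistentId","schedule":"none",'
    '"status":"inActive","lastResult":"SUCCESS",'
    '"lastDatasetsHarvested":"N/A","lastDatasetsFailed":"N/A"}]}}'
)

FIELDS = ("nickName", "harvestUrl", "metadataFormat", "set",
          "lastResult", "lastDatasetsHarvested", "lastDatasetsFailed")


def summarize_clients(response_text):
    """Return one small dict per harvesting client in the API response."""
    payload = json.loads(response_text)
    if payload.get("status") != "OK":
        raise RuntimeError(f"unexpected API status: {payload.get('status')}")
    clients = payload["data"]["harvestingClients"]
    # Missing fields are reported as "N/A", mirroring the API's own placeholder.
    return [{field: client.get(field, "N/A") for field in FIELDS}
            for client in clients]


for client in summarize_clients(RESPONSE):
    print(client)
```

In this sample the last run reports `SUCCESS` while the harvested and failed counts are both `N/A`, so the summary alone does not show whether any datasets were actually imported.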

Danny Brooke

Nov 13, 2019, 10:14:12 PM
to Dataverse Users Community
Hi Flavio,

Someone may have an answer here, but I wanted to mention that we've done quite a bit of work in the harvesting code over the last three years since 4.5.1 was released. I'd strongly suggest upgrading to take advantage of the additional stability and new features related to harvesting.

- Danny

flavio castro

Nov 14, 2019, 5:19:53 AM
to dataverse...@googlegroups.com
Thanks Danny for your help. My institution still doesn't have an agreement to acquire DOIs. Is there a way to assign DOIs in the current version, even fake ones?
I really intend to upgrade as soon as possible, but I need to resolve this issue first.
Regards, Flavio Castro.

--
Flavio Castro de Sousa
Técnico em Tecnologia da Informação
Superintendência de Tecnologia da Informação
UNIFESP - Reitoria 
Rua Sena Madureira, 1500 – 4º andar
Voip  8581

Philip Durbin

Nov 14, 2019, 7:07:43 AM
to dataverse...@googlegroups.com
Hi Flavio,

According to the code* there should be a log of harvesting activity next to server.log and the file should start with "harvest_".

Can you find these logs?

I believe the harvesting logs should be somewhere like this:

/usr/local/glassfish4/glassfish/domains/domain1/logs/harvest_

My hope is that more details about the failures will be in the harvesting logs. (Your screenshot showed two failures.)

Thanks,

Phil


p.s. If anyone reading this can confirm that harvesting logs like this exist, please advise. Also, please feel free to create an issue here to document these logs: https://github.com/IQSS/dataverse/issues
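A quick way to check for these files is to list everything in the logs directory whose name starts with "harvest_". This is only a sketch: the directory is the one suggested in the message above and may differ per installation, and `find_harvest_logs` is a hypothetical helper name.

```python
from pathlib import Path

# Location suggested in the thread; adjust for your own installation.
LOGS_DIR = "/usr/local/glassfish4/glassfish/domains/domain1/logs"


def find_harvest_logs(logs_dir):
    """Return files in logs_dir whose names start with 'harvest_', sorted by name."""
    directory = Path(logs_dir)
    if not directory.is_dir():
        return []
    return sorted(p for p in directory.glob("harvest_*") if p.is_file())


if __name__ == "__main__":
    for log_file in find_harvest_logs(LOGS_DIR):
        print(log_file)
```

An empty result means either that the directory does not exist at that path or that no harvest has written a log there yet.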

flavio castro

Nov 15, 2019, 3:48:56 PM
to Dataverse Users Community
Hi Phil, these are my log files.
=================

server  server.log  server.log_2019-11-11T21-16-54
[root@repositoriodedados logs]# ls
harvest_teste_cliente2019-11-15T17-36-09.log  server  server.log  server.log_2019-11-11T21-16-54
[root@repositoriodedados logs]# vim harvest_teste_cliente2019-11-15T17-36-09.log
[root@repositoriodedados logs]# clear
[root@repositoriodedados logs]# ls
harvest_teste_cliente2019-11-15T17-36-09.log  harvest_teste_cliente2019-11-15T17-41-16.log  server  server.log  server.log_2019-11-11T21-16-54
[root@repositoriodedados logs]# vim harvest_teste_cliente2019-11-15T17-41-16.log

</record>
<record>
  <date>2019-11-15T17:41:16</date>
  <millis>1573850476424</millis>
  <sequence>10297</sequence>
  <logger>edu.harvard.iq.dataverse.harvest.client.HarvesterServiceBean.unifesp2019-11-15T17-41-16</logger>
  <level>INFO</level>
  <class>edu.harvard.iq.dataverse.harvest.client.HarvesterServiceBean</class>
  <method>harvestOAI</method>
  <thread>130</thread>
  <message>processing identifier: doi:10.5072/FK2/VQ1UZI</message>
</record>
<record>
  <date>2019-11-15T17:41:16</date>
  <millis>1573850476767</millis>
  <sequence>10300</sequence>
  <logger>edu.harvard.iq.dataverse.harvest.client.HarvesterServiceBean.unifesp2019-11-15T17-41-16</logger>
  <level>SEVERE</level>
  <class>edu.harvard.iq.dataverse.harvest.client.HarvesterServiceBean</class>
  <method>logGetRecordException</method>
  <thread>130</thread>
  <message>Exception processing getRecord(), oaiUrl=https://repositoriodedados.unifesp.br/oai,identifier=doi:10.5072/FK2/VQ1UZI edu.harvard.iq.dataverse.api.imports.ImportException The dataset with the global id doi:10.5072/FK2/VQ1UZI already exists, in the dataverse crbu, skipping.</message>
</record>
<record>
  <date>2019-11-15T17:41:16</date>
  <millis>1573850476767</millis>
  <sequence>10301</sequence>
  <logger>edu.harvard.iq.dataverse.harvest.client.HarvesterServiceBean.unifesp2019-11-15T17-41-16</logger>
  <level>INFO</level>
  <class>edu.harvard.iq.dataverse.harvest.client.HarvesterServiceBean</class>
  <method>harvestOAI</method>
  <thread>130</thread>
  <message>Total content processed in this batch so far: 0</message>
</record>
<record>
  <date>2019-11-15T17:41:16</date>
  <millis>1573850476767</millis>
  <sequence>10302</sequence>

==================
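Rather than paging through the harvest log in an editor, the SEVERE entries can be pulled out with a short script. This is a sketch under stated assumptions: the log consists of `<record>` blocks like the ones above, the sample below is an abbreviated copy of that excerpt (several fields dropped), and `severe_records` is a hypothetical helper. Records are matched one at a time so that a leading or trailing partial record, as in the excerpt, is simply skipped.

```python
import re
import xml.etree.ElementTree as ET

# Abbreviated copy of the excerpt from the thread, including the stray
# closing tag at the top and the truncated record at the bottom.
SAMPLE = """\
</record>
<record>
  <date>2019-11-15T17:41:16</date>
  <level>INFO</level>
  <method>harvestOAI</method>
  <message>processing identifier: doi:10.5072/FK2/VQ1UZI</message>
</record>
<record>
  <date>2019-11-15T17:41:16</date>
  <level>SEVERE</level>
  <method>logGetRecordException</method>
  <message>Exception processing getRecord(), oaiUrl=https://repositoriodedados.unifesp.br/oai,identifier=doi:10.5072/FK2/VQ1UZI edu.harvard.iq.dataverse.api.imports.ImportException The dataset with the global id doi:10.5072/FK2/VQ1UZI already exists, in the dataverse crbu, skipping.</message>
</record>
<record>
  <date>2019-11-15T17:41:16</date>
"""

# Match each complete <record>...</record> block on its own.
RECORD_RE = re.compile(r"<record>.*?</record>", re.DOTALL)


def severe_records(log_text):
    """Return date, method and message for every complete SEVERE record."""
    found = []
    for block in RECORD_RE.findall(log_text):
        record = ET.fromstring(block)
        if record.findtext("level") == "SEVERE":
            found.append({
                "date": record.findtext("date"),
                "method": record.findtext("method"),
                "message": record.findtext("message"),
            })
    return found


for entry in severe_records(SAMPLE):
    print(entry["date"], entry["method"], entry["message"])
```

To run it against a real file, read the file's text and pass it to `severe_records` in place of `SAMPLE`.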

Philip Durbin

Nov 15, 2019, 4:27:30 PM
to dataverse...@googlegroups.com
Thanks! I'm not an expert in troubleshooting harvesting problems (I would suggest emailing sup...@dataverse.org at this point) but I do see "The dataset with the global id doi:10.5072/FK2/VQ1UZI already exists, in the dataverse crbu, skipping." Perhaps that's why you had two failures? I'm not sure.

On a related note, Don Sizemore made a pull request today (thanks!) to offer suggestions for troubleshooting harvesting problems: https://github.com/IQSS/dataverse/pull/6376

One of his primary suggestions is to upgrade to Dataverse 4.6 or higher.

I hope this helps,

Phil


Don Sizemore

Nov 15, 2019, 5:29:42 PM
to dataverse...@googlegroups.com
Good evening,

The 10.5072/FK2 DOI prefix was the "test" authority/shoulder range from the former EZID DOI provider, and I believe it is what a Dataverse instance will generate when set to use the "FAKE" DOI provider included for testing purposes.
If you're harvesting from a Dataverse instance using 10.5072/FK2, you may expect collisions, as no DOI generated in this range is permanent.

Depending on your use case Jonathan Crabtree <jonathan...@unc.edu> with the GDCC may be able to offer you a test account (though you wouldn't want to use a GDCC test account in production, either).
Someone else please correct me if I'm wrong — I usually am!

I hope this helps?
Donald
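To spot identifiers in the test range before harvesting, a simple prefix check is enough. This is a minimal sketch: the `10.5072/FK2` shoulder comes from the message above, `uses_test_shoulder` is a hypothetical helper, and the second identifier in the example is made up purely for illustration.

```python
# Test authority/shoulder named in the thread.
TEST_SHOULDER = "10.5072/FK2"


def uses_test_shoulder(persistent_id):
    """Return True if a persistent ID falls in the 10.5072/FK2 test range."""
    pid = persistent_id.strip()
    # Drop an optional "doi:" scheme prefix before comparing.
    if pid.lower().startswith("doi:"):
        pid = pid[4:]
    # DOI names are case-insensitive, so compare in upper case.
    return pid.upper().startswith(TEST_SHOULDER)


# Identifier taken from the harvest log earlier in the thread.
print(uses_test_shoulder("doi:10.5072/FK2/VQ1UZI"))
# Hypothetical identifier in a non-test namespace, for contrast.
print(uses_test_shoulder("doi:10.7910/EXAMPLE"))
```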

flavio castro

Nov 16, 2019, 11:13:39 AM
to dataverse...@googlegroups.com
Hi Phil,

I upgraded to version 4.6.0.
How can this be tested to confirm that the harvest is running with the correct metadata? Who could help me?
My URL is https://repositoriodedados.unifesp.br/oai
Thanks.



Philip Durbin

Nov 18, 2019, 10:41:30 AM
to dataverse...@googlegroups.com
Hi Flavio,

I'm sorry but I'm not sure what you mean. It looks like your harvesting server is working. I can see content here:


If you want me to try harvesting from your server, I can but you'll have to let me know the name of a set like the ones at https://docs.google.com/spreadsheets/d/12cxymvXCqP_kCsLKXQD32go79HBWZ1vU_tdG4kvP5S8/edit#gid=0

(Once you upgrade to 4.9.3 or higher there should be a default harvesting set: https://github.com/IQSS/dataverse/issues/4659 )

I hope this helps. I saw you in IRC earlier today but missed you. If you want, we can try to talk more there if it's easier: http://chat.dataverse.org

Thanks,

Phil
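One way to check an OAI server by hand is to open standard OAI-PMH requests in a browser or with curl. The sketch below only builds the request URLs and makes no network calls. The verbs and the `metadataPrefix` and `set` parameters are standard OAI-PMH; the base URL, the `oai_ddi` format and the set name `dsPersistentId` are taken from earlier messages in this thread, and `oai_request_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# OAI endpoint posted in the thread.
BASE_URL = "https://repositoriodedados.unifesp.br/oai"


def oai_request_url(base_url, verb, metadata_prefix=None, set_spec=None):
    """Build an OAI-PMH request URL for the given verb and optional arguments."""
    params = {"verb": verb}
    if metadata_prefix:
        params["metadataPrefix"] = metadata_prefix
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Basic health checks: repository description, available sets and formats.
print(oai_request_url(BASE_URL, "Identify"))
print(oai_request_url(BASE_URL, "ListSets"))
print(oai_request_url(BASE_URL, "ListMetadataFormats"))
# Records in one set, in the DDI format used by the harvesting client above.
print(oai_request_url(BASE_URL, "ListRecords", "oai_ddi", "dsPersistentId"))
```

Leaving out `set_spec` requests records without restricting them to a set.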

