Problems creating query for harvesting sets

Laura Huisintveld

Jan 13, 2023, 4:26:31 AM
to Dataverse Users Community
Dear all,

I am trying to create harvesting sets for our federated (institutional) Dataverse instance (DataverseNL). Each participating institution has its own dataverse with subdataverses. 
I would like to create an OAI-PMH set for each institution. 

You can use a query such as (parentId:xxx OR parentId:xx), but if you have a top-level dataverse with 150 subdataverses, that is impractical: you would need to look up 150 dataverse IDs. And if someone creates a new subdataverse, your query becomes incomplete.
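For illustration, a query of this form could be generated from a list of IDs rather than typed by hand. This is only a sketch: the function name and the example IDs are hypothetical, and the list of IDs would still have to be collected and kept up to date, which is the limitation described above.

```python
def parent_id_query(collection_ids):
    """Build a '(parentId:a OR parentId:b ...)' set definition query."""
    ids = [int(cid) for cid in collection_ids]  # reject non-numeric IDs early
    if not ids:
        raise ValueError("at least one collection ID is required")
    return "(" + " OR ".join(f"parentId:{cid}" for cid in ids) + ")"


# Hypothetical IDs, for illustration only.
print(parent_id_query([101, 102, 103]))
```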

I have tried (subtree: 'AvansHogeschool), but I am getting mixed results. This query returns 0 results, although the dataverse contains 3 published datasets (https://dataverse.nl/dataverse/AvansHogeschool).

Another example query, (subtree: 'hr'), returns 6 results, while the dataverse contains only 3 published datasets.

Is there anyone with experience in creating OAI-PMH sets for an institutional dataverse? Is this a known issue? 

Thanks in advance,
Laura

Julian Gautier

Jan 13, 2023, 12:57:39 PM
to Dataverse Users Community
Hi Laura,

The guides page for creating OAI Sets says:

"A good way to master the Dataverse Software search query language is to experiment with the Advanced Search page. We also recommend that you consult the Search API section of the API Guide."

And the Search API uses subtree, but I don't think subtree is used for creating queries for OAI Sets. Have you tried subtreePaths, instead? It's similar but the path is needed when it's a dataverse within a dataverse, e.g. subtreePaths:"/AAA/BBB/NNN"

Maybe related, when I try the search API where subtree is AvansHogeschool, https://dataverse.nl/api/search?q=data&subtree=AvansHogeschool, just 2 of the three published datasets in that dataverse are returned.
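A small sketch of how the Search API URL above could be assembled in a script, for example to compare result counts per collection. The function name is hypothetical; `q`, `subtree` and `type` are Search API parameters, and nothing here contacts the server.

```python
from urllib.parse import urlencode


def search_url(base_url, query, subtree=None, item_type=None):
    """Assemble a Dataverse Search API URL without sending a request."""
    params = {"q": query}
    if subtree:
        params["subtree"] = subtree  # collection alias, e.g. "AvansHogeschool"
    if item_type:
        params["type"] = item_type  # e.g. "dataset" to leave out files and collections
    return f"{base_url.rstrip('/')}/api/search?{urlencode(params)}"


print(search_url("https://dataverse.nl", "data", subtree="AvansHogeschool"))
```

Passing `item_type="dataset"` may make the count easier to compare with the number of published datasets shown on the collection page.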

Hope that helps!
Julian

Julian Gautier

Jan 13, 2023, 1:01:39 PM
to Dataverse Users Community
Just realized you wrote that using subtree: 'hr' results in 6 results instead of the three you expected, so maybe subtree should work when creating OAI sets, too?

Philipp Conzett

Jan 14, 2023, 6:03:09 AM
to Dataverse Users Community
Hi Laura,

In DataverseNO, we have done this as described in our Admin Guide. Unfortunately, this part of the guide is only in Norwegian, but below, I have translated the relevant section into English. I hope this helps.

Best, Philipp

## How to define OAI-PMH harvesting sets for DataverseNO sub-collections

When a new collection is created, we also need to create an OAI-PMH harvesting set for the new collection. To create such a set, click on your username at the top right of DataverseNO, and select Dashboard. In the Harvesting Server box, click on Manage Server. Click on Add Set and fill in the fields. For example, for the NTNU collection (https://dataverse.no/dataverse/ntnu), this looks as follows:

Definition Query: subtreePaths:"/5622"
Name: ntnu
Description: Harvesting Set for NTNU Collection

The parentId can be obtained in this way: Log in to dataverse.no. Navigate to the relevant collection, click on Edit, and select Permissions. The parentId is now displayed at the very end of the URL (e.g. "id=5622" for the NTNU collection).
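As a rough sketch, the definition query can be built from the ID found this way. The helper name is hypothetical, and the multi-segment form assumes that a nested collection is addressed by the chain of numeric IDs from the top-level collection down to it, as in the "/AAA/BBB/NNN" pattern mentioned earlier in the thread.

```python
def subtree_paths_query(collection_ids):
    """Build a subtreePaths definition query from one or more collection IDs."""
    ids = [str(int(cid)) for cid in collection_ids]  # IDs must be numeric
    if not ids:
        raise ValueError("at least one collection ID is required")
    return 'subtreePaths:"/' + "/".join(ids) + '"'


# The NTNU collection from the example above.
print(subtree_paths_query([5622]))
```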

The address to be used for harvesting will then be as follows (e.g., for the NTNU collection):
https://dataverse.no/oai?verb=ListRecords&metadataPrefix=oai_dc&set=ntnu
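If many sets are created, the harvesting address for each can be derived from the set name. A minimal sketch, with a hypothetical function name; `verb`, `metadataPrefix` and `set` are the standard OAI-PMH request arguments used in the address above.

```python
from urllib.parse import urlencode


def list_records_url(base_url, set_name, metadata_prefix="oai_dc"):
    """Assemble an OAI-PMH ListRecords URL for a named harvesting set."""
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "set": set_name,
    }
    return f"{base_url.rstrip('/')}/oai?{urlencode(params)}"


print(list_records_url("https://dataverse.no", "ntnu"))
```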

Add this address under Metadatahausting|Metadata Harvesting on the About|About page at info.dataverse.no; cf. https://site.uit.no/dataverseno/about/#metadata-harvesting.

Philip Durbin

Jan 17, 2023, 10:04:06 AM
to dataverse...@googlegroups.com
Thanks for pointing out subtreePaths, Philipp. We attempted to document it in https://github.com/IQSS/dataverse/pull/8197 to close https://github.com/IQSS/dataverse.harvard.edu/issues/124

That is, please look for "subtreePaths" at https://guides.dataverse.org/en/5.12.1/admin/harvestserver.html

These docs can probably be further improved! Please feel free to create a pull request!

Thanks,

Phil

--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-commu...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/a5a4a927-8c23-451a-9678-2a6ad81b81ffn%40googlegroups.com.



Laura Huisintveld

Jan 20, 2023, 8:10:54 AM
to Dataverse Users Community
Thanks very much Philipp! The query with subtreePaths worked! I now get the results I expected.
It would be helpful if Dataverse flagged the use of 'subtree' in the query as invalid, since it does return results, just not the correct ones.
For other queries I tried I sometimes got a warning, but not with 'subtree'.

Laura



Philip Durbin

Jan 20, 2023, 9:14:50 AM
to dataverse...@googlegroups.com
Good idea. Laura, if you'd like to suggest a doc change, you can log in and click the pencil icon here to make a pull request: https://github.com/IQSS/dataverse/blob/develop/doc/sphinx-guides/source/admin/harvestserver.rst

This falls under "quick fix" so this might be helpful: https://guides.dataverse.org/en/5.12.1/developers/documentation.html#quick-fix

o.be...@fz-juelich.de

Jan 23, 2023, 11:31:12 AM
to Dataverse Users Community
Hello there, fellow Dataversians!

Would it make sense to open up a feature request to make all collections available via the set spec?

> a setSpec -- a colon [:] separated list indicating the path from the root of the set hierarchy to the respective node. Each element in the list is a string consisting of any valid URI unreserved characters, which must not contain any colons [:]. Since a setSpec forms a unique identifier for the set within the repository, it must be unique for each set. Flat set organizations have only sets with setSpec that do not contain any colons [:].

That is, we could create a set hierarchy that starts with `collections:` as the root Dataverse collection and descends the tree, separating each level with ":".
Would that be helpful?
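To make the idea concrete, here is a sketch of how such a setSpec might be derived from a collection's path. This is not an existing Dataverse feature; the function name and the `collections` root element are assumptions taken from the proposal above, and the character check follows the URI unreserved characters allowed by the OAI-PMH definition quoted earlier.

```python
import re

# One setSpec element: URI unreserved characters only, so no colons.
_ELEMENT = re.compile(r"[A-Za-z0-9\-_.!~*'()]+")


def build_set_spec(aliases, root="collections"):
    """Join collection aliases into a colon-separated hierarchical setSpec."""
    elements = [root, *aliases]
    for element in elements:
        if not _ELEMENT.fullmatch(element):
            raise ValueError(f"invalid setSpec element: {element!r}")
    return ":".join(elements)


# Hypothetical path: a sub-collection below AvansHogeschool.
print(build_set_spec(["AvansHogeschool", "some-subcollection"]))
```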

Best,
Oliver

Philipp Conzett

Jan 24, 2023, 1:57:45 AM
to Dataverse Users Community

Hi Oliver, all,

That sounds like a good idea to me. Would this also allow you to specify a harvesting set for sub-collection A excluding the datasets within sub-sub-collection B within that sub-collection?

Best,
Philipp

Bertuch, Oliver

Jan 24, 2023, 2:27:16 AM
to dataverse...@googlegroups.com
Sounds reasonable to me!

Expressing a set with such a spec would mean that the results comprise only the datasets of the specified node (collection), not those of its children. Linked datasets could perhaps be included.

If you want to go for a full subtree in a set, create a set as usual using the search function.

Does this cover all your needs?


Philipp Conzett

Jan 24, 2023, 2:52:39 AM
to Dataverse Users Community

Yes, this should cover our current use cases at DataverseNO. Thanks!
Best, Philipp

James Myers

Jan 24, 2023, 11:26:56 AM
to dataverse...@googlegroups.com

Hi all,

As you hopefully know, we’ve been running 2 meetings for the bi-weekly Dataverse Community Calls, with the first happening at 2 AM UTC. The last few 2 AM UTC calls have had 1 or no attendees (aside from me). Most people attend the 3 PM UTC call.

 

I wanted to check in with the community and see if the 2 AM UTC call is still valuable and/or whether we should drop it, have it less frequently, do more to advertise it/send a slack reminder as Phil does for the 3 PM call, etc.

 

Please either reply here or let me know on Slack, etc. what would work best. FWIW: I’m happy to show up (9PM or 10PM my time depending on daylight savings) if/when it’s useful, but would also like to know when I’ll have a free evening.

 

Thanks!

-- Jim

 

o.be...@fz-juelich.de

Jan 30, 2023, 1:46:48 AM
to Dataverse Users Community

Yuyun W

Feb 6, 2023, 8:20:39 PM
to Dataverse Users Community
Hi Jim, 
I still find it valuable and that's the only opportunity to connect with other installations. 
I would like to suggest that we reduce the frequency of Community Call #1, though. Perhaps we can try once a month first, instead of fortnightly? 

Thanks much for hosting this,
yuyun