Exclude deaccessioned datasets from harvesting


Philipp at UiT

Sep 28, 2018, 1:11:48 AM
to Dataverse Users Community
We'd like to exclude deaccessioned datasets from harvesting. In the GUI search those can be identified using:
publicationStatus%3A%22Deaccessioned

But what are the search terms to use when defining the OAI set? I have tried NOT dsVersion:deaccessioned, dsVersion:released and similar terms, but got only error messages.
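
For readers following along: the GUI fragment above is percent-encoded, so it helps to see the plain query text it stands for. A minimal sketch using only Python's standard library; the helper name `decode_search_param` is made up for illustration, and the fragment is decoded exactly as quoted above (it carries no closing quote).

```python
from urllib.parse import unquote


def decode_search_param(encoded: str) -> str:
    """Turn a percent-encoded search fragment from a browser URL into plain query text."""
    return unquote(encoded)


# The fragment exactly as quoted in the message above.
gui_fragment = "publicationStatus%3A%22Deaccessioned"
print(decode_search_param(gui_fragment))  # publicationStatus:"Deaccessioned
```

So `%3A` is the colon between field name and value, and `%22` is a double quote opening a quoted value.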



Best,
Philipp

Philip Durbin

Sep 28, 2018, 6:49:32 AM
to dataverse...@googlegroups.com
I would try a "!" to mean "not".

Here's a different example but if I add...

!fileAccess:"Restricted"

... as a facet (filter query) for files, all the non-restricted (public) files show up:


I got this idea from https://stackoverflow.com/questions/10688910/complex-solr-query-including-not-and-or but I'm sure there are other resources for Solr (the search engine Dataverse uses).
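
As a rough sketch of the idea, the negated clause can be assembled as a plain string. The helper names `field_clause` and `negate` are hypothetical, and nothing in this thread confirms that an OAI set definition accepts such a clause; this only shows how the text of the filter is put together.

```python
def field_clause(field: str, value: str) -> str:
    """Build a field:"value" clause, escaping backslashes and quotes inside the value."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def negate(clause: str) -> str:
    """Prefix a clause with "!" so that matching documents are excluded."""
    return f"!{clause}"


print(negate(field_clause("fileAccess", "Restricted")))  # !fileAccess:"Restricted"
```

In Solr's standard query syntax, `NOT` and a leading `-` express the same exclusion as `!`.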

I hope this helps!

Phil



--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-commu...@googlegroups.com.
To post to this group, send email to dataverse...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/3024fa0b-9503-4dcc-94cb-d5c2948fdaad%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
Screen Shot 2018-09-28 at 6.47.37 AM.png

Philipp at UiT

Sep 28, 2018, 7:23:04 AM
to Dataverse Users Community
Thanks, Phil! The "!" sounds like a good idea, but I cannot figure out the name of the database field I have to use; cf. Guide on OAI sets. I have tried !dsVersion:"deaccessioned", !fileMetadata.datasetVersion:"deaccessioned", and a lot of similar combinations. I guess I'm using the wrong database field and values, but I cannot figure out where the information about deaccessioning is stored; cf. this overview using SchemaSpy.

Philip Durbin

Sep 28, 2018, 7:42:15 AM
to dataverse...@googlegroups.com
Ah, you're not searching fields in the database (PostgreSQL), you're searching fields in Solr. So you should look at https://github.com/IQSS/dataverse/blob/v4.9.2/conf/solr/7.3.0/schema.xml#L122
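
One way to see which names are available is to list the `<field>` entries in the schema file. The sketch below parses a small inline sample rather than the real schema.xml: the two field names come from this thread, but their attributes are assumptions for illustration, and `searchable_fields` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical excerpt in the style of a Solr schema; attribute values are assumed.
SAMPLE_SCHEMA = """
<schema name="example" version="1.5">
  <field name="publicationStatus" type="string" stored="true" indexed="true" multiValued="true"/>
  <field name="fileAccess" type="string" stored="true" indexed="true" multiValued="true"/>
</schema>
"""


def searchable_fields(schema_xml: str) -> List[str]:
    """Return the names of all <field> elements marked indexed="true"."""
    root = ET.fromstring(schema_xml)
    return [f.get("name") for f in root.iter("field") if f.get("indexed") == "true"]


print(searchable_fields(SAMPLE_SCHEMA))  # ['publicationStatus', 'fileAccess']
```

To run it against an actual installation, read the contents of the deployed schema.xml and pass them in place of `SAMPLE_SCHEMA`.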


Philipp at UiT

Sep 28, 2018, 10:23:48 AM
to Dataverse Users Community
Hm... I have tried publicationStatus:Deaccessioned and similar queries, but these produce only empty sets, although we have 3 deaccessioned datasets. I also tried publicationStatus:Published and publicationStatus:published, but got no results. What am I missing?
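
One possible way to narrow this down is to send the same filter to Solr directly and compare the hit count with what the OAI set returns. The sketch below only builds the request URL. The host, port, and core name (`localhost:8983`, `collection1`) are assumptions about a local installation, and `solr_select_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed location of the Solr core; adjust to the actual installation.
SOLR_BASE = "http://localhost:8983/solr/collection1"


def solr_select_url(filter_query: str, base: str = SOLR_BASE) -> str:
    """Build a select URL that matches everything, applies one filter query, and returns no rows."""
    params = {"q": "*:*", "fq": filter_query, "rows": 0}
    return f"{base}/select?{urlencode(params)}"


print(solr_select_url('publicationStatus:"Deaccessioned"'))
```

With `rows=0` the response carries only the header and the `numFound` count, which is enough to tell whether the field and value match anything in the index.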



Philip Durbin

Sep 28, 2018, 10:34:35 AM
to dataverse...@googlegroups.com
Hmm, I'm not sure. Can you please create a GitHub issue for this? Maybe deaccessioned datasets shouldn't be harvest-able anyway.


Philipp at UiT

Sep 28, 2018, 11:36:18 AM
to Dataverse Users Community
I have submitted a GitHub issue on this; cf. #5112. Thanks, Phil!

