Workbench Isl7 extraction - how to narrow scope


gi...@shaw.ca

Mar 7, 2023, 2:54:04 PM
to islandora
Hi folks,
I am trying to extract data out of an Islandora 7 instance.
Specifically, I only want to extract content that is a member of the 'coutlines' collection.
My problem is that the filters are probably set incorrectly: I keep extracting the full Solr index content.
In green: desired data. In red: undesired data.
Can somebody see what I am doing wrong?
Thanks.

Query.png

Donald Moses

Mar 7, 2023, 3:06:39 PM
to islandora
Hello:
Are you using Solr to export the data?
In your Solr query limit your results to the namespace by including
PID:coutlines*
as one of the criteria.
Hope that helps.
Donald
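
The namespace filter described above can be sketched in Python as follows. This is only an illustration: the endpoint is a placeholder, the helper name build_namespace_query is hypothetical, and the field list is a shortened example rather than anything Workbench requires.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption); substitute your own Solr select handler.
SOLR_SELECT_URL = "http://siteurl:8080/solr/select"


def build_namespace_query(namespace, fields, rows=1000000):
    """Return a Solr select URL limited to PIDs in one namespace."""
    params = {
        "q": f"PID:{namespace}*",  # trailing wildcard matches every PID in the namespace
        "wt": "csv",
        "rows": rows,
        "fl": ",".join(fields),
    }
    # urlencode percent-encodes ':' '*' and ','; Solr decodes them on receipt.
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"


url = build_namespace_query("coutlines", ["PID", "RELS_EXT_hasModel_uri_s"])
print(url)
```

Note that this narrows by PID namespace only. An object in the coutlines namespace is not necessarily a member of a coutlines collection, and the reverse also holds.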

Mark Jordan

Mar 7, 2023, 3:15:12 PM
to islandora

Also, the 'collection' configuration setting should be the collection's PID, not its namespace.


Mark




--
Learn more about Islandora in general at islandora.ca and join the community at https://github.com/Islandora/islandora-community/wiki
---
You received this message because you are subscribed to the Google Groups "islandora" group.
To unsubscribe from this group and stop receiving emails from it, send an email to islandora+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/islandora/4d40b57f-55ac-47ee-a7f7-3c6e6be1d3fan%40googlegroups.com.

gi...@shaw.ca

Mar 7, 2023, 4:44:15 PM
to islandora
Thanks folks.

Changing the solr request
from
http://siteurl:8080/solr/select?q=PID:**&wt=csv&rows=1000000&fl=PID,RELS_EXT_hasModel_uri_s,RELS_EXT_isMemberOfCollection_uri_ms,RELS_EXT_isMemberOf_uri_ms,mods_originInfo_encoding_iso8601_dateIssued_mdt,file

to
http://siteurl:8080/solr/select?q=PID:coutlines*&wt=csv&rows=1000000&fl=PID,RELS_EXT_hasModel_uri_s,RELS_EXT_isMemberOfCollection_uri_ms,RELS_EXT_isMemberOf_uri_ms,mods_originInfo_encoding_iso8601_dateIssued_mdt,file

definitely narrowed it down, so that is good.

But then this configuration:
csv_output_path: 'test_output.csv'
obj_directory: '/tmp/objects'
log_file_path: 'testlog.log'
id_field: 'ID'
collection: 'coutlines:nursing'
id_start_number: 1
debug: True

did not narrow it down further. Here is a Solr view of the collection document:
Untitled.png

Note that it has about 60 other sub-collections which I also want to import...

What did I miss?

Thanks

Mark Jordan

Mar 7, 2023, 4:58:44 PM
to islandora

The ':' in the PID can interfere with Solr queries. Can you try:


collection: 'coutlines\:nursing'


and report back?

Re. subcollections, can you replace the "collection" filter with:

ancestor_ms: 'coutlines\:nursing'

Mark
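
For anyone building such filters in a script, the backslash escaping suggested above can be sketched as below. The helper name escape_solr_term is hypothetical, and the info:fedora/ prefix on the collection URI is an assumption about how the RELS_EXT_isMemberOfCollection_uri_ms values are indexed; check a document in your own Solr before relying on it.

```python
import re

# Characters that carry special meaning in the Lucene/Solr query syntax.
_SOLR_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')


def escape_solr_term(term):
    """Backslash-escape query-syntax characters so the term is read literally."""
    return _SOLR_SPECIAL.sub(r"\\\1", term)


print(escape_solr_term("coutlines:nursing"))

# Assumption: membership is stored as a URI of the form info:fedora/<PID>.
collection_filter = (
    "RELS_EXT_isMemberOfCollection_uri_ms:"
    + escape_solr_term("info:fedora/coutlines:nursing")
)
print(collection_filter)
```

In a YAML config, a single-quoted value such as 'coutlines\:nursing' keeps the backslash as a literal character, so it reaches the query unchanged.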



Jared Whiklo

Mar 7, 2023, 5:09:34 PM
to isla...@googlegroups.com
You can also (in some contexts) URL encode the colon as %3A, i.e.
collection: 'coutlines%3Anursing'.
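
A quick way to produce or check the encoded form, shown here as a small sketch using only Python's standard library:

```python
from urllib.parse import quote, unquote

pid = "coutlines:nursing"

# safe="" makes quote() encode every reserved character, including ':' and '/'.
encoded = quote(pid, safe="")
print(encoded)           # coutlines%3Anursing
print(unquote(encoded))  # coutlines:nursing
```

Percent-encoding only protects the colon while the value travels inside a URL. Once the server decodes the request, the query parser sees a plain colon again, so this is not a replacement for backslash escaping inside the query itself.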



--
Jared Whiklo
jwh...@gmail.com


gi...@shaw.ca

Mar 7, 2023, 5:47:14 PM
to islandora
Neither worked. I am still picking up content from outside the specified scope.
Can those two lines be combined?
Thanks

Mark Jordan

Mar 7, 2023, 5:52:59 PM
to islandora

ancestor_ms is not present in every Islandora 7 Solr index. Do you know if yours has it?
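
One way to find out is to ask Solr how many documents have any value in that field. The sketch below builds such a probe URL and reads the count from a JSON response. The function names are hypothetical, the endpoint is a placeholder, and the sample response is canned for illustration rather than taken from a real index.

```python
import json
from urllib.parse import urlencode


def field_probe_url(solr_select_url, field):
    """Build a query that counts documents with any value in `field`."""
    params = {"q": f"{field}:[* TO *]", "rows": 0, "wt": "json"}
    return f"{solr_select_url}?{urlencode(params)}"


def field_is_populated(response_text):
    """Return True if a Solr JSON response reports at least one match."""
    return json.loads(response_text)["response"]["numFound"] > 0


# Placeholder endpoint (assumption); substitute your own Solr select handler.
probe = field_probe_url("http://siteurl:8080/solr/select", "ancestor_ms")
print(probe)

# Canned response for illustration; a real one comes from fetching the probe URL.
sample = '{"response": {"numFound": 0, "start": 0, "docs": []}}'
print(field_is_populated(sample))  # False
```

If the field is not defined in the schema at all, Solr will likely answer with an "undefined field" error instead of a zero count, which also answers the question.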



