Collection Search not working on collections with Transcript datastream


Tony Kurtz

Apr 26, 2019, 6:02:30 PM
to islandora
Greetings,

We're encountering odd behavior in our deployment of Collection Search. It works beautifully for all targeted collections EXCEPT that, for some reason, it yields zero results when searching within collections that have Transcript datastreams (not the Oral History module's transcript DSID, but Transcript DSIDs added to take advantage of the Islandora Transcript module). For a long time we could not figure out why those few collections were coming up empty in Collection Search queries, but we finally determined that the common feature in all of them is the Transcript DSID. Has anyone encountered this, and are there any thoughts on how we might overcome it? Many thanks!

Tony Kurtz
Western Washington University

dp...@metro.org

Apr 29, 2019, 9:09:24 AM
to islandora
Hi Tony,

I have no experience with the TRANSCRIPT DSID, but there are some generic things you should check.
My first guess is that GSearch is failing (silently, though probably visibly in the logs) to index anything that has that datastream, which means there are no Solr documents for those objects and, in consequence, no search results.
You will see some conditionals that deal with datastream indexing based on mimetype and DSID name. In your case, something about your datastream is probably being handled incorrectly there, either the content itself or its access (check the mimetype), and that is making GSearch fail. A single failure in the chain prevents GSearch from sending the document to Solr or, possibly, makes it skip the last part, where the ancestors are indexed. Recently (maybe two years ago) the GSearch config began allowing issues to be skipped silently: if something is missing, it simply continues instead of exploding. So the best way to debug this is to watch your fedoragsearch log live (with tail -f) while indexing an object. Any change to an object triggers a reindex, so changing the label of one of those objects on the Islandora side should give you a better sense of what is happening.
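
A complementary check is to ask Solr directly whether one of the affected objects has a document at all. The following is only a minimal Python sketch under stated assumptions: the Solr base URL and the example PID are placeholders, and it assumes the index stores the object identifier in a field named PID and that JSON responses are enabled. It only builds the query URL and interprets a response body; fetching the URL is left to whatever HTTP client you use.

```python
import json
from urllib.parse import urlencode


def build_pid_query(solr_base, pid):
    """Return a Solr select URL asking whether `pid` has a Solr document.

    Assumption: the index keeps the object identifier in a field named PID.
    """
    # Quote the PID so the colon in "namespace:id" is not read as a field separator.
    escaped = pid.replace("\\", "\\\\").replace('"', '\\"')
    params = {"q": f'PID:"{escaped}"', "rows": 0, "wt": "json"}
    return f"{solr_base.rstrip('/')}/select?{urlencode(params)}"


def is_indexed(response_body):
    """True when a Solr JSON response reports at least one matching document."""
    return json.loads(response_body)["response"]["numFound"] > 0


# Hypothetical base URL and PID, for illustration only.
url = build_pid_query("http://localhost:8080/solr/", "demo:123")
```

If `is_indexed` returns False for an object that carries the TRANSCRIPT datastream but True for its siblings without one, that would point to the document never reaching Solr, which is the failure described above.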

Hope this helps
Diego Pino
Metro.org

Peter MacDonald

Apr 29, 2019, 11:07:06 AM
to isla...@googlegroups.com
I ran into this issue a couple of years ago. Here is what fixed it for me.

When we started using both the Oral Histories module[1] and the TRANSCRIPT module[2], I found that there was a problem indexing the datastreams because both modules use "TRANSCRIPT" as a DSID. [The Oral Histories module's is "application/xml" and the TRANSCRIPT module's is "text/plain". The "text/plain" datastream was not getting indexed, in my case.]

The problem turned out to be that the code of the Oral Histories module looks for and consumes the XML "TRANSCRIPT" DSID before Solr ever gets around to applying the text_to_solr.xslt transform -- so Solr never sees the "text/plain" TRANSCRIPT datastream.

Since the Oral Histories module looks for any DSID labeled "TRANSCRIPT" without regard to mimetype, I hacked the module's code so that it looks only for a TRANSCRIPT DSID whose data is "application/xml". This freed up the "text/plain" TRANSCRIPT DSID to go on to be indexed by text_to_solr.xslt.

After doing this, Solr now stores the "text/plain" TRANSCRIPT content in a "TRANSCRIPT_t" field.
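
The idea of that change, routing on DSID and mimetype together instead of DSID alone, can be sketched as follows. This is not the module's code (which is not available here) but a small Python illustration; the `Datastream` class and the handler names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Datastream:
    dsid: str
    mimetype: str


def pick_handler(ds: Datastream) -> Optional[str]:
    """Choose an indexing route from the DSID *and* the mimetype.

    The returned handler names are hypothetical labels, not real module names.
    """
    # Only an XML TRANSCRIPT is claimed by the oral-history route.
    if ds.dsid == "TRANSCRIPT" and ds.mimetype == "application/xml":
        return "oral_histories_xml"
    # Plain text, including a text/plain TRANSCRIPT, falls through to the generic text route.
    if ds.mimetype == "text/plain":
        return "plain_text"
    return None
```

With a DSID-only test, both kinds of TRANSCRIPT would take the first branch and the plain-text one would never reach the generic text route.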

[Sorry, but I don't have access to the exact code I used because I'm now retired and no longer have access to the server.]


Peter MacDonald

--
For more information about using this group, please read our Listserv Guidelines: http://islandora.ca/content/welcome-islandora-listserv
---
You received this message because you are subscribed to the Google Groups "islandora" group.
To unsubscribe from this group and stop receiving emails from it, send an email to islandora+...@googlegroups.com.
Visit this group at https://groups.google.com/group/islandora.
To view this discussion on the web visit https://groups.google.com/d/msgid/islandora/54f7e702-be8f-460d-9985-34dbe29e2232%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Tony Kurtz

Jul 24, 2019, 4:08:10 PM
to islandora
A very belated but wholehearted THANK YOU to Peter and Diego for their replies to my initial query. We were able to enable indexing of all TRANSCRIPT DSIDs by editing the or_transcript_solr.xslt file so that it specifies MIMETYPE, which eliminates the ambiguity between the different possible TRANSCRIPT datastreams, much as Peter explained. Happy to share more detail with anyone who might be experiencing the same issue.

We are now experiencing another vexing problem with Collection Search: it is not indexing compound object constituents (children), so they are not being discovered in Collection Search. The best I can determine is that, since our child objects do not have a Fedora relationship declaring them members of a collection (isMemberOfCollection), they are not being included in the ancestors_ms field. If I add an isMemberOfCollection relationship to a child object's RELS-EXT, it becomes searchable in Collection Search, but that creates a situation in which the child is both part of its parent compound object AND a separate object within the parent collection.

Am I on track with that? Does anyone have experience with this issue?

Our settings for Compound Objects are that we Hide child objects in RI results, but do not hide them in Solr results. We also are using the Default Legacy SPARQL as the compound member query setting.
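
To make the hypothesis about ancestors_ms concrete, here is a toy Python model of it. It is not how GSearch or the Collection Search module actually computes ancestors; the PIDs are made up, and treating isConstituentOf as the predicate that links a child to its compound parent is an assumption of this sketch.

```python
def ancestors(pid, relations, follow=("isMemberOfCollection",)):
    """Collect every PID reachable from `pid` through the predicates in `follow`."""
    seen = []
    stack = [pid]
    while stack:
        current = stack.pop()
        for predicate, target in relations.get(current, []):
            if predicate in follow and target not in seen:
                seen.append(target)
                stack.append(target)
    return seen


# Made-up objects: a child inside a compound, and the compound inside a collection.
relations = {
    "demo:child": [("isConstituentOf", "demo:compound")],
    "demo:compound": [("isMemberOfCollection", "demo:collection")],
}

collection_only = ancestors("demo:child", relations)
with_compound = ancestors(
    "demo:child", relations, follow=("isMemberOfCollection", "isConstituentOf")
)
```

In this model, following only isMemberOfCollection leaves the child with no ancestors, while also following the compound relationship reaches the collection through the parent without adding a second membership to the child.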

Many thanks, again.
 

Tony Kurtz

Jul 25, 2019, 11:37:21 AM
to islandora
FYI - I've edited the subject line of this post to reference the additional question I've added.

Jared Whiklo

Jul 25, 2019, 4:54:35 PM
to isla...@googlegroups.com
Hey Tony,

First, as it is a DGI module, you might want to reach out to them directly.

We do use that module but not extensively and I modified our fork a
while ago to fit our needs so I am less familiar with the current iteration.

But from my experience with it, the problem is exactly what you have
already determined. The collection search returns items that match the
search term(s) and are part of the collection. Compound children are
part of the compound (which itself might be part of the collection) so
they fail the second test.

The simplest solution is to aggregate your child data into the compound
parent and it will be returned for all searches. But that might not fit
your use case, which is when reaching out to DGI would be in order.
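
As a rough picture of what aggregating child data into the parent could look like at the Solr-document level, here is a small Python sketch. The field names, the `child_` prefix, and the idea of doing this on plain dictionaries are all assumptions for illustration; a real setup would do the equivalent in its indexing transforms.

```python
def aggregate_children(parent_doc, child_docs, fields):
    """Return a copy of `parent_doc` with the children's values for `fields` attached.

    Child values are stored under a hypothetical "child_<field>" key so the
    parent's own fields are left untouched.
    """
    merged = dict(parent_doc)
    for field in fields:
        values = [doc[field] for doc in child_docs if doc.get(field)]
        if values:
            merged[f"child_{field}"] = values
    return merged


# Made-up documents for illustration.
parent = {"PID": "demo:compound", "ancestors_ms": ["demo:collection"]}
children = [
    {"PID": "demo:c1", "TRANSCRIPT_t": "first transcript"},
    {"PID": "demo:c2"},
    {"PID": "demo:c3", "TRANSCRIPT_t": "third transcript"},
]
merged = aggregate_children(parent, children, ["TRANSCRIPT_t"])
```

Because the parent already lists the collection among its ancestors, a search term that matches the aggregated child text would return the compound parent.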

cheers,
jared
--
Jared Whiklo
Pronouns: he/him/his
jwh...@gmail.com
--------------------------------------------------
Quantum mechanics: The dreams stuff is made of.


Tony Kurtz

Jul 30, 2019, 3:05:43 PM
to islandora
Thank you, Jared. We've decided to check in with DGI on this one. 