Item not showing in search/browse


Fitchett, Deborah

Aug 2, 2023, 6:30:18 PM
to dspace-tech

Kia ora,

 

I’ve been doing some data tidying in DSpace 5.8 (xmlui) in preparation for an upcoming migration to 7.4 – mostly directly in the database. A few days later I was alerted to a record https://researcharchive.lincoln.ac.nz/handle/10182/16202 which isn’t showing up either by searching on the title, or in the title/author/keyword browse indexes. The item has Anonymous READ permissions (and anyway the search/browse still doesn’t work when I’m logged in as an Administrator) so I assumed this was because I’d been lazy and neglected to run a re-index.

 

So overnight we ran a job [dspace]/bin/dspace index-discovery -b expecting this would resolve the issue. But we’re still seeing the same problem.

Is there anything else that could be blocking it from being indexed?

Any other jobs we should run?

If I throw my hands up in despair and just go ahead with the migration, will that magically fix it?  (This is not actually my preference for various reasons, but some days a little magic would be nice!)

 

Deborah






DSpace Technical Support

Aug 4, 2023, 1:08:26 PM
to DSpace Technical Support
Hi Deborah,

I'd recommend checking to see if there were any errors in indexing that item (e.g. in DSpace logs or Solr logs).  You could also try to trigger an index of *just that item* (`./dspace index-discovery -i [item-uuid]`) to see if that helps in any way, or perhaps gives a more specific error.

Beyond that, if neither of those helps, that'd imply to me that there must be some sort of permissions issue (or corrupt data? or missing/wrong metadata fields?) on the specific Item in question.  But it'd be hard to say for certain.

Tim

Fitchett, Deborah

Aug 7, 2023, 8:27:40 PM
to DSpace Technical Support

Thanks very much, Tim!

 

I’ve checked that the permissions for the item, bundles and bitstreams are all Anonymous READ. The metadata looks normal too, including the Really Important fields like dc.type.

When I try index-discovery -i [itemid] I get "Unrecognized option: -i"

 

But the DSpace log from when we ran index-discovery -b shows:

2023-08-03 02:29:40,261 ERROR org.dspace.discovery.SolrServiceImpl @ Error while writing item to discovery index: 10182/16202 message:org.apache.tika.exception.TikaException: Failed to parse an email message
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: org.apache.tika.exception.TikaException: Failed to parse an email message
     at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:552)
     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
     at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
     at org.dspace.discovery.SolrServiceImpl.writeDocument(SolrServiceImpl.java:748)
     at org.dspace.discovery.SolrServiceImpl.buildDocument(SolrServiceImpl.java:1429)
     at org.dspace.discovery.SolrServiceImpl.indexContent(SolrServiceImpl.java:230)
     at org.dspace.discovery.SolrServiceImpl.updateIndex(SolrServiceImpl.java:410)
     at org.dspace.discovery.SolrServiceImpl.createIndex(SolrServiceImpl.java:370)
     at org.dspace.discovery.IndexClient.main(IndexClient.java:117)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
     at java.lang.reflect.Method.invoke(Method.java:498)
     at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
     at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)

 

We found a Tika troubleshooting page (“Troubleshooting Tika”, Apache Software Foundation), and it looks like for some reason Tika thinks it’s supposed to be parsing an email message. This was utterly bewildering because the bitstream files are just regular PDFs: they have PDF file extensions, the format is marked as Adobe PDF in DSpace, and they open successfully as PDFs in the browser/Adobe Reader…

But then I looked at the text that had been extracted for the search index and found that in each of the problem cases it begins with something like:

Received: 22 June 2022 | Revised: 16 April 2023 | Accepted: 26 April 2023

 

This refers to when the journal first received the submitted article, but I guess Tika is interpreting the “Received:” as the start of an email header!
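
A rough way to screen other extracted-text files for the same pattern might look like the sketch below. It only checks whether the first non-blank line starts with "Received:", which is a guess based on what we saw here and not Tika's actual detection logic; the function name looks_like_email_header is made up for illustration.

```python
import re

# Assumption: text whose first non-blank line starts with "Received:" is at
# risk of being mistaken for an email. This is a heuristic screen based on
# the case described above, not a reimplementation of Tika's type detection.
HEADER_LIKE_RE = re.compile(r"^Received:", re.IGNORECASE)


def looks_like_email_header(extracted_text):
    """Return True if the first non-blank line starts with 'Received:'."""
    for line in extracted_text.splitlines():
        if line.strip():
            return bool(HEADER_LIKE_RE.match(line.lstrip()))
    return False  # empty or whitespace-only text


# The opening line quoted above, followed by an invented second line.
problem_text = (
    "Received: 22 June 2022 | Revised: 16 April 2023 | Accepted: 26 April 2023\n"
    "Abstract"
)
print(looks_like_email_header(problem_text))
```

Only the first non-blank line is examined, so a "Received:" date line further down the page would not be flagged.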

 

Fortunately, the issue doesn’t arise in our DSpace 7 dev environment, so we’ll just ignore it until we can complete our upgrade.

Deborah

 

 


 

