Tika parsing error. Could not index full text.

alfred mahlangu
Apr 29, 2024, 7:23:02 AM
to DSpace Community
Dear All,

I am upgrading from DSpace-CRIS 5.8 to DSpace-CRIS 7.6.

When I run the command below to re-index:
#/dspace/bin/dspace index-discovery -b

I get the error below. Any suggestions on how I can work around this?

The script has started
(Re)building index from scratch.
java.lang.RuntimeException: Tika parsing error. Could not index full text.
        at org.dspace.discovery.SolrServiceImpl.indexContent(SolrServiceImpl.java:179)
        at org.dspace.discovery.SolrServiceImpl.updateIndex(SolrServiceImpl.java:365)
        at org.dspace.discovery.SolrServiceImpl.updateIndex(SolrServiceImpl.java:352)
        at org.dspace.discovery.SolrServiceImpl.createIndex(SolrServiceImpl.java:320)
        at org.dspace.discovery.IndexClient.internalRun(IndexClient.java:120)
        at org.dspace.scripts.DSpaceRunnable.run(DSpaceRunnable.java:154)
        at org.dspace.app.launcher.ScriptLauncher.executeScript(ScriptLauncher.java:174)
        at org.dspace.app.launcher.ScriptLauncher.handleScript(ScriptLauncher.java:151)
        at org.dspace.app.launcher.ScriptLauncher.handleScript(ScriptLauncher.java:125)
        at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:100)
Caused by: java.io.IOException: Tika parsing error. Could not index full text.
        at org.dspace.discovery.indexobject.IndexFactoryImpl.writeDocument(IndexFactoryImpl.java:135)
        at org.dspace.discovery.indexobject.ItemIndexFactoryImpl.writeDocument(ItemIndexFactoryImpl.java:752)
        at org.dspace.discovery.indexobject.ItemIndexFactoryImpl.writeDocument(ItemIndexFactoryImpl.java:81)
        at org.dspace.discovery.SolrServiceImpl.update(SolrServiceImpl.java:186)
        at org.dspace.discovery.SolrServiceImpl.indexContent(SolrServiceImpl.java:175)
        ... 9 more
Caused by: org.apache.tika.exception.TikaException: exception parsing the csv
        at org.apache.tika.parser.csv.TextAndCSVParser.parse(TextAndCSVParser.java:218)
        at org.dspace.discovery.indexobject.IndexFactoryImpl.writeDocument(IndexFactoryImpl.java:119)
        ... 13 more
Caused by: java.lang.IllegalStateException: IOException reading next record: java.io.IOException: (startline 515) EOF reached before encapsulated token finished
        at org.apache.commons.csv.CSVParser$CSVRecordIterator.getNextRecord(CSVParser.java:149)
        at org.apache.commons.csv.CSVParser$CSVRecordIterator.hasNext(CSVParser.java:159)
        at org.apache.tika.parser.csv.TextAndCSVParser.parse(TextAndCSVParser.java:198)
        ... 14 more
Caused by: java.io.IOException: (startline 515) EOF reached before encapsulated token finished
        at org.apache.commons.csv.Lexer.parseEncapsulatedToken(Lexer.java:371)
        at org.apache.commons.csv.Lexer.nextToken(Lexer.java:285)
        at org.apache.commons.csv.CSVParser.nextRecord(CSVParser.java:701)
        at org.apache.commons.csv.CSVParser$CSVRecordIterator.getNextRecord(CSVParser.java:146)
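
The innermost "Caused by" suggests that Tika handed a CSV-like bitstream to Apache Commons CSV, which reached end-of-file while still inside a double-quoted field that started around line 515. In other words, the file most likely contains an unbalanced `"`. A minimal, standard-library-only sketch for locating such a quote is shown below. It assumes you can download the suspect CSV or text bitstream to a local file, and the class and method names (`FindUnclosedQuote`, `findUnclosedQuoteLine`) are hypothetical, not part of DSpace, Tika or Commons CSV.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: reports the line on which a double-quoted field
// opens but is never closed, approximating RFC 4180 quoting rules.
class FindUnclosedQuote {

    // Returns the 1-based line where an unterminated quoted field starts,
    // or -1 if every quoted field is closed.
    static long findUnclosedQuoteLine(String text, char delimiter) {
        long line = 1;
        boolean atFieldStart = true;
        boolean inQuotes = false;
        long quoteStartLine = -1;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < text.length() && text.charAt(i + 1) == '"') {
                        i++; // "" is an escaped quote inside the field
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else if (c == '\n') {
                    line++; // quoted fields may span lines
                }
                continue;
            }
            if (c == '"' && atFieldStart) {
                inQuotes = true; // a quote only opens a field at its start
                quoteStartLine = line;
                atFieldStart = false;
            } else if (c == delimiter) {
                atFieldStart = true;
            } else if (c == '\n') {
                line++;
                atFieldStart = true;
            } else if (c != '\r') {
                atFieldStart = false; // a mid-field quote is a literal character
            }
        }
        return inQuotes ? quoteStartLine : -1;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java FindUnclosedQuote.java <file> [delimiter]");
            System.exit(2);
        }
        // ISO-8859-1 maps every byte to a char, so decoding never fails, and
        // '"' and '\n' are found correctly in any ASCII-compatible encoding.
        String text = Files.readString(Path.of(args[0]), StandardCharsets.ISO_8859_1);
        char delimiter = args.length > 1 ? args[1].charAt(0) : ',';
        long line = findUnclosedQuoteLine(text, delimiter);
        if (line < 0) {
            System.out.println("No unterminated quoted field found.");
        } else {
            System.out.println("Quoted field opened on line " + line + " is never closed.");
        }
    }
}
```

On Java 11+ it can be run directly, e.g. `java FindUnclosedQuote.java /tmp/suspect.csv` (the path is a placeholder). A reported line near 515 would be consistent with the trace, although Commons CSV may count lines slightly differently. The scanner deliberately treats a quote as opening a field only at the start of a field and `""` as an escaped quote; this approximates Commons CSV's behaviour but does not reimplement it.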

Regards