Tika exception while indexing rich documents (Rails 3)

Karan Nanda

Sep 27, 2012, 7:39:42 AM
to bangal...@googlegroups.com

Hi All,

I am implementing full-text search over rich documents using sunspot_cell, and I am using paperclip for attachments.

I have done all the required configuration and put all the *.jar files in the solr/lib directory, but it is still not able to index the document. I am getting the following Tika exception:

RSolr::Error::Http (RSolr::Error::Http - 500 Internal Server Error
Error: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.PDFParser@17fc44f
org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.PDFParser@17fc44f
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)

My Gemfile looks like:

gem 'sunspot', :git => "git://github.com/sunspot/sunspot.git"
gem 'sunspot_rails', :git => "git://github.com/sunspot/sunspot.git", :require => "sunspot_rails"
gem 'sunspot_test'
gem 'sunspot_cell', :git => 'git://github.com/zheileman/sunspot_cell.git'

group :development, :test do
  gem 'sunspot_cell_jars', :git => 'https://github.com/mrcsparker/sunspot_cell_jars.git'
  gem 'sunspot_solr', :git => "git://github.com/sunspot/sunspot.git", :require => "sunspot_solr"
  gem 'progress_bar'
end

Any solutions to this?


--
Thanks and Regards
KARAN NANDA
+919953779677

deepak kannan

Oct 1, 2012, 1:51:11 AM
to bangal...@googlegroups.com
My guess is that Tika is not able to parse the document.
I have not used Tika before, but I would ask questions like:

Can you write a Java program that works with the same PDF?
This is to check whether the problem is with the PDF itself rather than with the Ruby driver code.
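A minimal sketch of that isolation step, written in Ruby to match the rest of the thread. Instead of a hand-written Java program it swaps in Apache Tika's standalone tika-app jar, whose `--text` option prints the text Tika extracts from a file. The jar path `tika-app.jar` and the helper names `tika_command` and `extract_with_tika` are hypothetical; ideally use a tika-app version matching the Tika jars in solr/lib so the result is comparable.

```ruby
require "open3"

# Hypothetical default path to Apache Tika's standalone CLI jar (not from the thread).
DEFAULT_TIKA_JAR = "tika-app.jar"

# Build the command line: `java -jar tika-app.jar --text file.pdf`
# prints the plain text Tika extracts from the file.
def tika_command(pdf_path, jar_path: DEFAULT_TIKA_JAR)
  ["java", "-jar", jar_path, "--text", pdf_path]
end

# Run Tika outside Solr and the Ruby gems entirely.
# Returns [true, extracted_text] on success, [false, error_output] otherwise.
def extract_with_tika(pdf_path, jar_path: DEFAULT_TIKA_JAR)
  stdout, stderr, status = Open3.capture3(*tika_command(pdf_path, jar_path: jar_path))
  status.success? ? [true, stdout] : [false, stderr]
rescue Errno::ENOENT => e
  # Raised when the `java` executable itself cannot be found.
  [false, e.message]
end

if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  ok, output = extract_with_tika(ARGV[0])
  if ok
    puts "Tika parsed the PDF (#{output.length} characters of text)"
  else
    puts "Tika failed:\n#{output}"
  end
end
```

Run it as `ruby tika_check.rb path/to/file.pdf` (the script name is illustrative). If Tika fails here too, the PDF (or the PDF parser bundled with that Tika version) is the likely culprit; if it succeeds, the problem is more likely on the Solr or sunspot_cell side.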

Also, how do you interface with Tika: HTTP or some other protocol?
If you are using a Ruby driver, check the open bugs on the Ruby driver.
If you are using HTTP, can you try with curl and reproduce the issue?
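For the HTTP route, the stack trace shows the request going through Solr's extracting handler (ExtractingDocumentLoader), which can be called directly with the raw file, much as `curl --data-binary @file.pdf -H 'Content-Type: application/pdf'` would. A minimal Ruby sketch using only Net::HTTP follows. It assumes the handler is mounted at /update/extract and that Solr runs at http://localhost:8982/solr; both are assumptions, so check solrconfig.xml and config/sunspot.yml. The `extractOnly=true` parameter asks Solr to run Tika and return the extracted content without indexing anything. The helper names `extract_uri` and `post_pdf` are hypothetical.

```ruby
require "net/http"
require "uri"

# Assumed Solr base URL: sunspot_solr's development instance usually listens
# on port 8982, but verify against config/sunspot.yml.
SOLR_BASE = "http://localhost:8982/solr"

# Build the extracting handler URL (path /update/extract is an assumption).
# extractOnly=true: run Tika and return the result without indexing.
def extract_uri(base = SOLR_BASE, params = { "extractOnly" => "true", "wt" => "json" })
  uri = URI("#{base}/update/extract")
  uri.query = URI.encode_www_form(params)
  uri
end

# POST the raw PDF bytes as the request body, like curl --data-binary.
def post_pdf(pdf_path, uri = extract_uri)
  request = Net::HTTP::Post.new(uri)
  request["Content-Type"] = "application/pdf"
  request.body = File.binread(pdf_path)
  Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
end

if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  response = post_pdf(ARGV[0])
  puts "HTTP #{response.code}"
  puts response.body
end
```

If this request also returns the 500 with the same TikaException, the Ruby gems are out of the picture and the problem sits in Solr's Tika/PDF parsing of that file.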

Hope that helps.

--
best,
deepak
w: https://gist.github.com/deepak