Re: Confusing Tika Issue


Mark Mandel

Jul 31, 2012, 12:48:58 AM
to javaloa...@googlegroups.com
So which line is C:\inetpub\wwwroot\Navy\solr\SolrIndex.cfc:49?

Mark

On Tue, Jul 31, 2012 at 2:39 PM, Jim Leether <j...@leether.com> wrote:
Hello all,
 
I have an issue with Tika and JavaLoader that really kind of has me stumped.
 
I'm using:
Newest JavaLoader 1.1 code
Newest Tika 1.2
SolrJ 3.6.0
ColdFusion 9
 
I'm fighting the common issue with OpenXML files, which should have a pretty straightforward remedy using switchThreadContextClassLoader in JavaLoader, but I'm stuck beating my head against my desk.
 
I'm creating my JavaLoader instance in the normal way, using the ColdFusion class path:
 
var paths = arrayNew(1);
arrayAppend(paths,expandPath("/#application.applicationname#/solr/solrj-lib/apache-solr-solrj-3.6.0.jar"));
arrayAppend(paths,expandPath("/#application.applicationname#/solr/lib/tika-app-1.2.jar"));
APPLICATION.javaloader = createObject("component", "solr.javaloader.JavaLoader").init(loadPaths=paths,loadColdFusionClassPath=true);
 
Then creating my cached Tika instance:
 
application.tika = application.javaloader.create("org.apache.tika.Tika").init();
 
When I encounter an OpenXML file, I call:
 
return application.javaloader.switchThreadContextClassLoader(processOpenXmlFile, { filePath = arguments.filePath });
 
My code to parse the OpenXML file was provided by Jeff Coughlin (Thanks again Jeff!):
 
<cffunction name="processOpenXMLFile" access="private" returntype="string">
    <cfargument name="filepath" type="string" required="yes">

    <cfscript>
        // grab a new instance of tika
        var tika = application.javaloader.create("org.apache.tika.Tika").init();

        // parse the file
        var returnValue = tika.parseToString(createObject("java", "java.io.File").init(arguments.filePath));

        // return the parsed string
        return returnValue;
    </cfscript>
</cffunction>
 
I'm calling my function using switchThreadContextClassLoader, but I'm still getting
 
java.lang.ExceptionInInitializerError
 at org.apache.poi.openxml4j.opc.internal.unmarshallers.PackagePropertiesUnmarshaller.<clinit>(PackagePropertiesUnmarshaller.java:49)
 at org.apache.poi.openxml4j.opc.OPCPackage.init(OPCPackage.java:154)
 at org.apache.poi.openxml4j.opc.OPCPackage.<init>(OPCPackage.java:141)
 at org.apache.poi.openxml4j.opc.Package.<init>(Package.java:54)
 at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:99)
 at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:207)
 at org.apache.tika.parser.pkg.ZipContainerDetector.detectOfficeOpenXML(ZipContainerDetector.java:194)
 at org.apache.tika.parser.pkg.ZipContainerDetector.detectZipFormat(ZipContainerDetector.java:134)
 at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:77)
 at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)
 at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:113)
 at org.apache.tika.Tika.parseToString(Tika.java:380)
 at org.apache.tika.Tika.parseToString(Tika.java:492)
 at org.apache.tika.Tika.parseToString(Tika.java:472)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at coldfusion.runtime.StructBean.invoke(StructBean.java:508)
 at coldfusion.runtime.CfJspPage._invoke(CfJspPage.java:2393)
 at cfSolrIndex2ecfc320775421$funcPROCESSSOLRFILE.runFunction(C:\inetpub\wwwroot\Navy\solr\SolrIndex.cfc:49)
 at coldfusion.runtime.UDFMethod.invoke(UDFMethod.java:472)
 at coldfusion.runtime.UDFMethod$ReturnTypeFilter.invoke(UDFMethod.java:405)
 at coldfusion.runtime.UDFMethod$ArgumentCollectionFilter.invoke(UDFMethod.java:368)
 at coldfusion.filter.FunctionAccessFilter.invoke(FunctionAccessFilter.java:55)
 at coldfusion.runtime.UDFMethod.runFilterChain(UDFMethod.java:321)
 at coldfusion.runtime.UDFMethod.invoke(UDFMethod.java:220)
 at coldfusion.runtime.CfJspPage._invokeUDF(CfJspPage.java:2582)
 at cfSolrIndex2ecfc320775421$funcINDEXFILE.runFunction(C:\inetpub\wwwroot\Navy\solr\SolrIndex.cfc:285)
 at coldfusion.runtime.UDFMethod.invoke(UDFMethod.java:472)
 at coldfusion.filter.SilentFilter.invoke(SilentFilter.java:47)
 at coldfusion.runtime.UDFMethod$ReturnTypeFilter.invoke(UDFMethod.java:405)
 at coldfusion.runtime.UDFMethod$ArgumentCollectionFilter.invoke(UDFMethod.java:368)
 at coldfusion.filter.FunctionAccessFilter.invoke(FunctionAccessFilter.java:55)
 at coldfusion.runtime.UDFMethod.runFilterChain(UDFMethod.java:321)
 at coldfusion.runtime.UDFMethod.invoke(UDFMethod.java:220)
 at coldfusion.runtime.TemplateProxy.invoke(TemplateProxy.java:491)
 at coldfusion.runtime.TemplateProxy.invoke(TemplateProxy.java:337)
 at coldfusion.runtime.CfJspPage._invoke(CfJspPage.java:2360)
 at cfJLLISFile2ecfc592206074$funcADDENTITYJLLISFILESBYENTITYIDANDENTITYTYPECODE.runFunction(C:\inetpub\wwwroot\Navy\cfc\Services\JLLISFile.cfc:57)
 at coldfusion.runtime.UDFMethod.invoke(UDFMethod.java:472)
 at coldfusion.filter.SilentFilter.invoke(SilentFilter.java:47)
 at coldfusion.runtime.UDFMethod$ReturnTypeFilter.invoke(UDFMethod.java:405)
 at coldfusion.runtime.UDFMethod$ArgumentCollectionFilter.invoke(UDFMethod.java:368)
 at coldfusion.filter.FunctionAccessFilter.invoke(FunctionAccessFilter.java:55)
 at coldfusion.runtime.UDFMethod.runFilterChain(UDFMethod.java:321)
 at coldfusion.runtime.UDFMethod.invoke(UDFMethod.java:517)
 at coldfusion.runtime.CfJspPage._invokeUDF(CfJspPage.java:2547)
 at coldfusion.runtime.SuperScope.invoke(SuperScope.java:18)
 at coldfusion.runtime.CfJspPage._invoke(CfJspPage.java:2301)
 at cfJLLISFileRemote2ecfc1186757023$funcADDENTITYJLLISFILESBYENTITYIDANDENTITYTYPECODE.runFunction(C:\inetpub\wwwroot\Navy\cfc\Services\JLLISFileRemote.cfc:127)
 at coldfusion.runtime.UDFMethod.invoke(UDFMethod.java:472)
 at coldfusion.filter.SilentFilter.invoke(SilentFilter.java:47)
 at coldfusion.runtime.UDFMethod$ReturnTypeFilter.invoke(UDFMethod.java:405)
 at coldfusion.runtime.UDFMethod$ArgumentCollectionFilter.invoke(UDFMethod.java:368)
 at coldfusion.filter.FunctionAccessFilter.invoke(FunctionAccessFilter.java:55)
 at coldfusion.runtime.UDFMethod.runFilterChain(UDFMethod.java:321)
 at coldfusion.runtime.UDFMethod.invoke(UDFMethod.java:517)
 at coldfusion.runtime.TemplateProxy.invoke(TemplateProxy.java:496)
 at coldfusion.runtime.TemplateProxy.invoke(TemplateProxy.java:355)
 at coldfusion.filter.ComponentFilter.invoke(ComponentFilter.java:188)
 at coldfusion.filter.ApplicationFilter.invoke(ApplicationFilter.java:374)
 at coldfusion.filter.RequestMonitorFilter.invoke(RequestMonitorFilter.java:48)
 at coldfusion.filter.MonitoringFilter.invoke(MonitoringFilter.java:40)
 at coldfusion.filter.PathFilter.invoke(PathFilter.java:94)
 at coldfusion.filter.LicenseFilter.invoke(LicenseFilter.java:27)
 at coldfusion.filter.ExceptionFilter.invoke(ExceptionFilter.java:70)
 at coldfusion.filter.ClientScopePersistenceFilter.invoke(ClientScopePersistenceFilter.java:28)
 at coldfusion.filter.BrowserFilter.invoke(BrowserFilter.java:38)
 at coldfusion.filter.NoCacheFilter.invoke(NoCacheFilter.java:46)
 at coldfusion.filter.GlobalsFilter.invoke(GlobalsFilter.java:38)
 at coldfusion.filter.DatasourceFilter.invoke(DatasourceFilter.java:22)
 at coldfusion.xml.rpc.CFCServlet.invoke(CFCServlet.java:138)
 at coldfusion.xml.rpc.CFCServlet.doPost(CFCServlet.java:289)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:760)
 at org.apache.axis.transport.http.AxisServletBase.service(AxisServletBase.java:327)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:853)
 at coldfusion.bootstrap.BootstrapServlet.service(BootstrapServlet.java:89)
 at jrun.servlet.FilterChain.doFilter(FilterChain.java:86)
 at coldfusion.monitor.event.MonitoringServletFilter.doFilter(MonitoringServletFilter.java:42)
 at coldfusion.bootstrap.BootstrapFilter.doFilter(BootstrapFilter.java:46)
 at jrun.servlet.FilterChain.doFilter(FilterChain.java:94)
 at jrun.servlet.FilterChain.service(FilterChain.java:101)
 at jrun.servlet.ServletInvoker.invoke(ServletInvoker.java:106)
 at jrun.servlet.JRunInvokerChain.invokeNext(JRunInvokerChain.java:42)
 at jrun.servlet.JRunRequestDispatcher.invoke(JRunRequestDispatcher.java:286)
 at jrun.servlet.ServletEngineService.dispatch(ServletEngineService.java:543)
 at jrun.servlet.jrpp.JRunProxyService.invokeRunnable(JRunProxyService.java:203)
 at jrunx.scheduler.ThreadPool$DownstreamMetrics.invokeRunnable(ThreadPool.java:320)
 at jrunx.scheduler.ThreadPool$ThreadThrottle.invokeRunnable(ThreadPool.java:428)
 at jrunx.scheduler.ThreadPool$UpstreamMetrics.invokeRunnable(ThreadPool.java:266)
 at jrunx.scheduler.WorkerThread.run(WorkerThread.java:66)
Caused by: java.lang.ClassCastException: org.dom4j.DocumentFactory cannot be cast to org.dom4j.DocumentFactory
 at org.dom4j.DocumentFactory.getInstance(DocumentFactory.java:97)
 at org.dom4j.tree.AbstractNode.<clinit>(AbstractNode.java:39)
 ... 92 more
 
in the exception log, which is the usual symptom when the thread context class loader has not been switched.
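
For readers unfamiliar with what "switching" means here, the following is a rough plain-Java sketch of the save, set, restore pattern on the thread context class loader. It is not JavaLoader's actual source; the class name `ContextClassLoaderSwitch`, the helper `callWith`, and the empty `URLClassLoader` standing in for JavaLoader's loader are all assumptions made for illustration.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.Callable;

// Hypothetical illustration; not part of JavaLoader or Tika.
final class ContextClassLoaderSwitch {

    // Runs `work` with `loader` installed as the thread context class loader,
    // then restores the previous loader, even if `work` throws.
    static <T> T callWith(ClassLoader loader, Callable<T> work) throws Exception {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(loader);
        try {
            return work.call();
        } finally {
            current.setContextClassLoader(previous);
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the loader built from the JAR load paths (empty here).
        ClassLoader isolated = new URLClassLoader(
                new URL[0], ContextClassLoaderSwitch.class.getClassLoader());

        // Inside the call, libraries that consult the context class loader see `isolated`.
        ClassLoader seenInside =
                callWith(isolated, () -> Thread.currentThread().getContextClassLoader());

        System.out.println("inside saw isolated loader: " + (seenInside == isolated));
        System.out.println("context loader restored: "
                + (Thread.currentThread().getContextClassLoader() != isolated));
    }
}
```

If the parsing code never runs inside such a wrapper, libraries that look classes up through the context class loader can end up with two copies of the same class from different loaders, which is consistent with the "cannot be cast to" itself message above.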
 
I've been over and over the documentation and I'm beginning to second-guess my sanity.
 
Maybe someone will see something I'm missing...
 
I'm going to begin drinking heavily soon. :-)
 




--
E: mark....@gmail.com
T: http://www.twitter.com/neurotic
W: www.compoundtheory.com

2 Devs from Down Under Podcast

Jim Leether

Jul 31, 2012, 12:56:12 AM
to javaloa...@googlegroups.com

Heh, it’s the closing parenthesis in a SQL function

 

AND NOT EXISTS
               (
               SELECT 1
               FROM JLLIS.dbo.EntityJLLISFile EJF WITH (NOLOCK)
               WHERE EJF.JLLISFileID = JF.JLLISFileID
               AND EJF.EntityID = <cfqueryparam cfsqltype="cf_sql_integer" value="#arguments.nEntityID#">
               AND EJF.EntityTypeCode = <cfqueryparam cfsqltype="cf_sql_varchar" value="#arguments.cEntityTypeCode#">
               )
               SELECT SCOPE_IDENTITY() AS NewIdentifier

 

Second to last line.  This code works without problems when the solr code is commented out.

Mark Mandel

Jul 31, 2012, 12:58:56 AM
to javaloa...@googlegroups.com
Huh?

But that's where the error gets thrown - how can it be a SQL function?

Mark

Jeff Coughlin

Jul 31, 2012, 1:31:14 AM
to javaloa...@googlegroups.com
You mentioned the new version of Tika that was recently released (v1.2).  Just curious... does the same error happen if you use Tika v1.1?

--
Jeff Coughlin

Jim Leether

Jul 31, 2012, 1:38:49 AM
to javaloa...@googlegroups.com

Hi Jeff,

 

I figured it out.  One of the new “features” of our application was a security enhancement where they renamed the files using a UUID and stripped off the file extension on upload.  As a result, the code looking for the file extension wasn’t finding anything and was trying to process them as regular files.  Result = KABOOM.  Luckily they do save the original file extension in the database and I’m able to pull that out and use it to parse the file correctly.
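
A minimal sketch of that kind of routing, written in plain Java rather than the application's CFML: decide whether a file takes the OpenXML path from the extension stored at upload time instead of from the (now extension-less) file name. The class name `StoredExtensionRouting`, both method names, the extension list, and the sample UUID file name are hypothetical and not taken from the actual application.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of routing on a stored extension.
final class StoredExtensionRouting {

    // Illustrative set of Office Open XML extensions; not taken from the thread.
    private static final Set<String> OPEN_XML = new HashSet<>(Arrays.asList(
            "docx", "docm", "dotx", "xlsx", "xlsm", "pptx", "ppsx"));

    // `storedExtension` is the original extension saved at upload time,
    // e.g. read from a database column. It may be null or carry a leading dot.
    static boolean isOpenXml(String storedExtension) {
        if (storedExtension == null) {
            return false;
        }
        String ext = storedExtension.trim().toLowerCase(Locale.ROOT);
        if (ext.startsWith(".")) {
            ext = ext.substring(1);
        }
        return OPEN_XML.contains(ext);
    }

    // Extension taken from the file name itself; empty when there is none.
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return (dot < 0 || dot == fileName.length() - 1) ? "" : fileName.substring(dot + 1);
    }

    public static void main(String[] args) {
        String renamed = "3F2504E0-4F89-11D3-9A0C-0305E82C3301"; // UUID name, extension stripped

        // Looking at the renamed file alone misses the OpenXML path.
        System.out.println("by file name: " + isOpenXml(extensionOf(renamed)));

        // The extension kept alongside the record still identifies the format.
        System.out.println("by stored extension: " + isOpenXml("docx"));
    }
}
```

The point of the sketch is only that the check has to use whatever source still carries the original extension; the renamed file on disk no longer does.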

 

I love it when things that were working fine “just break”.

 

Thank you to everyone for your help while I regained my sanity.

 

Thank you again, Jeff, for your help with the initial setup of our Solr server.  You are, indeed, the man.

 

--Jim

 

