[sakai-kernel] [Sakai Jira] Created: (KERN-754) Tika exceptions when starting up nakamura

Christian Vuerings (JIRA)

Apr 28, 2010, 9:03:43 AM
to sakai-...@googlegroups.com
Tika exceptions when starting up nakamura
-----------------------------------------

Key: KERN-754
URL: http://jira.sakaiproject.org/browse/KERN-754
Project: Nakamura
Issue Type: Bug
Components: System - other
Affects Versions: 0.4
Reporter: Christian Vuerings
Priority: Minor
Fix For: 0.6


When we start nakamura (./tools/run_debug.sh) we get a whole range of Tika exceptions.
Everything works fine after the initial start, but some people still think something went wrong and assume nakamura won't work.
At the moment I just tell them to ignore those errors and carry on, but IMHO this is not good practice.


28.04.2010 13:22:50.663 *INFO* [SCR Component Actor] org.apache.sling.jcr.contentloader.internal.ContentLoaderService createFile: Cannot find content type for .project, using application/octet-stream
28.04.2010 13:23:01.428 *WARN* [jackrabbit-pool-3] org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField Failed to extract text from a binary property org.apache.tika.exception.TikaException: TIKA-237: Illegal SAXException from org.apache.tika.parser.xml.DcXMLParser@1d525bb
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:130)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
at org.apache.jackrabbit.core.query.lucene.JackrabbitParser.parse(JackrabbitParser.java:189)
at org.apache.jackrabbit.core.query.lucene.JackrabbitParser.parse(JackrabbitParser.java:195)
at org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField$ParsingTask.run(LazyTextExtractorField.java:160)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:207)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:637)
Caused by: org.xml.sax.SAXParseException: Premature end of file.
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:195)
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:174)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:388)
at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1414)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:1059)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:140)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:807)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:107)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
at javax.xml.parsers.SAXParser.parse(SAXParser.java:395)
at javax.xml.parsers.SAXParser.parse(SAXParser.java:198)
at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:72)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
... 12 more
28.04.2010 13:23:01.776 *WARN* [jackrabbit-pool-1] org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField Failed to extract text from a binary property org.apache.tika.exception.TikaException: TIKA-237: Illegal SAXException from org.apache.tika.parser.xml.DcXMLParser@1d525bb
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:130)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
at org.apache.jackrabbit.core.query.lucene.JackrabbitParser.parse(JackrabbitParser.java:189)
at org.apache.jackrabbit.core.query.lucene.JackrabbitParser.parse(JackrabbitParser.java:195)
at org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField$ParsingTask.run(LazyTextExtractorField.java:160)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:207)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:637)
Caused by: org.xml.sax.SAXParseException: The markup in the document following the root element must be well-formed.
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:195)
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:174)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:388)
at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1414)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$TrailingMiscDriver.next(XMLDocumentScannerImpl.java:1422)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:140)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:807)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:107)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
at javax.xml.parsers.SAXParser.parse(SAXParser.java:395)
at javax.xml.parsers.SAXParser.parse(SAXParser.java:198)
at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:72)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
... 12 more
28.04.2010 13:23:01.901 *WARN* [jackrabbit-pool-3] org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField Failed to extract text from a binary property org.apache.tika.exception.TikaException: TIKA-237: Illegal SAXException from org.apache.tika.parser.xml.DcXMLParser@1d525bb
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:130)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
at org.apache.jackrabbit.core.query.lucene.JackrabbitParser.parse(JackrabbitParser.java:189)
at org.apache.jackrabbit.core.query.lucene.JackrabbitParser.parse(JackrabbitParser.java:195)
at org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField$ParsingTask.run(LazyTextExtractorField.java:160)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:207)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:637)
Caused by: org.xml.sax.SAXParseException: Premature end of file.
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:195)
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:174)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:388)
at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1414)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:1059)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:140)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:807)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:107)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
at javax.xml.parsers.SAXParser.parse(SAXParser.java:395)
at javax.xml.parsers.SAXParser.parse(SAXParser.java:198)
at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:72)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
... 12 more
28.04.2010 13:23:04.092 *INFO* [SCR Component Actor] org.apache.felix.scr Running task: Enable Component: org.apache.sling.jcr.ocm.impl.ObjectContentManagerFactoryImpl (103)

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://jira.sakaiproject.org/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



--
You received this message because you are subscribed to the Google Groups "Sakai Kernel" group.
To post to this group, send email to sakai-...@googlegroups.com.
To unsubscribe from this group, send email to sakai-kernel...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/sakai-kernel?hl=en.

Ian Boston (JIRA)

Apr 28, 2010, 7:23:43 PM
to sakai-...@googlegroups.com

[ http://jira.sakaiproject.org/browse/KERN-754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=98518#action_98518 ]

Ian Boston commented on KERN-754:
---------------------------------

The exceptions come from Tika embedded in Jackrabbit.

They are caused by a document that claims to be XML but is not valid XML.

This is really a Jackrabbit issue, but there are two possible solutions:

1. Make the document valid XML.
2. Don't claim that it is XML when it's not (I am not certain how Tika decides that it should be XML).

AFAICT, from Tika's point of view the warning is valid; however, asking on the Jackrabbit or Tika user lists might get an "official" view.
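
To see the failure mode described above without starting Nakamura, the following minimal Java sketch (the class name XmlWellFormedCheck is hypothetical) feeds strings to the JDK's built-in SAX parser, the same javax.xml.parsers.SAXParser that appears in the stack traces. A single-rooted snippet parses; an HTML fragment with two sibling top-level elements, or empty content, is rejected, which should correspond to the two "Caused by" messages in the log ("The markup in the document following the root element must be well-formed." and "Premature end of file.").

```java
import java.io.StringReader;
import java.util.Optional;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class XmlWellFormedCheck {
    /** Returns empty if the text is well-formed XML, otherwise the parser's error message. */
    static Optional<String> check(String text) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            // DefaultHandler rethrows fatal errors, so malformed input lands in the catch below.
            factory.newSAXParser().parse(new InputSource(new StringReader(text)), new DefaultHandler());
            return Optional.empty();
        } catch (SAXException e) {
            return Optional.of(e.getMessage());
        } catch (Exception e) {
            // ParserConfigurationException or IOException: report it rather than hide it.
            return Optional.of(e.toString());
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<div><p>hello</p></div>",      // one root element
            "<div>one</div><div>two</div>", // two sibling roots, like an HTML fragment
            ""                              // no content at all
        };
        for (String sample : samples) {
            System.out.println(check(sample).map(msg -> "not XML: " + msg).orElse("well-formed XML"));
        }
    }
}
```

This only checks well-formedness; it says nothing about how Tika chose the XML parser in the first place.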

Christian Vuerings (JIRA)

Apr 29, 2010, 4:07:42 AM
to sakai-...@googlegroups.com

[ http://jira.sakaiproject.org/browse/KERN-754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=98528#action_98528 ]

Christian Vuerings commented on KERN-754:
-----------------------------------------

I would definitely prefer option 1 (make it valid XML).

Do you also know which XML files are causing the exceptions?
I tried to extract this from the log files but it wasn't clear to me.
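
One way to answer this without reading stack traces is to run every candidate file through a strict XML parser and list the ones that fail. The sketch below does that for a directory tree; the class name, the root-directory argument and the set of extensions checked are assumptions for illustration. A .js file will never be XML, so a failure only means the file would break if something routed it to the XML parser.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

class FindNonXmlFiles {
    // Hypothetical filter: the kinds of files that showed up in this issue.
    static final Set<String> EXTENSIONS = Set.of(".html", ".js", ".project");

    static boolean isCandidate(Path path) {
        String name = path.getFileName().toString();
        return EXTENSIONS.stream().anyMatch(name::endsWith);
    }

    /** Returns "relative/path: parser message" for each candidate that is not well-formed XML. */
    static List<String> scan(Path root) throws IOException {
        List<Path> files;
        try (Stream<Path> paths = Files.walk(root)) {
            files = paths.filter(Files::isRegularFile)
                         .filter(FindNonXmlFiles::isCandidate)
                         .sorted()
                         .collect(Collectors.toList());
        }
        List<String> failures = new ArrayList<>();
        for (Path file : files) {
            try (InputStream in = Files.newInputStream(file)) {
                SAXParserFactory.newInstance().newSAXParser().parse(in, new DefaultHandler());
            } catch (Exception e) {
                failures.add(root.relativize(file) + ": " + e.getMessage());
            }
        }
        return failures;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        scan(root).forEach(System.out::println);
    }
}
```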

Ian Boston (JIRA)

May 27, 2010, 9:45:43 AM
to sakai-...@googlegroups.com

[ http://jira.sakaiproject.org/browse/KERN-754?page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#action_51414 ]

Ian Boston logged work on KERN-754:
-----------------------------------

Author: Ian Boston
Created on: 27-May-2010 06:44
Start Date: 27-May-2010 06:43
Worklog Time Spent: 4 hours
Work Description: Logging work done investigating this. Tika has its own configuration and Jackrabbit configures Tika, but all attempts to override that configuration have failed. Suggestions on the list amounted to "can't be done", leaving 8h of work on this one.

Issue Time Tracking
-------------------

Remaining Estimate: 1 day
Time Spent: 4 hours

Simon Gaeremynck (JIRA)

Jun 10, 2010, 9:08:42 AM
to sakai-...@googlegroups.com

[ http://jira.sakaiproject.org/browse/KERN-754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Gaeremynck reassigned KERN-754:
-------------------------------------

Assignee: Simon Gaeremynck

Simon Gaeremynck (JIRA)

Jun 10, 2010, 1:07:42 PM
to sakai-...@googlegroups.com

[ http://jira.sakaiproject.org/browse/KERN-754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=100447#action_100447 ]

Simon Gaeremynck commented on KERN-754:
---------------------------------------

The files are:
* myfriends.html
* helloworld.html
* sendmessage.html
* .project files
* jcap.js

Running file --mime-type on these returns text/html for most of them, so I don't really know why Tika tries to parse them as XML.

Ian Boston (JIRA)

Jun 10, 2010, 4:40:44 PM
to sakai-...@googlegroups.com

[ http://jira.sakaiproject.org/browse/KERN-754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=100457#action_100457 ]

Ian Boston commented on KERN-754:
---------------------------------

I think a comment I made got lost somewhere.

I asked on the JR (Jackrabbit) list:

http://markmail.org/thread/6nfty43dcxs6p2bv

The problem is that the HTML is perfectly fine as HTML but is being classified as XML, because the settings in tika-config.xml incorrectly treat namespace-less HTML as XML when it is really HTML.

To fix this we need to find a way of modifying tika-config.xml and getting it loaded in preference to the one that comes with the jackrabbit-core jar.
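
Getting our own copy of a resource picked up in preference to the one bundled in a jar comes down to lookup order. The sketch below, with hypothetical class and directory names, shows the plain-Java version of that: a URLClassLoader with no application parent returns the first tika-config.xml it finds, in the order its URLs were given. It assumes the file is resolved as a classpath resource; inside an OSGi container such as Nakamura the bundle class loaders apply their own rules, so this only illustrates why ordering matters, not how to achieve it there.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

class ResourcePrecedence {
    /** Resolves a resource through a loader that searches the given directories in order. */
    static URL resolve(String resource, Path... dirs) throws IOException {
        URL[] urls = new URL[dirs.length];
        for (int i = 0; i < dirs.length; i++) {
            urls[i] = dirs[i].toUri().toURL(); // directory URLs end with '/'
        }
        // Parent null: only the bootstrap loader is consulted before our URLs.
        try (URLClassLoader loader = new URLClassLoader(urls, null)) {
            return loader.getResource(resource);
        }
    }

    public static void main(String[] args) throws IOException {
        Path override = Files.createTempDirectory("override"); // hypothetical "our copy"
        Path bundled = Files.createTempDirectory("bundled");   // hypothetical "jar copy"
        Files.writeString(override.resolve("tika-config.xml"), "<properties/>");
        Files.writeString(bundled.resolve("tika-config.xml"), "<properties/>");
        System.out.println(resolve("tika-config.xml", override, bundled));
        System.out.println(resolve("tika-config.xml", bundled, override));
    }
}
```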

Simon Gaeremynck (JIRA)

Jun 21, 2010, 6:16:42 AM
to sakai-...@googlegroups.com

[ http://jira.sakaiproject.org/browse/KERN-754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=100724#action_100724 ]

Simon Gaeremynck commented on KERN-754:
---------------------------------------

I'm not so sure the tika-config.xml file is the culprit; that just maps MIME types to parsers.
Running:
java -jar tika-app.0.6.jar -t helloworld.html
gives the same exception.

I think it's the tika-mimetypes.xml file that doesn't match the patterns of our HTML fragments.

Ian Boston (JIRA)

Jun 21, 2010, 6:36:42 AM
to sakai-...@googlegroups.com

[ http://jira.sakaiproject.org/browse/KERN-754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=100726#action_100726 ]

Ian Boston commented on KERN-754:
---------------------------------

If

java -jar tika-app.0.6.jar -t helloworld.html

gives the same exception (i.e. TIKA-237: Illegal SAXException),
then the SAX parser is being used, which indicates that the XML parser is being used rather than the loose HTML parser. That is exactly what is happening in Nakamura.

If you can force Tika to use the HTML parser and you still get an exception (one that does not mention SAX, since IIRC the HTML parser doesn't use SAX), then our fragments are not parseable by the HTML parser, and we need to find an alternative route.
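
The experiment can be approximated outside Tika. The sketch below uses the JDK's own tolerant HTML parser (javax.swing.text.html.parser.ParserDelegator) as a stand-in for Tika's HTML parser; that substitution is an assumption, since it is not the parser Tika uses, but it shows the same contrast: a fragment that a strict SAX parser rejects is read without complaint by a lenient HTML parser. The class name is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class LenientHtmlParse {
    /** Lenient path: collects start-tag names; missing html/body wrappers are inferred, not fatal. */
    static List<String> startTags(String html) throws IOException {
        List<String> tags = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                tags.add(tag.toString());
            }
        };
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return tags;
    }

    /** Strict path: true only if the text is a single well-formed XML document. */
    static boolean isWellFormedXml(String text) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(text)), new DefaultHandler());
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        String fragment = "<div class=\"widget\">one</div><div>two</div>";
        System.out.println("HTML start tags: " + startTags(fragment));
        System.out.println("well-formed XML: " + isWellFormedXml(fragment));
    }
}
```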

Simon Gaeremynck (JIRA)

Jun 21, 2010, 7:15:42 AM
to sakai-...@googlegroups.com

[ http://jira.sakaiproject.org/browse/KERN-754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=100729#action_100729 ]

Simon Gaeremynck commented on KERN-754:
---------------------------------------

The HTMLParser doesn't use SAX, and calling HTMLParser.parse(....) directly on our fragments works just fine.
When using the AutoDetectParser, it detects that the file is application/xml.
It then tries to determine whether the file is HTML or XML.
It does this by looking at the first bytes and checking whether there is an html, link, head or title, .. tag in there.
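
A rough model of the two-step decision described above may make the failure easier to see. The sketch below is hypothetical, not Tika's code: the class name, the 1024-byte window and the marker list are assumptions based only on the tags named in the comment. Markup with none of the HTML markers near the start falls through to application/xml, which is where a widget fragment beginning with a plain <div> would end up.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Locale;

class PrefixSniffer {
    // Hypothetical window size: how many leading bytes are inspected.
    static final int PREFIX_LENGTH = 1024;
    // Markers taken from the comment above; Tika's real list and matching rules may differ.
    static final List<String> HTML_MARKERS = List.of("<html", "<head", "<title", "<link");

    static String sniff(byte[] content) {
        int n = Math.min(content.length, PREFIX_LENGTH);
        // ISO-8859-1 maps each byte to one char, so the prefix length is preserved.
        String prefix = new String(content, 0, n, StandardCharsets.ISO_8859_1).toLowerCase(Locale.ROOT);
        for (String marker : HTML_MARKERS) {
            if (prefix.contains(marker)) {
                return "text/html";
            }
        }
        // Any other markup is assumed to be XML, which is what trips up HTML fragments.
        if (prefix.stripLeading().startsWith("<")) {
            return "application/xml";
        }
        return "text/plain";
    }

    public static void main(String[] args) {
        String[] samples = {
            "<html><head><title>page</title></head><body></body></html>",
            "<div class=\"widget\"><p>fragment</p></div>",
            "just some text"
        };
        for (String s : samples) {
            System.out.println(sniff(s.getBytes(StandardCharsets.UTF_8)) + "  <-  " + s);
        }
    }
}
```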

The new widget spec states that all widgets will be full-blown HTML pages and no longer fragments.
I asked one of our UI guys and he said that they want to move to that spec sooner rather than later.
Which makes me think we shouldn't spend too much time on this?

Ian Boston (JIRA)

Jun 21, 2010, 7:27:42 AM
to sakai-...@googlegroups.com

[ http://jira.sakaiproject.org/browse/KERN-754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=100730#action_100730 ]

Ian Boston commented on KERN-754:
---------------------------------

My point exactly:
AutoDetectParser decides that HTML is application/xml for the purposes of parsing, which means the file must be valid XML.

Only XHTML is guaranteed to be valid XML; most HTML is not. Wherever an HTML fragment is stored in the content system, TIKA-237 will be reported, not just for widgets, so application/xml should really only be selected where the file is guaranteed to be valid XML.

I agree that making widgets full HTML pages and forcing them to be fully valid XML will help, but the fundamental problem remains: HTML should use the HTMLParser, not the XMLParser. To fix that we need to fix tika-config.xml so that it correctly identifies HTML as text/html and not application/xml.
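
The rule that application/xml should only be chosen when the content is guaranteed to be well-formed XML can be expressed as a guard around whatever type a detector proposes. This is a minimal sketch under that assumption, not how Tika or Jackrabbit behave: the class name and the choice of text/html as the fallback type are hypothetical, and the extra parse costs a second pass over each document.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SafeXmlClassifier {
    /** True only if the text parses as a single well-formed XML document. */
    static boolean isWellFormedXml(String text) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(text)), new DefaultHandler());
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    /** Keeps the detected type unless it claims XML for content that is not well-formed XML. */
    static String classify(String text, String detectedType) {
        if ("application/xml".equals(detectedType) && !isWellFormedXml(text)) {
            return "text/html"; // hypothetical fallback: let a lenient HTML parser handle it
        }
        return detectedType;
    }

    public static void main(String[] args) {
        System.out.println(classify("<div>one</div><div>two</div>", "application/xml"));
        System.out.println(classify("<widget><name>hello</name></widget>", "application/xml"));
        System.out.println(classify("plain text", "text/plain"));
    }
}
```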

Simon Gaeremynck (JIRA)

Jun 22, 2010, 1:16:42 PM
to sakai-...@googlegroups.com

[ http://jira.sakaiproject.org/browse/KERN-754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=100781#action_100781 ]

Simon Gaeremynck commented on KERN-754:
---------------------------------------

I've committed a bundle that unpacks Tika and has a custom tika-mimetypes.xml file.
If I run java -jar tika-app.jar -m on our fragments, it correctly reports them as text/html.

When I load the content into the server, however, it tries to parse it as application/xml.
When I attach a debugger to check, it correctly uses text/html again.

I'm fairly sure that the jackrabbit-core jar is using Tika with our tika-mimetypes file, but it looks like there is some threading issue going on.

Ian Boston (JIRA)

Jun 25, 2010, 7:17:42 AM
to sakai-...@googlegroups.com

[ http://jira.sakaiproject.org/browse/KERN-754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ian Boston resolved KERN-754.
-----------------------------

Resolution: Fixed

This is now fixed.

Some of the files starting with <!-- were categorized as HTML and then re-categorized as XML.

> Tika exceptions when starting up nakamura
> -----------------------------------------
>
> Key: KERN-754
> URL: http://jira.sakaiproject.org/browse/KERN-754
> Project: Nakamura
> Issue Type: Bug
> Components: System - other
> Affects Versions: 0.4
> Reporter: Christian Vuerings
> Assignee: Simon Gaeremynck
> Priority: Minor
> Fix For: 0.7
>
> Time Spent: 4 hours
> Remaining Estimate: 1 day
>
