NullPointerException when sharing HTMLSchema


Joe Humphreys

Dec 19, 2012, 9:39:49 PM
to tagsoup...@googlegroups.com
Hi. I have some server code in which different threads create Parsers using a shared (static) copy of an HTMLSchema. Since we always set the ignoreBogons feature, there shouldn't be any changes to the data in the Schema after it's created. However, we often see this error:

java.lang.NullPointerException
null
at org.ccil.cowan.tagsoup.Element.<init>(Element.java:39)
at org.ccil.cowan.tagsoup.Parser.setup(Parser.java:467)
at org.ccil.cowan.tagsoup.Parser.parse(Parser.java:439)

This would seem to indicate that theSchema.getElementType("<root>") returned null.

I cannot reproduce this in development, though it happens regularly in production. I've looked at this code about 600 times and cannot see how this could happen.

I found this somewhat related issue (https://issues.apache.org/jira/browse/TIKA-719) but it's duped to another issue that indicates it was fixed by turning on ignoreBogons.

Anyone have any ideas?

Thanks,
Joe H

John Cowan

Dec 20, 2012, 4:51:56 PM
to Joe Humphreys, tagsoup...@googlegroups.com
Joe Humphreys scripsit:

> Hi. I have some server code in which different threads create Parsers
> using a shared (static) copy of an HTMLSchema. Since we always set
> the ignoreBogons feature, there shouldn't be any changes to the data
> in the Schema after it's created.

I have nothing to suggest except not to use a shared copy. I have come
to believe that shared schemas were a mistake, and I will probably remove them
from TagSoup in a future release. They are penny-wise pound-foolish.

--
John Cowan co...@ccil.org http://www.ccil.org/~cowan
Does anybody want any flotsam? / I've gotsam.
Does anybody want any jetsam? / I can getsam.
--Ogden Nash, No Doctors Today, Thank You
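
For readers who want to see what "not using a shared copy" might look like, here is a minimal sketch of the pattern. It deliberately avoids the TagSoup classes: SketchSchema and SketchParser are hypothetical stand-ins invented for illustration, and newParser() is an assumed helper name. With the real library, the equivalent step would presumably be giving each Parser its own new HTMLSchema rather than a static one.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class PerParserSchemaSketch {

    // Hypothetical stand-in for a mutable schema object; NOT TagSoup's HTMLSchema.
    static final class SketchSchema {
        private final Map<String, String> elementTypes = new HashMap<>();

        SketchSchema() {
            elementTypes.put("<root>", "root");
            elementTypes.put("html", "html");
        }

        // Returns null when the name is unknown, like a plain map lookup.
        String getElementType(String name) {
            return elementTypes.get(name.toLowerCase(Locale.ROOT));
        }
    }

    // Hypothetical stand-in for a parser that keeps a reference to its schema.
    static final class SketchParser {
        private final SketchSchema schema;

        SketchParser(SketchSchema schema) {
            this.schema = schema;
        }

        SketchSchema schema() {
            return schema;
        }

        String rootType() {
            return schema.getElementType("<root>");
        }
    }

    // Builds a fresh schema for every parser instead of reading a static field.
    // As long as each parser stays on one thread, no schema is shared across threads.
    static SketchParser newParser() {
        return new SketchParser(new SketchSchema());
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            SketchParser parser = newParser();
            System.out.println(Thread.currentThread().getName()
                    + " root type: " + parser.rootType());
        };
        Thread t1 = new Thread(task, "worker-1");
        Thread t2 = new Thread(task, "worker-2");
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```

The cost is that the schema is rebuilt for every parser. That is the saving the shared copy was meant to provide, and the one the reply above calls penny-wise pound-foolish.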