to hounder
I have an issue with the crawler that I'm sure is an easy fix, but I'm
not sure where to look. I keep getting:
java.lang.IllegalArgumentException: Adding text to an XML document must not be null
in $BASEDIR/crawler/log/hounder.log, over and over. It seems
to happen only for PDF files. Is there something I have to do to turn on
PDF indexing?
Thanks in advance,
Billford
Jorge Handl
Aug 18, 2009, 12:50:58 PM
to hou...@googlegroups.com
Bill, you can either filter out PDF files through the regex-urlfilter.txt file, or add the PDF parser to the plugin.includes property in the nutch-site.xml file. - Jorge
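For anyone landing on this thread later, here is a rough sketch of what the two options might look like. Both fragments are assumptions based on a stock Nutch setup from around that time, not something taken from the Hounder distribution: the plugin name parse-pdf and the exact filter rule are illustrative, and the "..." stands for whatever your existing plugin.includes value already contains.

Option 1, skipping PDFs: a rule in regex-urlfilter.txt starts with "-" to exclude or "+" to include, followed by a regular expression, and the first matching rule wins, so the exclusion has to come before any catch-all "+" rule.

```
# Assumption: skip any URL ending in .pdf (either case)
-\.(pdf|PDF)$
```

Option 2, parsing PDFs: override plugin.includes in nutch-site.xml so that the parse plugins include the PDF parser.

```xml
<!-- Assumption: the PDF parser plugin is named parse-pdf, as in stock Nutch.
     "..." is a placeholder for your existing value; keep those entries. -->
<property>
  <name>plugin.includes</name>
  <value>...|parse-(text|html|pdf)|...</value>
</property>
```

Either way, the crawler probably has to be restarted to pick up the change.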
Bilford
Aug 18, 2009, 3:41:19 PM
to hounder
Thanks, Jorge.
On Aug 18, 12:50 pm, Jorge Handl <jha...@gmail.com> wrote:
> Bill, you can either filter PDF file through the regex-urlfilter.txt file or
> add the parser in the plugin.includes property in the nutch-site.xml file.
> - Jorge