words.txt

marian...@gmail.com

Apr 27, 2009, 8:36:52 PM
to hounder
I want to start the crawler, but when I delete the keywords from the
file called words.txt, it doesn't crawl any web pages. It seems I have
to tell the crawler to accept any keyword it finds, but I tried using
* and it still returns nothing. The only thing that grows is the
error log.

Thank you very much for answering my question.
Mariana

Jorge Handl

Apr 27, 2009, 9:03:25 PM
to hou...@googlegroups.com
Mariana, that is because you are using the WordFilterModule, which filters out any page that doesn't contain at least one of the words/phrases specified in the words.txt file. Specifying "*" does not work with that module, and it doesn't handle regular expressions either. If you want to crawl without restrictions, just remove the reference to the WordFilterModule from the crawler.properties file.

- Jorge
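
To illustrate why neither an empty words.txt nor a "*" entry lets pages through, here is a minimal sketch of the kind of filtering described above. It is not Hounder's actual code: the function names `load_phrases` and `page_passes` are hypothetical, and the case-insensitive comparison is an assumption made for the example.

```python
def load_phrases(lines):
    """Return the non-empty phrases from an iterable of lines, lowercased.

    Hypothetical helper: stands in for reading words.txt line by line.
    """
    return [line.strip().lower() for line in lines if line.strip()]


def page_passes(text, phrases):
    """Return True if the page text contains at least one phrase.

    Each phrase is matched as a literal substring, so "*" is just an
    asterisk character, not a wildcard or a regular expression.
    Case-insensitive matching is an assumption of this sketch.
    """
    lowered = text.lower()
    return any(phrase in lowered for phrase in phrases)


page = "Open source search engines and crawlers"

# A listed phrase appears in the page, so the page is kept.
print(page_passes(page, load_phrases(["crawler", "search engine"])))

# "*" is looked up literally; the page has no asterisk, so it is dropped.
print(page_passes(page, load_phrases(["*"])))

# An empty word list can never match, so every page is dropped.
print(page_passes(page, load_phrases([])))
```

Under these assumptions, a filter that requires at least one match rejects every page when the list is empty or holds only "*", which matches the behavior Mariana saw. Removing the filter, as Jorge suggests, is the way to crawl without restrictions.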