Hi,
I'm facing some problems with session query strings and anchors, which
cause crawler loops and several duplicate pages.
Doing some research, I found that Nutch has addressed this issue through
a "URL normalization" configuration file; the link below contains a
sample of this file.
http://svn.apache.org/viewvc/lucene/nutch/trunk/conf/regex-normalize.xml.template?revision=627890
I would like to know whether Hounder can use some kind of configuration
like this and, if not, whether you could point me in the right
direction to add this feature.
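
To illustrate the kind of normalization I have in mind, here is a rough
sketch in Python. It is not Nutch's or Hounder's actual code: the
function name normalize_url and the list of session parameter names are
my own assumptions, chosen only to show the idea of stripping session
ids and anchors before a URL is queued or compared.

```python
import re
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical set of session parameter names (an assumption; adjust per site).
SESSION_PARAMS = {"phpsessid", "jsessionid", "sid", "sessionid"}

# Matches path-style session ids such as ";jsessionid=ABC123".
PATH_SESSION_RE = re.compile(r";jsessionid=[^/?#;]*", re.IGNORECASE)


def normalize_url(url):
    """Return url without session parameters and without its anchor."""
    scheme, netloc, path, query, _fragment = urlsplit(url)

    # Drop a session id embedded in the path.
    path = PATH_SESSION_RE.sub("", path)

    # Keep only the query parameters that are not session identifiers.
    kept = [
        (name, value)
        for name, value in parse_qsl(query, keep_blank_values=True)
        if name.lower() not in SESSION_PARAMS
    ]

    # Rebuild the URL with an empty fragment so anchors are discarded.
    return urlunsplit((scheme, netloc, path, urlencode(kept), ""))
```

With something like this, two URLs that differ only in their session id
or anchor would map to the same string, so the crawler could treat them
as one page.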
Thanks a lot!
Gustavo Arjones