Hi,
we're using the Norbert robots.txt parser as-is, so I don't know
off the top of my head what's going on. For diagnosis, it would
help to have an example robots.txt file that triggers the problem.
There is always the possibility that the problem lies on the
target server's side.
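
For what it's worth, the "Incomplete trailing escape (%) pattern"
message in your log is what java.net.URLDecoder raises when the input
contains a '%' that is not followed by two hex digits. A minimal
sketch (the path string is a hypothetical example, not your data):

```java
import java.net.URLDecoder;

public class EscapeDemo {
    public static void main(String[] args) throws Exception {
        try {
            // A bare trailing '%', as an unescaped URL or robots.txt
            // entry might contain (hypothetical input).
            URLDecoder.decode("/some-path%", "UTF-8");
        } catch (IllegalArgumentException e) {
            // The decoder rejects the incomplete escape sequence.
            System.out.println(e.getMessage());
        }
    }
}
```

So an unescaped '%' somewhere in the server's robots.txt or in a
crawled URL would be consistent with what you are seeing.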
Best regards,
Andreas.
On 01/02/13 06:20, Altruist wrote:
>
> Hello All,
>
> When ldspider parses a robots.txt file, it seems it does not escape
> certain characters properly, as can be seen in the exception stack
> trace below. Can you please advise how to overcome this issue?
> Please note that I have masked certain data.
>
> Thanks.
>
> INFO: ERROR: URLDecoder: Incomplete trailing escape (%) pattern:
> http://www.xxxxxx.com/robots.txt
> 1359695278 266 127.0.0.1 TCP_MISS/200 -1 GET
> Exception in thread
> "LT-2: http://www.xxxxx.com/xxxxxx-x--x--x---x-x-/xxx-x--x--x--x-x--x-xxxxx"
> java.lang.NullPointerException
>     at org.osjava.norbert.NoRobotClient.isUrlAllowed(Unknown Source)
>     at com.ontologycentral.ldspider.http.robot.Robot.isUrlAllowed(Unknown Source)
>     at com.ontologycentral.ldspider.http.robot.Robots.accessOk(Unknown Source)
>     at com.ontologycentral.ldspider.http.LookupThread.run(Unknown Source)
>
> --
> You received this message because you are subscribed to the Google
> Groups "LDSpider" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to ldspider+u...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.