Hi Julien,
> and it would be great to have a new release.
Yes, definitely.
> There is 1 PR in process
> <
https://github.com/crawler-commons/crawler-commons/pull/360>, should we wait
> for it or go ahead with the release?
I would opt to wait for it. If we get it done, SimpleRobotRulesParser will pass
all unit tests of Google's robots.txt parser, which is also the reference
implementation of RFC 9309.
#360 is blocked by #390 (or #114) - one unit test fails because
SimpleRobotRulesParser closes a rule block when a Crawl-Delay instruction is
observed. The RFC says that instructions not specified in it should not change
the behavior of the parser - they may either be ignored or followed without
side effects. But it's mostly a decision about how the parser should behave.
Please have a look!
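For illustration, a hypothetical robots.txt (agent name and paths made up) where
the two behaviors diverge:

```
User-agent: examplebot
Allow: /public/
Crawl-delay: 5
Disallow: /private/
```

If the Crawl-Delay line closes the rule block, the trailing Disallow may no
longer be attributed to examplebot; if the unknown-to-the-RFC line is simply
ignored, as RFC 9309 suggests, the Disallow still belongs to the same group.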
I should be able to work on #360 over the next few days. Because there are quite
a few changes in the robots.txt parser, I would also like to test the current
parser against some real robots.txt files. If the release can wait for another week...
Best,
Sebastian
On 5/24/23 09:05, Julien Nioche wrote:
> Hi,
>
> We have loads of improvements and bugfixes committed since 1.3 (see CHANGES.txt
> <
https://github.com/crawler-commons/crawler-commons/blob/master/CHANGES.txt>)
> and it would be great to have a new release.
>
> There is 1 PR in process
> <
https://github.com/crawler-commons/crawler-commons/pull/360>, should we wait