Possible parsing bug?


zap

Jun 23, 2011, 5:45:43 AM
to scrapy-users

Pablo Hoffman

Jun 23, 2011, 10:48:09 AM
to scrapy...@googlegroups.com
Is this using selectors or link extractors? They are based on different
parsing technologies (libxml2 and an SGML parser, respectively).

Can you share the URL where this happens?

On Thu, Jun 23, 2011 at 02:45:43AM -0700, zap wrote:
> Is this a parsing bug when Scrapy hits bad HTML tags?
> http://stackoverflow.com/questions/6443485/scrapy-parsing-issue-with-malformed-br-tags

zap

Jun 23, 2011, 12:40:03 PM
to scrapy-users
This is using link extractors. I have a local file that serves as the
start URL for my crawl spider; its contents are below. If there is no
space between "br" and the slash (<br/> rather than <br />), only the
first link is extracted.
<a href="http://www.dmoz.org/Shopping/Office_Products/">Officep</a><br/>
<a href="http://www.dmoz.org/Health/Nutrition/">Nutrition</a><br/>
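
For anyone who wants to check the markup independently of Scrapy, here is
a minimal sketch that feeds the same two lines to Python's standard-library
html.parser and collects the href values. It does not use Scrapy's link
extractor or its SGML parser, so it only shows what a tolerant parser makes
of this input; LinkCollector, extract_links, NO_SPACE and WITH_SPACE are
hypothetical names made up for the illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags (hypothetical helper, not a Scrapy class)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <br/> also pass through here by default,
        # but only <a> tags are of interest.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Return every href found in the given HTML string, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# The sample from the post, with no space between "br" and the slash.
NO_SPACE = (
    '<a href="http://www.dmoz.org/Shopping/Office_Products/">Officep</a><br/>\n'
    '<a href="http://www.dmoz.org/Health/Nutrition/">Nutrition</a><br/>\n'
)
# The same sample with a space inserted before the slash.
WITH_SPACE = NO_SPACE.replace("<br/>", "<br />")

print(extract_links(NO_SPACE))
print(extract_links(WITH_SPACE))
```

If both variants yield both links here, that would suggest the markup itself
is recoverable and the difference lies in how the SGML-based extractor
handles <br/>, though that is an inference and not something this sketch
can confirm about Scrapy.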

zap

Jul 1, 2011, 1:54:13 PM
to scrapy-users
Hi Pablo, were you able to verify the issue? Is there any update or fix for this?