Possible parsing bug?

zap

Jun 23, 2011, 5:45:43 AM

to scrapy-users

Is this a parsing bug when Scrapy hits malformed HTML tags?
http://stackoverflow.com/questions/6443485/scrapy-parsing-issue-with-malformed-br-tags

Pablo Hoffman

Jun 23, 2011, 10:48:09 AM

to scrapy...@googlegroups.com

Is this using selectors or link extractors? They are based on different
parsing technologies (libxml2 and an SGML parser, respectively).

Can you share the URL where this happens?

On Thu, Jun 23, 2011 at 02:45:43AM -0700, zap wrote:
> Is this a parsing bug when scrapy hits bad html tags
> http://stackoverflow.com/questions/6443485/scrapy-parsing-issue-with-malformed-br-tags

zap

Jun 23, 2011, 12:40:03 PM

to scrapy-users

This is using link extractors. I have a local file that serves as the
start URL for my crawl spider. The contents of the file are below. If
there is no space between "br" and the slash, only the first link is
extracted.

<a href="http://www.dmoz.org/Shopping/Office_Products/">Officep</a><br/>
<a href="http://www.dmoz.org/Health/Nutrition/">Nutrition</a><br/>
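A possible workaround, until the extractor behaviour is confirmed, might be to rewrite the br tags with a space before the slash before the markup reaches the link extractor. The sketch below is only an illustration under assumptions: normalize_br, LinkCollector and extract_links are hypothetical helpers, not part of Scrapy, and the parsing uses Python's standard html.parser rather than Scrapy's SGML-based link extractor, so it shows the pre-processing idea and not the extractor's actual behaviour.

```python
import re
from html.parser import HTMLParser

# The two-line sample from the message above.
SAMPLE = (
    '<a href="http://www.dmoz.org/Shopping/Office_Products/">Officep</a><br/>\n'
    '<a href="http://www.dmoz.org/Health/Nutrition/">Nutrition</a><br/>\n'
)

# Matches <br/>, <br />, and variants with stray whitespace around the slash.
_BR = re.compile(r"<br\s*/\s*>", re.IGNORECASE)


def normalize_br(html):
    """Rewrite every self-closing br tag as '<br />' (hypothetical helper)."""
    return _BR.sub("<br />", html)


class LinkCollector(HTMLParser):
    """Collects href values of <a> tags (stand-in for a link extractor)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Normalize br tags, then return all link targets in document order."""
    collector = LinkCollector()
    collector.feed(normalize_br(html))
    collector.close()
    return collector.links


if __name__ == "__main__":
    for link in extract_links(SAMPLE):
        print(link)
```

In a spider, the same normalize_br step would have to run on the response body before link extraction; where exactly to hook it in depends on the Scrapy version in use.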

zap

Jul 1, 2011, 1:54:13 PM

to scrapy-users

Hi Pablo, were you able to verify the issue? Is there any update or fix for this?
