> I came across this line that makes BeautifulSoup fail
>
> <EMBED NAME="IncrediFlash" SRC="http://www.heroturko.org/
> globaldomain.swf?317338960" BGCOLOR="#000000"
> WIDTH="150" HEIGHT="250" TYPE="application/x-shockwave-flash"
> pluginspage=http://www.macromedia.com/go/getflashplayer">
> </EMBED>
What versions of Python and BeautifulSoup, and in what context?
> Hope you can fix it.
Hope you can provide a complete bug report.
S
I didn't test this, but maybe it's the missing quote mark on this part:

pluginspage=http://www.macromedia.com/go/getflashplayer"
Maybe a search and replace before putting it into Beautiful Soup would be good?
src = src.replace('pluginspage=h', 'pluginspage="h')
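A minimal way to check what that replacement changes, assuming Python 3's standard-library html.parser (not the 2010 BeautifulSoup/HTMLParser combination this thread is about) and assuming the line break inside SRC in the report is just mail wrapping. AttrCollector and pluginspage_of are hypothetical helpers written only for this check:

```python
from html.parser import HTMLParser

# The tag from the report, rejoined onto one string (assumption: the break
# inside SRC was mail wrapping). Note the missing opening quote on pluginspage.
BROKEN = ('<EMBED NAME="IncrediFlash" SRC="http://www.heroturko.org/'
          'globaldomain.swf?317338960" BGCOLOR="#000000" '
          'WIDTH="150" HEIGHT="250" TYPE="application/x-shockwave-flash" '
          'pluginspage=http://www.macromedia.com/go/getflashplayer"></EMBED>')


class AttrCollector(HTMLParser):
    """Hypothetical helper: records (tag, attrs) for every start tag."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))


def pluginspage_of(src):
    """Hypothetical helper: parse src and return the first tag's pluginspage."""
    parser = AttrCollector()
    parser.feed(src)
    parser.close()
    # html.parser lowercases tag and attribute names, hence "pluginspage".
    return parser.tags[0][1]["pluginspage"]


fixed = BROKEN.replace('pluginspage=h', 'pluginspage="h')
print(repr(pluginspage_of(BROKEN)))  # stray quote ends up inside the value
print(repr(pluginspage_of(fixed)))   # 'http://www.macromedia.com/go/getflashplayer'
```

In this sketch the unfixed tag still parses, but the value keeps the stray trailing quote; after the replacement the value is the plain URL.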
-Aaron DeVore
On Wed, Feb 3, 2010 at 4:30 AM, Telemat <kdo...@gmail.com> wrote:
> Hi
>
> I came across this line that makes BeautifulSoup fail
>
> <EMBED NAME="IncrediFlash" SRC="http://www.heroturko.org/
> globaldomain.swf?317338960" BGCOLOR="#000000"
> WIDTH="150" HEIGHT="250" TYPE="application/x-shockwave-flash"
> pluginspage=http://www.macromedia.com/go/getflashplayer">
> </EMBED>
>
> Hope you can fix it.
>
> --
> You received this message because you are subscribed to the Google Groups "beautifulsoup" group.
> To post to this group, send email to beauti...@googlegroups.com.
> To unsubscribe from this group, send email to beautifulsou...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/beautifulsoup?hl=en.
>
>
Excellent! For the record, this will be either difficult or impossible
to fix as part of HTMLParser. The problem is deciding what should
happen when the value of the attribute really is 'foo"'. Case in point:
<tag attr=value">
Is that 'value"'? That's how sgmllib handles it. Or is it 'value'?
What happens in this case:
<tag attr="value>
sgmllib handles it as '"value'. HTMLParser silently ignores the tag
and the rest of the document. Manual filtering is really the only way
to handle this error.
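A rough sketch of such manual filtering in Python 3, generalizing the pluginspage replacement to any unquoted value that ends in a stray double quote. The function name and the regex are hypothetical; the sketch only handles the first case above (attr=value"), does not attempt the second (attr="value>), and can misfire if text inside a quoted value happens to look like ` name=value"`:

```python
import re

# Hypothetical heuristic: whitespace, an attribute name, '=', then an
# unquoted value (no quotes, whitespace, '=', '<', '>' or '`') that is
# immediately followed by a stray double quote.
STRAY_CLOSING_QUOTE = re.compile(r'(\s[\w:.-]+)=([^\s"\'=<>`]+)"')


def add_missing_open_quote(html):
    """Rewrite name=value" as name="value"; already-quoted values are skipped."""
    return STRAY_CLOSING_QUOTE.sub(r'\1="\2"', html)


print(add_missing_open_quote('<tag attr=value">'))   # <tag attr="value">
print(add_missing_open_quote('<tag attr="value">'))  # <tag attr="value"> (unchanged)
```

Because the value class excludes quote characters, a value that already starts with `"` never matches, so correctly quoted attributes pass through untouched.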
-Aaron DeVore
>
> On Feb 4, 1:29 am, Aaron DeVore <aaron.dev...@gmail.com> wrote:
>> I didn't test this, but maybe it's the missing quote mark on this part:
>>
>> pluginspage=http://www.macromedia.com/go/getflashplayer"
>>
>> Maybe a search and replace before putting it into Beautiful Soup would be good?
>>
>> src = src.replace('pluginspage=h', 'pluginspage="h')
>>
>> -Aaron DeVore
>>
I just looked back at this message and should point out a problem with
the more obvious fix: replacing plain 'pluginspage=' would also catch
valid pluginspage attributes.
<tag pluginspage="...">
would become
<tag pluginspage=""...">
That's why I instead matched 'pluginspage=h'.
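The difference between the two patterns is easy to check with plain str.replace in Python; the two sample strings are made up for illustration:

```python
valid = '<tag pluginspage="http://www.macromedia.com/go/getflashplayer">'
broken = '<tag pluginspage=http://www.macromedia.com/go/getflashplayer">'

# Too broad: also rewrites an attribute that was already quoted.
print(valid.replace('pluginspage=', 'pluginspage="'))
# <tag pluginspage=""http://www.macromedia.com/go/getflashplayer">

# Narrower: only fires when the value starts directly with 'h' (as in http://).
print(valid.replace('pluginspage=h', 'pluginspage="h'))   # unchanged
print(broken.replace('pluginspage=h', 'pluginspage="h'))
# <tag pluginspage="http://www.macromedia.com/go/getflashplayer">
```

The narrower pattern still assumes the value begins with 'h', which holds for http:// URLs like the one in the report but not in general.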
-Aaron DeVore