Problems with CData

Gustavo Ambrozio

Apr 20, 2012, 3:19:52 PM
to beautifulsoup
Hi,

In my html I have a bunch of code like this:

<html><body><script>a="<a href='teste'>";</script></body></html>

If I parse this with BS4, I get a result that is correct in the
strict sense:

>>> from bs4 import BeautifulSoup
>>> from bs4 import CData
>>> html = "<html><body><script>a=\"<a href='teste'>\";</script></body></html>"
>>> s = BeautifulSoup(html)
>>> s
<html><body><script>a="&lt;a href='teste'&gt;";</script></body></html>

It's correct, but it does not work in the browser: browsers do not
decode entities inside a <script> element, so now the browser thinks
the variable a is "&lt;a href='teste'&gt;". Weird but true.
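
To illustrate the point about entities inside <script>, here is a minimal sketch that uses Python's standard-library html.parser as a stand-in for a browser. That substitution is an assumption: it only mimics the HTML rule that <script> content is raw text. TextCollector and text_seen_by_parser are hypothetical names made up for this example, and the code is Python 3, unlike the Python 2 session in this post.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: records every text chunk the parser reports."""

    def __init__(self):
        # convert_charrefs=True decodes entities in ordinary text,
        # but not inside raw-text elements such as <script>.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def text_seen_by_parser(markup):
    """Return the text content as the parser sees it after entity handling."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)


if __name__ == "__main__":
    # Entities inside <script> are passed through untouched.
    print(text_seen_by_parser("<script>a=\"&lt;a href='teste'&gt;\";</script>"))
    # Entities inside a normal element are decoded.
    print(text_seen_by_parser("<p>&lt;b&gt;</p>"))
```

The first call should hand back the script text with &lt; and &gt; still in place, while the second decodes them, which is why escaped output inside <script> changes what the JavaScript sees.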

I'd like to get the code wrapped in a CDATA section, so I did this:

>>> script_text = s.html.body.script.children.next()
>>> cdata = CData(script_text.string)
>>> cdata
u'a="<a href=\'teste\'>";'

>>> script_text.replace_with(cdata)
>>> s
<html><body><script><![CDATA[a="&lt;a href='teste'&gt;";]]></script></body></html>

Is this the correct behavior? Shouldn't it be this instead?

<html><body><script><![CDATA[a="<a href='teste'>";]]></script></body></html>

If not, how can I achieve this? I know that doing:

>>> s.encode(formatter=None)
'<html><body><script><![CDATA[a="<a href=\'teste\'>";]]></script></body></html>'

Works, but I'd like it to still correct wrong chars elsewhere, for
example inside an <a> tag, and using formatter=None would not correct
those.

Any help is appreciated.

Cheers,
Gustavo

Leonard Richardson

Apr 26, 2012, 10:46:38 AM
to beauti...@googlegroups.com
Gustavo,

This is a bug:

https://bugs.launchpad.net/beautifulsoup/+bug/988905

It'll be fixed in the next release.

Leonard