<gust...@gustavo.eng.br> wrote:
> Hi,
> In my html I have a bunch of code like this:
> <html><body><script>a="<a href='teste'>";</script></body></html>
> If I parse this with BS4 I get what is the correct (in the strict
> sense) result:
> >>> from bs4 import BeautifulSoup
> >>> from bs4 import CData
> >>> html = "<html><body><script>a=\"<a href='teste'>\";</script></body></html>"
> >>> s = BeautifulSoup(html)
> >>> s
> <html><body><script>a="&lt;a href='teste'&gt;";</script></body></html>
> But even though it's correct it does not work in the browser because
> now the browser thinks the variable a is "&lt;a href='teste'&gt;".
> Weird but true.
> I'd like to get the code wrapped in a cdata, so I did this:
> >>> script_text = s.html.body.script.children.next()
> >>> cdata = CData(script_text.string)
> >>> cdata
> u'a="<a href=\'teste\'>";'
> >>> script_text.replace_with(cdata)
> >>> s
> <html><body><script><![CDATA[a="&lt;a href='teste'&gt;";]]></script></body></html>
> Is this the correct behavior? Shouldn't it be:
> <html><body><script><![CDATA[a="<a href='teste'>";]]></script></body></html>
> If not, how can I achieve this? I know that doing:
> >>> s.encode(formatter=None)
> '<html><body><script><![CDATA[a="<a href=\'teste\'>";]]></script></body></html>'
> works, but I'd still like wrong characters elsewhere (inside an <a> tag,
> for example) to be corrected, and formatter=None would not do that.
> Any help is appreciated.
> Cheers,
> Gustavo
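
The approach described above (swap the script's text node for a CData node, then serialize without formatter=None) can be sketched as follows. This is only a sketch under assumptions that are not in the message: it targets a recent bs4 release on Python 3, it names "html.parser" explicitly, and it adds a hypothetical link with a bare ampersand so the effect of the default formatter on ordinary text is visible. Older releases may behave differently.

```python
from bs4 import BeautifulSoup, CData

# Same script as in the message, plus a hypothetical link whose text
# contains a bare ampersand, to show what the default formatter still does.
html = (
    "<html><body>"
    "<script>a=\"<a href='teste'>\";</script>"
    "<a href='teste'>a & b</a>"
    "</body></html>"
)

# "html.parser" is an assumption; the message does not name a parser.
soup = BeautifulSoup(html, "html.parser")

# Swap the script's text node for a CData node holding the same text.
script_text = soup.script.string
script_text.replace_with(CData(script_text))

# Serialize with the default formatter (no formatter=None).
result = soup.decode()
print(result)
```

On such a release the script body should come out inside the CDATA section with its angle brackets intact, while the ampersand in the link text is still written as an entity, so formatter=None is not needed for this case.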