BeautifulStoneSoup xml parsing issue

Nikolay

Jan 19, 2009, 4:07:57 AM
to beautifulsoup
How can I disable the following XML-rewriting feature (or bug?):

In [4]: print BeautifulSoup.__version__
3.1.0.1
In [5]: print BeautifulStoneSoup("<a><b><a></a></b></a>").prettify()
<a>
 <b>
 </b>
</a>
<a>
</a>

For instance, Django's object-to-XML serializer generates similar markup,
so a dump like that becomes invalid after passing through BS.

Have a nice day,
Nikolay.

Leonard Richardson

Jan 19, 2009, 8:36:59 AM
to beauti...@googlegroups.com
Nikolay,

You need to tell BS that <a> tags can be nested within themselves.
Customizing the list of nestable tags is covered here:

http://www.crummy.com/software/BeautifulSoup/documentation.html#Customizing%20the%20Parser

Leonard
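The nesting rule behind this can be illustrated with a toy tree builder. The sketch below is Python 3, uses only the standard library's `html.parser`, and is not Beautiful Soup's actual code; `NestingTreeBuilder`, `serialize` and `parse` are hypothetical names. It assumes the BS 3.x behavior seen in this thread: opening a tag that is not marked nestable first closes the most recent open tag of the same name, along with everything opened inside it, and unmatched end tags are dropped. In BS 3.x itself, the fix is to subclass the soup class and extend its `NESTABLE_TAGS` class attribute (something like `NESTABLE_TAGS = {'a': []}`), which is what the linked documentation section describes.

```python
from html.parser import HTMLParser


class NestingTreeBuilder(HTMLParser):
    """Hypothetical toy builder mimicking the BS 3.x rule: a start tag that is
    not nestable closes the last open tag of the same name first."""

    def __init__(self, nestable=()):
        super().__init__()
        self.nestable = set(nestable)
        self.root = ("[root]", [])  # nodes are (name, children) tuples
        self.stack = [self.root]

    def _pop_to_last(self, tag):
        # Close the most recent open `tag` and everything opened inside it.
        names = [name for name, _ in self.stack]
        idx = len(names) - 1 - names[::-1].index(tag)
        del self.stack[idx:]

    def handle_starttag(self, tag, attrs):
        open_names = [name for name, _ in self.stack[1:]]
        if tag not in self.nestable and tag in open_names:
            self._pop_to_last(tag)
        node = (tag, [])
        self.stack[-1][1].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        open_names = [name for name, _ in self.stack[1:]]
        if tag in open_names:
            self._pop_to_last(tag)
        # End tags with no matching open tag are silently ignored.


def serialize(node):
    name, children = node
    inner = "".join(serialize(child) for child in children)
    return inner if name == "[root]" else f"<{name}>{inner}</{name}>"


def parse(markup, nestable=()):
    builder = NestingTreeBuilder(nestable)
    builder.feed(markup)
    builder.close()
    return serialize(builder.root)


print(parse("<a><b><a></a></b></a>"))                  # <a><b></b></a><a></a>
print(parse("<a><b><a></a></b></a>", nestable={"a"}))  # <a><b><a></a></b></a>
```

The first call reproduces the restructured output from the original question (on one line instead of prettified); declaring `a` nestable keeps the inner `<a>` inside `<b>`.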

Nikolay Panov

Jan 19, 2009, 9:03:42 AM
to beauti...@googlegroups.com
Thank you!

Have a nice day,
Nikolay.

Nikolay Panov

Jan 19, 2009, 9:10:05 AM
to beauti...@googlegroups.com
Another question. Why does BS do this?
In [13]: print MyBeautifulSoup("<A></A>").prettify()
<a>
</a>
How can I prevent lowercasing?

Have a nice day,
Nikolay.

Leonard Richardson

Jan 19, 2009, 9:18:14 AM
to beauti...@googlegroups.com
> Another question. Why does BS do this?
> In [13]: print MyBeautifulSoup("<A></A>").prettify()
> <a>
> </a>
> How can I prevent lowercasing?

BS does this because HTMLParser, which it uses for parsing, lowercases
tag names. If you need to preserve tag case, try lxml.

Leonard
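For comparison, here is a minimal Python 3 sketch using only the standard library. `html.parser.HTMLParser` hands tag names to its callbacks already lowercased, while a real XML parser keeps both case and nesting. The sketch uses `xml.etree.ElementTree` as a stand-in for lxml (which offers a similar `etree.fromstring`/`etree.tostring` interface but is a separate install); `TagRecorder` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagRecorder(HTMLParser):
    """Hypothetical helper: records start-tag names as HTMLParser reports them."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes the tag name already converted to lower case.
        self.tags.append(tag)


recorder = TagRecorder()
recorder.feed("<A></A>")
recorder.close()
print(recorder.tags)  # ['a']

# An XML parser keeps the tag name exactly as written...
root = ET.fromstring("<A></A>")
print(root.tag)  # A

# ...and keeps <a> nested inside <b> instead of closing the outer <a>.
nested = ET.fromstring("<a><b><a></a></b></a>")
serialized = ET.tostring(nested, encoding="unicode", short_empty_elements=False)
print(serialized)  # <a><b><a></a></b></a>
```

`short_empty_elements=False` is an ElementTree option that writes empty elements as `<a></a>` rather than `<a />`, so the round trip matches the input text.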
