Re: Error parsing (some) large XML files

Leonard Richardson

Oct 11, 2012, 2:20:53 PM
to beauti...@googlegroups.com
This is a bug in Beautiful Soup. I've filed it for you:

https://bugs.launchpad.net/beautifulsoup/+bug/1065617

A fix will be present in the next release.

Leonard

On Thu, Oct 11, 2012 at 10:29 AM, Matthew Wilkens <mattw...@gmail.com> wrote:
> I'm having what seems like an odd problem parsing the attached XML file.
> When I try this:
>
> from bs4 import BeautifulSoup
> soup = BeautifulSoup(open('problem.xml', 'r'), 'xml')
>
> I get the following error:
>
> Traceback (most recent call last):
>   File "test-bs-xml.py", line 2, in <module>
>     soup = BeautifulSoup(open('problem.xml', 'r'), 'xml')
>   File "/Library/Python/2.7/site-packages/bs4/__init__.py", line 172, in __init__
>     self._feed()
>   File "/Library/Python/2.7/site-packages/bs4/__init__.py", line 185, in _feed
>     self.builder.feed(self.markup)
>   File "/Library/Python/2.7/site-packages/bs4/builder/_lxml.py", line 85, in feed
>     self.parser.close()
>   File "parser.pxi", line 1164, in lxml.etree._FeedParser.close (src/lxml/lxml.etree.c:79835)
>   File "parsertarget.pxi", line 126, in lxml.etree._TargetParserContext._handleParseResult (src/lxml/lxml.etree.c:88881)
>   File "lxml.etree.pyx", line 282, in lxml.etree._ExceptionContext._raise_if_stored (src/lxml/lxml.etree.c:7469)
>   File "saxparser.pxi", line 170, in lxml.etree._handleSaxStart (src/lxml/lxml.etree.c:84109)
>   File "parsertarget.pxi", line 73, in lxml.etree._PythonSaxParserTarget._handleSaxStart (src/lxml/lxml.etree.c:88236)
>   File "/Library/Python/2.7/site-packages/bs4/builder/_lxml.py", line 126, in start
>     attr = NamespacedAttribute(nsprefix, attr, namespace)
>   File "/Library/Python/2.7/site-packages/bs4/element.py", line 30, in __new__
>     obj = unicode.__new__(cls, prefix + ":" + name)
> TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
>
> Things are working fine on (some) other XML files, so it seems to be
> something about the content of this one (and many others like it in a set
> I'm working with). A Unicode problem, maybe? Any idea what the issue could
> be? Dumbness on my part, I suspect.
>
> I'm using Beautiful Soup 4.1.3 and lxml 2.3.6 under Python 2.7.2 on Mac OS X
> 10.8.2.
>
> Many thanks!
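
For readers following along: the last two frames of the traceback show a namespace prefix of None being concatenated with ":" and the attribute name. The sketch below reproduces that failure in isolation and shows one way to guard against it. It is written for Python 3 (str instead of unicode), and the names naive_qualified_name and SketchNamespacedAttribute are hypothetical stand-ins, not Beautiful Soup's code and not necessarily how the bug was fixed upstream.

```python
def naive_qualified_name(prefix, name):
    # Mirrors the expression in the traceback: prefix + ":" + name.
    # Raises TypeError when prefix is None.
    return prefix + ":" + name


class SketchNamespacedAttribute(str):
    """Hypothetical stand-in for a namespaced attribute name.

    Assumption: when no prefix is known, the bare attribute name is an
    acceptable string value.
    """

    def __new__(cls, prefix, name, namespace=None):
        if prefix is None:
            # No prefix available, so do not attempt the concatenation.
            obj = str.__new__(cls, name)
        else:
            obj = str.__new__(cls, prefix + ":" + name)
        obj.prefix = prefix
        obj.name = name
        obj.namespace = namespace
        return obj


if __name__ == "__main__":
    try:
        naive_qualified_name(None, "id")
    except TypeError as exc:
        print("naive version fails:", exc)

    print(SketchNamespacedAttribute(None, "id"))
    print(SketchNamespacedAttribute("xml", "lang"))
```

Until a release with the fix is installed, a guard along these lines is only an illustration of the failure mode; patching an installed copy of the library by hand is a judgment call.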

Matthew Wilkens

Oct 16, 2012, 4:27:41 PM
to beauti...@googlegroups.com, leon...@segfault.org
Excellent - thanks very much!