I'm having what seems like an odd problem parsing the attached XML file. When I try this:
    from bs4 import BeautifulSoup
    soup = BeautifulSoup(open('problem.xml', 'r'), 'xml')
I get the following error:
    Traceback (most recent call last):
      File "test-bs-xml.py", line 2, in <module>
        soup = BeautifulSoup(open('problem.xml', 'r'), 'xml')
      File "/Library/Python/2.7/site-packages/bs4/__init__.py", line 172, in __init__
        self._feed()
      File "/Library/Python/2.7/site-packages/bs4/__init__.py", line 185, in _feed
        self.builder.feed(self.markup)
      File "/Library/Python/2.7/site-packages/bs4/builder/_lxml.py", line 85, in feed
        self.parser.close()
      File "parser.pxi", line 1164, in lxml.etree._FeedParser.close (src/lxml/lxml.etree.c:79835)
      File "parsertarget.pxi", line 126, in lxml.etree._TargetParserContext._handleParseResult (src/lxml/lxml.etree.c:88881)
      File "lxml.etree.pyx", line 282, in lxml.etree._ExceptionContext._raise_if_stored (src/lxml/lxml.etree.c:7469)
      File "saxparser.pxi", line 170, in lxml.etree._handleSaxStart (src/lxml/lxml.etree.c:84109)
      File "parsertarget.pxi", line 73, in lxml.etree._PythonSaxParserTarget._handleSaxStart (src/lxml/lxml.etree.c:88236)
      File "/Library/Python/2.7/site-packages/bs4/builder/_lxml.py", line 126, in start
        attr = NamespacedAttribute(nsprefix, attr, namespace)
      File "/Library/Python/2.7/site-packages/bs4/element.py", line 30, in __new__
        obj = unicode.__new__(cls, prefix + ":" + name)
    TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
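The last two frames locate the crash: NamespacedAttribute.__new__ builds its value as prefix + ":" + name, and the lxml builder passed None for nsprefix. The sketch below reproduces that expression and shows one way a constructor could guard against a missing prefix. It is illustrative only: GuardedNamespacedAttribute and concat_like_bs4 are hypothetical names, not part of bs4, and the sketch subclasses str where bs4 4.1.3 subclasses Python 2's unicode.

```python
class GuardedNamespacedAttribute(str):
    """Hypothetical stand-in for bs4's NamespacedAttribute that tolerates prefix=None."""

    def __new__(cls, prefix, name, namespace=None):
        if prefix is None:
            # No prefix known for this namespace: use the bare local name
            # instead of evaluating None + ":" + name.
            obj = str.__new__(cls, name)
        else:
            obj = str.__new__(cls, prefix + ":" + name)
        obj.prefix = prefix
        obj.name = name
        obj.namespace = namespace
        return obj


def concat_like_bs4(prefix, name):
    # Same expression as bs4/element.py line 30 in the traceback.
    return prefix + ":" + name


try:
    concat_like_bs4(None, "lang")
except TypeError as exc:
    # Prints the same message as the last line of the traceback.
    print("TypeError: %s" % exc)

attr = GuardedNamespacedAttribute(None, "lang", "http://www.w3.org/XML/1998/namespace")
print("%s -> %s" % (attr, attr.namespace))
```

Falling back to the bare local name keeps parsing going at the cost of dropping the prefix from the attribute's string form; the namespace URI is still stored on the object.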
Things work fine on (some) other XML files, so it seems to be something about the content of this one (and many others like it in a set I'm working with). A Unicode problem, maybe? Any idea what the issue could be? Dumbness on my part, I suspect.
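Because the crash happens while building a namespaced attribute with nsprefix set to None, one way to narrow down "something about the content" is to scan the file for attributes whose namespace was never declared with an xmlns: prefix. A plausible culprit (an assumption, not confirmed in this thread) is the built-in xml: prefix, as in xml:lang or xml:id: the Namespaces in XML spec binds it implicitly, so it never appears as a declaration in the document. The sketch uses only the standard library's xml.etree.ElementTree.iterparse; the function name undeclared_attribute_namespaces is hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def undeclared_attribute_namespaces(source):
    """Return (element tag, attribute) pairs whose namespace has no declared prefix.

    `source` is a filename or a binary file object. For simplicity, namespace
    declarations are treated as document-wide rather than scoped to the
    element that declares them.
    """
    declared = set()
    problems = []
    for event, item in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            # item is a (prefix, uri) tuple, reported before the element's "start".
            declared.add(item[1])
        else:
            for attr in item.attrib:
                # ElementTree reports namespaced attributes as "{uri}localname".
                if attr.startswith("{"):
                    uri = attr[1:].split("}", 1)[0]
                    if uri not in declared:
                        problems.append((item.tag, attr))
    return problems


sample = b'<doc xmlns:a="http://example.com/a"><p a:x="1" xml:lang="en">hi</p></doc>'
for tag, attr in undeclared_attribute_namespaces(io.BytesIO(sample)):
    print(tag, attr)
```

Running it as undeclared_attribute_namespaces('problem.xml') on a failing file and on one that parses cleanly would show whether such attributes separate the two groups. If only the failing files contain xml:lang or xml:id, that would support the hypothesis without proving it.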
I'm using Beautiful Soup 4.1.3 and lxml 2.3.6 under Python 2.7.2 on Mac OS X 10.8.2.
On Thu, Oct 11, 2012 at 10:29 AM, Matthew Wilkens <mattwilk...@gmail.com> wrote:
> Many thanks!
> --
> You received this message because you are subscribed to the Google Groups
> "beautifulsoup" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/beautifulsoup/-/29poWwuOHgAJ.
> To post to this group, send email to beautifulsoup@googlegroups.com.
> To unsubscribe from this group, send email to
> beautifulsoup+unsubscribe@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/beautifulsoup?hl=en.