UnicodeWarning

Pat

Apr 18, 2008, 6:48:56 PM

to beautifulsoup

Hey everyone, I'm new to BS, so I picked up the documentation and
started trying the examples for parsing XML.

I get all of 3 lines into the script when I get this warning:
"
BeautifulSoup.py:1611: UnicodeWarning: Unicode equal comparison failed
to convert both arguments to Unicode - interpreting them as being
unequal
elif data[:3] == '\xef\xbb\xbf':
BeautifulSoup.py:1614: UnicodeWarning: Unicode equal comparison failed
to convert both arguments to Unicode - interpreting them as being
unequal
elif data[:4] == '\x00\x00\xfe\xff':
BeautifulSoup.py:1617: UnicodeWarning: Unicode equal comparison failed
to convert both arguments to Unicode - interpreting them as being
unequal
elif data[:4] == '\xff\xfe\x00\x00':
"

To get there I did the following. Can anyone help me discern what the heck
I did wrong?

xml = "http://gd2.mlb.com/components/game/mlb/year_2008/month_04/day_08/gid_2008_04_08_atlmlb_colmlb_1/inning/inning_1.xml"

import urllib
xmlpage = urllib.urlopen(xml)
from BeautifulSoup import BeautifulStoneSoup
soup = BeautifulStoneSoup(xmlpage)

That's where the warning comes in.

On a lark I tried using urllib2, and that didn't trigger the warning. Why
does urllib cause all the drama in BS while urllib2 doesn't?
Thanks
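
For context, the three comparisons quoted in the warning are byte-order-mark (BOM) checks: '\xef\xbb\xbf' is the UTF-8 BOM, and the two four-byte values are the UTF-32 big-endian and little-endian BOMs. Python 2 emits this UnicodeWarning when a unicode object is compared with a byte string that cannot be decoded as ASCII, which suggests the parser was holding text rather than raw bytes at that point. Below is a minimal sketch of this kind of BOM sniffing in current Python 3, standard library only. The name sniff_bom is made up for illustration, and this is not BeautifulSoup's actual code.

```python
import codecs

# Hypothetical helper, not part of BeautifulSoup: map the three BOMs
# that appear in the warning to codec names.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # b'\xef\xbb\xbf'
    (codecs.BOM_UTF32_BE, "utf-32-be"),  # b'\x00\x00\xfe\xff'
    (codecs.BOM_UTF32_LE, "utf-32-le"),  # b'\xff\xfe\x00\x00'
]

def sniff_bom(data):
    """Return the codec name implied by a leading BOM, or None."""
    if not isinstance(data, bytes):
        # Already-decoded text has no byte-level BOM to inspect.
        return None
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None

print(sniff_bom(codecs.BOM_UTF8 + b"<inning/>"))
print(sniff_bom(b"<inning/>"))
```

The isinstance guard is the design point here: a BOM comparison is only meaningful on bytes, so the sketch declines to guess when handed text.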

Kent Johnson

Apr 19, 2008, 12:00:29 PM

to beauti...@googlegroups.com

Pat wrote:
> import urllib
> xmlpage = urllib.urlopen(xml)

Try
xmlpage = urllib.urlopen(xml).read()

Kent
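
For readers following along today, here is a rough sketch of the same idea in current Python 3. The assumptions are loud ones: io.BytesIO stands in for the object urlopen returns so that nothing touches the network, the one-line document is made up, and the standard library's xml.etree.ElementTree is swapped in for BeautifulStoneSoup. The only point is that .read() turns the file-like response into a plain byte string before it reaches the parser.

```python
import io
import xml.etree.ElementTree as ET

# Stand-in for the object urlopen() returns (assumption: a real
# response is file-like in the same way, with a .read() method).
fake_response = io.BytesIO(b'<inning num="1"><pitch id="4" type="X"/></inning>')

# Read the whole body first, as suggested above, then parse the bytes.
xml_bytes = fake_response.read()
root = ET.fromstring(xml_bytes)

print(type(xml_bytes).__name__)  # bytes
print(root.tag, root.attrib["num"])
```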

PATRICK CAIN

Apr 19, 2008, 12:54:30 PM

to beauti...@googlegroups.com

Thanks, Kent.

I've never used a parser before (I'm totally new at all of this) and I'm trying to scrape XML pages that look like this:

<pitch des="In play, no out" id="4" type="X" x="120.17" y="147.65" sv_id="080408_183820" start_speed="96.8" end_speed="88.1" sz_top="3.221" sz_bot="1.58" pfx_x="-4.073" pfx_z="5.789" px="-0.537" pz="2.618" x0="-1.786" y0="50.0" z0="6.269" vx0="4.94" vy0="-141.752" vz0="-6.468" ax="-8.164" ay="36.947" az="-20.499" break_y="23.7" break_angle="17.4" break_length="4.7" pitch_type="FA" type_confidence="1.2611603255950259"/>

My approach to date has been doing a lot of .split() and then knowing each variable's index. Everyone I tell looks at me like I'm crazy and says to learn a parser.

Can you push me in a direction with BS that might help me?

Thanks!

Kent Johnson

Apr 19, 2008, 2:50:13 PM

to beauti...@googlegroups.com

PATRICK CAIN wrote:
> thanks kent.
>
> i've never used a parser before (i'm totally new at all of this) and i'm
> trying to scrape xml pages that look like this....
>
> <pitch des="In play, no out" id="4" type="X" x="120.17" y="147.65"
> sv_id="080408_183820" start_speed="96.8" end_speed="88.1" sz_top="3.221"
> sz_bot="1.58" pfx_x="-4.073" pfx_z="5.789" px="-0.537" pz="2.618"
> x0="-1.786" y0="50.0" z0="6.269" vx0="4.94" vy0="-141.752" vz0="-6.468"
> ax="-8.164" ay="36.947" az="-20.499" break_y="23.7" break_angle="17.4"
> break_length="4.7" pitch_type="FA" type_confidence="1.2611603255950259"/>
>
> my approach to date has been doing a lot of .split() and the knowing
> each variable's index...everyone I tell looks at me like im crazy and
> says to learn a parser...
>
> can you push me in a direction w/ BS that might help me?

In [1]: from BeautifulSoup import BeautifulStoneSoup
In [2]: data = '''<pitch des="In play, no out" id="4" type="X"
x="120.17" y="147.65" sv_id="080408_183820" start_speed="96.8"
end_speed="88.1" sz_top="3.221" sz_bot="1.58" pfx_x="-4.073"
pfx_z="5.789" px="-0.537" pz="2.618" x0="-1.786" y0="50.0" z0="6.269"
vx0="4.94" vy0="-141.752" vz0="-6.468" ax="-8.164" ay="36.947"
az="-20.499" break_y="23.7" break_angle="17.4" break_length="4.7"
pitch_type="FA" type_confidence="1.2611603255950259"/>'''
In [3]: soup=BeautifulStoneSoup(data)
In [5]: soup.pitch
Out[5]: <pitch des="In play, no out" id="4" type="X" x="120.17"
y="147.65" sv_id="080408_183820" start_speed="96.8" end_speed="88.1"
sz_top="3.221" sz_bot="1.58" pfx_x="-4.073" pfx_z="5.789" px="-0.537"
pz="2.618" x0="-1.786" y0="50.0" z0="6.269" vx0="4.94" vy0="-141.752"
vz0="-6.468" ax="-8.164" ay="36.947" az="-20.499" break_y="23.7"
break_angle="17.4" break_length="4.7" pitch_type="FA"
type_confidence="1.2611603255950259"></pitch>
In [6]: soup.pitch['des']
Out[6]: u'In play, no out'
In [7]: soup.pitch['x']
Out[7]: u'120.17'
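
Building on the session above, here is a rough sketch of how name-based attribute lookup can replace the .split()-and-index approach when there are many pitches. It is written for current Python 3 and swaps in the standard library's xml.etree.ElementTree rather than BeautifulStoneSoup, so treat it as an illustration of the idea and not as the code from this thread. The <atbat> wrapper, the shortened attribute list, the NUMERIC tuple, and the pitch_to_dict helper are all assumptions made for the example. In BeautifulSoup 3 the analogous loop would run over soup.findAll('pitch').

```python
import xml.etree.ElementTree as ET

# Shortened copy of the <pitch> element from the thread, wrapped in a
# made-up <atbat> parent so the document has a single root element.
data = '''<atbat>
<pitch des="In play, no out" id="4" type="X" x="120.17" y="147.65"
 start_speed="96.8" end_speed="88.1" pitch_type="FA"/>
</atbat>'''

# Attributes to convert to float (an assumption; extend as needed).
NUMERIC = ("x", "y", "start_speed", "end_speed")

def pitch_to_dict(elem):
    """Copy an element's attributes into a dict, converting known numeric ones."""
    row = dict(elem.attrib)  # attribute name -> string value
    for name in NUMERIC:
        if name in row:
            row[name] = float(row[name])
    return row

root = ET.fromstring(data)
pitches = [pitch_to_dict(p) for p in root.iter("pitch")]

first = pitches[0]
print(first["des"])          # In play, no out
print(first["start_speed"])  # 96.8
```

Listing the numeric fields explicitly, instead of trying float() on every value, keeps identifier-like attributes such as id as strings.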