PATRICK CAIN wrote:
> thanks kent.
> i've never used a parser before (i'm totally new at all of this) and i'm
> trying to scrape xml pages that look like this....
> <pitch des="In play, no out" id="4" type="X" x="120.17" y="147.65"
> sv_id="080408_183820" start_speed="96.8" end_speed="88.1" sz_top="3.221"
> sz_bot="1.58" pfx_x="-4.073" pfx_z="5.789" px="-0.537" pz="2.618"
> x0="-1.786" y0="50.0" z0="6.269" vx0="4.94" vy0="-141.752" vz0="-6.468"
> ax="-8.164" ay="36.947" az="-20.499" break_y="23.7" break_angle="17.4"
> break_length="4.7" pitch_type="FA" type_confidence="1.2611603255950259"/>
> my approach to date has been doing a lot of .split() and then knowing
> each variable's index...everyone I tell looks at me like I'm crazy and
> says to learn a parser...
> can you push me in a direction w/ BS that might help me?
In [1]: from BeautifulSoup import BeautifulStoneSoup
In [2]: data = '''<pitch des="In play, no out" id="4" type="X"
x="120.17" y="147.65" sv_id="080408_183820" start_speed="96.8"
end_speed="88.1" sz_top="3.221" sz_bot="1.58" pfx_x="-4.073"
pfx_z="5.789" px="-0.537" pz="2.618" x0="-1.786" y0="50.0" z0="6.269"
vx0="4.94" vy0="-141.752" vz0="-6.468" ax="-8.164" ay="36.947"
az="-20.499" break_y="23.7" break_angle="17.4" break_length="4.7"
pitch_type="FA" type_confidence="1.2611603255950259"/>'''
In [3]: soup=BeautifulStoneSoup(data)
In [5]: soup.pitch
Out[5]: <pitch des="In play, no out" id="4" type="X" x="120.17"
y="147.65" sv_id="080408_183820" start_speed="96.8" end_speed="88.1"
sz_top="3.221" sz_bot="1.58" pfx_x="-4.073" pfx_z="5.789" px="-0.537"
pz="2.618" x0="-1.786" y0="50.0" z0="6.269" vx0="4.94" vy0="-141.752"
vz0="-6.468" ax="-8.164" ay="36.947" az="-20.499" break_y="23.7"
break_angle="17.4" break_length="4.7" pitch_type="FA"
type_confidence="1.2611603255950259"></pitch>
In [6]: soup.pitch['des']
Out[6]: u'In play, no out'
In [7]: soup.pitch['x']
Out[7]: u'120.17'
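The other attributes can be read the same way, e.g. soup.pitch['start_speed'].
Note that every value comes back as a string.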
Kent