A working XML parsing example!

mark mclaren

Apr 11, 2008, 4:55:13 AM
to Google App Engine
I finally got an XML parsing example working! I appreciate this is
very simple stuff but I am not a native Python programmer (apologies
for this).

At present, expat seems to be required to perform typical XML parsing
using minidom or ElementTree methods. However, ElementTree suggests
using a non-expat parser, namely SimpleXMLTreeBuilder.

As I could not find SimpleXMLTreeBuilder in my Python 2.5 installation
or in the Google App Engine SDK, I located it using Google Code Search
(http://www.google.com/codesearch?q=SimpleXMLTreeBuilder).

I downloaded it and tweaked it slightly, replacing "import ElementTree"
with "from xml.etree import ElementTree".
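
A minimal sketch of that import tweak, written so the module loads either way. The fallback to a top-level ElementTree module is an assumption about how an older standalone copy might be laid out on the path; adjust it to match your setup.

```python
# Prefer the ElementTree that ships with Python 2.5 and later in xml.etree.
try:
    from xml.etree import ElementTree
except ImportError:
    # Assumption: an older standalone ElementTree module is importable
    # under this top-level name.
    import ElementTree

# Quick check that the module is usable: parse a tiny document.
root = ElementTree.fromstring('<a><b>hi</b></a>')
print(root.findtext('b'))
```

The check at the end only confirms that whichever module was imported can parse and query a small document.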

I then set about reproducing the Yahoo! Developer Network's weather
example from here:

http://developer.yahoo.com/python/python-xml.html

This is my code:

from google.appengine.api import urlfetch
from xml.etree import ElementTree
import SimpleXMLTreeBuilder

WEATHER_URL = 'http://xml.weather.yahoo.com/forecastrss?p=%s'
WEATHER_NS = 'http://xml.weather.yahoo.com/ns/rss/1.0'

def parse(url):
    # Fetch the feed and build an element tree without expat.
    result = urlfetch.fetch(url)
    if result.status_code == 200:
        parser = SimpleXMLTreeBuilder.TreeBuilder()
        parser.feed(result.content)
        return parser.close()
    # Note: falls through and returns None on any non-200 response.

def weather_for_zip(zip_code):
    url = WEATHER_URL % zip_code
    rss = parse(url)
    forecasts = []
    # ElementTree addresses namespaced tags as {namespace}name.
    for element in rss.findall('channel/item/{%s}forecast' % WEATHER_NS):
        forecasts.append({
            'date': element.get('date'),
            'low': element.get('low'),
            'high': element.get('high'),
            'condition': element.get('text')
        })
    ycondition = rss.find('channel/item/{%s}condition' % WEATHER_NS)
    return {
        'current_condition': ycondition.get('text'),
        'current_temp': ycondition.get('temp'),
        'forecasts': forecasts,
        'title': rss.findtext('channel/title')
    }

print 'Content-Type: text/plain'
print ''
print weather_for_zip('94089')

and it works!

http://yahoo-weather.appspot.com/
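
For trying the namespaced lookups without the SDK or a network connection, here is a minimal sketch that uses the standard-library ElementTree rather than SimpleXMLTreeBuilder. The sample feed is made up and trimmed to the elements the code reads; the real Yahoo! feed contains more.

```python
from xml.etree import ElementTree

WEATHER_NS = 'http://xml.weather.yahoo.com/ns/rss/1.0'

# Hypothetical sample data, trimmed to the elements used below.
SAMPLE = '''<rss version="2.0"
     xmlns:yweather="http://xml.weather.yahoo.com/ns/rss/1.0">
  <channel>
    <title>Sample Weather</title>
    <item>
      <yweather:condition text="Fair" temp="60"/>
      <yweather:forecast date="11 Apr 2008" low="50" high="70" text="Sunny"/>
      <yweather:forecast date="12 Apr 2008" low="52" high="72" text="Cloudy"/>
    </item>
  </channel>
</rss>'''

# fromstring returns the root <rss> element, so paths start at "channel".
rss = ElementTree.fromstring(SAMPLE)

# ElementTree spells a namespaced tag as {namespace-uri}localname.
forecast_path = 'channel/item/{%s}forecast' % WEATHER_NS
forecasts = [
    {'date': el.get('date'), 'low': el.get('low'),
     'high': el.get('high'), 'condition': el.get('text')}
    for el in rss.findall(forecast_path)
]
condition = rss.find('channel/item/{%s}condition' % WEATHER_NS)
title = rss.findtext('channel/title')

print(title)
print(condition.get('text'))
print(forecasts)
```

The yweather prefix in the XML is only shorthand; the lookups match on the namespace URI, which is why WEATHER_NS has to agree with the xmlns declaration exactly.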

Hope this helps!

Mark

David

Apr 12, 2008, 11:11:17 PM
to Google App Engine
Cool, Mark, thanks. I am looking into this now and you may have just
given me a big head start!

Srinath

May 14, 2008, 3:05:03 AM
to Google App Engine, mark.m...@gmail.com
Hi,

It's returning a 500 internal server error for me, for some reason.

-
Srinath

Joel Odom

May 14, 2008, 7:59:26 AM
to Google App Engine
minidom now works in SDK 1.0.2!




James Levy

May 20, 2008, 8:34:11 AM
to Google App Engine
SimpleXMLTreeBuilder is no longer supported in the latest SDK. It
returns an "object is unsubscriptable" error.

mark mclaren

May 21, 2008, 4:28:49 AM
to Google App Engine
Update: I hope this is still useful for people searching for a working
example!

I can confirm that minidom now works in SDK 1.0.2. As I understand
it, this is because pyexpat is now in the GAE whitelist of C
libraries:

http://code.google.com/appengine/kb/libraries.html

I am using SDK 1.0.2 on Windows XP, and for this to work I needed to
patch urlfetch_stub.py because the development server was discarding
my URL parameters:

http://code.google.com/p/googleappengine/issues/detail?id=341
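
This is not the patch from that issue, just a small sanity check that can help when debugging it: a sketch that shows which path and query string should reach the remote host for a given URL. The request_target helper is hypothetical and not part of the SDK.

```python
try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit      # Python 2

WEATHER_URL = 'http://xml.weather.yahoo.com/forecastrss?p=%s'

def request_target(url):
    """Return the path plus query string that should be sent to the host.

    Hypothetical debugging helper; not part of the App Engine SDK.
    """
    parts = urlsplit(url)
    target = parts.path or '/'
    if parts.query:
        # Dropping this part is what "discarding URL parameters" looks like.
        target += '?' + parts.query
    return target

print(request_target(WEATHER_URL % '94089'))
```

If the feed that comes back ignores the zip code, comparing this value with what the development server actually requests is one way to confirm that the query string is being lost.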

You no longer need to upload a copy of SimpleXMLTreeBuilder (although
this still works), and the code using minidom now looks like this:

from google.appengine.api import urlfetch
from xml.dom import minidom

WEATHER_URL = 'http://xml.weather.yahoo.com/forecastrss?p=%s'
WEATHER_NS = 'http://xml.weather.yahoo.com/ns/rss/1.0'


def parse(url):
    # Fetch the feed and parse it into a DOM document.
    result = urlfetch.fetch(url)
    if result.status_code == 200:
        return minidom.parseString(result.content)
    # Note: falls through and returns None on any non-200 response.

def weather_for_zip(zip_code):
    url = WEATHER_URL % zip_code
    dom = parse(url)
    forecasts = []
    # minidom takes the namespace URI and the local name separately.
    for node in dom.getElementsByTagNameNS(WEATHER_NS, 'forecast'):
        forecasts.append({
            'date': node.getAttribute('date'),
            'low': node.getAttribute('low'),
            'high': node.getAttribute('high'),
            'condition': node.getAttribute('text')
        })
    return {
        'forecasts': forecasts,
        'title': dom.getElementsByTagName('title')[0].firstChild.data
    }
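
For anyone who wants to try the minidom calls without the SDK or a network connection, here is a minimal sketch that runs them against an inline document. The sample feed is made up and trimmed to the elements the code reads; the real Yahoo! feed contains more, including several title elements.

```python
from xml.dom import minidom

WEATHER_NS = 'http://xml.weather.yahoo.com/ns/rss/1.0'

# Hypothetical sample data, trimmed to the elements used below.
SAMPLE = '''<rss version="2.0"
     xmlns:yweather="http://xml.weather.yahoo.com/ns/rss/1.0">
  <channel>
    <title>Sample Weather</title>
    <item>
      <yweather:forecast date="11 Apr 2008" low="50" high="70" text="Sunny"/>
      <yweather:forecast date="12 Apr 2008" low="52" high="72" text="Cloudy"/>
    </item>
  </channel>
</rss>'''

dom = minidom.parseString(SAMPLE)

# Match on namespace URI plus local name, anywhere in the document.
forecasts = [
    {'date': node.getAttribute('date'), 'low': node.getAttribute('low'),
     'high': node.getAttribute('high'), 'condition': node.getAttribute('text')}
    for node in dom.getElementsByTagNameNS(WEATHER_NS, 'forecast')
]

# getElementsByTagName returns matches in document order, so [0] is the
# first <title> in the feed; firstChild is its text node.
title = dom.getElementsByTagName('title')[0].firstChild.data

print(title)
print(forecasts)
```

Unlike the ElementTree path syntax, getElementsByTagNameNS searches the whole document, so it does not need the channel/item prefix.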