Importing external data [Python]

benji

Oct 27, 2009, 12:35:23 PM
to Google App Engine
Hi,

I am brand new to GAE and Python and have very limited programming
skills thus far, so this is a cry for help after spending the last few
days trawling google for a solution :(

I want to import, parse and embed an RSS feed from a Google
spreadsheet. I have done this before in a PHP script that simply acts
as a proxy to parse RSS into HTML and used a Server Side Include to
embed the output of the PHP script - nice and easy and SEO friendly.

So far I only have a very simple static setup on GAE, as
follows:

handlers:
- url: /
  static_files: assets/index.html
  upload: assets/index.html

- url: /
  static_dir: assets


I thought I might have 3 potential solutions:

1) somehow point a url at an external domain that can handle PHP -
thought it was worth a try in the YAML file - wishful thinking!!

i.e.

- url: /PageNeedingRSSProxyData
  static_dir: http://otherdomain.com/PageWithRSSProxyData

2) Grabbing the output of the external PHP file with the URL Fetch API and
caching it, then including the file in the HTML using Django templates
- I looked at this and got very lost.

3) Grabbing the RSS feed directly, parsing it with Python and
embedding it in a web page - again got very lost.


Can anyone point me in the right direction? Or am I too far out of my
depth?

Any help on this would be awesome and very much appreciated.

Cheers

Ben


ryan baldwin

Oct 28, 2009, 10:27:20 AM
to google-a...@googlegroups.com
Point 3 seems like the proper choice (along with some caching). Where are you getting lost? In how to parse with Python, how to cache, how to render in a template, or all of the above?

- ryan.

benji

Oct 28, 2009, 10:47:12 AM
to Google App Engine
Thanks Ryan,

I thought point 3 seemed like the proper direction to go in, thanks.

I can handle putting together a simple Django template I would say -
seems straightforward enough. However I'm stuck from there on in!
i.e. the Python-based importing of the RSS feed, caching, parsing to
HTML elements and output into the template...

I think URL Fetch seems to be the way to go from what I can work out
from the documentation, as this has a built-in caching facility - right?

I got this functionality working with JSON using JS code provided by
google code examples - but the requirement is to embed the data for
SEO purposes and of course implementing a cache for
unobtrusiveness....

where do I start?

Thanks again for the help, really very much appreciated!

Cheers

Ben

ryan baldwin

Oct 28, 2009, 10:59:20 AM
to google-a...@googlegroups.com
Yeah, URLFetch has some built-in caching, but that only caches the response of the URLFetch; it won't cache the parsing/processing of the RSS feed (which is the better thing for you to cache, in Memcache). The idea of Memcache is fairly simple: look for the item in memcache by some key that you know about; if the item isn't there, build it (in your case, urlfetch the RSS and parse it) and put it in memcache using the same key you tried to get it with. The Memcache pattern article can be found here: http://code.google.com/appengine/docs/python/memcache/usingmemcache.html#Pattern
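
A minimal sketch of that get-or-build pattern, not a definitive implementation. FakeMemcache is a hypothetical in-memory stand-in so the example runs anywhere; on App Engine you would presumably pass the memcache module from google.appengine.api instead, whose get(key) and set(key, value, time=...) calls have the same shape. The names get_feed_html and build_html, the key and the 10-minute lifetime are assumptions for illustration only.

```python
from time import time as now


class FakeMemcache(object):
    """Hypothetical in-memory stand-in with memcache-shaped get/set."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        # Returns None on a miss or when the entry has expired.
        value, expires_at = self._store.get(key, (None, None))
        if expires_at is not None and now() >= expires_at:
            self._store.pop(key, None)
            return None
        return value

    def set(self, key, value, time=0):
        # `time` is a lifetime in seconds; 0 means "no expiry" in this sketch.
        expires_at = (now() + time) if time else None
        self._store[key] = (value, expires_at)
        return True


def get_feed_html(cache, build_html, key="feed_html", lifetime=600):
    """Look the rendered HTML up by key; build and store it on a miss."""
    html = cache.get(key)
    if html is None:
        # In the real app this step would fetch the RSS and parse it.
        html = build_html()
        cache.set(key, html, time=lifetime)
    return html
```

The point of caching the finished HTML rather than the raw feed is that a cache hit skips the fetch and the parsing in one go; build_html only runs when the key is missing or has expired.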

As far as parsing/processing the RSS feed goes... there's a million ways to skin a dead cat. Personally I would be inclined to write an XSLT transformation that does the RSS to HTML for me, but if you're a beginner then XSLT is probably not a good option. There's an open-source RSS feed parser library for Python that you can use - http://www.feedparser.org/ - although I haven't used it. From a quick look, it seems to convert the feed into a dictionary. You could then pass this dict to whatever Django template you're using.
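
To make the "feed into dicts, dicts into HTML" idea concrete, here is a rough sketch that swaps in Python's standard xml.etree.ElementTree instead of feedparser or XSLT, purely so it has no third-party dependency. SAMPLE_RSS is made-up data, and parse_rss and items_to_html are hypothetical helper names; a real feed may use a different layout (Atom, for example), so treat the element paths as assumptions.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Made-up RSS 2.0 data standing in for the fetched feed body.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First &amp; foremost</title>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>Second row</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""


def parse_rss(xml_text):
    """Return a list of {'title': ..., 'link': ...} dicts, one per <item>."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.findall("channel/item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        })
    return items


def items_to_html(items):
    """Render the parsed items as an HTML list, escaping feed text."""
    rows = []
    for entry in items:
        rows.append("<li><a href=%s>%s</a></li>"
                    % (quoteattr(entry["link"]), escape(entry["title"])))
    return "<ul>\n%s\n</ul>" % "\n".join(rows)
```

The list returned by parse_rss is also the kind of plain data a template can loop over, so items_to_html could be dropped in favour of passing the list to the template directly.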

Those are some high level ideas for you. Hopefully it's enough to get you started.

Good luck!

- ryan.