Easy RSS Parsing with Ruby on Rails

frank_kany

Feb 2, 2010, 10:52:43 AM
to Nashville Ruby Group
I hope this helps someone else out. I couldn't find any good articles
when googling how to parse RSS feeds. Some were pretty vague and some
didn't even work. Combining pieces from several of them, here's what
worked for parsing a Craigslist RSS feed. If you can improve the
code, please do so and help us all out =).

***CONTROLLER***:
require 'rss'
require 'open-uri'

# URL or local file
source = "http://#{city}.craigslist.org/#{category}/index.rss"
# Read the raw content of the RSS feed
content = open(source) { |s| s.read }
# Passing false skips validation of the feed.
# An instance variable is used so that the view can see the parsed feed.
@craigslist_rss_feed = RSS::Parser.parse(content, false)

***VIEW***:
<!-- Craigslist Category Title -->
<h1>RSS title: <%= @craigslist_rss_feed.channel.title %></h1>
<br>

<!-- Load All Craigslist ads for City & Category -->
<% @craigslist_rss_feed.items.each do |rss_item| %>
<a href="<%= rss_item.link %>"><%= rss_item.title %></a><br>
<%= rss_item.description %><br><br>
<% end %>

Thanks,

Frank
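
For anyone who wants to try the parsing step without hitting Craigslist, here is a minimal sketch that feeds RSS::Parser an inline document instead of a URL. The sample feed, its titles, and the example.com links are all made up for illustration. Note also that on Ruby 3.0 and later, open-uri no longer patches the bare open call, so fetching a URL there would need URI.open(source) instead.

```ruby
require 'rss'

# Hypothetical sample feed, inlined so the sketch runs without network access.
SAMPLE_FEED = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <rss version="2.0">
    <channel>
      <title>Sample listings</title>
      <link>http://example.com/</link>
      <description>Made-up feed for illustration</description>
      <item>
        <title>First ad</title>
        <link>http://example.com/ads/1</link>
        <description>Description of the first ad</description>
      </item>
      <item>
        <title>Second ad</title>
        <link>http://example.com/ads/2</link>
        <description>Description of the second ad</description>
      </item>
    </channel>
  </rss>
XML

# The second argument (false) turns off validation, as in the post above.
feed = RSS::Parser.parse(SAMPLE_FEED, false)

puts "RSS title: #{feed.channel.title}"
feed.items.each do |item|
  puts "#{item.title} -> #{item.link}"
end
```

The same channel.title, items, link, title, and description accessors are the ones the view in the post relies on.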

Daniel Nelson

Feb 2, 2010, 11:27:22 AM
to nashville-...@googlegroups.com
"Programming Ruby" by the Pragmatic Programmers (the "pickaxe" book)
has a straightforward code snippet on the RSS page of its Standard
Library section (p. 728 in the second printed edition, which covers
Ruby 1.8).

Ruby 1.8:
http://pragprog.com/titles/ruby/programming-ruby

Ruby 1.9:
http://www.pragprog.com/titles/ruby3/programming-ruby-1-9

If you are going to parse RSS in the view, you should use a helper.
Also, it generally isn't a good idea to hit the RSS feed every time a
page is visited. Cache it and only check it every so often (how often
depends on the nature of the feed and how time-sensitive it is).

Cheers,

Daniel
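
The "cache it and only check it every so often" advice can be sketched as a small time-based cache. This is only an illustration, not anyone's production code: TimedCache, its fetch method, and the 15-minute lifetime are hypothetical, and the block stands in for the real HTTP request and parse. It also lives in process memory, so each Rails process would keep its own copy.

```ruby
# Minimal time-based cache: the block runs only when the stored value is
# missing or older than ttl_seconds. All names here are hypothetical.
class TimedCache
  def initialize(ttl_seconds, clock: -> { Time.now })
    @ttl_seconds = ttl_seconds
    @clock = clock          # injectable so the expiry logic can be tested
    @entries = {}           # key => [value, fetched_at]
  end

  def fetch(key)
    value, fetched_at = @entries[key]
    now = @clock.call
    if fetched_at.nil? || now - fetched_at >= @ttl_seconds
      value = yield                   # the expensive remote call
      @entries[key] = [value, now]
    end
    value
  end
end

remote_calls = 0
cache = TimedCache.new(15 * 60)

2.times do
  cache.fetch("nashville/sss") do
    remote_calls += 1   # stands in for the real HTTP request and parse
    "<rss>...</rss>"
  end
end

puts remote_calls # the second fetch is served from the cache
```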

andrew mcelroy

Feb 2, 2010, 11:32:49 AM
to nashville-...@googlegroups.com
This will certainly work, but check this out:

http://stackoverflow.com/questions/214590/parsing-atom-rss-in-ruby-rails

My initial gut reaction was: why isn't this a job for simple-rss or Nokogiri?
Hpricot would also have been useful.
 
Andrew

Frank Kany

Feb 2, 2010, 11:51:07 AM
to nashville-...@googlegroups.com
I appreciate the feedback. What are the advantages of putting the code in a Helper instead of the Controller?

By 'cache', do you mean something like the following? If not, how should it be coded?
source ||= "http://#{city}.craigslist.org/#{category}/index.rss"

Thanks,

Frank

Greg Donald

Feb 2, 2010, 11:55:12 AM
to nashville-...@googlegroups.com
On Tue, Feb 2, 2010 at 10:51 AM, Frank Kany <fran...@gmail.com> wrote:
> I appreciate the feedback.  What are the advantages of putting the code in a
> Helper instead of the Controller?

None, unless you're using it in more than one place. Do you plan to
read more than one feed, for example?

> When you mean 'cache' do you mean use something like the following? If not,
> how should it be coded?
> source ||= "http://#{city}.craigslist.org/#{category}/index.rss"

Cache the RSS data, in a local file or in your database. The point is
to only make the remote call periodically.


--
Greg Donald
destiney.com | gregdonald.com
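
A rough sketch of the local-file variant, assuming the file's modification time is an acceptable freshness marker. The method name read_through_file_cache, the 900-second lifetime, and the file name are hypothetical, and the block stands in for the remote request.

```ruby
require 'tmpdir'

# Returns the cached body if cache_path is younger than max_age seconds;
# otherwise runs the block (the "remote call") and rewrites the file.
def read_through_file_cache(cache_path, max_age)
  if File.exist?(cache_path) && (Time.now - File.mtime(cache_path)) < max_age
    return File.read(cache_path)
  end

  body = yield
  File.write(cache_path, body)
  body
end

# Demo in a throwaway directory so nothing is left behind.
Dir.mktmpdir do |dir|
  path = File.join(dir, "craigslist.rss")
  calls = 0
  2.times do
    read_through_file_cache(path, 900) { calls += 1; "<rss>sample</rss>" }
  end
  puts calls # only the first pass runs the block
end
```

A database-backed version would follow the same shape, with a timestamp column playing the role of File.mtime.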

Frank Kany

Feb 2, 2010, 12:16:07 PM
to nashville-...@googlegroups.com
Yes, the app will be reading more than one feed, and from different RSS providers. I'll tweak the code and put it in a Helper.

The '||=' operator should only read the feed if the 'source' variable isn't already populated. But if this method isn't recommended over using a database, then I will convert the code to use a database.

I appreciate everyone's input and will put it to use.
 
Thanks,

Frank
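
A small sketch of what '||=' does and does not do here. It assigns only while the left-hand side is nil or false, so it memoizes for the lifetime of that variable; with source ||= "http://..." the thing being kept is the URL string, not the feed data. Rails also builds a new controller instance for each request, so an instance variable memoized this way does not survive to the next page view. RequestHandler and fetch_feed below are hypothetical stand-ins.

```ruby
fetches = 0
# Stands in for downloading and parsing the feed.
fetch_feed = -> { fetches += 1; "<rss>sample</rss>" }

# Hypothetical stand-in for a controller instance handling one request.
class RequestHandler
  def initialize(fetcher)
    @fetcher = fetcher
  end

  def feed
    # ||= evaluates the right-hand side only while @feed is nil or false.
    @feed ||= @fetcher.call
  end
end

one = RequestHandler.new(fetch_feed)
one.feed
one.feed # reuses @feed, no second fetch

two = RequestHandler.new(fetch_feed) # a later request gets a fresh object
two.feed # fetches again

puts fetches
```

This is why the replies suggest a file or database: the cached data has to live somewhere that outlasts a single request.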


Daniel Nelson

Feb 2, 2010, 12:39:51 PM
to nashville-...@googlegroups.com
I do everything in a helper because my code caches the HTML rendered
from the RSS. application_helper.rb contains a method named
rss_content(name) that is called by the view (this leaves the view
very clean). 'name' is used to query the database for cached RSS
content. It is also used to determine the URL for the RSS feed so that
I can update the feed from my admin panel.

When the cache is outdated, the RSS feed is read and parsed anew. It
is parsed into an array that is then passed to
render_content_from_rss_array, which lives in the controller-specific
helper. This gives us a form of duck typing: the call to
render_content_from_rss_array is made from application_helper.rb,
but the method can be defined in any of the controller-specific
helper files, so the custom rendering code stays in its proper helper.
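
The split between the shared helper and the controller-specific helper might look roughly like this in plain Ruby. It is a sketch of the idea only, not the code from the gist: ListingsHelper, FakeView, parsed_items_for, and the sample item are hypothetical, and the HTML is not escaped, for brevity.

```ruby
# Shared helper: owns the overall flow and delegates the rendering.
module ApplicationHelper
  def rss_content(name)
    items = parsed_items_for(name)
    # Not defined in this module; resolved at call time on whatever
    # object mixes in both helpers.
    render_content_from_rss_array(items)
  end

  # Hypothetical stand-in for reading the cache or the remote feed.
  def parsed_items_for(_name)
    [{ title: "First ad", link: "http://example.com/ads/1" }]
  end
end

# Controller-specific helper: supplies the rendering for its own views.
module ListingsHelper
  def render_content_from_rss_array(items)
    items.map { |i| %(<a href="#{i[:link]}">#{i[:title]}</a>) }.join("<br>")
  end
end

# Stand-in for a Rails view context, which mixes in the helper modules.
class FakeView
  include ApplicationHelper
  include ListingsHelper
end

puts FakeView.new.rss_content("craigslist")
```

Another controller's helper could define its own render_content_from_rss_array and get different markup from the same rss_content call.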

If the connection to the feed fails, or if parsing fails, then the
cached content is used, and the cache is updated so that we only try
again after the time delay elapses again (otherwise, we'd continue to
hit the problematic resource every time the page was displayed; a
cache isn't much good if it only saves your system resources when the
target resource is functioning properly).

Here is a gist of the RSS methods in my application_helper.rb:
http://gist.github.com/292834
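
Independently of the gist, the fall-back-to-stale behaviour described above could be sketched like this. StaleOnErrorCache, its in-memory storage, and the 600-second delay are assumptions made for illustration; the description above uses the database for storage.

```ruby
# Cache that serves stale content when a refresh fails, and records the
# attempt time either way so a failing feed is not retried on every hit.
class StaleOnErrorCache
  Entry = Struct.new(:content, :checked_at)

  def initialize(retry_after, clock: -> { Time.now })
    @retry_after = retry_after
    @clock = clock
    @entries = {}
  end

  def fetch(name)
    entry = (@entries[name] ||= Entry.new(nil, nil))
    now = @clock.call
    return entry.content if entry.checked_at && now - entry.checked_at < @retry_after

    begin
      entry.content = yield # read, parse, and render the feed
    rescue StandardError
      # Keep whatever was cached before (nil if nothing ever succeeded).
    ensure
      entry.checked_at = now # success or failure, wait before retrying
    end
    entry.content
  end
end

now = Time.at(0)
cache = StaleOnErrorCache.new(600, clock: -> { now })
attempts = 0

cache.fetch("news") { attempts += 1; "<ul>cached html</ul>" }
now += 601 # the cache is now outdated
stale = cache.fetch("news") { attempts += 1; raise IOError, "feed unreachable" }
again = cache.fetch("news") { attempts += 1; raise IOError, "feed unreachable" }

puts stale    # the old HTML is still served
puts attempts # the third call did not retry
```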

Cheers,

Daniel

Frank Kany

Feb 2, 2010, 12:44:50 PM
to nashville-...@googlegroups.com
That's awesome!  Thanks for sharing.
 
Frank


