Erratic behavior when using :if_modified_since


Keith Hunniford

Jan 10, 2013, 7:24:59 PM
to feed...@googlegroups.com
Firstly thanks for all your hard work!

As step 1 toward doing a better job of limiting ingestion, I'm using :if_modified_since and seeing erratic behavior. I wanted to know whether this is likely due to the RSS feeds I'm calling or to something within feedzirra.

Even if I do :if_modified_since => Time.now

I still have a 50/50 chance of getting a 304.
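
For anyone wondering what this looks like on the wire: the option presumably ends up as an If-Modified-Since request header. Below is a minimal sketch of an equivalent conditional request built with plain net/http, without sending it. The helper name conditional_get_request and the example.com URL are hypothetical and not part of feedzirra.

```ruby
require 'net/http'
require 'time' # adds Time#httpdate
require 'uri'

# Hypothetical helper: build (but do not send) a conditional GET request.
def conditional_get_request(url, since)
  uri = URI.parse(url)
  request = Net::HTTP::Get.new(uri.request_uri)
  # HTTP dates are always expressed in GMT, whatever the local time zone is
  request['If-Modified-Since'] = since.httpdate
  request
end

request = conditional_get_request('http://example.com/feed.xml',
                                  Time.utc(2013, 1, 10, 19, 24, 59))
puts request['If-Modified-Since']
```

A server that honors the header should answer 304 when the feed has not changed since that time; a server that ignores it simply returns 200 with the full body.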

Thanks in advance

Keith

Paul Dix

Jan 10, 2013, 9:48:48 PM
to feed...@googlegroups.com
I would expect that to be a problem on the RSS server side. Not everyone bothers to set the header. Which feed is it?

Best,
Paul



Keith Hunniford

Jan 11, 2013, 12:40:43 PM
to feed...@googlegroups.com
Three different feeds on three different domains in the same industry (which I don't really want to expose my interest in on a Google Group ;) ). I also think it is the servers. In the console I can issue the same call every 10 seconds against a feed that has not been updated and have a 50/50 chance of getting a 304.

What I've ended up doing is creating a method to get the headers and then making my own decisions based on what I find. As it might be generically useful, here's the code snippet for others who come to this group experiencing the same quirkiness.

How to retrieve a page's headers with net/http, including a proxy

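  require 'net/http'

  # proxy_host: proxy hostname, NOT including http:// (nil for no proxy)
  # proxy_port: integer
  def self.get_head_meta(url, port=80, proxy_host=nil, proxy_port=nil)

    non_http_url = url.sub('http://','')
    # Split only on the first '/', so the host is never stripped out of the path
    host, rest = non_http_url.split('/', 2)
    path = "/#{rest}"   # HEAD needs a non-empty path, so a bare host becomes '/'

    response = nil
    Net::HTTP::Proxy(proxy_host, proxy_port).start(host,port) {|http|
      response=http.head(path)
    }
    # Hash of lowercase header name => array of values
    return response.to_hash

  end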
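
As a rough illustration of "making my own decisions" from that hash, here is a minimal sketch that compares the Last-Modified header against the time of the previous fetch. The helper name feed_changed? and the policy of refetching whenever the header is missing or unparseable are assumptions for illustration, not the logic actually used above.

```ruby
require 'time' # adds Time.httpdate

# Hypothetical helper. `headers` is the hash from Net::HTTPResponse#to_hash,
# whose keys are lowercase and whose values are arrays of strings.
def feed_changed?(headers, last_fetched_at)
  last_modified = Array(headers['last-modified']).first
  # No Last-Modified header at all: we cannot tell, so assume the feed changed
  return true if last_modified.nil?
  Time.httpdate(last_modified) > last_fetched_at
rescue ArgumentError
  # Unparseable date: fall back to fetching
  true
end

last_fetch = Time.utc(2013, 1, 10, 12, 0, 0)
puts feed_changed?({ 'last-modified' => ['Wed, 09 Jan 2013 19:24:59 GMT'] }, last_fetch)
puts feed_changed?({}, last_fetch)
```

Erring on the side of refetching keeps a server that never sets the header from silently starving the ingest.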

The RSS feeds are so huge that I can totally justify calling out to this and eating the extra request. I think the next step might be trying to use Net::HTTP to retrieve the first 5 KB of a 1 MB file (i.e. the top), which I've seen one example of out there.
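
One way that partial fetch could look is an HTTP Range request. This is a minimal sketch that only builds the request and does not send it; the helper name partial_get_request, the 5 KB default and the example.com URL are hypothetical.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build (but do not send) a GET for the first `bytes` bytes.
def partial_get_request(url, bytes = 5 * 1024)
  uri = URI.parse(url)
  request = Net::HTTP::Get.new(uri.request_uri)
  # Byte ranges are inclusive, so the first 5120 bytes are 0-5119
  request['Range'] = "bytes=0-#{bytes - 1}"
  request
end

request = partial_get_request('http://example.com/feed.xml')
puts request['Range']
```

A server that honors the header replies 206 Partial Content; one that ignores it replies 200 with the whole file, so the status code needs checking before assuming only the top was sent.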

Thanks again for Feedzirra. It's making life easier for sure.

Keith