Any idea why my feedzirra daemon is killing itself? :-)

StuFF mc

May 24, 2010, 4:39:41 AM
to feed...@googlegroups.com
Here is the code: http://pastie.org/974267

It's also here: https://gist.github.com/b342c3ed638045d1bda1 - if
anyone wants to fork and modify.

I don't think I'm doing anything wrong in the code itself. For your
information, I have 498 podcasts and currently 17,961 episodes. I'm
pretty sure the script runs for a few days before somehow dying.

Anybody else experiencing this?

Cheers.

Samuel Lown

May 24, 2010, 11:48:59 AM
to feed...@googlegroups.com
Hi,

Check that it's not running out of memory. god is useful for keeping things running.
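
For example, a god watch along these lines restarts a daemon when it grows too big (a sketch in god's config DSL, using its standard restart_if/memory_usage condition; the watch name and start command here are made up):

    God.watch do |w|
      w.name     = "feed-fetcher"                  # hypothetical name
      w.interval = 30.seconds
      w.start    = "ruby /apps/podcasts/daemon.rb" # hypothetical path
      w.restart_if do |restart|
        restart.condition(:memory_usage) do |c|
          c.above = 150.megabytes
          c.times = [3, 5] # 3 of the last 5 samples over the limit
        end
      end
    end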

You might also want to check out spawn, and fork the process before doing a loop, then wait for it to complete. This ensures your memory is released after each loop, and it automatically handles the ActiveRecord database connection (which you may also be losing after a period of time).
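
To illustrate the fork-and-wait pattern (my sketch, using Ruby's built-in Process.fork rather than the spawn plugin; Podcast and fetch_episodes are the models from the script linked above):

    loop do
      pid = Process.fork do
        # A forked child must not reuse the parent's DB socket, so
        # open a fresh ActiveRecord connection in the child.
        ActiveRecord::Base.establish_connection
        Podcast.all.each { |podcast| podcast.fetch_episodes }
      end
      Process.wait(pid) # child exits; the OS reclaims all its memory
      sleep 60          # small pause before the next pass
    end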

Cheers,
sam

StuFF mc

May 24, 2010, 2:01:29 PM
to feed...@googlegroups.com
Thanks for the tips. I had thought about spawn & god, but what's the
point of a daemon if the connection to ActiveRecord goes away?! Also,
how come the memory isn't freed? Can I call "something" in my loop to
"clean up"? Do I have to?

Cheers.

Samuel Lown

May 24, 2010, 2:50:29 PM
to feed...@googlegroups.com
On 24 May 2010 20:01, StuFF mc <m...@stuffmc.com> wrote:
> Thanks for the tips. I had thought about spawn & god, but what's the
> point of a daemon if the connection to ActiveRecord goes away?!

The DB can time out a connection if it's not being used, or drop it on a network failure. This probably isn't a problem in your situation, though. It would be if you were forking, as closing a fork also closes the DB connection.
> Also, how come the memory isn't freed?

You can try some profiling techniques if you want to discover where your memory is leaking. You're reading lots of external data and passing it through several libraries; there could be tiny leaks anywhere.

I suspect most people give up on this. Ruby makes programming easier at the cost of less refined memory management; this is why god scripts exist to restart mongrel or unicorn processes and free up trapped memory.

> Can I call "something" in my loop to "clean up"? Do I have to?

You can try forcing garbage collection, but I doubt that will solve much:

http://corelib.rubyonrails.org/classes/GC.html
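
Concretely, that just means something like this at the end of each pass through the feeds (GC.start is standard Ruby; whether it actually reclaims anything here is another matter):

    Podcast.all.each { |podcast| podcast.fetch_episodes }
    GC.start # ask Ruby for a full garbage-collection cycle now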

Seriously, if you're handling lots of data in a daemon over a long period of time: fork, process, clean up.

Cheers,
sam

StuFF mc

May 26, 2010, 4:00:15 PM
to feed...@googlegroups.com
Since I'm not sure you can add images to this group, it's also here:
http://emberapp.com/stuffmc/images/cpu-graph-rp

This graph shows how nuts the process goes (before being killed, I
guess) after the daemon has been running for about a day or two.

I haven't taken the time yet to look at how God can help me - the
last time I looked at God it wasn't trivial :(

But from the Feedzirra perspective (and thus, this is a question for
the main dev), where could the problem be?

[attachment: cpu_graph RP.png]

Paul Dix

May 26, 2010, 4:11:29 PM
to feed...@googlegroups.com
Is the daemon failing with some sort of error? I assume it's not just
failing quietly. The graph doesn't really do anything to help
troubleshoot. Is there a reason you're doing a require inside the
"add_entries" method? I don't think you want to be rerunning that
every time the method is called.

Log the output from that daemon.

Best,
Paul

StuFF mc

May 26, 2010, 4:20:46 PM
to feed...@googlegroups.com
Thanks for answering so quickly Paul.

True, it's pretty dumb to have that require there. I remember having
tried to put it in my environment.rb but was unsuccessful. Anyway, I
moved it to the top of the source file (outside the method), and
we'll see if it changes anything.
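
In other words, roughly this change (a sketch only; the thread
doesn't show which library was being required, or on which model
add_entries lives):

    # Top of the model file: the require now runs once, at load
    # time, instead of on every add_entries call.
    require 'feedzirra' # assuming this was the require in question

    class Episode < ActiveRecord::Base
      def self.add_entries(entries)
        # ...process the entries; no require in here any more
      end
    end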

Concerning the logs, they are supposed to be in /log, right? There's
nothing relevant there :(

Cheers.

StuFF mc

May 26, 2010, 5:04:53 PM
to feed...@googlegroups.com
I just use the rubygem "daemons", and it's supposed to log to
log/name_of_the_daemon.rb.log, but the last errors I had there were
from March.

Attached is a chart that shows when I restarted the daemon. We'll see
over time if the require was the problem. Aside from the daemon,
would you have a good tip for profiling my "Episode.fetch_all"?

script/performance isn't much help, IMHO.

On Wed, May 26, 2010 at 10:59 PM, Paul Dix <pa...@pauldix.net> wrote:
> It depends on how/what you're using to daemonize the process that is
> running these updates. Whatever it is should have the ability to write
> STDOUT and STDERR to some log file. If you aren't specifying that then
> the process is probably just throwing an exception that isn't being
> captured by anything.
>
> Best,
> Paul

[attachment: cpu_graph.png]

Paul Dix

May 26, 2010, 5:12:50 PM
to feed...@googlegroups.com
If it's a memory leak, then digging down with memprof may be your best bet:
http://timetobleed.com/memprof-a-ruby-level-memory-profiler/
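
Usage is roughly the following (a sketch based on that post; memprof
is a native extension and needs a compatible MRI of that era):

    require 'memprof'

    # Dump allocation counts, grouped by file and line, for
    # everything allocated inside the block:
    Memprof.track do
      Episode.fetch_all
    end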

These types of errors are a real time suck. Hardest to track down. :(

StuFF mc

May 31, 2010, 4:13:52 AM
to feed...@googlegroups.com
Guys, while at http://railswaycon.de (right now), I discussed this
with a few folks, and they told me using a daemon wasn't the best way
of doing it. They recommend putting the process in a cron job so it's
started fresh every time. The problem is that I'd like to fetch
non-stop, to stay as synced as possible (right now, worst case, I get
a podcast 20 minutes after it's published).

I would need the new/next cron run to only start when the previous
one has finished. I was thinking of maybe scheduling the next run 1
minute after the current one finishes. Any ideas/recommendations?

Samuel Lown

May 31, 2010, 4:43:08 AM
to feed...@googlegroups.com
Hi,

Seriously, use spawn! Cron is not better than a daemon for things that need to be running 24/7. The only thing cron does differently is that it closes the app when it finishes, releasing all its memory, then loads it up again. Think of all the CPU you're losing as your code is re-loaded for each loop. Forking (with spawn) avoids this, as the process just copies itself in memory and starts running as a new process; when it finishes, that memory is released, allowing you to repeat efficiently.

I have a lot of experience with this. I run Planetaki.com, which is currently fetching around 60,000 feeds constantly. Spawn is used to fork off multiple "feed reapers" at the same time, and Timeout is used to ensure bad feeds are stopped if they take too long to complete. (Incidentally, we'll be moving to FeedZirra for parsing this week!)
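
A sketch of that pattern (untested; it assumes the spawn plugin - http://github.com/tra/spawn - which mixes spawn/wait into ActiveRecord models, and fetch_forever is a made-up method name):

    require 'timeout'

    class Episode < ActiveRecord::Base
      def self.fetch_forever
        loop do
          sid = spawn do
            Podcast.all.each do |podcast|
              begin
                Timeout.timeout(60) { podcast.fetch_episodes } # cap each feed
              rescue Timeout::Error
                next # skip the slow feed and carry on
              end
            end
          end
          wait([sid]) # block until the fork exits and its memory is freed
        end
      end
    end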

Sam

stuffmc.com

May 31, 2010, 5:05:47 AM
to feed...@googlegroups.com
OK, I will look into spawn then...

Thanks

Sent from my iPad

Paul Dix

May 31, 2010, 1:07:54 PM
to feed...@googlegroups.com
Agreed, cron is for things that get scheduled, not for things that
need to be running all the time. I haven't used spawn, but I can say
there are libraries out there to do this. Servolux
(http://github.com/TwP/servolux) is pretty good and includes stuff
for managing a pre-forking worker pool.
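
For instance, a prefork pool might look roughly like this (untested;
based on Servolux's documented :module option, where the worker
module supplies an execute method; FeedWorker is a made-up name):

    require 'servolux'

    # Each forked worker calls #execute repeatedly until stopped.
    module FeedWorker
      def execute
        Podcast.all.each { |podcast| podcast.fetch_episodes }
      end
    end

    pool = Servolux::Prefork.new(:module => FeedWorker)
    pool.start 2 # fork two worker processes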

Best,
Paul

StuFF mc

Jun 9, 2010, 11:15:02 AM
to feed...@googlegroups.com
Back on this thread.

Sadly, like a lot of open-source things, Servolux is pretty
undocumented. Looking at http://rdoc.info/projects/TwP/servolux -
would anyone have a code sample showing how I could call my
"fetch_episodes" from a worker?

I'm a bit confused.

PS: Obviously, it still dies after a few days with the daemon.

StuFF mc

Jun 28, 2010, 9:28:05 AM
to feed...@googlegroups.com
Dear Feedzirra guys,

I wanted to let you know that everything has been working fine since
I started using the "delay" method from
http://tobi.github.com/delayed_job/ instead of a daemon. It then
spawns itself without any work from me. Really, the only thing I do
other than installing the plugin/gem is:

def self.fetch_all
  Podcast.all.each do |podcast|
    podcast.fetch_episodes
  end
  self.delay.fetch_all
end

Then I call Episode.delay.fetch_all once, one way (script/console) or
another (Capistrano), and boom! It's been running for weeks without
any problem.
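
For anyone copying this pattern: because each run re-enqueues itself
as its last step, the chain stays alive for as long as a delayed_job
worker is processing the queue (with the delayed_job of that era,
typically via rake jobs:work; check your version). The one-time
kickoff is just:

    # e.g. in script/console; every run then schedules the next one
    # via self.delay.fetch_all
    Episode.delay.fetch_all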

Hope that helps someone.
