High memory usage


Shashwat Kumar

Feb 18, 2014, 7:18:03 AM
to typh...@googlegroups.com
Hi guys

Sorry, but I am new to Typhoeus and ran into a small issue while trying to write a crawler. Memory usage keeps growing and my script gets killed for running out of memory (on a 512 MB RAM DigitalOcean server). I tried investigating with the allocation_stats gem on Ruby 2.1 and noticed that Typhoeus creates a very large number of objects even when I have queued just 100 links. Is this normal behaviour, or am I doing something wrong?
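A minimal sketch of that kind of measurement (the example.com URLs stand in for the real links; the exact script may differ):

require 'typhoeus'
require 'allocation_stats'

urls = (1..100).map { |i| "http://example.com/page/#{i}" }  # stand-ins for the real links

stats = AllocationStats.trace do
  hydra = Typhoeus::Hydra.new
  urls.each { |url| hydra.queue(Typhoeus::Request.new(url)) }  # just queueing, no run
end

# group allocations by source file and class to see where the objects come from
puts stats.allocations(alias_paths: true).group_by(:sourcefile, :class).to_text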


Thanks in advance.

Ryan Dewhurst

Feb 18, 2014, 9:06:51 AM
to typh...@googlegroups.com
Hi Shashwat,

What I did was set a 'max_queue_size'; once that limit (5,000 in my case) is hit, I call hydra.run.

For good measure, I also reset the hydra instance variable and create a new hydra instance after hydra.run has been called.
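A minimal sketch of that pattern (the 5,000 limit, the urls list and the handle_page helper are illustrative, not my exact code):

require 'typhoeus'

MAX_QUEUE_SIZE = 5_000   # illustrative batch limit

hydra  = Typhoeus::Hydra.new
queued = 0

urls.each do |url|                                           # `urls` is a placeholder enumerable
  request = Typhoeus::Request.new(url, followlocation: true)
  request.on_complete { |response| handle_page(response) }   # hypothetical handler
  hydra.queue(request)
  queued += 1

  next unless queued >= MAX_QUEUE_SIZE
  hydra  = hydra.tap(&:run) && Typhoeus::Hydra.new           # drain the batch, then start a fresh hydra so old requests can be GC'd
  queued = 0
end

hydra.run                                                    # run whatever is left in the last batch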

For me this seems to have had a positive effect on memory usage.

Maybe others have some suggestions too?

Ryan





Tasos Laskos

Feb 18, 2014, 9:20:36 AM
to typh...@googlegroups.com
I did the same thing.

Also, make sure you don't hold on to any Request/Response objects; consume them
as soon as possible, because they are quite heavy and, once accumulated, they can
amount to a lot of consumed memory.
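Something along these lines, for example (extract_links is a hypothetical parser; the point is to keep only small derived data):

require 'typhoeus'

hydra       = Typhoeus::Hydra.new
found_links = []                      # lightweight derived data, kept instead of responses

request = Typhoeus::Request.new("http://example.com")
request.on_complete do |response|
  found_links.concat(extract_links(response.body))  # hypothetical parser
  # don't store `response` or `request` anywhere else, so they can be garbage collected
end

hydra.queue(request)
hydra.run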

Cheers

Denis Lamotte

Feb 19, 2014, 5:30:13 AM
to typh...@googlegroups.com
Disable memoization and, if you need a cache, use an external hash, because the built-in memoization stores the request objects.

Typhoeus.configure do |config|
  config.verbose = false
  config.memoize = false
end
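An external hash could look something like this, keeping only a lightweight entry per URL instead of whole Request objects (the SHA1 digest is just an illustration):

require 'typhoeus'
require 'digest'

page_cache = {}                               # external cache: url => digest of the body
hydra      = Typhoeus::Hydra.new

["http://example.com/a", "http://example.com/b"].each do |url|
  request = Typhoeus::Request.new(url)
  request.on_complete do |response|
    page_cache[url] = Digest::SHA1.hexdigest(response.body)  # store a digest, not the response
  end
  hydra.queue(request)
end

hydra.run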

I use Typhoeus to crawl websites with only one @hydra.run, and I have already crawled a website of 30 million pages in one go, so it's possible, even if it becomes very slow :-)

Hans Hasselberg

Feb 19, 2014, 8:16:57 AM
to typh...@googlegroups.com
Hello Shashwat,

thanks for reporting. The issue you're having is not normal. Which version of libcurl are you using? Creating a new Hydra once in a while can help, but it shouldn't be necessary.
By the way, you don't need to disable memoization; it is not turned on by default. It used to be some time ago, but that is no longer the case.