High memory usage


Shashwat Kumar

Feb 18, 2014, 7:18:03 AM
to typh...@googlegroups.com
Hi guys

Sorry, but I am new to Typhoeus and ran into a small issue while trying to write a crawler. Memory usage keeps growing and my script gets killed for running out of memory (on a 512 MB RAM DigitalOcean server). I tried investigating with the allocation_stats gem on Ruby 2.1 and noticed that Typhoeus creates a very large number of objects even when I have queued just 100 links. Is this normal behaviour, or am I doing something wrong?
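A minimal sketch of that kind of measurement (the example.com URLs stand in for the real links; the exact script may differ):

require 'typhoeus'
require 'allocation_stats'

urls = (1..100).map { |i| "http://example.com/page/#{i}" }  # stand-ins for the real links

stats = AllocationStats.trace do
  hydra = Typhoeus::Hydra.new
  urls.each { |url| hydra.queue(Typhoeus::Request.new(url)) }  # just queueing, no run
end

# group allocations by source file and class to see where the objects come from
puts stats.allocations(alias_paths: true).group_by(:sourcefile, :class).to_text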


Thanks in advance.

Ryan Dewhurst

Feb 18, 2014, 9:06:51 AM
to typh...@googlegroups.com
Hi Shashwat,

What I did was set a 'max_queue_size'; once that limit (5,000 in my case) is hit, I call hydra.run.

For good measure, I also reset the hydra instance variable and create a new hydra instance after hydra.run has been called.
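A minimal sketch of that pattern (the 5,000 limit, the urls list and the handle_page helper are illustrative, not my exact code):

require 'typhoeus'

MAX_QUEUE_SIZE = 5_000   # illustrative batch limit

hydra  = Typhoeus::Hydra.new
queued = 0

urls.each do |url|                                           # `urls` is a placeholder enumerable
  request = Typhoeus::Request.new(url, followlocation: true)
  request.on_complete { |response| handle_page(response) }   # hypothetical handler
  hydra.queue(request)
  queued += 1

  next unless queued >= MAX_QUEUE_SIZE
  hydra  = hydra.tap(&:run) && Typhoeus::Hydra.new           # drain the batch, then start a fresh hydra so old requests can be GC'd
  queued = 0
end

hydra.run                                                    # run whatever is left in the last batch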

For me this seems to have had a positive effect on memory usage.

Maybe others have some suggestions too?

Ryan





Tasos Laskos

Feb 18, 2014, 9:20:36 AM
to typh...@googlegroups.com
I did the same thing.

Also, make sure you don't hold on to any Request/Response objects; consume them
as soon as possible, because they are quite heavy and, once accumulated, they can
amount to a lot of consumed memory.
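Something along these lines, for example (extract_links is a hypothetical parser; the point is to keep only small derived data):

require 'typhoeus'

hydra       = Typhoeus::Hydra.new
found_links = []                      # lightweight derived data, kept instead of responses

request = Typhoeus::Request.new("http://example.com")
request.on_complete do |response|
  found_links.concat(extract_links(response.body))  # hypothetical parser
  # don't store `response` or `request` anywhere else, so they can be garbage collected
end

hydra.queue(request)
hydra.run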

Cheers

Denis Lamotte

Feb 19, 2014, 5:30:13 AM
to typh...@googlegroups.com
Disable memoization and, if you need a cache, use an external hash, because the built-in memoization stores the request objects.

Typhoeus.configure do |config|
  config.verbose = false
  config.memoize = false
end
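An external hash could look something like this, keeping only a lightweight entry per URL instead of whole Request objects (the SHA1 digest is just an illustration):

require 'typhoeus'
require 'digest'

page_cache = {}                               # external cache: url => digest of the body
hydra      = Typhoeus::Hydra.new

["http://example.com/a", "http://example.com/b"].each do |url|
  request = Typhoeus::Request.new(url)
  request.on_complete do |response|
    page_cache[url] = Digest::SHA1.hexdigest(response.body)  # store a digest, not the response
  end
  hydra.queue(request)
end

hydra.run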

I use Typhoeus to crawl websites with only one @hydra.run, and I have already crawled a website of 30 million pages in one go, so it's possible, even if it becomes very slow :-)

Hans Hasselberg

Feb 19, 2014, 8:16:57 AM
to typh...@googlegroups.com
Hello Shashwat,

thanks for reporting. The issue you're having is not normal. Which version of libcurl are you using? Creating a new Hydra once in a while can help, but it shouldn't be necessary.
By the way, you don't need to disable memoization; it is not turned on by default. It used to be some time ago, but that is no longer the case.