Huge memory leak?


petter....@sannsyn.com

Dec 8, 2017, 10:24:39 AM
to DataStax Ruby Driver for Apache Cassandra User Mailing List
Hi

I am having trouble with a memory leak. I use Ruby 2.3 and the latest driver, with Cassandra 3.11 on 64-bit Linux.

It is fairly easy to reproduce.

Querying any large table eats a lot of memory, on the order of 10-20 GB within a few minutes. The memory taken grows with the number of rows in the query result.

The case can be reproduced easily, using the example from this page: http://docs.datastax.com/en/developer/ruby-driver/3.2/ and a query that returns a few million rows, in my case 3-4 million (each row contains very little data, no blobs or lists). Watch memory usage grow using, for example, 'top'.

The problem seems to be that the driver keeps a copy of all the rows while iterating over them. I have tried all combinations of sync/async and prepared/non-prepared queries, without luck.

This problem does not seem to appear with the Java driver, where memory usage stays small and constant.

Any clues would be so welcome!

Cheers,

Petter Egesund

-- reproduce --

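# Run the query asynchronously and iterate over every row without keeping any.
future = session.execute_async('SELECT * FROM any_large_table')
future.on_success do |rows|
  rows.each do |row|
    # intentionally empty: iterating alone is enough to watch memory grow
  end
end
future.join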




Bulat Shakirzyanov

Dec 8, 2017, 10:56:07 AM
to ruby-dri...@lists.datastax.com
Can you share your Cluster configuration? From the code you shared, it seems that you're using "page_size: nil", which means that your result set will contain all 3-4 million rows. Instead, you want to set the page size to some reasonable value (we use 10000 by default) and iterate over the results using paging: http://docs.datastax.com/en/developer/ruby-driver/3.2/features/basics/result_paging/
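
For illustration, paged iteration can be sketched as below. `each_row_paged` and `FakePage` are hypothetical names introduced here, not part of the driver; the helper relies only on a page responding to `each`, `last_page?` and `next_page`. `FakePage` stands in for a driver result so the sketch runs without a cluster. This only keeps the caller from holding more than one page at a time; it cannot rule out retention inside the driver itself.

```ruby
# Hypothetical helper (not part of the driver): walks a paged result one page
# at a time and yields each row. The previous page becomes unreachable from
# this method as soon as `page` is reassigned.
def each_row_paged(first_page)
  page = first_page
  loop do
    page.each { |row| yield row }
    break if page.last_page?
    page = page.next_page
  end
end

# Stand-in for a driver result page (assumption: same three methods).
FakePage = Struct.new(:rows, :next_page) do
  def each(&block)
    rows.each(&block)
  end

  def last_page?
    next_page.nil?
  end
end

pages = FakePage.new([1, 2], FakePage.new([3, 4], FakePage.new([5], nil)))
count = 0
each_row_paged(pages) { |_row| count += 1 }
puts count
```

Against a real cluster, the first argument would be the result of something like `session.execute('SELECT * FROM any_large_table', page_size: 10_000)`.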

I hope this helps.

--
You received this message because you are subscribed to the Google Groups "DataStax Ruby Driver for Apache Cassandra User Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ruby-driver-user+unsubscribe@lists.datastax.com.



--
Cheers,
Bulat Shakirzyanov | @avalanche123

petter....@sannsyn.com

Dec 8, 2017, 11:06:55 AM
to DataStax Ruby Driver for Apache Cassandra User Mailing List
Hi and thanks for answering!

Setting page_size does not matter; I have tried different values. A large page size speeds up the query a little, but memory consumption is the same.

Petter

petter....@sannsyn.com

Dec 8, 2017, 11:11:20 AM
to DataStax Ruby Driver for Apache Cassandra User Mailing List
The problem can be reproduced using this code as well:

result = session.execute("SELECT * FROM test", page_size: 5)

loop do
  puts "last page? #{result.last_page?}"
  puts "page size: #{result.size}"

  result.each do |row|
    puts row
  end
  puts ""

  break if result.last_page?
  result = result.next_page
end

Bulat Shakirzyanov

Dec 8, 2017, 11:26:24 AM
to ruby-dri...@lists.datastax.com
Hmm, looks like a leak is indeed possible, thanks for reporting this!

Can you please open a Jira issue for this? That way it won't get lost and someone will get to it.
Make sure to provide as many details as possible in the ticket: Ruby/driver/Cassandra versions, how you're observing the leak, etc. Sample data and schema might help as well.
Finally, feel free to keep investigating to speed up the resolution.
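
One possible way to document how the leak is observed is to record process memory and Ruby heap statistics before and after the query loop. The sketch below is an assumption about how one might do this, not something from the driver: `rss_kb` and `memory_snapshot` are hypothetical helpers, `/proc/self/status` is Linux-only, and the string-building loop is a stand-in for the real paging code.

```ruby
# Reads this process's resident set size in kB from /proc (Linux only).
def rss_kb
  File.read('/proc/self/status')[/^VmRSS:\s+(\d+)/, 1].to_i
end

# Hypothetical helper: snapshot of process memory and Ruby heap state.
def memory_snapshot
  GC.start # collect first so mostly still-referenced objects are counted
  { rss_kb: rss_kb, live_slots: GC.stat(:heap_live_slots) }
end

before = memory_snapshot
# Stand-in workload; replace with the paging loop against a real table.
100_000.times { |i| "row #{i}" }
after = memory_snapshot

puts "RSS delta (kB):   #{after[:rss_kb] - before[:rss_kb]}"
puts "live slots delta: #{after[:live_slots] - before[:live_slots]}"
```

As a rough guide, live slots that keep growing suggest Ruby objects are being retained, while RSS growing with flat live slots points more toward memory held outside the Ruby object heap. Either set of numbers is worth attaching to the ticket.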

Cheers,


petter....@sannsyn.com

Dec 8, 2017, 12:12:20 PM
to DataStax Ruby Driver for Apache Cassandra User Mailing List

Bulat Shakirzyanov

Dec 8, 2017, 12:14:26 PM
to ruby-dri...@lists.datastax.com
Perfect, thank you!

On Fri, Dec 8, 2017 at 12:12 PM, <petter....@sannsyn.com> wrote:


Petter Egesund

Dec 8, 2017, 12:20:10 PM
to ruby-dri...@lists.datastax.com
Thanks to you as well :-) Really cool to open source stuff like this!