scylla 3.1.1, std::bad_alloc


Michael

<micha-1@fantasymail.de>
Nov 29, 2019, 6:20:02 AM11/29/19
to scylladb-users@googlegroups.com
Hello,

I found the following in the log, many times:

exception during mutation write to 192.168.0.15: std::bad_alloc
Failed to apply mutation from .. : std::bad_alloc (std::bad_alloc)


The node has 64GB RAM and a 7TB HDD; there are 8 nodes running Debian Stretch,
all with identical hardware.


Other nodes show:
Exception when communicating with 192.168.0.15: std::runtime_error
(std::bad_alloc)



The node went down and tried to restart.
While restarting, there were log messages about a corrupt commitlog which
could not be replayed.


After that I found the same message on other machines.

I have no idea what's going on here and why there are bad_allocs. Doesn't
this mean out of memory?

Thanks for helping
Michael




Glauber Costa

<glauber@scylladb.com>
Nov 29, 2019, 8:16:47 AM11/29/19
to ScyllaDB users
Yes, bad_alloc means it is out of memory.
It's hard to say why. If you see this happening again, please extract a coredump during a time in which the node is having bad allocs.
We can analyze the core and see what was using that much memory. 






Michael

<micha-1@fantasymail.de>
Nov 29, 2019, 8:39:11 AM11/29/19
to scylladb-users@googlegroups.com, Glauber Costa


On 29.11.19 at 14:16, Glauber Costa wrote:
> Yes, bad allocs means it is out of memory.
> It's hard to say why. If you see this happening again, please extract a
> coredump during a time in which the node is having bad allocs.
> We can analyze the core and see what was using that much memory. 
>

Now I don't know how to get the node started again. It gets to the
point where the commitlog_replayer starts, then after some time the
startup stops due to a timeout. The node doesn't start anymore.

How do I proceed from here?

Michael

Glauber Costa

<glauber@scylladb.com>
Nov 29, 2019, 8:56:14 AM11/29/19
to Termite Viewer, ScyllaDB users
If it gets killed on a timeout, it likely takes too long to start up on your HDD.
You can modify the timeout according to https://docs.scylladb.com/troubleshooting/scylla_wont_start/

Michael

<micha-1@fantasymail.de>
Nov 29, 2019, 9:47:26 AM11/29/19
to scylladb-users@googlegroups.com, Glauber Costa
I modified the timeout, now it starts up.

I have read about the memory that the in-memory components need. If this
gets too large, then bad_alloc can occur.

When I add up the sizes of the Filter, Summary, and CompressionInfo files
of all tables, I get a total of

39714000000 bytes.

Is this too much with main memory of 64GB?
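
That sum can be reproduced with a short script. This is a sketch, not an official tool: it assumes the default data directory /var/lib/scylla/data (adjust for your setup) and relies on the standard SSTable component file names (Filter.db, Summary.db, CompressionInfo.db):

```python
#!/usr/bin/env python3
"""Sketch: sum the sizes of the SSTable metadata components that Scylla
keeps in memory (bloom filters, summaries, compression metadata)."""
import os

DATA_DIR = "/var/lib/scylla/data"  # assumed default; may differ on your nodes
COMPONENTS = ("Filter.db", "Summary.db", "CompressionInfo.db")

def metadata_bytes(data_dir):
    """Walk the data directory and add up the sizes of metadata components."""
    total = 0
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            # SSTable component files end in e.g. "-Filter.db"
            if name.endswith(COMPONENTS):
                total += os.path.getsize(os.path.join(root, name))
    return total

if __name__ == "__main__":
    total = metadata_bytes(DATA_DIR)
    print(f"{total} bytes (~{total / 1e9:.1f} GB)")
```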


 Michael






Glauber Costa

<glauber@scylladb.com>
Nov 29, 2019, 9:59:34 AM11/29/19
to Termite Viewer, ScyllaDB users
On Fri, Nov 29, 2019 at 9:47 AM Michael <mic...@fantasymail.de> wrote:
I modified the timeout, now it starts up.

I have read about the memory that the in-memory components need. If this
gets too large, then bad_alloc can occur.

When I add up the sizes of the Filter, Summary, and CompressionInfo files
of all tables, I get a total of

39714000000 bytes.

Is this too much with main memory of 64GB?

Yes. That's almost 40GB, which means most of your memory is used up by metadata files.
You have too much data for this amount of memory.

Michael

<micha-1@fantasymail.de>
Nov 29, 2019, 10:50:09 AM11/29/19
to Glauber Costa, ScyllaDB users

On 29.11.19 at 15:59, Glauber Costa wrote:
>
>
> On Fri, Nov 29, 2019 at 9:47 AM Michael <mic...@fantasymail.de
> <mailto:mic...@fantasymail.de>> wrote:
>
> 39714000000 bytes.
>
> Is this too much with main memory of 64GB?
>
>
> Yes. That's almost 40GB, so that means that most of your memory is
> used up by metadata files.
> That means you have too much data for this amount of memory.

How much memory does Scylla itself need (after subtracting the RAM needed
for the table files)?


 Michael


Glauber Costa

<glauber@scylladb.com>
Nov 29, 2019, 11:00:30 AM11/29/19
to Termite Viewer, ScyllaDB users
It depends on your storage amount. The recommendation is to stay below a 30:1 disk-to-RAM ratio.
For 64GB you should host at most 2TB, and ideally less than that because of the HDDs.
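
The 30:1 rule of thumb works out as simple arithmetic (a sketch; the ratio is disk bytes per byte of RAM):

```python
ram_gb = 64
max_ratio = 30                          # recommended upper bound on disk:RAM
max_disk_tb = ram_gb * max_ratio / 1000 # 1.92 TB, i.e. "at most 2TB"
print(f"max recommended disk: {max_disk_tb:.2f} TB")

# the cluster in this thread: 7TB of HDD per node vs 64GB of RAM
actual_ratio = 7000 / ram_gb            # roughly 109:1, far above 30:1
print(f"actual ratio: {actual_ratio:.0f}:1")
```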
 




Michael

<micha-1@fantasymail.de>
Dec 10, 2019, 5:55:07 AM12/10/19
to Glauber Costa, ScyllaDB users
Can I monitor this RAM amount with nodetool or a JMX call?


 Michael



Glauber Costa

<glauber@scylladb.com>
Dec 10, 2019, 10:28:59 AM12/10/19
to Termite Viewer, ScyllaDB users
It depends.

You can monitor this memory in our Grafana dashboards from the scylla-monitoring project. This is "non-LSA memory" (memory that is not movable).

If your problem is non-LSA exhaustion, you will see it.
But if your problem is fragmentation, then you will not be able to see it.