[Warning] WSREP: std::bad_alloc


danycaiti

Dec 21, 2012, 3:52:53 AM
to codersh...@googlegroups.com
Hi everybody,
we have a setup with MySQL 5.5.28, wsrep 23.7 and Galera 2.2.

We have a problem when deleting millions of records. The program crashes and the log shows errors like these:

121219 18:21:43 [Warning] WSREP: std::bad_alloc

121219 18:21:43 [Warning] WSREP: Appending row key failed: delete from ...
ERROR 1030 (HY000) at line 1: Got error 122 from storage engine


The system has 4GB of RAM. 
This is the my.cnf configuration :

[client]
port            = 3306
socket          = /tmp/mysql.sock
[mysqld]
port            = 3306
socket          = /tmp/mysql.sock
skip-external-locking
key_buffer_size = 384M
max_allowed_packet = 1M
table_open_cache = 512
sort_buffer_size = 2M
read_buffer_size = 2M
read_rnd_buffer_size = 8M
myisam_sort_buffer_size = 64M
thread_cache_size = 8
query_cache_size = 32M
thread_concurrency = 2
innodb_buffer_pool_size = 128M
innodb_additional_mem_pool_size = 20M
innodb_log_file_size = 100M
innodb_log_buffer_size = 8M
innodb_lock_wait_timeout = 50
innodb_file_per_table = 1
[mysqldump]
quick
max_allowed_packet = 16M
[mysql]
no-auto-rehash
[myisamchk]
key_buffer_size = 256M
sort_buffer_size = 256M
read_buffer = 2M
write_buffer = 2M
[mysqlhotcopy]
interactive-timeout

[mysqld]
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
innodb_locks_unsafe_for_binlog=1
query_cache_size=0
query_cache_type=0
innodb_doublewrite = 1
innodb_flush_log_at_trx_commit=2
bind-address=0.0.0.0
wsrep_provider=/usr/lib64/libgalera_smm.so
wsrep_provider_options="gmcast.listen_addr=tcp://10.10.0.6:4567;evs.keepalive_period = PT2S; evs.inactive_check_period = PT7S; evs.suspect_timeout = PT15S; evs.inactive_timeout = PT30S; evs.consensus_timeout = PT30S"
wsrep_cluster_name="XXXX"
wsrep_cluster_address="gcomm://"
wsrep_node_address="10.10.0.6:3306"
wsrep_node_incoming_address="10.10.0.6"
wsrep_slave_threads=1
wsrep_certify_nonPK=1
wsrep_max_ws_rows=1048576
wsrep_max_ws_size=536870912
wsrep_debug=0
wsrep_convert_LOCK_to_trx=0
wsrep_retry_autocommit=1
wsrep_auto_increment_control=1
wsrep_drupal_282555_workaround=0
wsrep_causal_reads=0
wsrep_notify_cmd="/usr/local/bin/mail.sh"
wsrep_sst_method=mysqldump
wsrep_sst_auth=galera:XXX

Thank you,
Daniele

Alex Yurchenko

Dec 21, 2012, 4:10:17 AM
to codersh...@googlegroups.com
Hi,

Yes, this can happen on such a delete... Do you have swap enabled? What
OS are you using?

However the server should not crash. Could you post more of an error
log - starting from 5 minutes before the crash and including the
stacktrace?

Thanks,
Alex
--
Alexey Yurchenko,
Codership Oy, www.codership.com
Skype: alexey.yurchenko, Phone: +358-400-516-011

danycaiti

Dec 21, 2012, 4:21:33 AM
to codersh...@googlegroups.com
Thanks, Alexey, for answering me.

OS: SLES11 SP2

Yes, I have 1 GB of swap enabled:

# free
             total       used       free     shared    buffers     cached
Mem:       4057600    2934572    1123028          0      48400    2489160
-/+ buffers/cache:     397012    3660588
Swap:      1051644      16460    1035184

Below another piece of log:

121221  9:21:58 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 19602)
121221  9:21:58 [Note] WSREP: Synchronized with group, ready for connections
121221  9:29:07 [Warning] WSREP: std::bad_alloc
121221  9:29:07 [Warning] WSREP: Appending row key failed: delete from LOG_DOWNLOADS where timestamp > '2012-06-30' and timestamp < '2012-12-04', 5
121221  9:50:45 [ERROR] WSREP: io cache write problem: 9633792 32768
121221  9:50:45 [ERROR] WSREP: rbr write fail, data_len: 9633792, 1026
121221 10:02:28 [Warning] WSREP: std::bad_alloc
121221 10:02:28 [Warning] WSREP: Appending row key failed: delete from LOG_DOWNLOADS where timestamp > '2012-11-25' and timestamp < '2012-12-04', 5
121221 10:09:03 [Warning] WSREP: std::bad_alloc
121221 10:09:03 [Warning] WSREP: Appending row key failed: delete from LOG_DOWNLOADS where timestamp > '2012-11-25' and timestamp < '2012-12-04', 5
121221 10:09:34 [Note] /usr/sbin/mysqld: Normal shutdown
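
The log above shows the same DELETE being retried with a narrower date range. A minimal sketch of taking that idea further, assuming autocommit is on so that each statement replicates as its own writeset: split the range into fixed-size windows and issue one DELETE per window. The helper name `windowed_deletes` and the 30-day step are illustrative choices, not something from this thread, and nothing here confirms that smaller deletes avoid the std::bad_alloc.

```python
from datetime import date, timedelta


def windowed_deletes(table, column, start, end, step_days):
    """Yield DELETE statements that together cover start < column < end.

    The first window keeps the strict '>' of the original statement;
    later windows use '>=' so rows on a window boundary are not skipped.
    The SQL is built by string formatting, so only pass trusted names.
    """
    if step_days <= 0:
        raise ValueError("step_days must be positive")
    lo = start
    first = True
    while lo < end:
        hi = min(lo + timedelta(days=step_days), end)
        op = ">" if first else ">="
        yield (
            f"DELETE FROM {table} "
            f"WHERE {column} {op} '{lo}' AND {column} < '{hi}'"
        )
        first = False
        lo = hi


# Same table and date range as the first failing statement in the log.
statements = list(
    windowed_deletes("LOG_DOWNLOADS", "timestamp",
                     date(2012, 6, 30), date(2012, 12, 4), step_days=30)
)
for sql in statements:
    print(sql)
```

Each printed statement would then be run separately through the mysql client; a smaller step gives smaller writesets at the cost of more statements.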


After the bad_alloc appears, the node becomes unresponsive and I have to kill -9 the mysqld process.

I need a little time to produce the stack trace.

Thank you very much!
Daniele

danycaiti

Dec 21, 2012, 4:44:26 AM
to codersh...@googlegroups.com
Sorry Alexey,
how can I produce the stack trace you need? Do you have a suggestion for how to do that?
Thank you


On Friday, December 21, 2012 at 10:10:17 UTC+1, Alexey Yurchenko wrote:
including the
stacktrace?


Alex Yurchenko

Dec 21, 2012, 6:26:56 AM
to codersh...@googlegroups.com
On 2012-12-21 11:21, danycaiti wrote:
> Thanks Alexey for answer me.
>
> OS: SLES11 SP2

Uhhh. I'm afraid it may be unsupported:
https://bugs.launchpad.net/galera/+bug/1071933

Are you using Galera specifically built for SLES, or Codership's RPMs?

In the former case, we have never tested Galera with GCC below 4.4 and
boost below 1.41. In particular boost was known to have some bugs at
lower versions. std::bad_alloc may also be a result of the combination
of old libstdc++ and boost.
In the latter case you may be seeing some binary incompatibility.

The thing is that you don't even get to the Galera memory wasting part
- you get an error at the writeset construction, most likely on key
insertion into std::map - and that looks very much like a system issue,
not Galera's.

In other words, while Galera probably can be made to work on SLES 11,
there are good chances that at the moment it does not do it correctly.
We strongly suggest that you use RHEL/CentOS for Galera.

> Yes I have swap enabled of 1GB:

That may be too little. I'm pretty certain that it is - if you're going
for millions of rows. But see above, you may never reach that stage when
you'll need it.
Well, it does not look that unresponsive: 40 minutes passed after the
first std::bad_alloc and it clearly processed some queries in that time.
So to be clear about that, what exactly is the symptom:

- the server does not shutdown when told to do so
- the server does not accept new connections
- the server does not accept new queries (well, according to log it
does).
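
The 40-minute figure can be checked directly from the error-log timestamps. A small sketch, assuming the MySQL 5.5 error-log prefix seen in this thread ("YYMMDD H:MM:SS", with a one- or two-digit hour); the function names are made up for illustration and the sample lines are copied from the log posted above.

```python
import re
from datetime import datetime

# Error-log lines in this thread start with "YYMMDD H:MM:SS <message>".
STAMP = re.compile(r"^(\d{2})(\d{2})(\d{2})\s+(\d{1,2}):(\d{2}):(\d{2})\s+(.*)$")


def parse_line(line):
    """Return (datetime, message), or None if the line has no timestamp."""
    m = STAMP.match(line)
    if m is None:
        return None
    yy, mo, dd, hh, mi, ss = (int(g) for g in m.groups()[:6])
    return datetime(2000 + yy, mo, dd, hh, mi, ss), m.group(7)


def elapsed_between(lines, first_marker, last_marker):
    """Time from the first line containing first_marker to the last line
    containing last_marker."""
    start = end = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        when, message = parsed
        if start is None and first_marker in message:
            start = when
        if last_marker in message:
            end = when
    if start is None or end is None:
        raise ValueError("marker not found in log")
    return end - start


LOG = """\
121221  9:21:58 [Note] WSREP: Synchronized with group, ready for connections
121221  9:29:07 [Warning] WSREP: std::bad_alloc
121221  9:50:45 [ERROR] WSREP: io cache write problem: 9633792 32768
121221 10:09:03 [Warning] WSREP: std::bad_alloc
121221 10:09:34 [Note] /usr/sbin/mysqld: Normal shutdown
""".splitlines()

gap = elapsed_between(LOG, "std::bad_alloc", "Normal shutdown")
print(gap)
```

On the posted log this measures the time from the first std::bad_alloc to the "Normal shutdown" line.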

Anyway, if you get your node unresponsive once again, could you

$ sudo gdb /usr/sbin/mysqld -p $(pidof mysqld) --batch -q -ex "thr apply all bt" > bt.txt

and send it to us?
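
If the dump has to be taken from a monitoring script rather than by hand, the same gdb invocation can be wrapped as below. This is only a sketch: the function names are hypothetical, the binary path and output file name are the ones from the command above, and the script has to run with enough privileges to attach to mysqld (the command above uses sudo).

```python
import subprocess


def gdb_backtrace_argv(pid, binary="/usr/sbin/mysqld"):
    """Build the argument list for a non-interactive 'all threads' backtrace."""
    return [
        "gdb", binary,
        "-p", str(pid),               # attach to the running process
        "--batch", "-q",              # run the commands, then exit quietly
        "-ex", "thread apply all bt",
    ]


def dump_backtraces(pid, out_path="bt.txt", binary="/usr/sbin/mysqld"):
    """Run gdb and write the backtraces to out_path; returns gdb's exit code."""
    with open(out_path, "w") as out:
        result = subprocess.run(gdb_backtrace_argv(pid, binary),
                                stdout=out, check=False)
    return result.returncode


# Only builds and prints the command; 12345 is a placeholder PID.
print(" ".join(gdb_backtrace_argv(12345)))
```

Calling `dump_backtraces(pid)` with the real PID of the hung mysqld would produce the bt.txt asked for above.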

> For stacktrace I need a little time to produce it.

Ah, nevermind, from your first mail I understood that the server
crashed. Now it appears to be merely deadlocked (it is still a bug of
course).

Regards,
Alex

danycaiti

Dec 21, 2012, 8:46:46 AM
to codersh...@googlegroups.com
Hi Alex,
we are using our own build of Galera and MySQL for SLES.
Now I'm trying the precompiled binary but I get the same result.

If I want to build it the right way for SLES, could you give me all the parameters and the compiler versions to use?

Thank you,
Daniele

Alex Yurchenko

Dec 21, 2012, 2:22:37 PM
to codersh...@googlegroups.com
On 2012-12-21 15:46, danycaiti wrote:
> Hi Alex,
> we are using an our built for SLES of Galera and MySQL.
> Now I'm trying the precompiled binary but I get the same result.
>
> If I want to build in the right way for SLES, could you give me all
> the
> parameters and the version of compilers to use for?
>
> Thank you,
> Daniele
>

Daniele,

There is not much to it. GCC >=4.4 and boost >=1.41. I'd suggest going
for the latest versions - and first install GCC and then compile boost
(although I think that GCC 4.3 won't compile the latest boost, so it
won't let you do things in the wrong order). You'll also need to make
sure that it builds and links with the latest libstdc++, so I'd remove
GCC 4.3 from the system completely after the new GCC is installed. And
then you'll have to install that new libstdc++ on every node.

Let's hope that SLES has prebuilt RPMs for the latest GCC and boost, or
it will be a pain.

Regards,
Alex

Alex Yurchenko

Dec 21, 2012, 6:33:02 PM
to codersh...@googlegroups.com
On 2012-12-21 11:21, danycaiti wrote:
> Thanks Alexey for answer me.
>
> OS: SLES11 SP2

BTW, is it a 32-bit system by any chance?

Alex Yurchenko

Jan 3, 2013, 4:55:44 AM
to codersh...@googlegroups.com
Hi Ettore,

On 2013-01-02 18:25, Ettore Simone wrote:
> Hi Alex,
>
> I'm a colleague of Daniele. We would like to bring the power of the MySQL
> Galera cluster to the SUSE Enterprise distro. Unfortunately we are not so
> comfortable with C++.
>
> Following your suggestion we upgraded the whole GCC and C++ stack to 4.6
> from the official SDK, then rebuilt boost 1.91 from scratch with g++ 4.6,
> Galera 23.2.2, and then MySQL 5.5.28 with wsrep 23.7.
>
> For MySQL we are using the following directives to compile:
> # mysql_config
>   --cflags         [-I/usr/include/mysql -g -m64 -fmessage-length=0 -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -DPIC -fPIC -DWITH_WSREP -DWSREP_PROC_INFO -DMYSQL_MAX_VARIABLE_VALUE_LEN=2048 -DWITH_INNODB_DISALLOW_WRITES -g]
>   --include        [-I/usr/include/mysql]
>   --libs           [-L/usr/lib64 -lmysqlclient -lpthread -lz -lm -lrt -lssl -lcrypto -ldl]
>   --libs_r         [-L/usr/lib64 -lmysqlclient_r -lpthread -lz -lm -lrt -lssl -lcrypto -ldl]
>   --plugindir      [/usr/lib64/mysql/plugin]
>   --socket         [/var/run/mysql/mysql.sock]
>   --port           [0]
>   --version        [5.5.28]
>   --libmysqld-libs [-L/usr/lib64 -lmysqld]
>   --variable=VAR   VAR is one of:
>     pkgincludedir  [/usr/include/mysql]
>     pkglibdir      [/usr/lib64]
>     plugindir      [/usr/lib64/mysql/plugin]
>
> Even so, in some circumstances it ends with this kind of error:
> 130102 16:27:25 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000000 of size 307188470 bytes
> 130102 16:27:55 [ERROR] WSREP: std::bad_alloc
> 130102 16:27:55 [ERROR] WSREP: unknown connection failure
>
> The SQL command was a delete of about 2M records with
> wsrep_max_ws_size=268435456. I suspect that in a delete request the
> write set limit is not correctly honored.

Yes, it isn't, but for this discussion it is not important. Your
writeset is well within reasonable bounds.

> Do you have any suggestion on how to investigate further?

As far as I understand, std::bad_alloc means an out-of-memory condition.
There is nothing you can do about it except fix the Galera code to use
less memory, which is far from trivial, or add more memory, namely swap
space. Set up a 10 GB swap file and see if this continues to happen.

Regards,
Alex

> Is there something similar to Electric Fence to isolate memory allocation
> failure within g++?
>
> Best regards,
> Ettore Simone
>
> On Friday, December 21, 2012 8:22:37 PM UTC+1, Alexey Yurchenko

Ettore Simone

Jan 3, 2013, 11:32:56 AM
to codersh...@googlegroups.com
Hi Alex,

Thanks a lot for helping us. With swap a bit larger than RAM, everything is working fine.
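
For reference, the swap-to-RAM comparison can be read off `free` output. A small sketch using the numbers posted earlier in this thread (taken before the swap was enlarged); the function names and the target ratio of 1.0, meaning "swap at least as large as RAM", are assumptions for illustration and not a Galera requirement.

```python
# `free` output from earlier in the thread; values are in KiB.
FREE_OUTPUT = """\
             total       used       free     shared    buffers     cached
Mem:       4057600    2934572    1123028          0      48400    2489160
-/+ buffers/cache:     397012    3660588
Swap:      1051644      16460    1035184
"""


def parse_free(text):
    """Return (mem_total_kib, swap_total_kib) from `free` output."""
    mem = swap = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Mem:":
            mem = int(fields[1])
        elif fields[0] == "Swap:":
            swap = int(fields[1])
    if mem is None or swap is None:
        raise ValueError("could not find Mem: and Swap: lines")
    return mem, swap


def swap_shortfall_kib(text, target_ratio=1.0):
    """KiB of swap still missing to reach target_ratio * RAM (0 if enough)."""
    mem, swap = parse_free(text)
    return max(0, int(mem * target_ratio) - swap)


mem_kib, swap_kib = parse_free(FREE_OUTPUT)
shortfall = swap_shortfall_kib(FREE_OUTPUT)
print(f"RAM {mem_kib} KiB, swap {swap_kib} KiB, shortfall {shortfall} KiB")
```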

We tested many combinations of SQL commands on it and it seems very stable, even when deleting millions of records. Thank you for the great work.

In case it helps anyone using openSUSE and SUSE Linux Enterprise systems, here is our repository of precompiled and source RPM packages:

For now it contains only MySQL 5.5.28 with wsrep 23.7 and Galera 23.2.2 r138 for openSUSE 12.2 and SLES11 SP2.

Best regards,
Ettore Simone