[Warning] WSREP: std::bad

danycaiti

Dec 21, 2012, 3:52:53 AM

to codersh...@googlegroups.com

Hi everybody,

We have a configuration with MySQL 5.5.28, wsrep 23.7 and Galera 2.2.

We have a problem when deleting millions of records. The program crashes and returns this error in the log:

121219 18:21:43 [Warning] WSREP: std::bad_alloc
121219 18:21:43 [Warning] WSREP: Appending row key failed: delete from ...
ERROR 1030 (HY000) at line 1: Got error 122 from storage engine

The system has 4GB of RAM.

This is the my.cnf configuration:

[client]
port = 3306
socket = /tmp/mysql.sock

[mysqld]
port = 3306
socket = /tmp/mysql.sock
skip-external-locking
key_buffer_size = 384M
max_allowed_packet = 1M
table_open_cache = 512
sort_buffer_size = 2M
read_buffer_size = 2M
read_rnd_buffer_size = 8M
myisam_sort_buffer_size = 64M
thread_cache_size = 8
query_cache_size = 32M
thread_concurrency = 2
innodb_buffer_pool_size = 128M
innodb_additional_mem_pool_size = 20M
innodb_log_file_size = 100M
innodb_log_buffer_size = 8M
innodb_lock_wait_timeout = 50
innodb_file_per_table = 1

[mysqldump]
quick
max_allowed_packet = 16M

[mysql]
no-auto-rehash

[myisamchk]
key_buffer_size = 256M
sort_buffer_size = 256M
read_buffer = 2M
write_buffer = 2M

[mysqlhotcopy]
interactive-timeout

[mysqld]
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
innodb_locks_unsafe_for_binlog=1
query_cache_size=0
query_cache_type=0
innodb_doublewrite = 1
innodb_flush_log_at_trx_commit=2
bind-address=0.0.0.0
wsrep_provider=/usr/lib64/libgalera_smm.so
wsrep_provider_options="gmcast.listen_addr=tcp://10.10.0.6:4567;evs.keepalive_period = PT2S; evs.inactive_check_period = PT7S; evs.suspect_timeout = PT15S; evs.inactive_timeout = PT30S; evs.consensus_timeout = PT30S"
wsrep_cluster_name="XXXX"
wsrep_cluster_address="gcomm://"
wsrep_node_address="10.10.0.6:3306"
wsrep_node_incoming_address="10.10.0.6"
wsrep_slave_threads=1
wsrep_certify_nonPK=1
wsrep_max_ws_rows=1048576
wsrep_max_ws_size=536870912
wsrep_debug=0
wsrep_convert_LOCK_to_trx=0
wsrep_retry_autocommit=1
wsrep_auto_increment_control=1
wsrep_drupal_282555_workaround=0
wsrep_causal_reads=0
wsrep_notify_cmd="/usr/local/bin/mail.sh"
wsrep_sst_method=mysqldump
wsrep_sst_auth=galera:XXX
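
The my.cnf above defines [mysqld] twice, and query_cache_size appears in both sections (32M first, then 0). MySQL reads option files top to bottom and, when an option is given more than once, the last value wins, so the effective value is 0. A minimal sketch, assuming Python's configparser is a close enough stand-in for MySQL's own option-file parser (it is only an approximation), shows that merge on a trimmed excerpt and converts the wsrep limits into readable units. The to_bytes helper is hypothetical, written here because configparser does not understand MySQL's K/M/G suffixes.

```python
import configparser

# Trimmed excerpt of the my.cnf above: two [mysqld] sections, the second
# one overriding query_cache_size.
MY_CNF = """
[mysqld]
skip-external-locking
query_cache_size = 32M
innodb_buffer_pool_size = 128M

[mysqld]
query_cache_size=0
wsrep_max_ws_rows=1048576
wsrep_max_ws_size=536870912
"""

# Hypothetical helper: MySQL accepts K/M/G suffixes, configparser does not.
SUFFIXES = {"K": 1024, "M": 1024**2, "G": 1024**3}


def to_bytes(value: str) -> int:
    """Convert a size such as '32M' or '536870912' to bytes."""
    value = value.strip()
    multiplier = SUFFIXES.get(value[-1].upper())
    if multiplier:
        return int(value[:-1]) * multiplier
    return int(value)


def effective_mysqld(text: str) -> configparser.SectionProxy:
    # strict=False merges repeated [mysqld] sections and lets a later
    # occurrence of an option overwrite an earlier one (last one wins);
    # allow_no_value=True accepts bare flags such as skip-external-locking.
    parser = configparser.ConfigParser(strict=False, allow_no_value=True)
    parser.read_string(text)
    return parser["mysqld"]


mysqld = effective_mysqld(MY_CNF)
print("query_cache_size:", to_bytes(mysqld["query_cache_size"]))
print("wsrep_max_ws_size (MiB):", to_bytes(mysqld["wsrep_max_ws_size"]) // 1024**2)
print("wsrep_max_ws_rows == 2**20:", int(mysqld["wsrep_max_ws_rows"]) == 2**20)
```

The excerpt confirms that wsrep_max_ws_size=536870912 is exactly 512 MiB and wsrep_max_ws_rows=1048576 is 2^20 rows. The same approach should also work on the complete file, since it contains no `%` characters that configparser's interpolation would trip over.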

Thank you,

Daniele

Alex Yurchenko

Dec 21, 2012, 4:10:17 AM

to codersh...@googlegroups.com

Hi,

Yes, this can happen on such a delete... Do you have swap enabled? What
OS are you using?

However, the server should not crash. Could you post more of the error
log, starting from 5 minutes before the crash and including the
stack trace?

Thanks,
Alex

--
Alexey Yurchenko,
Codership Oy, www.codership.com
Skype: alexey.yurchenko, Phone: +358-400-516-011

danycaiti

Dec 21, 2012, 4:21:33 AM

to codersh...@googlegroups.com

Thanks, Alexey, for answering me.

OS: SLES11 SP2

Yes, I have 1 GB of swap enabled:

# free
             total       used       free     shared    buffers     cached
Mem:       4057600    2934572    1123028          0      48400    2489160
-/+ buffers/cache:     397012    3660588
Swap:      1051644      16460    1035184
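
Alex later suggests that 1 GB of swap may be too little for deleting millions of rows. To put the figures above in perspective, here is a small sketch that parses this `free` output, assuming the numbers are in KiB (the default unit of `free` in procps versions of that era). It checks that the "-/+ buffers/cache" used value is RAM used minus buffers and page cache, and reports RAM, swap and their sum in GiB.

```python
# `free` output from the post above (assumed unit: KiB).
FREE_OUTPUT = """\
             total       used       free     shared    buffers     cached
Mem:       4057600    2934572    1123028          0      48400    2489160
-/+ buffers/cache:     397012    3660588
Swap:      1051644      16460    1035184
"""

KIB_PER_GIB = 1024 ** 2


def parse_free(text: str) -> dict[str, list[int]]:
    """Map each row label ('Mem', '-/+ buffers/cache', 'Swap') to its numbers."""
    rows = {}
    for line in text.splitlines()[1:]:  # skip the column header
        label, sep, rest = line.partition(":")
        if sep:
            rows[label.strip()] = [int(n) for n in rest.split()]
    return rows


rows = parse_free(FREE_OUTPUT)
ram_total, ram_used, _, _, buffers, cached = rows["Mem"]
swap_total = rows["Swap"][0]

# "-/+ buffers/cache" used = RAM used minus buffers and page cache,
# i.e. the memory processes (mysqld included) actually hold.
app_used = ram_used - buffers - cached
assert app_used == rows["-/+ buffers/cache"][0]

print(f"RAM:        {ram_total / KIB_PER_GIB:.2f} GiB")
print(f"Swap:       {swap_total / KIB_PER_GIB:.2f} GiB")
print(f"RAM + swap: {(ram_total + swap_total) / KIB_PER_GIB:.2f} GiB")
print("swap larger than RAM:", swap_total > ram_total)
```

For this node it reports about 3.87 GiB of RAM, 1.00 GiB of swap and False for the last check, which is the condition Ettore later reports changing (swap "a bit greater than RAM") to make the large deletes succeed.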

Below is another piece of the log:

121221  9:21:58 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 19602)
121221  9:21:58 [Note] WSREP: Synchronized with group, ready for connections
121221  9:29:07 [Warning] WSREP: std::bad_alloc
121221  9:29:07 [Warning] WSREP: Appending row key failed: delete from LOG_DOWNLOADS where timestamp > '2012-06-30' and timestamp < '2012-12-04', 5
121221  9:50:45 [ERROR] WSREP: io cache write problem: 9633792 32768
121221  9:50:45 [ERROR] WSREP: rbr write fail, data_len: 9633792, 1026
121221 10:02:28 [Warning] WSREP: std::bad_alloc
121221 10:02:28 [Warning] WSREP: Appending row key failed: delete from LOG_DOWNLOADS where timestamp > '2012-11-25' and timestamp < '2012-12-04', 5
121221 10:09:03 [Warning] WSREP: std::bad_alloc
121221 10:09:03 [Warning] WSREP: Appending row key failed: delete from LOG_DOWNLOADS where timestamp > '2012-11-25' and timestamp < '2012-12-04', 5
121221 10:09:34 [Note] /usr/sbin/mysqld: Normal shutdown

After the bad_alloc appears, the node becomes unresponsive and I have to kill -9 the mysql process.

I need a little time to produce the stack trace.

Thank you very much!

Daniele

danycaiti

Dec 21, 2012, 4:44:26 AM

to codersh...@googlegroups.com

Sorry Alexey,

how can I produce the stack trace you need? Do you have any suggestions on how to do that?

Thank you

On Friday, December 21, 2012 10:10:17 UTC+1, Alexey Yurchenko wrote:

> including the
> stacktrace?

Alex Yurchenko

Dec 21, 2012, 6:26:56 AM

to codersh...@googlegroups.com

On 2012-12-21 11:21, danycaiti wrote:
> Thanks, Alexey, for answering me.
>
> OS: SLES11 SP2

Uhhh. I'm afraid it may be unsupported:
https://bugs.launchpad.net/galera/+bug/1071933

Are you using Galera specifically built for SLES, or Codership's RPMs?

In the former case: we have never tested Galera with GCC below 4.4 or
boost below 1.41. In particular, boost was known to have some bugs in
lower versions. std::bad_alloc may also be a result of the combination
of an old libstdc++ and boost.
In the latter case, you may be seeing some binary incompatibility.

The thing is that you don't even get to the Galera memory wasting part
- you get an error at the writeset construction, most likely on key
insertion into std::map - and that looks very much like a system issue,
not Galera's.

In other words, while Galera probably can be made to work on SLES 11,
there are good chances that at the moment it does not do it correctly.
We strongly suggest that you use RHEL/CentOS for Galera.

> Yes, I have 1 GB of swap enabled:

That may be too little. I'm pretty certain that it is if you're going
for millions of rows. But see above: you may never reach the stage where
you'd need it.

Well, it does not look that unresponsive: 40 minutes passed after the
first std::bad_alloc and it clearly processed some queries in that time.
So, to be clear, what exactly is the symptom:

- the server does not shut down when told to do so
- the server does not accept new connections
- the server does not accept new queries (well, according to the log it
does).

Anyway, if your node becomes unresponsive again, could you run

$ sudo gdb /usr/sbin/mysqld -p $(pidof mysqld) --batch -q -ex "thr apply all bt" > bt.txt

and send bt.txt to us?
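
If the command has to be run repeatedly, or from a script, a minimal Python sketch can issue the same gdb invocation as an argument list, so the quoted `-ex` argument cannot be split or mis-quoted by the shell. The function names are hypothetical and this is not a Codership tool; it assumes gdb and pidof are installed and that it runs with enough privileges (e.g. as root) to attach to mysqld.

```python
import subprocess


def gdb_backtrace_cmd(pid: int, binary: str = "/usr/sbin/mysqld") -> list[str]:
    """Build the gdb call from the one-liner above as an argv list."""
    return [
        "gdb", binary, "-p", str(pid),
        "--batch", "-q",
        "-ex", "thr apply all bt",  # backtrace of every thread
    ]


def dump_backtraces(out_path: str = "bt.txt") -> None:
    """Attach to the running mysqld and write all thread backtraces to out_path.

    Assumes gdb and pidof exist and the caller may ptrace mysqld (e.g. root).
    """
    result = subprocess.run(["pidof", "mysqld"], check=True,
                            capture_output=True, text=True)
    pid = int(result.stdout.split()[0])  # first PID if several are listed
    with open(out_path, "w") as out:
        subprocess.run(gdb_backtrace_cmd(pid), stdout=out, check=True)
```

Calling `dump_backtraces()` on the hung node should produce the same bt.txt as the shell one-liner.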

> I need a little time to produce the stack trace.

Ah, never mind: from your first mail I understood that the server
crashed. Now it appears to be merely deadlocked (which is still a bug, of
course).

Regards,
Alex

danycaiti

Dec 21, 2012, 8:46:46 AM

to codersh...@googlegroups.com

Hi Alex,

We are using our own SLES build of Galera and MySQL.

Now I'm trying the precompiled binary, but I get the same result.

If I want to build it the right way for SLES, could you give me all the parameters and the compiler versions to use?

Thank you,

Daniele

Alex Yurchenko

Dec 21, 2012, 2:22:37 PM

to codersh...@googlegroups.com

On 2012-12-21 15:46, danycaiti wrote:
> Hi Alex,
> We are using our own SLES build of Galera and MySQL.
> Now I'm trying the precompiled binary, but I get the same result.
>
> If I want to build it the right way for SLES, could you give me all
> the parameters and the compiler versions to use?
>
> Thank you,
> Daniele
>

Daniele,

There is not much to it: GCC >= 4.4 and boost >= 1.41. I'd suggest going
for the latest versions - first install GCC and then compile boost
(although I think that GCC 4.3 won't compile the latest boost, so it
won't let you do things in the wrong order). You'll also need to make
sure that it builds and links with the latest libstdc++, so I'd remove
GCC 4.3 from the system completely after the new GCC is installed. And
then you'll have to install that new libstdc++ on every node.

Let's hope that SLES has prebuilt RPMs for the latest GCC and boost, or it
will be a pain.
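
Because an old GCC or boost on SLES is the suspected cause, a pre-build check of the two minimums above (GCC >= 4.4, boost >= 1.41) can catch the wrong toolchain before compiling. This is a hypothetical helper, not part of Galera's build. It assumes the version strings come from `gcc -dumpversion` and from the `BOOST_LIB_VERSION` macro in boost/version.hpp, which uses the "1_41" form; the sample versions in the loop are illustrative only.

```python
# Minimum versions for building Galera, as stated above.
MINIMUMS = {"gcc": (4, 4), "boost": (1, 41)}


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '4.3.4' or boost's '1_41' style into a tuple of ints."""
    return tuple(int(part) for part in text.strip().replace("_", ".").split("."))


def meets_minimum(tool: str, version: str) -> bool:
    """True if `version` of `tool` is at least the minimum listed above."""
    need = MINIMUMS[tool]
    # Compare only as many components as the minimum specifies.
    return parse_version(version)[: len(need)] >= need


# Illustrative version strings, not taken from any particular system.
for tool, version in [("gcc", "4.3.4"), ("gcc", "4.6"),
                      ("boost", "1_41"), ("boost", "1.40.0")]:
    print(tool, version, "ok" if meets_minimum(tool, version) else "too old")
```

Comparing integer tuples rather than raw strings avoids the classic trap where "4.10" sorts before "4.4" as text.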

Regards,
Alex

Alex Yurchenko

Dec 21, 2012, 6:33:02 PM

to codersh...@googlegroups.com

On 2012-12-21 11:21, danycaiti wrote:

> Thanks, Alexey, for answering me.
>
> OS: SLES11 SP2

BTW, is it a 32-bit system by any chance?

Alex Yurchenko

Jan 3, 2013, 4:55:44 AM

to codersh...@googlegroups.com

Hi Ettore,

On 2013-01-02 18:25, Ettore Simone wrote:
> Hi Alex,
>
> I'm a colleague of Daniele. We would like to bring the power of the
> MySQL Galera cluster to the SUSE Enterprise distro. Unfortunately we
> are not very comfortable with C++.
>
> Following your suggestion, we upgraded the whole GCC and C++ stack to
> 4.6 from the official SDK, then rebuilt boost 1.91 from scratch with
> g++ 4.6, Galera 23.2.2, and then MySQL 5.5.28 with wsrep 23.7.
>
> For MySQL we are using the following flags to compile:
> # mysql_config
> --cflags         [-I/usr/include/mysql -g -m64 -fmessage-length=0 -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -DPIC -fPIC -DWITH_WSREP -DWSREP_PROC_INFO -DMYSQL_MAX_VARIABLE_VALUE_LEN=2048 -DWITH_INNODB_DISALLOW_WRITES -g]
> --include        [-I/usr/include/mysql]
> --libs           [-L/usr/lib64 -lmysqlclient -lpthread -lz -lm -lrt -lssl -lcrypto -ldl]
> --libs_r         [-L/usr/lib64 -lmysqlclient_r -lpthread -lz -lm -lrt -lssl -lcrypto -ldl]
> --plugindir      [/usr/lib64/mysql/plugin]
> --socket         [/var/run/mysql/mysql.sock]
> --port           [0]
> --version        [5.5.28]
> --libmysqld-libs [-L/usr/lib64 -lmysqld]
> --variable=VAR   VAR is one of:
>         pkgincludedir [/usr/include/mysql]
>         pkglibdir     [/usr/lib64]
>         plugindir     [/usr/lib64/mysql/plugin]
>
> Even so, in some circumstances it ends in this kind of error:
> 130102 16:27:25 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000000 of size 307188470 bytes
> 130102 16:27:55 [ERROR] WSREP: std::bad_alloc
> 130102 16:27:55 [ERROR] WSREP: unknown connection failure
>
> The SQL command was a delete of about 2M records with
> wsrep_max_ws_size=268435456. I suspect that for a delete request the
> writeset limit is not correctly honored.

Yes, it isn't, but for this discussion it is not important. Your
writeset is well within reasonable bounds.
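
For scale, Ettore's limit and row count work out as follows (a back-of-the-envelope calculation that assumes "about 2M records" means exactly 2 × 10^6 rows):

```latex
\begin{aligned}
\texttt{wsrep\_max\_ws\_size} &= 268435456\ \text{B} = 2^{28}\ \text{B} = 256\ \text{MiB} \\
\frac{268435456\ \text{B}}{2 \times 10^{6}\ \text{rows}} &\approx 134\ \text{B per row}
\end{aligned}
```

So the configured size limit corresponds to roughly 134 bytes of writeset per deleted row on average; whether or not that limit is enforced for deletes, the node can still run out of memory first, which is what std::bad_alloc points to.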

> Do you have any suggestion on how to investigate further?

From what I understand, std::bad_alloc means an out-of-memory condition.
You can't do anything about it except fix the Galera code to use less
memory (which is far from trivial) or add more memory, namely swap
space. Set up a 10 GB swap file and see if this continues to happen.

Regards,
Alex

> Is there something similar to Electric Fence to isolate memory
> allocation failures with g++?
>
> Best regards,
> Ettore Simone
>
> On Friday, December 21, 2012 8:22:37 PM UTC+1, Alexey Yurchenko

Ettore Simone

Jan 3, 2013, 11:32:56 AM

to codersh...@googlegroups.com

Hi Alex,

Thanks a lot for helping us. With swap a bit larger than RAM, everything is working fine.

We tested many combinations of SQL commands on it and it seems very stable, even when deleting millions of records. Thank you for the great work.

In case it helps anyone who uses openSUSE and SUSE Linux Enterprise systems, here is our repository of precompiled and source RPM packages:

http://software.opensuse.org/search?q=galera&baseproject=ALL

For now it contains only MySQL 5.5.28 with wsrep 23.7 and Galera 23.2.2 r138 for openSUSE 12.2 and SLES11 SP2.