Problem with a geographical Galera cluster: cannot make a node rejoin after shutdown

Mox

Mar 12, 2018, 8:46:05 AM
to codership
Hello group,
I am trying to get a geo cluster working with 9 nodes in 3 different locations, each location in its own segment (1, 2, 3).
After the initial setup, the cluster was running with all 9 nodes.


Due to a change of locations we switched off 2 segments, keeping the main one active (segment 1, nodes A, B and C). Currently only that one segment with 3 nodes is running.

This is the version I am using
mysql  Ver 15.1 Distrib 10.1.31-MariaDB, for Linux (x86_64) using readline 5.1
wsrep_protocol_version       7                                                  
wsrep_provider_name         Galera
wsrep_provider_vendor       Codership Oy <in...@codership.com>                  
wsrep_provider_version      25.3.22(r3764)


For simplicity, let's call the nodes 1.A, 1.B, 1.C, 2.A and so on.
Now we are ready to restart segment 2, but as soon as I try to start node A of segment 2, it crashes.

This is my journalctl output

max_used_connections=0
max_threads=502
thread_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 69394619 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
2018-03-12 11:50:23 7f33b2bfa700stack_bottom = 0x0 thread_stack 0x48400
InnoDB: Assertion failure in thread 139860018964224 in file log0recv.cc line 1172
InnoDB: Failing assertion: !page || (ibool)!!page_is_comp(page) == dict_table_is_comp(index->table)
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
/usr/bin/galera_recovery: line 71:  2338 Aborted                 /usr/sbin/mysqld --user=mysql --wsrep_recover --disable-log-error'


On this node, grastate.dat also contains inconsistent values:

# GALERA saved state
version: 2.1
uuid:    00000000-0000-0000-0000-000000000000
seqno:   -1
safe_to_bootstrap: 0
pid-file=/var/run/mariadb/mariadb.pid
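
To make the "inconsistent values" concrete, here is a minimal Python sketch of how such a file could be checked. It is not part of Galera or MariaDB: the names `parse_grastate` and `has_recorded_position` are hypothetical, and the sketch assumes the file consists of `key: value` lines as pasted above. An all-zero uuid together with seqno -1 generally means the node has no recorded position, so it would be expected to need a full state transfer when it joins. A valid uuid with seqno -1 is a different case, commonly seen after an unclean shutdown, where the position has to be recovered from InnoDB instead.

```python
ZERO_UUID = "00000000-0000-0000-0000-000000000000"


def parse_grastate(text):
    """Collect 'key: value' pairs; comments and lines without ':' are skipped."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        state[key.strip()] = value.strip()
    return state


def has_recorded_position(state):
    """True only if the file names a real cluster uuid and a non-negative seqno."""
    uuid = state.get("uuid", ZERO_UUID)
    seqno = int(state.get("seqno", "-1"))
    return uuid != ZERO_UUID and seqno >= 0


# The file content exactly as pasted in this post.
sample = """# GALERA saved state
version: 2.1
uuid:    00000000-0000-0000-0000-000000000000
seqno:   -1
safe_to_bootstrap: 0
pid-file=/var/run/mariadb/mariadb.pid
"""

state = parse_grastate(sample)
print("uuid:", state["uuid"])
print("seqno:", state["seqno"])
print("has recorded position:", has_recorded_position(state))
```

The stray `pid-file=` line contains no colon, so the sketch simply ignores it.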


This is my my.cnf file; ip_1.A and similar are just placeholders for the real IPs.

[mysqld]
bind-address=0.0.0.0
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
log-error=/var/log/mariadb/mariadb.log
skip-name-resolve

open_files_limit = 8000
max_connections = 500
thread_cache_size=2000

interactive_timeout=360
wait_timeout=360

max_allowed_packet = 32M
key_buffer_size = 3000M
read_buffer_size = 1M
read_rnd_buffer_size = 4M
bulk_insert_buffer_size = 256M
myisam_sort_buffer_size = 16M
myisam_max_sort_file_size = 256M
myisam_repair_threads = 1
max_heap_table_size = 2048M
tmp_table_size = 1024M
table_open_cache = 2000
table_cache = 2000
sort_buffer_size = 128M
join_buffer_size = 128M
query_cache_size = 128M
query_cache_limit = 128M

[innodb]
innodb_file_per_table
innodb_flush_method=O_DIRECT
innodb_log_file_size=1G
innodb_buffer_pool_size=8G
innodb_buffer_pool_instance=2
innodb_additional_mem_pool_size=20M
innodb_log_buffer_size=64M
innodb_flush_log_at_trx_commit=1
innodb_thread_concurrency=16

[myisamchk]
key_buffer_size = 2500M
sort_buffer_size = 1024M
read_buffer = 32M
write_buffer = 32M

[mysqld_safe]
pid-file=/var/run/mariadb/mariadb.pid

[galera]
wsrep_on=ON
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address="gcomm://ip_1.A,ip_1.B,ip_1.C, ip_2.A,ip_2.B,ip_2.C, ip_3.A,ip_3.B,ip_3.C"
binlog_format=row
innodb_autoinc_lock_mode=2

wsrep_cluster_name="galera_mariadb_cluster_X"
wsrep_node_address="ip_2.A"
wsrep_node_name="cp2-node1"
wsrep_sst_method=rsync_wan
wsrep_slave_threads=1
wsrep_provider_options="gmcast.segment=2;gcache.size=1G;evs.keepalive_period=PT3S;evs.suspect_timeout=PT30S;evs.inactive_timeout=PT1M;evs.install_timeout=PT1M;"

#!includedir /etc/my.cnf.d

One last piece of info: my log file is empty, as the "service start mariadb" command crashes immediately.

Thanks for any suggestion or help.

Regards

Moreno


Arjen Lentz

Mar 13, 2018, 9:54:32 PM
to Mox, codership-team
Hi Mox
Out of interest, how much actual RAM does the machine have?
The above indicates up to 69G might be required for the main global and per-connection buffers.
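
As a rough cross-check, the formula printed in the crash report can be evaluated with the values from the posted config. This is only a sketch: `to_kib` is a helper made up for the illustration, and it assumes the K/M/G suffixes are powers of 1024. The result comes out slightly below the logged figure of 69394619 K, presumably because the server adds some per-connection overhead on top of the printed formula.

```python
def to_kib(value):
    """Convert a my.cnf size such as '3000M' to KiB (suffixes taken as powers of 1024)."""
    units = {"K": 1, "M": 1024, "G": 1024 * 1024}
    suffix = value[-1].upper()
    if suffix in units:
        return int(value[:-1]) * units[suffix]
    return int(value) // 1024  # a plain number is taken as bytes


# Values from the posted my.cnf and crash report.
key_buffer_size = to_kib("3000M")
read_buffer_size = to_kib("1M")
sort_buffer_size = to_kib("128M")
max_threads = 502

# key_buffer_size + (read_buffer_size + sort_buffer_size) * max_threads
estimate_kib = key_buffer_size + (read_buffer_size + sort_buffer_size) * max_threads
estimate_gib = estimate_kib / (1024 * 1024)

print(estimate_kib, "KiB")
print(round(estimate_gib, 1), "GiB")
```

Almost all of that total comes from sort_buffer_size multiplied by max_threads, which is why the per-connection buffers matter far more here than the global ones.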


> Hope that's ok; if not, decrease some variables in the equation.
> Thread pointer: 0x0
> Attempting backtrace. You can use the following information to find out
> where mysqld died. If you see no messages after this, something went
> terribly wrong...
> 2018-03-12 11:50:23 7f33b2bfa700stack_bottom = 0x0 thread_stack 0x48400
> InnoDB: Assertion failure in thread 139860018964224 in file log0recv.cc line 1172
> InnoDB: Failing assertion: !page || (ibool)!!page_is_comp(page) == dict_table_is_comp(index->table)
> InnoDB: We intentionally generate a memory trap.
> InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
> InnoDB: If you get repeated assertion failures or crashes, even
> InnoDB: immediately after the mysqld startup, there may be
> InnoDB: corruption in the InnoDB tablespace. Please refer to
> InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
> InnoDB: about forcing recovery.
> /usr/bin/galera_recovery: line 71: 2338 Aborted /usr/sbin/mysqld --user=mysql --wsrep_recover --disable-log-error'

Any crash is always a bug. An assert is a programmed crash: a piece of checking code found that a variable wasn't what it should be.
It'd be best if you filed it as a bug report in the JIRA system. I would file it as a general MariaDB bug, as it's InnoDB that's doing the crashing (not wsrep/Galera).


> open_files_limit = 8000
> max_connections = 500
> thread_cache_size=2000

If your max_connections = 500, then the thread_cache_size has no need to be any larger than that.

> key_buffer_size = 3000M

Why would you have a MyISAM index buffer size of 3G on an InnoDB-only system (which it has to be for Galera)?


> max_heap_table_size = 2048M
> tmp_table_size = 1024M

For server-created tmp tables, this means the maximum is 1G, as the limit is the lower of these two.
For user-created tmp tables, this means a 2G limit.
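
The rule described here can be written out as a tiny Python sketch. The variable names ending in `_mib` are just for illustration, and the numbers are the ones from the quoted config.

```python
# Values from the quoted config, in MiB.
tmp_table_size_mib = 1024
max_heap_table_size_mib = 2048

# Server-created (implicit) in-memory tmp tables are bound by the lower of the two.
implicit_limit_mib = min(tmp_table_size_mib, max_heap_table_size_mib)

# User-created in-memory tables are bound by max_heap_table_size alone.
user_limit_mib = max_heap_table_size_mib

print("server-created tmp table limit:", implicit_limit_mib, "MiB")
print("user-created tmp table limit:", user_limit_mib, "MiB")
```

So raising max_heap_table_size above tmp_table_size does nothing for the tmp tables the server creates on its own.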


> table_open_cache = 2000
> table_cache = 2000

No need for these high numbers since you're on an InnoDB-only system.


> sort_buffer_size = 128M
> join_buffer_size = 128M

That's very big.
Just making something bigger does not make it better/faster.


> [innodb]
> innodb_file_per_table
> innodb_flush_method=O_DIRECT
> innodb_log_file_size=1G

You'd be better off having more, smaller log files in the group.
The default is 2, so you now have 2x1G.
You could do 8x256M, which is the same space but potentially more effective at assisting background page flushing.
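
To show that the two layouts really use the same space, here is a short Python sketch. The dictionary keys mirror the `innodb_log_files_in_group` setting and a file size in MiB; the helper `total_redo_mib` is made up for the illustration.

```python
def total_redo_mib(layout):
    """Total redo log space for a layout: number of files times size of each file."""
    return layout["innodb_log_files_in_group"] * layout["log_file_size_mib"]


# Current layout: the default of 2 files in the group, 1G each.
current = {"innodb_log_files_in_group": 2, "log_file_size_mib": 1024}

# Suggested layout: 8 files of 256M each.
suggested = {"innodb_log_files_in_group": 8, "log_file_size_mib": 256}

print("current:", total_redo_mib(current), "MiB")
print("suggested:", total_redo_mib(suggested), "MiB")
```

Both come to 2048 MiB, so the suggestion changes only how the space is divided, not how much disk is used.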


> innodb_flush_log_at_trx_commit=1

This is the main reason I'm commenting on the config... why =1?
That's not generally how Galera works best, and depending on the version it won't work at all.


Is this system happy to start without wsrep enabled? Do please test that, because if it still crashes then, you'll know it's definitely not something wsrep/Galera triggers.


Regards,
Arjen.
--
Arjen Lentz, Exec.Director @ Open Query Pty Ltd
Your databases and online infrastructure. Managed.
Web https://openquery.com.au | Twitter @openquery