Problem with a geographical Galera cluster: cannot make a node rejoin after shutdown

Mox

Mar 12, 2018, 8:46:05 AM

to codership

Hello group,

I am trying to get a geo-distributed cluster working with 9 nodes in 3 different locations, each location in its own segment (1, 2, 3).

After the first setup, the cluster was running with 9 nodes.

Due to a change of locations, we switched off 2 segments and kept the main one active (segment 1, nodes A, B and C), so currently only one segment with 3 nodes is running.

This is the version I am using

mysql Ver 15.1 Distrib 10.1.31-MariaDB, for Linux (x86_64) using readline 5.1

wsrep_protocol_version 7

wsrep_provider_name Galera

wsrep_provider_vendor Codership Oy <in...@codership.com>

wsrep_provider_version 25.3.22(r3764)

For simplicity let's say we have 1.A, 1.B, 1.C, 2.A and so on.

Now we are ready to restart segment 2, but as soon as I try to start node A of segment 2 (2.A), it crashes.

This is my journalctl output

max_used_connections=0

max_threads=502

thread_count=0

It is possible that mysqld could use up to

key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 69394619 K bytes of memory

Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0

Attempting backtrace. You can use the following information to find out

where mysqld died. If you see no messages after this, something went

terribly wrong...

2018-03-12 11:50:23 7f33b2bfa700stack_bottom = 0x0 thread_stack 0x48400

InnoDB: Assertion failure in thread 139860018964224 in file log0recv.cc line 1172

InnoDB: Failing assertion: !page || (ibool)!!page_is_comp(page) == dict_table_is_comp(index->table)

InnoDB: We intentionally generate a memory trap.

InnoDB: Submit a detailed bug report to https://jira.mariadb.org/

InnoDB: If you get repeated assertion failures or crashes, even

InnoDB: immediately after the mysqld startup, there may be

InnoDB: corruption in the InnoDB tablespace. Please refer to

InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html

InnoDB: about forcing recovery.

/usr/bin/galera_recovery: line 71: 2338 Aborted /usr/sbin/mysqld --user=mysql --wsrep_recover --disable-log-error'

On this node, grastate.dat also contains inconsistent values:

# GALERA saved state

version: 2.1

uuid: 00000000-0000-0000-0000-000000000000

seqno: -1

safe_to_bootstrap: 0

pid-file=/var/run/mariadb/mariadb.pid
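For readers comparing state files across nodes, here is a minimal Python sketch that parses the grastate.dat contents quoted above and reports whether they hold a usable position. The function names `parse_grastate` and `has_saved_position` are hypothetical helpers written for this illustration, not part of Galera or MariaDB. It assumes only what the file itself shows: `key: value` lines, `#` comments, an all-zero uuid and a seqno of -1 meaning no position was saved.

```python
# Hypothetical helper for illustration; not a Galera or MariaDB tool.
ZERO_UUID = "00000000-0000-0000-0000-000000000000"

def parse_grastate(text):
    """Parse 'key: value' lines from grastate.dat, skipping blanks and # comments."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        state[key.strip()] = value.strip()
    return state

def has_saved_position(state):
    """True only if both the cluster uuid and the seqno are defined."""
    return state.get("uuid") != ZERO_UUID and state.get("seqno") != "-1"

# The values posted for node 2.A.
posted = """# GALERA saved state
version: 2.1
uuid: 00000000-0000-0000-0000-000000000000
seqno: -1
safe_to_bootstrap: 0
"""

state = parse_grastate(posted)
print(state, has_saved_position(state))
```

With the posted values the check reports no saved position, which matches the description of the file as inconsistent.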

This is my my.cnf file; ip_1.A and similar are placeholders for the real IP addresses.

[mysqld]

bind-address=0.0.0.0

datadir=/var/lib/mysql

socket=/var/lib/mysql/mysql.sock

log-error=/var/log/mariadb/mariadb.log

skip-name-resolve

open_files_limit = 8000

max_connections = 500

thread_cache_size=2000

interactive_timeout=360

wait_timeout=360

max_allowed_packet = 32M

key_buffer_size = 3000M

read_buffer_size = 1M

read_rnd_buffer_size = 4M

bulk_insert_buffer_size = 256M

myisam_sort_buffer_size = 16M

myisam_max_sort_file_size = 256M

myisam_repair_threads = 1

max_heap_table_size = 2048M

tmp_table_size = 1024M

table_open_cache = 2000

table_cache = 2000

sort_buffer_size = 128M

join_buffer_size = 128M

query_cache_size = 128M

query_cache_limit = 128M

[innodb]

innodb_file_per_table

innodb_flush_method=O_DIRECT

innodb_log_file_size=1G

innodb_buffer_pool_size=8G

innodb_buffer_pool_instance=2

innodb_additional_mem_pool_size=20M

innodb_log_buffer_size=64M

innodb_flush_log_at_trx_commit=1

innodb_thread_concurrency=16

[myisamchk]

key_buffer_size = 2500M

sort_buffer_size = 1024M

read_buffer = 32M

write_buffer = 32M

[mysqld_safe]

pid-file=/var/run/mariadb/mariadb.pid

[galera]

wsrep_on=ON

wsrep_provider=/usr/lib64/galera/libgalera_smm.so

wsrep_cluster_address="gcomm://ip_1.A,ip_1.B,ip_1.C, ip_2.A,ip_2.B,ip_2.C, ip_3.A,ip_3.B,ip_3.C"

binlog_format=row

innodb_autoinc_lock_mode=2

wsrep_cluster_name="galera_mariadb_cluster_X"

wsrep_node_address="ip_2.A"

wsrep_node_name="cp2-node1"

wsrep_sst_method=rsync_wan

wsrep_slave_threads=1

wsrep_provider_options="gmcast.segment=2;gcache.size=1G;evs.keepalive_period=PT3S;evs.suspect_timeout=PT30S;evs.inactive_timeout=PT1M;evs.install_timeout=PT1M;"

#!includedir /etc/my.cnf.d

One last detail: my log file is empty, as the "service start mariadb" command crashes immediately.

Thanks for any suggestion or help.

Regards

Moreno

Arjen Lentz

Mar 13, 2018, 9:54:32 PM

to Mox, codership-team

Hi Mox

Out of interest, how much actual RAM does the machine have?
The memory estimate in your log (69394619 K) indicates that up to about 69G might be required for the main global and per-connection buffers.
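As a rough check of that figure, the following Python sketch recomputes the formula printed in the log, key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads, from the values in the posted my.cnf and the max_threads=502 shown in the log. The function name `estimate_kib` is a hypothetical helper for this illustration, and the sketch assumes the configured values are the ones the server actually used.

```python
# Rough re-computation of the startup memory estimate, in KiB.
KIB_PER_MIB = 1024

def estimate_kib(key_buffer_mib, read_buffer_mib, sort_buffer_mib, max_threads):
    """key_buffer_size + (read_buffer_size + sort_buffer_size) * max_threads."""
    per_thread_kib = (read_buffer_mib + sort_buffer_mib) * KIB_PER_MIB
    return key_buffer_mib * KIB_PER_MIB + per_thread_kib * max_threads

# Values taken from the posted config and log output.
total_kib = estimate_kib(key_buffer_mib=3000, read_buffer_mib=1,
                         sort_buffer_mib=128, max_threads=502)
print(total_kib, "KiB, about", round(total_kib / 1024**2, 1), "GiB")
```

This gives 69384192 K, close to the logged 69394619 K; the post does not explain the small difference. Almost all of it comes from sort_buffer_size = 128M multiplied by 502 threads.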

> Hope that's ok; if not, decrease some variables in the equation.
> Thread pointer: 0x0
> Attempting backtrace. You can use the following information to find out
> where mysqld died. If you see no messages after this, something went
> terribly wrong...
> 2018-03-12 11:50:23 7f33b2bfa700stack_bottom = 0x0 thread_stack 0x48400
> InnoDB: Assertion failure in thread 139860018964224 in file log0recv.cc
> line 1172
> InnoDB: Failing assertion: !page || (ibool)!!page_is_comp(page) ==
> dict_table_is_comp(index->table)
> InnoDB: We intentionally generate a memory trap.
> InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
> InnoDB: If you get repeated assertion failures or crashes, even
> InnoDB: immediately after the mysqld startup, there may be
> InnoDB: corruption in the InnoDB tablespace. Please refer to
> InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
> InnoDB: about forcing recovery.
> /usr/bin/galera_recovery: line 71: 2338 Aborted
> /usr/sbin/mysqld --user=mysql --wsrep_recover --disable-log-error'

Any crash is always a bug. An assert is a programmed crash: a piece of checking code found that a variable wasn't what it should be.
It'd be best if you filed it as a bug report in the JIRA system. I would file it as a general MariaDB bug, as it's InnoDB that's doing the crashing (not wsrep/Galera).

> open_files_limit = 8000
> max_connections = 500
> thread_cache_size=2000

If your max_connections = 500, then the thread_cache_size has no need to be any larger than that.

> key_buffer_size = 3000M

Why would you have a MyISAM index buffer size of 3G on an InnoDB-only system (which you have to be for Galera)?

> max_heap_table_size = 2048M
> tmp_table_size = 1024M

For server-created (internal) in-memory tmp tables, this means the maximum is 1G, as the limit is the lower of these two settings.
For user-created tmp tables, this means a 2G limit, since only max_heap_table_size applies to them.
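The rule for these two settings can be written out as a short Python sketch. The function names are hypothetical and exist only for this illustration; sizes are in MiB and the inputs are the values from the posted config.

```python
def internal_tmp_table_limit_mib(tmp_table_size_mib, max_heap_table_size_mib):
    """Server-created in-memory tmp tables are capped by the lower of the two settings."""
    return min(tmp_table_size_mib, max_heap_table_size_mib)

def user_tmp_table_limit_mib(max_heap_table_size_mib):
    """User-created in-memory tables are capped by max_heap_table_size alone."""
    return max_heap_table_size_mib

# Posted config: tmp_table_size = 1024M, max_heap_table_size = 2048M.
internal_limit = internal_tmp_table_limit_mib(1024, 2048)
user_limit = user_tmp_table_limit_mib(2048)
print(internal_limit, user_limit)
```

Raising only one of the two settings therefore does not raise the limit for server-created tmp tables.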

> table_open_cache = 2000
> table_cache = 2000

No need for these high numbers, since yours is an InnoDB-only system.

> sort_buffer_size = 128M
> join_buffer_size = 128M

That's very big.
Just making something bigger does not make it better/faster.

> [innodb]
> innodb_file_per_table
> innodb_flush_method=O_DIRECT
> innodb_log_file_size=1G

You'd be better off having more, smaller logfiles in the group.
The default is 2, so you now have 2x1G.
You could do 8x256M, which uses the same space but is potentially more effective at assisting background page flushing.
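To confirm that the two layouts take the same disk space, here is a small Python sketch. It assumes the group size is controlled by innodb_log_files_in_group, which is not set in the posted config and so is taken to be the default of 2; the function name `redo_log_total_mib` is a hypothetical helper for this illustration.

```python
def redo_log_total_mib(file_size_mib, files_in_group):
    """Total redo log space: size of one file times the number of files in the group."""
    return file_size_mib * files_in_group

current = redo_log_total_mib(1024, 2)   # 2 x 1G: posted innodb_log_file_size, default group size
proposed = redo_log_total_mib(256, 8)   # 8 x 256M: the layout suggested above
print(current, proposed, current == proposed)
```

Both come to 2048M, so the suggestion changes how the log space is split into files, not how much of it there is.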

> innodb_flush_log_at_trx_commit=1

This is the main reason I'm commenting on the config... why =1 ?
That's not generally how Galera works best, and depending on the version it won't work at all.

Is this system happy to start without wsrep enabled? Do please test that, because if it still crashes then, you'll know it's definitely not something wsrep/Galera triggers.

Regards,
Arjen.
--
Arjen Lentz, Exec.Director @ Open Query Pty Ltd
Your databases and online infrastructure. Managed.
Web https://openquery.com.au | Twitter @openquery
