getting error while trying add node2 to mysql galera cluster

Dharmendu Patel

unread,

Oct 8, 2016, 10:58:34 AM10/8/16

to codership

I am getting following error when i try to add node2 to the mysql galera cluster. Both have same my.cnf. We are getting below error when we try to start mysql.

ERROR! The server quit without updating PID file (/var/lib/mysql/s2.hidb.bimsym.com.pid).

When i check .err file it shows below error.

161008 07:22:03 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql

161008 07:22:03 mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/mysql/wsrep_recovery.AZGzjH' --pid-file='/var/lib/mysql/s2.xxxxx.xxxxx.com-recover.pid'

2016-10-08 07:22:04 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).

2016-10-08 07:22:04 0 [Note] /usr/sbin/mysqld (mysqld 5.6.33) starting as process 11766 ...

161008 07:22:08 mysqld_safe WSREP: Recovered position 00000000-0000-0000-0000-000000000000:-1

2016-10-08 07:22:09 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).

2016-10-08 07:22:09 0 [Note] /usr/sbin/mysqld (mysqld 5.6.33) starting as process 11807 ...

2016-10-08 07:22:09 11807 [Note] WSREP: Setting wsrep_ready to 0

2016-10-08 07:22:09 11807 [Note] WSREP: Read nil XID from storage engines, skipping position init

2016-10-08 07:22:09 11807 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera-3/libgalera_smm.so'

2016-10-08 07:22:09 11807 [Note] WSREP: wsrep_load(): Galera 3.18(r3b41103) by Codership Oy <in...@codership.com> loaded successfully.

2016-10-08 07:22:09 11807 [Note] WSREP: CRC-32C: using hardware acceleration.

2016-10-08 07:22:09 11807 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1

2016-10-08 07:22:09 11807 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 67.xxx.xxx.xxx.xxx; base_port = 4567; cert.log_conflicts = ON; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false

2016-10-08 07:22:09 11807 [Note] WSREP: Service thread queue flushed.

2016-10-08 07:22:09 11807 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1

2016-10-08 07:22:09 11807 [Note] WSREP: wsrep_sst_grab()

2016-10-08 07:22:09 11807 [Note] WSREP: Start replication

2016-10-08 07:22:09 11807 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1

2016-10-08 07:22:09 11807 [Note] WSREP: protonet asio version 0

2016-10-08 07:22:09 11807 [Note] WSREP: Using CRC-32C for message checksums.

2016-10-08 07:22:09 11807 [Note] WSREP: backend: asio

2016-10-08 07:22:09 11807 [Note] WSREP: gcomm thread scheduling priority set to other:0

2016-10-08 07:22:09 11807 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)

2016-10-08 07:22:09 11807 [Note] WSREP: restore pc from disk failed

2016-10-08 07:22:09 11807 [Note] WSREP: GMCast version 0

2016-10-08 07:22:09 11807 [Note] WSREP: (738de7af, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567

2016-10-08 07:22:09 11807 [Note] WSREP: (738de7af, 'tcp://0.0.0.0:4567') multicast: , ttl: 1

2016-10-08 07:22:09 11807 [Note] WSREP: EVS version 0

2016-10-08 07:22:09 11807 [Note] WSREP: gcomm: connecting to group 'xxxxx_cluster', peer '69.xxx.xxx.xxx.xxx:,67.xxx.xxx.xxx.xxx:,208.xxx.xxx.xxx.xxx:'

2016-10-08 07:22:09 11807 [Note] WSREP: (738de7af, 'tcp://0.0.0.0:4567') connection established to 738de7af tcp://67.xxx.xxx.xxx.xxx:4567

2016-10-08 07:22:09 11807 [Warning] WSREP: (738de7af, 'tcp://0.0.0.0:4567') address 'tcp://67.xxx.xxx.xxx.xxx:4567' points to own listening address, blacklisting

2016-10-08 07:22:09 11807 [Note] WSREP: (738de7af, 'tcp://0.0.0.0:4567') connection established to 738de7af tcp://67.xxx.xxx.xxx.xxx:4567

2016-10-08 07:22:09 11807 [Note] WSREP: (738de7af, 'tcp://0.0.0.0:4567') connection established to 318a4221 tcp://69.xxx.xxx.xxx.xxx:4567

2016-10-08 07:22:09 11807 [Note] WSREP: (738de7af, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:

2016-10-08 07:22:09 11807 [Note] WSREP: (738de7af, 'tcp://0.0.0.0:4567') connection established to 18805553 tcp://208.xxx.xxx.xxx.xxx:4567

2016-10-08 07:22:09 11807 [Note] WSREP: declaring 18805553 at tcp://208.xxx.xxx.xxx.xxx:4567 stable

2016-10-08 07:22:09 11807 [Note] WSREP: declaring 318a4221 at tcp://69.xxx.xxx.xxx.xxx:4567 stable

2016-10-08 07:22:09 11807 [Note] WSREP: Node 18805553 state prim

2016-10-08 07:22:10 11807 [Note] WSREP: view(view_id(PRIM,18805553,37) memb {

18805553,0

318a4221,0

738de7af,0

} joined {

} left {

} partitioned {

})

2016-10-08 07:22:10 11807 [Note] WSREP: save pc into disk

2016-10-08 07:22:10 11807 [Note] WSREP: gcomm: connected

2016-10-08 07:22:10 11807 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636

2016-10-08 07:22:10 11807 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)

2016-10-08 07:22:10 11807 [Note] WSREP: Opened channel 'xxxxx_cluster'

2016-10-08 07:22:10 11807 [Note] WSREP: Waiting for SST to complete.

2016-10-08 07:22:10 11807 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 2, memb_num = 3

2016-10-08 07:22:10 11807 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.

2016-10-08 07:22:10 11807 [Note] WSREP: STATE EXCHANGE: sent state msg: e9acb8da-8d27-11e6-b2d0-6310c7f6d02c

2016-10-08 07:22:10 11807 [Note] WSREP: STATE EXCHANGE: got state msg: e9acb8da-8d27-11e6-b2d0-6310c7f6d02c from 0 (s3.xxxxx.xxxxx.com)

2016-10-08 07:22:10 11807 [Note] WSREP: STATE EXCHANGE: got state msg: e9acb8da-8d27-11e6-b2d0-6310c7f6d02c from 1 (s1.xxxxx.xxxxx.com)

2016-10-08 07:22:10 11807 [Note] WSREP: STATE EXCHANGE: got state msg: e9acb8da-8d27-11e6-b2d0-6310c7f6d02c from 2 (s2.xxxx.xxxxx.com)

2016-10-08 07:22:10 11807 [Note] WSREP: Quorum results:

version = 4,

component = PRIMARY,

conf_id = 176,

members = 2/3 (joined/total),

act_id = 92667,

last_appl. = -1,

protocols = 0/7/3 (gcs/repl/appl),

group UUID = aab00320-83e1-11e6-9065-6bee750e7c1c

2016-10-08 07:22:10 11807 [Note] WSREP: Flow-control interval: [28, 28]

2016-10-08 07:22:10 11807 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 92667)

2016-10-08 07:22:10 11807 [Note] WSREP: State transfer required:

Group state: aab00320-83e1-11e6-9065-6bee750e7c1c:92667

Local state: 00000000-0000-0000-0000-000000000000:-1

2016-10-08 07:22:10 11807 [Note] WSREP: New cluster view: global state: aab00320-83e1-11e6-9065-6bee750e7c1c:92667, view# 177: Primary, number of nodes: 3, my index: 2, protocol version 3

2016-10-08 07:22:10 11807 [Warning] WSREP: Gap in state sequence. Need state transfer.

2016-10-08 07:22:10 11807 [Note] WSREP: Setting wsrep_ready to 0

2016-10-08 07:22:10 11807 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '67.xxx.xxx.xxx.xxx' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '11807' '' '

2016-10-08 07:22:10 11807 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_rsync --role 'joiner' --address '67.xxx.xxx.xxx.xxx' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '11807' ''

Read: '(null)'

2016-10-08 07:22:10 11807 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync --role 'joiner' --address '67.xxx.xxx.xxx.xxx' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '11807' '' : 2 (No such file or directory)

2016-10-08 07:22:10 11807 [ERROR] WSREP: Failed to prepare for 'rsync' SST. Unrecoverable.

2016-10-08 07:22:10 11807 [ERROR] Aborting

2016-10-08 07:22:12 11807 [Note] WSREP: waiting for client connections to close: 2

2016-10-08 07:22:12 11807 [Note] WSREP: Closing send monitor...

2016-10-08 07:22:12 11807 [Note] WSREP: Closed send monitor.

2016-10-08 07:22:12 11807 [Note] WSREP: gcomm: terminating thread

2016-10-08 07:22:12 11807 [Note] WSREP: gcomm: joining thread

2016-10-08 07:22:12 11807 [Note] WSREP: gcomm: closing backend

2016-10-08 07:22:12 11807 [Note] WSREP: view(view_id(NON_PRIM,18805553,37) memb {

738de7af,0

} joined {

} left {

} partitioned {

18805553,0

318a4221,0

})

2016-10-08 07:22:12 11807 [Note] WSREP: view((empty))

2016-10-08 07:22:12 11807 [Note] WSREP: gcomm: closed

2016-10-08 07:22:12 11807 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1

2016-10-08 07:22:12 11807 [Note] WSREP: Flow-control interval: [16, 16]

2016-10-08 07:22:12 11807 [Note] WSREP: Received NON-PRIMARY.

2016-10-08 07:22:12 11807 [Note] WSREP: Shifting PRIMARY -> OPEN (TO: 92667)

2016-10-08 07:22:12 11807 [Note] WSREP: Received self-leave message.

2016-10-08 07:22:12 11807 [Note] WSREP: Flow-control interval: [0, 0]

2016-10-08 07:22:12 11807 [Note] WSREP: Received SELF-LEAVE. Closing connection.

2016-10-08 07:22:12 11807 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 92667)

2016-10-08 07:22:12 11807 [Note] WSREP: RECV thread exiting 0: Success

2016-10-08 07:22:12 11807 [Note] WSREP: recv_thread() joined.

2016-10-08 07:22:12 11807 [Note] WSREP: Closing replication queue.

2016-10-08 07:22:12 11807 [Note] WSREP: Closing slave action queue.

2016-10-08 07:22:12 11807 [Note] WSREP: Service disconnected.

2016-10-08 07:22:12 11807 [Note] WSREP: closing wsrep thread 2

2016-10-08 07:22:12 11807 [Note] WSREP: closing wsrep thread 1

2016-10-08 07:22:12 11807 [Note] WSREP: WSREP rollback thread wakes for signal

2016-10-08 07:22:12 11807 [Note] WSREP: WSREP rollback thread has empty abort queue

2016-10-08 07:22:12 11807 [Note] WSREP: rollbacker thread exiting

2016-10-08 07:22:13 11807 [Note] WSREP: Some threads may fail to exit.

2016-10-08 07:22:13 11807 [Note] Binlog end

2016-10-08 07:22:13 11807 [Note] /usr/sbin/mysqld: Shutdown complete

Error in my_thread_global_end(): 2 threads didn't exit

161008 07:22:18 mysqld_safe mysqld from pid file /var/lib/mysql/s2.xxxxx.xxxxx.com.pid ended

Please help me to resolve this error.

Thanks

Dharmendu

Philip Stoev

unread,

Oct 9, 2016, 7:48:08 AM10/9/16

to Dharmendu Patel, codership

Hello,

Please check the following two things:

1. That selinux on both machines is disabled

2. That the wsrep_sst_rsync script is in the $PATH and is executable.

Philip Stoev

--
You received this message because you are subscribed to the Google Groups
"codership" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to codership-tea...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Dharmendu Patel

unread,

Oct 10, 2016, 4:49:53 AM10/10/16

to Philip Stoev, codership

Thanks for your reply.

We have disabled selinux from both the server. When we try to start mysql after this, it is giving following error.

service mysql start

Starting MySQL......../usr/bin/mysqld_safe: line 166: 2926 Aborted

nohup /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --log-error=/var/lib/mysql/s2.hidb.bimsym.com.err --pid-file=/var/lib/mysql/s2.xxxx.xxxxx.com.pid --wsrep_start_position=0 0000000-0000-0000-0000-000000000000:-1 < /dev/null >> /var/lib/mysql/s2.hidb.bim sym.com.err 2>&1

ERROR! The server quit without updating PID file (/var/lib/mysql/s2.xxxx.xxxxx.com.pid).

Please advise.

Thanks

To unsubscribe from this group and stop receiving emails from it, send an email to codership-team+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

--

Dharmendu Patel

Director, Operations

BimSym eBusiness Solutions

BimSym is certified to ISO 27001 Information Security Management System & ISO 9001:2008 Quality Management System.

James Wang

unread,

Oct 10, 2016, 6:55:02 AM10/10/16

to codership, philip...@galeracluster.com

You may miss some modules.

try strace

On Monday, 10 October 2016 09:49:53 UTC+1, Dharmendu Patel wrote:

Thanks for your reply.

Dharmendu Patel

unread,

Oct 11, 2016, 8:41:31 AM10/11/16

to Philip Stoev, codership

Thanks philip for your help.

It was the sulinux issue, we have disabled it in the servers and it is working fine.

Now we have 3 node cluster name s1,s2 and s3. What we have done is our existing live database with the size of around 95GB, we have make it master server.

We have make s1 as the slave of master server and replicate the data from master server. It is working fine. Whatever changes we do in the master it is there in the s1.

Now, when we added s2 and s3 at that time we have stopped the slave on s1 so when s1 and s2 join the cluster it takes all the databases and tables from s1 upto the time we have given stop slave.

now we have given start slave on s1 so s1 taken all the updates from master BUT s2 and s3 is not updating the data.

we have run all the command on s2 and s3 to check the cluster and all the commands are giving result like cluster is working fine. but the issue is there the data updating on s1 from master database is not updating in s2 and s3.

we directly can not update data on s2 or s3 otherwise it will updates data on s1 and replication will break.

Please help us in resolve this issue - how s2 and s3 updates the updated data on s1 from master. we have taken your following youtube video as reference and setup according to it.

Migrating to Multi-Master Galera Cluster from Master Slave MySQL Replication (https://youtu.be/u7H7C12vRpU)

it shows same result for cluster size on all three nodes.

(root@localhost:(none)) mysql> show status like 'wsrep_cluster_size';

+--------------------+-------+

| Variable_name | Value |

+--------------------+-------+

| wsrep_cluster_size | 3 |

+--------------------+-------+

my.cnf details on all the three nodes. same my.cnf on all three nodes except the server-id.

[mysqld]

server-id=3

log-error=/var/log/mysqld.log

wsrep_on=ON

wsrep_provider=/usr/lib64/galera-3/libgalera_smm.so

wsrep_provider_options="gcache.size=300M; gcache.page_size=300M"

wsrep_cluster_name="hiidb_cluster"

wsrep_cluster_address="gcomm://69.xxx.xxx.xxx,67.xxx.xxx.xxx,208.xxx.xxx.xxx"

wsrep_sst_method=rsync

binlog_format=row

bind-address=0.0.0.0

# innodb settings

default_storage_engine=innodb

innodb_autoinc_lock_mode=2

innodb_flush_log_at_trx_commit=0

innodb_buffer_pool_size=25G

thread_cache_size=24

query_cache_limit=64M

query_cache_size=1024M

query_cache_type=1

key_buffer_size=1024M

table_open_cache=4000

innodb_log_file_size=1G

net_read_timeout=300

net_write_timeout=300

# Disabling symbolic-links is recommended to prevent assorted security risks

symbolic-links=0

skip-name-resolve

max_allowed_packet=512M

max_connections=1000

tmp_table_size=4096M

max_heap_table_size=4096M

sql_mode=NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE

# settings for replication

#gtid_mode=ON

#log_bin=binlog

#log_slave_updates=1

#enforce_gtid_consistency

#expire_logs_days=7

#replicate_do_db=agent_contract

#slave_net_timeout=60

# wsrep Log Options

wsrep_log_conflicts=ON

wsrep_provider_options="cert.log_conflicts=ON"

wsrep_debug=ON

datadir=/var/lib/mysql/

socket=/var/lib/mysql/mysql.sock

Please help us to resolve this issue ASAP.

Thanks

On Mon, Oct 10, 2016 at 2:19 PM, Dharmendu Patel <dhar...@bimsym.com> wrote:

Thanks for your reply.

We have disabled selinux from both the server. When we try to start mysql after this, it is giving following error.

service mysql start
Starting MySQL......../usr/bin/mysqld_safe: line 166: 2926 Aborted
nohup /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --log-error=/var/lib/mysql/s2.hidb.bimsym.com.err --pid-file=/var/lib/mysql/s2.xxxx.xxxxx.com.pid --wsrep_start_position=0 0000000-0000-0000-0000-000000000000:-1 < /dev/null >> /var/lib/mysql/s2.hidb.bim sym.com.err 2>&1
ERROR! The server quit without updating PID file (/var/lib/mysql/s2.xxxx.xxxxx.com.pid).

Please advise.

Thanks

James Wang

unread,

Oct 11, 2016, 9:31:51 AM10/11/16

to codership, philip...@galeracluster.com

log_slave_updates has to be 1

On Tuesday, 11 October 2016 13:41:31 UTC+1, Dharmendu Patel wrote:

Thanks philip for your help.

It was the sulinux issue, we have disabled it in the servers and it is working fine.

[snip]

#log_bin=binlog
#log_slave_updates=1

[snip]

Reply all

Reply to author

Forward