getting error while trying add node2 to mysql galera cluster

369 views
Skip to first unread message

Dharmendu Patel

unread,
Oct 8, 2016, 10:58:34 AM10/8/16
to codership
I am getting following error when i try to add node2 to the mysql galera cluster. Both have same my.cnf. We are getting below error when we try to start mysql.

 ERROR! The server quit without updating PID file (/var/lib/mysql/s2.hidb.bimsym.com.pid).


When i check .err file it shows below error.



161008 07:22:03 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
161008 07:22:03 mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/mysql/wsrep_recovery.AZGzjH' --pid-file='/var/lib/mysql/s2.xxxxx.xxxxx.com-recover.pid'
2016-10-08 07:22:04 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2016-10-08 07:22:04 0 [Note] /usr/sbin/mysqld (mysqld 5.6.33) starting as process 11766 ...
161008 07:22:08 mysqld_safe WSREP: Recovered position 00000000-0000-0000-0000-000000000000:-1
2016-10-08 07:22:09 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2016-10-08 07:22:09 0 [Note] /usr/sbin/mysqld (mysqld 5.6.33) starting as process 11807 ...
2016-10-08 07:22:09 11807 [Note] WSREP: Setting wsrep_ready to 0
2016-10-08 07:22:09 11807 [Note] WSREP: Read nil XID from storage engines, skipping position init
2016-10-08 07:22:09 11807 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera-3/libgalera_smm.so'
2016-10-08 07:22:09 11807 [Note] WSREP: wsrep_load(): Galera 3.18(r3b41103) by Codership Oy <in...@codership.com> loaded successfully.
2016-10-08 07:22:09 11807 [Note] WSREP: CRC-32C: using hardware acceleration.
2016-10-08 07:22:09 11807 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
2016-10-08 07:22:09 11807 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 67.xxx.xxx.xxx.xxx; base_port = 4567; cert.log_conflicts = ON; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false
2016-10-08 07:22:09 11807 [Note] WSREP: Service thread queue flushed.
2016-10-08 07:22:09 11807 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
2016-10-08 07:22:09 11807 [Note] WSREP: wsrep_sst_grab()
2016-10-08 07:22:09 11807 [Note] WSREP: Start replication
2016-10-08 07:22:09 11807 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
2016-10-08 07:22:09 11807 [Note] WSREP: protonet asio version 0
2016-10-08 07:22:09 11807 [Note] WSREP: Using CRC-32C for message checksums.
2016-10-08 07:22:09 11807 [Note] WSREP: backend: asio
2016-10-08 07:22:09 11807 [Note] WSREP: gcomm thread scheduling priority set to other:0 
2016-10-08 07:22:09 11807 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
2016-10-08 07:22:09 11807 [Note] WSREP: restore pc from disk failed
2016-10-08 07:22:09 11807 [Note] WSREP: GMCast version 0
2016-10-08 07:22:09 11807 [Note] WSREP: (738de7af, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2016-10-08 07:22:09 11807 [Note] WSREP: (738de7af, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2016-10-08 07:22:09 11807 [Note] WSREP: EVS version 0
2016-10-08 07:22:09 11807 [Note] WSREP: gcomm: connecting to group 'xxxxx_cluster', peer '69.xxx.xxx.xxx.xxx:,67.xxx.xxx.xxx.xxx:,208.xxx.xxx.xxx.xxx:'
2016-10-08 07:22:09 11807 [Note] WSREP: (738de7af, 'tcp://0.0.0.0:4567') connection established to 738de7af tcp://67.xxx.xxx.xxx.xxx:4567
2016-10-08 07:22:09 11807 [Warning] WSREP: (738de7af, 'tcp://0.0.0.0:4567') address 'tcp://67.xxx.xxx.xxx.xxx:4567' points to own listening address, blacklisting
2016-10-08 07:22:09 11807 [Note] WSREP: (738de7af, 'tcp://0.0.0.0:4567') connection established to 738de7af tcp://67.xxx.xxx.xxx.xxx:4567
2016-10-08 07:22:09 11807 [Note] WSREP: (738de7af, 'tcp://0.0.0.0:4567') connection established to 318a4221 tcp://69.xxx.xxx.xxx.xxx:4567
2016-10-08 07:22:09 11807 [Note] WSREP: (738de7af, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: 
2016-10-08 07:22:09 11807 [Note] WSREP: (738de7af, 'tcp://0.0.0.0:4567') connection established to 18805553 tcp://208.xxx.xxx.xxx.xxx:4567
2016-10-08 07:22:09 11807 [Note] WSREP: declaring 18805553 at tcp://208.xxx.xxx.xxx.xxx:4567 stable
2016-10-08 07:22:09 11807 [Note] WSREP: declaring 318a4221 at tcp://69.xxx.xxx.xxx.xxx:4567 stable
2016-10-08 07:22:09 11807 [Note] WSREP: Node 18805553 state prim
2016-10-08 07:22:10 11807 [Note] WSREP: view(view_id(PRIM,18805553,37) memb {
18805553,0
318a4221,0
738de7af,0
} joined {
} left {
} partitioned {
})
2016-10-08 07:22:10 11807 [Note] WSREP: save pc into disk
2016-10-08 07:22:10 11807 [Note] WSREP: gcomm: connected
2016-10-08 07:22:10 11807 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2016-10-08 07:22:10 11807 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2016-10-08 07:22:10 11807 [Note] WSREP: Opened channel 'xxxxx_cluster'
2016-10-08 07:22:10 11807 [Note] WSREP: Waiting for SST to complete.
2016-10-08 07:22:10 11807 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 2, memb_num = 3
2016-10-08 07:22:10 11807 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2016-10-08 07:22:10 11807 [Note] WSREP: STATE EXCHANGE: sent state msg: e9acb8da-8d27-11e6-b2d0-6310c7f6d02c
2016-10-08 07:22:10 11807 [Note] WSREP: STATE EXCHANGE: got state msg: e9acb8da-8d27-11e6-b2d0-6310c7f6d02c from 0 (s3.xxxxx.xxxxx.com)
2016-10-08 07:22:10 11807 [Note] WSREP: STATE EXCHANGE: got state msg: e9acb8da-8d27-11e6-b2d0-6310c7f6d02c from 1 (s1.xxxxx.xxxxx.com)
2016-10-08 07:22:10 11807 [Note] WSREP: STATE EXCHANGE: got state msg: e9acb8da-8d27-11e6-b2d0-6310c7f6d02c from 2 (s2.xxxx.xxxxx.com)
2016-10-08 07:22:10 11807 [Note] WSREP: Quorum results:
version    = 4,
component  = PRIMARY,
conf_id    = 176,
members    = 2/3 (joined/total),
act_id     = 92667,
last_appl. = -1,
protocols  = 0/7/3 (gcs/repl/appl),
group UUID = aab00320-83e1-11e6-9065-6bee750e7c1c
2016-10-08 07:22:10 11807 [Note] WSREP: Flow-control interval: [28, 28]
2016-10-08 07:22:10 11807 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 92667)
2016-10-08 07:22:10 11807 [Note] WSREP: State transfer required: 
Group state: aab00320-83e1-11e6-9065-6bee750e7c1c:92667
Local state: 00000000-0000-0000-0000-000000000000:-1
2016-10-08 07:22:10 11807 [Note] WSREP: New cluster view: global state: aab00320-83e1-11e6-9065-6bee750e7c1c:92667, view# 177: Primary, number of nodes: 3, my index: 2, protocol version 3
2016-10-08 07:22:10 11807 [Warning] WSREP: Gap in state sequence. Need state transfer.
2016-10-08 07:22:10 11807 [Note] WSREP: Setting wsrep_ready to 0
2016-10-08 07:22:10 11807 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '67.xxx.xxx.xxx.xxx' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '11807'  '' '
2016-10-08 07:22:10 11807 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_rsync --role 'joiner' --address '67.xxx.xxx.xxx.xxx' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '11807'  '' 
Read: '(null)'
2016-10-08 07:22:10 11807 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync --role 'joiner' --address '67.xxx.xxx.xxx.xxx' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '11807'  '' : 2 (No such file or directory)
2016-10-08 07:22:10 11807 [ERROR] WSREP: Failed to prepare for 'rsync' SST. Unrecoverable.
2016-10-08 07:22:10 11807 [ERROR] Aborting

2016-10-08 07:22:12 11807 [Note] WSREP: waiting for client connections to close: 2
2016-10-08 07:22:12 11807 [Note] WSREP: Closing send monitor...
2016-10-08 07:22:12 11807 [Note] WSREP: Closed send monitor.
2016-10-08 07:22:12 11807 [Note] WSREP: gcomm: terminating thread
2016-10-08 07:22:12 11807 [Note] WSREP: gcomm: joining thread
2016-10-08 07:22:12 11807 [Note] WSREP: gcomm: closing backend
2016-10-08 07:22:12 11807 [Note] WSREP: view(view_id(NON_PRIM,18805553,37) memb {
738de7af,0
} joined {
} left {
} partitioned {
18805553,0
318a4221,0
})
2016-10-08 07:22:12 11807 [Note] WSREP: view((empty))
2016-10-08 07:22:12 11807 [Note] WSREP: gcomm: closed
2016-10-08 07:22:12 11807 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2016-10-08 07:22:12 11807 [Note] WSREP: Flow-control interval: [16, 16]
2016-10-08 07:22:12 11807 [Note] WSREP: Received NON-PRIMARY.
2016-10-08 07:22:12 11807 [Note] WSREP: Shifting PRIMARY -> OPEN (TO: 92667)
2016-10-08 07:22:12 11807 [Note] WSREP: Received self-leave message.
2016-10-08 07:22:12 11807 [Note] WSREP: Flow-control interval: [0, 0]
2016-10-08 07:22:12 11807 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2016-10-08 07:22:12 11807 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 92667)
2016-10-08 07:22:12 11807 [Note] WSREP: RECV thread exiting 0: Success
2016-10-08 07:22:12 11807 [Note] WSREP: recv_thread() joined.
2016-10-08 07:22:12 11807 [Note] WSREP: Closing replication queue.
2016-10-08 07:22:12 11807 [Note] WSREP: Closing slave action queue.
2016-10-08 07:22:12 11807 [Note] WSREP: Service disconnected.
2016-10-08 07:22:12 11807 [Note] WSREP: closing wsrep thread 2
2016-10-08 07:22:12 11807 [Note] WSREP: closing wsrep thread 1
2016-10-08 07:22:12 11807 [Note] WSREP: WSREP rollback thread wakes for signal
2016-10-08 07:22:12 11807 [Note] WSREP: WSREP rollback thread has empty abort queue
2016-10-08 07:22:12 11807 [Note] WSREP: rollbacker thread exiting
2016-10-08 07:22:13 11807 [Note] WSREP: Some threads may fail to exit.
2016-10-08 07:22:13 11807 [Note] Binlog end
2016-10-08 07:22:13 11807 [Note] /usr/sbin/mysqld: Shutdown complete

Error in my_thread_global_end(): 2 threads didn't exit
161008 07:22:18 mysqld_safe mysqld from pid file /var/lib/mysql/s2.xxxxx.xxxxx.com.pid ended




Please help me to resolve this error.

Thanks
Dharmendu


Philip Stoev

unread,
Oct 9, 2016, 7:48:08 AM10/9/16
to Dharmendu Patel, codership
Hello,

Please check the following two things:

1. That selinux on both machines is disabled

2. That the wsrep_sst_rsync script is in the $PATH and is executable.

Philip Stoev
--
You received this message because you are subscribed to the Google Groups
"codership" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to codership-tea...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Dharmendu Patel

unread,
Oct 10, 2016, 4:49:53 AM10/10/16
to Philip Stoev, codership
Thanks for your reply.

We have disabled selinux from both the server. When we try to start mysql after this, it is giving following error.

service mysql start
Starting MySQL......../usr/bin/mysqld_safe: line 166:  2926 Aborted
nohup /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --log-error=/var/lib/mysql/s2.hidb.bimsym.com.err --pid-file=/var/lib/mysql/s2.xxxx.xxxxx.com.pid --wsrep_start_position=0                           0000000-0000-0000-0000-000000000000:-1 < /dev/null >> /var/lib/mysql/s2.hidb.bim                           sym.com.err 2>&1
 ERROR! The server quit without updating PID file (/var/lib/mysql/s2.xxxx.xxxxx.com.pid).


Please advise.

Thanks

To unsubscribe from this group and stop receiving emails from it, send an email to codership-team+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.



--
Dharmendu Patel
Director, Operations
BimSym eBusiness Solutions

BimSym is certified to  ISO 27001 Information Security Management System & ISO 9001:2008 Quality Management System.

          

James Wang

unread,
Oct 10, 2016, 6:55:02 AM10/10/16
to codership, philip...@galeracluster.com
You may miss some modules.

try strace

On Monday, 10 October 2016 09:49:53 UTC+1, Dharmendu Patel wrote:
Thanks for your reply.


Dharmendu Patel

unread,
Oct 11, 2016, 8:41:31 AM10/11/16
to Philip Stoev, codership
Thanks philip for your help.

It was the sulinux issue, we have disabled it in the servers and it is working fine.

Now we have 3 node cluster name s1,s2 and s3. What we have done is our existing live database with the size of around 95GB, we have make it master server.

We have make s1 as the slave of master server and replicate the data from master server. It is working fine. Whatever changes we do in the master it is there in the s1.

Now, when we added s2 and s3 at that time we have stopped the slave on s1 so when s1 and s2 join the cluster it takes all the databases and tables from s1 upto the time we have given stop slave.

now we have given start slave on s1 so s1 taken all the updates from master BUT s2 and s3 is not updating the data.

we have run all the command on s2 and s3 to check the cluster and all the commands are giving result like cluster is working fine. but the issue is there the data updating on s1 from master database is not updating in s2 and s3.

we directly can not update data on s2 or s3 otherwise it will updates data on s1 and replication will break. 

Please help us in resolve this issue - how s2 and s3 updates the updated data on s1 from master. we have taken your following youtube video as reference and setup according to it.

Migrating to Multi-Master Galera Cluster from Master Slave MySQL Replication (https://youtu.be/u7H7C12vRpU)


it shows same result for cluster size on all three nodes.

(root@localhost:(none)) mysql> show status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+


my.cnf details on all the three nodes. same my.cnf on all three nodes except the server-id.



[mysqld]
server-id=3
log-error=/var/log/mysqld.log
wsrep_on=ON
wsrep_provider=/usr/lib64/galera-3/libgalera_smm.so
wsrep_provider_options="gcache.size=300M; gcache.page_size=300M"
wsrep_cluster_name="hiidb_cluster"
wsrep_cluster_address="gcomm://69.xxx.xxx.xxx,67.xxx.xxx.xxx,208.xxx.xxx.xxx"
wsrep_sst_method=rsync
binlog_format=row
bind-address=0.0.0.0

# innodb settings 
default_storage_engine=innodb
innodb_autoinc_lock_mode=2
innodb_flush_log_at_trx_commit=0
innodb_buffer_pool_size=25G
thread_cache_size=24
query_cache_limit=64M
query_cache_size=1024M
query_cache_type=1
key_buffer_size=1024M
table_open_cache=4000
innodb_log_file_size=1G
net_read_timeout=300
net_write_timeout=300

# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
skip-name-resolve
max_allowed_packet=512M
max_connections=1000
tmp_table_size=4096M
max_heap_table_size=4096M
sql_mode=NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE

# settings for replication
#gtid_mode=ON
#log_bin=binlog
#log_slave_updates=1
#enforce_gtid_consistency
#expire_logs_days=7
#replicate_do_db=agent_contract
#slave_net_timeout=60

# wsrep Log Options
wsrep_log_conflicts=ON
wsrep_provider_options="cert.log_conflicts=ON"
wsrep_debug=ON

datadir=/var/lib/mysql/
socket=/var/lib/mysql/mysql.sock




Please help us to resolve this issue ASAP.

Thanks





On Mon, Oct 10, 2016 at 2:19 PM, Dharmendu Patel <dhar...@bimsym.com> wrote:
Thanks for your reply.

We have disabled selinux from both the server. When we try to start mysql after this, it is giving following error.

service mysql start
Starting MySQL......../usr/bin/mysqld_safe: line 166:  2926 Aborted
nohup /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --log-error=/var/lib/mysql/s2.hidb.bimsym.com.err --pid-file=/var/lib/mysql/s2.xxxx.xxxxx.com.pid --wsrep_start_position=0                           0000000-0000-0000-0000-000000000000:-1 < /dev/null >> /var/lib/mysql/s2.hidb.bim                           sym.com.err 2>&1
 ERROR! The server quit without updating PID file (/var/lib/mysql/s2.xxxx.xxxxx.com.pid).


Please advise.

Thanks

James Wang

unread,
Oct 11, 2016, 9:31:51 AM10/11/16
to codership, philip...@galeracluster.com
log_slave_updates has to be 1


On Tuesday, 11 October 2016 13:41:31 UTC+1, Dharmendu Patel wrote:
Thanks philip for your help.

It was the sulinux issue, we have disabled it in the servers and it is working fine.

[snip]
 
#log_bin=binlog
#log_slave_updates=1


[snip] 
Reply all
Reply to author
Forward
0 new messages