Galera changing root password during state transfer

2,142 views
Skip to first unread message

Jase

unread,
Oct 18, 2010, 12:52:32 AM10/18/10
to codership
Hi,
I am trying to test Galera. I have found the test setup to be very
good. This is what I have done:
1. Using Ubuntu 9.10, 10.10 and Debian Lenny on three separate
machines I have installed mysql-server using apt-get
2. Got mysql to stop, and prevented it from auto-starting. This was a
bit tricky in Ubuntu 10.10, here is the post on how to do it:
http://ubuntuforums.org/showthread.php?p=9422111
3. Unpacked mysql-5.1.44-galera-0.7.4-i386.tgz in the directory /home/
gen (gen is general user)
4. My machines are 60, 61, 62 and 63 (the IP addresses are
192.168.0.60, 192.168.0.61 and so on). On 60 I ran

gen@Test60:~$ /home/gen/mysql-5.1.44-galera-0.7.4-i386/mysql-galera -g
gcomm:// start

5. This started the first node, then on 62 I ran:

gen@Test62:~$ /home/gen/mysql-5.1.44-galera-0.7.4-i386/mysql-galera -g
gcomm://192.168.0.60 start

6. This started the cluster. I then added another machine, loaded in
my data file and all was well. The third machine crashed, and I
replaced it with another, so the status was: 60 and 62 working in the
cluster, 61 and 63 not in the cluster. I had to go home, and I left a
system inserting transactions (rows in a table) into 62, which is
still working, and 62 and 60 remain in sync, that is transactions
inserted on 62 can be seen in 60.

Now I get to the problem:

I am trying to add machine 63 to the cluster, while it is running and
while inserts are under way. Both 61 and 63 are 64 bit machines, with
Ubuntu 10.10 64 bit server edition, while 60 is an old machine with 32
bit Debian Lenny and 62 is an old machine with 32 bit Ubuntu 10.10
(the one that crashed was Ubuntu 9.10 32 bit, and it did work in the
cluster until the machine crashed (nothing to do with the cluster)).

When I try to add 63, I get the following:

gen@Test63:~$ /home/gen/mysql-5.1.44-galera-0.7.4-x86_64/mysql-galera -
g gcomm://192.168.0.60 start
Starting mysqld instance with data dir /home/gen/mysql-5.1.44-
galera-0.7.4-x86_64/mysql/var and listening at port 3306 and socket /
home/gen/mysql-5.1.44-galera-0.7.4-x86_64/mysql/var/mysqld.sock....
Done (PID:3465)
Waiting for wsrep_ready.................ERROR 1045 (28000): Access
denied for user 'root'@'localhost' (using password: YES)

I have tried this multiple times, with a fresh unpack of this tgz:
mysql-5.1.44-galera-0.7.4-x86_64.tgz. If I start 63 without connecting
it to the cluster on a fresh unpack, ie run:

gen@Test63:~$ /home/gen/mysql-5.1.44-galera-0.7.4-x86_64/mysql-galera -
g gcomm:// start

The node starts and I can connect to mysql using:

gen@Test63:~$ mysql -u root -p --protocol=TCP

with the password rootpass.

If I stop it and try to connect it to the cluster I get the above
error. The strange thing is that in the logs it appears to have
successfully joined the cluster:

101018 14:23:49 [Warning] The syntax '--skip-locking' is deprecated
and will be removed in a future release. Please use --skip-external-
locking instead.
101018 14:23:50 InnoDB: Started; log sequence number 0 434350254
101018 14:23:50 [Note] WSREP: wsrep_load(): loading provider library '/
home/gen/mysql-5.1.44-galera-0.7.4-x86_64/galera/lib/libmmgalera.so'
101018 14:23:50 [Note] WSREP: wsrep_load(): provider loaded
succesfully.
101018 14:23:50 [Note] Event Scheduler: Loaded 0 events
101018 14:23:50 [Note] WSREP: start replication
101018 14:23:50 [Note] WSREP: Found stored state: e98a3735-
d1ab-11df-0800-3b59052d1c54:199710
101018 14:23:50 [Note] WSREP: Configured state: e98a3735-
d1ab-11df-0800-3b59052d1c54:199710
101018 14:23:50 [Note] WSREP: gcomm: connecting to group
'my_wsrep_cluster', peer '192.168.0.60:'
101018 14:23:51 [Note] WSREP: gcomm: connected
101018 14:23:51 [Note] WSREP: Changing maximum message size 0 -> 32636
101018 14:23:51 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
101018 14:23:51 [Note] WSREP: Opened channel 'my_wsrep_cluster'
101018 14:23:51 [Note] WSREP: Successfully opened GCS connection to
my_wsrep_cluster
101018 14:23:51 [Note] WSREP: New COMPONENT: primary = yes, my_idx =
0, memb_num = 3
101018 14:23:51 [Note] /home/gen/mysql-5.1.44-galera-0.7.4-x86_64/
mysql/libexec/mysqld: ready for connections.
Version: '5.1.44' socket: '/home/gen/mysql-5.1.44-galera-0.7.4-x86_64/
mysql/var/mysqld.sock' port: 3306 wsrep_0.7.4
101018 14:23:51 [Note] WSREP: STATE_EXCHANGE: sent state UUID:
213f7a89-da67-11df-0800-24078121bfe5
101018 14:23:51 [Note] WSREP: STATE EXCHANGE: sent state msg: 213f7a89-
da67-11df-0800-24078121bfe5
101018 14:23:51 [Note] WSREP: STATE EXCHANGE: got state msg: 213f7a89-
da67-11df-0800-24078121bfe5 from 0 (Test63)
101018 14:23:51 [Note] WSREP: STATE EXCHANGE: got state msg: 213f7a89-
da67-11df-0800-24078121bfe5 from 1 (Test62)
101018 14:23:51 [Note] WSREP: STATE EXCHANGE: got state msg: 213f7a89-
da67-11df-0800-24078121bfe5 from 2 (Test60)
101018 14:23:51 [Note] WSREP: Quorum results:
PRIMARY,
act_id = 199716,
conf_id = 20,
last_appl. = 0,
protocol = 0,
group UUID = e98a3735-d1ab-11df-0800-3b59052d1c54
101018 14:23:51 [Note] WSREP: Flow-control interval: [23, 45]
101018 14:23:51 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 199716)
101018 14:23:51 [Note] WSREP: New cluster view: group UUID: e98a3735-
d1ab-11df-0800-3b59052d1c54, conf# 21: Primary, number of nodes: 3, my
index: 0, first seqno: 199717
101018 14:23:51 [Warning] WSREP: Gap in state sequence. Need state
transfer.
101018 14:23:53 [Note] WSREP: State transfer required:
Group state: e98a3735-d1ab-11df-0800-3b59052d1c54:199716
Local state: e98a3735-d1ab-11df-0800-3b59052d1c54:199710
101018 14:23:53 [Note] [DEBUG] WSREP: Prepared SST request: mysqldump|
192.168.0.63:3306
101018 14:23:53 [Note] WSREP: Node 0 (Test63) requested State Transfer
from '*any*'. Selected 1 (Test62)(SYNCED) as donor.
101018 14:23:53 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 199717)
101018 14:23:53 [Note] WSREP: Requesting state transfer: success,
donor 1
101018 14:26:18 [Note] WSREP: wsrep_start_position var submitted:
'e98a3735-d1ab-11df-0800-3b59052d1c54:199717'
101018 14:26:18 [Note] WSREP: Application state transfer complete:
e98a3735-d1ab-11df-0800-3b59052d1c54:199717
101018 14:26:18 [Note] WSREP: 0(Test63): State transfer from 1(Test62)
complete.
101018 14:26:18 [Note] WSREP: Shifting JOINER -> JOINED (TO: 199719)
101018 14:26:18 [Note] WSREP: Member 0 (Test63) synced with group.
101018 14:26:18 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 199719)
101018 14:26:21 [Warning] WSREP: 1(Test62): State transfer to
0(Test63) failed: -1 (Operation not permitted)
101018 14:26:21 [Note] WSREP: Member 1 (Test62) synced with group.

But I cannot connect to mysql any more:

gen@Test63:~$ mysql -u root -p --protocol=TCP
Enter password:
ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using
password: YES)

The other strange thing is that I connected to 60, but in the logs it
appears that the job of providing the state transfer has been
delegated to 62, which is not what I want. I have 62 receiving
transactions, and I don't want it to block them. 60 is acting as a
reserve node, and I want it to provide the state transfer so that 62
can keep serving requests. Here are the logs from 60:

101018 14:23:45 [Note] WSREP: New COMPONENT: primary = yes, my_idx =
1, memb_num = 2
101018 14:23:45 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
101018 14:23:45 [Note] WSREP: STATE EXCHANGE: sent state msg: fdb7b298-
da66-11df-0800-3b011810a24a
101018 14:23:45 [Note] WSREP: STATE EXCHANGE: got state msg: fdb7b298-
da66-11df-0800-3b011810a24a from 0 (Test62)
101018 14:23:45 [Note] WSREP: STATE EXCHANGE: got state msg: fdb7b298-
da66-11df-0800-3b011810a24a from 1 (Test60)
101018 14:23:45 [Note] WSREP: Quorum results:
PRIMARY,
act_id = 199710,
conf_id = 19,
last_appl. = 199518,
protocol = 0,
group UUID = e98a3735-d1ab-11df-0800-3b59052d1c54
101018 14:23:45 [Note] WSREP: Flow-control interval: [16, 32]
101018 14:23:45 [Note] WSREP: New cluster view: group UUID: e98a3735-
d1ab-11df-0
eqno: 199711
101018 14:24:01 [Note] WSREP: New COMPONENT: primary = yes, my_idx =
2, memb_num
101018 14:24:01 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
101018 14:24:02 [Note] WSREP: STATE EXCHANGE: sent state msg: 213f7a89-
da67-11df
101018 14:24:02 [Note] WSREP: STATE EXCHANGE: got state msg: 213f7a89-
da67-11df-
101018 14:24:02 [Note] WSREP: STATE EXCHANGE: got state msg: 213f7a89-
da67-11df-
101018 14:24:02 [Note] WSREP: STATE EXCHANGE: got state msg: 213f7a89-
da67-11df-
101018 14:24:02 [Note] WSREP: Quorum results:
PRIMARY,
act_id = 199716,
conf_id = 20,
last_appl. = 0,
protocol = 0,
group UUID = e98a3735-d1ab-11df-0800-3b59052d1c54
101018 14:24:02 [Note] WSREP: Flow-control interval: [23, 45]
101018 14:24:02 [Note] WSREP: New cluster view: group UUID: e98a3735-
d1ab-11df-0
eqno: 199717
101018 14:24:04 [Note] WSREP: Node 0 (Test63) requested State Transfer
from '*an
101018 14:26:29 [Note] WSREP: 0(Test63): State transfer from 1(Test62)
complete.
101018 14:26:29 [Note] WSREP: Member 0 (Test63) synced with group.
101018 14:26:32 [Warning] WSREP: 1(Test62): State transfer to
0(Test63) failed:
101018 14:26:32 [Note] WSREP: Member 1 (Test62) synced with group.


These are the logs from 62:

101018 14:22:52 [Note] WSREP: New COMPONENT: primary = yes, my_idx =
0, memb_num = 2
101018 14:22:52 [Note] WSREP: STATE_EXCHANGE: sent state UUID:
fdb7b298-da66-11df-0800-3b011810a24a
101018 14:22:52 [Note] WSREP: STATE EXCHANGE: sent state msg: fdb7b298-
da66-11df-0800-3b011810a24a
101018 14:22:52 [Note] WSREP: STATE EXCHANGE: got state msg: fdb7b298-
da66-11df-0800-3b011810a24a from 0 (Test62)
101018 14:22:52 [Note] WSREP: STATE EXCHANGE: got state msg: fdb7b298-
da66-11df-0800-3b011810a24a from 1 (Test60)
101018 14:22:52 [Note] WSREP: Quorum results:
PRIMARY,
act_id = 199710,
conf_id = 19,
last_appl. = 199518,
protocol = 0,
group UUID = e98a3735-d1ab-11df-0800-3b59052d1c54
101018 14:22:52 [Note] WSREP: Flow-control interval: [16, 32]
101018 14:22:52 [Note] WSREP: New cluster view: group UUID: e98a3735-
d1ab-11df-0800-3b59052d1c54, conf# 20: Primary, number of nodes: 2, my
index: 0, first seqno: 199711
101018 14:23:08 [Note] WSREP: New COMPONENT: primary = yes, my_idx =
1, memb_num = 3
101018 14:23:08 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
101018 14:23:09 [Note] WSREP: STATE EXCHANGE: sent state msg: 213f7a89-
da67-11df-0800-24078121bfe5
101018 14:23:09 [Note] WSREP: STATE EXCHANGE: got state msg: 213f7a89-
da67-11df-0800-24078121bfe5 from 0 (Test63)
101018 14:23:09 [Note] WSREP: STATE EXCHANGE: got state msg: 213f7a89-
da67-11df-0800-24078121bfe5 from 1 (Test62)
101018 14:23:09 [Note] WSREP: STATE EXCHANGE: got state msg: 213f7a89-
da67-11df-0800-24078121bfe5 from 2 (Test60)
101018 14:23:09 [Note] WSREP: Quorum results:
PRIMARY,
act_id = 199716,
conf_id = 20,
last_appl. = 0,
protocol = 0,
group UUID = e98a3735-d1ab-11df-0800-3b59052d1c54
101018 14:23:09 [Note] WSREP: Flow-control interval: [23, 45]
101018 14:23:09 [Note] WSREP: New cluster view: group UUID: e98a3735-
d1ab-11df-0800-3b59052d1c54, conf# 21: Primary, number of nodes: 3, my
index: 1, first seqno: 199717
101018 14:23:11 [Note] WSREP: Node 0 (Test63) requested State Transfer
from '*any*'. Selected 1 (Test62)(SYNCED) as donor.
101018 14:23:11 [Note] WSREP: Got GCS_ACT_STATE_REQ to 1, my idx: 1
101018 14:23:11 [Note] WSREP: Shifting SYNCED -> DONOR (TO: 199717)
101018 14:23:11 [Note] WSREP: Got state transfer request.
101018 14:23:11 [Note] [DEBUG] WSREP: Running: 'wsrep_sst_mysqldump
'root' 'rootpass' '192.168.0.63' '3306' '3306' 'e98a3735-
d1ab-11df-0800-3b59052d1c54' '199717''
101018 14:25:36 [Note] WSREP: 0(Test63): State transfer from 1(Test62)
complete.
101018 14:25:36 [Note] WSREP: Member 0 (Test63) synced with group.
ERROR 1045 (28000): Access denied for user
'root'@'192.168.0.62' (using password: YES)
101018 14:25:36 [ERROR] WSREP: Process completed with error:
wsrep_sst_mysqldump 'root' 'rootpass' '192.168.0.63' '3306' '3306'
'e98a3735-d1ab-11df-0800-3b59052d1c54' '199717': 1 (Operation not
permitted)
101018 14:25:36 [ERROR] WSREP: Try 1/3: 'wsrep_sst_mysqldump 'root'
'rootpass' '192.168.0.63' '3306' '3306' 'e98a3735-
d1ab-11df-0800-3b59052d1c54' '199717'' failed: 1 (Operation not
permitted)
ERROR 1045 (28000): Access denied for user
'root'@'192.168.0.62' (using password: YES)
101018 14:25:37 [ERROR] WSREP: Process completed with error:
wsrep_sst_mysqldump 'root' 'rootpass' '192.168.0.63' '3306' '3306'
'e98a3735-d1ab-11df-0800-3b59052d1c54' '199717': 1 (Operation not
permitted)
101018 14:25:37 [ERROR] WSREP: Try 2/3: 'wsrep_sst_mysqldump 'root'
'rootpass' '192.168.0.63' '3306' '3306' 'e98a3735-
d1ab-11df-0800-3b59052d1c54' '199717'' failed: 1 (Operation not
permitted)
ERROR 1045 (28000): Access denied for user
'root'@'192.168.0.62' (using password: YES)
101018 14:25:38 [ERROR] WSREP: Process completed with error:
wsrep_sst_mysqldump 'root' 'rootpass' '192.168.0.63' '3306' '3306'
'e98a3735-d1ab-11df-0800-3b59052d1c54' '199717': 1 (Operation not
permitted)
101018 14:25:38 [ERROR] WSREP: Try 3/3: 'wsrep_sst_mysqldump 'root'
'rootpass' '192.168.0.63' '3306' '3306' 'e98a3735-
d1ab-11df-0800-3b59052d1c54' '199717'' failed: 1 (Operation not
permitted)
101018 14:25:39 [Warning] WSREP: 1(Test62): State transfer to
0(Test63) failed: -1 (Operation not permitted)
101018 14:25:39 [Note] WSREP: Shifting DONOR -> JOINED (TO: 199719)
101018 14:25:39 [Note] WSREP: Member 1 (Test62) synced with group.
101018 14:25:39 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 199719)


Next I tried connecting another machine to the cluster, a 32 bit one
in case there was a problem with the 64 bit version. This was machine
70 (32 bit ubuntu 10.10). Here are the logs from 70:

gen@Test70:~$ cat mysql-5.1.44-galera-0.7.4-i386/mysql/var/Test70.err
101018 15:40:27 [Warning] The syntax '--skip-locking' is deprecated
and will be removed in a future release. Please use --skip-external-
locking instead.
InnoDB: The first specified data file ./ibdata1 did not exist:
InnoDB: a new database to be created!
101018 15:40:28 InnoDB: Setting file ./ibdata1 size to 10 MB
InnoDB: Database physically writes the file full: wait...
101018 15:40:28 InnoDB: Log file ./ib_logfile0 did not exist: new to
be created
InnoDB: Setting log file ./ib_logfile0 size to 256 MB
InnoDB: Database physically writes the file full: wait...
InnoDB: Progress in MB: 100 200
101018 15:40:33 InnoDB: Log file ./ib_logfile1 did not exist: new to
be created
InnoDB: Setting log file ./ib_logfile1 size to 256 MB
InnoDB: Database physically writes the file full: wait...
InnoDB: Progress in MB: 100 200
InnoDB: Doublewrite buffer not found: creating new
InnoDB: Doublewrite buffer created
InnoDB: Creating foreign key constraint system tables
InnoDB: Foreign key constraint system tables created
101018 15:40:39 InnoDB: Started; log sequence number 0 0
101018 15:40:39 [Note] WSREP: wsrep_load(): loading provider library '/
home/gen/mysql-5.1.44-galera-0.7.4-i386/galera/lib/libmmgalera.so'
101018 15:40:39 [Note] WSREP: wsrep_load(): provider loaded
succesfully.
101018 15:40:39 [Note] Event Scheduler: Loaded 0 events
101018 15:40:39 [Note] WSREP: start replication
101018 15:40:39 [ERROR] WSREP: could not open state file: /home/gen/
mysql-5.1.44-galera-0.7.4-i386/mysql/var//grastate.dat
101018 15:40:39 [Warning] WSREP: GALERA state restore failed
101018 15:40:39 [Note] WSREP: Found stored state:
00000000-0000-0000-0000-000000000000:-1
101018 15:40:39 [Note] WSREP: Configured state:
00000000-0000-0000-0000-000000000000:-1
101018 15:40:39 [Note] WSREP: gcomm: connecting to group
'my_wsrep_cluster', peer '192.168.0.60:'
101018 15:40:41 [Note] WSREP: gcomm: connected
101018 15:40:41 [Note] WSREP: Changing maximum message size 0 -> 32636
101018 15:40:41 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
101018 15:40:41 [Note] WSREP: Opened channel 'my_wsrep_cluster'
101018 15:40:41 [Note] WSREP: Successfully opened GCS connection to
my_wsrep_cluster
101018 15:40:41 [Note] /home/gen/mysql-5.1.44-galera-0.7.4-i386/mysql/
libexec/mysqld: ready for connections.
Version: '5.1.44' socket: '/home/gen/mysql-5.1.44-galera-0.7.4-i386/
mysql/var/mysqld.sock' port: 3306 wsrep_0.7.4
101018 15:40:41 [Note] WSREP: New COMPONENT: primary = yes, my_idx =
3, memb_num = 4
101018 15:40:41 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
101018 15:40:41 [Note] WSREP: STATE EXCHANGE: sent state msg: dc0911aa-
da71-11df-0800-ce9730cb3f38
101018 15:40:41 [Note] WSREP: STATE EXCHANGE: got state msg: dc0911aa-
da71-11df-0800-ce9730cb3f38 from 0 (Test63)
101018 15:40:41 [Note] WSREP: STATE EXCHANGE: got state msg: dc0911aa-
da71-11df-0800-ce9730cb3f38 from 1 (Test62)
101018 15:40:41 [Note] WSREP: STATE EXCHANGE: got state msg: dc0911aa-
da71-11df-0800-ce9730cb3f38 from 2 (Test60)
101018 15:40:41 [Note] WSREP: STATE EXCHANGE: got state msg: dc0911aa-
da71-11df-0800-ce9730cb3f38 from 3 (Test70)
101018 15:40:41 [Note] WSREP: Quorum results:
PRIMARY,
act_id = 201424,
conf_id = 21,
last_appl. = 0,
protocol = 0,
group UUID = e98a3735-d1ab-11df-0800-3b59052d1c54
101018 15:40:41 [Note] WSREP: Flow-control interval: [28, 55]
101018 15:40:41 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 201424)
101018 15:40:41 [Note] WSREP: New cluster view: group UUID: e98a3735-
d1ab-11df-0800-3b59052d1c54, conf# 22: Primary, number of nodes: 4, my
index: 3, first seqno: 201425
101018 15:40:41 [Warning] WSREP: Gap in state sequence. Need state
transfer.
101018 15:40:43 [Note] WSREP: State transfer required:
Group state: e98a3735-d1ab-11df-0800-3b59052d1c54:201424
Local state: 00000000-0000-0000-0000-000000000000:-1
101018 15:40:43 [Note] [DEBUG] WSREP: Prepared SST request: mysqldump|
192.168.0.70:3306
101018 15:40:43 [Note] WSREP: Node 3 (Test70) requested State Transfer
from '*any*'. Selected 0 (Test63)(SYNCED) as donor.
101018 15:40:43 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 201425)
101018 15:40:43 [Note] WSREP: Requesting state transfer: success,
donor 0
101018 15:40:47 [Warning] WSREP: 0(Test63): State transfer to
3(Test70) failed: -1 (Operation not permitted)
101018 15:40:47 [ERROR] WSREP: Will never receive state. Need to
abort.
101018 15:40:47 [Note] WSREP: gcomm: terminating thread
101018 15:40:47 [Note] WSREP: gcomm: joining thread
101018 15:40:47 [Note] WSREP: gcomm: closing backend
101018 15:40:47 [Warning] WSREP: failed socket 36 in poll set
101018 15:40:47 [Note] WSREP: gcomm: closed
101018 15:40:47 - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this
binary
or one of the libraries it was linked against is corrupt, improperly
built,
or misconfigured. This error can also be caused by malfunctioning
hardware.
We will try our best to scrape up some info that will hopefully help
diagnose
the problem, but since we have already crashed, something is
definitely wrong
and this may fail.

key_buffer_size=8384512
read_buffer_size=131072
max_used_connections=3
max_threads=151
threads_connected=2
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads =
337741 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x0
Attempting backtrace. You can use the following information to find
out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = (nil) thread_stack 0x30000
/home/gen/mysql-5.1.44-galera-0.7.4-i386/mysql/libexec/
mysqld(my_print_stacktrace+0x21)[0x84e5031]
/home/gen/mysql-5.1.44-galera-0.7.4-i386/mysql/libexec/
mysqld(handle_segfault+0x3d9)[0x81f7f29]
[0x1b2400]
/lib/libc.so.6(abort+0x182)[0x468e42]
/home/gen/mysql-5.1.44-galera-0.7.4-i386/galera/lib/libgcs.so.
15(gcs_core_recv+0x11b6)[0x6164b6]
/home/gen/mysql-5.1.44-galera-0.7.4-i386/galera/lib/libgcs.so.
15(+0x11173)[0x619173]
/lib/libpthread.so.0(+0x5cc9)[0x5e5cc9]
/lib/libc.so.6(clone+0x5e)[0x50b6ae]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html
contains
information that should help you find out what is causing the crash.


This seems to show that 63 is in the cluster, but I just cannot
connect to it due to the password. 70 can't connect, but this seems
that it may be because it was receiving its state transfer from 63,
which did not work, possible for the same reason that I cannot connect
to 63.

I think the basic problem is in 63, could you suggest anything I can
try to work out what is going wrong? I can arrange access to the
machines if you like.

Alex Yurchenko

unread,
Oct 21, 2010, 12:01:13 PM10/21/10
to Jase, codership
Hi Jase,

Sorry for the late reply.

On Sun, 17 Oct 2010 21:52:32 -0700 (PDT), Jase <jasej...@gmail.com>
wrote:


>
> When I try to add 63, I get the following:
>
> gen@Test63:~$ /home/gen/mysql-5.1.44-galera-0.7.4-x86_64/mysql-galera -
> g gcomm://192.168.0.60 start
> Starting mysqld instance with data dir /home/gen/mysql-5.1.44-
> galera-0.7.4-x86_64/mysql/var and listening at port 3306 and socket /
> home/gen/mysql-5.1.44-galera-0.7.4-x86_64/mysql/var/mysqld.sock....
> Done (PID:3465)
> Waiting for wsrep_ready.................ERROR 1045 (28000): Access
> denied for user 'root'@'localhost' (using password: YES)
>
> I have tried this multiple times, with a fresh unpack of this tgz:
> mysql-5.1.44-galera-0.7.4-x86_64.tgz. If I start 63 without connecting
> it to the cluster on a fresh unpack, ie run:
>
> gen@Test63:~$ /home/gen/mysql-5.1.44-galera-0.7.4-x86_64/mysql-galera -
> g gcomm:// start
>
> The node starts and I can connect to mysql using:
>
> gen@Test63:~$ mysql -u root -p --protocol=TCP
>
> with the password rootpass.
>
> If I stop it and try to connect it to the cluster I get the above
> error. The strange thing is that in the logs it appears to have
> successfully joined the cluster:

Yes, the log shows that joining was a success. Apparently initially you
had different root passwords on the nodes, so when the snapshot
transferred mysql system tables, old password became obsolete. This was
initially viewed as a cluster feature: all nodes are supposed to be
identical, including permissions, to be fully interchangeable. Also SST
may fail if root passwords change during SST. So you have to manually
configure proper root access on the node before joining to cluster.

>
> The other strange thing is that I connected to 60, but in the logs it
> appears that the job of providing the state transfer has been
> delegated to 62, which is not what I want. I have 62 receiving
> transactions, and I don't want it to block them. 60 is acting as a
> reserve node, and I want it to provide the state transfer so that 62
> can keep serving requests.

There is wsrep_node_name option that lets you assign a logical name to
a node. Then you can use this value in wsrep_donor_name on another node
to request a specific donor.

> 101018 14:23:11 [Note] WSREP: Node 0 (Test63) requested State Transfer


> from '*any*'. Selected 1 (Test62)(SYNCED) as donor.

This is where it says that no donor was specified and the group
selected the first suitable.

> 101018 14:23:11 [Note] WSREP: Got GCS_ACT_STATE_REQ to 1, my idx: 1
> 101018 14:23:11 [Note] WSREP: Shifting SYNCED -> DONOR (TO: 199717)
> 101018 14:23:11 [Note] WSREP: Got state transfer request.
> 101018 14:23:11 [Note] [DEBUG] WSREP: Running: 'wsrep_sst_mysqldump
> 'root' 'rootpass' '192.168.0.63' '3306' '3306' 'e98a3735-
> d1ab-11df-0800-3b59052d1c54' '199717''
> 101018 14:25:36 [Note] WSREP: 0(Test63): State transfer from 1(Test62)
> complete.
> 101018 14:25:36 [Note] WSREP: Member 0 (Test63) synced with group.
> ERROR 1045 (28000): Access denied for user
> 'root'@'192.168.0.62' (using password: YES)
> 101018 14:25:36 [ERROR] WSREP: Process completed with error:
> wsrep_sst_mysqldump 'root' 'rootpass' '192.168.0.63' '3306' '3306'
> 'e98a3735-d1ab-11df-0800-3b59052d1c54' '199717': 1 (Operation not
> permitted)

And here we have two things:

1. A bug in the joiner side of a script which makes joiner to believe
that SST was a success (actually for most purposes it was). Should be
fixed in 0.7.5.
2. Donor-side part of the script fails on some "post-snapshot" actions
because root password has changed.

>
> Next I tried connecting another machine to the cluster, a 32 bit one
> in case there was a problem with the 64 bit version. This was machine
> 70 (32 bit ubuntu 10.10). Here are the logs from 70:
>

> 101018 15:40:43 [Note] WSREP: Requesting state transfer: success,
> donor 0
> 101018 15:40:47 [Warning] WSREP: 0(Test63): State transfer to
> 3(Test70) failed: -1 (Operation not permitted)
> 101018 15:40:47 [ERROR] WSREP: Will never receive state. Need to
> abort.

What it means is that donor tried to perform SST (and that's roughly
mysqldump -h<donor> | mysql -h<joiner>) and its script failed with -1.
Most usually this is due to wsrep_sst_auth option not reflecting real
access rights on the joiner (in this case 70). In other words, when 70
sent SST request, it included wrong authentication information in it.

> This seems to show that 63 is in the cluster, but I just cannot
> connect to it due to the password.

You should be able to connect there using the password from 62.

> 70 can't connect, but this seems
> that it may be because it was receiving its state transfer from 63,
> which did not work, possible for the same reason that I cannot connect
> to 63.
>
> I think the basic problem is in 63, could you suggest anything I can
> try to work out what is going wrong? I can arrange access to the
> machines if you like.

I believe 63 is working just fine. It now should have the same root
password as other nodes in the cluster. Most probably wsrep_sst_auth
option on 70 didn't contain the right root password.

The bottom line is:

For simplicity, when you want to add a node to a cluster, you first
must make sure it has the same root password as the other nodes and that
wsrep_sst_auth is configured accordingly.

As for the

> Waiting for wsrep_ready.................ERROR 1045 (28000): Access
> denied for user 'root'@'localhost' (using password: YES)

mysql-galera script defaults to use root:rootpass. When this changes
during the snapshot, it fails. To use another user/password use
MYSQL_USER=-umyuser MYSQL_PSWD=-pmypass env variables.

Regards,
Alex

Jase

unread,
Oct 23, 2010, 8:27:32 PM10/23/10
to codership
Hi Alex,
Sorry for the late reply, and thanks for your comprehensive response.

> Yes, the log shows that joining was a success. Apparently initially you
> had different root passwords on the nodes, so when the snapshot
> transferred mysql system tables, old password became obsolete. This was
> initially viewed as a cluster feature: all nodes are supposed to be
> identical, including permissions, to be fully interchangeable. Also SST
> may fail if root passwords change during SST. So you have to manually
> configure proper root access on the node before joining to cluster.

The thing is, I didn't have a different root password on the other
machines. I do run a grants script on the machines to give proper
access to my client application, but this does not set the root
password. It did originally, but I ran into a similar problem with
galera, and since the client application does not use the mysql root
account I decided to let the root password be rootpass for purposes of
testing galera. So, when I connect to either 60 or 62 I do so using
rootpass (and the username root). Therefore it is a bit strange that
an SST would reset the root password on the new node to something
other than rootpass.

Also, I thought that since the mysql table is MyIsam (I think, I can't
log into my machines just now for various reasons to check this) I
didn't think its contents would be replicated as I thought only InnoDB
tables were.

Since the problem came up I have a work around, which is that I start
the new node without connecting to the cluster and then stop it. Next
I copy the mysql directory (containing the mysql database) to another
location, and then restart the node, connecting it to the cluster.
Once it has finished the state transfer, and is, according to the
logs, in sync, I then stop it, copy the original mysql directory back
in over the top of the now modified (and with a root password that I
don't know) one, and restart, joining the cluster again. This works
for the moment, but still doesn't answer the question of why the
password is being changed to something other than rootpass.

> There is wsrep_node_name option that lets you assign a logical name to
> a node. Then you can use this value in wsrep_donor_name on another node
> to request a specific donor.

Thanks for this info, I will make these settings and report if there
is any problem (unlikely).

> > This seems to show that 63 is in the cluster, but I just cannot
> > connect to it due to the password.
>
> You should be able to connect there using the password from 62.

The password from 62 is rootpass.

> I believe 63 is working just fine. It now should have the same root
> password as other nodes in the cluster. Most probably wsrep_sst_auth
> option on 70 didn't contain the right root password.

Yes, 63 is working just fine, having copied in the mysql database with
correct password.

> mysql-galera script defaults to use root:rootpass. When this changes
> during the snapshot, it fails. To use another user/password use
> MYSQL_USER=-umyuser MYSQL_PSWD=-pmypass env variables.

If I reset these values (I assume these are the ones in the mysql-
galera startup script), then these will be the login details for the
machine being started (the joiner) right? Or in other words, the local
maching, not the machine that it will connect to?

Thanks very much,
Jase

Seppo Jaakola

unread,
Oct 24, 2010, 5:04:21 AM10/24/10
to codership

On 24 loka, 03:27, Jase <jasejon...@gmail.com> wrote:
>
> The thing is, I didn't have a different root password on the other
> machines. I do run a grants script on the machines to give proper
> access to my client application, but this does not set the root
> password. It did originally, but I ran into a similar problem with
> galera, and since the client application does not use the mysql root
> account I decided to let the root password be rootpass for purposes of
> testing galera. So, when I connect to either 60 or 62 I do so using
> rootpass (and the username root). Therefore it is a bit strange that
> an SST would reset the root password on the new node to something
> other than rootpass.
>
Galera does not specifically require to have root:rootpass account
available. MySQL accounts are needed just to run 'mysqldump' during
state snapshot transfer (SST) phase, when a new node is joining in the
cluster. You can specify any privileged MySQL account for SST, with
the 'wsrep_sst_auth' option.

Note that the SST process will take full database dump in the donor
and load this into the joining node. The database dump includes
'mysql' database as well, so joiner will inherit all MySQL user
accounts from donor.

You can check your actual accounting, .e.g. by:
mysql> select user, host, password from mysql.user;

Sometimes SST fails if SST account is not allowed to login remotely
from donor node. So, pay attention to 'host' part in the
authentication.

> Also, I thought that since the mysql table is MyIsam (I think, I can't
> log into my machines just now for various reasons to check this) I
> didn't think its contents would be replicated as I thought only InnoDB
> tables were.
>

Galera indeed provides InnoDB replication. But besides InnoDB, also
all DDL and DCL statements are replicated directly as SQL strings.
This is just for convenience: you can create an InnoDB table and start
using it and everything replicates immediately. You can also create
new users or grant privileges (.e.g. grant all on...) and the SQL
statement of this operation will be replicated and applied in all
cluster nodes.

However, the 'mysql' database is MyISAM and therefore direct
manipulation of mysql tables are not replicated. .i.e. statement
like: "update mysql.user...", will not be replicated.

-Seppo

Jase

unread,
Oct 24, 2010, 9:29:40 PM10/24/10
to codership
Ok, here is the current status of this issue. I could not use
wsrep_set_auth option because to do so it needs to be the same on all
nodes, and this would mean restarting them all, and using my work
around above of copying in the mysql database and restarting doesn't
work on a database that is having data added to it constantly (see:
http://groups.google.com/group/codership-team/browse_thread/thread/57648441c29f218)
as far as I know. I did not want to stop the constant flow of data in,
as this would be catastrophic in a real situation, so I was keen to
find a way to add a new node without having to set wsrep_set_auth to
be the same on all nodes (well, different from the default setting).

Using Seppo's answer I started the node, allowed it to get it's SST,
and then, as usual, could not log into it. Then on a working node I
ran:

mysql>grant all privileges on *.* to 'root'@'%' identified by
'rootpass';

This copied across to the new node, and I was then able to log in to
the new node, and my data has been adding the whole time without a
hitch.
Reply all
Reply to author
Forward
0 new messages