Called not_in_right in state 0 at /usr/bin/pt-table-sync line 4403

Erik Osterman

Oct 24, 2011, 1:48:22 PM
to Percona Discussion
PROBLEM:
We're running a somewhat unconventional setup due to restrictions outside
of our control. I want to run pt-table-sync to ensure the tables are
identical, since we just replicated about 9GB of binary logs. Running
pt-table-sync results in the error "Called not_in_right in state 0", and
I was hoping someone could shed some light on it.

SETUP:
* Master is running MySQL 5.0.44 (and is itself a slave of another master)
* Slave is running MySQL 5.1.42
* Replication is running over an SSH tunnel bound to localhost:3307
* Slave is caught up (0 seconds behind master), and both Master_Log_File and Exec_Master_Log_Pos match the master
* Percona Toolkit: percona-toolkit-1.0.1-1
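As a rough cross-check of the "caught up" claim, the sketch below compares a slave's SHOW SLAVE STATUS row with the master's SHOW MASTER STATUS row, both assumed to be already fetched as dicts (for example via a dictionary cursor). The function name `slave_caught_up` is hypothetical. Note that Exec_Master_Log_Pos pairs with Relay_Master_Log_File (the SQL thread's position), while Master_Log_File pairs with Read_Master_Log_Pos (the I/O thread's position).

```python
def slave_caught_up(slave_status: dict, master_status: dict) -> bool:
    """Hypothetical helper: True if the slave's SQL thread has executed
    up to the master's current binary log position."""
    # Seconds_Behind_Master is NULL (None) when replication is not running,
    # e.g. when the SQL thread is stopped.
    if slave_status.get("Seconds_Behind_Master") is None:
        return False
    # The SQL thread's position is Relay_Master_Log_File + Exec_Master_Log_Pos;
    # Master_Log_File + Read_Master_Log_Pos describe the I/O thread instead.
    same_file = slave_status["Relay_Master_Log_File"] == master_status["File"]
    same_pos = int(slave_status["Exec_Master_Log_Pos"]) == int(master_status["Position"])
    return same_file and same_pos
```

The check is only meaningful if each row really comes from the server it claims to describe.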

COMMAND:
pt-table-sync --verbose --print --wait 0 --no-check-master --execute \
  --noforeign-key-checks --sync-to-master \
  D=tv,t=comment_detail,u=root,p=,h=localhost,P=3306;

ERRORS:

# Syncing D=tv,P=3306,h=localhost,p=...,t=comment_detail,u=root
# DELETE REPLACE INSERT UPDATE ALGORITHM START END EXIT DATABASE.TABLE
Called not_in_right in state 0 at /usr/bin/pt-table-sync line 4403.
while doing tv.comment_detail on localhost


Because of the SSH tunnel, we are using "--no-check-master"; without it,
master verification fails (this seems to be a known issue).

Without "--wait 0" it complains:
MASTER_POS_WAIT() returned NULL. Verify that the slave is running.
Sleeping HASH(0xffb190) seconds then retrying 2 more times. at /usr/bin/pt-table-sync line 5442.

I know replication is running, both because Seconds_Behind_Master is
zero and, more empirically, because very recent data has been
replicated successfully to the slave.
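For context, the MySQL manual documents MASTER_POS_WAIT(log_name, log_pos[, timeout]) as returning the number of events the slave had to wait for, -1 if the timeout expired, and NULL if the slave SQL thread is not started, the slave's master info is not initialized, the arguments are incorrect, or an error occurs. So a NULL result is not the same as "still behind". A minimal sketch of that interpretation, assuming a DB-API cursor with the "format" paramstyle (as used by PyMySQL or MySQLdb); `describe_pos_wait` and `wait_for_master_position` are hypothetical helpers, not part of pt-table-sync:

```python
from typing import Optional


def describe_pos_wait(result: Optional[int]) -> str:
    """Classify a MASTER_POS_WAIT() result per the MySQL manual."""
    if result is None:
        # NULL: SQL thread not started, master info not initialized,
        # incorrect arguments, or another error.
        return "error"
    if result == -1:
        # The timeout expired before the position was reached.
        return "timeout"
    # Zero or more: number of events the slave had to wait for.
    return "reached"


def wait_for_master_position(cursor, log_file: str, log_pos: int, timeout: int) -> str:
    """Run MASTER_POS_WAIT on the slave and classify the result."""
    # Assumes the "format" paramstyle (%s placeholders).
    cursor.execute("SELECT MASTER_POS_WAIT(%s, %s, %s)", (log_file, log_pos, timeout))
    (result,) = cursor.fetchone()
    return describe_pos_wait(result)
```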

I've used mk-table-sync extensively in the past (about 3 years ago),
but this is my first time using the new Percona Toolkit.


Thanks,


Erik Osterman



Baron Schwartz

Oct 25, 2011, 5:26:24 AM
to percona-d...@googlegroups.com
Erik,

There is no code difference from mk-table-sync to pt-table-sync in this
initial PT release, so I suspect something like a race condition is
happening. However, it is also probably a bug in the tool. To
diagnose the problem we would probably need to study the output of
MKDEBUG=1 pt-table-sync <options> very carefully. Can you look at that
and see if you can gain any insight?


--
Chief Performance Architect at Percona <http://www.percona.com/>
+1 (888) 401-3401 x507
Calendar: <http://bit.ly/baron-percona-cal> (Eastern Time)
Percona Live Conference comes to London! <http://www.percona.com/live>

Erik Osterman

Oct 25, 2011, 7:28:05 PM
to percona-d...@googlegroups.com
Baron,

> There is no code difference from mk-table-sync to pt-table-sync in this
> initial PT release, so I suspect something like a race condition is

To clarify, my mk-table-sync experience was a long time ago, with an old release and a traditional master-slave setup without SSH tunnels.


>
> happening. However, it is also probably a bug in the tool. To
> diagnose the problem we would probably need to study the output of
> MKDEBUG=1 pt-table-sync <options> very carefully. Can you look at that
> and see if you can gain any insight?
>

Thanks for the tip about MKDEBUG=1. Now I can see what's going on!

The problem stems from the fact that we had set MASTER_HOST="localhost" instead of MASTER_HOST="127.0.0.1".

The MySQL client library (libmysqlclient) implicitly treats "localhost" as a Unix socket connection (e.g. /var/lib/mysql/mysql.sock) and ignores the port. So when pt-table-sync connected to "localhost:3307" based on Master_Host and Master_Port, it was really using the local socket and never reached the remote master.
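The host-handling rule described above can be modeled in a few lines. This is a simplified sketch of the documented behavior of MySQL client programs on Unix, not libmysqlclient's actual code: the function `client_transport`, its `force_tcp` flag (standing in for the client's `--protocol=TCP` option), and the default socket path (taken from the example path above) are illustrative assumptions.

```python
from typing import Tuple


def client_transport(host: str, port: int,
                     socket_path: str = "/var/lib/mysql/mysql.sock",
                     force_tcp: bool = False) -> Tuple[str, str]:
    """Simplified model of how a MySQL client on Unix picks its transport."""
    if host in ("", "localhost") and not force_tcp:
        # "localhost" (or no host) means the local Unix socket;
        # the port argument is silently ignored.
        return ("socket", socket_path)
    # Any other host name or IP address, including 127.0.0.1, uses TCP/IP,
    # so the port (e.g. an SSH tunnel on 3307) is honored.
    return ("tcp", f"{host}:{port}")
```

With Master_Host=localhost and Master_Port=3307, `client_transport("localhost", 3307)` yields the local socket, while `client_transport("127.0.0.1", 3307)` yields `("tcp", "127.0.0.1:3307")`, i.e. the tunnel to the real master.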

So, for example, MASTER_POS_WAIT was waiting for the slave to catch up to itself, but since the slave was obviously not replicating from itself, it would never catch up.

The error "Called not_in_right in state 0" was due to a deadlock: the tool tried to lock a chunk of rows on the slave itself instead of on the true master.
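One way to detect that a "master" connection has silently landed on the slave itself is to compare @@server_id on both connections, since every server in a replication topology must have a unique server ID. The sketch below assumes DB-API connections (e.g. PyMySQL or MySQLdb); `server_id` and `is_same_server` are hypothetical helpers, not pt-table-sync options.

```python
def server_id(conn) -> int:
    """Return @@server_id for the server behind a DB-API connection."""
    cur = conn.cursor()
    try:
        cur.execute("SELECT @@server_id")
        (value,) = cur.fetchone()
        return int(value)
    finally:
        cur.close()


def is_same_server(master_conn, slave_conn) -> bool:
    """True if both connections reach the same mysqld instance.

    Server IDs must be unique within a replication topology, so equal IDs
    mean the "master" connection is really pointing at the slave.
    """
    return server_id(master_conn) == server_id(slave_conn)
```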

This also explains some other oddities we were seeing with the replication.


Thanks,

Erik Osterman

