Status: Accepted
Owner: robert.h...@continuent.com
Labels: Type-Defect Priority-High FoundIn-2.2.1
New issue 983 by robert.h...@continuent.com: trepctl flush command can hang
if heartbeat event is logged at end of MySQL binlog
http://code.google.com/p/tungsten-replicator/issues/detail?id=983
What steps will reproduce the problem?
1. Set up a MySQL server with a very small binlog size, e.g., 65K. Here's
the my.cnf setting:
max_binlog_size = 65K
2. Configure Tungsten master/slave replication with that MySQL server as
the master.
3. Issue a series of flush commands using 'trepctl flush' or by calling the
flush() JMX API until the binlog turns over.
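Step 3 can be scripted. The following is a minimal sketch, not part of the original report: it calls 'trepctl flush' in a loop and stops at the first call that fails or does not return in time. The function name run_flushes, the iteration count, and the timeout are all assumptions chosen for illustration; pick a count large enough for the binlog to turn over in your setup.

```python
import subprocess
import sys


def run_flushes(count, timeout_s, cmd=("trepctl", "flush")):
    """Run `cmd` up to `count` times.

    Returns the index of the first call that fails or exceeds
    `timeout_s` seconds, or None if every call succeeds.
    """
    for i in range(count):
        try:
            result = subprocess.run(
                cmd, capture_output=True, text=True, timeout=timeout_s
            )
        except subprocess.TimeoutExpired:
            # The child is killed on timeout; treat this as a hang.
            print(f"flush {i} did not return within {timeout_s}s", file=sys.stderr)
            return i
        if result.returncode != 0:
            print(f"flush {i} failed: {result.stderr.strip()}", file=sys.stderr)
            return i
    return None


if __name__ == "__main__":
    # 500 iterations and a 30 s timeout are arbitrary illustrative values.
    failed_at = run_flushes(500, 30.0)
    sys.exit(0 if failed_at is None else 1)
```

The cmd parameter is there only so the loop can be exercised without a replicator installed, for example by passing a stub command.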
What is the expected output?
Flush commands should return the sequence number of the corresponding
heartbeat or a higher value.
What do you see instead?
A flush whose heartbeat event is logged at the end of a binlog file results
in the following error when the flush is invoked through the JMX API:
[junit] junit.framework.AssertionFailedError: failed to exception:
java.lang.Exception: Flush operation failed: State transition failed
causing emergency recovery: state=ONLINE transition=FLUSH event=FlushEvent
What is the possible cause?
It appears that the logic to wait on log position in the replicator
pipeline is flawed.
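The report does not identify where the flaw is. Purely as an illustration of what a wait on log position has to handle when the binlog turns over, the sketch below orders positions by binlog file number first and offset second. The position format 'mysql-bin.000012:345' and the names parse_position and has_reached are assumptions made for this example, not the replicator's actual code.

```python
def parse_position(pos):
    """Split a position such as 'mysql-bin.000012:345' into (12, 345).

    The 'file:offset' format is a hypothetical one used for illustration.
    """
    file_name, offset = pos.split(":")
    file_index = int(file_name.rsplit(".", 1)[1])
    return file_index, int(offset)


def has_reached(current, target):
    """True once `current` is at or past `target`.

    Tuples compare the file index before the offset, so a small offset
    in a newer binlog file still counts as being past a large offset in
    the previous file.
    """
    return parse_position(current) >= parse_position(target)
```

A comparison that looked at offsets alone would treat offset 4 in file 000013 as earlier than offset 65000 in file 000012 and would keep waiting, which is one way a wait near the end of a binlog file could stall.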
What is the proposed solution?
Fix it!
Additional information
This error is fairly reproducible in system tests for the replicator, as
they use flush commands extensively. It could cause planned failover to
hang and/or crash, so it affects deployments that use Tungsten for
clustering.