High Hopes


Guy Riddle

Aug 10, 2017, 3:17:24 PM
to Bedrock
So I carefully got 3 nodes up and running, then tested how resilient the cluster was to things like taking one node down to upgrade some software bits and then restarting it, or rebooting the Docker node hosting a Bedrock container.

Unfortunately, most of the time it ended up stuck in a state where it logged complaints like these:

Aug  8 00:55:02 : xxxxx (SQLiteNode.cpp:747) update [sync] [warn] {us-east-2a/MASTERING} Rolling back transaction because everybody responded but not consistent enough.(This transaction would likely have conflicted.)
Aug  8 00:55:02 : xxxxx (BedrockConflictMetrics.cpp:46) recordConflict [sync] [info] Multi-write conflict recorded for Query
Aug  8 00:55:02 : xxxxx (BedrockServer.cpp:247) sync [sync] [info] [performance] Conflict committing in sync thread, requeueing command Query. Sync thread has 0 queued commands.
Aug  7 22:08:35 : xxxxx (SQLiteNode.cpp:1922) _queueSynchronize [sync] [warn] {us-east-2a/SEARCHING} [TY5] Hash mismatch. Peer at commit:15511 with hash C5E28D729715D1BC15A4A35B1196E4731575FDA0, but we have hash: 629A2E6C86E2488B7F1F2CAC6FF3E99C2559F5
Aug  7 22:08:35 : xxxxx (STCPNode.cpp:195) postPoll [sync] [warn] {us-east-2a} ->{us-east-1a} Error processing message 'SYNCHRONIZE' (hash mismatch), reconnecting:SYNCHRONIZE^M CommitCount: 15511^M Hash: C5E28D729715D1BC15A4A35B1196E4731575FDA0^M Transfe

Aug  7 21:37:39 : xxxxx (libstuff.cpp:2150) SQuery [sync] [dbug] SELECT query, hash FROM journal WHERE id = 16123 UNION SELECT query, hash FROM journal0000 WHERE id = 16123
Aug  7 21:37:39 : xxxxx (SQLiteNode.cpp:1922) _queueSynchronize [sync] [warn] {us-east-1a/SYNCHRONIZING} [TY5] Hash mismatch. Peer at commit:16123 with hash 66E930E06B631FD66F910A1B3BB81A15149C3F4D, but we have hash: A473943139B459136492FD57374CC3BF8C
Aug  7 21:37:39 : xxxxx (STCPNode.cpp:195) postPoll [sync] [warn] {us-east-1a} ->{us-east-2a} Error processing message 'SYNCHRONIZE' (hash mismatch), reconnecting:SYNCHRONIZE^M CommitCount: 16123^M Hash: 66E930E06B631FD66F910A1B3BB81A15149C3F4D^M Transfe
Aug  7 22:07:09 : xxxxx (SQLiteNode.cpp:1229) _onMESSAGE [sync] [hmmm] {us-east-1a/WAITING} ->{us-west-2c} Denying standup request because peer '100.121.34.215' is 'STANDINGUP'
Aug  7 22:07:09 : xxxxx (SQLiteNode.cpp:501) update [sync] [hmmm] {us-east-1a/WAITING} ->{us-west-2c} Multiple peers trying to stand up (also '100.121.34.215'), let's hope they sort it out.
Aug  7 21:37:47 : xxxxx (STCPNode.cpp:169) postPoll [sync] [dbug] {us-east-1a} ->{us-east-2a} Received 'SYNCHRONIZE': SYNCHRONIZE^M CommitCount: 16123^M Hash: 66E930E06B631FD66F910A1B3BB81A15149C3F4D^M Transfer-Encoding: ^M Content-Length: 0^M ^M
Aug  7 21:37:47 : xxxxx (libstuff.cpp:2150) SQuery [sync] [dbug] SELECT query, hash FROM journal WHERE id = 16123 UNION SELECT query, hash FROM journal0000 WHERE id = 16123
Aug  7 21:37:47 : xxxxx (SQLiteNode.cpp:1922) _queueSynchronize [sync] [warn] {us-east-1a/SYNCHRONIZING} [TY5] Hash mismatch. Peer at commit:16123 with hash 66E930E06B631FD66F910A1B3BB81A15149C3F4D, but we have hash: A473943139B459136492FD57374CC3BF8C
Aug  7 21:37:47 : xxxxx (STCPNode.cpp:195) postPoll [sync] [warn] {us-east-1a} ->{us-east-2a} Error processing message 'SYNCHRONIZE' (hash mismatch), reconnecting:SYNCHRONIZE^M CommitCount: 16123^M Hash: 66E930E06B631FD66F910A1B3BB81A15149C3F4D^M Transfe

Aug  7 21:06:33 : xxxxx (STCPNode.cpp:169) postPoll [sync] [dbug] {us-west-2c} ->{100.121.44.132} Received 'SYNCHRONIZE': SYNCHRONIZE^M CommitCount: 16132^M Hash: F437D3A6A8E373DD9BCA29A6BE1C263FB0A3036E^M Transfer-Encoding: ^M Content-Length: 0^M ^M
Aug  7 21:06:33 : xxxxx (libstuff.cpp:2150) SQuery [sync] [dbug] SELECT query, hash FROM journal WHERE id = 16132 UNION SELECT query, hash FROM journal0000 WHERE id = 16132
Aug  7 21:06:33 : xxxxx (SQLiteNode.cpp:1922) _queueSynchronize [sync] [warn] {us-west-2c/MASTERING} [TY5] Hash mismatch. Peer at commit:16132 with hash F437D3A6A8E373DD9BCA29A6BE1C263FB0A3036E, but we have hash: 5B17EBCDDD00C63ABCE1DA91C23582351E547D
Aug  7 21:06:33 : xxxxx (STCPNode.cpp:195) postPoll [sync] [warn] {us-west-2c} ->{100.121.44.132} Error processing message 'SYNCHRONIZE' (hash mismatch), reconnecting:SYNCHRONIZE^M CommitCount: 16132^M Hash: F437D3A6A8E373DD9BCA29A6BE1C263FB0A3036E^M Tra
Aug  7 21:12:37 : xxxxx (SQLiteNode.cpp:747) update [sync] [warn] {us-west-2c/MASTERING} Rolling back transaction because everybody responded but not consistent enough.(This transaction would likely have conflicted.)
Aug  7 21:12:37 : xxxxx (libstuff.cpp:2150) SQuery [sync] [dbug] ROLLBACK
Aug  7 21:12:37 : xxxxx (SQLite.cpp:446) rollback [sync] [info] Rollback successful.
Aug  7 21:12:37 : xxxxx (BedrockConflictMetrics.cpp:46) recordConflict [sync] [info] Multi-write conflict recorded for Query
Aug  7 21:12:37 : xxxxx (BedrockServer.cpp:247) sync [sync] [info] [performance] Conflict committing in sync thread, requeueing command Query. Sync thread has 4 queued commands.


I found no obvious way to fix it and get things working again.  None of the three Bedrock databases would process any database transactions.  All three ended up causing 500/503/504 errors to be returned for all HTTPS requests that interacted with the database.  They never recovered.  Oh well…
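
One way to narrow down where the nodes diverged: the logs above show each node reading `query, hash` from a `journal` table by commit `id` and rejecting the peer when the hashes differ. Below is a minimal sketch for finding the first commit at which two nodes disagree. It assumes each node's database file can be copied off and opened with plain SQLite, and that `journal` has the `id` and `hash` columns seen in the logged query. The helper names are hypothetical and not part of Bedrock, and this only locates the divergence; it does not repair anything.

```python
import sqlite3


def journal_hashes(db_path, tables=("journal",)):
    """Return {commit id: hash} collected from the given journal tables.

    Assumption: the table and column names match the query seen in the logs
    (SELECT query, hash FROM journal WHERE id = ...). Add names such as
    "journal0000" to `tables` if those tables exist in your database.
    """
    hashes = {}
    conn = sqlite3.connect(db_path)
    try:
        for table in tables:
            # Table names come from a hard-coded list, not from user input.
            for commit_id, commit_hash in conn.execute(
                f"SELECT id, hash FROM {table}"
            ):
                hashes[commit_id] = commit_hash
    finally:
        conn.close()
    return hashes


def first_divergence(hashes_a, hashes_b):
    """Return the lowest commit id present in both maps whose hashes differ.

    Returns None when every shared commit id has the same hash.
    """
    for commit_id in sorted(hashes_a.keys() & hashes_b.keys()):
        if hashes_a[commit_id] != hashes_b[commit_id]:
            return commit_id
    return None
```

Run it against copies of the database files rather than the live ones, for example `first_divergence(journal_hashes("node_a.db"), journal_hashes("node_b.db"))`, where the file names are placeholders.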