Hi, I am seeing the following warnings and errors while running Reshard and VDiff.
Vitess: 12.0
I have a total of 6 nodes running 3 shards (1 primary and 1 replica per shard).
To test resharding, I initialised 4 more tablets on these nodes (2 shards, each with 1 primary and 1 replica).
Every time I run Reshard, I get the following error:
-bash-4.2$ vtctlclient Reshard -source_shards=-55,55-aa,aa- -target_shards=-80,80- Create reverie.3_2_reshard_2
Waiting for workflow to start:
0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... 0% ... E0308 13:40:17.667613 5636 main.go:67] E0308 08:10:17.667309 vtctl.go:2564] workflow did not start within 30s
Reshard Error: rpc error: code = Unknown desc = workflow did not start within 30s
E0308 13:40:17.669435 5636 main.go:76] remote error: rpc error: code = Unknown desc = workflow did not start within 30s
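For the Reshard command above, both the source shards (-55,55-aa,aa-) and the target shards (-80,80-) should cover the whole keyspace with no gaps or overlaps. Below is a minimal Python sketch that checks this from the shard names alone. It assumes the usual Vitess naming convention (a shard name is a hex "start-end" key range, and an empty bound means open-ended). It also assumes that all bounds are written with the same number of hex digits. The helper names are hypothetical and not part of Vitess.

```python
# Hypothetical helpers, not part of Vitess. Assumes shard names of the form
# "start-end" in hex, with an empty bound meaning open-ended, and that all
# bounds use the same number of hex digits (so "80" is not compared to "8000").

def parse_shard(name: str) -> tuple[bytes, bytes]:
    """Split a shard name like '55-aa' into (start, end) byte strings."""
    start_hex, end_hex = name.split("-")
    return bytes.fromhex(start_hex), bytes.fromhex(end_hex)


def covers_full_range(shards: list[str]) -> bool:
    """Return True if the shards are contiguous and span the whole keyspace."""
    ranges = sorted(parse_shard(s) for s in shards)
    # The first shard must start at the open lower bound (empty bytes).
    if ranges[0][0] != b"":
        return False
    # The last shard must end at the open upper bound (empty bytes).
    if ranges[-1][1] != b"":
        return False
    # Each shard must end exactly where the next one starts.
    for (_, end), (next_start, _) in zip(ranges, ranges[1:]):
        if end != next_start:
            return False
    return True


print(covers_full_range(["-55", "55-aa", "aa-"]))  # source shards
print(covers_full_range(["-80", "80-"]))           # target shards
```

Both sets pass this check, so the shard layout itself is not an obvious cause of the mismatch.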
On some of the vttablets, I see the following warnings:
-bash-4.2$ W0308 13:39:47.646650 1673 component.go:41] [core] grpc: addrConn.createTransport failed to connect to {ip-10-24-194-115.ap-south-1.compute.internal:16100 ip-10-24-194-115.ap-south-1.compute.internal:16100 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp: operation was canceled". Reconnecting...
W0308 13:39:47.646737 1673 component.go:41] [core] grpc: addrConn.createTransport failed to connect to {ip-10-24-203-179.ap-south-1.compute.internal:16100 ip-10-24-203-179.ap-south-1.compute.internal:16100 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp: operation was canceled". Reconnecting...
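These warnings show vttablet failing to dial port 16100 on the other hosts. Below is a minimal Python sketch that performs the same reachability check as telnet in a script, using only the standard library. The function name is hypothetical. It reports only whether a TCP connection can be opened within a timeout.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and tries each address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS resolution failures and timeouts.
        return False


# Example: run from the tablet host that logged the warning.
# port_reachable("ip-10-24-194-115.ap-south-1.compute.internal", 16100)
```

A successful connection only shows that the port is open from that host at that moment. It does not show that gRPC calls to the tablet succeed.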
I tried telnet ip-10-24-194-115.ap-south-1.compute.internal 16100 and it connected. The resharding operation kept progressing in the background and eventually completed. However, when I ran count(*) on one of the tables on both the source and destination shards, the totals did not match.
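To make the row-count comparison repeatable, the per-shard results can be summed and compared in a script. Below is a minimal Python sketch. It assumes the per-shard counts have already been collected, for example by running SELECT COUNT(*) on each shard. The function name and the dict input format (shard name mapped to row count) are hypothetical.

```python
def compare_shard_counts(
    source: dict[str, int], target: dict[str, int]
) -> tuple[int, int, int]:
    """Sum row counts per side; return (source_total, target_total, difference)."""
    source_total = sum(source.values())
    target_total = sum(target.values())
    # A non-zero difference means the two sides disagreed when the counts were taken.
    return source_total, target_total, target_total - source_total
```

If writes are still reaching the source while the counts are taken, the totals can differ even when the copy is correct. VDiff avoids this by comparing source and target at a synchronized replication position.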
-bash-4.2$ vtctlclient VDiff reverie.3_2_reshard_2
VDiff Error: rpc error: code = Unknown desc = diff: vttablet: rpc error: code = Unknown desc = unexpected EOF
io.ReadFull(packet body of length 1625) failed (errno 2013) (sqlstate HY000)
E0308 14:22:24.172574 5879 main.go:76] remote error: rpc error: code = Unknown desc = diff: vttablet: rpc error: code = Unknown desc = unexpected EOF
io.ReadFull(packet body of length 1625) failed (errno 2013) (sqlstate HY000)
Does anybody have any idea what could be causing this?