I also enabled TRACE logging for com.hazelcast.internal.partition.
It looks like there are 4 successful migration fragments, like this one:
> 2021-12-18 18:02:59 TRACE com.hazelcast.internal.partition.operation.MigrationRequestOperation [2a02:6b8:c0b:3e1c:0:1688:f829:2]:5701 [dev] [4.2.2] Invoking MigrationOperation for namespaces [DistributedObjectNamespace{service='hz:impl:mapService', objectName='serpSetLock'}] and MigrationInfo{uuid=906bb551-ce85-48f7-816e-c81e6edde40f, partitionId=4, source=[2a02:6b8:c0b:3e1c:0:1688:f829:2]:5701 - 47d87e75-eb4f-4b47-9a9b-7106ab1f4b94, sourceCurrentReplicaIndex=0, sourceNewReplicaIndex=3, destination=[2a02:6b8:c0c:4ca0:0:1688:f8f6:0]:5701 - fac9cfd8-0c35-496f-a2e6-fb95ebc85c01, destinationCurrentReplicaIndex=-1, destinationNewReplicaIndex=0, master=[2a02:6b8:c0b:3e1c:0:1688:f829:2]:5701, initialPartitionVersion=318, partitionVersionIncrement=2, status=ACTIVE}, lastFragment: false
> 2021-12-18 18:02:59 TRACE com.hazelcast.internal.partition.operation.MigrationOperation [2a02:6b8:c0c:4ca0:0:1688:f8f6:0]:5701 [dev] [4.2.2] ReplicaVersions are set after migration. MigrationInfo{uuid=906bb551-ce85-48f7-816e-c81e6edde40f, partitionId=4, source=[2a02:6b8:c0b:3e1c:0:1688:f829:2]:5701 - 47d87e75-eb4f-4b47-9a9b-7106ab1f4b94, sourceCurrentReplicaIndex=0, sourceNewReplicaIndex=3, destination=[2a02:6b8:c0c:4ca0:0:1688:f8f6:0]:5701 - fac9cfd8-0c35-496f-a2e6-fb95ebc85c01, destinationCurrentReplicaIndex=-1, destinationNewReplicaIndex=0, master=[2a02:6b8:c0b:3e1c:0:1688:f829:2]:5701, initialPartitionVersion=318, partitionVersionIncrement=2, status=ACTIVE}, namespace=DistributedObjectNamespace{service='hz:impl:mapService', objectName='serpSetLock'}, replicaVersions=[10896, 0, 0, 0, 0, 0]
And one that fails, for the observationResultCache map:
>2021-12-18 18:02:59 TRACE com.hazelcast.internal.partition.operation.MigrationRequestOperation [2a02:6b8:c0b:3e1c:0:1688:f829:2]:5701 [dev] [4.2.2] Invoking MigrationOperation for namespaces [DistributedObjectNamespace{service='hz:impl:mapService', objectName='observationResultCache'}] and MigrationInfo{uuid=906bb551-ce85-48f7-816e-c81e6edde40f, partitionId=4, source=[2a02:6b8:c0b:3e1c:0:1688:f829:2]:5701 - 47d87e75-eb4f-4b47-9a9b-7106ab1f4b94, sourceCurrentReplicaIndex=0, sourceNewReplicaIndex=3, destination=[2a02:6b8:c0c:4ca0:0:1688:f8f6:0]:5701 - fac9cfd8-0c35-496f-a2e6-fb95ebc85c01, destinationCurrentReplicaIndex=-1, destinationNewReplicaIndex=0, master=[2a02:6b8:c0b:3e1c:0:1688:f829:2]:5701, initialPartitionVersion=318, partitionVersionIncrement=2, status=ACTIVE}, lastFragment: false
>2021-12-18 18:27:59 WARN com.hazelcast.internal.partition.operation.MigrationRequestOperation [2a02:6b8:c0b:3e1c:0:1688:f829:2]:5701 [dev] [4.2.2] Failure while executing MigrationInfo{uuid=906bb551-ce85-48f7-816e-c81e6edde40f, partitionId=4, source=[2a02:6b8:c0b:3e1c:0:1688:f829:2]:5701 - 47d87e75-eb4f-4b47-9a9b-7106ab1f4b94, sourceCurrentReplicaIndex=0, sourceNewReplicaIndex=3, destination=[2a02:6b8:c0c:4ca0:0:1688:f8f6:0]:5701 - fac9cfd8-0c35-496f-a2e6-fb95ebc85c01, destinationCurrentReplicaIndex=-1, destinationNewReplicaIndex=0, master=[2a02:6b8:c0b:3e1c:0:1688:f829:2]:5701, initialPartitionVersion=318, partitionVersionIncrement=2, status=ACTIVE} com.hazelcast.core.OperationTimeoutException: MigrationOperation invocation failed to complete due to operation-heartbeat-timeout. Current time: 2021-12-18 18:27:59.477. Start time: 2021-12-18 18:02:59.385. Total elapsed time: 1500092 ms. Last operation heartbeat: never. Last operation heartbeat from member: 2021-12-18 18:27:48.967
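To sanity-check the numbers in that exception, here is a small sketch that recomputes the intervals from the timestamps in the message. The class and method names are made up for illustration. It shows that the invocation waited almost exactly 25 minutes, and that the last heartbeat from the destination member arrived only about 10 seconds before the timeout. So the member itself appears to be alive, while the operation never reported a heartbeat ("Last operation heartbeat: never").

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Hazelcast: recomputes the intervals
// reported in the OperationTimeoutException message.
public class MigrationElapsed {
    // Timestamp layout used inside the exception text.
    private static final DateTimeFormatter LOG_TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    // Milliseconds between two timestamps in the layout above.
    static long elapsedMillis(String start, String end) {
        LocalDateTime from = LocalDateTime.parse(start, LOG_TS);
        LocalDateTime to = LocalDateTime.parse(end, LOG_TS);
        return Duration.between(from, to).toMillis();
    }

    public static void main(String[] args) {
        String start = "2021-12-18 18:02:59.385";        // "Start time"
        String now = "2021-12-18 18:27:59.477";          // "Current time"
        String memberBeat = "2021-12-18 18:27:48.967";   // "Last operation heartbeat from member"

        long total = elapsedMillis(start, now);
        long sinceMemberBeat = elapsedMillis(memberBeat, now);

        System.out.println("total elapsed: " + total + " ms ("
                + Duration.ofMillis(total).toMinutes() + " min)");
        System.out.println("since last member heartbeat: " + sinceMemberBeat + " ms");
    }
}
```

The first value matches the "Total elapsed time: 1500092 ms" reported in the log.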
Is there any way to find out where the last migration hangs? I can collect stack traces, but I have no idea what to look for.
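For the stack traces, this is a rough sketch of what I could run inside the destination member's JVM (the MigrationOperation is sent there) to dump only the threads of interest. It assumes that Hazelcast partition threads have "partition-operation" in their names. That is my understanding of the naming convention, not something I have verified for 4.2.2. The class name and the filter string are my own. Running jstack against the member process a few times would give the same information without any code.

```java
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: prints stack traces of threads whose name contains
// a given fragment. Uses only the JDK, no Hazelcast API.
public class PartitionThreadDump {

    // Returns a text dump of all live threads whose name contains nameFragment.
    static String dump(String nameFragment) {
        return Thread.getAllStackTraces().entrySet().stream()
                .filter(e -> e.getKey().getName().contains(nameFragment))
                .map(PartitionThreadDump::format)
                .collect(Collectors.joining("\n"));
    }

    // Formats one thread as "name [STATE]" followed by its stack frames.
    private static String format(Map.Entry<Thread, StackTraceElement[]> entry) {
        Thread thread = entry.getKey();
        StringBuilder sb = new StringBuilder();
        sb.append(thread.getName()).append(" [").append(thread.getState()).append("]\n");
        for (StackTraceElement frame : entry.getValue()) {
            sb.append("    at ").append(frame).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: partition threads contain this fragment in their name.
        System.out.println(dump("partition-operation"));
    }
}
```

My plan would be to take several dumps a few seconds apart while the migration of partition 4 is pending and compare them: a partition thread that sits in the same frames in every dump (for example, somewhere under the map service or in serialization code) would be the first candidate.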
On Sunday, December 19, 2021 at 16:00:45 UTC+3, Volkov Sergey wrote: