Error on cluster re-join after node shutdown

Alexander Sahler

Dec 3, 2014, 7:37:27 AM
to orient-...@googlegroups.com
Hello everyone.

I'm evaluating OrientDB 2.0-SNAPSHOT (20141202.120339-430) and ran into a problem in a distributed environment.
I've got three servers running OrientDB which share one cluster of a certain class.

While inserting 100,000 records I killed the first server (the one the pool factory was configured with). All async DB threads then threw an error, causing the current insert operation to fail. After that, the client failed over to the other cluster members and the inserts worked fine again.

Now, after bringing the killed server back up, I'm getting errors that cause ALL subsequent transactions to fail:

Exception is: 
Sent run-time exception to the client /172.17.4.149:56076: com.orientechnologies.orient.core.exception.OStorageException: Cannot route CREATE_RECORD operation for #15:-1 to the distributed node
com.orientechnologies.orient.core.exception.OStorageException: Cannot route CREATE_RECORD operation for #15:-1 to the distributed node
        at com.orientechnologies.orient.server.distributed.ODistributedStorage.handleDistributedException(ODistributedStorage.java:1169)
        at com.orientechnologies.orient.server.distributed.ODistributedStorage.createRecord(ODistributedStorage.java:470)
        at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.executeSaveRecord(ODatabaseDocumentTx.java:1637)
        at com.orientechnologies.orient.core.tx.OTransactionNoTx.saveRecord(OTransactionNoTx.java:93)
        at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.save(ODatabaseDocumentTx.java:2176)
        at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.save(ODatabaseDocumentTx.java:2021)
        at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.save(ODatabaseDocumentTx.java:123)
        at com.orientechnologies.orient.server.network.protocol.binary.OBinaryNetworkProtocolAbstract.createRecord(OBinaryNetworkProtocolAbstract.java:333)
        at com.orientechnologies.orient.server.network.protocol.binary.ONetworkProtocolBinary.createRecord(ONetworkProtocolBinary.java:1295)
        at com.orientechnologies.orient.server.network.protocol.binary.ONetworkProtocolBinary.executeRequest(ONetworkProtocolBinary.java:303)
        at com.orientechnologies.orient.server.network.protocol.binary.OBinaryNetworkProtocolAbstract.execute(OBinaryNetworkProtocolAbstract.java:201)
        at com.orientechnologies.common.thread.OSoftThread.run(OSoftThread.java:65)
Caused by: com.orientechnologies.orient.server.distributed.ODistributedException: Error on inserting into cluster 'pin' where local node 'orientdb-p1' is not the master of it, but it's 'orientdb-p2'
        at com.orientechnologies.orient.server.distributed.ODistributedStorage.createRecord(ODistributedStorage.java:423)
        ... 10 more


Questions:
1. Shouldn't the driver transparently fail over to the other nodes in the cluster and retry the operation when a node fails?
2. I think bringing the failed node back up should not cause errors or make subsequent transactions fail.
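
To make question 1 concrete, here is a minimal sketch of the kind of retry one would otherwise have to write on the client side. It is plain Java with no OrientDB dependency; the class name `RetryOnFailover`, the method `withRetry`, and the linear backoff are hypothetical choices for illustration, not part of any OrientDB API. It assumes the failure surfaces as a `RuntimeException`, as the `OStorageException` in the trace above appears to.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper, not part of OrientDB: retries an operation that may fail while a node is down. */
public class RetryOnFailover {

    /**
     * Calls op until it succeeds or maxAttempts is reached.
     * Only RuntimeExceptions are retried; checked exceptions propagate immediately.
     */
    public static <T> T withRetry(Callable<T> op, int maxAttempts, long backoffMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts) {
                    // Linear backoff: wait a little longer after each failed attempt.
                    Thread.sleep(backoffMillis * attempt);
                }
            }
        }
        // All attempts failed: rethrow the last failure so the caller sees the real cause.
        throw last;
    }
}
```

A caller would wrap the save in a `Callable` whose `call()` performs the `save(...)` on the document database. Whether a retry on the same connection actually reaches a different node depends on the client, which is what question 1 is asking. A retry like this would also not help with the second problem, where the restarted node keeps rejecting inserts.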

Regards, Alexander

Alexander Sahler

Dec 3, 2014, 8:02:18 AM
to orient-...@googlegroups.com
I found that, once it came back up, the restarted node can no longer persist any items of the class in question. Now I always get

2014-12-03 14:01:01,340 | ERROR | ool-14-thread-21 | PinImporter                      | 196 - com.brodos.pinvault-orientdb - 1.1.0.SNAPSHOT | Error storing pin.
com.orientechnologies.orient.core.exception.OStorageException: Error on create record in cluster: 15
        at com.orientechnologies.orient.client.remote.OStorageRemote.handleException(OStorageRemote.java:1462)[212:com.orientechnologies.orientdb-client:2.0.0.SNAPSHOT]
        at com.orientechnologies.orient.client.remote.OStorageRemote.createRecord(OStorageRemote.java:393)[212:com.orientechnologies.orientdb-client:2.0.0.SNAPSHOT]
        at com.orientechnologies.orient.client.remote.OStorageRemoteThread.createRecord(OStorageRemoteThread.java:241)[212:com.orientechnologies.orientdb-client:2.0.0.SNAPSHOT]
        at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.executeSaveRecord(ODatabaseDocumentTx.java:1637)[213:com.orientechnologies.orientdb-core:2.0.0.SNAPSHOT]
        at com.orientechnologies.orient.core.tx.OTransactionNoTx.saveRecord(OTransactionNoTx.java:93)[213:com.orientechnologies.orientdb-core:2.0.0.SNAPSHOT]
        at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.save(ODatabaseDocumentTx.java:2176)[213:com.orientechnologies.orientdb-core:2.0.0.SNAPSHOT]
        at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.save(ODatabaseDocumentTx.java:2084)[213:com.orientechnologies.orientdb-core:2.0.0.SNAPSHOT]
        at com.brodos.pinvault.persistence.orient.PinvaultRepositoryImpl.save(PinvaultRepositoryImpl.java:145)[196:com.brodos.pinvault-orientdb:1.1.0.SNAPSHOT]
        at com.brodos.pinimport.business.manager.control.PinvaultImporter$PinImportJob.run(PinvaultImporter.java:91)[196:com.brodos.pinvault-orientdb:1.1.0.SNAPSHOT]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)[:1.7.0_55]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)[:1.7.0_55]
        at java.lang.Thread.run(Thread.java:745)[:1.7.0_55]
Caused by: java.lang.RuntimeException: com.orientechnologies.orient.core.exception.OStorageException: Cannot route CREATE_RECORD operation for #15:-1 to the distributed node
        at com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClient.handleStatus(OChannelBinaryAsynchClient.java:404)[214:com.orientechnologies.orientdb-enterprise:2.0.0.SNAPSHOT]
        at com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClient.beginResponse(OChannelBinaryAsynchClient.java:279)[214:com.orientechnologies.orientdb-enterprise:2.0.0.SNAPSHOT]
        at com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClient.beginResponse(OChannelBinaryAsynchClient.java:171)[214:com.orientechnologies.orientdb-enterprise:2.0.0.SNAPSHOT]
        at com.orientechnologies.orient.client.remote.OStorageRemote.beginResponse(OStorageRemote.java:1800)[212:com.orientechnologies.orientdb-client:2.0.0.SNAPSHOT]
        at com.orientechnologies.orient.client.remote.OStorageRemote.createRecord(OStorageRemote.java:336)[212:com.orientechnologies.orientdb-client:2.0.0.SNAPSHOT]
        ... 10 more
Caused by: com.orientechnologies.orient.core.exception.OStorageException: Cannot route CREATE_RECORD operation for #15:-1 to the distributed node
        at sun.reflect.GeneratedConstructorAccessor68.newInstance(Unknown Source)[:1.7.0_55]
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)[:1.7.0_55]
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)[:1.7.0_55]
        at com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClient.createException(OChannelBinaryAsynchClient.java:145)[214:com.orientechnologies.orientdb-enterprise:2.0.0.SNAPSHOT]
        at com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClient.handleStatus(OChannelBinaryAsynchClient.java:400)[214:com.orientechnologies.orientdb-enterprise:2.0.0.SNAPSHOT]
        ... 14 more
Caused by: com.orientechnologies.orient.core.exception.OStorageException: Error on inserting into cluster 'pin' where local node 'orientdb-p1' is not the master of it, but it's 'orientdb-p2'
        at com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClient.createException(OChannelBinaryAsynchClient.java:138)[214:com.orientechnologies.orientdb-enterprise:2.0.0.SNAPSHOT]
        ... 15 more

on that node...

Regards, Alexander.

Luca Garulli

Dec 3, 2014, 2:08:14 PM
to orient-database
Hi Alexander,
Please retry with the latest snapshot.

Lvc@

--

---
You received this message because you are subscribed to the Google Groups "OrientDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to orient-databa...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Alexander Sahler

Dec 4, 2014, 4:50:47 AM
to orient-...@googlegroups.com
Hi Lvc@.

Yes, still the same with orientdb-community-2.0-20141203.211002-441-distribution.zip.

Alexander.

Luca Garulli

Dec 5, 2014, 12:49:49 PM
to orient-database
Alexander,
Did you get this exception with the latest snapshot?

 Caused by: com.orientechnologies.orient.server.distributed.ODistributedException: Error on inserting into cluster 'pin' where local node 'orientdb-p1' is not the master of it, but it's 'orientdb-p2'

Lvc@
