Queue full error in distributed mode


Gurpreet Singh

Feb 26, 2016, 12:01:17 AM
to orient-...@googlegroups.com
Hi Guys,

I have set up a 3-node OrientDB environment with a read quorum of 1 and a write quorum of 2. At first everything worked fine in distributed mode, but after a while it started showing this error: "queue has too many messages (101), treating the node as in stall: trying to restart it... [OHazelcastDistributedDatabase]".

Two of my nodes were automatically shut down by OrientDB. One node is still alive, but it is not accepting insert/update commands because of the write quorum of 2.

I have restarted all the servers, but without success.

Can you suggest a valid configuration for this issue? How can I recover from it?

Luca Garulli

Feb 26, 2016, 1:37:32 AM
to OrientDB
Hi,
Could you please upgrade to the latest release, 2.1.11?

Best Regards,

Luca Garulli
Founder & CEO


Gurpreet Singh

Feb 27, 2016, 2:03:37 AM
to OrientDB
Garulli,

I am using 2.1.5 right now. Is there a known problem with this version? Migrating all the data from 2.1.5 to 2.1.11 is a big decision, though: I have around 1 billion records.

Thanks,

Luca Garulli

Feb 27, 2016, 8:10:27 AM
to OrientDB
Hi,
You don't need to migrate the database when you upgrade to a more recent hotfix (http://semver.org/).

Best Regards,

Luca Garulli
Founder & CEO



amit tomar

Mar 1, 2016, 5:38:15 AM
to OrientDB
Hi Luca,

I am adding more information on the same issue. We changed the configuration and set both the read and write quorum to 1, but we still hit the same issue. Because of this, we have removed two of the nodes for now. We now have only one node and are loading data onto it; once the data is there, we will add more nodes and replicate. By the way, we are upgrading to 2.1.11.

We are now hitting one more issue, which I find very surprising. We have a unique SB-Tree index on one field (phoneNumber). We use the Java driver to connect to OrientDB, and we use transactions for creates and updates. After some time, I saw a warning in the OrientDB logs that a select query on phoneNumber was not using the index and was doing a full scan, so we tried to rebuild the index. The index rebuild does not complete and fails with the message below.


2016-03-01 06:57:47:505 INFO  --> 17.90% progress, 5,203,177 indexed so far (13,502 items/sec) [OIndexRebuildOutputListener]
2016-03-01 06:57:57:506 INFO  --> 18.42% progress, 5,353,955 indexed so far (15,077 items/sec) [OIndexRebuildOutputListener]
2016-03-01 06:58:07:507 INFO  --> 18.94% progress, 5,505,477 indexed so far (15,152 items/sec) [OIndexRebuildOutputListener]
2016-03-01 06:58:17:508 INFO  --> 19.46% progress, 5,655,608 indexed so far (15,013 items/sec) [OIndexRebuildOutputListener]
2016-03-01 06:58:21:807 INFO  --> OK, indexed 5,720,799 items in 365,708 ms [OIndexRebuildOutputListener]
Exception during index 'contact_info.phoneNumber' creation
com.orientechnologies.orient.core.index.OIndexException: Error on rebuilding the index for clusters: [contact_info_virt02, contact_info_virt102, contact_info_virt101, contact_info]
        at com.orientechnologies.orient.core.index.OIndexAbstract.rebuild(OIndexAbstract.java:452)
        at com.orientechnologies.orient.core.index.OIndexAbstract.create(OIndexAbstract.java:253)
        at com.orientechnologies.orient.core.index.OIndexOneValue.create(OIndexOneValue.java:128)
        at com.orientechnologies.orient.core.index.OIndexOneValue.create(OIndexOneValue.java:45)
        at com.orientechnologies.orient.core.index.OIndexManagerShared.createIndex(OIndexManagerShared.java:169)
        at com.orientechnologies.orient.core.index.OIndexManagerProxy.createIndex(OIndexManagerProxy.java:79)
        at com.orientechnologies.orient.core.metadata.schema.OClassImpl.createIndex(OClassImpl.java:1671)
        at com.orientechnologies.orient.core.sql.OCommandExecutorSQLCreateIndex.execute(OCommandExecutorSQLCreateIndex.java:266)
        at com.orientechnologies.orient.core.sql.OCommandExecutorSQLDelegate.execute(OCommandExecutorSQLDelegate.java:90)
        at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.executeCommand(OAbstractPaginatedStorage.java:1538)
        at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.command(OAbstractPaginatedStorage.java:1519)
        at com.orientechnologies.orient.server.distributed.ODistributedStorage.command(ODistributedStorage.java:315)
        at com.orientechnologies.orient.core.command.OCommandRequestTextAbstract.execute(OCommandRequestTextAbstract.java:63)
        at com.orientechnologies.orient.server.network.protocol.http.command.post.OServerCommandPostCommand.execute(OServerCommandPostCommand.java:81)
        at com.orientechnologies.orient.server.network.protocol.http.ONetworkProtocolHttpAbstract.service(ONetworkProtocolHttpAbstract.java:180)
        at com.orientechnologies.orient.server.network.protocol.http.ONetworkProtocolHttpAbstract.execute(ONetworkProtocolHttpAbstract.java:627)
        at com.orientechnologies.common.thread.OSoftThread.run(OSoftThread.java:77)
Caused by: com.orientechnologies.orient.core.storage.ORecordDuplicatedException: Cannot index record #11:2574995: found duplicated key '966532799170' in index 'contact_info.phoneNumber' previously assigned to the record #11:1351058 RID=#11:1351058
        at com.orientechnologies.orient.core.index.OIndexUnique.put(OIndexUnique.java:70)
        at com.orientechnologies.orient.core.index.OIndexUnique.put(OIndexUnique.java:34)
        at com.orientechnologies.orient.core.index.OIndexAbstract.populateIndex(OIndexAbstract.java:962)
        at com.orientechnologies.orient.core.index.OIndexAbstract.indexCluster(OIndexAbstract.java:1049)
        at com.orientechnologies.orient.core.index.OIndexAbstract.rebuild(OIndexAbstract.java:434)
        ... 16 more


The error message simply says we have a duplicate record. I checked the data, and the duplicate record exists. The only difference between the two records is @version: the values are 2 and 6. When something goes wrong during an insert or update, we roll back the transaction, but the database does not seem to do that. So now we cannot rebuild the index because we have a duplicate record, and without the index every query is a full scan. I don't think upgrading to 2.1.11 will help with this.

Also, we already had a unique index in the schema, so how were these duplicate records saved in our DB in the first place? A prompt response would be a great help.
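
One way to make progress before retrying the rebuild might be to list every key that is held by more than one record, so the extra copies can be reviewed and removed. The following is only a minimal in-memory sketch in Java. It assumes the (record ID, phoneNumber) pairs have already been read from contact_info by some other means. DuplicateKeys and findDuplicates are hypothetical names, not part of the OrientDB API, and the third sample record is made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: groups record IDs by key and keeps the keys held by more than one record.
// DuplicateKeys and findDuplicates are hypothetical names, not OrientDB API.
final class DuplicateKeys {

    // keyByRid maps a record ID (e.g. "#11:1351058") to its indexed key value.
    static Map<String, List<String>> findDuplicates(Map<String, String> keyByRid) {
        Map<String, List<String>> ridsByKey = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : keyByRid.entrySet()) {
            ridsByKey.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        // Drop every key that is assigned to a single record only.
        ridsByKey.values().removeIf(rids -> rids.size() < 2);
        return ridsByKey;
    }

    public static void main(String[] args) {
        Map<String, String> sample = new LinkedHashMap<>();
        sample.put("#11:1351058", "966532799170"); // from the exception above
        sample.put("#11:2574995", "966532799170"); // from the exception above
        sample.put("#11:1", "000000000000");       // made-up, non-duplicated record
        System.out.println(findDuplicates(sample));
    }
}
```

For a data set of this size, the grouping would more realistically be done by the database itself; the sketch only shows the check that the unique index performs when it rejects the second record.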

Amit

Gurpreet Singh

Mar 2, 2016, 7:05:25 AM
to OrientDB
Hi Garulli,

I have upgraded the 3-node cluster to 2.1.11 and set the read/write quorum to 1. I am now facing a new issue: a "split-brain" problem.

Here are the logs:

2016-03-02 10:08:49:191 SEVER [virt102] detected possible split brain network where 2 groups of servers A[virt02] and B[virt102] have different contents. Cannot decide who is the winner even if the quorum (1) has been reached. Request (id=4254258 from=virt102 task=tx user=#5:3) responses: A=TX[result=3, locks=2], B=TX[result=3, locks=2] [ODistributedResponseManager]
2016-03-02 10:08:49:198 SEVER [virt102] detected possible split brain network where 2 groups of servers A[virt02] and B[virt102] have different contents. Cannot decide who is the winner even if the quorum (1) has been reached. Request (id=4254260 from=virt102 task=tx user=#5:3) responses: A=TX[result=3, locks=2], B=TX[result=3, locks=2] [ODistributedResponseManager].


Any suggestions on this?




Luca Garulli

Mar 2, 2016, 6:53:38 PM
to OrientDB
Hi,
To avoid any inconsistency, please set the writeQuorum to the majority of nodes:
  • With 1 node it's 1
  • With 2 nodes it's 2
  • With 3 nodes it's 2
  • With 4 nodes it's 3
  • With 5 nodes it's 3
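
The values in the list above follow the usual majority rule, floor(N/2) + 1 for N nodes. A minimal Java sketch of that arithmetic follows; the class and method names are hypothetical and not part of OrientDB.

```java
// Sketch only: smallest quorum that is a strict majority of the nodes.
// MajorityQuorum and majority are hypothetical names, not OrientDB API.
final class MajorityQuorum {

    static int majority(int nodes) {
        if (nodes < 1) {
            throw new IllegalArgumentException("nodes must be >= 1");
        }
        // Integer division rounds down, so this is floor(nodes / 2) + 1.
        return nodes / 2 + 1;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.println("With " + n + " node(s) it's " + majority(n));
        }
    }
}
```

With a majority quorum, two disjoint groups of servers can never both reach the quorum at the same time, which is what prevents the two sides of a network split from accepting conflicting writes.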



Best Regards,

Luca Garulli
Founder & CEO



Gurpreet Singh

Mar 3, 2016, 10:54:23 PM
to orient-...@googlegroups.com
Hi Garulli,

I did as you said and set the write quorum to 2. After that change I am facing two issues:

1.  Reached maximum number of concurrent connections (max=1000, current=1000), reject incoming connection from /0:0:0:0:0:0:0:1:45679 [OServerNetworkListener]. My application closes its connections properly with the OrientBaseGraph.shutdown() method, and I have set the max pool size to 50, so this error is very strange.

2.  timeout (15201ms) on waiting for synchronous responses from nodes.


These errors come up very frequently.


