Replication doesn't work even for demo db - GratefulDeadConcerts - version 1.7.4


galina manashirova

Jul 2, 2014, 3:44:22 PM
to orient-...@googlegroups.com
Started from scratch:
1. Downloaded version 1.7.4
2. Started server node1 in distributed mode (dserver)
3. Copied node1 directory as node2
4. Changed nodeName in orientdb-dserver-config.xml on both nodes, giving them different names.
5. Started node2
    Both nodes see each other. I see this on the console of one node:
    Members [2] {
        Member [10.32.10.72]:2434 this
        Member [10.32.10.72]:2435
    }


    And on the console of another node:
    Members [2] {
        Member [10.32.10.72]:2434
        Member [10.32.10.72]:2435 this
    }
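Step 4 (giving each copied node its own name) can be scripted. A minimal sketch using the JDK's DOM API, assuming the name lives in orientdb-dserver-config.xml as a `<parameter name="nodeName" value="..."/>` element; that layout and the class and method names are illustrative assumptions, so check them against your own file:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: reads or rewrites the nodeName parameter of an
// orientdb-dserver-config.xml held in a string.
// ASSUMPTION: the name is stored as <parameter name="nodeName" value="..."/>.
final class NodeNameConfig {

    private static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Returns the <parameter> element named "nodeName", or null if absent.
    private static Element findNodeNameParam(Document doc) {
        NodeList params = doc.getElementsByTagName("parameter");
        for (int i = 0; i < params.getLength(); i++) {
            Element p = (Element) params.item(i);
            if ("nodeName".equals(p.getAttribute("name"))) {
                return p;
            }
        }
        return null;
    }

    static String getNodeName(String xml) {
        try {
            Element p = findNodeNameParam(parse(xml));
            return p == null ? null : p.getAttribute("value");
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse config", e);
        }
    }

    // Returns a copy of the config with nodeName set to newName.
    static String setNodeName(String xml, String newName) {
        try {
            Document doc = parse(xml);
            Element p = findNodeNameParam(doc);
            if (p == null) {
                throw new IllegalArgumentException("no nodeName parameter");
            }
            p.setAttribute("value", newName);
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (IllegalArgumentException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException("cannot rewrite config", e);
        }
    }
}
```

After copying node1 to node2 you would read node2's file, call setNodeName(xml, "node2"), and write the result back, so the two names differ as step 4 requires.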


They are definitely talking to each other, except that one of the nodes gave me an error:

2014-07-02 12:12:56:234 WARN [node2]->[[node1]] requesting deploy of database 'GratefulDeadConcerts' on local server... [OHazelcastPlugin]
2014-07-02 12:32:56:266 WARN [node2] timeout (1200001ms) on waiting for synchronous responses from nodes=[node1] responsesSoFar=[] request=id=0 from=node2 task=deploy_db [OHazelcastDistributedDatabase]
Exception in thread "main" com.orientechnologies.orient.server.distributed.ODistributedException: Error on sending distributed request against database 'GratefulDeadConcerts' to nodes [node1]

        at com.orientechnologies.orient.server.hazelcast.OHazelcastDistributedDatabase.send2Nodes(OHazelcastDistributedDatabase.java:194)
        at com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin.sendRequest(OHazelcastPlugin.java:364)
        at com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin.installDatabase(OHazelcastPlugin.java:813)
        at com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin.installNewDatabases(OHazelcastPlugin.java:767)
        at com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin.startup(OHazelcastPlugin.java:191)
        at com.orientechnologies.orient.server.OServer.registerPlugins(OServer.java:720)
        at com.orientechnologies.orient.server.OServer.activate(OServer.java:241)
        at com.orientechnologies.orient.server.OServerMain.main(OServerMain.java:32)
Caused by: com.orientechnologies.orient.server.distributed.ODistributedException: No response received from any of nodes [node1] for request id=0 from=node2 task=deploy_db
        at com.orientechnologies.orient.server.distributed.ODistributedResponseManager.getFinalResponse(ODistributedResponseManager.java:395)
        at com.orientechnologies.orient.server.hazelcast.OHazelcastDistributedDatabase.waitForResponse(OHazelcastDistributedDatabase.java:422)
        at com.orientechnologies.orient.server.hazelcast.OHazelcastDistributedDatabase.send2Nodes(OHazelcastDistributedDatabase.java:191)
        ... 7 more
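The timestamps line up with the timeout: the deploy request was logged at 12:12:56:234 and the failure at 12:32:56:266, about 20 minutes later, which matches the reported 1200001 ms. In other words, node2 asked node1 to deploy GratefulDeadConcerts and then waited out the whole timeout without receiving a single response. A small sketch that checks this arithmetic from the two log lines (the class name is just for illustration):

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Computes the elapsed time between two timestamps in the format used by
// the OrientDB log lines above, e.g. "2014-07-02 12:12:56:234".
final class LogTiming {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss:SSS");

    static long elapsedMillis(String start, String end) {
        LocalDateTime from = LocalDateTime.parse(start, FORMAT);
        LocalDateTime to = LocalDateTime.parse(end, FORMAT);
        return Duration.between(from, to).toMillis();
    }
}
```

For the two lines above, elapsedMillis("2014-07-02 12:12:56:234", "2014-07-02 12:32:56:266") gives 1200032 ms, just above the 1200001 ms timeout; the request itself was presumably sent a few milliseconds after the first line was logged.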


Even though, right above that, I see a log message saying that the GratefulDeadConcerts distributed configuration sees 2 nodes:

2014-07-02 12:12:56:216 INFO updated distributed configuration for database: GratefulDeadConcerts:
----------
{
  "version":2,
  "autoDeploy":true,
  "hotAlignment":false,
  "readQuorum":1,
  "writeQuorum":2,
  "failureAvailableNodesLessQuorum":false,
  "readYourWrites":true,
  "clusters":{
    "internal":null,
    "index":null,
    "*":{
      "servers":["<NEW_NODE>","node1","node2"]
    }
  }
}
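One thing worth noticing in this configuration is "writeQuorum":2 with only node1 and node2 as real servers ("<NEW_NODE>" is a placeholder for nodes that join later, not a server). Assuming writeQuorum is the number of nodes that must acknowledge a write, every write here needs both nodes, so a node1 that never answers (as in the deploy_db timeout above) leaves writes unable to reach quorum. A simplified model of that check, not OrientDB's actual code; the class and method names are hypothetical:

```java
import java.util.List;
import java.util.Set;

// Simplified model of a write-quorum check against the "servers" list of a
// distributed database configuration. Not OrientDB's implementation.
final class WriteQuorum {
    // Placeholder entry meaning "nodes that join later", not a real server.
    static final String NEW_NODE = "<NEW_NODE>";

    // Number of configured servers (placeholder excluded) currently online.
    static int availableServers(List<String> servers, Set<String> online) {
        int count = 0;
        for (String server : servers) {
            if (!NEW_NODE.equals(server) && online.contains(server)) {
                count++;
            }
        }
        return count;
    }

    // True if enough configured servers are online to satisfy the quorum.
    static boolean isReachable(List<String> servers, Set<String> online, int quorum) {
        return availableServers(servers, online) >= quorum;
    }
}
```

With servers ["<NEW_NODE>","node1","node2"], isReachable(..., 2) is true only when both node1 and node2 are online, while the readQuorum of 1 is satisfied by either node alone.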
When I add or remove something in that database on one node, nothing happens on the other one.
Nothing gets replicated at the database level.
Can someone please tell me what I am doing wrong?
I am not trying anything fancy with replication. This is just a basic replication task.
I tried replication in some earlier versions (I don't remember which one now) and it worked. Now I can't make it work.
We are trying to adopt OrientDB for one of our company's products, and if replication doesn't work we will have to look for something else.
Please let me know if I am doing something wrong.

Thank you.
-galina
   

galina manashirova

Jul 3, 2014, 2:05:44 PM
to orient-...@googlegroups.com
Can anybody please help me with this, or at least put together a better tutorial on replication?

-Galina

Chris Wilper

Jul 3, 2014, 5:30:51 PM
to orient-...@googlegroups.com
Another data point:

I just tried configuring replication with two nodes on the same host with a fresh install of 1.7.4 on Windows and OS X, and I was also not successful, but I saw different problems than you did.

Steps I followed:
  1) Unpack the official distribution in two separate directories on the same host, one for node1 and one for node2
  2) Start node1 immediately by going into bin and running the dserver script
  3) Modify node2's config/hazelcast.xml file, changing the port element's value from 2434 to 2435
  4) Start node2
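Step 3 can also be done programmatically. A minimal sketch using the JDK's DOM API, assuming node2's config/hazelcast.xml holds the member port in a `<port>` element directly under `<network>`, with `<multicast-port>` as a separate element that should stay the same on both nodes; the class name is illustrative:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: reads or changes the member <port> in a hazelcast.xml
// string. Only a <port> whose parent is <network> is touched, so
// <multicast-port> keeps its value.
final class HazelcastPort {

    private static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    private static Element networkPort(Document doc) {
        NodeList ports = doc.getElementsByTagName("port");
        for (int i = 0; i < ports.getLength(); i++) {
            Element port = (Element) ports.item(i);
            if ("network".equals(port.getParentNode().getNodeName())) {
                return port;
            }
        }
        throw new IllegalArgumentException("no <network><port> element");
    }

    static int getPort(String xml) {
        try {
            return Integer.parseInt(networkPort(parse(xml)).getTextContent().trim());
        } catch (IllegalArgumentException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse hazelcast.xml", e);
        }
    }

    // Returns a copy of the XML with the member port replaced.
    static String setPort(String xml, int newPort) {
        try {
            Document doc = parse(xml);
            networkPort(doc).setTextContent(Integer.toString(newPort));
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (IllegalArgumentException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException("cannot rewrite hazelcast.xml", e);
        }
    }
}
```

Reading node2's file, calling setPort(xml, 2435) and writing the result back corresponds to step 3; attributes on `<port>` are carried over unchanged.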

After this, from the console output I could see that both nodes recognized that they were part of the cluster and could see the other one.

But then I ran console.sh:

orientdb> connect remote:localhost/GratefulDeadConcerts admin admin

On Windows:
-------------------

It successfully connected, then showed me the DISTRIBUTED CONFIGURATION, which looked correct. Then I ran a simple query (SELECT COUNT(*) FROM V) successfully. Next, I tried stopping node2 to simulate node failure. Queries still worked fine. Then I restarted node2, and queries still worked as expected. Next, I tried stopping node1 and suddenly queries from the console failed with messages about not being able to connect. Then I exited and restarted the console. Same problem. Finally, I decided to stop the other node, restart both nodes, and restart the console. Immediately upon attempting to connect, I got the following:

Connecting to database [remote:localhost/GratefulDeadConcerts] with user 'admin'...
Error: com.orientechnologies.orient.core.exception.OConfigurationException: Database 'GratefulDeadConcerts' is not configured on server (home=C:\Users\user\Downloads\cluster\node1/databases/)

Next I looked in the databases\GratefulDeadConcerts\ directory and saw there was a single file in there, distributed-config.json, but no data files, on either node. Uh oh...
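That symptom (a database folder holding nothing but distributed-config.json) is easy to check on each node. A small diagnostic sketch; the class name is hypothetical, and it only looks at file names, so it says nothing about whether existing data files are healthy:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Returns true when a database directory contains only
// distributed-config.json and no other files, which is the state
// described above for both nodes.
final class DatabaseDirCheck {
    static boolean onlyDistributedConfig(Path dbDir) {
        try (Stream<Path> entries = Files.list(dbDir)) {
            List<String> names = entries
                    .map(p -> p.getFileName().toString())
                    .collect(Collectors.toList());
            return names.size() == 1 && names.get(0).equals("distributed-config.json");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

It would be run against each node's directory, for example DatabaseDirCheck.onlyDistributedConfig(Paths.get("node1", "databases", "GratefulDeadConcerts")).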

On OS X:
--------------

It successfully connected, then said:
DISTRIBUTED CONFIGURATION: none (OrientDB is running in standalone mode)

...even though the nodes seem to think they're running in distributed mode.

--

Can anyone else reproduce these behaviors with a fresh 1.7.4 install?

Thanks,
Chris



--

---
You received this message because you are subscribed to the Google Groups "OrientDB" group.

galina manashirova

Jul 3, 2014, 6:04:27 PM
to orient-...@googlegroups.com
Chris;
I also saw what you described. I didn't change the port on node 2 to 2435, though.
I know for sure that multicast-port needs to stay the same on both nodes, but I am not sure about <port>. Maybe it doesn't need to be set because of "auto-increment"?
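On the <port> question: the member list at the top of this thread shows the second node on 2435 even though the port was not changed, which is consistent with what auto-increment is meant to do: if the configured port is taken, try the next one. A rough illustration of that idea with plain sockets (not Hazelcast's code; the class name and the attempt limit are assumptions):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Illustration of port auto-increment: starting at basePort, return the
// first port that can be bound on localhost. Not Hazelcast's implementation.
final class PortAutoIncrement {
    static int firstFreePort(int basePort, int maxAttempts) {
        for (int port = basePort; port < basePort + maxAttempts && port <= 65535; port++) {
            try (ServerSocket probe = new ServerSocket()) {
                probe.bind(new InetSocketAddress("127.0.0.1", port));
                return port; // bind succeeded, so nothing else holds this port
            } catch (IOException e) {
                // port already in use: move on to the next one
            }
        }
        throw new IllegalStateException("no free port from " + basePort);
    }
}
```

With node1 already holding 2434, firstFreePort(2434, 100) would typically return 2435, matching the member list; the limit of 100 here is arbitrary.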
And it's exactly the same: it seems like they are talking to each other, but nothing gets replicated.
What's strange is that I tried replication in much earlier versions and it worked.
Maybe I am doing something wrong, and I would appreciate it if someone could send a detailed tutorial on how to make it work.
Some tutorial pages say that the database MUST be copied from one node to the other prior to replication.
Elsewhere, someone showed that once 2 nodes have been set up for replication, creating a database on one node also created it on the other.
It doesn't work for me.
Luca? Artem? Andrey? Need help!!!

-Galina

galina manashirova

Jul 3, 2014, 7:59:41 PM
to orient-...@googlegroups.com
Another test of replication:

1. Started node1
2. Started node2
The log file tells me that they are talking to each other.
I logged into the database on node1 (from the console) and created a new class:

CREATE CLASS CUSTOMER EXTENDS V
Nothing happened on node2.
Since this is master-master replication, shouldn't it replicate right away?
I killed node1, then restarted it, and only after that could I see my new CUSTOMER class on the console of node2.
So replication happens only when one of the nodes goes down?

Is this expected behavior?

-Galina
 


Chris Wilper

Jul 4, 2014, 3:28:40 AM
to orient-...@googlegroups.com
Update:

Ok, I haven't determined why I saw the odd behavior on Windows, but I *have* been able to successfully set up multiple nodes with replication on OS X. After looking more carefully at the console output, I noticed that on the Mac OrientDB was binding to an unfamiliar IP address. It turns out it was trying to connect via a virtual software network device (VMware), and I believe this explains the odd behavior I saw; after I shut down VMware, I was successful.
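A quick way to see which addresses a node might pick, and so spot an unexpected virtual adapter like the VMware one, is to list the machine's IPv4 addresses. A sketch using java.net.NetworkInterface; the wildcard matcher is a hypothetical helper for checking an address against a pattern such as 10.32.10.* (the subnet in Galina's member list) and handles only `*`. Hazelcast's network configuration has an interfaces section for restricting which addresses it binds to; check the exact syntax in the Hazelcast documentation for the bundled version:

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

final class InterfaceCheck {
    // Lists "interfaceName address" for every up, non-loopback IPv4 address.
    static List<String> candidateAddresses() {
        List<String> result = new ArrayList<>();
        try {
            Enumeration<NetworkInterface> all = NetworkInterface.getNetworkInterfaces();
            if (all == null) {
                return result; // no interfaces reported
            }
            for (NetworkInterface ni : Collections.list(all)) {
                if (!ni.isUp() || ni.isLoopback()) {
                    continue;
                }
                for (InetAddress addr : Collections.list(ni.getInetAddresses())) {
                    if (addr instanceof Inet4Address) {
                        result.add(ni.getName() + " " + addr.getHostAddress());
                    }
                }
            }
        } catch (SocketException e) {
            throw new IllegalStateException("cannot enumerate interfaces", e);
        }
        return result;
    }

    // Hypothetical wildcard match: each dotted part must equal the pattern
    // part, or the pattern part must be "*".
    static boolean matches(String ipv4, String pattern) {
        String[] a = ipv4.split("\\.");
        String[] p = pattern.split("\\.");
        if (a.length != 4 || p.length != 4) {
            return false;
        }
        for (int i = 0; i < 4; i++) {
            if (!p[i].equals("*") && !p[i].equals(a[i])) {
                return false;
            }
        }
        return true;
    }
}
```

Printing candidateAddresses() on each host and comparing it with the addresses in the "Members" list shows whether a node chose an adapter you did not intend.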

Here is a screencast showing how I got it working with two nodes: http://screencast.com/t/IiC5SIlUAk

I basically created two empty nodes, then connected, created a database and a class, and added a record. It shows that the database was definitely created on both nodes (the database directory appears on each), and that if one node goes down, the other still provides access to the replicated record.

One thing I realized in this process is that the first node you start on a given network device seems to have special status. I guess it is the one responsible for communicating which nodes it knows are available (including itself). So if you start node1, node2, and node3 all on the same host in that order, you can shut down nodes 2 and 3 just fine, but if you instead keep those running and shut down node1, you can't subsequently connect. However, if you restart any node, it will take over the role that node1 had and you can then connect to the cluster again. At least that's the behavior I think I'm observing. Does that sound right to anybody familiar with this? Any way to get around it?

Thanks,
Chris

Luca Garulli

Jul 4, 2014, 5:57:04 AM
to orient-database
Hi Chris,
The screencast is cool, maybe we should provide a couple of them in the documentation.

Node1 is like the others, so this behavior is unexpected. Could you open a new issue with a copy & paste of your email?

Lvc@

galina manashirova

Jul 7, 2014, 7:04:22 PM
to orient-...@googlegroups.com
Chris;
Thank you so much for the screencast, great stuff. It helped a lot!
I followed the same steps, but on a Windows machine.
When I created the database People, node1 threw an exception about node2 (see below).
The database was created only on node1; node2 has only one JSON file.
Is that the same issue you were able to fix by shutting down VMware?
I don't think I have VMware running anywhere.
Does anyone know if there is another workaround for this problem?
Using version 1.7.4.

2014-07-07 15:54:01:645 INFO Sent updated cluster configuration to the remote client 127.0.0.1:50895 [OClientConnectionManager]
Exception in thread "hz._hzInstance_1_orientdb.cached.thread-1" java.lang.NullPointerException
        at com.orientechnologies.orient.server.OClientConnection.getRemoteAddress(OClientConnection.java:68)
        at com.orientechnologies.orient.server.OClientConnectionManager.pushDistribCfg2Clients(OClientConnectionManager.java:257)
        at com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin.entryUpdated(OHazelcastPlugin.java:575)
        at com.hazelcast.map.MapService.dispatchEvent(MapService.java:906)
        at com.hazelcast.map.MapService.dispatchEvent(MapService.java:70)
        at com.hazelcast.spi.impl.EventServiceImpl$EventPacketProcessor.process(EventServiceImpl.java:509)
        at com.hazelcast.spi.impl.EventServiceImpl$RemoteEventPacketProcessor.run(EventServiceImpl.java:535)
        at com.hazelcast.util.executor.StripedExecutor$Worker.run(StripedExecutor.java:142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
        at java.lang.Thread.run(Thread.java:662)
        at com.hazelcast.util.executor.PoolExecutorThreadFactory$ManagedThread.run(PoolExecutorThreadFactory.java:59)
[node1]<-[node2] error on reading distributed request: deploy_db

Thanks.
-galina

galina manashirova

Jul 7, 2014, 7:09:04 PM
to orient-...@googlegroups.com
Hi Luca;

I followed the same steps as in Chris's screencast, but on a Windows machine, and I get this exception when I try to create a database.
The exception shows up on node1, but it mentions node2.
Any idea how to make it work?

Thank you!
-Galina


2014-07-07 15:54:01:645 INFO Sent updated cluster configuration to the remote client 127.0.0.1:50895 [OClientConnectionManager]
Exception in thread "hz._hzInstance_1_orientdb.cached.thread-1" java.lang.NullPointerException
        at com.orientechnologies.orient.server.OClientConnection.getRemoteAddress(OClientConnection.java:68)
        at com.orientechnologies.orient.server.OClientConnectionManager.pushDistribCfg2Clients(OClientConnectionManager.java:257)
        at com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin.entryUpdated(OHazelcastPlugin.java:575)
        at com.hazelcast.map.MapService.dispatchEvent(MapService.java:906)
        at com.hazelcast.map.MapService.dispatchEvent(MapService.java:70)
        at com.hazelcast.spi.impl.EventServiceImpl$EventPacketProcessor.process(EventServiceImpl.java:509)
        at com.hazelcast.spi.impl.EventServiceImpl$RemoteEventPacketProcessor.run(EventServiceImpl.java:535)
        at com.hazelcast.util.executor.StripedExecutor$Worker.run(StripedExecutor.java:142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
        at java.lang.Thread.run(Thread.java:662)
        at com.hazelcast.util.executor.PoolExecutorThreadFactory$ManagedThread.run(PoolExecutorThreadFactory.java:59)
[node1]<-[node2] error on reading distributed request: deploy_db





Chris Wilper

Jul 10, 2014, 11:54:27 AM
to orient-...@googlegroups.com
Hi Galina,

I finally got back to trying this on Windows and saw the exact same error (the stack trace followed by "error on reading distributed request: deploy_db"). Then, on a whim, I searched the issues for Windows and came up with this:


So I tried setting orientdb_home as suggested (using forward slashes), and the final message "error on reading distributed request" no longer occurs, and things continue as expected after that. I also noticed that #2347 was just closed today, so it looks like the orientdb_home workaround will no longer be necessary with 1.7.5.
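The mixed separators are visible in the earlier error message (home=C:\Users\user\Downloads\cluster\node1/databases/): a backslash home path with "/databases/" appended. A tiny sketch of the forward-slash form of the workaround; the helper names are hypothetical, and how the value is actually passed to the server is as described in the issue Chris found:

```java
// Hypothetical helpers illustrating the workaround: write the OrientDB home
// with forward slashes so that appending "/databases/" yields a consistent path.
final class HomePath {
    // Replaces every Windows backslash separator with a forward slash.
    static String toForwardSlashes(String path) {
        return path.replace('\\', '/');
    }

    // Mirrors the "home + /databases/" form seen in the error message.
    static String databasesDir(String home) {
        return home + "/databases/";
    }
}
```

With the raw home, databasesDir gives the mixed "C:\Users\user\Downloads\cluster\node1/databases/" from the error; with toForwardSlashes applied first it gives "C:/Users/user/Downloads/cluster/node1/databases/".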

Note, however, that I still saw the stack trace. In fact, the same stack trace occurs on Windows, Mac, and Linux when creating a class in a distributed configuration. On the surface, it doesn't appear to have a negative consequence. I've reported it as a separate issue with a screencast demo here:


- Chris

Luca Garulli

Jul 10, 2014, 12:50:05 PM
to orient-database
Hi,
I've just closed this issue, the last one for 1.7.5 before the release.

Lvc@

galina manashirova

Jul 10, 2014, 6:26:18 PM
to orient-...@googlegroups.com
Yes. 
Thanks to Luca, this issue has been fixed in the latest 1.7.5 hotfix.
I tested it all day today: several nodes on the same machine, several nodes on different machines. It works great, as expected. Everything gets replicated!
Many thanks to Luca for fixing it in such a short time!
Chris, thank you for taking the time to test it on Windows. And your screencasts are great!
So, if anyone has issues with replication, first try the latest 1.7.5 hotfix.

-Galina

Luca Garulli

Jul 11, 2014, 6:49:32 AM
to orient-database
Thank you all for helping me fix replication on Windows.

Lvc@
