Cluster Creation in Janus - Using Node JS


Shanmugapriya M

Nov 8, 2021, 8:43:26 AM
to Gremlin-users
Hi,

We are trying to create a JanusGraph cluster and are following the link below: https://tinkerpop.apache.org/docs/current/reference/#sparql-gremlin
This documentation only covers Java and Groovy. However, we are using Node.js as our application language. Could you please guide us on how to create and use clusters with Node.js?

Adding more information: we read data from BigQuery, use Node.js for the business logic, store the data in JanusGraph backed by Bigtable, and use Gremlin visualizer to visualize it. To improve performance, we want to create a JanusGraph cluster, but we cannot find any documentation on that. Kindly guide us.

HadoopMarc

Nov 8, 2021, 11:52:42 AM
to Gremlin-users
Hi,

BigTable support is confirmed here:

Deployment options are found here:

Best wishes,    Marc

On Monday, November 8, 2021 at 14:43:26 UTC+1, sppri...@gmail.com wrote:

Shanmugapriya M

Nov 9, 2021, 12:06:53 AM
to Gremlin-users
Thank you so much. We found the second link useful and were already referring to it. Could you please point us to the Node.js libraries that support clusters?

HadoopMarc

Nov 9, 2021, 2:28:25 AM
to Gremlin-users
Information about using JavaScript to connect to Gremlin Server (including the Gremlin Server shipped with JanusGraph):


Marc

On Tuesday, November 9, 2021 at 06:06:53 UTC+1, sppri...@gmail.com wrote:

Shanmugapriya M

Nov 9, 2021, 5:29:21 AM
to Gremlin-users
Hi Marc, 

Thank you! This is exactly what we are using for a single JanusGraph.
The point we are stuck on is clustering JanusGraph. The idea is to scale up the current architecture with multiple JanusGraph servers to speed up execution. We have three Google VM instances on which we have configured the same JanusGraph server setup. Now, how can we distribute the data across these three JanusGraph servers? We are using the "gremlin" npm module, but it doesn't have an option to connect to all three servers at the same time.

Shanmugapriya M

Nov 9, 2021, 5:45:17 AM
to Gremlin-users
Just to add: in the method you shared, how do we add more than one server?

HadoopMarc

Nov 9, 2021, 11:18:05 AM
to Gremlin-users
It is surprising, indeed, that load balancing over Gremlin Server instances never ended up in the JanusGraph documentation. The user list thread below, more or less, covers the subject:


Note that the gremlin npm module uses websockets for connecting, so load balancing can only be as granular as a websocket connection.
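
To illustrate what connection-level balancing can look like on the client side, here is a minimal sketch that creates one websocket connection per Gremlin Server and hands out traversal sources in round-robin order. It is not taken from the thread or from the driver documentation: `createRoundRobinPool`, `gremlinSource`, and the host names are hypothetical, and a proxy in front of the servers is an alternative that needs no client code.

```javascript
// Round-robin over several Gremlin Server endpoints, one connection each.
// `connect` is a factory so the pool logic can be tried without real servers.
function createRoundRobinPool(endpoints, connect) {
  if (endpoints.length === 0) {
    throw new Error('at least one endpoint is required');
  }
  // One traversal source per server, created once up front.
  const sources = endpoints.map((url) => connect(url));
  let next = 0;
  return {
    size: sources.length,
    // Returns the traversal source of the next server in turn.
    g() {
      const source = sources[next];
      next = (next + 1) % sources.length;
      return source;
    },
  };
}

// Factory for the real driver. `gremlin` is required lazily, so it is only
// needed when this function is actually called.
function gremlinSource(url) {
  const gremlin = require('gremlin');
  const traversal = gremlin.process.AnonymousTraversalSource.traversal;
  const DriverRemoteConnection = gremlin.driver.DriverRemoteConnection;
  return traversal().withRemote(new DriverRemoteConnection(url));
}

// Usage (host names are placeholders, not from this thread):
// const pool = createRoundRobinPool(
//   ['ws://janus-1:8182/gremlin', 'ws://janus-2:8182/gremlin', 'ws://janus-3:8182/gremlin'],
//   gremlinSource
// );
// const count = await pool.g().V().count().next();
```

Each call to `pool.g()` sends the next traversal to a different server, so this only makes sense when all servers are configured against the same storage backend.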

Best wishes,    Marc
On Tuesday, November 9, 2021 at 11:45:17 UTC+1, sppri...@gmail.com wrote:

Shanmugapriya M

Nov 10, 2021, 5:22:33 AM
to Gremlin-users
Hi,
We followed the suggestion and tried to distribute the load between 2 JanusGraph servers on 2 different VMs, both pointing to the same Bigtable. We use JanusGraph server 1 to create the graph dynamically, and the other 2 servers update data in it. Despite pointing to the same Bigtable, the other 2 servers are not able to access the graph.

HadoopMarc

Nov 10, 2021, 11:04:19 AM
to Gremlin-users
Hi, what does the stack trace for your connection attempt to BigTable look like?

Marc

On Wednesday, November 10, 2021 at 11:22:33 UTC+1, sppri...@gmail.com wrote:

Shanmugapriya M

Nov 11, 2021, 5:27:20 AM
to Gremlin-users
Thank you for your patience and help. I will share the stack trace, but I have another query now :-)
When we create the graph using one JanusGraph server and create vertices and edges using the other, we observe that the write operations happen. But when we try to visualize the graph with the Gremlin visualizer from the second server, it says the graph does not exist, while the first server is able to read the graph. When we restart the second server and then try to visualize the graph, we are able to view it.

So we assume that the graph is being rendered from the in-house memory of the JanusGraph server and not from Bigtable. Could you please advise how to make the JanusGraph server read the graph from Bigtable instead of the in-house memory?

HadoopMarc

Nov 11, 2021, 11:38:06 AM
to Gremlin-users
At first sight I do not see how you could experience this behavior:
  • Using gremlin-javascript with Gremlin Server, write transactions are committed automatically and immediately, so new data should be visible to other JanusGraph instances.
  • JanusGraph does use caching, but this is at the level of vertex ids. If you do a full graph scan (g.V()), the backend is queried without using the cache. Even if you ask for specific vertices using an indexed property, the indexing and storage backends are accessed again if a vertex is not in the cache.
So it does not make sense to me that a restart of the reading server would make a difference. Even in a clean-sheet situation with a cluster being spun up, the reading server would create a graph if there was none, and the writing server would have to use it if it came later. Strange things could perhaps happen if the writing server comes second and creates a schema on the graph that the reading server does not see?

Things to check anyway are:
  • Do the writing and the reading server really have the same configs? In particular, check the graph names and the indexing backend configs.
  • ...
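
One way to check the graph names from Node.js is sketched below. It assumes the servers use dynamic graphs through JanusGraph's ConfiguredGraphFactory, which this thread only hints at, and that script submission is enabled on each server. `fetchGraphNames`, `missingGraphs`, and the host names are hypothetical names for illustration.

```javascript
// Ask one Gremlin Server which graphs its ConfiguredGraphFactory knows about.
// Assumption: the server is a JanusGraph Server configured for dynamic graphs.
async function fetchGraphNames(url) {
  const gremlin = require('gremlin'); // required lazily, only for real servers
  const client = new gremlin.driver.Client(url, {});
  try {
    const result = await client.submit('ConfiguredGraphFactory.getGraphNames()');
    return result.toArray();
  } finally {
    await client.close();
  }
}

// Given { serverUrl: [graphName, ...] }, list the graph names that each server
// lacks compared with the union of names seen on all servers.
function missingGraphs(namesByServer) {
  const all = new Set(Object.values(namesByServer).flat());
  const report = {};
  for (const [server, names] of Object.entries(namesByServer)) {
    const have = new Set(names);
    report[server] = [...all].filter((name) => !have.has(name)).sort();
  }
  return report;
}

// Usage (host names are placeholders, not from this thread):
// const urls = ['ws://janus-1:8182/gremlin', 'ws://janus-2:8182/gremlin'];
// const namesByServer = {};
// for (const url of urls) namesByServer[url] = await fetchGraphNames(url);
// console.log(missingGraphs(namesByServer));
```

A non-empty list for a server would suggest that it does not see a graph that another server created, which is worth ruling out before looking at caching.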
Marc

On Thursday, November 11, 2021 at 11:27:20 UTC+1, sppri...@gmail.com wrote:

Shanmugapriya M

Nov 22, 2021, 2:36:24 AM
to Gremlin-users
Hi,

We got past the above issue by removing the in-house memory from the configurations and by using the ConfiguredGraphFactory command, but we are facing a different issue now and need your guidance.
We tested the cluster with 3 JanusGraph servers that have the same configuration. All three are hosted on client VMs behind one load balancer (nginx). We also deployed our Node.js server with nginx, and when we start the process it throws this error after 3-4 batches:

/home/user_ak/nodejs_code/node_modules/ws/lib/websocket.js:270
throw err;
Error: WebSocket is not open: readyState 2 (CLOSING)
    at WebSocket.ping (/home/user_ak/nodejs_code/node_modules/ws/lib/websocket.js:264:19)
    at Timeout._onTimeout (/home/user_ak/nodejs_code/node_modules/gremlin/lib/driver/connection.js:229:16)
    at listOnTimeout (internal/timers.js:557:17)
    at processTimers (internal/timers.js:500:7)
[nodemon] app crashed - waiting for file changes before starting...
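
The trace shows the error being thrown from the driver's ping timer (`connection.js:229`), which calls `ping` on a websocket that is already closing. The thread does not establish why the socket closes. One possibility, stated here only as an assumption, is an idle timeout at the proxy. The sketch below shows two things that could be tried: a ping interval shorter than that timeout, and a small holder that opens a fresh connection after the driver reports `close`. `createReconnectingHolder`, `openViaProxy`, the URL, and the interval value are hypothetical, and the `pingEnabled`/`pingInterval` options and `addListener` should be checked against the installed gremlin version.

```javascript
// Holds one connection and rebuilds it lazily after the driver emits 'close'.
// `connect` must return an object with addListener(event, handler).
function createReconnectingHolder(connect) {
  let current = null;
  return {
    get() {
      if (current === null) {
        const conn = connect();
        // Forget the connection once it closes, so the next get() reconnects.
        conn.addListener('close', () => {
          if (current === conn) current = null;
        });
        current = conn;
      }
      return current;
    },
  };
}

// Factory for the real driver; `gremlin` is required lazily.
// Assumptions: the URL is a placeholder for the nginx endpoint, and 20 s is
// shorter than whatever idle timeout the proxy applies.
function openViaProxy() {
  const gremlin = require('gremlin');
  return new gremlin.driver.DriverRemoteConnection('ws://nginx.example.internal/gremlin', {
    pingEnabled: true,
    pingInterval: 20000, // milliseconds
  });
}

// Usage:
// const traversal = require('gremlin').process.AnonymousTraversalSource.traversal;
// const holder = createReconnectingHolder(openViaProxy);
// const g = traversal().withRemote(holder.get()); // call again after a close
```

This does not by itself stop an exception thrown inside the driver's own timer, so it is a starting point for narrowing down the cause rather than a confirmed fix.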


