SimplePath query is slower in 6 node vs 3 node Cassandra cluster

Varun Ganesh

Nov 24, 2020, 4:35:22 PM

to JanusGraph users

Hello,

I am currently using JanusGraph version 0.5.2. I have a graph with about 18 million vertices and 25 million edges.

I have two versions of this graph, one backed by a 3-node Cassandra cluster and another backed by a 6-node Cassandra cluster (both with a replication factor of 3).

I am running the below query on both of them:

g.V().hasLabel('label_A').has('some_id', 123).has('data.name', 'value1').repeat(both('sample_edge').simplePath()).until(has('data.name', 'value2')).path().by('data.name').next()

The issue is that this query takes ~130ms on the 3 node cluster whereas it takes ~400ms on the 6 node cluster.

I have tried running ".profile()" on both versions and the outputs are almost identical in terms of the steps and time taken.

g.V().hasLabel('label_A').has('some_id', 123).has('data.name', 'value1').repeat(both('sample_edge').simplePath()).until(has('data.name', 'value2')).path().by('data.name').limit(1).profile()

==>Traversal Metrics
Step                                             Count  Traversers   Time (ms)   % Dur
=============================================================================================================
JanusGraphStep([],[~label.eq(label_A), o...          1           1       4.582    0.39
    \_condition=(~label = label_A AND some_id = 123 AND data.name = value1)
    \_orders=[]
    \_isFitted=true
    \_isOrdered=true
    \_query=multiKSQ[1]@8000
    \_index=someVertexByNameComposite
  optimization                                                           0.028
  optimization                                                           0.907
  backend-query                                      1                   3.012
    \_query=someVertexByNameComposite:multiKSQ[1]@8000
    \_limit=8000
RepeatStep([JanusGraphVertexStep(BOTH,[...           2           2    1167.493   99.45
  HasStep([data.name.eq(...                                            803.247
  JanusGraphVertexStep(BOTH,[...                 12934       12934     334.095
      \_condition=type[sample_edge]
      \_orders=[]
      \_isFitted=true
      \_isOrdered=true
      \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@812d311c
      \_multi=true
      \_vertices=264
    optimization                                                         0.073
    backend-query                                  266                   5.640
      \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@812d311c
    optimization                                                         0.028
    backend-query                                12689                 312.544
      \_query=org.janusgraph.diskstorage.keycolumnvalue.SliceQuery@812d311c
  PathFilterStep(simple)                         12441       12441      10.980
  JanusGraphMultiQueryStep(RepeatEndStep)         1187        1187      11.825
  RepeatEndStep                                      2           2     810.468
RangeGlobalStep(0,1)                                 1           1       0.419    0.04
PathStep([value(data.name)])                         1           1       1.474    0.13
>TOTAL                                               -           -    1173.969       -

I'd really appreciate some input on figuring out why the query is 3x slower on 6 nodes.

I realise that you may require more context. Happy to provide more information as required!

Thank you!

Varun Ganesh

Nov 24, 2020, 5:07:58 PM

to JanusGraph users

Just an additional note: you may have noticed that the profile step above reports a total time of more than 1000ms. I do not know why this is the case.

When run on the console without profile, it reflects the true time taken:

gremlin> clockWithResult(10) { graph.tx().rollback(); g.V().hasLabel('label_A').has('some_id', 123).has('data.name', 'value1').repeat(both('sample_edge').simplePath()).until(has('data.name', 'value2')).path().by('data.name').limit(1).next() }

==>130.9545608
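As a rough cross-check of the two measurements, the arithmetic can be sketched in Python. The step times and the console average are copied from the outputs in this thread; the variable and function names are made up for illustration. The sketch only shows that the profiled total is much larger than the console timing, while the relative share of each step can still be read from the profile. It does not explain why the profiled total is inflated.

```python
# Top-level step times (ms) copied from the profile() output in this thread.
profile_ms = {
    "JanusGraphStep": 4.582,
    "RepeatStep": 1167.493,
    "RangeGlobalStep": 0.419,
    "PathStep": 1.474,
}
profile_total_ms = 1173.969    # the ">TOTAL" row of the profile output
console_avg_ms = 130.9545608   # value printed by clockWithResult(10) above


def percent_of_total(step_ms, total_ms):
    """Share of the profiled total spent in one step, rounded like the % Dur column."""
    return round(100.0 * step_ms / total_ms, 2)


# Relative share per step; this reproduces the "% Dur" column.
shares = {step: percent_of_total(ms, profile_total_ms) for step, ms in profile_ms.items()}

# How many times larger the profiled total is than the console timing.
inflation = profile_total_ms / console_avg_ms

for step, share in shares.items():
    print(f"{step}: {share}%")
print(f"profile total / console timing = {inflation:.2f}x")
```

Whatever the cause of the larger absolute numbers, nearly all of the profiled time is attributed to the RepeatStep, so that is the part worth comparing between the two clusters.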

Thanks!

BO XUAN LI

Nov 26, 2020, 11:19:32 AM

to janusgra...@googlegroups.com

Hi,

> why the query is 3x slower on 6 nodes

Did you check for hardware differences? Perhaps the 6-node cluster has a slower network, less memory, slower disks, etc.

Another possibility I can think of is that the data involved in your query is distributed across nodes. Since your 3-node Cassandra cluster has a replication factor of 3, I would presume that all of your data is available on every node. In that case, fewer round-trips would happen within the 3-node cluster.
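The replica-locality argument can be illustrated with a small Python sketch. It assumes data is spread evenly over the nodes and that the coordinator for a read is chosen without regard to where the replicas live; neither assumption is confirmed in this thread, and the function name is hypothetical. It ignores consistency level and driver routing policy.

```python
def replica_fraction(nodes: int, replication_factor: int) -> float:
    """Fraction of all rows for which any single node holds a replica.

    Assumes an even data distribution across nodes (an idealization).
    """
    return min(replication_factor, nodes) / nodes


for nodes in (3, 6):
    local = replica_fraction(nodes, replication_factor=3)
    # Under the stated assumptions, 1 - local is the chance that a
    # coordinator holds no replica and must forward the read to another node.
    print(f"{nodes} nodes: holds {local:.0%} of rows, forwards {1 - local:.0%} of reads")
```

Under these assumptions every node in the 3-node cluster can serve any row locally, while in the 6-node cluster a given node holds a replica for only half of the rows.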

Generally, it makes sense to me that the latency of a small cluster is lower than that of a large cluster, as long as neither cluster is fully loaded. Of course, with a larger cluster you can achieve higher throughput.

> the profile step above shows a time taken of >1000ms

This can be a bug in profiling. If you can provide a minimal example to reproduce, that would be very helpful.

Best regards,

Boxuan

Varun Ganesh

Nov 30, 2020, 2:23:59 PM

to JanusGraph users

Hi Boxuan,

Thank you for getting back to me. Please find my responses below:

> Did you check the hardware differences?

Yes, I can confirm that the two clusters are identical except for the number of nodes.

> the data involved in your query is probably distributed across nodes

This was our initial guess as well. However, if that were the case, we should observe this slowness for all the queries that we try, but it is only observed for "path" queries.

For instance, here's an example of another traversal query where we observe the SAME latency across the 3 and 6 node clusters:

g.V().hasLabel('label_B').has('some_id', 123).has('data.name', 1234567).both('sample_edge').valueMap('data.field1', 'data.field2').next(10)

> Then there would be fewer round-trips happening within the 3-node cluster

I also want to point out that we are not running JanusGraph in embedded mode (where it is colocated with Cassandra). Instead, it is running separately on its own server nodes.

> Of course with larger cluster you can achieve higher throughput

Interestingly, we are not observing any difference in throughput (i.e. the maximum queries per second that can be handled without seeing timeouts) between the two clusters.

Would appreciate any input on where/how we could possibly investigate further.

Thank you!

Varun

Varun Ganesh

Dec 9, 2020, 8:46:34 AM

to janusgra...@googlegroups.com

(I had previously posted this on the forum: https://groups.google.com/g/janusgraph-users/c/nkNFaFzdr4I. But I was hoping that I might get a bit more traction through the mailing list)

Thank you!