change gremlin query titangraph


Mustafa Tinwala

Mar 31, 2017, 6:52:35 AM
to Gremlin-users
First of all, I have created a binary tree of 10000 users in a Titan graph, using code like this:

{
        TitanGraph graph = TitanFactory.open(conf);
        System.out.println("titan graph open");
        System.out.println("time= " + (System.currentTimeMillis() - q));
        graph.tx().rollback();

        // Schema: a "childs" edge label and a unique composite index on "NodeId"
        TitanManagement mgmt = graph.openManagement();
        EdgeLabel childs = mgmt.makeEdgeLabel("childs").multiplicity(Multiplicity.MULTI).make();
        PropertyKey nodeId = mgmt.makePropertyKey("NodeId").dataType(String.class).make();
        mgmt.buildIndex("byNodeId", Vertex.class).addKey(nodeId).unique().buildCompositeIndex();
        mgmt.commit();

        System.out.println(graph);
        GraphTraversalSource g = graph.traversal();
        System.out.println(g);

        // Create n vertices, committing after each one
        System.out.println("vertex creation started");
        int n = 10000;
        for (int i = 0; i < n; i++)
        {
            System.out.println(i);
            graph.addVertex("NodeId", i);
            graph.tx().commit();
        }

        // Link each parent j to its children 2j+1 and 2j+2
        int k = (n / 2);
        for (int j = 0; j < k; j++)
        {
            Vertex v = g.V().has("NodeId", j).next();
            System.out.println(v);

            v.addEdge("childs", g.V().has("NodeId", (j * 2 + 1)).next());
            v.addEdge("childs", g.V().has("NodeId", (j * 2 + 2)).next());

            graph.tx().commit();
        }

        System.out.println("Graph is created.....");
        System.out.println("graph is " + graph);

        // Count all descendants of the root and time the query
        Vertex fromNode = g.V().has("NodeId", 0).next();
        System.out.println("Node for calculating its child nodes : " + fromNode.property("NodeId"));

        long st = System.currentTimeMillis();
        System.out.println(g.V(fromNode).repeat(out("childs").dedup()).emit().count().next());
        //System.out.println(g.V(fromNode).repeat(out("childs")).emit().until(out("childs").count().is(0)).count().next());
        System.out.println("time : " + (System.currentTimeMillis() - st) + " milliseconds");

        graph.close();
        System.exit(0);
}



Now here is the code that retrieves only the child nodes from DynamoDB. I just have to change the query in the last line (highlighted in bold in the original post) so that it needs less throughput.



        TitanGraph graph = TitanFactory.open(conf);
        System.out.println("titan graph open");
        System.out.println("time= " + (System.currentTimeMillis() - q));
        graph.tx().rollback();

        System.out.println(graph);
        GraphTraversalSource g = graph.traversal();
        System.out.println(g);

        Vertex fromNode = g.V().has("NodeId", 0).next();
        System.out.println("Node for calculating its child nodes : " + fromNode.property("NodeId"));

        System.out.println(g.V(fromNode).repeat(out("childs").dedup()).emit().count().next());



I need some help.
Thank you in advance.

HadoopMarc

Mar 31, 2017, 3:08:09 PM
to Gremlin-users
Hi Mustafa,

I see two possible problems in your code:

1. g.V() takes id's as argument, so I would expect:
  Int fromNodeId = g.V().has("NodeId", 0).id().next();

2. Infinite loops with repeat() are allowed, but then you must be sure that your traversal does not hit two vertices with mutual out links. Adding a times(10) step won't hurt, just for safety.

HTH,   Marc
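
To illustrate point 2, here is a minimal sketch in plain Java rather than Gremlin. It walks a toy adjacency map level by level without dedup, the way repeat(out(...)) would, and stops after maxDepth levels, which is the role times() plays. The class name, method name and the toy graphs are hypothetical and only serve the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class BoundedRepeatSketch {

    /**
     * Follows out-edges level by level from start, without dedup, and counts
     * every vertex visit. Stops after maxDepth levels, or earlier once no
     * traverser is left.
     */
    static long countEmitted(Map<Integer, List<Integer>> out, int start, int maxDepth) {
        long emitted = 0;
        List<Integer> frontier = Collections.singletonList(start);
        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            List<Integer> next = new ArrayList<>();
            for (Integer v : frontier) {
                next.addAll(out.getOrDefault(v, Collections.<Integer>emptyList()));
            }
            emitted += next.size();
            frontier = next;
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Two vertices with mutual out links: without the depth cap this walk never ends.
        Map<Integer, List<Integer>> cycle = new HashMap<>();
        cycle.put(0, Arrays.asList(1));
        cycle.put(1, Arrays.asList(0));
        System.out.println(countEmitted(cycle, 0, 10));
    }
}
```

On a real tree the frontier runs empty at the leaves, so the cap is never reached; it only matters when the data contains a cycle.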


Mustafa Tinwala

Apr 1, 2017, 3:39:02 AM
to Gremlin-users
If I run this line:

 int fromNodeId = (int) g.V().has("NodeId", 0).id().next();

in place of

Vertex fromNode = g.V().has("NodeId", 0).next();

I get the following error:

Exception in thread "main" java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
    at com.graph_retrieve.main(graph_retrieve.java:178)

Please help me sort this out!

Mustafa Tinwala

Apr 1, 2017, 4:36:50 AM
to Gremlin-users


I also have another question:

If my binary tree keeps growing to n nodes, how should I use times()?

HadoopMarc

Apr 1, 2017, 7:17:49 AM
to Gremlin-users
Hi Mustafa,

My error (I only checked in TinkerGraph, which returns int ids). From the error message you can see that fromNodeId needs type Long for Titan.

Cheers,    Marc
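
A minimal Java sketch of that cast problem, using only the standard library. The id value and the helper name toLong are made up for illustration; the point is that an id boxed as Long cannot be cast to int directly, while widening through Number works for either boxed type.

```java
final class VertexIdCast {

    /** Widens any boxed numeric id (Integer, Long, ...) to a long. */
    static long toLong(Object id) {
        return ((Number) id).longValue();
    }

    public static void main(String[] args) {
        Object id = Long.valueOf(4144L); // hypothetical vertex id, boxed as Long

        try {
            int bad = (int) id; // compiles, but is a cast to Integer at runtime
            System.out.println(bad);
        } catch (ClassCastException e) {
            System.out.println("cast failed: " + e.getMessage());
        }

        long fromNodeId = toLong(id); // works for Integer and Long alike
        System.out.println(fromNodeId);
    }
}
```

Declaring the variable as Object and passing it straight to g.V(...) would avoid the cast altogether.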


HadoopMarc

Apr 1, 2017, 7:25:27 AM
to Gremlin-users
Hi Mustafa,

times() is an optional part of the repeat() step:

http://tinkerpop.apache.org/docs/current/reference/#repeat-step

So, you can protect your query against unwanted infinite loops by writing:

System.out.println(g.V(fromNodeId).repeat(out("childs").dedup()).times(10).emit().count().next());


Cheers,    Marc


Mustafa Tinwala

Apr 1, 2017, 8:11:47 AM
to Gremlin-users
If my binary tree keeps growing to n nodes, how should I use times()?

Sure, I will check it.

By the way, thanks a lot for this! :)
 

Mustafa Tinwala

Apr 1, 2017, 8:40:12 AM
to Gremlin-users


Actually, what I meant is that I used this query in the Gremlin console, like this:


clockWithResult{ g.V(fromNode).repeat(__.out("childs").dedup()).times(13).emit().count().next()}

==>31.84179384
==>9999


It gives a child vertex count of 9999 in 31 sec.

So my question, again, is how to reduce this time.

For your information, my system configuration is:

Pentium G2020
RAM = 5.5 GB
OS = Ubuntu 16.04

HadoopMarc

Apr 1, 2017, 10:38:24 AM
to Gremlin-users
Hi Mustafa,

I still miss the fromNodeId instead of the fromNode. The results look like Titan interpreted your query as:

clockWithResult{ g.V().repeat(__.out("childs").dedup()).times(13).emit().count().next()}

Still, I do not expect to wait 30 secs for just 10,000 edges traversed once. What are the results for:

clockWithResult{ g.V(fromNodeId).repeat(__.out("childs").dedup()).times(13).emit().count().next()}
clockWithResult{ g.V(fromNodeId).out("childs").count().next()}
clockWithResult{ g.V().out("childs").count().next()}

Cheers,    Marc



Mustafa Tinwala

Apr 3, 2017, 12:57:14 AM
to Gremlin-users



Your post uses fromNodeId, but I don't know how to declare it in the Gremlin console.

I am still trying with this query, though:
 
clockWithResult{ g.V(fromNode).repeat(out("childs")).times(12).emit().count().next()}
==>18.389357519999997
==>8190


 
 

Mustafa Tinwala

Apr 3, 2017, 1:03:24 AM
to Gremlin-users
And yes, as you suggested, I have also tried it like this in Java:

 long fromNodeId = (long) g.V().has("NodeId", 0).id().next();


but it gives me the following error:

Exception in thread "main" java.lang.NullPointerException
    at org.apache.tinkerpop.gremlin.process.traversal.step.branch.RepeatStep.standardAlgorithm(RepeatStep.java:143)
    at org.apache.tinkerpop.gremlin.process.traversal.step.util.ComputerAwareStep.processNextStart(ComputerAwareStep.java:47)
    at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:140)
    at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:54)
    at org.apache.tinkerpop.gremlin.process.traversal.step.branch.RepeatStep$RepeatEndStep.standardAlgorithm(RepeatStep.java:230)
    at org.apache.tinkerpop.gremlin.process.traversal.step.util.ComputerAwareStep.processNextStart(ComputerAwareStep.java:47)
    at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:140)
    at org.apache.tinkerpop.gremlin.process.traversal.step.branch.RepeatStep.standardAlgorithm(RepeatStep.java:143)
    at org.apache.tinkerpop.gremlin.process.traversal.step.util.ComputerAwareStep.processNextStart(ComputerAwareStep.java:47)
    at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:140)
    at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.hasNext(ExpandableStepIterator.java:44)
    at org.apache.tinkerpop.gremlin.process.traversal.step.util.ReducingBarrierStep.processNextStart(ReducingBarrierStep.java:86)
    at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.next(AbstractStep.java:126)
    at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.next(AbstractStep.java:37)
    at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.next(DefaultTraversal.java:127)
    at com.graph_retrieve.main(graph_retrieve.java:182)


 

 
 

Mustafa Tinwala

Apr 3, 2017, 1:55:56 AM
to Gremlin-users


By the way, your query

clockWithResult{ g.V().repeat(__.out("childs")).times(12).emit().count().next()}

gives this warning:

WARN  com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx  - Query requires iterating over all vertices [()]. For better performance, use indexes
 

HadoopMarc

Apr 3, 2017, 3:59:50 AM
to Gremlin-users
Hi Mustafa,

OK, you declared the property "NodeId" as a String, but you pass the addVertex method an int 0, which is not consistent. I did not notice that before. If you correct that either way, g.V().has("NodeId", 0).id().next() or g.V().has("NodeId", "0").id().next() should return an id.
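
As a plain-Java analogy for why the two types do not match (this is not Titan code, and Titan's own handling of a mismatched value may differ), here is a toy lookup table keyed by String ids. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class NodeIdTypeMismatch {

    /** Builds a toy lookup table whose keys are String ids "0".."n-1". */
    static Map<Object, String> indexByStringId(int n) {
        Map<Object, String> index = new HashMap<>();
        for (int i = 0; i < n; i++) {
            index.put(String.valueOf(i), "v" + i);
        }
        return index;
    }

    public static void main(String[] args) {
        Map<Object, String> index = indexByStringId(3);
        System.out.println(index.get(0));   // Integer key: no entry equals it
        System.out.println(index.get("0")); // String key: matches the stored key
    }
}
```

Whichever type is chosen, the value written by addVertex and the value used in has() need to be of that same type.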

Titan is very generous with the index warning; it does not sense one's intent. For a g.V() query you simply expect Titan not to use any index, so the warning can be ignored.

Cheers,     Marc


Jason Plurad

Apr 3, 2017, 9:19:18 AM
to Gremlin-users
> clockWithResult{ g.V(fromNode).repeat(__.out("childs").dedup()).times(13).emit().count().next()}
> ==>31.84179384
> ==>9999
> and it gives child vertex count  =9999   in  31 sec.

You are incorrect on the units here. That is 31 milliseconds. For example:

gremlin> clockWithResult{ Thread.sleep(10) }
==>11.67529047
==>null

The method clockWithResult runs the operation 100 times if you don't explicitly specify a number of loops.

gremlin> clockWithResult(5, { Thread.sleep(10) })
==>11.5913306
==>null
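
The behaviour described above can be sketched in plain Java. This is not TinkerPop's source: the class, the Timed holder and the untimed warm-up call are assumptions made for illustration. What it shows is that the first number printed is a mean per loop in milliseconds, while the wall-clock time of the whole call is roughly that mean times the number of loops, plus anything that is not timed.

```java
import java.util.function.Supplier;

final class ClockSketch {

    /** Mean runtime per loop in milliseconds, plus the result of the last loop. */
    static final class Timed<T> {
        final double meanMillis;
        final T result;

        Timed(double meanMillis, T result) {
            this.meanMillis = meanMillis;
            this.result = result;
        }
    }

    /** Runs work once untimed (assumed warm-up), then loops times while timing each run. */
    static <T> Timed<T> clockWithResult(int loops, Supplier<T> work) {
        work.get(); // warm-up, not included in the mean (assumption)
        long totalNanos = 0;
        T result = null;
        for (int i = 0; i < loops; i++) {
            long start = System.nanoTime();
            result = work.get();
            totalNanos += System.nanoTime() - start;
        }
        return new Timed<>(totalNanos / 1_000_000.0 / loops, result);
    }

    public static void main(String[] args) {
        Timed<Integer> t = clockWithResult(100, () -> 1 + 1);
        System.out.println(t.meanMillis + " ms per loop, result " + t.result);
    }
}
```

If the first execution is much slower than the later ones, for example because nothing is cached yet, a mean computed this way can look far smaller than what was experienced at the keyboard.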

Mustafa Tinwala

Apr 3, 2017, 1:23:54 PM
to Gremlin-users
What you say might be right, Jason, but it actually takes 31 seconds, as I noticed.

But I have a question for you:
Is it fine that my system configuration takes this much time, given that I have to deploy it on AWS?

By the way, thanks.

Robert Dale

Apr 3, 2017, 1:45:14 PM
to gremli...@googlegroups.com
Correct, you would have experienced 3 seconds of user time because 31ms average per query * 100 queries = 3.1s.

Robert Dale


--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-users+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/dba2e4b2-ebe7-4400-ad17-1c0db856da9c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Robert Dale

Apr 3, 2017, 1:45:57 PM
to gremli...@googlegroups.com
Or even * 1000 queries = 31s

Robert Dale


Mustafa Tinwala

Apr 3, 2017, 11:28:40 PM
to Gremlin-users
Thanks, guys.

But I have a problem.

What if my graph keeps growing? How should I set times(n)?

How do I choose an accurate times() value so that the query runs fast and efficiently?

Thanks a lot, guys! :)
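
One way to reason about this, sketched in plain Java under the assumption that the graph really is the tree built in the first post (nodes numbered 0..n-1, node j linked to 2j+1 and 2j+2): the deepest node then sits at depth floor(log2(n)), so that value is the smallest times() that still reaches every descendant. The class and method names are hypothetical.

```java
final class TreeDepth {

    /** Depth of the deepest node when n nodes are numbered 0..n-1 and node j has children 2j+1 and 2j+2. */
    static int maxDepth(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        return 31 - Integer.numberOfLeadingZeros(n); // floor(log2(n))
    }

    /** Number of descendants of the root that lie within the given number of levels. */
    static long descendantsWithin(int n, int levels) {
        if (n < 1 || levels < 0 || levels > 61) {
            throw new IllegalArgumentException("n must be >= 1 and levels in 0..61");
        }
        long full = (1L << (levels + 1)) - 1; // size of a full tree of that depth, root included
        return Math.min(n, full) - 1;         // leave out the root itself
    }

    public static void main(String[] args) {
        int n = 10000;
        System.out.println("times() needed: " + maxDepth(n));
        System.out.println("reached with times(12): " + descendantsWithin(n, 12));
        System.out.println("reached with times(13): " + descendantsWithin(n, 13));
    }
}
```

For n = 10000 this gives a depth of 13, with 8190 descendants within 12 levels and 9999 within 13, which agrees with the counts reported in this thread. Since a tree has no cycles, repeat(...).emit() without times() also ends on its own once the leaves are reached, as the query in the first post does; times() is then only a safety cap.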

Mustafa Tinwala

Apr 4, 2017, 12:33:49 AM
to Gremlin-users



Yes Jason, as you said, it gives me this:

clockWithResult{ Thread.sleep(10) }
==>10.12367083
==>null
gremlin> clockWithResult(5, { Thread.sleep(10) })
==>10.1140054
==>null
 
So was I wrong about my query taking 31 seconds? :)

Mustafa Tinwala

Apr 4, 2017, 1:39:26 AM
to Gremlin-users





Oh :) But I can't understand what the actual time of the query is.


When I run this query, it shows me this:



 

gremlin> t = System.currentTimeMillis();
==>1491284142148
gremlin> g.V(fromNode).repeat(out("childs")).times(12).emit().count().next();
==>8190
gremlin> t = System.currentTimeMillis();
==>1491284171835


well thank you for letting me know this
:)


Stephen Mallette

Apr 4, 2017, 9:06:37 AM
to Gremlin-users
Mustafa - this thread is becoming difficult to follow as a result of these many small, follow-up emails. Please give people on the list more time to answer you and do a better job summarizing your updates into a single post rather than lots of play-by-play updates of your progress. Thank you.

On Tue, Apr 4, 2017 at 9:00 AM, Mustafa Tinwala <mustafa.ti...@gmail.com> wrote:
> Please read my last 5 messages, thank you :)
> And hey friends, I have found another link about this; maybe it will help.

Mustafa Tinwala

Apr 4, 2017, 9:09:09 AM
to Gremlin-users





Yeah sure....!  :)
 

Robert Dale

Apr 4, 2017, 10:40:58 AM
to Gremlin-users
Mustafa,

I'm not sure if you ever posted your "clockWithResult" results.

What does clock give you?

clockWithResult{g.V(fromNode).repeat(out("childs")).times(12).emit().count().next()}

What does the profile look like?

g.V(fromNode).repeat(out("childs")).times(12).emit().count().profile()

Your connection to AWS and AWS configuration would probably be the most limiting factor here.  You could try all of this locally in the console using either Titan or TinkerGraph to try to isolate query (local) processing from network (remote).

Outside of that, there is nothing fundamentally slow about the graph structure, query, or volume.

I've created a slightly modified script to run with TinkerGraph in the gremlin console - https://gist.github.com/robertdale/8b4e604165852a5344476af940b75e30

Note that you'll have to cut-n-paste the last few statements to see the results.

You'll have to isolate and experiment with different components of your system to determine where the bottleneck is.  Let us know what you discover.

Mustafa Tinwala

Apr 5, 2017, 1:15:12 AM
to Gremlin-users


Robert Dale,

As you suggested, I ran

g.V(fromNode).repeat(out("childs")).times(12).emit().count().profile()

and it gives me this:


g.V(fromNode).repeat(out("childs")).times(12).emit().count().profile()
==>8190

 

 clockWithResult{g.V(fromNode).repeat(out("childs")).times(12).emit().count().next()}
==>39.25605984   
==>8190


Now I want to ask: is it milliseconds or seconds?

:)


By the way, nice discovery, Robert Dale.

Mustafa Tinwala

Apr 5, 2017, 10:28:43 PM
to Gremlin-users
Is there someone who can help out? :)

Jason Plurad

Apr 6, 2017, 12:56:57 PM
to Gremlin-users
> Now i want to ask is it milliseconds or seconds


Mustafa Tinwala

Apr 11, 2017, 12:35:07 AM
to Gremlin-users
Thanks a lot, Jason! :)

You have resolved my doubt. :)

But as I said, this is how long it takes:

gremlin> s=System.currentTimeMillis()
==>1491540194192

gremlin> clockWithResult{g.V(fromNode).repeat(out("childs")).times(12).emit().count().next()}
==>142.589291250000002
==>8190

gremlin> (System.currentTimeMillis()-s)/1000
==>38.075

How?

I do know that clockWithResult reports the mean time over its iterations.

So this is my problem now: does anyone know how to time my query with clockWithResult just once, so that it runs fast, with no iterations or mean time added?

By the way, thank you, guys! :)

I will be happy if you have any answer. :)

Stephen Mallette

Apr 11, 2017, 6:17:37 AM
to Gremlin-users
TinkerPop's clockWithResult works how it works. If you need your own implementation that doesn't do iterations or doesn't calculate the mean the way that we do, just write your own function.
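
A minimal sketch of such a function in plain Java, assuming all that is wanted is a single timed run with no loops and no averaging. The names RunOnceTimer and timeOnce are hypothetical; in real use the supplier would wrap the traversal, for example () -> g.V(fromNode).repeat(out("childs")).times(12).emit().count().next().

```java
import java.util.AbstractMap;
import java.util.Map;
import java.util.function.Supplier;

final class RunOnceTimer {

    /** Runs work exactly once and returns (elapsed milliseconds, result). */
    static <T> Map.Entry<Double, T> timeOnce(Supplier<T> work) {
        long start = System.nanoTime();
        T result = work.get();
        double millis = (System.nanoTime() - start) / 1_000_000.0;
        return new AbstractMap.SimpleImmutableEntry<>(millis, result);
    }

    public static void main(String[] args) {
        // Stand-in workload; replace the lambda with the real query call.
        Map.Entry<Double, Long> timed = timeOnce(() -> 8190L);
        System.out.println(timed.getKey() + " ms, result " + timed.getValue());
    }
}
```

Because the measurement starts and stops inside one call, it leaves out the time spent typing between console commands, which the two separate System.currentTimeMillis() readings earlier in the thread include.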
