Previewing large queries? Is there a faster database than JanusGraph?


Russell Jurney

Feb 19, 2018, 12:01:09 AM
to gremli...@googlegroups.com
Are there any databases out there that can preview the results of large queries, i.e. return the first few results?

I just ran this query on my database with 40 million edges:

g.V().hasLabel('user').as('owner').out('owned').as('repo').in('forked').as('fan').select('owner','repo','fan').by('userName').by('repoName')

It will not return a single result for a very long time, maybe half an hour, so I don't even know whether it actually works.

What I actually want is to see the first few results from this operation in my console. Can any database out there pull this off? JanusGraph does not offer interactivity with even mid-sized graphs.

Thanks,

Jean-Philippe B

Feb 19, 2018, 4:27:07 AM
to gremli...@googlegroups.com
Hello,

Maybe you can just restrict your query to a limited set of users to test it:

g.V().hasLabel('user').limit(50)...

JP

Excerpts from Russell Jurney's message of February 19, 2018 6:00 am:
> Russell Jurney @rjurney <http://twitter.com/rjurney>
> russell...@gmail.com LI <http://linkedin.com/in/russelljurney> FB
> <http://facebook.com/jurney> datasyndrome.com
>
> --
> You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-user...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/CANSvDjqm0qOLxNgZ%3DBahJ05RBdNHULo0hHzFMDpwVY2Qhppt_A%40mail.gmail.com.
> For more options, visit https://groups.google.com/d/optout.
>

HadoopMarc

Feb 19, 2018, 5:23:10 AM
to Gremlin-users
Hi Russell,

What works for me:

g.V().has('some_indexed_property_for_user')

This mostly gives results in 10 seconds or so for 100M graphs (the gremlin console starts iterating results as soon as they are available). Of course, this breaks down for analytic queries, e.g. using group().by(), which should be done using OLAP.

Must say I did not try the hasLabel() recently, so I am not entirely sure whether other factors like cache warming or the order of vertex types in the datastore play a role.

Cheers,    Marc


HadoopMarc

Feb 19, 2018, 6:01:13 AM
to Gremlin-users
Hi Russell,

My previous answer kept bothering me, so I checked: the insertion order of vertices really is the determining factor in how fast the table scan returns results. So you could do either of these:
  • when loading your graph database, start with 100 or 1000 vertices of each vertex type
  • store 100 or 1000 long ids of each vertex type returned during the graph loading, and use these ids in g.V(userIds), g.V(eventIds), etc.
Indeed, it would be much nicer to abstract this away from the user!
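
The second option above might look roughly like the following. This is a minimal sketch in plain Python over a toy in-memory list, not JanusGraph code: `load_vertices`, `SAMPLE_SIZE`, and the `(vertex_id, label)` record layout are hypothetical names chosen for illustration. The only point is that the loader sees every id anyway, so it can set a few aside per vertex type for later use in something like g.V(userIds).

```python
from collections import defaultdict

SAMPLE_SIZE = 3  # hypothetical; the post suggests 100 or 1000


def load_vertices(records, sample_size=SAMPLE_SIZE):
    """Pretend-load (vertex_id, label) records and keep a few ids per label."""
    sample_ids = defaultdict(list)
    for vertex_id, label in records:
        # A real loader would write the vertex to the graph here.
        if len(sample_ids[label]) < sample_size:
            sample_ids[label].append(vertex_id)
    return dict(sample_ids)


# Toy input: all repos are inserted before any user, so a scan for
# 'user' vertices would pass over every repo before yielding anything.
records = [(i, "repo") for i in range(1000)]
records += [(i, "user") for i in range(1000, 1010)]

sample_ids = load_vertices(records)
user_ids = sample_ids["user"]
print(user_ids)  # ids to start a preview traversal from
```

With the ids saved, a preview query can start from known vertices instead of waiting for a scan to reach the first vertex of the wanted type.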

Cheers,    Marc



Jason Plurad

Feb 19, 2018, 9:56:54 AM
to Gremlin-users
If you're going to run that query over all users, you're best off running it with the Spark Graph Computer.

JP had a good suggestion with the limit() step, and sample() fits as well, but you'd want to place it early in the traversal:

g.V().hasLabel('user').sample(10).as('owner').out('owned').as('repo').in('forked').as('fan').select('owner','repo','fan').by('userName').by('repoName')
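
To illustrate why the position of the limiting step matters, here is a minimal sketch using plain Python generators as a stand-in for a lazy traversal. It is a toy model, not Gremlin: the data, the fan-out numbers, and the names `users`, `owner_repo_fan`, and `visited` are all assumptions made up for the example, and it says nothing about the cost of the label scan itself.

```python
from itertools import islice

# Toy data (hypothetical): 1000 users, each owning 2 repos with 3 fans each.
USERS = [f"user{i}" for i in range(1000)]
visited = {"users": 0}  # counts how many users the pipeline actually pulls


def users():
    for u in USERS:
        visited["users"] += 1
        yield u


def owner_repo_fan(user_stream):
    """Expand each user into (owner, repo, fan) triples, lazily."""
    for owner in user_stream:
        for r in range(2):
            repo = f"{owner}/repo{r}"
            for k in range(3):
                yield owner, repo, f"fan{k}"


# Limiting right after the user step: only 10 users ever enter the
# expensive expansion, so the work is bounded by 10 x fan-out.
early = list(owner_repo_fan(islice(users(), 10)))
print(len(early), visited["users"])
```

In this toy, the expansion touches only the ten users that pass the early cut, however many users exist in total.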

Olav Laudy

Feb 20, 2018, 10:35:08 AM
to gremli...@googlegroups.com
What about using .limit(100) in the right place?

For example, after hasLabel('user'). If that's too restrictive, place it later.




Russell Jurney

Feb 20, 2018, 9:08:41 PM
to gremli...@googlegroups.com
Olav: that does not speed things up. Even hasNext() is really slow.

I have used SparkGraphComputer, but first I need to know that the query even runs, and it would be nice if that were possible in OLTP. None of the performance I'm looking for should be hard to achieve if it were a focus. I can iterate with SparkGraphComputer, but doing so is slow. I'd like a graph database that returns fast results whenever possible.


Robert Dale

Feb 20, 2018, 11:39:45 PM
to gremli...@googlegroups.com
g.V().hasLabel('user') will be a full scan in JanusGraph. You may want to limit a test query to a single user, either by ID or by an indexed property, e.g. g.V(1234) or g.V().has('username', 'rjurney') or, with a label constraint, g.V().has('user','username','rjurney'), depending on the index.
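
The difference between a scan and an indexed lookup can be sketched in plain Python. This is a toy model under stated assumptions, not how JanusGraph stores data: the list of dicts, the `scan_for_label` helper, and the dict standing in for an index on 'username' are all hypothetical.

```python
# Toy vertex store (hypothetical layout), scanned in insertion order.
vertices = [{"id": i, "label": "repo", "name": f"repo{i}"} for i in range(100_000)]
vertices.append({"id": 100_000, "label": "user", "username": "rjurney"})


def scan_for_label(label):
    """Return the first vertex with this label and how many records were examined."""
    for examined, v in enumerate(vertices, start=1):
        if v["label"] == label:
            return v, examined
    return None, len(vertices)


# A dict stands in for a property index on 'username': one lookup, no scan.
username_index = {v["username"]: v for v in vertices if "username" in v}

by_scan, examined = scan_for_label("user")
by_index = username_index["rjurney"]

print(examined)             # records examined before the scan found a user
print(by_index is by_scan)  # both paths reach the same vertex
```

Both paths reach the same vertex, but the scan has to examine every record inserted before it, while the index lookup goes straight to it.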

Robert Dale

Russell Jurney

Feb 22, 2018, 5:46:45 PM
to gremli...@googlegroups.com
I'll try picking nodes to preview by, thanks.