Selecting final vertex in repeat


Jeffrey Hagelberg

Apr 21, 2016, 11:10:10 AM
to Gremlin-users

Hi, I have a graph that looks roughly like this:

[Attached graph diagram not reproduced here]

I am trying to translate a query from Gremlin 2 to Gremlin 3:

The query I'm translating is this:
_var_0 = [] as Set
g.V().has("__typeName", "Table").fill(_var_0)
g.V().has("__superTypeNames", "Table").fill(_var_0)
_var_0._().as("src").in("__LoadProcess.inputTables").out("__LoadProcess.outputTable").loop("src"){true}{(it.object.'__typeName' == 'Table') | (it.object.'__superTypeNames' ? it.object.'__superTypeNames'.contains('Table') : false)}.as("dest").select(["src","dest"]){[it."Table.name"]}{[it."Table.name"]}.toList()
It returns a result that looks like this:

==>[src:[sales_fact_daily_mv], dest:[sales_fact_monthly_mv]]
==>[src:[sales_fact], dest:[sales_fact_daily_mv]]
==>[src:[sales_fact], dest:[sales_fact_monthly_mv]]
==>[src:[time_dim], dest:[sales_fact_daily_mv]]
==>[src:[time_dim], dest:[sales_fact_monthly_mv]]



The Gremlin 3 version looks like this:

_var_0 = [] as Set;

g.V().has("__typeName", "Table").fill(_var_0);

g.V().has("__superTypeNames", "Table").fill(_var_0);

g.V(_var_0 as Object[]).as("src").repeat(__.in("__LoadProcess.inputTables").out("__LoadProcess.outputTable")).emit(has('__typeName',eq('Table')).or().has('__superTypeNames',eq('Table'))).as("dest").select("src","dest").by{[it."Table.name"]}.by{[it."Table.name"]}.toList()


The result I am getting is:


==>[src:[sales_fact_daily_mv], dest:[sales_fact_monthly_mv]]
==>[src:[sales_fact], dest:[sales_fact_daily_mv]]
==>[src:[sales_fact], dest:[[sales_fact_daily_mv, sales_fact_monthly_mv]]]
==>[src:[time_dim], dest:[sales_fact_daily_mv]]
==>[src:[time_dim], dest:[[sales_fact_daily_mv, sales_fact_monthly_mv]]]


I am trying to figure out what is needed to make the query result look like the one from the original query. From what I can tell, the difference has something to do with the as() or select() step. I've tried a couple of experiments with simplified versions of the queries. Without the select(), the "dest" part of the output seems to look right:


gremlin> g.V(_var_0 as Object[]).as("src").repeat(__.in('__LoadProcess.inputTables').out('__LoadProcess.outputTable')).emit().as("dest")

==>v[16432]
==>v[12336]
==>v[16432]
==>v[12336]
==>v[16432]


However, when I add a select, some of the "dest" output values turn into an array:


gremlin> g.V(_var_0 as Object[]).as("src").repeat(__.in('__LoadProcess.inputTables').out('__LoadProcess.outputTable')).emit().as("dest").select("dest")

==>v[16432]
==>v[12336]
==>[v[12336], v[16432]]
==>v[12336]
==>[v[12336], v[16432]]


Can anyone explain this behavior or suggest what can be done to get rid of the extra values in "dest" so that the result matches what we got in Gremlin 2?


Thanks a lot for any help you can provide.


-Jeff



Daniel Kuppitz

Apr 21, 2016, 11:45:32 AM
to gremli...@googlegroups.com
.select(last, "dest") is what you're looking for.

Cheers,
Daniel


--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/b811e185-905b-482a-8db6-b87b062d4fe6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jeffrey Hagelberg

Apr 25, 2016, 9:53:43 PM
to Gremlin-users
Hi Daniel,

Thanks.  That seems to be working.  However, now when I add path() to the end of the query there is a duplicate element in it (produced by the select step).  Is there a way to do this without affecting the path? 

Part of the difficulty is that we are translating a DSL query into Gremlin, so even small changes in semantics have massive ripple effects. What I don't understand is how the value of "dest" is getting set to an array at all. The two elements in the array are from different iterations of the repeat loop.

If you look at the graph diagram I attached and start at "time_dim" (12504), the first iteration of the repeat body should take us to "sales_fact_daily_mv" (12336). At that point we do an emit, and I would expect [time_dim, sales_fact_daily_mv] to be returned by the select, which I do see. When we iterate again, though, the traverser should move from sales_fact_daily_mv (12336) to sales_fact_monthly_mv (16432). At that point, when we emit, I would expect 16432 to be emitted and put into "dest". Instead, dest is getting set to [12336, 16432], and the select returns [time_dim, [sales_fact_daily_mv, sales_fact_monthly_mv]].

Is this the expected behavior? Is there maybe something else going on here that I'm missing (which is very possible, since I am very new to all of this :-))? Ideally I'd like some way to have the repeat loop emit only the vertex or vertices from the last step in the nested traversal. Anything that does not involve post-processing the query results to remove duplicate path entries would be good. Do you have any more thoughts about this?

Thanks for all your help,

-Jeff

Jeffrey Hagelberg

Apr 25, 2016, 10:02:08 PM
to Gremlin-users
BTW, the updated query with the select I used is:

g.V(_var_0 as Object[]).as("src").repeat(__.in("__LoadProcess.inputTables").out("__LoadProcess.outputTable")).emit().as("dummy").select(last,"dummy").as("dest").select("src","dest").path()



Is something like this what you had in mind?

-Jeff

Daniel Kuppitz

Apr 26, 2016, 3:16:52 AM
to gremli...@googlegroups.com
> Is this the expected behavior?

Yes, it is. It will collect all vertices that were traversed within repeat(). If you always only care about the last / emitted vertex, then place your .as("dest") outside the repeat().

Cheers,
Daniel
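
A toy Python model (not TinkerPop code) may help picture the label-placement point. It assumes, as a simplification, that a traverser's path is just an ordered list of (vertex, labels) pairs and that selecting a label gathers every entry that carries it, which matches the list-valued "dest" reported earlier in this thread. The names Path, extend, select_all and select_last are hypothetical and exist only for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Path:
    """Toy stand-in for a traverser's path: (vertex, labels) pairs in visit order."""
    entries: list = field(default_factory=list)

    def extend(self, vertex, *labels):
        # Return a new path so each traverser keeps its own history.
        return Path(self.entries + [(vertex, set(labels))])

    def select_all(self, label):
        # Every vertex that carries the label, in visit order.
        return [v for v, labels in self.entries if label in labels]

    def select_last(self, label):
        # Only the most recently labelled vertex, or None if there is no match.
        matches = self.select_all(label)
        return matches[-1] if matches else None


# time_dim -> sales_fact_daily_mv -> sales_fact_monthly_mv, with "dest"
# applied on every hop (the situation described in the reply above).
start = Path().extend("time_dim", "src")
hop1 = start.extend("sales_fact_daily_mv", "dest")
hop2 = hop1.extend("sales_fact_monthly_mv", "dest")

print(hop1.select_all("dest"))   # ['sales_fact_daily_mv']
print(hop2.select_all("dest"))   # ['sales_fact_daily_mv', 'sales_fact_monthly_mv']
print(hop2.select_last("dest"))  # sales_fact_monthly_mv
```

In this model, a label applied on every hop yields a list after the second hop, and picking only the last match brings it back to a single vertex, which is what select(last, "dest") asks for.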


Daniel Kuppitz

Apr 26, 2016, 3:28:19 AM
to gremli...@googlegroups.com
> g.V(_var_0 as Object[]).as("src").repeat(__.in("__LoadProcess.inputTables").out("__LoadProcess.outputTable")).emit().as("dummy").select(last,"dummy").as("dest").select("src","dest").path()
>
> Is something like this what you had in mind?

Uhm, no. What's the purpose of dummy?

Check this to see how the placement of as() changes the result: http://www.gremlinbin.com/bin/view/571f168d2ba9b

I'm not quite sure what you're looking for. Do you only want to get the src and dest vertices or do you want the path from src to dest? Both mixed up doesn't make much sense IMO.

Get src and dest:

g.V(_var_0 as Object[]).as("src").repeat(__.in("__LoadProcess.inputTables").out("__LoadProcess.outputTable")).emit().as("dest").select("src","dest")

Get the path from src to dest:

g.V(_var_0 as Object[]).repeat(__.in("__LoadProcess.inputTables").out("__LoadProcess.outputTable")).emit().path()

Cheers,
Daniel


Jeffrey Hagelberg

Apr 26, 2016, 1:54:27 PM
to Gremlin-users
For some reason, when I don't put in a dummy variable and just do select(last), the query returns no results.

I'm still seeing dest get set to an array even when the "as" is outside the repeat.  Is it possible to assign dest as a side effect kind of like this:

g.V(_var_0 as Object[]).as("src").repeat(__.in("__LoadProcess.inputTables").out("__LoadProcess.outputTable")).emit().sideEffect(__({((it as Object[]) as List).last()}).as("dest")).select("src","dest")




I think if I could get that to work, it would solve both problems.  sideEffect should not have any impact on the path, right? Unfortunately, the above also seems to be assigning dest to an array as well, so there must be something wrong with it.

Thanks again,

-Jeff

Jeffrey Hagelberg

Apr 26, 2016, 2:12:43 PM
to Gremlin-users
Ok, following up on this, it does look like only the final vertex is being emitted when the as() is outside the repeat(). For example, I tried this:

g.V(_var_0 as Object[]).as("src").repeat(__.in("__LoadProcess.inputTables").out("__LoadProcess.outputTable")).emit().sideEffect{System.out.println(it)}
And saw the following result:

v[16544]
v[24760]
v[16544]
v[24760]
v[16544]

This suggests that no special logic should actually be needed to set "dest". However, when I add on the select the 3rd and 5th results turn into an array.  Do you have any idea why that would be?  Maybe the solution here is to put the logic to remove the array into a by clause.

-Jeff

Jeffrey Hagelberg

Apr 26, 2016, 2:20:19 PM
to Gremlin-users
Ok, I finally have something that works.  The query I ended up with is:

g.V(_var_0 as Object[]).as("src").repeat(__.in("__LoadProcess.inputTables").out("__LoadProcess.outputTable")).emit().as("dest").select("src","dest").by().by{((it as Object[]) as List).last()}.path()

==>[v[24760], v[28768], v[16544], {src=v[24760], dest=v[16544]}]
==>[v[8384], v[20568], v[24760], {src=v[8384], dest=v[24760]}]
==>[v[8384], v[20568], v[24760], v[28768], v[16544], {src=v[8384], dest=v[16544]}]
==>[v[24816], v[20568], v[24760], {src=v[24816], dest=v[24760]}]
==>[v[24816], v[20568], v[24760], v[28768], v[16544], {src=v[24816], dest=v[16544]}]

This has the paths correct and only one dest in all cases.  Thanks for all your help!

-Jeff
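
The closure passed to the second by() above reduces whatever "dest" holds to its last element. A rough Python equivalent of that normalisation, assuming the selected value is either a single vertex or a list of vertices, could look like the sketch below. The helper name last_of and the sample rows are made up for illustration (vertex ids are plain strings here), and returning None for an empty list is an added assumption.

```python
def last_of(value):
    """Return the last element of a list or tuple, or the value itself if it is a single item."""
    if isinstance(value, (list, tuple)):
        # Assumption: an empty collection maps to None rather than raising.
        return value[-1] if value else None
    return value


# Illustrative rows only: one single-valued "dest" and one list-valued "dest".
rows = [
    {"src": "v[8384]", "dest": "v[24760]"},
    {"src": "v[8384]", "dest": ["v[24760]", "v[16544]"]},
]

# Normalise every row so "dest" always holds exactly one vertex.
cleaned = [{"src": r["src"], "dest": last_of(r["dest"])} for r in rows]
print(cleaned)
```

Because the helper passes single values through unchanged, it can be applied to every row without first checking which rows were affected.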

Daniel Kuppitz

Apr 26, 2016, 2:20:29 PM
to gremli...@googlegroups.com
Can you please reproduce your problem in GremlinBin using a sample graph? My sample clearly shows that if the as() is used outside the loop, you won't get an array.

Cheers,
Daniel


Jeffrey Hagelberg

Apr 29, 2016, 1:33:34 PM
to gremli...@googlegroups.com
I can't seem to reproduce the issue in GremlinBin. Perhaps it has been fixed in later TinkerPop versions? I'm using 3.0.1-incubating-SNAPSHOT.
