How to compare two arrays in Gremlin

Alex Kumundzhiev

May 3, 2020, 2:46:56 AM
to Gremlin-users
Hello. I need a pattern that checks whether one array contains all elements of another array. This is pretty easy in Cypher:

MATCH (:ISSUE {uid: "B"}) -[:ISSUE_FILE]-> (file)
WITH collect(file) AS issue_files_list
UNWIND issue_files_list AS file
MATCH (file) <-[:DEVICE_FILE]- (device)
WITH DISTINCT device, collect(file) AS device_file_list, issue_files_list
WITH device
WHERE all(file IN issue_files_list WHERE file IN device_file_list)
RETURN device

But with Gremlin I can't achieve the needed result. I started with the following (I tried to compare lengths, but without success):
g.V().has('ISSUE', 'uid', 'B')
.map(
    out('ISSUE_FILE').fold()
).as('linked_files')
.unfold()
.in('DEVICE_FILE')
.dedup()
.project('uid', 'files')
.by(values('uid'))
.by(
    out('DEVICE_FILE')
    .where(
        within('linked_files')
    )
    .count()
)
.select('uid', 'files')
.where(
    select('linked_files').unfold().count().is(eq('files'))
)

Daniel Kuppitz

May 4, 2020, 10:31:20 AM
to gremli...@googlegroups.com
My Cypher is a little rusty, but isn't your query just doing the following?

g.V().has('ISSUE', 'uid', 'B').
  out('ISSUE_FILE').
  in('DEVICE_FILE').dedup()

Unless I'm missing something, all the folding and comparing really just ensures that each device in the result is connected to the initial issue (through a file). But obviously (?), that's always true.
Perhaps provide a sample graph if this is not the answer you're looking for.

Cheers,
Daniel


--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/146cbeaf-168b-4f2f-93c2-e08d8c1e1587%40googlegroups.com.

Alex Kumundzhiev

May 4, 2020, 12:42:50 PM
to Gremlin-users
Yes, the goal is to ensure that every one of the issue's files is present among the device's files, which is not always true for every device, and to return the list of devices for which it holds.



Alex Kumundzhiev

May 4, 2020, 1:57:55 PM
to Gremlin-users
In Cypher this can be done with a list comprehension (https://neo4j.com/docs/cypher-manual/current/syntax/lists/#cypher-list-comprehension) that returns a new list of booleans, e.g. [true, true, false], and then applying all() to that list.
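To make the "list of booleans, then all()" idea concrete, here is a minimal sketch in plain Python rather than Cypher or Gremlin. The file names are hypothetical example data, not taken from any real graph.

```python
# Hypothetical data: the files linked to one issue and to one device.
issue_files = ["A", "B", "C"]
device_files = {"A", "B", "D", "E"}

# One boolean per issue file: is that file also present on the device?
membership = [f in device_files for f in issue_files]

# The device qualifies only if every entry is True.
device_matches = all(membership)

print(membership)      # [True, True, False]
print(device_matches)  # False
```

The device here fails the check because file C is missing, which is exactly the case the plain "connected through some file" traversal cannot distinguish.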

Alex Kumundzhiev

May 25, 2020, 4:37:16 AM
to Gremlin-users
Here is a sample graph in Gremlin:

g
.addV('DEVICE').as('DEVICE_1').property('uid', '1').property('partition_key', '1')
.addV('DEVICE').as('DEVICE_2').property('uid', '2').property('partition_key', '1')
.addV('FILE').as('FILE_A').property('uid', 'A').property('partition_key', '1')
.addV('FILE').as('FILE_B').property('uid', 'B').property('partition_key', '1')
.addV('FILE').as('FILE_C').property('uid', 'C').property('partition_key', '1')
.addV('FILE').as('FILE_D').property('uid', 'D').property('partition_key', '1')
.addV('FILE').as('FILE_E').property('uid', 'E').property('partition_key', '1')
.addV('FILE').as('FILE_F').property('uid', 'F').property('partition_key', '1')
.addV('ISSUE').as('ISSUE_A').property('uid', 'A').property('partition_key', '1')
.addV('ISSUE').as('ISSUE_B').property('uid', 'B').property('partition_key', '1')
.addV('ISSUE').as('ISSUE_C').property('uid', 'C').property('partition_key', '1')
.addE('DEVICE_FILE').from('DEVICE_1').to('FILE_A')
.addE('DEVICE_FILE').from('DEVICE_1').to('FILE_B')
.addE('DEVICE_FILE').from('DEVICE_1').to('FILE_C')
.addE('DEVICE_FILE').from('DEVICE_1').to('FILE_D')
.addE('DEVICE_FILE').from('DEVICE_1').to('FILE_E')
.addE('DEVICE_FILE').from('DEVICE_2').to('FILE_A')
.addE('DEVICE_FILE').from('DEVICE_2').to('FILE_B')
.addE('DEVICE_FILE').from('DEVICE_2').to('FILE_D')
.addE('DEVICE_FILE').from('DEVICE_2').to('FILE_E')
.addE('ISSUE_FILE').from('ISSUE_A').to('FILE_A')
.addE('ISSUE_FILE').from('ISSUE_A').to('FILE_B')
.addE('ISSUE_FILE').from('ISSUE_A').to('FILE_D')
.addE('ISSUE_FILE').from('ISSUE_A').to('FILE_E')
.addE('ISSUE_FILE').from('ISSUE_B').to('FILE_A')
.addE('ISSUE_FILE').from('ISSUE_B').to('FILE_B')
.addE('ISSUE_FILE').from('ISSUE_B').to('FILE_C')
.addE('ISSUE_FILE').from('ISSUE_C').to('FILE_A')
.addE('ISSUE_FILE').from('ISSUE_C').to('FILE_B')
.addE('ISSUE_FILE').from('ISSUE_C').to('FILE_F')

For Issue A it should return Devices 1 and 2.
For Issue B it should return only Device 1, since it is the only device that has files A, B and C.
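As a cross-check of the expected results, the sample graph can be modeled as plain Python sets. This is only a sketch of the containment logic, not a Gremlin solution; the names DEVICE_FILES, ISSUE_FILES and devices_covering are made up for illustration.

```python
# The sample graph as plain adjacency maps (uid -> set of file uids).
DEVICE_FILES = {
    "1": {"A", "B", "C", "D", "E"},
    "2": {"A", "B", "D", "E"},
}
ISSUE_FILES = {
    "A": {"A", "B", "D", "E"},
    "B": {"A", "B", "C"},
    "C": {"A", "B", "F"},
}


def devices_covering(issue_uid):
    """Return the uids of devices whose files include every file of the issue."""
    expected = ISSUE_FILES[issue_uid]
    # `expected <= files` is the subset test: every expected file is in files.
    return sorted(uid for uid, files in DEVICE_FILES.items() if expected <= files)


for issue in ("A", "B", "C"):
    print(issue, devices_covering(issue))
```

Issue C is included as an extra case: file F is not linked to any device, so no device should be returned for it.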

Stephen Mallette

May 27, 2020, 7:24:09 AM
to gremli...@googlegroups.com
How about this approach?

gremlin> g.V().has('ISSUE', 'uid', 'A').
......1>   out('ISSUE_FILE').aggregate('expected').
......2>   in('DEVICE_FILE').
......3>   groupCount().
......4>      by('uid').
......5>   unfold().as('e').
......6>   where('expected', eq('e')).
......7>     by(count(local)).
......8>     by(select(values)).
......9>   select(keys)
==>1
==>2
gremlin> g.V().has('ISSUE', 'uid', 'B').
......1>   out('ISSUE_FILE').aggregate('expected').
......2>   in('DEVICE_FILE').
......3>   groupCount().
......4>      by('uid').
......5>   unfold().as('e').
......6>   where('expected', eq('e')).
......7>     by(count(local)).
......8>     by(select(values)).
......9>   select(keys)
==>1


Perhaps there is a nicer way still. I just went with the direct approach of comparing the number of expected files with the number of times each device was reached from those files.
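The counting idea can be sketched in plain Python to show why it works. This is an illustration only, with the sample graph's DEVICE_FILE edges written in reverse as a hypothetical FILE_DEVICES map. It assumes there is at most one edge between any device and file, and that the expected files are distinct.

```python
from collections import Counter

# file uid -> uids of devices that have that file (DEVICE_FILE edges reversed)
FILE_DEVICES = {
    "A": ["1", "2"], "B": ["1", "2"], "C": ["1"],
    "D": ["1", "2"], "E": ["1", "2"], "F": [],
}


def devices_by_count(expected_files):
    # Walk from each expected file to its devices and count arrivals per
    # device, which is the role groupCount() plays in the traversal.
    arrivals = Counter(d for f in expected_files for d in FILE_DEVICES[f])
    # A device reached once per expected file is linked to all of them.
    return sorted(d for d, n in arrivals.items() if n == len(expected_files))


print(devices_by_count(["A", "B", "D", "E"]))  # files of issue A
print(devices_by_count(["A", "B", "C"]))       # files of issue B
```

If duplicate edges between the same device and file were possible, the counts could be inflated, so a set-based comparison would be the safer choice in that case.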




Alex Kumundzhiev

May 28, 2020, 12:01:30 PM
to Gremlin-users
Thanks a lot for the pointer, but I need to return the device vertices. I'm trying to do it this way:
g.V().has('ISSUE', 'uid', 'A')
.out('ISSUE_FILE').aggregate('expected')
.in('DEVICE_FILE')
.groupCount()
.by('uid')
.unfold().as('e')
.where('expected', eq('e'))
.by(count(local))
.by(select(values))
.select(keys).fold().as('device_uid')
.V().has('DEVICE', 'uid', within('device_uid'))

But it returns nothing. Could you suggest something?

Stephen Mallette

May 28, 2020, 12:51:24 PM
to gremli...@googlegroups.com
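Just drop the by('uid') on the groupCount(). The map keys are then the device vertices themselves rather than their uid values, so select(keys) returns the vertices directly and the trailing fold() and V() lookup is no longer needed. That should fix it.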
