Find properties with array/list as value and replace them

Matteo Lissandrini

Jun 10, 2020, 3:09:54 AM
to Gremlin-users
Hi,

I'm working with a newly imported, very large graph.
I don't know the schema (it has been generated through some complex pipeline).

I need to find out which node properties have an array or list as their value and replace it with just the first value of that list.

I'm not sure how to express this in Gremlin.
The only thing that works is this, assuming I know a property name `propName`:

```
g.V().has(propName).property(propName, properties(propName).limit(1).value().unfold()).iterate();
```

But this is still far from what I need.
Any suggestions?


Thanks,
Matteo

Stephen Mallette

Jun 11, 2020, 9:49:31 AM
to gremli...@googlegroups.com
You could do this:

gremlin> g = TinkerGraph.open().traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> g.addV('person').property('name','allan').property('things',['ball','car','duck'])
==>v[0]
gremlin> g.addV('person').property('name','bill').property('things',['apple','orange','bananna'])
==>v[3]
gremlin> g.V().elementMap()
==>[id:0,label:person,name:allan,things:[ball,car,duck]]
==>[id:3,label:person,name:bill,things:[apple,orange,bananna]]
gremlin> g.V().as('x').properties().as('p').
......1>   select('x').property(select('p').key(), select('p').value().unfold())
==>v[0]
==>v[0]
==>v[3]
==>v[3]
gremlin> g.V().elementMap()
==>[id:0,label:person,name:allan,things:ball]
==>[id:3,label:person,name:bill,things:apple]


But you do say that it's a "large graph", so keep in mind that this approach is fairly expensive in OLTP style: you iterate every vertex and then every property of every vertex, rewriting every single property value. Even if you filtered away non-List properties, it still seems pretty costly. If I had a large graph, I'd first try to learn more about this complex graph whose schema is unknown. I'd run some OLAP-style traversals to get a feel for what the properties are and what their types are before worrying about mutations. Understanding the schema will help you develop the right strategy for the task.
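
As a rough sketch of that kind of survey, assuming a TinkerGraph and the Gremlin Console (the sample data and variable names here are illustrative only, not taken from the real graph), a `groupCount()` over property keys shows which keys exist and how often they occur. The same traversal can also be run OLAP-style through `withComputer()`:

```groovy
// Illustrative data only; the real graph is assumed to be loaded already.
graph = TinkerGraph.open()
g = graph.traversal()
g.addV('person').property('name','allan').property('things',['ball','car','duck']).iterate()
g.addV('person').property('name','bill').iterate()

// OLTP: count how many vertex properties exist per key
oltpCounts = g.V().properties().key().groupCount().next()

// OLAP: the same traversal executed by a GraphComputer
olapCounts = graph.traversal().withComputer().V().properties().key().groupCount().next()
```

This only profiles the keys, not the value types, but with thousands of distinct keys it may help narrow down which ones are worth sampling more closely.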

--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/47e470a7-21b8-4238-9200-ff8ccefd7a3fo%40googlegroups.com.

Matteo Lissandrini

Jun 12, 2020, 12:35:48 PM
to Gremlin-users
This graph has something like 4000 distinct properties.
Most of them appear on only a few nodes and, judging from a sample, just a few of them have arrays.
So, how can I write the query to return just those property keys that have arrays as values?
Is there a way to insert an `instanceof List` or something like that in the Gremlin selection query?


Stephen Mallette

Jun 12, 2020, 3:13:10 PM
to gremli...@googlegroups.com
Gremlin doesn't know anything about types (yet), so the language itself doesn't let you do an instanceof or "typeof" sort of check. If you are in a position to use a lambda, then this would work:

gremlin> g = TinkerGraph.open().traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> g.addV('person').property('name','allan').property('things',['ball','car','duck'])
==>v[0]
gremlin> g.V().properties().filter{it.get().value() instanceof Collection}.key().dedup()
==>things

I can't quite think of a way in Gremlin to reliably detect a List object - maybe:

g = TinkerGraph.open().traversal()

g.addV('person').property('name','allan').property('things',['ball','car','duck'])
g.addV('person').property('name','bill').property('things',['ball'])
g.addV('person').property('name','cate').property('things',[])

gremlin> g.V().properties().
......1>   filter(value().as('a').unfold().limit(1).where(neq('a')))
==>vp[things->[ball, car, duck]]
==>vp[things->[ball]]

I think that's a neat little bit of Gremlin: it compares the original value() to its unfold().limit(1), and if the two are identical then the value can't be a Collection. Of course, it misses an empty list, since unfold() produces nothing to compare. I could probably figure that out, but I'm just trying things a bit academically, as I'm always interested to see what Gremlin can do in unusual cases I haven't tried to solve before. Not really knowing what your environment is, it's possible that none of this is truly helpful.
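
Putting the two pieces from this thread together, here is a hedged sketch: use the lambda filter to keep only non-empty Collection values, then apply the earlier property() rewrite to just those. This assumes lambdas are allowed in your environment and, as in the TinkerGraph examples above, that property() overwrites the existing value rather than appending to it. The sample data is illustrative.

```groovy
g = TinkerGraph.open().traversal()
g.addV('person').property('name','allan').property('things',['ball','car','duck']).iterate()
g.addV('person').property('name','cate').property('things',[]).iterate()

// Rewrite only properties whose value is a non-empty Collection;
// scalar values and empty lists are left untouched.
g.V().as('x').properties().
  filter{ it.get().value() instanceof Collection && !it.get().value().isEmpty() }.as('p').
  select('x').
  property(select('p').key(), select('p').value().unfold()).
  iterate()
```

Empty lists are skipped on purpose here, so whether to drop those properties or leave them as they are remains a separate decision.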



Matteo Lissandrini

Jun 17, 2020, 1:07:11 PM
to Gremlin-users
Looking forward to the day Gremlin understands basic types; I think it would be a huge leap forward.

In the meantime, this

gremlin> g.V().as('x').properties().as('p').
......1>   select('x').property(select('p').key(), select('p').value().unfold())

worked quite well (or at least it seems so; let's see downstream).

Thanks a lot!


