Help collapsing partitions

Jeremy Gustie

Feb 22, 2017, 3:20:16 PM
to Gremlin-users
I am new to TinkerPop (and graph databases in general) and am trying to replace a (somewhat broken) relational model with a graph. I have come to the point where I am struggling to understand how to tackle the problem of collapsing two partitions into one (in fact, I am struggling to understand whether that is even something that makes sense).

I tried coming up with an example to illustrate what I would like to do: I am starting with two partitions, but am trying to collapse them together into a single partition.

graph = TinkerFactory.createModern()
g = graph.traversal()
g.V().property('_partition', 'modern')
g.E().property('_partition', 'modern')
socialStrategy = PartitionStrategy.build().partitionKey("_partition").writePartition("social").readPartitions("social").create()
gS = graph.traversal().withStrategies(socialStrategy)
gS.addV('person').property('name', 'marko').property('email', 'marko@markor***.com').as('marko').
   addV('person').property('name', 'stephen').property('twitter', '@spmall***').
   addE('knows').from('marko')

In this example I am trying to eliminate the "social" partition and end up in the state where:

gremlin> m = g.V().has('name', 'marko').valueMap(true)
==>[_partition:[modern],label:person,name:[marko],id:1,age:[29],email:[marko@markor***.com]]
gremlin> g.V().has('name', 'stephen').valueMap(true)
==>[_partition:[modern],twitter:[@spmall***],label:person,name:[stephen],id:22]
gremlin> g.V().has('name', 'marko').outE('knows')
==>e[7][1-knows->2]
==>e[8][1-knows->4]
==>e[26][1-knows->22]

I'm not yet comfortable enough with the TinkerPop language to know exactly what a good approach would look like. For example, I'm not sure if this is something I should try to accomplish using multiple traversals; e.g. maybe one to find the vertices that can just have their partition value updated and one to somehow combine the properties of vertices grouped by their "name" from both partitions (which I would be at a loss to actually do). I guess this might be similar to updating the partition value on all nodes in `gS` to the new value and then going back over and cleaning up the "modern" partition?

I'm also aware that partition strategies can have multiple "read" values, so perhaps the right answer is to just leave the structure of the graph alone and to change how I construct queries to combine the grouped vertex properties (again, I'm not sure exactly what this might look like).

Any pointers or suggestions would be greatly appreciated!

Thanks,
-Jeremy 

Stephen Mallette

Mar 6, 2017, 7:46:19 AM
to Gremlin-users
Sorry for the late response - I didn't realize this went unanswered. If I understand this properly, it looks like by "collapse" you're referring to merging the properties of a vertex from "modern" and "social" into one vertex based on some key (in this case "name"). Is that right? If so, you won't get that feature out of PartitionStrategy. The structure of the graph will always come through and you'll end up with two "marko" vertices, not one. It might be best to think of partitions as ways to blind your traversals to certain parts of the graph and less as a means of structuring/organizing the graph's data. I'm not sure if that completely answers your question or not....
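
To make the "two marko vertices" point concrete, here is a minimal sketch that repeats a cut-down version of the setup from the first post and then reads both partitions at once. It assumes a TinkerPop 3.2.x release where the builder accepts several values in readPartitions(); the names `both` and `gBoth` are made up for illustration, and the imports are only needed outside the Gremlin Console.

```groovy
import org.apache.tinkerpop.gremlin.process.traversal.strategy.decoration.PartitionStrategy
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerFactory

// Setup, as in the first post: tag the toy graph as "modern"
graph = TinkerFactory.createModern()
g = graph.traversal()
g.V().property('_partition', 'modern').iterate()
g.E().property('_partition', 'modern').iterate()

// Write a second "marko" into the "social" partition
social = PartitionStrategy.build().partitionKey('_partition').
           writePartition('social').readPartitions('social').create()
gS = graph.traversal().withStrategies(social)
gS.addV('person').property('name', 'marko').iterate()

// Hypothetical strategy that reads both partitions
both = PartitionStrategy.build().partitionKey('_partition').
         writePartition('modern').readPartitions('modern', 'social').create()
gBoth = graph.traversal().withStrategies(both)

// Both vertices come through; the strategy filters, it does not merge
assert gBoth.V().has('name', 'marko').count().next() == 2L
// The social-only source is still blind to the "modern" marko
assert gS.V().has('name', 'marko').count().next() == 1L
```

Widening the read partitions only changes which elements a traversal can see, so any merging by "name" would still have to happen in each query or in a separate one-time mutation.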

--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-users+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/7526be91-342e-447c-b566-2bfb2889b206%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jeremy Gustie

Mar 6, 2017, 1:20:26 PM
to Gremlin-users
Thanks Stephen, you are absolutely right about me wanting to merge the properties of two vertices into one.

At first I thought I might be able to just blind all my traversals to both "modern" and "social", but then I need to account for merging the properties at some point (or multiple points) in every traversal. I think you just confirmed that probably isn't a good idea and a one time mutation to the desired structure would be better.

The last sentence in the PartitionStrategy documentation seems to allude to exactly what I want to do ("merge those multiple graphs"); I just don't know what that looks like. I've done a bit more reading and I think I can use something like the "Duplicate Vertex Detection" recipe, but instead of filtering where the count is greater than 1, I would somehow need to branch the traversal so it either changes the "_partition" value (if the count was 1) or merges the properties.

Do you know of any examples of a traversal that merges the properties of two vertices?

Thanks,
-Jeremy

Stephen Mallette

Mar 6, 2017, 2:32:40 PM
to Gremlin-users
Just to be clear, you can't really "merge" the properties of two (or more) vertices into one vertex. A vertex is bound to the data in the graph so if you were to actually add properties to a vertex from another as part of a merge process, those properties in your Gremlin traversal would end up propagated to your graph which I assume you don't want. So, I think for purposes of merging you'd really be merging a Map of properties that represents the data from each vertex being merged together. 

A pattern I've tended to use for situations like this is to create the concept of an "identity", where you have your master "person" vertex and then different identities that the person has for different contexts within your system (in your example a social context). I'd model like this perhaps:

g.addV('person').property('name','jeremy').
                 property('email','jer...@somewhere.com').as('x').
  addV('identitySocial').property('name','jeremy').
                         property('twitter','jeremyg').as('y').
  addE('linked').from('x').to('y').iterate()

In a bit of a crude merging process you'd then do something like this:

gremlin> jeremy = g.V().has('person','name','jeremy').valueMap().next()
==>name=[jeremy]
gremlin> socialIdentity = g.V().has('person','name','jeremy').out('linked').valueMap().toList()
==>[twitter:[jeremyg],name:[jeremy]]
gremlin> socialIdentity.each{jeremy << it};[]
gremlin> jeremy
==>twitter=[jeremyg]
==>name=[jeremy]
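
One caveat with `<<` on maps: it behaves like putAll, so when a key appears in both maps the later value list replaces the earlier one. That is harmless above only because both "name" values are identical. A minimal sketch of a merge that concatenates the per-key lists instead, using plain Groovy maps as stand-ins for valueMap() results (the email value is a placeholder, not taken from the thread):

```groovy
// Plain maps shaped like valueMap() output: every value is a list
def person = [name: ['jeremy'], email: ['jeremy@example.com']]
def identities = [[name: ['jeremy'], twitter: ['jeremyg']]]

def merged = [:]
([person] + identities).each { m ->
    m.each { k, v ->
        // append to whatever is already stored for this key, dropping exact duplicates
        merged[k] = ((merged[k] ?: []) + v).unique()
    }
}

assert merged == [name: ['jeremy'], email: ['jeremy@example.com'], twitter: ['jeremyg']]
```

With this variant, two identities that disagree on a key would both survive in the merged list rather than the last one winning.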

I tried to think through a traversal that would merge the maps in a single traversal but it didn't come to me right away. Perhaps Mr. Kuppitz can whip something up (and succeed in making me feel silly in the process)?
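
For the one-time mutation discussed earlier in the thread (as opposed to merging maps at read time), a rough imperative sketch might look like the following. It is not a single traversal and not an established recipe: it assumes TinkerGraph, the "_partition" key from the first post, "name" as the matching key on every vertex, and an unrestricted traversal source `g`. The `remap` variable and the loop structure are made up for illustration, and the imports are only needed outside the Gremlin Console.

```groovy
import org.apache.tinkerpop.gremlin.process.traversal.strategy.decoration.PartitionStrategy
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerFactory

// Setup, as in the first post
graph = TinkerFactory.createModern()
g = graph.traversal()
g.V().property('_partition', 'modern').iterate()
g.E().property('_partition', 'modern').iterate()
social = PartitionStrategy.build().partitionKey('_partition').
           writePartition('social').readPartitions('social').create()
gS = graph.traversal().withStrategies(social)
gS.addV('person').property('name', 'marko').property('email', 'marko@markor***.com').as('marko').
   addV('person').property('name', 'stephen').property('twitter', '@spmall***').
   addE('knows').from('marko').iterate()

// Pass 1: merge each social vertex into its "modern" twin, or just relabel it
remap = [:]   // social vertex -> modern vertex that replaces it
g.V().has('_partition', 'social').toList().each { s ->
    def twin = g.V().has('_partition', 'modern').has('name', s.value('name')).tryNext()
    if (twin.isPresent()) {
        def m = twin.get()
        s.properties().each { p ->
            if (p.key() != '_partition' && p.key() != 'name') m.property(p.key(), p.value())
        }
        remap[s] = m
    } else {
        s.property('_partition', 'modern')
    }
}

// Pass 2: recreate social edges between the surviving endpoints
g.E().has('_partition', 'social').toList().each { e ->
    def src = remap.getOrDefault(e.outVertex(), e.outVertex())
    def dst = remap.getOrDefault(e.inVertex(), e.inVertex())
    def ne = src.addEdge(e.label(), dst, '_partition', 'modern')
    e.properties().each { p -> if (p.key() != '_partition') ne.property(p.key(), p.value()) }
    e.remove()
}

// Pass 3: drop the social vertices that were merged away
remap.keySet().each { it.remove() }

assert g.V().has('name', 'marko').count().next() == 1L
assert g.V().has('name', 'marko').out('knows').values('name').toList().sort() == ['josh', 'stephen', 'vadas']
assert g.V().has('_partition', 'social').count().next() == 0L
```

Recreated edges get new ids, and a property that exists on both vertices is overwritten by the social value, so a real migration would need its own conflict rules.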



