I am trying to create an example query that returns two "connected components". I know they're not true connected components because it's a directed property graph, but I would like to return two connected clusters. I've created a simple example with two clusters:
```groovy
graph = TinkerGraph.open()

// vertices: 4 clients, 7 orders, 7 payments, 4 items
client1 = graph.addVertex(label,'client','name','client1')
client2 = graph.addVertex(label,'client','name','client2')
client3 = graph.addVertex(label,'client','name','client3')
client4 = graph.addVertex(label,'client','name','client4')
order1 = graph.addVertex(label,'order','name','order1')
order2 = graph.addVertex(label,'order','name','order2')
order3 = graph.addVertex(label,'order','name','order3')
order4 = graph.addVertex(label,'order','name','order4')
order5 = graph.addVertex(label,'order','name','order5')
order6 = graph.addVertex(label,'order','name','order6')
order7 = graph.addVertex(label,'order','name','order7')
payment1 = graph.addVertex(label,'payment','name','payment1')
payment2 = graph.addVertex(label,'payment','name','payment2')
payment3 = graph.addVertex(label,'payment','name','payment3')
payment4 = graph.addVertex(label,'payment','name','payment4')
payment5 = graph.addVertex(label,'payment','name','payment5')
payment6 = graph.addVertex(label,'payment','name','payment6')
payment7 = graph.addVertex(label,'payment','name','payment7')
item1 = graph.addVertex(label,'item','name','item1')
item2 = graph.addVertex(label,'item','name','item2')
item3 = graph.addVertex(label,'item','name','item3')
item4 = graph.addVertex(label,'item','name','item4')

// edges from each order to its client and item
order1.addEdge('order_client',client1)
order1.addEdge('order_item',item1)
order2.addEdge('order_item',item1)
order2.addEdge('order_client',client2)
order3.addEdge('order_client',client2)
order3.addEdge('order_item',item2)
order4.addEdge('order_item',item3)
order4.addEdge('order_client',client3)
order5.addEdge('order_client',client3)
order5.addEdge('order_item',item4)
order6.addEdge('order_client',client3)
order6.addEdge('order_item',item4)
order7.addEdge('order_item',item4)
order7.addEdge('order_client',client4)

// one payment per order
order1.addEdge('order_payment',payment1)
order2.addEdge('order_payment',payment2)
order3.addEdge('order_payment',payment3)
order4.addEdge('order_payment',payment4)
order5.addEdge('order_payment',payment5)
order6.addEdge('order_payment',payment6)
order7.addEdge('order_payment',payment7)

g = graph.traversal()
```
I would like to return the clients and items from each cluster. So (client1, client2, item1, item2) would be one cluster and (client3, client4, item3, item4) would be the other.
But the dedup step in my query drops some of the nodes (e.g. client1 and client4) for some reason, and without the dedup, too many clusters are returned.
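To pin down the grouping I expect, here is a rough sketch that walks the graph with a plain breadth-first search in Groovy instead of a single traversal. It is only a reference for the desired result, not the query I am asking for. It assumes the `graph` and `g` variables from the setup above, follows `order_client` and `order_item` edges in both directions, and ignores `order_payment` edges. The names `visited`, `clusters`, `members` and `queue` are made up for this illustration.

```groovy
// Reference grouping via breadth-first search (illustrative only).
visited = [] as Set     // vertices already assigned to a cluster
clusters = []           // one sorted list of client/item names per cluster

g.V().hasLabel('client', 'item').toList().each { start ->
    if (visited.contains(start)) return          // already covered by an earlier cluster

    def members = [] as Set                      // every vertex reachable from start
    def queue = [start] as LinkedList
    while (!queue.isEmpty()) {
        def v = queue.poll()
        if (!members.add(v)) continue            // skip vertices seen in this walk
        // treat edges as undirected, skipping order_payment
        g.V(v).both('order_client', 'order_item').toList().each { queue.add(it) }
    }
    visited.addAll(members)

    // report only clients and items, by name
    clusters << members.findAll { it.label() in ['client', 'item'] }.
                        collect { it.value('name') }.sort()
}
clusters
```

On the example graph this should produce one list of names per cluster. What I am looking for is a way to get the same grouping from a Gremlin traversal.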
Any help would be appreciated. Let me know if you need more information.