generating a subgraph of similar nodes

140 views
Skip to first unread message

Burak Arikan

unread,
Mar 20, 2012, 5:52:22 AM3/20/12
to pacer...@googlegroups.com
Hi there, 

In a typical graph with relationships Person--ordered-->Item trying to generate a subgraph of similar Items for a given Person.

subgraph = person.out_e('ordered')
                 .in_v            # items the person ordered
                 .in('ordered')   # other people who ordered the same items  
                 .out_e('ordered')     
                 .in_v            # other people's other ordered items
                 .subgraph

This traversal returns a subgraph of Person--ordered-->Item<--ordered--Person-->ordered-->Item which we don't want fully. Instead, we want to exclude the Person in between by calling it 'similar' and return a subgraph of Person--ordered-->Item--similar-->Item

The point is to solve the problem 1) without a preprocessed similarity and 2) in the traversal as much as possible without extra post-processing. Any help, ideas would be appreciated. 

Burak

Darrick Wiebe

unread,
Mar 20, 2012, 10:54:42 AM3/20/12
to pacer...@googlegroups.com
Hey Burak,

Subgraph is created based on #paths (basically just replace subgraph with paths). You may simply want to grab the paths instead and create a subgraph of your own design based on it. Should be pretty straight forward. A subgraph is not that difficult to create. You can see the code for it at https://github.com/pangloss/pacer/blob/develop/lib/pacer/transform/path.rb

I was going to suggest an alternative method of including a map step which could do additional processing that wouldn't appear in the paths and therefore in the subgraph, but actually you wouldn't be able to put an edge that doesn't exist in the source graph in that way anyhow.

Hope that helps!
Darrick

Burak Arikan

unread,
Mar 20, 2012, 6:05:30 PM3/20/12
to pacer...@googlegroups.com
Thanks Darrick, 

In the latest version of Pacer I see .subgraph returns a #<TinkerGraph> while .paths returns a list of paths.

Any tips on how to approach this surgery. 

turning this graph:

Person--ordered-->Item<--ordered--Person-->ordered-->Item 

into this: 

Person--ordered-->Item--similar-->Item

burak 

Darrick Wiebe

unread,
Mar 20, 2012, 6:21:28 PM3/20/12
to pacer...@googlegroups.com
Best for you to first look at how subgraph is implemented and basically copy/paste that method into your code and update it. Perhaps eventually I'll support some transformation in the subgraph method but I hadn't thought of it when implementing it.

Once you've got the subgraph code, all you need to do is grab an array composed of items 0, 1, 2, 6 and run the subgraph creation stuff on it then add in the edge between 2 and 6.

Cheers,
Darrick

Burak Arikan

unread,
Mar 22, 2012, 4:34:00 AM3/22/12
to pacer...@googlegroups.com
Thanks a lot Darrick, 

After getting the paths:

paths = person.out_e('ordered')
              .in_v            # items the person ordered
              .in('ordered')   # other people who ordered the same items  
              .out_e('ordered')     
              .in_v            # other people's other ordered items
              .paths

[#<V[73137]>, #<E[1087042]:73137-order_from-16307>, #<V[16307]>, #<E[1027533]:47744-order_from-16307>, #<V[47744]>, #<E[1027551]:47744-order_from-2582>, #<V[2582]>]
[#<V[73137]>, #<E[1087042]:73137-order_from-16307>, #<V[16307]>, #<E[1027533]:47744-order_from-16307>, #<V[47744]>, #<E[1027546]:47744-order_from-22650>, #<V[22650]>]
[#<V[73137]>, #<E[1087042]:73137-order_from-16307>, #<V[16307]>, #<E[962379]:20213-order_from-16307>, #<V[20213]>, #<E[962304]:20213-order_from-1836>, #<V[1836]>]
...
=> #<V-Index(remote_id: "{04192b83-058f-4ce4-a7f5-71ad75c75cf9}") -> outE(:order_from) -> inV -> inE(:order_from) -> outV -> outE(:order_from) -> where(weight > 9) -> inV -> Obj-Path>

Mapped out another path excluding elements 3,4,5:

mpaths = paths.map {|path| [path[0], path[1], path[2], path[6]] }

But how can we create an edge on the fly –without persisting into the database? If I can, then I would add it to path like this, right:

mpaths = paths.map {|path| [path[0], path[1], path[2], similarity_edge, path[6]] }

Then I can create the subgraph from the new paths: 

subgraph = mpaths.subgraph

Getting close, but need some more help, thanks!
Burak

On Wednesday, March 21, 2012 12:21:28 AM UTC+2, Darrick Wiebe wrote:
Best for you to first look at how subgraph is implemented and basically copy/paste that method into your code and update it. Perhaps eventually I'll support some transformation in the subgraph method but I hadn't thought of it when implementing it.

Once you've got the subgraph code, all you need to do is grab an array composed of items 0, 1, 2, 6 and run the subgraph creation stuff on it then add in the edge between 2 and 6.

Cheers,
Darrick

Burak Arikan

unread,
Mar 22, 2012, 4:39:01 AM3/22/12
to pacer...@googlegroups.com
Sorry, this is the actual traversal of the previous post: 

subgraph = person.out_e(edge)
                 .in_v                # items the person ordered
                 .in_e('order_from')  # other people who ordered the same items
                 .out_v
                 .out_e('order_from')         
                 .where('weight > 9') # ...only if they ordered a lot
                 .in_v                # other people's other ordered items
                 .paths


On Tuesday, March 20, 2012 11:52:22 AM UTC+2, Burak Arikan wrote:

Darrick Wiebe

unread,
Mar 22, 2012, 9:15:10 AM3/22/12
to pacer...@googlegroups.com
What I would do is run the subgraph code on the filtered path first, then loop through the filtered path again and add the edge to the generated subgraph. You can't easily create a simulated edge between those two vertices before the fact.

Burak Arikan

unread,
Mar 22, 2012, 11:22:00 AM3/22/12
to pacer...@googlegroups.com
Thanks Darrick, 

That's a good idea, tried to implement it, got two unexpected errors:

1) Here .subgraph on filtered paths throws the error below: 

paths = person.out_e(edge)
              .in_v                # items the person ordered
              .in_e('order_from')  # other people who ordered the same items
              .out_v
              .out_e('order_from')         
              .where('weight > 9') # ...only if they ordered a lot
              .in_v                # other people's ordered items
              .paths

mpaths = paths.map {|path| [path[0], path[1], path[2], path[6]] }
msubgraph = mpaths.subgraph

undefined method `subgraph' for #<Pacer::Route:0x2f4b04ac>
from (irb):29:in `evaluate'
from org/jruby/RubyKernel.java:1088:in `eval'
from org/jruby/RubyKernel.java:1410:in `loop'
from org/jruby/RubyKernel.java:1197:in `catch'
from org/jruby/RubyKernel.java:1197:in `catch'

Is it because the last vertex is isolated after filtering or something, any idea how to fix it? 

2) Just tested how it'd work with the actual paths by trying to add a 'similar' edge between the vertices, but got the error below:

subgraph = paths.subgraph # works as expected
paths.each do |path|
  subgraph.create_edge(nil, path[2], path[6], 'similar')
end

NativeException: java.lang.ClassCastException: com.tinkerpop.blueprints.pgm.impls.neo4j.Neo4jVertex cannot be cast to com.tinkerpop.blueprints.pgm.impls.tg.TinkerVertex
from com/tinkerpop/blueprints/pgm/impls/tg/TinkerGraph.java:220:in `addEdge'
from /Users/arikan/Sites/graphcm/resque/jruby/1.9/gems/pacer-0.9.1.1-java/lib/pacer/graph/graph_mixin.rb:106:in `create_edge'
from /Users/arikan/Sites/graphcm/resque/jruby/1.9/gems/pacer-0.9.1.1-java/lib/pacer/graph/graph_mixin.rb:341:in `creating_elements'
from /Users/arikan/Sites/graphcm/resque/jruby/1.9/gems/pacer-0.9.1.1-java/lib/pacer/graph/graph_mixin.rb:106:in `create_edge'
from (irb):31:in `evaluate'
from /Users/arikan/Sites/graphcm/resque/jruby/1.9/gems/pacer-0.9.1.1-java/lib/pacer/core/route.rb:115:in `each'
from (irb):30:in `evaluate'
from org/jruby/RubyKernel.java:1088:in `eval'
from org/jruby/RubyKernel.java:1410:in `loop'
from org/jruby/RubyKernel.java:1197:in `catch'
from org/jruby/RubyKernel.java:1197:in `catch'

TinkerVertex vs. Neo4jVertex ... is there a way to get around this mismatch?

Burak

On Thursday, March 22, 2012 3:15:10 PM UTC+2, Darrick Wiebe wrote:
What I would do is run the subgraph code on the filtered path first, then loop through the filtered path again and add the edge to the generated subgraph. You can't easily create a simulated edge between those two vertices before the fact.

Darrick Wiebe

unread,
Mar 22, 2012, 12:35:47 PM3/22/12
to pacer...@googlegroups.com
Hey,

For #1, the subgraph method operates on path routes not on arrays of arrays currently. Like I originally said, you won't be able to get the out of the box subgraph method to do what you need. I recommend creating a custom method to do this, or if you'd like to refactor the subgraph method based on this issue that I just created at https://github.com/pangloss/pacer/issues/17 that would be great!


For #2, the subgraph is created in a new TinkerGraph that replicates the neo4j ids of the source vertices, so what you need to do to create the edge between subgraph elements based on elements you are querying from your original graph (or from the paths) is as follows:

    thesubgraph.create_edge(nil,
    thesubgraph.vertex(path[2].element_id),
    thesubgraph.vertex(path[6].element_id),
    'similar')

Cheers,
Darrick
Reply all
Reply to author
Forward
0 new messages