excluding "disliked items from a certain person" in a collaborative filtering traversal


Burak Arikan

Feb 9, 2012, 3:40:31 AM
to pacer...@googlegroups.com
Hi there, 

Trying to build a collaborative filtering traversal where a Person is recommended Items while his disliked items from previous recommendations are excluded.

Person--ordered-->Item
Person--recommended-->Item (if the person dislikes it, we set the edge property recommended[:dislike] = 1; otherwise recommended[:dislike] is 0)

This works in two steps, where we remove the disliked items from the resulting hash (the Ruby way):

m = person.out_e('ordered').in_v.aggregate(:x).in('ordered').except(person).out('ordered').except(:x).group_count
person.out_e('recommended').where('dislike != :val', val:0).in_v.each { |v| m.delete(v) }
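
For anyone following along without a graph at hand, here is a minimal plain-Ruby sketch of the same two-step idea. It is not Pacer code: the ORDERED and DISLIKED hashes and the recommend_two_step method are made-up stand-ins for the 'ordered' edges and for the 'recommended' edges with dislike != 0.

```ruby
require 'set'

# Hypothetical toy data standing in for the graph (assumption, not Pacer API).
ORDERED = {
  'ann' => %w[book lamp],
  'bob' => %w[book pen mug],
  'cat' => %w[lamp pen]
}
# person => items that were recommended earlier and marked dislike != 0
DISLIKED = { 'ann' => %w[mug] }

def recommend_two_step(person)
  mine = ORDERED.fetch(person).to_set

  # Step 1: count items ordered by other people who share an order with
  # `person`, skipping what `person` already ordered (the group_count part).
  counts = Hash.new(0)
  mine.each do |item|
    ORDERED.each do |other, items|
      next if other == person || !items.include?(item)
      items.each { |i| counts[i] += 1 unless mine.include?(i) }
    end
  end

  # Step 2: delete what this particular person disliked.
  DISLIKED.fetch(person, []).each { |i| counts.delete(i) }
  counts
end

p recommend_two_step('ann')
```

With this toy data, 'mug' is counted in step 1 and only removed in step 2, which is the extra pass the one-step version tries to avoid.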

In one step (assuming that it will be faster) I can do the following, but note that it is not correct, because it excludes "dislikes from any person". How can I make it exclude only "dislikes from a certain person"?

m = person.out_e('ordered').in_v.aggregate(:x).in('ordered').except(person).out('ordered').except(:x).neg_lookahead{|n| n.in_e('recommended').where('dislike != :val', val:0).in_v }.group_count

Maybe in Gremlin this is doable in a single traversal via back()? Or is there another traversal strategy we can apply in this case?

Cheers,
Burak

Darrick Wiebe

Feb 9, 2012, 3:54:45 AM
to pacer...@googlegroups.com

I think that rather than using :x for the aggregated value, you could instantiate your own hashset and use a #process block at the right place to #clear the hashset.

Does that make sense?

Cheers!
Darrick

Burak Arikan

Feb 9, 2012, 4:11:33 AM
to pacer...@googlegroups.com
Thanks for the quick reply Darrick, 

1. So do you mean instantiating, before the traversal, an 'exclude array' that combines x = person.out_e('ordered').in_v and x = person.out_e('recommended').where('dislike != :val', val:0).in_v?
 
2. I couldn't find any #process examples in the Pacer spec. Can you elaborate a bit on how to use it?

Thanks,
burak

Darrick Wiebe

Feb 9, 2012, 5:16:32 PM
to pacer...@googlegroups.com
Hmm, the dangers of replying on too little sleep...I actually kind of misunderstood your question there. Let me try again!

I think there is a flaw in your second traversal. Here I've analyzed it and I think I've found the problem:

person
  .out('ordered')  # what the person ordered
  .aggregate(:x)   
  .in('ordered')   
  .except(person)  # other people who ordered the same things
  .out('ordered')  # what the other people ordered
  .except(:x)      # ...excluding what the person ordered
  .neg_lookahead { |n|
    n.in_e('recommended') # *any* recommendations associated with the *product*
      .where('dislike != :val', val:0) # ...that are marked as disliked
      .in_v        # the person who disliked it
      .is(person)  # THE MISSING BIT: that person must be our person
  }.group_count
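
To make the difference concrete outside of Pacer, here is a small plain-Ruby sketch. The RECOMMENDED triples and both helper methods are invented for illustration; they only model the difference between "disliked by anyone" and "disliked by our person".

```ruby
# Hypothetical 'recommended' edges as [person, item, dislike] triples
# (assumption: dislike is 1 when disliked, 0 otherwise, as in the question).
RECOMMENDED = [
  ['ann', 'mug', 1],
  ['bob', 'pen', 1],
  ['ann', 'pen', 0]
]

# Without the person check: true if *anyone* disliked the item.
def disliked_by_anyone?(item)
  RECOMMENDED.any? { |(_person, i, dislike)| i == item && dislike != 0 }
end

# With the person check: true only if `person` disliked the item.
def disliked_by?(person, item)
  RECOMMENDED.any? { |(p, i, dislike)| p == person && i == item && dislike != 0 }
end

candidates = %w[pen mug lamp]
any_filter    = candidates.reject { |i| disliked_by_anyone?(i) }
person_filter = candidates.reject { |i| disliked_by?('ann', i) }

p any_filter
p person_filter
```

Here 'pen' is lost in the first filter only because bob disliked it, while the second filter keeps it for ann.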

Cheers,
Darrick

PS. the #process step is the same as Gremlin's side effect step:

[1, 2, 3]
  .to_route
  .process { |n| do_something_with n }
  .to_a #=> [1, 2, 3]
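
If it helps, the same side-effect idea can be sketched in plain Ruby without Pacer. The with_side_effect helper below is hypothetical. It runs the block for each element and passes the element through unchanged, ignoring the block's return value.

```ruby
# Hypothetical helper (not part of Pacer): run a block for each element
# as it flows through, then pass the element on untouched.
def with_side_effect(enum)
  enum.lazy.map { |x| yield x; x }
end

seen = []
result = with_side_effect([1, 2, 3]) { |n| seen << n * 10 }.to_a

p result
p seen
```

The elements come out as they went in, and the side effect is only visible in the seen array.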

Darrick Wiebe

Feb 9, 2012, 6:01:12 PM
to pacer...@googlegroups.com
I was thinking about this a little more, because I think the route above would actually be kind of slow: it may have to process a lot of recommended edges. I came up with the following instead:

disliked = nil
person
  .v                  # Treat the person record as a route to that record
  .process { |p| 
    disliked = 
      p.out_e('recommended')   # The person's recommended edges
        .where('dislike != 0') # ...only if they disliked it
        .in_v                  # the product in question
        .to_hashset            # turn the product list into a java.util.HashSet
  }                    # process executes the side_effect
  .out('ordered')      # what the person ordered
  .aggregate(:ordered)
  .in('ordered')
  .except(person)      # other people who ordered the same things
  .out('ordered')      # what the other people ordered
  .except(:ordered)    # ...excluding what the person ordered
  .except(disliked)   # ...excluding what the person disliked
  .group_count

In this example, the process block is actually not necessary if you will only ever start from a single person, but it is useful if you want to apply the route to a route of multiple person records. If you are using a single person as the starting point, the following is equivalent:

disliked = 
  person.out_e('recommended') # The person's recommended edges
    .where('dislike != 0') # ...only if they disliked it
    .in_v                  # the product in question
    .to_hashset            # turn the product list into a java.util.HashSet

person
  .out('ordered')      # what the person ordered
  .aggregate(:ordered)
  .in('ordered')
  .except(person)      # other people who ordered the same things
  .out('ordered')      # what the other people ordered
  .except(:ordered)    # ...excluding what the person ordered
  .except(disliked)   # ...excluding what the person disliked
  .group_count
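
As a rough illustration of why building the set up front is cheap, here is a plain-Ruby sketch of the same shape. It is not Pacer code: ORDERS, DISLIKES and recommend are hypothetical names, and a Ruby Set stands in for the java.util.HashSet.

```ruby
require 'set'

# Hypothetical toy data (assumption, not Pacer API).
ORDERS = {
  'ann' => %w[book lamp],
  'bob' => %w[book pen mug],
  'cat' => %w[lamp pen]
}
DISLIKES = { 'ann' => Set['mug'] }

def recommend(person)
  ordered  = ORDERS.fetch(person).to_set
  disliked = DISLIKES.fetch(person, Set.new) # built once, before the walk
  excluded = ordered | disliked

  counts = Hash.new(0)
  ORDERS.each do |other, items|
    next if other == person
    # One path per item shared with `person`, as in the traversal.
    shared = items.count { |i| ordered.include?(i) }
    next if shared.zero?
    # Filter while walking: a set lookup instead of a second pass.
    items.each { |i| counts[i] += shared unless excluded.include?(i) }
  end
  counts
end

# Several starting people: each one gets its own disliked set.
all = %w[ann bob].map { |person| [person, recommend(person)] }.to_h
p all
```

Recomputing the disliked set per person is the part the process block takes care of when the route starts from more than one person.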


I also realized while looking at this that in the code from the previous email, the last `.in_v` should actually be `.out_v`

Cheers,
Darrick

Burak Arikan

Feb 10, 2012, 3:35:45 PM
to pacer...@googlegroups.com
Hey Darrick, thanks a lot! 

The last example works like a charm. So it is nice to know that we can simply use java.util.HashSet in pipes.

The one with the .process block somehow couldn't filter out the disliked items at first. But when I put .except(disliked) before .except(:ordered) at the end of the traversal pipe, it worked. That is unexpected, weird behavior, right?

Cheers!
Burak

Darrick Wiebe

Feb 10, 2012, 5:37:39 PM
to pacer...@googlegroups.com
In fact you can also use Ruby arrays or sets as well as Pacer routes in the #only and #except steps.

It is strange that the order of the #except steps would make a difference. If you could file a bug and send me a bit of data that exhibits the problem, that would help me a lot in diagnosing it.

Cheers,
Darrick