>> In both cases, it seems like the first thing to do is gather a list of
>> users who rate things similarly to me. Then in the first case, filter
>> out any who have not rated the item I'm interested in, and average the
>> ratings of the remaining users. In the second case, average the
>> ratings of those users for all items I have not yet rated that they
>> have rated and filter out any that are less than 7.
First, just so we are on the same page, the following returns all the people who have rated the same things as person 1:
g.v(1).out('rated').in('rated')
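To make the semantics concrete, here is a minimal Python sketch (not Gremlin) of what that two-hop traversal computes. It assumes ratings live in a plain dict of user id -> {item id: stars}. The toy data and the function name `co_raters` are hypothetical. Note that, like the traversal, it yields one result per path, so person 1 shows up and duplicates are kept.

```python
# Hypothetical toy data: user id -> {item id: stars}
ratings = {
    1: {"a": 5, "b": 4},
    2: {"a": 4, "c": 2, "d": 5},
    3: {"b": 1, "c": 4},
    4: {"c": 5},
    5: {"a": 3, "b": 4, "c": 3},
}


def co_raters(ratings, user):
    """Rough analogue of g.v(user).out('rated').in('rated').

    One entry is emitted per path, so duplicates (and `user` itself)
    are kept, as the un-deduped traversal would.
    """
    result = []
    for item in ratings[user]:                 # out('rated'): items the user rated
        for other, items in ratings.items():   # in('rated'): everyone who rated that item
            if item in items:
                result.append(other)
    return result


print(co_raters(ratings, 1))  # [1, 2, 5, 1, 3, 5]
```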
The traversal below returns all the people who have rated things "similarly" to person 1. Here "similar" means that the absolute difference between vertex 1's rating of an item and the other person's rating of that same item is at most 1.
g.v(1).outE('rated').sideEffect{ w = it.stars }.inV.inE('rated').filter{ Math.abs(it.stars - w) <= 1 }.outV
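The same idea as a minimal Python sketch. It uses the same hypothetical toy data as before, and `similar_raters` and `tolerance` are illustrative names, not Gremlin API. The comments map each step back to the traversal. Person 1's own ratings have a difference of 0, so person 1 passes the filter here too.

```python
# Hypothetical toy data: user id -> {item id: stars}
ratings = {
    1: {"a": 5, "b": 4},
    2: {"a": 4, "c": 2, "d": 5},
    3: {"b": 1, "c": 4},
    4: {"c": 5},
    5: {"a": 3, "b": 4, "c": 3},
}


def similar_raters(ratings, user, tolerance=1):
    """Rough analogue of
    g.v(user).outE('rated').sideEffect{ w = it.stars }.inV
             .inE('rated').filter{ Math.abs(it.stars - w) <= 1 }.outV
    """
    result = []
    for item, w in ratings[user].items():      # outE + sideEffect: remember the user's stars as w
        for other, items in ratings.items():   # inV.inE: every rating of the same item
            if item in items and abs(items[item] - w) <= tolerance:  # filter
                result.append(other)           # outV: the person who gave that rating
    return result


print(similar_raters(ratings, 1))  # [1, 2, 1, 5]
```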
You can then average the star ratings on those filtered edges using mean():
g.v(1).outE('rated').sideEffect{ w = it.stars }.inV.inE('rated').filter{ Math.abs(it.stars - w) <= 1 }.stars.mean()
Hopefully that is enough to get you going on your particulars. A few things to keep in mind:
1. You will need to use the outE.inV pattern, because you are reasoning beyond an edge's label and into its properties (e.g. the stars on an edge).
2. Use sideEffect to store intermediate values (e.g. person 1's stars).
3. With filter you can apply any function that returns a boolean; use the Java API as you see fit.
4. If you want this to be blazing fast, look into strategically placing range filters to trade accuracy for speed.
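To illustrate the trade-off in point 4, here is a Python sketch that models only the effect of a range filter, not Gremlin's syntax for it. It examines at most `max_per_item` raters of each item and skips the rest, so it does less work but can miss similar users. The cap parameter, the function name, and the toy data are all hypothetical.

```python
from itertools import islice

# Hypothetical toy data: user id -> {item id: stars}
ratings = {
    1: {"a": 5, "b": 4},
    2: {"a": 4, "c": 2, "d": 5},
    3: {"b": 1, "c": 4},
    4: {"c": 5},
    5: {"a": 3, "b": 4, "c": 3},
}


def similar_raters_capped(ratings, user, tolerance=1, max_per_item=100):
    """Like the similarity traversal, but only looks at the first
    max_per_item raters of each item (an arbitrary, assumed cap)."""
    result = []
    for item, w in ratings[user].items():
        raters = (other for other, items in ratings.items() if item in items)
        # Examine no more than max_per_item raters of this item; the rest are skipped.
        for other in islice(raters, max_per_item):
            if abs(ratings[other][item] - w) <= tolerance:
                result.append(other)
    return result


print(similar_raters_capped(ratings, 1, max_per_item=1))  # [1, 1]
print(similar_raters_capped(ratings, 1, max_per_item=2))  # [1, 2, 1]
```

With the cap high enough the result matches the uncapped traversal. With a small cap, user 5 is never reached, which is the accuracy being traded away.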
Good luck!
Marko.
Hi Marko and others, I know I can use ExecutionEngine in Java to run Cypher queries. What do I use to run Gremlin queries from Java? Is there any tutorial or documentation you would suggest?
Many thanks,
You can do something like this:
m = [:].withDefault({[]})
....groupCount(m){it.id}{it << diff}.iterate()
Note that the first line creates your result map, and because of withDefault(), any key that has not been seen yet starts out with an empty list as its value.
In essence, groupCount accepts two closures: one that computes the key and one that computes the value. In Gremlin 1.4 and below, the key-"it" is the object propagating through the pipeline and the value-"it" is the current value for that key in the map. In Gremlin 1.5+, it's a bit different: the key-"it" is still the object propagating through the pipe, but the value-"it" is a Pair whose getA() is the propagating object and whose getB() is the current value for that key. However, I suspect you are using Gremlin 1.4 or below, so the code snippet above will work for you.
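The difference between the two value-closure conventions can be modelled in a few lines of Python. This is only a model of the behaviour described above, not Gremlin's implementation. It assumes that whatever the value closure returns is stored back under the key. `group_count`, `pair_style`, and the sample pairs are hypothetical. In the original snippet `diff` is a script variable set by an earlier sideEffect, whereas here it rides along on each object.

```python
def group_count(objects, key_fn, value_fn, default_fn, pair_style=False):
    """Model of groupCount with a key closure and a value closure.

    pair_style=False: value_fn gets the value currently stored for the key
                      (the Gremlin <= 1.4 convention).
    pair_style=True:  value_fn gets a (current object, stored value) pair
                      (the Gremlin 1.5+ convention: index 0 ~ getA(), 1 ~ getB()).
    Whatever value_fn returns is stored back under the key.
    """
    m = {}
    for obj in objects:
        key = key_fn(obj)
        current = m[key] if key in m else default_fn()  # like withDefault
        m[key] = value_fn((obj, current) if pair_style else current)
    return m


# Hypothetical (vertex id, diff) pairs flowing through the pipe.
pairs = [(7, 1), (8, 0), (7, 2)]

# <= 1.4 style: the value closure only sees the stored value, so count.
counts = group_count(pairs, lambda p: p[0], lambda cur: cur + 1, int)

# 1.5+ style: the value closure also sees the object, so it can collect its diff.
diffs = group_count(pairs, lambda p: p[0], lambda ab: ab[1] + [ab[0][1]], list, pair_style=True)

print(counts)  # {7: 2, 8: 1}
print(diffs)   # {7: [1, 2], 8: [0]}
```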
HTH,
Marko.
Thanks Marko! I am a lot further than I was a few hours ago. I've
gotten two major pieces of the recommendation engine done.
Piece 1: find all the users who tend to rate things similarly to a
target user
m=[:].withDefault{[0,0,0]};
user=g.v(1);
user.outE("RATED").sideEffect{w=it.rating}
.inV.inE("RATED").outV.except([user]).back(2)
.sideEffect{diff=Math.abs(it.rating-w)}
.outV.sideEffect{ me=m[it.id]; me[0]++; me[1]+=diff; me[2] = me[1]/me[0]; }
.filter{m[it.id][2]<=2}.dedup()
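For reference, here is a Python sketch of what Piece 1 computes, assuming a dict of user id -> {item id: rating} as hypothetical toy data. `similar_users` and `max_avg_diff` are illustrative names. There is one difference worth flagging, as I read the pipeline. Gremlin pipes are lazy, so the filter in the Groovy version sees each user's running average at the moment an edge passes through. This sketch instead computes every average first and filters afterwards.

```python
# Hypothetical toy data: user id -> {item id: rating}
ratings = {
    1: {"a": 5, "b": 4},
    2: {"a": 4, "c": 2, "d": 5},
    3: {"b": 1, "c": 4},
    4: {"c": 5},
    5: {"a": 3, "b": 4, "c": 3},
}


def similar_users(ratings, user, max_avg_diff=2):
    """Users whose mean |rating difference| with `user` on shared items
    is <= max_avg_diff, in first-seen order."""
    diffs = {}  # other user id -> list of absolute rating differences
    for item, w in ratings[user].items():
        for other, items in ratings.items():
            if other != user and item in items:  # except([user])
                diffs.setdefault(other, []).append(abs(items[item] - w))
    # Filter on the final average; each user appears once (like dedup()).
    return [other for other, ds in diffs.items() if sum(ds) / len(ds) <= max_avg_diff]


print(similar_users(ratings, 1))  # [2, 5]
```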
Building off of this, I can get Piece 2, the answer to one of my
original questions: given users who are similar to a target user,
calculate the predicted rating of an item by that target user
m=[:].withDefault{[0,0,0]};
user=g.v(1);
item=g.v(9);
user.outE("RATED").sideEffect{w=it.rating}
.inV.inE("RATED").outV.except([user]).back(2)
.sideEffect{diff=Math.abs(it.rating-w)}
.outV.sideEffect{ me=m[it.id]; me[0]++; me[1]+=diff; me[2] = me[1]/me[0]; }
.filter{m[it.id][2]<=2}.dedup()
.outE("RATED").inV.filter{it==item}.back(2).dedup().rating.mean()
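Piece 2 in the same Python terms: the predicted rating is the mean of the similar users' ratings for the item. This is a sketch over hypothetical toy data. `predicted_rating` is an illustrative name, and the helper from Piece 1 is repeated so the block runs on its own. It returns None when no similar user has rated the item, a case the Groovy version would need to handle too.

```python
# Hypothetical toy data: user id -> {item id: rating}
ratings = {
    1: {"a": 5, "b": 4},
    2: {"a": 4, "c": 2, "d": 5},
    3: {"b": 1, "c": 4},
    4: {"c": 5},
    5: {"a": 3, "b": 4, "c": 3},
}


def similar_users(ratings, user, max_avg_diff=2):
    """Users whose mean |rating difference| with `user` is <= max_avg_diff."""
    diffs = {}
    for item, w in ratings[user].items():
        for other, items in ratings.items():
            if other != user and item in items:
                diffs.setdefault(other, []).append(abs(items[item] - w))
    return [o for o, ds in diffs.items() if sum(ds) / len(ds) <= max_avg_diff]


def predicted_rating(ratings, user, item, max_avg_diff=2):
    """Mean rating of `item` among users similar to `user`, or None."""
    raters = [o for o in similar_users(ratings, user, max_avg_diff) if item in ratings[o]]
    if not raters:
        return None
    return sum(ratings[o][item] for o in raters) / len(raters)


print(predicted_rating(ratings, 1, "c"))  # 2.5
```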
So now I just need one more piece, the answer to the other question in
my original post: what are the top rated items by users similar to my
target user, that the target user has not rated. I think I'll need to
have another map variable and sideEffect to calculate the average
ratings for every item rated by my similar users, then sort by average
and limit to the top 10 or however many. Still working on it.
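The plan for the third piece (average per item, sort, limit) might look like this as a Python sketch over hypothetical toy data. `top_items`, `k`, and `min_avg` are illustrative names. `min_avg` stands in for the "less than 7" cutoff from the original question, although the toy ratings here use a smaller scale. Ties are broken by item id only to keep the output deterministic.

```python
# Hypothetical toy data: user id -> {item id: rating}
ratings = {
    1: {"a": 5, "b": 4},
    2: {"a": 4, "c": 2, "d": 5},
    3: {"b": 1, "c": 4},
    4: {"c": 5},
    5: {"a": 3, "b": 4, "c": 3},
}


def similar_users(ratings, user, max_avg_diff=2):
    """Users whose mean |rating difference| with `user` is <= max_avg_diff."""
    diffs = {}
    for item, w in ratings[user].items():
        for other, items in ratings.items():
            if other != user and item in items:
                diffs.setdefault(other, []).append(abs(items[item] - w))
    return [o for o, ds in diffs.items() if sum(ds) / len(ds) <= max_avg_diff]


def top_items(ratings, user, k=10, min_avg=None, max_avg_diff=2):
    """Items the target user has not rated, ranked by average rating among similar users."""
    collected = {}  # item -> ratings given by similar users
    for other in similar_users(ratings, user, max_avg_diff):
        for item, stars in ratings[other].items():
            if item not in ratings[user]:  # only items the target user has not rated
                collected.setdefault(item, []).append(stars)
    averages = [(item, sum(s) / len(s)) for item, s in collected.items()]
    if min_avg is not None:
        averages = [(i, a) for i, a in averages if a >= min_avg]  # drop low averages
    averages.sort(key=lambda ia: (-ia[1], ia[0]))  # highest average first
    return averages[:k]


print(top_items(ratings, 1))  # [('d', 5.0), ('c', 2.5)]
```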
Also, I have not benchmarked this yet, but my instinct is that it
would not be very performant.
Wow.
If you two could coordinate on a blog post about both approaches, that would be most awesome. Just sayin.
Send from a device with crappy keyboard and autocorrection.
/peter
> Piece 1: find all the users who tend to rate things similarly to a
> target user
> m=[:].withDefault{[0,0,0]};
> user=g.v(1);
> user.outE("RATED").sideEffect{w=it.rating}
> .inV.inE("RATED").outV.except([user]).back(2)
> .sideEffect{diff=Math.abs(it.rating-w)}
> .outV.sideEffect{ me=m[it.id]; me[0]++; me[1]+=diff; me[2] = me[1]/me[0]; }
> .filter{m[it.id][2]<=2}.dedup()
> -Luanne
Wow,
That will be supercool, both of you!
/peter