Graph databases and recommendation systems


Marko Rodriguez

Apr 1, 2011, 5:04:36 PM
to gremlin-users
Hi,

I thought many of you might be interested in a page I just wrote up:

http://markorodriguez.com/services/development/recommendation-system/
Enjoy,
Marko.

Dan Brickley

Apr 1, 2011, 5:31:02 PM
to gremli...@googlegroups.com, Marko Rodriguez
On 1 April 2011 23:04, Marko Rodriguez <okram...@gmail.com> wrote:
> Hi,
> I thought many of you might be interested in a page I just wrote up:
> http://markorodriguez.com/services/development/recommendation-system/

Nice :) We're looking at quite similar stuff in the NoTube project,
using dbpedia.org's RDF-ization of Wikipedia, amongst other sources.

If I wanted to try Gremlin against DBpedia data, ... can you suggest a recipe?

cheers,

Dan

Joshua Shinavier

Apr 2, 2011, 12:14:31 AM
to gremli...@googlegroups.com, Dan Brickley, Marko Rodriguez
Hi Dan,


On Sat, Apr 2, 2011 at 5:31 AM, Dan Brickley <dan...@danbri.org> wrote:
[...]


> If I wanted to try Gremlin against DBpedia data, ... can you suggest a recipe?


I'll let Marko respond with a Gremlin snippet, but one general
approach we have used in the past is to define weighted paths of RDF
properties which take you from a resource of a given type (e.g.
foaf:Person) to resources of another given type (e.g. foaf:Person).
You can then compose paths to express more complex relationships for
recommendations. Here's a path (in Ripple syntax, but I think you can
see what's going on) for people related to musicians in DBpedia:

@prefix owl: <http://www.w3.org/2002/07/owl#>.
@prefix rel: <http://vocab.org/relationship/#>.
@prefix mo: <http://purl.org/ontology/mo/>.
@prefix foaf: <http://xmlns.com/foaf/0.1/>.

@define relatedPeople:
   ((1.0 owl:sameAs)                    # link into Zitgist music...
    (0.85 rel:siblingOf)                # siblings
    (0.85 rel:engagedTo)                # fiancee
    (0.7 rel:collaborated_with)         # collaborators
    (0.7 (mo:produced >> mo:producer >>))   # co-producers
    (0.5 (mo:member_of >> foaf:member >>))  # co-members
    (0.5 (foaf:made >> foaf:maker >>))      # co-makers
   ) each >>.

Apply this to the DBpedia resource for Michael Jackson, and you get a
ranked list of related people such as Janet Jackson, Diana Ross, and
so on. The component paths are explored in parallel, and the weight
indicates a preference for certain paths over others.
Marko has indicated that something similar should be possible in
Gremlin using groupCount.
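
To give the flavor, here is a rough, untested Gremlin/Groovy sketch of
the same idea (the weights and property labels come from the Ripple
example above; a real script would first expand the prefixed names to
the full URIs the Sail graph uses as edge labels):

paths = [['rel:siblingOf', 0.85d],
         ['rel:engagedTo', 0.85d],
         ['rel:collaborated_with', 0.7d]]
m = [:]
v = g.v('http://dbpedia.org/resource/Michael_Jackson')
paths.each { label, w ->
    // add the path weight for every neighbor reached via this property
    v.outE(label).inV.each { m[it] = (m[it] ?: 0.0d) + w }
}
m.sort { a, b -> b.value <=> a.value }  // ranked related people

groupCount could maintain the map for you; the explicit map just makes
the weighting obvious.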

Best regards,

Josh


--
Joshua Shinavier
Tetherless World Constellation PhD student
http://tw.rpi.edu/wiki/Joshua_Shinavier
http://fortytwo.net
+1 509 570 6990


Dan Brickley

Apr 2, 2011, 3:39:08 AM
to Joshua Shinavier, gremli...@googlegroups.com, Marko Rodriguez, Balthasar Schopman
+cc: Balthasar

Hi there

On 2 April 2011 06:14, Joshua Shinavier <jo...@fortytwo.net> wrote:
> On Sat, Apr 2, 2011 at 5:31 AM, Dan Brickley <dan...@danbri.org> wrote:
> [...]
>> If I wanted to try Gremlin against DBpedia data, ... can you suggest a recipe?
>
>
> I'll let Marko respond with a Gremlin snippet, but one general
> approach we have used in the past is to define weighted paths of RDF
> properties which take you from a resource of a given type (e.g.
> foaf:Person) to resources of another given type (e.g. foaf:Person).
> You can then compose paths to express more complex relationships for
> recommendations.

Yup, sounds familiar :) My colleagues are doing something similar,
but generally implementing it in (SWI-)Prolog.

[something of an aside]
One thing I'm thinking lately is that it would be great if a bit more
of this kind of derived dataset also got posted back into the Web as
Linked Data.

For example, I've been using Apache Mahout for recommendations against
TV viewing data, which gives a similarity space for content (and
potentially actors etc.). And of course there has been a huge amount
of work on the Netflix prize in recent years. Yet currently we tend to
see only the most basic facts re-published in RDF, rather than more
organic, 'bottom-up' clusters and associations.


> Here's a path (in Ripple syntax, but I think you can
> see what's going on) for people related to musicians in DBpedia:

Oh, I'd not seen Ripple before. This is rather nice :) So looking at
http://ripple.fortytwo.net/ - does it run against a database, or just
by following Web links at query time? My guess from a quick look is
the latter, but I didn't manage to run this script with the downloaded
tool yet.


> @prefix owl: <http://www.w3.org/2002/07/owl#>.
> @prefix rel: <http://vocab.org/relationship/#>.
> @prefix mo:  <http://purl.org/ontology/mo/>.
> @prefix foaf: <http://xmlns.com/foaf/0.1/>.
>
> @define relatedPeople:
>    ((1.0 owl:sameAs)                    # link into Zitgist music...
>     (0.85 rel:siblingOf)                # siblings
>     (0.85 rel:engagedTo)                # fiancee
>     (0.7 rel:collaborated_with)         # collaborators
>     (0.7 (mo:produced >> mo:producer >>))   # co-producers
>     (0.5 (mo:member_of >> foaf:member >>))  # co-members
>     (0.5 (foaf:made >> foaf:maker >>))      # co-makers
>    ) each >>.
>
> Apply this to the DBpedia resource for Michael Jackson, and you get a
> ranked list of related people such as Janet Jackson, Diana Ross, and
> so on. The component paths are explored in parallel, and the weight
> indicates a preference for certain paths over others.
>  Marko has indicated that something similar should be possible in
> Gremlin using groupCount.

I like the idea of combining this kind of approach (in
Ripple/Gremlin/Prolog) with machine learning techniques. Both give us
a way of working around the inherent sparseness of much SemWeb data.
So, for example, if someone follows
https://twitter.com/#!/michaeljackson, that's an indication they like
Michael Jackson; but if we also figure out a pile of other (weaker)
indicators, that gives us a lot more routes to the same conclusion...

http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-248/paper10.pdf

Ok, I re-typed the above example into the ripple.sh shell, cleaning up
some encoding errors that crept in, and entered:

<http://dbpedia.org/resource/Michael_Jackson> :relatedPeople .

... gives:
13) <http://dbpedia.org/resource/Michael_Jackson> :relatedPeople .

[1] <http://dbpedia.org/resource/Michael_Jackson> (((1.0
owl:sameAs) (0.85 rel:siblingOf) (0.85 rel:engagedTo) (0.7
rel:collaborated_with) (0.7 (mo:produced >> mo:producer >>)) (0.5
(mo:member_of >> foaf:member >>)) (0.5 (foaf:made >> foaf:maker >>)))
each >>)
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://www.w3.org/1999/02/22-rdf-syntax-ns#List>;
<http://www.w3.org/1999/02/22-rdf-syntax-ns#first>
<urn:bnode:d897c37a0fefaf2b04cce33139e3df81>;
<http://www.w3.org/1999/02/22-rdf-syntax-ns#rest>
<urn:bnode:d59ad21087227431bd79ff542e9b0bde>.

...so I've answered my question. It's demand-time lookup.

It would be great to have this in Gremlin too, for comparison. And
Ripple is really cool; it deserves a more helpful howto page, unless
I've missed it.

cheers,

Dan

Joshua Shinavier

Apr 2, 2011, 11:41:34 AM
to gremli...@googlegroups.com, Dan Brickley, Marko Rodriguez, Balthasar Schopman
On Sat, Apr 2, 2011 at 3:39 PM, Dan Brickley <dan...@danbri.org> wrote:
[...]
> Yup, sounds familiar :) My colleagues are doing something similar,
> but generally implementing it in (SWI-)Prolog.
>
> [something of an aside]
> One thing I'm thinking lately is that it would be great if a bit more
> of this kind of derived dataset also got posted back into the Web as
> Linked Data.


I agree, and the Recommendation Ontology [1] provides a nice
vocabulary for it. Use an open data license, and provide links from
the derived data back to the original data.

> For example, I've been using Apache Mahout for recommendations against
> TV viewing data, which gives a similarity space for content (and
> potentially actors etc.). And of course there has been a huge amount
> of work on the Netflix prize in recent years. Yet currently we tend to
> see only the most basic facts re-published in RDF, rather than more
> organic, 'bottom-up' clusters and associations.


I think part of the challenge lies not only in extracting clusters and
associations, but also in preserving some semantics.

>>    Here's a path (in Ripple syntax, but I think you can
>> see what's going on) for people related to musicians in DBpedia:
>
> Oh, I'd not seen Ripple before. This is rather nice :) So looking at
> http://ripple.fortytwo.net/ - does it run against a database, or just
> by following Web links at query time?


Either one; Ripple can be layered on any Sesame Sail implementation,
although it uses LinkedDataSail [2] on top of MemoryStore by default.
You can also use it with persistent triple stores and/or static data
sets.

> My guess from a quick look is
> the latter, but I didn't manage to run this script with the downloaded
> tool yet.

> [...]


Oh no, that recommender system (part of a pre-TinkerPop venture called
Knowledge Reef) used a different, closed-world strategy for path
evaluation. Ripple's built-in LazyStackEvaluator does not allow for
aggregation (necessary for combining ranking results from multiple
paths). However, if you ignore the weights, you could use
relatedPeople with vanilla Ripple to get a list (stream) of non-ranked
results:

@define weight path unweight: path >> .

@define wpath step:
    wpath apply >>     # dequote the compound path
    apply >>           # dequote weight/path pairs
    :unweight >> .     # throw away the weight, follow the path

@define mj: <http://dbpedia.org/resource/Michael_Jackson>.

# Note: run this more than once.
:mj >> :relatedPeople :step >> distinct >> .

Most of the relevant data in this case seems to begin an owl:sameAs
away in zitgist.com space, so you would want to take multiple steps,
e.g.

:mj >> (:relatedPeople :step >>){2} >> distinct >> foaf:name >> .

> I like the idea of combining this kind of approach (in
> Ripple/Gremlin/Prolog) with machine learning techniques. Both give us
> a way of working around the inherent sparseness of much SemWeb data.
> So, for example, if someone follows
> https://twitter.com/#!/michaeljackson, that's an indication they like
> Michael Jackson; but if we also figure out a pile of other (weaker)
> indicators, that gives us a lot more routes to the same conclusion...


It doesn't seem like a great leap from RDF to "fuzzy" RDF in which
triples have real-valued membership in graphs. That would make it
easier to combine vector-based results from different sources to fill
in some of those gaps in explicit knowledge on the Semantic Web.
Dunno about the details, though.

[...]


> ...so I've answered my question. It's demand-time lookup.


That's correct. Programs are evaluated lazily by default, and
LinkedDataSail doesn't dereference URIs until it has to, in order to
complete an (s p ?) or (? p o) triple pattern.
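
You can see the same laziness through the Blueprints Sail wrapper, e.g.
(an illustrative, untested shell fragment):

gremlin> v = g.v('http://dbpedia.org/resource/Grateful_Dead')  // no HTTP traffic yet
==>v[http://dbpedia.org/resource/Grateful_Dead]
gremlin> v.outE  // this (s ? ?) lookup is what triggers dereferencing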

> It would be great to have this in Gremlin too, for comparison. And
> Ripple is really cool,


Thanks!

> deserves a more helpful howto page, unless I've
> missed it.


Cool software without proper documentation to help you use it is
pretty much my "thing". But since you asked, I have expanded the
intro page [3] a bit. Maybe when I finally move Ripple to github it
will be time for a TinkerPop-worthy wiki.


Best regards,

Josh




[1] http://smiy.sourceforge.net/rec/spec/recommendationontology.html
[2] http://code.google.com/p/ripple/wiki/LinkedDataSail
[3] http://code.google.com/p/ripple/wiki/Introduction

Marko Rodriguez

Apr 2, 2011, 2:14:34 PM
to Dan Brickley, gremli...@googlegroups.com
Hi,

> If I wanted to try Gremlin against DBpedia data, ... can you suggest a recipe?

The thing about graph/RDF-based recommendation is that it's all about your domain model (ontology/schema). There are some general patterns like local ranks [1], collaborative filtering [2], content/item-based filtering [2], and the mixing of such patterns using weighting, sampling, inhibition, etc. Each data set is unique, and determining a meaningful/useful mapping from users to items is all about what sort of recommendation 'vibe' you want to express.

There is some talk about using Gremlin against the Web of Data DBPedia dataset here:

https://github.com/tinkerpop/gremlin/wiki/LinkedData-Sail

Hope that provides some inspiration,
Marko.

http://markorodriguez.com

[1] http://markorodriguez.com/2011/03/30/global-vs-local-graph-ranking/
[2] https://github.com/tinkerpop/rexster/wiki/Recommendation-Traversals

Dan Brickley

Apr 2, 2011, 3:34:44 PM
to Marko Rodriguez, Dan Brickley, gremli...@googlegroups.com
On Saturday, 2 April 2011, Marko Rodriguez <okram...@gmail.com> wrote:
> Hi,
> If I wanted to try Gremlin against DBpedia data, ... can you suggest a recipe?
>
> The thing about graph/RDF-based recommendation is that it's all about your domain model (ontology/schema).

Yes - re the recipe, I meant more the mechanics of attaching Gremlin to the data.

The on-demand Linked Data Sail stuff is great; I was wondering if
anyone has done something more optimized too - e.g. loading up DBpedia
data into a graph store you have drivers for.

Re mechanics, there is a feature in Apache Pig that might be
interesting: you can call out from Pig scripts into any Java static
method, and it'll execute from a jar in parallel on Hadoop as part of
Pig script interpretation.

So, for example, if in Pig I have a list of celebs, we could maybe drop
out to Linked Data processing per URL to pull in similar bands etc.
Perhaps it won't be the fastest approach, but it might be interesting
to try...
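
Something like this (completely untested; the class is invented, and I
don't know yet what initialisation the Linked Data Sail wants per JVM)
is roughly what I have in mind, with Pig's dynamic invokers
(InvokeForString etc., Pig 0.8+) calling the static method per row:

import com.tinkerpop.blueprints.pgm.impls.sail.impls.LinkedDataSailGraph
import com.tinkerpop.blueprints.pgm.impls.sail.impls.MemoryStoreSailGraph

class LinkedDataHelper {
    // one graph per JVM; constructing one per call would be hopeless
    static final g = new LinkedDataSailGraph(new MemoryStoreSailGraph())

    // associated acts of a DBpedia resource, comma-joined
    static String associatedActs(String uri) {
        def acts = g.v(uri).outE('http://dbpedia.org/property/associatedActs').inV
        return acts.collect { it.toString() }.join(',')
    }
}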

I'd need to know which .jar files are needed. Is there a single .jar
distro somewhere? Or rather, can I pack up a single .jar with Maven
somehow?


> There are some general patterns like local ranks [1], collaborative
> filtering [2], content/item-based filtering [2], and the mixing of
> such patterns using weighting, sampling, inhibition, etc. Each data
> set is unique, and determining a meaningful/useful mapping from users
> to items is all about what sort of recommendation 'vibe' you want
> to express.
>
>

I like this direction. There are a lot of interesting ingredients
around now: linked RDF in the Web, the Apache toolset, Gremlin, etc.
The tricky part is assembling them into a workflow and evaluating the
results.


>> There is some talk about using Gremlin against the Web of Data DBPedia dataset here: https://github.com/tinkerpop/gremlin/wiki/LinkedData-Sail
>> Hope that provides some inspiration,
>> Marko.
>> http://markorodriguez.com
>> [1] http://markorodriguez.com/2011/03/30/global-vs-local-graph-ranking/
>> [2] https://github.com/tinkerpop/rexster/wiki/Recommendation-Traversals

I've just tried
https://github.com/tinkerpop/gremlin/wiki/LinkedData-Sail again, git
pulled in my Gremlin dir, mvn clean, mvn install etc., and ran
gremlin.sh ... it seems to work, until I get to:

gremlin> labels = [g.uri('dbpprop:associatedActs'),
g.uri('dbpedia-owl:associatedMusicalArtist'),
g.uri('dbpedia-owl:MusicalArtist/associatedBand'),
g.uri('dbpprop:pastMembers'), g.uri('owl:sameAs')]
No signature of method:
com.tinkerpop.blueprints.pgm.impls.sail.impls.LinkedDataSailGraph.uri()
is applicable for argument types: (java.lang.String) values:
[dbpprop:associatedActs]
Possible solutions: use([Ljava.lang.Object;), wait(), dump(), any(),
wait(long), V(groovy.lang.Closure)
Display stack trace? [yN] y
groovy.lang.MissingMethodException: No signature of method:
com.tinkerpop.blueprints.pgm.impls.sail.impls.LinkedDataSailGraph.uri()
is applicable for argument types: (java.lang.String) values:
[dbpprop:associatedActs]
Possible solutions: use([Ljava.lang.Object;), wait(), dump(), any(),
wait(long), V(groovy.lang.Closure)
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.unwrap(ScriptBytecodeAdapter.java:54)
at org.codehaus.groovy.runtime.callsite.PojoMetaClassSite.call(PojoMetaClassSite.java:46)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:40)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:124)
at groovysh_evaluate.run(groovysh_evaluate:25)
at groovysh_evaluate$run.call(Unknown Source)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:40)
at groovysh_evaluate$run.call(Unknown Source)
at org.codehaus.groovy.tools.shell.Interpreter.evaluate(Interpreter.groovy:67)
at org.codehaus.groovy.tools.shell.Interpreter$evaluate.call(Unknown Source)
at org.codehaus.groovy.tools.shell.Groovysh.execute(Groovysh.groovy:153)
at org.codehaus.groovy.tools.shell.Groovysh$execute.callCurrent(Unknown Source)
at org.codehaus.groovy.tools.shell.Shell.leftShift(Shell.groovy:114)
at org.codehaus.groovy.tools.shell.Shell$leftShift$0.call(Unknown Source)
at org.codehaus.groovy.tools.shell.ShellRunner.work(ShellRunner.groovy:88)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)

Marko Rodriguez

Apr 2, 2011, 3:44:03 PM
to dan...@danbri.org, gremli...@googlegroups.com
Hey,

> The on-demand Linked Data Sail stuff is great; I was wondering if
> anyone has done something more optimized too - e.g. loading up DBpedia
> data into a graph store you have drivers for.

There is no reason you can't load the DBpedia data set into, let's say, AllegroGraph and then use Gremlin over it through the AllegroGraph Sail interface. In this way, you aren't incurring the network overhead of pulling RDF documents over the wire and populating a local RDF store instance on the fly. Once in AllegroGraph, you then write your recommendation algorithms as graph traversals.
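
For a purely local setup, the stock Sesame NativeStore wrapper that
ships with Blueprints gives you the same pattern without AllegroGraph.
A rough sketch (file and directory names are made up; double-check the
loadRDF signature against the SailGraph docs):

gremlin> g = new NativeStoreSailGraph('/tmp/dbpedia')
gremlin> g.loadRDF(new FileInputStream('dbpedia.nt'), 'http://dbpedia.org/', 'n-triples', null)
gremlin> v = g.v('http://dbpedia.org/resource/Michael_Jackson')

From there, the traversals are identical to the Linked Data case.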

> I like this direction. There are a lot of interesting ingredients
> around now: linked RDF in the Web, the Apache toolset, Gremlin, etc.
> The tricky part is assembling them into a workflow and evaluating the
> results.

Yea. There are lots of technologies out there that overlap in functionality. I suppose it depends on what you are trying to accomplish. Are you trying to do:
1. Real-time recommendations
2. Heterogeneous recommendations
3. Batch processing, single-threaded
...

Marko.

http://markorodriguez.com

Marko Rodriguez

Apr 2, 2011, 3:45:25 PM
to Dan Brickley, gremli...@googlegroups.com
Hey,

> I've just tried
> https://github.com/tinkerpop/gremlin/wiki/LinkedData-Sail again, git
> pulled in my Gremlin dir, mvn clean, mvn install etc., and ran
> gremlin.sh ... it seems to work, until I get to:

Read through the tutorial and do it line by line. You can't skip ahead. You missed:
com.tinkerpop.gremlin.loaders.SailGraphLoader.load()

Marko.

Dan Brickley

Apr 2, 2011, 4:49:39 PM
to Marko Rodriguez, gremli...@googlegroups.com

Ah, the Recommendations section of the page gave the impression of
being stand-alone.

I've taken the liberty of adding that call into the second half of the
page, i.e. the extra entry here:

gremlin> g = new LinkedDataSailGraph(new MemoryStoreSailGraph())
==>sailgraph[linkeddatasail]
gremlin> com.tinkerpop.gremlin.loaders.SailGraphLoader.load()
==>null
gremlin> v = g.v('http://dbpedia.org/resource/Grateful_Dead')
==>v[http://dbpedia.org/resource/Grateful_Dead]

...since it began with a new 'g' and jumped straight into the Grateful
Dead example, the missing call isn't obvious to non-experts.

It's working now :)

cheers,

Dan

ps. to answer your prev question re what we're trying to do, ...
basically whatever it takes to improve the TV viewing experience; could
be on demand or batch. I think certainly heterogeneous, in that it's
quite boring to link only TV shows to TV shows. If someone has bought
every book by X, they deserve a notification if X is scheduled to be
on TV. Or if they follow them on Twitter. If they follow a similar
artist, perhaps also worth notifying; so a lot of this nudges us
towards mining similarity measures from linked data. But also from
viewing data - I've access to a nice pile of 'who watched what' data
where each item is quite richly described in RDF.

Oh, I tried changing the Grateful_Dead example to begin with
Stephen_Fry instead - an actor/writer/comic/presenter. It's perhaps
interesting as an exercise to figure out how much special-case
tweaking is needed. In that case I get to ...
gremlin> labels2 = [g.uri('dbpedia-owl:presenter'),
g.uri('dbpedia-owl:creator'), g.uri('dbpedia-owl:influenced') ]
...and perhaps a few more from http://dbpedia.org/page/Stephen_Fry
(without jumping into the Category data),

... but they're 'is foo of' inverse relations, so the logic needs a
bit of tweaking. I'm still learning to think in Gremlin-ese, but if
"v.inE.outV.outE('foaf:name').inV.value" gets me the labels of nodes
with links to Stephen_Fry, as it seems to, I guess I'm on the right
path.

gremlin> v.inE{labels.contains(it.label)}.outV.outE('foaf:name').inV.value
==>Last Chance To See
==>The Crystal Cube
==>Oscar Wilde
==>QI
==>Wildlife SOS
==>A Bit of Fry & Laurie
==>Last Chance To See
==>The Crystal Cube
==>Oscar Wilde
==>Wildlife SOS
==>A Bit of Fry & Laurie
==>QI

.... trying the full exploration,
gremlin> v.inE{labels.contains(it.label) &
rand.nextBoolean()}.outV.outE.groupCount(m).loop(4){it.loops < 4}

... not sure if I got this right, but it goes racing off to Oscar
Wilde and related TV shows, which seems positive. The results are
relevant, but one might quibble with the ordering ... see
http://pastebin.com/GaBm5gAZ for full results. Intriguing :) To cover
TV well I think we'd need to hand craft quite some rule-set, but that
could be well worth doing, especially if it can be blended with
ratings-derived data.

oh, while I'm here: what's the Gremlin for restricting the output from
http://pastebin.com/GaBm5gAZ ie. m.sort{ a,b -> b.value <=> a.value }
so we only print the labels of -say- the first 20 entities?

Marko Rodriguez

Apr 2, 2011, 4:52:12 PM
to Dan Brickley, gremli...@googlegroups.com
Hey,

> Ah, the Recommendations section of the page gave the impression of
> being stand-alone. I've taken the liberty of adding that call into
> the second half of the page, i.e. the extra entry here:

Cool. Thanks for doing that. I appreciate it.

More on the rest of your email a bit later.

Thanks Dan,
Marko.

http://markorodriguez.com

Marko Rodriguez

Apr 3, 2011, 11:34:53 AM
to Balthasar Schopman, Joshua Shinavier, gremli...@googlegroups.com, Dan Brickley
Hi Balthasar,

> I see lots of great software like Gremlin at Tinkerpop... have you got any (efforts towards building) mockups of a recommender system, like you described on your blog?

Me personally, I have a few recommender projects and I use the TinkerPop stack for them. That is, Blueprints/Pipes/Gremlin/Rexster. In short:

1. Model the data as a graph and parse/insert it into a Blueprints-enabled graph database.
2. Write recommendation algorithms for that particular data model in Gremlin.
3. Do Pipes-native optimizations and call those from Gremlin.
4. Expose traversal endpoints over REST through Rexster.
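
As a made-up illustration of step 2, a user-likes-item model might get
a basic collaborative filtering traversal like this (the 'likes' label
and userId are hypothetical):

gremlin> m = [:]
gremlin> g.v(userId).outE('likes').inV.inE('likes').outV.outE('likes').inV.groupCount(m)
gremlin> m.sort{ a,b -> b.value <=> a.value }

That is: find other people who like what this user likes, then what
else those people like, ranked by how often each item is reached. In
practice you would also filter out the items the user already likes.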

That's my pattern,
Marko.

http://markorodriguez.com

Marko Rodriguez

Apr 3, 2011, 12:03:13 PM
to gremli...@googlegroups.com
Oh. One more thing. I also use Frames [ http://frames.tinkerpop.com ] these days when parsing data into a Blueprints graph. It makes the parsing code much easier to read and less error-prone. For those in the RDF world, Frames is to Blueprints as Elmo is to Sail.

See ya,
Marko.

Marko Rodriguez

Apr 3, 2011, 2:56:55 PM
to Dan Brickley, gremli...@googlegroups.com
Hey,

> If someone has bought
> every book by X, they deserve a notification if X is scheduled to be
> on TV.
> Or if they follow them on Twitter.
> If they follow a similar
> artist, perhaps also worth notifying; so a lot of this nudges us
> towards mining similarity measures from linked data.

Okay. So all those types of mappings you described in English have corresponding Gremlin representations.

> ... but they're 'is foo of' inverse relations, so the logic needs a
> bit of tweaking. I'm still learning to think in Gremlin-ese, but if
> "v.inE.outV.outE('foaf:name').inV.value" gets me the labels of nodes
> with links to Stephen_Fry, as it seems to, I guess I'm on the right
> path.

One note: In Linked Data, you can't get the incoming links for the Web of Data (unless you already have the data from the outgoing links of the subject URI). An RDF document only has the outgoing edges from a particular URI. Thus, you might run into issues like that... Just a heads up.

> Intriguing :) To cover
> TV well I think we'd need to hand craft quite some rule-set, but that
> could be well worth doing, especially if it can be blended with
> ratings-derived data.

Exactly: the way you write a recommendation algorithm is by studying your domain model, studying the graph structure of your data (graph statistics), and defining traversals that exploit that structure in a meaningful way. Thus, it's basically all about a "hand-crafted rule-set" for each dataset.

> oh, while I'm here: what's the Gremlin for restricting the output from
> http://pastebin.com/GaBm5gAZ ie. m.sort{ a,b -> b.value <=> a.value }
> so we only print the labels of -say- the first 20 entities?

Uh. This works:

m = m.sort{ a,b -> b.value <=> a.value }
m.subMap((m.keySet() as List)[0..19])  // first 20 entries

Sorta gross--probably some better way to do it in Groovy.. :/
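
Or a bit more directly, if all you want is the entities themselves
(also untested, and it assumes the map has at least 20 entries):

m.sort{ a,b -> b.value <=> a.value }.keySet().toList()[0..19]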

Have fun,
Marko.

http://markorodriguez.com

Joshua Shinavier

Apr 4, 2011, 6:37:52 PM
to Balthasar Schopman, gremli...@googlegroups.com, Dan Brickley, Marko Rodriguez
Hi Balthasar,


On Sun, Apr 3, 2011 at 7:42 PM, Balthasar Schopman <scho...@cs.vu.nl> wrote:
[...]
> Currently exploring the content-based part and looking for predicates in sets such as LinkedMDB and DBpedia that both provide useful knowledge and have decent usage frequencies, i.e. are statistically likely to provide recommendations on any given day.


One "challenging" thing about DBpedia is the heterogeneity of the
descriptions you tend to find there. Just when you think you have
found a great query pattern for sucking in relevant data for your
application, you discover that it only works for a fraction of the
resources you're interested in. Graph algos seem better suited that
sort of data than neat and clean SPARQL queries.

> This ontology looks useful indeed. My intentions are moving more and more towards producing RDF as output rather than plain JSON, so we might start using this ontology soon. We might want to extend this ontology, so we can include explanations of recommendations.


What sort of explanation do you have in mind? Natural language
comments, workflows, proof trees? For a potential project of mine, I
am considering using a very simple kind of explanation with Named
Graphs: to "explain" a recommendation, you merely point to RDF graphs
containing the original statements which were used to derive it. E.g.
when Peter is recommended to me as someone I might like to know, a
collection of named graphs is provided which contains statements to
the effect that Peter and I both know Marko and that we both like
graph databases.


[...]
> My experience is that closed-world reasoning is a must in applications that provide results in real-time.


Do you mean literally that closed world reasoning is faster (which I
would have to disagree with), or that closed-world concepts like query
windows are necessary for real-time applications? I would agree that
closed world reasoning is necessary for very many realistic end user
applications, although there's something elegant about OWA, especially
for data streams.

On the subject of ranking traversal, I have just added a 'rank'
primitive to Ripple which allows you to drop into a closed-world
ranking evaluator from the default environment, so you can mix and
match traversal styles. Here's a breadth-first FOAF traversal which
finds a ranked list of people in your social neighborhood.


@prefix foaf: <http://xmlns.com/foaf/0.1/> .

@define person foafSpread:
    person foaf:knows >> 0.85 amp >> :foafSpread? >> .

@define tim: <http://www.advogato.org/person/timbl/foaf.rdf#me> .

# 30 is the number of computational cycles we're willing to spend on
# the traversal.
:tim >> :foafSpread >> 30 rank >> .


This gives:

[1] <http://www.advogato.org/person/connolly/foaf.rdf#me> 4.24575E0
[2] <http://www.advogato.org/person/presbrey/foaf.rdf#me> 3.52325E0
[3] <http://www.advogato.org/person/timbl/foaf.rdf#me> 3.3957499999999996E0
[4] <http://www.advogato.org/person/oshani/foaf.rdf#me> 1.5724999999999998E0
[5] <http://www.advogato.org/person/jwz/foaf.rdf#me> 1.3366249999999997E0
[6] <http://www.advogato.org/person/gtaylor/foaf.rdf#me> 1.3366249999999997E0
etc.


Best,

Josh



Bob Ferris

Apr 4, 2011, 7:14:41 PM
to gremli...@googlegroups.com, Joshua Shinavier, Balthasar Schopman, Dan Brickley, Marko Rodriguez
Hi,

On 4/5/2011 12:37 AM, Joshua Shinavier wrote:

>> This ontology looks useful indeed. My intentions are moving more and more towards producing RDF as output rather than plain JSON, so we might start using this ontology soon. We might want to extend this ontology, so we can include explanations of recommendations.
>
>
> What sort of explanation do you have in mind? Natural language
> comments, workflows, proof trees? For a potential project of mine, I
> am considering using a very simple kind of explanation with Named
> Graphs: to "explain" a recommendation, you merely point to RDF graphs
> containing the original statements which were used to derive it. E.g.
> when Peter is recommended to me as someone I might like to know, a
> collection of named graphs is provided which contains statements to
> the effect that Peter and I both know Marko and that we both like
> graph databases.

The intended way to explain recommendations, i.e. their evidence, is to
make use of the Similarity Ontology [1]. Please have a look at the
extended music recommendation example [2], which illustrates what such
a more detailed explanation can look like. Furthermore, a workflow of
an association method (sim:AssociationMethod) can be described in a
named graph; see [3]. Another option is to make use of the Association
Ontology [4], e.g. to describe musical contexts.
A starting point for further reading about provenance issues might be
the final report of the Provenance XG [5].

Cheers,


Bob


[1] http://purl.org/ontology/similarity/
[2] http://smiy.sourceforge.net/rec/spec/recommendationontology.html#sec-ext-music-rec-example
[3] http://kakapo.dcs.qmul.ac.uk/ontology/musim/0.2/musim.html#s33
[4] http://purl.org/ontology/ao/core#
[5] http://www.w3.org/2005/Incubator/prov/XGR-prov-20101214/

Christian S.

Aug 9, 2013, 11:07:15 AM
to gremli...@googlegroups.com
Hi Marko

Do you have an updated link to your recommendation system page?

thanks
Christian

Marko Rodriguez

Aug 9, 2013, 12:18:44 PM
to gremli...@googlegroups.com
Hi Christian,

> Do you have an updated link to your recommendation system page?

I don't. When Aurelius started to be more "Aurelius" and not "Marko Rodriguez," I gutted lots of the information on my homepage and put it on thinkaurelius.com. Unfortunately, the recommendation system page was simply deleted (not transferred).

Marko.

http://markorodriguez.com
