Cosine Similarity


Rodrigo Lacerda

Jul 23, 2015, 8:42:06 PM
to Gremlin-users
Is it possible to calculate cosine similarity directly with Gremlin?

Cosine similarity is the cosine of the angle between two n-dimensional vectors in an n-dimensional space. It is the dot product of the two vectors divided by the product of the two vectors’ lengths (or magnitudes). For two vectors A and B in an n-dimensional space:
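Written out from the definition above, with A_i and B_i denoting the i-th components of the two vectors (this component notation is assumed here, as the post does not show it):

```latex
\[
\cos(\theta)
  = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}
  = \frac{\sum_{i=1}^{n} A_i B_i}
         {\sqrt{\sum_{i=1}^{n} A_i^{2}} \; \sqrt{\sum_{i=1}^{n} B_i^{2}}}
\]
```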

I saw a Neo4j GraphGist that does this calculation.

The link is: http://www.neo4j.org/graphgist?8173017

Thanks!


Daniel Kuppitz

Jul 24, 2015, 12:46:30 PM
to gremli...@googlegroups.com
The queries from the Neo4j Gist converted to Gremlin:


Cheers,
Daniel


Stephen Mallette

Jul 24, 2015, 1:10:30 PM
to Gremlin-users
Very cool, Daniel. I started messing with this earlier today for fun but realized it was going to take more than a few lines of Gremlin to give a complete answer.

Daniel Kuppitz

Jul 24, 2015, 1:24:53 PM
to gremli...@googlegroups.com
Forgot to mention: You'll need the current master in order to make that Gist work.

Cheers,
Daniel

Rodrigo Lacerda

Jul 24, 2015, 6:12:30 PM
to Gremlin-users, m...@gremlin.guru
Wow, perfect!

I'll spend the whole weekend studying these queries!

Now another question, about performance: is it better to do this calculation directly in Gremlin, or to do it another way, e.g. using Python?

Either way, I will have to go through all the data.

Any ideas?

Thank you!
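For comparison, here is a minimal Python sketch of the same per-pair calculation done outside the graph. It assumes the ratings have already been exported as (person, movie, rating) tuples and that only movies rated by both people count toward a pair's score. The function name `cosine_similarities` and the sample data are illustrative and do not come from the thread. It says nothing about which approach is faster.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def cosine_similarities(ratings):
    """Return {(p1, p2): similarity} for each pair of people who share a rated movie.

    `ratings` is an iterable of (person, movie, rating) tuples (assumed export
    format). Only co-rated movies contribute to a pair's score.
    """
    # Group ratings by movie: movie -> {person: rating}
    by_movie = defaultdict(dict)
    for person, movie, rating in ratings:
        by_movie[movie][person] = rating

    # Per pair, accumulate [dot product, sum of r1^2, sum of r2^2]
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    for raters in by_movie.values():
        for p1, p2 in combinations(sorted(raters), 2):
            r1, r2 = raters[p1], raters[p2]
            acc = sums[(p1, p2)]
            acc[0] += r1 * r2
            acc[1] += r1 * r1
            acc[2] += r2 * r2

    # Skip pairs where either vector has zero length to avoid dividing by zero
    return {
        pair: dot / (sqrt(xx) * sqrt(yy))
        for pair, (dot, xx, yy) in sums.items()
        if xx > 0 and yy > 0
    }


# Hypothetical sample data: two people, two co-rated movies
sample = [("a", "m1", 3), ("b", "m1", 4), ("a", "m2", 4), ("b", "m2", 3)]
print(cosine_similarities(sample)[("a", "b")])  # 0.96
```

Each unordered pair is computed once. If directed similarity edges are needed in both directions, the same value can be written for (p1, p2) and (p2, p1).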

Russell Jurney

Jul 24, 2015, 7:30:40 PM
to gremli...@googlegroups.com

Rodrigo Lacerda

Nov 5, 2015, 7:16:21 AM
to Gremlin-users
Hello Daniel!!

I need a little help!

I updated my Titan to version 1.0.0, and the similarity calculation query is no longer working.

I looked through the documentation and found no change that could cause the error.

Can you help me?

905932 [gremlin-server-exec-6] WARN  org.apache.tinkerpop.gremlin.server.op.AbstractEvalOpProcessor  - Exception processing a script on request [RequestMessage{, requestId=ead79fca-c092-49ab-9e14-afdf6c297afd, op='eval', processor='', args={gremlin= 
g.V().match(
  __.as("p1").outE("rated").as("r1"),
  __.as("r1").inV().as("m"),
  __.as("m").inE("rated").as("r2"),
  __.as("r2").outV().as("p2")
).where("p1", neq("p2")).
  group().by(select("p1","p2")).
          by(select("r1","r2").by("rating")).unfold().as("kv").
  mapKeys().addOutE("p1", "similarity", "p2").as("e").
  sideEffect {
    def r = it.path().get("kv").getValue()
    def xyDotProduct = r.collect {it.r1*it.r2}.sum()
    def xLength = Math.sqrt(r.collect {it.r1*it.r1}.sum())
    def yLength = Math.sqrt(r.collect {it.r2*it.r2}.sum())
    def similarity = xyDotProduct / (xLength * yLength)
    it.get().property("similarity", similarity)
  }.iterate()}}].
java.lang.ClassCastException: java.util.HashMap$Node cannot be cast to java.util.Map
at org.apache.tinkerpop.gremlin.process.traversal.step.map.MapKeysStep.flatMap(MapKeysStep.java:41)
at org.apache.tinkerpop.gremlin.process.traversal.step.map.FlatMapStep.processNextStart(FlatMapStep.java:47)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:140)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:54)
at org.apache.tinkerpop.gremlin.process.traversal.step.map.FlatMapStep.processNextStart(FlatMapStep.java:46)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:140)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:54)
at org.apache.tinkerpop.gremlin.process.traversal.step.sideEffect.SideEffectStep.processNextStart(SideEffectStep.java:38)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.next(AbstractStep.java:126)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.next(AbstractStep.java:37)
at org.apache.tinkerpop.gremlin.process.traversal.Traversal.iterate(Traversal.java:166)
at org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversal.iterate(GraphTraversal.java:1074)
at org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversal$Admin.iterate(GraphTraversal.java:167)
at org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.DefaultGraphTraversal.iterate(DefaultGraphTraversal.java:45)
at org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversal$Admin$iterate.call(Unknown Source)
at Script2.run(Script2.groovy:8)
at org.apache.tinkerpop.gremlin.groovy.jsr223.GremlinGroovyScriptEngine.eval(GremlinGroovyScriptEngine.java:525)
at org.apache.tinkerpop.gremlin.groovy.jsr223.GremlinGroovyScriptEngine.eval(GremlinGroovyScriptEngine.java:365)
at javax.script.AbstractScriptEngine.eval(AbstractScriptEngine.java:233)
at org.apache.tinkerpop.gremlin.groovy.engine.ScriptEngines.eval(ScriptEngines.java:96)
at org.apache.tinkerpop.gremlin.groovy.engine.GremlinExecutor.lambda$eval$22(GremlinExecutor.java:244)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Thank you!


On Friday, July 24, 2015 at 14:24:53 UTC-3, Daniel Kuppitz wrote:

Daniel Kuppitz

Nov 5, 2015, 10:03:46 AM
to gremli...@googlegroups.com
Hi Rodrigo,

We somehow lost some code during the 3.0 development and didn't even notice it until now. Consider it a bug.
However, 3.0 is closed and 3.1 no longer has mapKeys(). Hence it will be "fixed" in the next Titan release, where you will use select(keys) instead of mapKeys().

Sorry for that, but there's really not much we can do about that at this point.

Cheers,
Daniel
