Cannot run a page rank


NQuinn

Jun 9, 2017, 5:54:56 PM
to Gremlin-users
Hopefully I am not missing something, but when I try to run the peer pressure clustering algorithm or the PageRank algorithm on SparkGraphComputer, I keep getting an IllegalStateException because it is looking for a property that it cannot find. Something like this:

Caused by: java.lang.IllegalStateException: The property does not exist as the key has no associated value for the provided element: v[pId-113]:gremlin.pageRankVertexProgram.edgeCount

Am I missing something? Do I need to add an empty property with this key? Oddly, I cannot find any information about this exception. Thanks for your help. I am using TinkerPop 3.2.3.

Daniel Kuppitz

Jun 10, 2017, 2:35:25 AM
to gremli...@googlegroups.com
What does your code look like? Did you try the examples shown in the docs?

Cheers,
Daniel



NQuinn

Jun 10, 2017, 1:01:55 PM
to Gremlin-users
Daniel--

Yes. I am using the HadoopGraph and the SparkGraphComputer. Here is what my query looks like:

g.traversal().withComputer(Computer.compute(SparkGraphComputer.class)).V().pageRank()

where g is the HadoopGraph. I see that there are unit tests in the TinkerPop repo for peerPressure and pageRank on top of Spark, so I am not sure why I am unable to run this query, unless it is a configuration issue or something in the underlying HadoopGraph. Any thoughts?
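
For reference, a minimal sketch of the equivalent direct VertexProgram submission, which might help isolate whether the traversal machinery or the vertex program itself is at fault (untested sketch; assumes "graph" is the HadoopGraph instance and the surrounding method declares "throws Exception"):

import org.apache.tinkerpop.gremlin.process.computer.ComputerResult;
import org.apache.tinkerpop.gremlin.process.computer.ranking.pagerank.PageRankVertexProgram;
import org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer;

// Submit PageRank directly as a vertex program instead of through a traversal.
ComputerResult result = graph.compute(SparkGraphComputer.class)
        .program(PageRankVertexProgram.build().create(graph))
        .submit().get();

// Read the computed ranks back from the result graph.
result.graph().traversal().V()
        .values(PageRankVertexProgram.PAGE_RANK)
        .forEachRemaining(System.out::println);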
Thanks for your insight!
Nick


NQuinn

Jun 10, 2017, 1:05:29 PM
to Gremlin-users
Here is my configuration. Any suggestions would be helpful. Thanks!

configuration.setProperty("spark.master", "local[4]");
configuration.setProperty("spark.serializer", GryoSerializer.class.getCanonicalName());
configuration.setProperty(Graph.GRAPH, HadoopGraph.class.getName());
configuration.setProperty(Constants.GREMLIN_HADOOP_GRAPH_READER, TestInputRDD.class.getCanonicalName());
configuration.setProperty(Constants.GREMLIN_HADOOP_GRAPH_READER_HAS_EDGES, true);
configuration.setProperty(Constants.GREMLIN_HADOOP_GRAPH_WRITER, TestOutputRDD.class.getCanonicalName());
configuration.setProperty(Constants.GREMLIN_HADOOP_JARS_IN_DISTRIBUTED_CACHE, false);
configuration.setProperty(Constants.GREMLIN_SPARK_PERSIST_CONTEXT, true);

I don't think the configuration is the problem, unless maybe I am missing a setting.
Nick

Daniel Kuppitz

Jun 12, 2017, 7:26:48 AM
to gremli...@googlegroups.com
Hmm, still no idea. Can you provide a full stack trace? Also, does this only happen with your own graph or also with TinkerPop's toy graphs?
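
A quick way to test against the toy graphs, roughly following the reference docs, is to swap your custom RDDs for the stock Gryo formats (sketch only; the input path is the sample data shipped with the TinkerPop distribution, so adjust it to your setup):

import org.apache.tinkerpop.gremlin.hadoop.Constants;
import org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat;
import org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat;

// Same configuration as before, but reading/writing the bundled toy graph via Gryo.
configuration.setProperty(Constants.GREMLIN_HADOOP_GRAPH_READER, GryoInputFormat.class.getCanonicalName());
configuration.setProperty(Constants.GREMLIN_HADOOP_GRAPH_WRITER, GryoOutputFormat.class.getCanonicalName());
configuration.setProperty(Constants.GREMLIN_HADOOP_INPUT_LOCATION, "data/tinkerpop-modern.kryo"); // toy graph in the TinkerPop distribution
configuration.setProperty(Constants.GREMLIN_HADOOP_OUTPUT_LOCATION, "output");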

Cheers,
Daniel



NQuinn

Jun 12, 2017, 12:57:36 PM
to Gremlin-users
Daniel--

I am using the HadoopGraph. Here is a full stack trace. I get a very similar exception when running the peer pressure step on Spark, except it is looking for the "cluster" property. Thoughts? Thanks for your help!
Nick

java.lang.IllegalStateException: The property does not exist as the key has no associated value for the provided element: v[cId-26]:gremlin.pageRankVertexProgram.edgeCount
at org.apache.tinkerpop.gremlin.structure.Property$Exceptions.propertyDoesNotExist(Property.java:155)
at org.apache.tinkerpop.gremlin.structure.Element.lambda$value$1(Element.java:94)
at org.apache.tinkerpop.gremlin.structure.Property.orElseThrow(Property.java:101)
at org.apache.tinkerpop.gremlin.structure.Element.value(Element.java:94)
at org.apache.tinkerpop.gremlin.process.computer.util.ComputerGraph$ComputerElement.value(ComputerGraph.java:162)
at org.apache.tinkerpop.gremlin.process.computer.ranking.pagerank.PageRankVertexProgram.execute(PageRankVertexProgram.java:168)
at org.apache.tinkerpop.gremlin.spark.process.computer.SparkExecutor.lambda$null$4(SparkExecutor.java:118)
at org.apache.tinkerpop.gremlin.util.iterator.IteratorUtils$3.next(IteratorUtils.java:247)
at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:192)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

NQuinn

Jun 13, 2017, 12:33:54 AM
to Gremlin-users
Okay, I figured it out. It happens when duplicate vertices are added to the underlying graph. It works when there are no duplicates, but when there are duplicates, it throws that exception. I am not sure if that is a known issue. Thoughts?
Thanks!
Nick

Daniel Kuppitz

Jun 13, 2017, 4:50:35 AM
to gremli...@googlegroups.com
What do you mean by duplicate vertices? Vertices with the same id? This shouldn't even be allowed.

Cheers,
Daniel



NQuinn

Jun 13, 2017, 1:35:24 PM
to Gremlin-users
Yes, duplicate vertices with the same id. When I write those vertices to Spark, vertices with duplicate ids are not filtered out. I am also struggling with duplicate edges being written. I have temporarily worked around it by writing to a TinkerGraph to keep track of elements that have already been written.
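
Since a custom InputRDD hands SparkGraphComputer a JavaPairRDD keyed by vertex id, another option I am considering is collapsing duplicates in Spark before the job runs. A rough sketch (assumes "graphRDD" is the RDD my reader produces; real merge logic would need more thought than just keeping the first copy):

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.tinkerpop.gremlin.hadoop.structure.io.VertexWritable;

// Collapse duplicate vertex ids before the vertex program runs.
// Keeping the first copy is arbitrary; merging properties/edges may be needed.
JavaPairRDD<Object, VertexWritable> deduped =
        graphRDD.reduceByKey((a, b) -> a);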

NQuinn

Jun 13, 2017, 7:04:57 PM
to Gremlin-users
I found that, even without duplicates, my dataset fails on the following Gremlin query with a missing-property exception similar to the one described above:

g.V().peerPressure().by("cluster").by(__.outE("knows")).pageRank(1.0d).by("rank").by(__.outE("knows")).times(1).<Object, Number>group().by("cluster").by(__.values("rank").sum()).limit(100);

But when I remove the second by("cluster"), it works:

g.V().peerPressure().by("cluster").by(__.outE("knows")).pageRank(1.0d).by("rank").by(__.outE("knows")).times(1).<Object, Number>group().by(__.values("rank").sum()).limit(100);

It sure seems as if the pageRank step removes the cluster property. I am not sure how the unit tests mentioned above pass, unless the dataset is different.

Nick

NQuinn

Jun 23, 2017, 4:04:49 PM
to Gremlin-users
I figured it out. The writer had to be set to GryoOutputFormat, and I had it set to a custom output RDD; I guess I wasn't handling it correctly.

configuration.setProperty(Constants.GREMLIN_HADOOP_GRAPH_WRITER, GryoOutputFormat.class.getCanonicalName());

Note: if you use the GryoOutputFormat, you also need to set GREMLIN_HADOOP_OUTPUT_LOCATION, like so:

configuration.setProperty(Constants.GREMLIN_HADOOP_OUTPUT_LOCATION, TestHelper.makeTestDataDirectory(XXXTest.class, "XXXTest"));
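
With that in place, a minimal end-to-end check looks roughly like this (sketch only; "configuration" is the object built above):

import org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph;
import org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer;

// Open the graph with the corrected configuration and run PageRank on Spark.
HadoopGraph graph = HadoopGraph.open(configuration);
graph.traversal().withComputer(SparkGraphComputer.class)
        .V().pageRank().by("rank")
        .values("rank")
        .forEachRemaining(System.out::println);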

Hopefully this will help out any poor soul who encounters the same problem. For those who maintain this, is it documented somewhere?

Thanks!
Nick