Adding Rows in Batches and trying to Save Vertex Ids

Venkat Dasari

May 3, 2021, 8:22:29 PM

to Gremlin-users

Hi,

I am trying to add the vertices in batches, and then trying to save the vertex ids for later use to build edges. Whenever I call .toList() after every 1000 records, I always get one record. What am I missing here?

I have been stuck on this for more than a month, and I don't know how else to load large data into the graph. If this works, I want to do the same thing using Spark and parallelize the load.

private static GraphTraversalSource createTraversal() {
    final String[] contactPoints = new String[]{"someIpAddress"};
    String graphName = "";
    final MessageSerializer serializer = Serializers.GRYO_V3D0.simpleInstance();
    final Map<String, Object> gryoConfig = new HashMap<>();
    gryoConfig.put("ioRegistries",
            Arrays.asList(JanusGraphIoRegistry.class.getName(), TinkerIoRegistryV3d0.class.getName()));
    serializer.configure(gryoConfig, null);
    final String remoteTraversalSource = graphName.isEmpty() ? "g" : graphName + "_traversal";

    GraphTraversalSource g = traversal().withRemote(DriverRemoteConnection.using(Cluster.build()
            .serializer(serializer)
            .addContactPoints(contactPoints)
            .create(),
            remoteTraversalSource));

    return g;
}

public static void main(String[] args) throws Exception {

    StopWatch sw = new StopWatch();
    Random rnd = new Random();

    sw.start();

    for (int j = 0; j < 100; j++) {
        System.out.println("========== ITERATION ===== " + j);
        GraphTraversalSource g = createTraversal();
        GraphTraversal<Vertex, Vertex> vertexT = null;
        List<Vertex> vertexList = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            vertexT = g.addV("Market");

            int mktId = rnd.nextInt();
            vertexT = vertexT.property("mkt_id", mktId);
        }
        vertexList = vertexT.toList();
        System.out.println("Size of Iteration: " + vertexList.size());
        sw.split();
        System.out.println("Execution for Iteration : " + (sw.getSplitTime()/1000) + " seconds");
    }
}

Bassem Naguib

May 4, 2021, 1:36:39 AM

to Gremlin-users

There are two problems I can see in your code. First, this line

vertexT = g.addV("Market");

overwrites the previous traversal without executing it.

Your code needs to look something like this:

GraphTraversal<Vertex, Vertex> vertexT = null;

for (int i = 0; i < 1000; i++)
{
    if (vertexT == null)
        vertexT = g.AddV("Market");
    else
        vertexT = vertexT.AddV("Market");
    ...
}

vertexT.Iterate();

The second problem is that you are expecting

vertexList = vertexT.toList();

to give you a list of all the vertices you added. It will only give you the last vertex you added, because the last step in your traversal is AddV(), which returns the vertex that was just added.

If you would like to get the full list of vertices, you need to use something like this:

IList<Vertex> vertexList = g.V().HasLabel("Market").ToList();

Or if you just need the count, not the vertices

long count = g.V().HasLabel("Market").Count().Next();

My code is C# code. Hopefully the Java code will not be too different.
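The two problems described above can be reproduced without a graph server. The Java sketch below is a toy model, not the TinkerPop API: ToyTraversal, overwriteEachRow and chainEachRow are hypothetical names made up for illustration, and the only assumptions are that steps are recorded lazily and that the terminal call returns what the last step emits. Under those assumptions, reassigning the traversal on every row stores a single vertex, while chaining stores all 1000 but still returns only the last one.

```java
import java.util.ArrayList;
import java.util.List;

public class LazyTraversalDemo {

    /** Toy stand-in for a lazily evaluated traversal. This is NOT the TinkerPop API. */
    static final class ToyTraversal {
        private final List<Integer> graph; // pretend graph: one id per stored vertex
        private int pendingAddV = 0;       // recorded steps that have not run yet

        ToyTraversal(List<Integer> graph) {
            this.graph = graph;
        }

        /** Records an addV step; nothing touches the graph yet. */
        ToyTraversal addV() {
            pendingAddV++;
            return this;
        }

        /** Terminal step: runs all recorded steps, returns only what the last one emits. */
        List<Integer> toList() {
            List<Integer> result = new ArrayList<>();
            for (int i = 0; i < pendingAddV; i++) {
                int id = graph.size() + 1;
                graph.add(id);
                result.clear(); // each addV replaces the current result
                result.add(id);
            }
            pendingAddV = 0;
            return result;
        }
    }

    /** Mirrors the question: a fresh traversal per row, so only the last one is executed. */
    static int[] overwriteEachRow(int rows) {
        List<Integer> graph = new ArrayList<>();
        ToyTraversal t = null;
        for (int i = 0; i < rows; i++) {
            t = new ToyTraversal(graph).addV(); // previous traversal is dropped unexecuted
        }
        int returned = t.toList().size();
        return new int[] {graph.size(), returned};
    }

    /** Mirrors the fix: keep chaining onto the same traversal, then execute once. */
    static int[] chainEachRow(int rows) {
        List<Integer> graph = new ArrayList<>();
        ToyTraversal t = null;
        for (int i = 0; i < rows; i++) {
            t = (t == null) ? new ToyTraversal(graph).addV() : t.addV();
        }
        int returned = t.toList().size();
        return new int[] {graph.size(), returned};
    }

    public static void main(String[] args) {
        int[] a = overwriteEachRow(1000);
        int[] b = chainEachRow(1000);
        System.out.println("overwrite: stored=" + a[0] + ", returned=" + a[1]);
        System.out.println("chain:     stored=" + b[0] + ", returned=" + b[1]);
    }
}
```

In this model the overwrite case reports stored=1, returned=1 and the chained case reports stored=1000, returned=1, which matches the "always get one record" symptom in both variants.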

Venkat Dasari

May 6, 2021, 11:25:35 AM

to Gremlin-users

Querying for the full vertexList like that would take forever, since the graph is going to be massive.

Even if I add indexing, that would make the load slower (although it's in the plan).

What I did instead was parallelize the load using Spark and, for every 1000 records, call .toList(), get those IDs back, save them to a data frame, and move on.

It worked perfectly fine. I was able to load 1.5 million rows in 20 seconds, but I am occasionally seeing weird errors like the one below:

Caused by: java.util.concurrent.CompletionException: java.lang.IllegalStateException: Connection to server is no longer active
    at java.util.concurrent.CompletableFuture.reportJoin(CompletableFuture.java:375)
    at java.util.concurrent.CompletableFuture.join(CompletableFuture.java:1947)
    at org.apache.tinkerpop.gremlin.driver.ResultSet.one(ResultSet.java:119)
    at org.apache.tinkerpop.gremlin.driver.ResultSet$1.hasNext(ResultSet.java:171)
    at org.apache.tinkerpop.gremlin.driver.ResultSet$1.next(ResultSet.java:178)
    at org.apache.tinkerpop.gremlin.driver.ResultSet$1.next(ResultSet.java:165)
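
The batching pattern described in this post (one .toList() per 1000 records, with the returned IDs kept for building edges later) can be sketched in plain Java. This is a minimal sketch under stated assumptions, not the poster's actual Spark job: BatchIdCollector, loadInBatches and the submitBatch hook are hypothetical names, and the lambda in main is a fake backend standing in for whatever traversal a real loader would run per batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class BatchIdCollector {

    /**
     * Splits rows into batches of at most batchSize, hands each batch to submitBatch,
     * and concatenates the ids that come back. submitBatch is a hypothetical hook:
     * a real loader would run one traversal per batch there and return the new vertex ids.
     */
    static <R, I> List<I> loadInBatches(List<R> rows, int batchSize,
                                        Function<List<R>, List<I>> submitBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<I> allIds = new ArrayList<>(rows.size());
        for (int start = 0; start < rows.size(); start += batchSize) {
            int end = Math.min(start + batchSize, rows.size());
            List<R> batch = rows.subList(start, end);
            List<I> ids = submitBatch.apply(batch);
            if (ids.size() != batch.size()) {
                // Surfaces the "always get one record" symptom as soon as it happens.
                throw new IllegalStateException(
                        "expected " + batch.size() + " ids, got " + ids.size());
            }
            allIds.addAll(ids);
        }
        return allIds;
    }

    public static void main(String[] args) {
        List<Integer> mktIds = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            mktIds.add(i);
        }
        // Fake backend for illustration only: the "vertex id" is the market id plus an offset.
        List<Long> vertexIds = loadInBatches(mktIds, 1000, batch -> {
            List<Long> ids = new ArrayList<>();
            for (int m : batch) {
                ids.add(100_000L + m);
            }
            return ids;
        });
        System.out.println(vertexIds.size() + " ids collected");
    }
}
```

Checking that each batch returns as many IDs as it sent is a design choice of this sketch: it turns a silently short result list into an immediate failure instead of a gap discovered later when edges are built.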

Venkat Dasari

May 6, 2021, 5:40:52 PM

to Gremlin-users

So, now I am running on Spark and tried to load some 20 million vertices with 20 partitions, creating a graph traversal for every 10000 rows and committing every 1000 rows. The error I see is below.

Are there any configuration properties that I need to change, or any other suggestions that can help?

java.util.concurrent.CompletionException: java.lang.IllegalStateException: Connection to server is no longer active
    at java.util.concurrent.CompletableFuture.reportJoin(CompletableFuture.java:375)
    at java.util.concurrent.CompletableFuture.join(CompletableFuture.java:1947)
    at org.apache.tinkerpop.gremlin.driver.ResultSet.one(ResultSet.java:119)
    at org.apache.tinkerpop.gremlin.driver.ResultSet$1.hasNext(ResultSet.java:171)
    at org.apache.tinkerpop.gremlin.driver.ResultSet$1.next(ResultSet.java:178)
    at org.apache.tinkerpop.gremlin.driver.ResultSet$1.next(ResultSet.java:165)
    at org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteTraversal$TraverserIterator.next(DriverRemoteTraversal.java:146)
    at org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteTraversal$TraverserIterator.next(DriverRemoteTraversal.java:131)
    at org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteTraversal.nextTraverser(DriverRemoteTraversal.java:112)
    at org.apache.tinkerpop.gremlin.process.remote.traversal.step.map.RemoteStep.processNextStart(RemoteStep.java:80)
    at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.next(AbstractStep.java:129)
    at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.next(AbstractStep.java:39)
    at org.apache.tinkerpop.gremlin.process.traversal.Traversal.fill(Traversal.java:184)
    at org.apache.tinkerpop.gremlin.process.traversal.Traversal.toList(Traversal.java:122)
    at com.iqvia.janus.JanusRemoteTraversalVertexMapPartitionJava$1$1.hasNext(JanusRemoteTraversalVertexMapPartitionJava.java:185)
    at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:43)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
Caused by: java.lang.IllegalStateException: Connection to server is no longer active
    at org.apache.tinkerpop.gremlin.driver.Handler$GremlinResponseHandler.lambda$channelInactive$0(Handler.java:213)
    at java.util.concurrent.ConcurrentHashMap$ValuesView.forEach(ConcurrentHashMap.java:4707)
    at org.apache.tinkerpop.gremlin.driver.Handler$GremlinResponseHandler.channelInactive(Handler.java:213)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)