Fastest way to Load data/Create graph

lukezh...@gmail.com

Jan 2, 2018, 5:46:31 PM
to JanusGraph users
Hi, I'm new to JanusGraph and have been following the TinkerPop3 tutorials.

So far I've been creating my vertices and edges in a Groovy file. Example below:

graph = TinkerGraph.open()
marko = graph.addVertex(T.label, "person", T.id, 1, "name", "marko", "age", 29); 
vadas = graph.addVertex(T.label, "person", T.id, 2, "name", "vadas", "age", 27);
lop = graph.addVertex(T.label, "software", T.id, 3, "name", "lop", "lang", "java");
josh = graph.addVertex(T.label, "person", T.id, 4, "name", "josh", "age", 32);
ripple = graph.addVertex(T.label, "software", T.id, 5, "name", "ripple", "lang", "java");
peter = graph.addVertex(T.label, "person", T.id, 6, "name", "peter", "age", 35);
dealmoon = graph.addVertex(T.label, "person", T.id, 14, "name", "dealmoon", "age", 1111);
marko.addEdge("knows", vadas, T.id, 7, "weight", 0.5f); 
marko.addEdge("knows", josh, T.id, 8, "weight", 1.0f);
marko.addEdge("created", lop, T.id, 9, "weight", 0.4f);
josh.addEdge("created", ripple, T.id, 10, "weight", 1.0f);
josh.addEdge("created", lop, T.id, 11, "weight", 0.4f);
peter.addEdge("created", lop, T.id, 12, "weight", 0.2f);
...

The file is named init.groovy.

I have around 4,000 vertices and 12,000 edges, which I don't think is that big of a graph.
I load the data and create the graph with bin/gremlin.sh -i init.groovy. The whole loading process takes about 30 minutes, which seems very slow to me.

Question 1. Am I doing this the right way? If not, what's the fastest, most efficient way to load the data?

Question 2. Is there a way to save the graph I created, so I don't have to load the data and create the graph every time I open the Gremlin Console?

Thanks in advance.

Robert Dale

Jan 2, 2018, 6:54:48 PM
to lukezh...@gmail.com, JanusGraph users
First, you're not actually using JanusGraph, but TinkerGraph [1], an in-memory graph database provided by TinkerPop. Just want to make sure you are aware of that distinction.

It's impossible to tell why loading your graph would be so slow without seeing your code. Are you doing any lookups? Maybe indexes would help.  Are the JVM memory settings sufficient?

You can persist TinkerGraph. See [1].
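
For reference, a minimal sketch of what persistence can look like, assuming TinkerGraph's `gremlin.tinkergraph.graphLocation` and `gremlin.tinkergraph.graphFormat` settings. The file name and path below are placeholders and do not come from this thread.

```properties
# conf/tinkergraph-persist.properties (hypothetical file name)
gremlin.graph=org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph
# File the graph is read from when opened and written to on close(); placeholder path
gremlin.tinkergraph.graphLocation=/tmp/init-graph.kryo
# Serialization format: gryo, graphson, or graphml
gremlin.tinkergraph.graphFormat=gryo
```

Opened in the Gremlin Console with GraphFactory.open('conf/tinkergraph-persist.properties'), a graph configured this way should be written to that file when graph.close() is called and reloaded from it on the next open, so the load script only needs to run once.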



Robert Dale

--
You received this message because you are subscribed to the Google Groups "JanusGraph users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-users+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-users/e74ab0ff-cdf9-467d-9490-9ecf716ee2c4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

lukezh...@gmail.com

Jan 2, 2018, 10:18:39 PM
to JanusGraph users
Hi Robert, 
Thank you for replying so fast, I really appreciate it.

Good catch on TinkerGraph, I wasn't aware of that at all. Could you explain more about the difference between TinkerGraph and JanusGraph? I'm a bit confused, since it seems like once I change graph = TinkerGraph.open() to graph = JanusGraph.open(), everything should work the same.

The example code I provided above is the code I'm running through the Gremlin terminal, which is also the Groovy shell. I just have more vertices and edges.

I haven't gotten to the point of doing any lookups yet. Just loading all these vertices and edges already takes more than 30 minutes.

The link you provided describes almost exactly the steps I followed.

Robert Dale

Jan 3, 2018, 4:00:27 PM
to lukezh...@gmail.com, JanusGraph users
You can find the description of JanusGraph on the main website at http://janusgraph.org/

Robert Dale

Robert Dale

Jan 3, 2018, 4:22:57 PM
to Lujia Zhang, JanusGraph users
Wow, that's incredible. I got 7 minutes for 5k vertices and 5k edges. The time must be the overhead of parsing and executing Groovy. Rewriting it as a loop takes a split second. You'd be better off loading and parsing a file (csv, json, ...).
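
One way to read the "load and parse a file" suggestion is sketched below in Java, using only the standard library. This is not the loop Robert wrote, which the thread does not show. `GraphSink` is a hypothetical stand-in for the real calls (`graph.addVertex(...)` and `Vertex.addEdge(...)`), and the CSV column layouts are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CsvGraphLoader {

    /**
     * Hypothetical stand-in for the graph, not a TinkerPop type. A real
     * implementation would wrap graph.addVertex(...) and Vertex.addEdge(...).
     */
    public interface GraphSink {
        void addVertex(long id, String label, String name);
        void addEdge(long edgeId, long fromId, String label, long toId, float weight);
    }

    /** Parses "id,label,name" rows (assumed layout); returns the number of vertices added. */
    public static int loadVertices(List<String> rows, GraphSink sink) {
        int count = 0;
        for (String row : rows) {
            if (row.trim().isEmpty()) continue; // skip blank lines
            String[] f = row.split(",");
            sink.addVertex(Long.parseLong(f[0].trim()), f[1].trim(), f[2].trim());
            count++;
        }
        return count;
    }

    /** Parses "edgeId,fromId,label,toId,weight" rows (assumed layout); returns the number of edges added. */
    public static int loadEdges(List<String> rows, GraphSink sink) {
        int count = 0;
        for (String row : rows) {
            if (row.trim().isEmpty()) continue; // skip blank lines
            String[] f = row.split(",");
            sink.addEdge(Long.parseLong(f[0].trim()), Long.parseLong(f[1].trim()),
                    f[2].trim(), Long.parseLong(f[3].trim()), Float.parseFloat(f[4].trim()));
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // In practice the rows would come from a file, e.g. Files.readAllLines(...).
        List<String> vertexRows = Arrays.asList("1,person,marko", "2,person,vadas", "4,person,josh");
        List<String> edgeRows = Arrays.asList("7,1,knows,2,0.5", "8,1,knows,4,1.0");

        // This sink only records what it was asked to do.
        final List<String> log = new ArrayList<>();
        GraphSink sink = new GraphSink() {
            @Override
            public void addVertex(long id, String label, String name) {
                log.add("vertex " + id + " " + label + " " + name);
            }

            @Override
            public void addEdge(long edgeId, long fromId, String label, long toId, float weight) {
                log.add("edge " + fromId + " -" + label + "-> " + toId);
            }
        };

        int v = loadVertices(vertexRows, sink);
        int e = loadEdges(edgeRows, sink);
        System.out.println(v + " vertices, " + e + " edges"); // prints "3 vertices, 2 edges"
    }
}
```

The point of the design is that the data lives in a file and a single compiled loop walks over it, rather than having the console parse and evaluate one generated statement per vertex or edge.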

Robert Dale

Don Omondi

Jan 3, 2018, 6:06:41 PM
to JanusGraph users
This is a very interesting discussion because, in my opinion, anybody new to JanusGraph (and perhaps graphs in general) might run into this; I know I certainly did. The assumption is that after following the docs you can just scale up, but once you get into the thousands things start to look different. I also think this is why there are so many questions and discussions here on bulk loading.

If it's not too much to ask, @Robert, could you share a little more about how you rewrote it as a loop? I think it will help @Lujia and others who find this.

Debasish Kanhar

Jan 4, 2018, 10:56:12 AM
to JanusGraph users
I had a similar issue. I tried manually converting the data to GraphSON using my programming language (Python) and pushed the GraphSON with a single bulk loader query. That improved performance somewhat, but it's nowhere near ideal: pushing 300 nodes took ~5 minutes without GraphSON and ~2.5 minutes with it.

hoda.mo...@gmail.com

Jan 4, 2018, 11:36:43 AM
to JanusGraph users
Hi Robert,
I am new to JanusGraph. I came across some of your responses in this group, so I thought you might be able to help me. Can I create an edge between two graphs? I am using HBase for my backend and I have two graphs in it. But when I try to create an edge between them I get
java.lang.IllegalStateException: The vertex or type is not associated with this transaction [v[4128]]

Can you guide me on how to add edges between two graphs?

Thanks,

rahul.n.m...@verizon.com

Jan 4, 2018, 4:18:43 PM
to JanusGraph users

> The assumption is that after following the docs you can just scale up, but once you get into the thousands things start to look different. I also think this is why there are so many questions and discussions here on bulk loading.

I completely agree with you. I have set up JanusGraph with a Cassandra backend. Now I am trying to figure out how to efficiently load data from SQL Server, create vertices and edges (approximately 1 million nodes), and load them into JanusGraph.

Kevin Schmidt

Jan 4, 2018, 5:17:37 PM
to rahul.n.m...@verizon.com, JanusGraph users
FWIW, I'm able to load 1.05M vertices, which I get from a query against MySQL, into JanusGraph in right around 21 minutes. This is from a standalone Java program using the JanusGraph/TinkerPop APIs, with a back-end that is a single-node Cassandra on the same machine the Java program is running on.

After this initial vertex load, I then added 133K more vertices and over 23M edges in about 75 minutes.

Don Omondi

Jan 4, 2018, 6:17:46 PM
to JanusGraph users
Kevin, this is great, and much better results than many of us get. But to really help out, I think you should provide a bit more implementation detail. It's clear that the method @Lujia posted does not work well, and @Robert was also surprised that it takes that long, yet this is how the documentation shows adding vertices and edges. Furthermore, for bulk loading we are told to just set bulk-loading to true and try parallel bulk requests, but this is too slow. I know the JanusGraph devs have been talking about writing bulk-loading examples, but until then people like you can really help the community by giving more details on what you did and how.

Kevin Schmidt

Jan 4, 2018, 7:23:53 PM
to Don Omondi, JanusGraph users
The gist of my code adding the million vertices is:

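            // Run the source query (conn and sql are defined elsewhere)
            PreparedStatement ps = conn.prepareStatement(sql);
            ResultSet rs = ps.executeQuery();

            int i = 0; // number of vertices added so far

            while (rs.next()) {
                // One vertex per result row
                Vertex v = this.graph.addVertex("person");
                v.property("id", rs.getInt("id"));
                // more properties added
                // ...

                // Commit in batches of 100 to keep each transaction small
                if (++i % 100 == 0) {
                    this.graph.tx().commit();
                }
            }

            // Commit the final, possibly partial, batch
            this.graph.tx().commit();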

So it is just adding a vertex and up to 10 properties and doing a commit every 100 vertex adds.

Kevin Schmidt

Jan 4, 2018, 7:31:51 PM
to Don Omondi, JanusGraph users
Note that a big difference between what I'm doing and what I think others have described is that I'm doing this from Java using JanusGraph embedded, i.e. not going through a Gremlin Server and not issuing Gremlin in a Gremlin Console that needs to be parsed in any way. I have it on my list of things to do to try this against a Gremlin Server via gremlin-client instead to see how it performs, but based on what others have reported my guess is it will be a lot slower.

Liping Huang

Jan 16, 2018, 2:11:46 AM
to JanusGraph users
Robert, it would indeed be better to show your code.

Questions like this come up frequently: "bulk loading is too slow", or "I can write xxxx nodes/edges per second". But wait, what's the environment? A PC, a laptop, a server, or Power? An i7 or an i5? SSD or SATA? What are the JVM settings? A single Cassandra node or a cluster? What does the code look like? None of that seems to be given; all I see is "slow" or "fast".


On Thursday, January 4, 2018 at 5:22:57 AM UTC+8, Robert Dale wrote:

lakshay....@gmail.com

Jun 5, 2018, 10:18:49 AM
to JanusGraph users
Hi,
I am new to JanusGraph and want to use Spark to insert data from a MySQL server into JanusGraph. Can you tell me where to start, or point me to some resources?

Steven Harlow

Jun 6, 2018, 6:17:48 PM
to JanusGraph users

liduoXu

Jun 15, 2018, 2:58:49 AM
to JanusGraph users
I also have a question: how do I generate GraphSON files for hundreds of millions of nodes and specify the id as a string type?

On Thursday, January 4, 2018 at 11:56:12 PM UTC+8, Debasish Kanhar wrote:

jx ping

Jun 21, 2018, 10:34:35 PM
to JanusGraph users


On Friday, June 15, 2018 at 2:58:49 PM UTC+8, liduoXu wrote:
There is one approach you can consider: control id assignment yourself instead of using JanusGraph's id assignment. With JanusGraph's id assignment I only got about 300 per second; if you control id assignment yourself, the speed approaches 1,500 per second. That is still slow, so if you have any other ideas, please tell me.
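
For readers wondering what "control id assignment yourself" might look like in configuration, here is a hedged sketch of a JanusGraph properties fragment. The poster did not share their settings, so this is an assumption about the options involved. Storage backend settings are omitted, and option names and behavior should be checked against the configuration reference for your JanusGraph version.

```properties
# janusgraph.properties fragment (illustrative only; backend settings omitted)
# Disable JanusGraph's automatic vertex id allocation; the loader must then supply an id for every vertex
graph.set-vertex-id=true
# Batch-loading mode, the "bulk-loading" switch mentioned earlier in the thread; relaxes consistency checks during the load
storage.batch-loading=true
```

With user-supplied ids enabled, each vertex is created with an explicit T.id, and JanusGraph generally expects that value to be a valid JanusGraph vertex id (produced through the graph's IDManager) rather than an arbitrary number.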

Shu SHANG

Oct 22, 2018, 11:31:55 AM
to JanusGraph users
It is really a great surprise that bulk loading has so frequently become a problem for many newcomers to JanusGraph. Indeed, data integration has always been a problem for any database system, but many of them provide a good way to do it.

Recently, Baidu open sourced a large-scale graph database called HugeGraph, which is based on many ideas from Titan, JanusGraph and the TinkerPop framework.

HugeGraph provides a good way of doing this, called HugeGraph-loader (https://github.com/hugegraph/hugegraph-loader).

I think this is a better way of doing it. Instead of discussing it all around and still not being able to choose between tools like BulkLoaderVertexProgram, Hadoop, and Spark, why doesn't the JanusGraph community just provide a toolkit/plugin to help newbies? With this kind of tool, you just need to consider the input and output format of your data (csv, json, ...) instead of struggling with the different ways of data integration and still not figuring it out.

Don Omondi

Oct 22, 2018, 11:51:40 AM
to JanusGraph users
First, I totally agree: bulk loading is still quite the challenge even today. And I wouldn't say that it's specific to newbies; even experienced devs find it challenging, especially when given different datasets and/or formats.

That said, I don't think we can leave it to the core devs alone. Since it's a community problem, if we the community come up with a generalized solution for most use cases, it will receive support from the core team and perhaps can be included in some parts of the docs.

HugeGraph seems quite interesting; I starred it to check out properly later. If you have experience with it, and since I see it has similarities with JanusGraph, perhaps you can start on the aforementioned community solution and we'll help :)

dengzim...@gmail.com

Oct 24, 2018, 7:28:35 AM
to JanusGraph users
If you are using Cassandra as your storage backend, I believe this will help you: https://github.com/dengziming/janusgraph-util

ivangarci...@gmail.com

Nov 14, 2019, 9:48:11 AM
to JanusGraph users
Hi, how do I batch load data into JanusGraph for a specific graph name?