How to bulk upload RDF data to JanusGraph


Arpan Jain

Dec 24, 2020, 5:24:10 AM
to JanusGraph users
I have data in RDF (Turtle, .ttl) format, around 6 million triples. I am currently using the rdf2gremlin Python script for the conversion, but it takes too much time: loading 10k records took around 1 hour. I am using ScyllaDB as the JanusGraph backend. Below is the Python code I am using.

import pathlib

import rdflib
from rdf2g import setup_graph

# Gremlin Server endpoint of the JanusGraph instance
DEFAULT_LOCAL_CONNECTION_STRING = "ws://localhost:8182/gremlin"
g = setup_graph(DEFAULT_LOCAL_CONNECTION_STRING)

# Parse the Turtle file into an in-memory rdflib graph
OUTPUT_FILE_LAM_PROPERTIES = pathlib.Path("path/to/ttl/file/.ttl").resolve()
rdf_graph = rdflib.Graph()
rdf_graph.parse(str(OUTPUT_FILE_LAM_PROPERTIES), format="ttl")

Loading the same RDF data into Neo4j takes only around 10 minutes for the whole dataset, but I want to use JanusGraph.

Please suggest the best way to bulk upload RDF data to JanusGraph using Python or Java.

alexandr...@gmail.com

Dec 24, 2020, 5:35:50 AM
to JanusGraph users
Hi,

Try enabling batch loading: "storage.batch-loading=true".
Increase your batch mutation buffer: "storage.buffer-size=20480".
Increase the ID block size: "ids.block-size=10000000".
I'm not sure whether your flow only adds data or also upserts it. If it upserts, you may also set "query.batch=true".
That said, I haven't used rdf2gremlin, so I can't suggest much more. These settings are just the options I can think of immediately; a proper investigation would be needed to recommend further performance improvements. You may also tune ScyllaDB for your use case.

Best regards,
Oleksandr

Arpan Jain

Dec 24, 2020, 6:43:42 AM
to janusgra...@googlegroups.com
I need to set all of these properties in the JanusGraph properties file, right? I mean the config file the server starts with, where we set the storage backend, host, etc.

--
You received this message because you are subscribed to a topic in the Google Groups "JanusGraph users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/janusgraph-users/Buk0hjlxVOs/unsubscribe.
To unsubscribe from this group and all its topics, send an email to janusgraph-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-users/7a341919-78a4-48d2-9380-100f827803e1n%40googlegroups.com.

alexandr...@gmail.com

Dec 24, 2020, 6:44:21 AM
to JanusGraph users
That's right
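
As a rough sketch, the four settings suggested earlier in this thread would look like this in the JanusGraph properties file that the server starts with. The values are the ones quoted in the reply above and are starting points rather than tuned recommendations; the comments are editorial glosses, and the existing storage backend and host lines in the file are assumed to stay as they are.

```properties
# Added alongside the existing storage backend / host settings (not shown here).

# Enable batch loading into the storage backend
storage.batch-loading=true

# Size of the batch in which mutations are persisted
storage.buffer-size=20480

# Reserve graph element IDs in blocks of this size
ids.block-size=10000000

# Only suggested for the case where the load upserts data rather than just adding it
query.batch=true
```

Changes to this file would normally only take effect once the server is restarted with it.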

Arpan Jain

Dec 25, 2020, 5:29:13 AM
to janusgra...@googlegroups.com
Actually, I have around 70 fields. So my question is: is it possible to insert some data first without batch loading enabled, so that JanusGraph creates its own schema, and then load the remaining data later with batch loading set to true?
Will this process give an error?
