Reading and storing graph data in AWS Neptune

Aditya Srivastava

Feb 7, 2020, 10:10:56 AM

to Gremlin-users

Hi all, Since AWS Neptune doesn’t support the I/O methods that TinkerPop provides, is anyone aware of another way to store graph data in a .json file and read it back later? We were using the io() step in Gremlin locally to do this, but given this limitation we are wondering how to achieve the same thing in Neptune. Ref: https://docs.aws.amazon.com/neptune/latest/userguide/access-graph-gremlin-differences.html#gremlin-api-reference-features

"Neptune does not support the following Gremlin steps:

The Gremlin io( ) Step is not supported in Neptune. For example, the query g.io("graph.xml").read().iterate() would not work with Neptune."

Thanks in advance!!

Kelvin Lawrence

Feb 8, 2020, 2:38:04 PM

to Gremlin-users

Hi Aditya, I try not to get too much into product-specific items on this list, but here are a few suggestions.

Several Graph databases that support Apache TinkerPop also provide a way to Bulk Load data. Amazon Neptune has such a capability [1], which supports loading of CSV files of nodes and edges. If you export the data from your local graph as GraphML there is a tool [2] that can convert that GraphML to CSV. If you export your local data as GraphSON you will need to convert it to CSV some other way before you can use the bulk loader. It would be pretty simple to write a small script that can do that and there are TinkerPop classes you can use to help parse GraphSON as needed.
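
A minimal sketch of such a conversion script, in Python rather than with the TinkerPop classes. It assumes the line-delimited GraphSON layout that TinkerGraph writes (one vertex per line, with its outgoing edges under "outE"), and it simplifies in ways a real export may not tolerate: only the first value of each vertex property is kept, every property is written as a String column, and edge properties are dropped. The function names are hypothetical.

```python
import csv
import io
import json


def unwrap(value):
    """Strip GraphSON type wrappers such as {"@type": "g:Int32", "@value": 1}."""
    if isinstance(value, dict) and "@value" in value:
        return unwrap(value["@value"])
    return value


def graphson_to_csv(lines):
    """Convert line-delimited GraphSON vertices to (vertex_csv, edge_csv) text."""
    vertices, edges, keys = [], [], []
    for line in lines:
        if not line.strip():
            continue
        v = json.loads(line)
        vid = unwrap(v["id"])
        # Assumption: no multi-properties, so only the first value is kept.
        props = {k: unwrap(vals[0]["value"])
                 for k, vals in v.get("properties", {}).items()}
        for k in props:
            if k not in keys:
                keys.append(k)
        vertices.append((vid, v["label"], props))
        # Every edge is listed under "outE" of its source vertex, so reading
        # only "outE" (and ignoring "inE") avoids emitting edges twice.
        for label, out_edges in v.get("outE", {}).items():
            for e in out_edges:
                edges.append((unwrap(e["id"]), vid, unwrap(e["inV"]), label))

    vertex_out = io.StringIO()
    writer = csv.writer(vertex_out, lineterminator="\n")
    # Neptune's CSV format uses ~id and ~label system columns for vertices.
    writer.writerow(["~id", "~label"] + [k + ":String" for k in keys])
    for vid, label, props in vertices:
        writer.writerow([vid, label] + [props.get(k, "") for k in keys])

    edge_out = io.StringIO()
    writer = csv.writer(edge_out, lineterminator="\n")
    # Edges use ~id, ~from, ~to and ~label.
    writer.writerow(["~id", "~from", "~to", "~label"])
    for row in edges:
        writer.writerow(row)
    return vertex_out.getvalue(), edge_out.getvalue()
```

The two returned strings would be written to separate files (one for vertices, one for edges) before being uploaded for the bulk loader.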

To export data as GraphML from your local graph you could do something like this:

g.io('myfile.xml').write().iterate()

and then generate CSV using:

graphml2csv -i myfile.xml
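
Once the CSV files have been uploaded to S3, a bulk load is started by sending an HTTP POST to the cluster's loader endpoint, as described in [1]. The following is a rough Python sketch of building that request. The function name and all argument values are placeholders, and it assumes IAM database authentication is not enabled on the cluster (otherwise the request would also need to be signed).

```python
import json
import urllib.request


def build_load_request(endpoint, s3_uri, role_arn, region):
    """Build (but do not send) a POST request for the Neptune loader endpoint."""
    payload = {
        "source": s3_uri,         # S3 location of the CSV files
        "format": "csv",          # Gremlin CSV format
        "iamRoleArn": role_arn,   # role that lets Neptune read from S3
        "region": region,         # region of the S3 bucket
        "failOnError": "FALSE",   # keep going past individual bad rows
    }
    return urllib.request.Request(
        url=f"https://{endpoint}:8182/loader",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to urllib.request.urlopen() from a machine that can reach the cluster would submit the job. The response should contain a load id that can be used to check on progress.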

If your graph is reasonably small you could also generate a set of addV() and addE() steps in a file and run that file from the Gremlin Console while attached to Neptune but in local mode (i.e. do ":remote connect" but do not do ":remote console").

If you take that approach, lines in the file you load into the console would be of the form shown below (you can have multiple addE and addV steps on a single line up to a maximum of a few thousand characters).

:> g.addV('person').as('p1').addV('person').as('p2').addE('knows').from('p1').to('p2')
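
A file of such lines could be generated with a small script. The Python sketch below is one possible approach, not a definitive one: the input shapes (a dict of vertices and a list of edge tuples) and the function names are hypothetical, it assumes string vertex ids and string property values, and it writes one element per line rather than chaining several steps together.

```python
def gremlin_literal(text):
    """Quote a Python string as a single-quoted Gremlin string literal."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def console_lines(vertices, edges):
    """Yield one ':>' console line per vertex, then one per edge.

    vertices: dict mapping a vertex id to (label, {property: string value})
    edges: iterable of (from_id, edge_label, to_id)
    """
    for vid, (label, props) in vertices.items():
        # Set the id explicitly so the edge lines can look the vertex up later.
        step = f"g.addV({gremlin_literal(label)}).property(id, {gremlin_literal(vid)})"
        for key, value in props.items():
            step += f".property({gremlin_literal(key)}, {gremlin_literal(value)})"
        yield ":> " + step
    for src, label, dst in edges:
        # The edge runs from the src vertex to the dst vertex.
        yield (f":> g.V({gremlin_literal(src)}).addE({gremlin_literal(label)})"
               f".to(__.V({gremlin_literal(dst)}))")
```

Writing the vertex lines before the edge lines matters here, because each edge line looks up vertices that must already exist.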

I have used all of the above techniques, but I think you will find the bulk loader the most convenient. If you would like to go deeper, there is a Neptune-specific discussion forum at [3].

I hope this helps,

Cheers

Kelvin

[1] https://docs.aws.amazon.com/neptune/latest/userguide/bulk-load.html

[2] https://github.com/awslabs/amazon-neptune-tools/tree/master/graphml2csv

[3] https://forums.aws.amazon.com/forum.jspa?forumID=253

Aditya Srivastava

Feb 10, 2020, 1:50:17 AM

to gremli...@googlegroups.com

Thanks a lot Kelvin!! This is very detailed, helpful, and exactly what I needed. We will evaluate these options and see which one suits us best.

Right now our graph is small, but it will eventually get bigger, so it looks like the bulk loader is the way to go here.

Thanks,

Aditya
