I was hoping that the JS client library would abstract away the WebSocket protocol for me. It looks like the client is broken, because even the sample in the documentation doesn't work. I was short on time and couldn't easily make sense of the docs on configuring the logger, and the default configuration puts too much noise on the console.
I did end up getting the ingestion working via NodeJS + REST. Here's what helped:
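For anyone trying the same route, here is a minimal sketch of submitting one Gremlin script over REST from Node. It assumes Node 18+ (for the built-in fetch) and a Gremlin Server whose channelizer accepts HTTP (e.g. HttpChannelizer or WsAndHttpChannelizer). GREMLIN_URL, buildGremlinRequest and submitScript are placeholder names of mine, not part of any client library.

```javascript
// Placeholder endpoint (an assumption): adjust host/port for your server.
const GREMLIN_URL = "http://localhost:8182";

// Build fetch() options for submitting a Gremlin script as a JSON POST body.
function buildGremlinRequest(script) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ gremlin: script }),
  };
}

// Send the script and return the parsed JSON response (requires global fetch).
async function submitScript(script, url = GREMLIN_URL) {
  const res = await fetch(url, buildGremlinRequest(script));
  if (!res.ok) {
    // Include the server's error text so the caller can decide whether to retry.
    throw new Error(`Gremlin request failed: ${res.status} ${await res.text()}`);
  }
  return res.json();
}
```

Usage would be something like submitScript("g.V().count()").then(console.log). Against Neptune the URL would be the cluster endpoint with the /gremlin path (https://<your-neptune-endpoint>:8182/gremlin); if IAM auth is enabled on the cluster, requests also need to be signed, which this sketch does not do.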
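* You can batch commands into a single Gremlin script; just terminate each statement with .next() followed by a semicolon. If you don't, only the last statement in the batch takes effect: traversals are lazy and the server only iterates the final one, so the earlier statements never actually run.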
* I used a queue to limit the number of concurrent REST calls. It also made it easy to retry the occasional failures during bursts. I had to throttle the queue to avoid running out of memory with large graphs.
* Increase the heap available to the Gremlin Server: JAVA_OPTIONS="-Xms512m -Xmx4096m"
* Increase the script timeout: as the graph grows larger, operations may take longer. Update server-config.yaml with the following (the value is in milliseconds, i.e. 2 minutes):
  scriptEvaluationTimeout: 120000
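A rough sketch of the two ingestion tricks above, batching with .next() and a bounded queue with retry. It assumes each statement is a complete Gremlin traversal string and each task is an async function wrapping one REST call (how that call is made is left out). buildBatch and runQueue are names I made up for illustration; the linear backoff is my own choice, not something from the post.

```javascript
// Join statements into one script, terminating each with .next() so every
// traversal is iterated, not just the last one. Statements already ending in
// .next() or .iterate() are left alone.
function buildBatch(statements) {
  const terminated = statements
    .map((s) => s.trim().replace(/;$/, "")) // drop any trailing semicolon
    .map((s) => (/\.(next|iterate)\(\)$/.test(s) ? s : `${s}.next()`));
  return terminated.join(";\n") + ";";
}

// Run async tasks with at most `limit` in flight, retrying each failed task
// up to `retries` extra times. The returned promise rejects on the first task
// that exhausts its retries.
async function runQueue(tasks, { limit = 4, retries = 3, backoffMs = 500 } = {}) {
  const results = new Array(tasks.length);
  let next = 0; // index of the next task to start

  // Call one task, waiting a linearly growing delay between attempts.
  async function attempt(task) {
    for (let i = 0; ; i++) {
      try {
        return await task();
      } catch (err) {
        if (i >= retries) throw err; // out of retries: give up
        await new Promise((resolve) => setTimeout(resolve, backoffMs * (i + 1)));
      }
    }
  }

  // Each worker pulls tasks until the queue is drained, so at most
  // `limit` tasks are being processed at any time.
  async function worker() {
    while (next < tasks.length) {
      const idx = next++;
      results[idx] = await attempt(tasks[idx]);
    }
  }

  const workers = Array.from({ length: Math.min(limit, tasks.length) }, () => worker());
  await Promise.all(workers);
  return results;
}
```

Wiring them together would look something like runQueue(chunks.map((c) => () => send(buildBatch(c))), { limit: 4 }), where chunks and send are hypothetical. Lowering limit (or the chunk size) is the knob for throttling when memory gets tight.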
TinkerGraph has its limitations: it is an in-memory graph, so at some point it runs out of memory. You will see GC errors as it approaches the breaking point.
Against AWS Neptune, I was able to use the same script to upload ~3M vertices and ~8M edges with the following modifications:
* Neptune does NOT support Gremlin variables, so I had to rewrite the Gremlin queries as single-line traversals.
* Neptune does NOT support certain steps; check out their page on Gremlin deviations. It will save you a lot of time.
* Neptune errors out if the value in a has() step is blank. I think the same goes for setProperty(). So I had to add additional checks.
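The two Neptune workarounds above (no variables, no blank values) could look something like the following. These helpers are hypothetical, not from any library: they turn plain JS objects into single-line traversals, reuse the first edge endpoint through an as()/from() step label instead of a Groovy variable, and keep blank values away from has() and property() (property() being the Gremlin step for setting a property). The string escaping assumes Groovy-style single-quoted literals, so check it against your own data.

```javascript
// True for undefined, null, or whitespace-only strings.
function isBlank(v) {
  return v === undefined || v === null || (typeof v === "string" && v.trim() === "");
}

// Render a value as a Gremlin literal. Strings become single-quoted literals with
// backslashes, quotes and line breaks escaped; numbers and booleans are emitted
// as-is (check how your target parses numeric literals).
function lit(v) {
  if (typeof v === "number" || typeof v === "boolean") return String(v);
  const escaped = String(v)
    .replace(/\\/g, "\\\\")
    .replace(/'/g, "\\'")
    .replace(/\n/g, "\\n")
    .replace(/\r/g, "\\r");
  return `'${escaped}'`;
}

// Single-line g.addV(label).property(k, v)... with blank values dropped.
function addVertex(label, props) {
  const steps = Object.entries(props)
    .filter(([, v]) => !isBlank(v))
    .map(([k, v]) => `.property(${lit(k)}, ${lit(v)})`)
    .join("");
  return `g.addV(${lit(label)})${steps}`;
}

// Edge between two vertices looked up by a key property. The first vertex is
// labelled with as('a') and reused via from('a'), so no variable is needed.
// Returns null when either lookup value is blank so the caller can skip it.
function addEdge(label, key, fromVal, toVal) {
  if (isBlank(fromVal) || isBlank(toVal)) return null;
  return (
    `g.V().has(${lit(key)}, ${lit(fromVal)}).as('a')` +
    `.V().has(${lit(key)}, ${lit(toVal)})` +
    `.addE(${lit(label)}).from('a')`
  );
}
```

For example, addVertex("person", { name: "Ada", email: "" }) drops the empty email, and a null from addEdge() marks a row to log and skip rather than send.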
All said and done, an out-of-the-box bulk API would have been much better. Neptune has one (the bulk loader); I have yet to try it out.
Thanks,
Gishu