Gremlin Python Beginner's Graph


ypanc...@relpro.com

Jun 18, 2018, 4:46:57 PM
to Gremlin-users
Hi, 
I am just getting started with Gremlin and graph databases. I want to build an application that stores my graphs and lets me modify them as and when required. I am going to use Python for it. I have installed Apache TinkerPop's Gremlin Server 3.3.3. Can anyone tell me if I need anything else?
 

Stephen Mallette

Jun 19, 2018, 8:35:40 AM
to Gremlin-users
You need to choose a graph database to use to store your data. TinkerGraph is a good place to start since you are just beginning. Once you have the basics figured out, you can move on to a more advanced graph database.

--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/a42fdaaf-96a8-4048-937c-78b9dbdb5e31%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

ypanc...@relpro.com

Jun 19, 2018, 10:39:41 AM
to Gremlin-users
Hi Stephen, 
Thank you. Which one would be the more advanced graph database?

Stephen Mallette

Jun 19, 2018, 10:45:32 AM
to Gremlin-users
TinkerGraph is just an in-memory graph database and, depending on what you are doing, it may be good enough to suit your use case. I only said "more advanced" in the sense that TinkerGraph is really easy to get started with, while other graphs may require more configuration to begin with. The full list of TinkerPop-enabled graphs (that we know of) can be found here:


Each will have its own benefits and drawbacks that you'll need to consider, but that is the beauty of TinkerPop: you can generally switch between these different systems without a ton of pain.

reinhard...@gmail.com

Jun 26, 2018, 5:43:47 AM
to Gremlin-users
@ Stpehen Malette

On Tuesday, June 19, 2018 at 4:45:32 PM UTC+2, Stephen Mallette wrote:
TinkerGraph is just an in-memory graph database and depending on what you are doing, it may be *good enough* to suit your use case...

(emphasis mine)

I'm trying to understand the notion of good enough.

For TinkerGraph, are there any tests or use cases that establish reasonable limits on the number of vertices/edges/properties and the size of vertices/edges/properties (in bytes) on a machine with 4 GB or 16 GB of main memory, e.g. 1 million nodes and 10 million edges with 1000 bytes per node/edge? For comparison: a 500-page ASCII-only text uses roughly 1 million bytes (500 pages x 2000 characters/page, with 1 page = 40 lines x 50 characters/line).
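A quick back-of-the-envelope check of the hypothetical sizes in the question, written in Java. It counts raw property bytes only; TinkerGraph's per-element overhead (object headers, ID maps, adjacency structures) is not modeled, so real heap usage would be higher. PayloadEstimate and its methods are illustrative names, not part of any library.

```java
// Hypothetical helper: raw payload estimate for the sizes asked about above.
// Counts property bytes only; JVM and TinkerGraph overhead are NOT included,
// so actual heap usage would be higher than these figures.
class PayloadEstimate {
    static final long BYTES_PER_GIB = 1L << 30;

    // (vertices + edges) elements, each carrying roughly bytesPerElement of data
    static long rawPayloadBytes(long vertices, long edges, long bytesPerElement) {
        return (vertices + edges) * bytesPerElement;
    }

    // Bytes in an ASCII text: one byte per character
    static long textBytes(long pages, long linesPerPage, long charsPerLine) {
        return pages * linesPerPage * charsPerLine;
    }

    public static void main(String[] args) {
        long payload = rawPayloadBytes(1_000_000L, 10_000_000L, 1_000L);
        System.out.printf("raw payload: %,d bytes (~%.1f GiB)%n",
                payload, (double) payload / BYTES_PER_GIB);

        long book = textBytes(500, 40, 50); // the 500-page comparison
        System.out.printf("500-page text: %,d bytes; payload ~ %,d such texts%n",
                book, payload / book);

        for (long gib : new long[] {4, 16}) {
            // "fits" ignores all overhead, so true here is necessary, not sufficient
            System.out.printf("raw payload below %d GiB? %b%n", gib, payload < gib * BYTES_PER_GIB);
        }
    }
}
```

With these numbers the raw payload alone is 11 billion bytes (about 10 GiB). That rules out the 4 GB machine before any overhead is counted, and whether 16 GB suffices depends on overhead this arithmetic does not capture.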

Reinhard
 

Stephen Mallette

Jun 26, 2018, 6:11:24 AM
to Gremlin-users
I only know of TinkerGraph being limited by memory and features (data is not persisted on each change, only on close; it is not fully thread-safe; and it has no transactions). It's been a while, but I've used TinkerGraph at 10M edges before. I had a static graph that didn't change and I only did read-type analysis on it. TinkerGraph: fastest graphdb there is in that situation.

There is no cost to you, however, in just testing your use case. That's the beauty of TinkerPop, right? Write your code to load TinkerGraph with whatever data you will have. If it's unsatisfactory in some way, just change your configuration a bit and use something else. If your scale is around the 10M edge mark, then Neo4j is your next likely candidate, or perhaps JanusGraph with BerkeleyDB as the backend. No matter which graph you choose, I don't recommend loading data via Python. Just write a simple Gremlin script in Groovy and run it in the console. It's easier and likely faster. I'd also add that if you do go to another graph besides TinkerGraph, you should look into the bulk loading facilities of the graph you choose if you want the fastest load speed. That's another reason why I think a Groovy script is best: you can typically convert your code to work with those bulk loaders more easily.



Robert Dale

Jun 26, 2018, 8:46:24 AM
to gremli...@googlegroups.com


I'm actually going to try it today.  I'll post the results later.

Robert Dale


reinhard...@gmail.com

Jun 26, 2018, 8:50:24 AM
to Gremlin-users
@Stephen Malette

Thanks for your fast reply (and sorry for the typo in your name!).

I was just wondering whether there are some known, well-tested limits for TinkerGraph on a machine with 4 GB of memory.

The application I'm working on contains around 100K (not M) nodes and around 500K edges, but with 1 to 5 properties holding longer texts (1+K bytes; think of entries in an encyclopedia), plus some additional shorter properties with several external links. It is a typical single-user desktop or in-browser app that accesses a local data store. I'll be using Qt/PyQt or Vue.js for the GUI frontend and TinkerPop/TinkerGraph for the backend. My glue code will be written in Python. As the user interacts with the data through a GUI (adding, deleting, changing data), using Groovy and the Gremlin console is out of the question.
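A similar rough estimate for the application described above, assuming the post's lower bound of about 1,000 characters per long text. Java 8 stores String contents as UTF-16 (2 bytes per char), while Java 9+ compact strings can use 1 byte per char for Latin-1 text, so both cases are shown. Edges, the shorter properties and TinkerGraph's own overhead are left out; TextPayloadEstimate is a hypothetical helper.

```java
// Hypothetical helper: lower-bound size of the long-text properties only.
// Edges, short properties/links and TinkerGraph overhead are deliberately omitted.
class TextPayloadEstimate {
    static final long BYTES_PER_MB = 1_000_000L;

    // bytesPerChar: 1 for Latin-1 text with Java 9+ compact strings,
    //               2 for Java 8 (UTF-16 char[] backing arrays)
    static long textBytes(long vertices, int textPropsPerVertex, long charsPerText, int bytesPerChar) {
        return vertices * textPropsPerVertex * charsPerText * bytesPerChar;
    }

    public static void main(String[] args) {
        long vertices = 100_000L; // ~100K nodes
        long chars = 1_000L;      // "1+K bytes" per long text, taken as ~1,000 chars
        for (int bytesPerChar : new int[] {1, 2}) {
            long low = textBytes(vertices, 1, chars, bytesPerChar);  // 1 long text per node
            long high = textBytes(vertices, 5, chars, bytesPerChar); // 5 long texts per node
            System.out.printf("%d byte(s)/char: %d MB to %d MB of text%n",
                    bytesPerChar, low / BYTES_PER_MB, high / BYTES_PER_MB);
        }
    }
}
```

Even the 2-bytes-per-char, 5-texts-per-node case comes to about 1 GB of text, which is in line with the conclusion drawn in the post, though the actual heap figure should still be measured.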

Starting and stopping the server as well as loading and saving the graph will be done by the application in the background.  So it will be totally transparent to the user.

Your answer suggests that this is doable with TinkerGraph.
Thanks again!

Reinhard

Robert Dale

Jun 26, 2018, 8:01:40 PM
to gremli...@googlegroups.com
I saw a 37% heap memory reduction and a slight performance improvement with ShiftLeft (labeled "Specialized" below) vs. TinkerGraph.
 
Loading 6,899,377 lines of a tower/device association data set creates 1,524,870 vertices and 6,899,377 edges.
Towers are created with 'g.addV("tower").property("towerId", towerId).next()' and then cached for future reference.
Devices are created with 'g.addV("device").property("deviceId", deviceId).next()' and then cached for future reference.
Edges are created with 'g.V(tower).addE("link").to(__.V(device)).iterate()'.
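The loading pattern above (create each tower and device vertex once, cache it, then add one "link" edge per input line) is why the edge count equals the line count while the vertex count is much lower. The sketch below reproduces that bookkeeping in plain Java, with hypothetical SimpleGraph, Vertex and Edge classes standing in for a real graph; with TinkerPop, the cached values would instead be the Vertex objects returned by the addV(...).next() calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of "create once, cache, then link". SimpleGraph, Vertex and
// Edge are hypothetical stand-ins, not TinkerPop classes.
class CachedLoader {
    static final class Vertex {
        final long id; final String label; final String key;
        Vertex(long id, String label, String key) { this.id = id; this.label = label; this.key = key; }
    }

    static final class Edge {
        final Vertex out; final Vertex in;
        Edge(Vertex out, Vertex in) { this.out = out; this.in = in; }
    }

    static final class SimpleGraph {
        final List<Vertex> vertices = new ArrayList<>();
        final List<Edge> edges = new ArrayList<>();

        Vertex addV(String label, String key) {
            Vertex v = new Vertex(vertices.size(), label, key);
            vertices.add(v);
            return v;
        }
    }

    // Each line is "towerId,deviceId".
    static SimpleGraph load(List<String> lines) {
        SimpleGraph g = new SimpleGraph();
        // The caches replace a lookup query per line: each ID creates a vertex only once.
        Map<String, Vertex> towers = new HashMap<>();
        Map<String, Vertex> devices = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.split(",", 2);
            Vertex tower = towers.computeIfAbsent(parts[0].trim(), id -> g.addV("tower", id));
            Vertex device = devices.computeIfAbsent(parts[1].trim(), id -> g.addV("device", id));
            g.edges.add(new Edge(tower, device)); // one "link" edge per input line
        }
        return g;
    }

    public static void main(String[] args) {
        SimpleGraph g = load(Arrays.asList("t1,d1", "t1,d2", "t2,d2", "t3,d1"));
        System.out.println(g.vertices.size() + " vertices, " + g.edges.size() + " edges"); // 5 vertices, 4 edges
    }
}
```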

Load time:
TinkerGraph: 107s
Specialized: 103s

Running shortest path for 267 towers to 10 selected devices:
Setup:
import org.apache.tinkerpop.gremlin.process.traversal.P;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__;
// mark 10 random devices as targets
g.V().hasLabel("device").sample(10).property("target", true).iterate()
// create "conn" edges between transitively connected towers (towers sharing a device) to reduce paths
g.V().hasLabel("tower").as("a").local(__.out("link").in("link").dedup().as("b").where("a", P.neq("b"))
                .coalesce(__.where(__.out("conn").as("a")), __.addE("conn").from("a").to("b"))).iterate()
Test:
// shortest path from each tower to any target device
g.V().hasLabel("tower").toList().stream().map(it -> {
           return g.V(it).coalesce(__.out().has("target", true),
                                   __.repeat(__.both("conn").where(P.without("t")).store("t").simplePath())
                                           .until(__.out().has("target", true)))
                    .hasNext();
}) [...]
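The setup and test traversals above can be mirrored without a graph database to make the algorithm explicit. First, build "conn" adjacency between towers that share a device. Then, for each tower, run a breadth-first search until it reaches a tower linked to a target device. Each search keeps its own visited set, which is analogous to the where(P.without("t")).store("t") filter. Unlike the traversal, this sketch returns the hop count, or -1 when there is no path, rather than a boolean. Because each search only reads shared, unmodified maps, the per-tower searches are independent, so the parallel stream variant can split the work among threads. TowerPaths and its sample data are hypothetical; this is a sketch of the technique, not the benchmark code.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical plain-Java model of the benchmark's algorithm (not Gremlin).
class TowerPaths {
    // Setup step: "conn" adjacency between towers sharing at least one device.
    // links maps each tower to the devices it has "link" edges to.
    static Map<String, Set<String>> buildConn(Map<String, Set<String>> links) {
        Map<String, Set<String>> towersByDevice = new HashMap<>();
        links.forEach((tower, devices) -> devices.forEach(
                d -> towersByDevice.computeIfAbsent(d, k -> new HashSet<>()).add(tower)));
        Map<String, Set<String>> conn = new HashMap<>();
        links.forEach((tower, devices) -> {
            Set<String> neighbours = conn.computeIfAbsent(tower, k -> new HashSet<>());
            for (String d : devices) {
                for (String other : towersByDevice.get(d)) {
                    if (!other.equals(tower)) neighbours.add(other);
                }
            }
        });
        return conn;
    }

    // Test step: BFS over "conn" from start. Returns hops to the nearest tower
    // linked to a target device, or -1 if none is reachable.
    static int hops(String start, Map<String, Set<String>> links,
                    Map<String, Set<String>> conn, Set<String> targets) {
        Set<String> visited = new HashSet<>(Collections.singleton(start));
        Deque<String> frontier = new ArrayDeque<>(Collections.singleton(start));
        for (int depth = 0; !frontier.isEmpty(); depth++) {
            Deque<String> next = new ArrayDeque<>();
            for (String tower : frontier) {
                // depth 0 matches the first coalesce() branch: a direct link to a target
                if (!Collections.disjoint(links.getOrDefault(tower, Collections.emptySet()), targets)) {
                    return depth;
                }
                for (String n : conn.getOrDefault(tower, Collections.emptySet())) {
                    if (visited.add(n)) next.add(n); // visited set plays the role of store("t")
                }
            }
            frontier = next;
        }
        return -1;
    }

    // One independent search per tower, sequentially or on a parallel stream.
    static Map<String, Integer> allHops(Map<String, Set<String>> links, Set<String> targets, boolean parallel) {
        Map<String, Set<String>> conn = buildConn(links);
        Stream<String> towers = parallel ? links.keySet().parallelStream() : links.keySet().stream();
        // Shared maps are only read here, so parallel searches do not interfere.
        return towers.collect(Collectors.toMap(t -> t, t -> hops(t, links, conn, targets)));
    }

    static Map<String, Set<String>> sampleLinks() {
        Map<String, Set<String>> links = new HashMap<>();
        links.put("t1", new HashSet<>(Arrays.asList("d1")));
        links.put("t2", new HashSet<>(Arrays.asList("d1", "d2")));
        links.put("t3", new HashSet<>(Arrays.asList("d2", "d3")));
        links.put("t4", new HashSet<>(Arrays.asList("d4")));
        return links;
    }

    public static void main(String[] args) {
        Set<String> targets = Collections.singleton("d3");
        Map<String, Integer> sequential = allHops(sampleLinks(), targets, false);
        System.out.println(sequential);
        System.out.println(sequential.equals(allHops(sampleLinks(), targets, true)));
    }
}
```

In the sample data, t3 links directly to the target device d3, t2 reaches it in one hop, t1 in two, and t4 shares no device with any other tower, so it has no path.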

Shortest Path Sequential:
TinkerGraph: 41s
Specialized: 36s

Shortest Path Parallel Stream:
TinkerGraph: 10s
Specialized: 10s

Heap Memory:
TinkerGraph: 2.65 GB
Specialized: 1.68 GB
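The relative figures quoted at the top of this post can be recomputed from the raw numbers. This small helper (a hypothetical name) does the arithmetic.

```java
// Hypothetical helper: recomputes relative differences from the numbers in this post.
class BenchmarkDeltas {
    // Percentage reduction going from 'before' to 'after'
    static double reductionPct(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        System.out.printf("heap: %.1f%% less%n", reductionPct(2.65, 1.68)); // reported as 37%
        System.out.printf("load: %.1f%% less time%n", reductionPct(107, 103));
        System.out.printf("sequential shortest path: %.1f%% less time%n", reductionPct(41, 36));
        System.out.printf("TinkerGraph, parallel vs sequential: %.1fx faster%n", 41.0 / 10.0);
    }
}
```

The heap reduction works out to about 36.6%, which rounds to the reported 37%. Load time drops by about 3.7% and sequential shortest path time by about 12%. The parallel stream run is roughly 4.1 times faster than the sequential one on TinkerGraph.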

I put an explicit System.gc() and sleep(5s) between the loading and the analysis to see if there was any difference in memory usage. Nothing notable. 

GC Reports:

Robert Dale

