Blueprints would still make sense, because it would give access to
the API of the graph databases; Pipes would let vendors implement pipes
efficiently while still keeping composability.
The design would still be completely transparent, without weird callbacks.
--
Claudio Martella
claudio....@gmail.com
You are very right about the object creation & garbage collection
overheads. I remember that java.io.File.list() is many times faster than
java.io.File.listFiles().
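A minimal sketch of the two calls being compared, for readers who have not used both. It is not a benchmark: it only shows that list() hands back bare names as a String[] while listFiles() hands back one File object per entry. The class name ListingSketch and the temporary-directory setup are illustrative assumptions, and any real speed difference will depend on the JVM and file system.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class ListingSketch {
    // Creates a temporary directory holding `count` empty files (illustrative setup).
    static File makeDir(int count) {
        try {
            Path dir = Files.createTempDirectory("listing_sketch");
            for (int i = 0; i < count; i++) {
                Files.createFile(dir.resolve("f" + i + ".txt"));
            }
            return dir.toFile();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Names only: one String per directory entry.
    static int countNames(File dir) {
        String[] names = dir.list();
        return names == null ? 0 : names.length;
    }

    // One File object per directory entry.
    static int countFiles(File dir) {
        File[] files = dir.listFiles();
        return files == null ? 0 : files.length;
    }

    public static void main(String[] args) {
        File dir = makeDir(5);
        System.out.println(countNames(dir) + " names, " + countFiles(dir) + " File objects");
    }
}
```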
regards,
Rawjeev.
These are all very nice ideas. I have two comments.
----------------------
1. I will create a benchmark for doing a traversal using raw Neo4j and using Blueprints to see what the speed differences are. That is, to get a quantitative idea of how inefficient the "object wrapping" model that Blueprints employs is.
2. Instead of doing this seemingly complicated workaround for Blueprints (i.e. callbacks), it might be a good idea for graph database providers to provide native implementations of Blueprints. For example, in OrientDB, OGraphVertex would implement Vertex. This way, there is no object wrapping and vendors can implement getOutEdges(), getInEdges(), etc. as they see fit.
----------------------
I fear changing the flow of Gremlin --> Pipes --> Blueprints --> GraphDB. I fear this because it feels like a workaround hack that simply drives the graph database vendors' code further up the TinkerPop stack instead of being nicely abstracted by Blueprints. Moreover, it then yields a "free for all" on which Pipe operations exist for which vendors. This will obviously bleed into Gremlin and could pollute Gremlin with vendor-specific code.
As such, I feel #2 is the best solution. If the vendors simply implement Blueprints by making their respective objects (e.g. Node in Neo4j, OGraphVertex in OrientDB) implement the Blueprints interfaces, then two excellent things happen:
1. Blueprints is no longer the holder of all the Blueprints implementation code. Neo4j and OrientDB will simply release as "Blueprints-enabled."
2. The "object wrapping" model will no longer exist, as the Blueprints interfaces are native to the underlying graphDB. Thus, it is seemingly much more efficient.
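To make the difference between the two models concrete, here is a rough sketch. Everything in it is hypothetical: SketchVertex is a simplified stand-in for a Blueprints-style interface (not the real Vertex signature), and RawNode and NativeNode stand in for a vendor's own classes. It only illustrates where wrapper objects get allocated in one model and not in the other.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for a Blueprints-style interface (hypothetical, not the real API).
interface SketchVertex {
    List<SketchVertex> getOutNeighbors();
}

// Hypothetical vendor class that knows nothing about the interface.
final class RawNode {
    final List<RawNode> out = new ArrayList<>();
}

// Wrapping model: each call allocates a fresh wrapper per raw neighbor.
final class WrappedVertex implements SketchVertex {
    static int wrappersCreated = 0;
    private final RawNode raw;

    WrappedVertex(RawNode raw) {
        this.raw = raw;
        wrappersCreated++;
    }

    @Override
    public List<SketchVertex> getOutNeighbors() {
        List<SketchVertex> result = new ArrayList<>();
        for (RawNode n : raw.out) {
            result.add(new WrappedVertex(n));
        }
        return result;
    }
}

// Native model: the vendor's own class implements the interface, so no wrappers are needed.
final class NativeNode implements SketchVertex {
    final List<SketchVertex> out = new ArrayList<>();

    @Override
    public List<SketchVertex> getOutNeighbors() {
        return out;
    }
}

final class WrappingSketch {
    public static void main(String[] args) {
        RawNode a = new RawNode();
        a.out.add(new RawNode());
        a.out.add(new RawNode());
        new WrappedVertex(a).getOutNeighbors();
        System.out.println("wrappers created: " + WrappedVertex.wrappersCreated);

        NativeNode b = new NativeNode();
        b.out.add(new NativeNode());
        b.out.add(new NativeNode());
        System.out.println("native neighbors: " + b.getOutNeighbors().size());
    }
}
```

In the wrapped model the allocation count grows with every step of a traversal; in the native model the interface call returns the vendor's own objects directly.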
From there, working with the graph database vendors to ensure that the Blueprints interfaces have all the methods they need for efficient evaluation of traversals would come next. This will then alter which Pipes exist, and thus, what Gremlin looks like. This, of course, is a delicate matter that needs to be balanced between all vendors so there isn't a rampant growth of methods/interfaces that makes Blueprints difficult to adopt and comprehend.
Thanks,
Marko.
>> GREMLIN -> PIPES -> BLUEPRINTS -> OrientDB API
>>
How do you see the idea of allowing vendors to implement pipes the way
they now implement Blueprints?
--
Claudio Martella
claudio....@gmail.com
> How do you see the idea of allowing vendors to implement pipes the way
> they now implement Blueprints?
If the vendors implement Blueprints "native" (that is, e.g., neo4j.Node implements Vertex), then there would be no need to implement their own Pipes. The desire to implement new pipes would be more of a desire to add new methods to Blueprints.
Again, the trick to all this is to not let the vendors ride *up* the stack or else it will be a versioning/API nightmare. My counter solution to Luca's is to push TinkerPop further *down* the stack where Blueprints is native to the vendor's object system.
Thanks,
Marko.
I performed a benchmark comparing Neo4j raw (Node, Relationship) and Blueprints Neo4jGraph (Vertex, Edge) to see if Blueprints object wrapping is in fact causing performance problems.
EXPERIMENT: Using the GratefulDead graph (809 vertices, 8049 edges), for each vertex in the graph, I traverse to a depth of 3. This touches 29,601,779 elements.
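For readers who want a picture of the traversal, here is a small sketch of a depth-limited walk from every vertex. It is not the attached benchmark source: the class DepthThreeSketch, the toy graph, and the counting rule (one touch per start vertex, per edge followed, and per vertex reached, with no deduplication across paths) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DepthThreeSketch {
    // Toy directed graph: vertex id -> ids reachable over one outgoing edge.
    final Map<Integer, List<Integer>> out = new HashMap<>();

    void addEdge(int from, int to) {
        out.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        out.computeIfAbsent(to, k -> new ArrayList<>());
    }

    // Counts every edge followed and every vertex reached within `depth` steps of `start`.
    // Paths are not deduplicated, so an element is counted once per path that reaches it.
    long touch(int start, int depth) {
        if (depth == 0) {
            return 0;
        }
        long touched = 0;
        for (int next : out.get(start)) {
            touched += 2; // one edge plus the vertex at its head
            touched += touch(next, depth - 1);
        }
        return touched;
    }

    // Runs the depth-limited walk from every vertex in the graph.
    long touchAll(int depth) {
        long total = 0;
        for (int v : out.keySet()) {
            total += 1 + touch(v, depth); // the start vertex itself counts as touched
        }
        return total;
    }

    public static void main(String[] args) {
        DepthThreeSketch g = new DepthThreeSketch();
        g.addEdge(1, 2);
        g.addEdge(2, 3);
        g.addEdge(3, 1);
        System.out.println("elements touched: " + g.touchAll(3));
    }
}
```

Because paths are not deduplicated, the count grows much faster than the size of the graph, which is how a graph of a few thousand elements can yield tens of millions of touches.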
SUMMARY: Neo4j (Raw) takes, on average, 5.6 seconds to touch 29.6 million elements. Neo4jGraph (Blueprints) takes, on average, 6.1 seconds to touch 29.6 million elements.
Here are the results of the experiment. Attached is the source code of the experiment.
------------------------------------------------------------------------------------
NEO4J RAW --- GraphDatabase/Node/Relationship
Testing testNeo4jRaw...
EmbeddedGraphDatabase [/tmp/blueprints_test]: 29601779 Neo4j raw elements touched in 8836.18994140625ms
EmbeddedGraphDatabase [/tmp/blueprints_test]: 29601779 Neo4j raw elements touched in 5439.324951171875ms
EmbeddedGraphDatabase [/tmp/blueprints_test]: 29601779 Neo4j raw elements touched in 5510.56787109375ms
EmbeddedGraphDatabase [/tmp/blueprints_test]: 29601779 Neo4j raw elements touched in 5315.5068359375ms
EmbeddedGraphDatabase [/tmp/blueprints_test]: 29601779 Neo4j raw elements touched in 4995.390869140625ms
EmbeddedGraphDatabase [/tmp/blueprints_test]: 29601779 Neo4j raw elements touched in 5152.767822265625ms
EmbeddedGraphDatabase [/tmp/blueprints_test]: 29601779 Neo4j raw elements touched in 5154.794921875ms
EmbeddedGraphDatabase [/tmp/blueprints_test]: 29601779 Neo4j raw elements touched in 5063.9501953125ms
EmbeddedGraphDatabase [/tmp/blueprints_test]: 29601779 Neo4j raw elements touched in 5472.148193359375ms
EmbeddedGraphDatabase [/tmp/blueprints_test]: 29601779 Neo4j raw elements touched in 5215.058837890625ms
Neo4jRaw: 1 Neo4j Raw experiment average time in 5615.570043945312ms
------------------------------------------------------------------------------------
NEO4J BLUEPRINTS --- Neo4jGraph/Vertex/Edge
Testing testNeo4jGraph...
neo4jgraph[EmbeddedGraphDatabase [/tmp/blueprints_test]]: 29601779 Neo4jGraph elements touched in 6556.494140625ms
neo4jgraph[EmbeddedGraphDatabase [/tmp/blueprints_test]]: 29601779 Neo4jGraph elements touched in 5934.42724609375ms
neo4jgraph[EmbeddedGraphDatabase [/tmp/blueprints_test]]: 29601779 Neo4jGraph elements touched in 6072.413818359375ms
neo4jgraph[EmbeddedGraphDatabase [/tmp/blueprints_test]]: 29601779 Neo4jGraph elements touched in 6182.333251953125ms
neo4jgraph[EmbeddedGraphDatabase [/tmp/blueprints_test]]: 29601779 Neo4jGraph elements touched in 6034.69580078125ms
neo4jgraph[EmbeddedGraphDatabase [/tmp/blueprints_test]]: 29601779 Neo4jGraph elements touched in 6133.049072265625ms
neo4jgraph[EmbeddedGraphDatabase [/tmp/blueprints_test]]: 29601779 Neo4jGraph elements touched in 6110.630859375ms
neo4jgraph[EmbeddedGraphDatabase [/tmp/blueprints_test]]: 29601779 Neo4jGraph elements touched in 6059.311767578125ms
neo4jgraph[EmbeddedGraphDatabase [/tmp/blueprints_test]]: 29601779 Neo4jGraph elements touched in 6037.650146484375ms
neo4jgraph[EmbeddedGraphDatabase [/tmp/blueprints_test]]: 29601779 Neo4jGraph elements touched in 5805.708984375ms
Neo4jGraph: 1 Neo4jGraph experiment average time in 6092.6715087890625ms
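The two averages can be re-derived from the per-run timings in the log. The sketch below copies those timings and computes the means and the relative overhead, which works out to about 8.5% for Blueprints over raw Neo4j. The class name BenchmarkAverages and the optional `skip` parameter (for dropping leading runs, on the assumption that the slower first run is warm-up) are illustrative additions, not part of the attached source.

```java
final class BenchmarkAverages {
    // Per-run timings in milliseconds, copied from the log.
    static final double[] RAW = {
        8836.18994140625, 5439.324951171875, 5510.56787109375, 5315.5068359375,
        4995.390869140625, 5152.767822265625, 5154.794921875, 5063.9501953125,
        5472.148193359375, 5215.058837890625
    };
    static final double[] BLUEPRINTS = {
        6556.494140625, 5934.42724609375, 6072.413818359375, 6182.333251953125,
        6034.69580078125, 6133.049072265625, 6110.630859375, 6059.311767578125,
        6037.650146484375, 5805.708984375
    };

    // Arithmetic mean of xs, ignoring the first `skip` runs.
    static double mean(double[] xs, int skip) {
        double sum = 0.0;
        for (int i = skip; i < xs.length; i++) {
            sum += xs[i];
        }
        return sum / (xs.length - skip);
    }

    // Relative slowdown of `slow` over `fast`, in percent.
    static double overheadPercent(double fast, double slow) {
        return (slow - fast) / fast * 100.0;
    }

    public static void main(String[] args) {
        double raw = mean(RAW, 0);
        double bp = mean(BLUEPRINTS, 0);
        System.out.printf("raw=%.1fms blueprints=%.1fms overhead=%.1f%%%n",
                raw, bp, overheadPercent(raw, bp));
    }
}
```

Calling mean(RAW, 1) and mean(BLUEPRINTS, 1) drops the first run of each series, which changes both averages and therefore the overhead figure.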
Thanks,
Marko.
Building pipes on top of Blueprints necessarily entails a loss of
signal. You go from the semantics of a traversal to the semantics of
object retrieval. This way the graphdb/blueprints has to behave as
simply as possible.
--
Claudio Martella
claudio....@gmail.com
I actually don't think neo4j would gain much from that pipe, as neo4j
has to scan through the whole adjacency list anyway to extract
certain edges (the edges are not grouped by label at storage level).
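A small sketch of the storage point, under stated assumptions: the Edge class, the method names, and the in-memory lists are hypothetical and say nothing about how any real store lays out its data. It only contrasts filtering a single mixed adjacency list by label (every edge is inspected) with reading a per-label bucket (only matching edges are inspected).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LabelScanSketch {
    // Hypothetical edge: a label plus the id of the target vertex.
    static final class Edge {
        final String label;
        final int target;

        Edge(String label, int target) {
            this.label = label;
            this.target = target;
        }
    }

    // Ungrouped storage: all labels share one list, so every edge is inspected.
    // Returns the number of edges inspected; matches are appended to `matches`.
    static int filterUngrouped(List<Edge> adjacency, String label, List<Edge> matches) {
        int inspected = 0;
        for (Edge e : adjacency) {
            inspected++;
            if (e.label.equals(label)) {
                matches.add(e);
            }
        }
        return inspected;
    }

    // Builds the grouped layout: one bucket of edges per label.
    static Map<String, List<Edge>> groupByLabel(List<Edge> adjacency) {
        Map<String, List<Edge>> byLabel = new HashMap<>();
        for (Edge e : adjacency) {
            byLabel.computeIfAbsent(e.label, k -> new ArrayList<>()).add(e);
        }
        return byLabel;
    }

    // Grouped storage: only the bucket for the requested label is read.
    static int filterGrouped(Map<String, List<Edge>> byLabel, String label, List<Edge> matches) {
        List<Edge> bucket = byLabel.getOrDefault(label, new ArrayList<>());
        matches.addAll(bucket);
        return bucket.size();
    }
}
```

With the ungrouped layout, a label-specific pipe saves nothing at the storage level because the full list is walked either way; only a grouped layout lets such a pipe skip work.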
So, to clarify my argument: where, at the storage level,
looking up an Element costs more than continuing along a path by
following a pointer (so splitting and resuming the traversal at each step
costs more than not splitting it), the implementation of pipes by
vendors would be more efficient. But then you might argue that it's
their design's fault, as they're not designing for graphs efficiently.
And I'd agree with you :)
--
Claudio Martella
claudio....@gmail.com