I've written a sharded entity/triple-store in Accumulo and I'm adding graph traversal componentry to it currently. The column-oriented design of Accumulo seems to align very well with the property graph and edge traversal model.
I'm most interested in running path traversal algorithms where I'm starting a node and finding all possible paths that exist to another node. The interfaces for blueprints are very clean and well thought out.
Before I can start using this to solve real problems, there are two things of which I am currently concerned. I'm using tinkerpop 2.5.0 so it'd be great if the new blueprints3 had fixes for this (I'd also be willing to work on fixes for the baseline if necessary).1. Unless I'm missing something, My intuition is telling me that gremlin crawls the graph from isolated node to isolated node instead of batching level to level. In order words, I notice that gremlin will visit a node, find all the edges that match some filter, then visit each node on the other end of the edges and repeat the process given the pipeline conditions that have been specified. If I'm correct that it works this way, that means I'm making a lot of small calls down to Accumulo batch scanners. I've sharded my data so the scans are very fast, but they are also resource intensive and making a lot of small scans isn't as optimal as making a couple large ones. Does gremlin have any possibility of doing batching of edges/vertices throughout the propagation of levels in the graphs? It would be really easy to implement in Accumulo the logic where I could start at a vertex, find all the edges, grab all the vertices on the other side of the edges in one large batch (or partition and batch in the case where # of edges may be an unscalable sie), then grab all the matching edges for those vertices in a batch and repeat. I could easily implement this logic on my own for more custom analytics but i like the ease of use that gremlin brings.
2. Accumulo's base client for performing scans needs to be closed when it is done being used. Specifically in the case where I'm trying to talk to many tablets in parallel, it can be resource intensive and can very quickly overload a client if the Scanners aren't being closed when they are done. I've written an AutoCloseIterable that I can decorate the scanners with that will close them upon the last item being iterated, but I'm not sure if gremlin guarantees that every Iterable<Edge> and Iterable<Vertex> traversed will be iterated fully. In fact,
Awesome on Faunus and the Gremlin OLAP! I believe I saw you speaking about this on an interview you did for YOW!2012. Backing the Gremlin OLAP / Blueprints3 graph with Accumulo would be perfect for my needs- if it could be possible. What do you think?
I've checked out the HBase/Accumulo back-ends for Titan and my main complaint thus far seems to be that Accumulo's back-end was a direct copy from the HBase back-end (which seems to be lacking the ability to take advantage of Accumulo's added features). Specifically, I take advantage here of Accumulo's ability to run unions/intersections of properties on the server-side via an iterator stack that allows custom code to seek through the files on each tablet at their whim. This significantly speeds up query time and reduces the need for any joins or post-process filtering to occur on the client.
Is there any intention for Blueprints3 to support CloseableIterable in the future? My AutoCloseIterable does just what you've mentioned but resources can still leak if I can't guarantee full iteration in every case. I'll use Blueprints2's CloseableIterable for now.
Thanks again!
On Sunday, February 2, 2014 10:25:54 AM UTC-5, Marko A. Rodriguez wrote:Hi,I've written a sharded entity/triple-store in Accumulo and I'm adding graph traversal componentry to it currently. The column-oriented design of Accumulo seems to align very well with the property graph and edge traversal model.Great. Have you seen, by chance, http://titan.thinkaurelius.com ?I'm most interested in running path traversal algorithms where I'm starting a node and finding all possible paths that exist to another node. The interfaces for blueprints are very clean and well thought out.Thank you.Before I can start using this to solve real problems, there are two things of which I am currently concerned. I'm using tinkerpop 2.5.0 so it'd be great if the new blueprints3 had fixes for this (I'd also be willing to work on fixes for the baseline if necessary).1. Unless I'm missing something, My intuition is telling me that gremlin crawls the graph from isolated node to isolated node instead of batching level to level. In order words, I notice that gremlin will visit a node, find all the edges that match some filter, then visit each node on the other end of the edges and repeat the process given the pipeline conditions that have been specified. If I'm correct that it works this way, that means I'm making a lot of small calls down to Accumulo batch scanners. I've sharded my data so the scans are very fast, but they are also resource intensive and making a lot of small scans isn't as optimal as making a couple large ones. Does gremlin have any possibility of doing batching of edges/vertices throughout the propagation of levels in the graphs? It would be really easy to implement in Accumulo the logic where I could start at a vertex, find all the edges, grab all the vertices on the other side of the edges in one large batch (or partition and batch in the case where # of edges may be an unscalable sie), then grab all the matching edges for those vertices in a batch and repeat. I could easily implement this logic on my own for more custom analytics but i like the ease of use that gremlin brings.Please see http://faunus.thinkaurelius.com which will have most of its functionality merged into TinkerPop3 as Gremlin OLAP. You can read more about Gremlin OLAP on this list here:2. Accumulo's base client for performing scans needs to be closed when it is done being used. Specifically in the case where I'm trying to talk to many tablets in parallel, it can be resource intensive and can very quickly overload a client if the Scanners aren't being closed when they are done. I've written an AutoCloseIterable that I can decorate the scanners with that will close them upon the last item being iterated, but I'm not sure if gremlin guarantees that every Iterable<Edge> and Iterable<Vertex> traversed will be iterated fully. In fact,It does not. We have CloseableIterable in Blueprints2, but not in Blueprints3… One way people have gotten around this is that they make their iterator implementation check if there is hasNext() after the next() and if not, close().Thanks for your contributions to the graph space Corey,Marko.
--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-user...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
No. In Gremlin, people usually do not reference their iterators (Gremlin is one big iterator). They typically iterate their data and move onto the next query (asking for a close() is too much to ask). In situations where they only iterated a subset of the data, they would need reference to the GraphQuery/VertexQuery iterator to close it and these objects are typically not "seen" when working with Gremlin. Is there a way that you could solve this without having explicit iterator shutdown at the Blueprints level? Thoughts from others?
--
You received this message because you are subscribed to a topic in the Google Groups "Gremlin-users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/gremlin-users/tOfjbwOnky8/unsubscribe.
To unsubscribe from this group and all its topics, send an email to gremlin-user...@googlegroups.com.
The way Titan does this is registering the iterator with the transactional context and closing all open ones when the transaction is committed or rolled back.
You received this message because you are subscribed to a topic in the Google Groups "Gremlin-users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/gremlin-users/tOfjbwOnky8/unsubscribe.
To unsubscribe from this group and all its topics, send an email to gremlin-user...@googlegroups.com.