[Pipes/Gremlin] Refactoring RFC -- Move Graph Pipes out of Gremlin and into Pipes


Marko Rodriguez

May 23, 2013, 12:13:56 PM
to gremli...@googlegroups.com
Hello,

When TinkerPop moved to 2.0, Peter Neubauer urged me to move the graph related pipes in Gremlin to Pipes proper and NOT have Pipes just be a generic data flow framework. I said, "no." Now I regret that decision.

Here is what I would like to do. I would like to move all the graph specific pipes in Gremlin to Pipes and have Pipes depend on Blueprints. What does this entail?

1. Gremlin would be very small and would simply be GremlinPipeline and compiler/optimizer utilities.
- for projects like Pacer, there would be no need to depend on Gremlin -- simply Pipes.
2. Pipes would depend on Blueprints and thus would be able to leverage the enums in Blueprints (e.g. Compare.EQUAL, etc.).
- right now, I have lots of "Compare c = mapFilter(FilterPipe.Filter)"
3. Remove the PipesFluentPipeline interface as it would simply be GremlinFluentPipeline (much simpler hierarchy).
- if you want to write a Gremlin variant, you simply need to respect GremlinFluentPipeline's interface (which is what is done now, but there would no longer be a chain of interface dependencies).

Here is how this would affect people.

1. This would NOT affect general users.
2. This would lightly (to not at all) affect people who implement Gremlin variants (e.g. no more Filter needed, just change to Compare).
3. For those that use Pipes for something totally unrelated to graphs, Pipes would have a graph dependency (Blueprints).
- Do such people exist?
4. For us as TinkerPop developers, this would make our code much simpler and easier to maintain/test.

Your thoughts are more than welcome.

Thanks,
Marko.

http://markorodriguez.com

Luca Garulli

May 23, 2013, 12:31:30 PM
to gremlin-users
+1!

Luca Garulli
CEO at NuvolaBase.com
the Company behind OrientDB







daniel...@gmail.com

May 23, 2013, 12:53:08 PM
to gremli...@googlegroups.com, gremlin-users
> 3. For those that use Pipes for something totally unrelated to graphs, Pipes would have a graph dependency (Blueprints).

Yep, we do this. We probably would not update after the refactor, or we would not use the Blueprints-related stuff. Why we are in this state is a long story and not ideal. I had always intended to add the graphs back in, but other things have always remained higher priority.

As long as abstract pipe and pipeline don't change too much we should be fine.

Sent from my iPhone

Marko Rodriguez

May 23, 2013, 1:02:07 PM
to gremli...@googlegroups.com
Hi,

> Yep, we do this. We probably would not update after the refactor, or we would not use the Blueprints-related stuff. Why we are in this state is a long story and not ideal. I had always intended to add the graphs back in, but other things have always remained higher priority.

Whoa. Pipes without Graphs --- a story I would like to hear.

> As long as abstract pipe and pipeline don't change too much we should be fine.

They will not change at all. However, PipesFluentPipeline will go away. Thoughts?

Marko.

daniel...@gmail.com

May 23, 2013, 2:21:14 PM
to gremli...@googlegroups.com, gremli...@googlegroups.com
Hey Marko, yah, that's fine; we don't use it directly. We do use pipeline.add(Pipe p) and getPipeline() (my syntax may be off, but basically the functions that allow you to modify the contents of a pipeline at runtime). A common pattern for us is:
1) set up a pipeline to determine the file structure.
2) construct a pipeline based on what we find in 1.
3) blast away at the input file(s), transforming them into objects/JSON and then serializing them to file/database. At the start of the project we would push to Oracle or graph DBs, but we could not get them to scale (before Titan), so now we push to raw flat files that we have custom indexing scripts on top of. These are implemented as TinkerPop pipes.
4) users access our indexed files to annotate millions of records per minute. They do this through bash scripts that basically wrap pipes.
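
The two-phase pattern in steps 1 to 3 (inspect the input, then assemble a pipeline from what was found) can be sketched roughly as follows. This is only an illustration under stated assumptions: `MiniPipeline` is a hypothetical stand-in for a pipeline that accepts steps at runtime, not the real TinkerPop Pipeline class, and the delimiter detection and JSON-like output format are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Pattern;

// Hypothetical stand-in for a pipeline whose steps can be appended at runtime.
final class MiniPipeline {
    private final List<Function<Object, Object>> steps = new ArrayList<>();

    // Append a step; loosely analogous to the pipeline.add(Pipe p) call mentioned above.
    MiniPipeline add(Function<Object, Object> step) {
        steps.add(step);
        return this;
    }

    // Push one input object through every step in order.
    Object process(Object input) {
        Object current = input;
        for (Function<Object, Object> step : steps) {
            current = step.apply(current);
        }
        return current;
    }

    int size() {
        return steps.size();
    }
}

final class RuntimePipelineSketch {
    // Phase 1: inspect a sample line to decide how the file is structured (assumed: tab or comma).
    static String detectDelimiter(String sampleLine) {
        return sampleLine.contains("\t") ? "\t" : ",";
    }

    // Phase 2: build the transforming pipeline from what phase 1 found.
    static MiniPipeline build(String delimiter) {
        MiniPipeline pipeline = new MiniPipeline();
        // Step A: split a raw line into fields using the detected delimiter.
        pipeline.add(line -> Arrays.asList(((String) line).split(Pattern.quote(delimiter))));
        // Step B: serialize the first two fields as a small JSON-like record.
        pipeline.add(fields -> {
            List<?> f = (List<?>) fields;
            return "{\"id\":\"" + f.get(0) + "\",\"value\":\"" + f.get(1) + "\"}";
        });
        return pipeline;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a,1", "b,2");
        MiniPipeline pipeline = build(detectDelimiter(lines.get(0)));
        // Phase 3: run every input line through the pipeline that was assembled at runtime.
        for (String line : lines) {
            System.out.println(pipeline.process(line));
        }
    }
}
```

The point of the sketch is only that the shape of the pipeline is decided after looking at the data, which is why the ability to add steps at runtime matters for this use case.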

So yah, we are happy if our code does not have to change too much. We have hundreds of pipes that extend or use abstract pipe, pipe function/transform pipe function, or pipeline. We also use identityPipe and several other pipes in the framework.

We are working on open sourcing it, so you should be able to look at it in the next month or so.  Pipes have been a REAL lifesaver!  Thanks so much for developing them!

Best
Dan

Sent from my iPhone

Marko Rodriguez

May 23, 2013, 2:35:44 PM
to gremli...@googlegroups.com
Hi,

> Hey Marko, yah that's fine we don't use it directly. We do use pipeline.add(Pipe p) and getPipeline() (my syntax may be off but basically the functions that allow you to modify the contents of a pipeline at runtime).

Pipeline is untouched.
AbstractPipe is untouched.
…basically, only new pipes were added (the graph pipes from Gremlin), and the Pipes pom.xml now depends on blueprints-core.

NOTE: There is no more need for FilterPipe.Filter.EQUAL (etc.) as we can simply use Query.Compare.EQUAL (etc.). I did a simple replace-all of "FilterPipe.Filter" with "Query.Compare". This was one of the big headaches we were having, as we were mapping between Filter and Compare everywhere. And if we start to move CONTAINS, WITHIN, etc. concepts into Blueprints (geo and full-text), having yet more mappings just gets confusing. By having Pipes depend on blueprints-core, we simplified things and made Gremlin a tiny little codebase -- excluding test cases, Gremlin is 21 classes!
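
The mapping headache described in the note can be illustrated with a minimal, self-contained sketch. Everything here is an assumption made for illustration: `PipesFilter` and `BlueprintsCompare` are hypothetical stand-ins for the two enums, not the real TinkerPop classes, and `mapFilter` only mirrors the helper name mentioned earlier in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the comparison enum that Pipes kept for itself.
enum PipesFilter { EQUAL, NOT_EQUAL, GREATER_THAN, LESS_THAN }

// Hypothetical stand-in for the comparison enum that lives in Blueprints.
enum BlueprintsCompare {
    EQUAL, NOT_EQUAL, GREATER_THAN, LESS_THAN;

    // Evaluate "left <this> right" for two comparable values.
    <T extends Comparable<T>> boolean test(T left, T right) {
        int c = left.compareTo(right);
        switch (this) {
            case EQUAL:        return c == 0;
            case NOT_EQUAL:    return c != 0;
            case GREATER_THAN: return c > 0;
            case LESS_THAN:    return c < 0;
            default: throw new IllegalStateException("unhandled: " + this);
        }
    }
}

final class CompareSketch {
    // Before: code crossing the Pipes/Blueprints boundary translates one enum into the other.
    static BlueprintsCompare mapFilter(PipesFilter filter) {
        return BlueprintsCompare.valueOf(filter.name());
    }

    // After: a filter step takes the shared enum directly, so no translation is needed.
    static List<Integer> keep(List<Integer> input, BlueprintsCompare compare, Integer value) {
        List<Integer> out = new ArrayList<>();
        for (Integer i : input) {
            if (compare.test(i, value)) {
                out.add(i);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> ages = Arrays.asList(27, 29, 32, 35);
        // Old style: translate first, then filter.
        System.out.println(keep(ages, mapFilter(PipesFilter.GREATER_THAN), 30));
        // New style: use the shared enum directly.
        System.out.println(keep(ages, BlueprintsCompare.GREATER_THAN, 30));
    }
}
```

Both calls in `main` select the same elements; the second simply has one fewer enum and one fewer translation step to keep in sync as new comparison operators are added.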

> 3) blast away at the input file(s), transforming them into objects/JSON and then serializing them to file/database. At the start of the project we would push to Oracle or graph DBs, but we could not get them to scale (before Titan), so now we push to raw flat files that we have custom indexing scripts on top of. These are implemented as TinkerPop pipes.

Nice. Glad Titan is doing it for you.

> 4) users access our indexed files to annotate millions of records per minute. They do this through bash scripts that basically wrap pipes.

Whoa -- very very cool. Sounds like you have an intense system you are running.

> So yah, we are happy if our code does not have to change too much, we have hundreds of pipes that extend or use abstract pipe, pipe function /transform pipe function, or pipeline. We also use identityPipe and several other pipes in the framework.

Crazy -- 100s of pipes. Note that IdentityPipe did move to the root package (since 2.3.0), as it's NOT (semantically) a transform, filter, or side-effect pipe, but a degenerate case of all those pipes. A small import change.
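
One way to read the "degenerate case" remark is sketched below, assuming the usual meaning of the three pipe kinds: a transform emits f(x), a filter emits x only when a predicate holds, and a side-effect emits x after doing something else. The helper methods are hypothetical and use plain `java.util.function` types rather than the actual Pipes classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;

final class IdentitySketch {
    // Transform: emit f(x) for each x.
    static <T, R> List<R> transform(List<T> in, Function<T, R> f) {
        List<R> out = new ArrayList<>();
        for (T x : in) {
            out.add(f.apply(x));
        }
        return out;
    }

    // Filter: emit x only if keep(x) is true.
    static <T> List<T> filter(List<T> in, Predicate<T> keep) {
        List<T> out = new ArrayList<>();
        for (T x : in) {
            if (keep.test(x)) {
                out.add(x);
            }
        }
        return out;
    }

    // Side-effect: run effect(x), then emit x unchanged.
    static <T> List<T> sideEffect(List<T> in, Consumer<T> effect) {
        List<T> out = new ArrayList<>();
        for (T x : in) {
            effect.accept(x);
            out.add(x);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> in = Arrays.asList(1, 2, 3);
        // Identity as a transform whose function returns its argument.
        System.out.println(transform(in, x -> x).equals(in));
        // Identity as a filter that accepts everything.
        System.out.println(filter(in, x -> true).equals(in));
        // Identity as a side-effect that does nothing.
        System.out.println(sideEffect(in, x -> { }).equals(in));
    }
}
```

Since identity falls out of each kind equally well, no one of the three categories has a stronger claim on it, which is consistent with placing it in a neutral package.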

> We are working on open sourcing it, so you should be able to look at it in the next month or so. Pipes have been a REAL lifesaver! Thanks so much for developing them!

That's stellar, man. Glad to be of service. Nice to get feedback that what we are doing is helping others.

Take care,
Marko.

http://thinkaurelius.com

Daniel Quest

May 23, 2013, 5:15:59 PM
to gremli...@googlegroups.com
Marko, all sounds good to me. Let 'er rip.

Dan

Sent from my iPad

James Thornton

May 24, 2013, 4:06:41 AM
to gremli...@googlegroups.com
Hi Marko -

I like the idea of separating the libraries out into more composable units. 

Since there has been some debate on where the graph-related pipes should go, maybe they should go in a separate Graph-Pipes library. 

This way you keep Pipes as a generic data flow framework, Gremlin gets smaller, and libraries like Pacer don't need to depend on Gremlin. 

- James