[Publications] Graphs, graphs, and unfortunately, more graphs...

Marko Rodriguez

Dec 20, 2010, 10:33:16 AM

to gremlin-users

Hi,

There are three publications that I'd like to point out to anyone interested in the TinkerPop (and general graphdb) scene.

http://engineering.attinteractive.com/2010/12/a-graph-processing-stack/ (external blog)
http://arxiv.org/abs/1004.1001 (accepted book chapter)
http://arxiv.org/abs/1011.0390 (accepted workshop paper)

This weekend was a great weekend for TinkerPop related writings.

Take care everyone,
Marko.

http://markorodriguez.com
http://tinkerpop.com

Marko Rodriguez

Dec 20, 2010, 10:50:14 AM

to gremlin-users

Hi Dan,

> "Blueprints, Pipes, Gremlin, and Rexster form a graph processing stack
> that is agnostic to the underlying graph database being used."
>
> So I checked out the blog piece, and hoped you'd have some mention of
> how this approach compares/contrasts/fits with the kind of things
> we're seeing emerge on top of map/reduce. Or more specifically, on top
> of Hadoop, eg. Apache Pig and Hive projects. I think I mentioned
> similar in a slideshare comment recently...

Ricky Ho and I, many moons ago, talked about a map/reduce(/pregel)-style backend w/ Blueprints as the front-end.
http://horicky.blogspot.com/
However, we never got around to doing anything more than go "ooooo...ahhh... neat!... yea... I totally agree, man."

> Can Blueprints & friends sit on top of hadoop, when dealing with super
> large graphs? What kind of scalability are you aiming for? (in dataset
> size, responsiveness, etc...).

That would be stellar to do. Again, Blueprints is like JDBC: it doesn't care what the backend is, as long as the interfaces are implemented correctly. In fact, in the early days of Blueprints, we had a MongoDB implementation [ http://www.mongodb.org/ ]. Unfortunately, at that time, it was very slow because documents in MongoDB are not directly linked. I know MongoDB is being heavily developed and has a map/reduce model, so perhaps nowadays it would be faster for graph-related processing?
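To make the JDBC analogy concrete, here is a minimal sketch of the idea in Python. It is not the Blueprints API: the names `Graph`, `InMemoryGraph`, `out_neighbors`, and `friends_of_friends` are hypothetical and chosen only for illustration. The point is that the traversal is written against the interface alone, so a different backend could be swapped in without touching it.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class Graph(ABC):
    """Hypothetical backend-agnostic interface (illustrative, not Blueprints itself)."""

    @abstractmethod
    def add_edge(self, out_v, label, in_v):
        """Store a directed, labeled edge out_v -[label]-> in_v."""

    @abstractmethod
    def out_neighbors(self, v, label):
        """Return the vertices reachable from v over edges with this label."""


class InMemoryGraph(Graph):
    """One possible backend: adjacency lists held in a dict."""

    def __init__(self):
        self._out = defaultdict(list)

    def add_edge(self, out_v, label, in_v):
        self._out[(out_v, label)].append(in_v)

    def out_neighbors(self, v, label):
        return list(self._out.get((v, label), []))


def friends_of_friends(graph, start):
    """Two-step traversal that only uses the Graph interface."""
    result = set()
    for friend in graph.out_neighbors(start, "knows"):
        for fof in graph.out_neighbors(friend, "knows"):
            if fof != start:
                result.add(fof)
    return result


g = InMemoryGraph()
g.add_edge("a", "knows", "b")
g.add_edge("b", "knows", "c")
fof = friends_of_friends(g, "a")
```

A backend over a document store or Hadoop would implement the same two methods; whether it performs well is a separate question, as the MongoDB experience above suggests.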

In the short term, our primary desires are:

1. connect more colloquially accepted graph databases: InfiniteGraph, DeX, Sones....
2. connect more with the RDF scene so graphdbs are performant triple/quad stores (see Josh Shinavier's post: http://blog.fortytwo.net/2010/12/16/your-favorite-graph-db-as-a-triple-store ).

Any pushes/resources to go another direction would be gratefully entertained.

Thanks Dan,
Marko.

Dan Brickley

Dec 20, 2010, 11:08:34 AM

to gremli...@googlegroups.com

On Mon, Dec 20, 2010 at 4:50 PM, Marko Rodriguez <okram...@gmail.com> wrote:
> Hi Dan,
>
>> "Blueprints, Pipes, Gremlin, and Rexster form a graph processing stack
>> that is agnostic to the underlying graph database being used."
>>
>> So I checked out the blog piece, and hoped you'd have some mention of
>> how this approach compares/contrasts/fits with the kind of things
>> we're seeing emerge on top of map/reduce. Or more specifically, on top
>> of Hadoop, eg. Apache Pig and Hive projects. I think I mentioned
>> similar in a slideshare comment recently...
>
> Ricky Ho and I, many moons ago, talked about a map/reduce(/pregel)-style backend w/ Blueprints as the front-end.
> http://horicky.blogspot.com/
> However, we never got around to doing anything more than go "ooooo...ahhh... neat!... yea... I totally agree, man."

:) well, I totally agree too

>> Can Blueprints & friends sit on top of hadoop, when dealing with super
>> large graphs? What kind of scalability are you aiming for? (in dataset
>> size, responsiveness, etc...).
>
>
> That would be stellar to do. Again, Blueprints is like JDBC: it doesn't care what the backend is, as long as the interfaces are implemented correctly. In fact, in the early days of Blueprints, we had a MongoDB implementation [ http://www.mongodb.org/ ]. Unfortunately, at that time, it was very slow because documents in MongoDB are not directly linked. I know MongoDB is being heavily developed and has a map/reduce model, so perhaps nowadays it would be faster for graph-related processing?

I've never looked into MongoDB...

> In the short term, our primary desires are:
>
> 1. connect more colloquially accepted graph databases: InfiniteGraph, DeX, Sones....

Great, wouldn't want to distract you from that! I'm happy if I can
treat Blueprints as a nice abstraction for all that stuff.

> 2. connect more with the RDF scene so graphdbs are performant triple/quad stores (see Josh Shinavier's post: http://blog.fortytwo.net/2010/12/16/your-favorite-graph-db-as-a-triple-store ).

The most obvious puzzle here is where SPARQL and Gremlin fit relative
to each other, e.g. the extent to which it's practical, possible and
useful to convert between the query language notations. You might also
look at SPARQL 1.1 before it gets frozen, as it has acquired a property
path language; see, e.g.,
http://www.w3.org/TR/sparql11-property-paths/#complex_paths

I'd love to see some more worked examples showing some query use case
(ideally with a bit of provenance in the problem), then Gremlin and
SPARQL approaches to the same data. I don't see any problem with
having query languages other than SPARQL out there; it gives people new
tools and perspectives. What's harder is helping developers understand
which tools to use when.

From the FOAF side of things, I especially like where you're headed
since emphasising the graph structure nicely fits our particular
problem domain. Others working in RDF are using a network data model
but their actual data isn't always so network-oriented. Perhaps in those
situations sticking with SPARQL makes more sense? Or can Gremlin
perhaps be used as a SPARQL authoring tool, if you can convert between
the languages?

> Any pushes/resources to go another direction would be gratefully entertained.

Sounds fine to me. It's just that the Hadoop scene is getting bigger and
bigger, so this is an occasional nudge to encourage you folks to think
about building on top of it. Not that I am yet :)

cheers,

Dan

ps. somewhat in this space: http://www.few.vu.nl/~jui200/webpie.html

"WebPIE (Web-scale Parallel Inference Engine) is a MapReduce
distributed RDFS/OWL inference engine written using the Hadoop
framework. This engine applies the RDFS and OWL ter Horst rules and it
materializes all the derived statements."
...seems to be built with Sesame, if that gives you any interop advantage...

Peter Neubauer

Dec 20, 2010, 11:19:24 AM

to gremlin-users

Totally agree Dan,
I think we need to be even more crisp on when to use what tool for the
job - it might even turn out that there are just a few pieces from
each tool that should be taken - that's the point of Tinkerpop. For
instance, from Gremlin I mostly like the Pipes framework and the
XPath-style selection expressions. The rest can probably be
expressed in any good scripting language.

Still, it is hard to mix and match pieces seamlessly; we probably
need some experience and "tinkering" to get it right and usable.

Cheers,

/peter neubauer

GTalk: neubauer.peter
Skype    peter.neubauer
Phone    +46 704 106975
LinkedIn   http://www.linkedin.com/in/neubauer
Twitter http://twitter.com/peterneubauer

http://www.neo4j.org - Your high performance graph database.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.

Darrick Wiebe

Dec 20, 2010, 11:58:05 AM

to gremli...@googlegroups.com

Hey,

There was one more post this weekend! If anyone's interested in a little better explanation of what Pacer is all about, I've blogged it over here:

http://ofallpossibleworlds.wordpress.com/2010/12/19/introducing-pacer/

Cheers,
Darrick

Peter Neubauer

Dec 20, 2010, 12:15:17 PM

to gremlin-users

Nice work Darrick,
looks very handy. I like the crossover of JRuby and the XPath syntax,
and that it is a very focused approach to only the traversal part.
Very cool!

Cheers,

/peter neubauer


rjurney

Jun 3, 2011, 12:05:58 AM

to gremli...@googlegroups.com

Reopening this thread.

I want Tinkerpop on top of Hadoop. I want a unified system for graph
processing from Hadoop at scale in the back end, to on-line graphs
sending JSON data to web browsers, and everything in between. This
would seriously amplify Tinkerpop's value.


Marko Rodriguez

Jun 4, 2011, 12:23:02 AM

to gremli...@googlegroups.com

Hey,

> I want Tinkerpop on top of Hadoop. I want a unified system for graph
> processing from Hadoop at scale in the back end, to on-line graphs
> sending JSON data to web browsers, and everything in between. This
> would seriously amplify Tinkerpop's value.

I want chocolate covered gummy bears.

Check out: https://github.com/dgreco/graphbase

Hope that tickles your fancy,

Marko.

http://markorodriguez.com

Russell Jurney

Jun 4, 2011, 12:45:02 AM

to gremli...@googlegroups.com

That is cool, but I want Pipes on Hadoop, not a key value store. That I already got with Voldemort.

Russell Jurney

twitter.com/rjurney

russell...@gmail.com

datasyndrome.com

Daniel Quest

Jun 5, 2011, 8:59:43 AM

to gremli...@googlegroups.com

Hmm,

HBase is a bit more than a key value store.

But I don't really think HBase is a robust enough data store for
multi-relational graphs. I am not even sure map-reduce is a robust
enough framework to handle graph traversal. Appears a bit of a square
peg-round hole to me.

Marko?

-Daniel

Pavel Yaskevich

Jun 5, 2011, 9:03:19 AM

to gremli...@googlegroups.com

I concur with Daniel on this.

Regards, Pavel.

Alex Averbuch

Jun 5, 2011, 9:07:56 AM

to gremli...@googlegroups.com

+ 1 on agreeing with Daniel.

As an exercise, try calculating the diameter of a graph in Map Reduce - pseudo code "implementation" is enough to get the point across... for those types of graph algorithms Map Reduce transmits an unholy amount of state between phases and stages of a job.
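For anyone who wants to try the exercise, here is one naive way to sketch it in plain Python, simulating the map and reduce phases locally rather than running on Hadoop. The function names (`map_phase`, `reduce_phase`, `bfs_rounds`, `diameter`) are made up for this illustration, and it assumes an unweighted graph in which every vertex appears as a key of the adjacency dict. It computes the diameter as the longest shortest path by running a multi-round breadth-first search from every vertex, and it counts how many records cross the map/reduce boundary along the way.

```python
from collections import defaultdict

INF = float("inf")


def map_phase(records):
    """Emit each node's own record plus a tentative distance for every neighbor."""
    for node, (dist, neighbors) in records.items():
        # The adjacency list has to be re-emitted every round, because
        # nothing persists between jobs except what is written out.
        yield node, (dist, neighbors)
        if dist != INF:
            for n in neighbors:
                yield n, (dist + 1, None)


def reduce_phase(pairs):
    """Per node: recover the adjacency list and keep the smallest distance seen."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    out = {}
    for node, values in grouped.items():
        dist = min(d for d, _ in values)
        neighbors = next(n for _, n in values if n is not None)
        out[node] = (dist, neighbors)
    return out


def bfs_rounds(adjacency, source):
    """Repeat map/reduce rounds until distances stop changing."""
    records = {v: (0 if v == source else INF, nbrs) for v, nbrs in adjacency.items()}
    shuffled = 0
    while True:
        pairs = list(map_phase(records))
        shuffled += len(pairs)
        new_records = reduce_phase(pairs)
        if new_records == records:
            return {v: d for v, (d, _) in records.items()}, shuffled
        records = new_records


def diameter(adjacency):
    """Longest finite shortest path, using one multi-round BFS per source vertex."""
    best, total_shuffled = 0, 0
    for source in adjacency:
        dist, shuffled = bfs_rounds(adjacency, source)
        best = max(best, max(d for d in dist.values() if d != INF))
        total_shuffled += shuffled
    return best, total_shuffled


# Undirected path graph a - b - c - d
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
diam, shuffled = diameter(path)
```

Even on this four-vertex toy graph, `shuffled` ends up several times larger than the graph itself, since the whole structure is re-emitted in every round of every search.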

Gary Berger (gaberger)

Jun 5, 2011, 3:49:59 PM

to gremli...@googlegroups.com, gremli...@googlegroups.com

+1. We are in the realm of BSP here and Map/Reduce is not the right pattern.
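As a rough illustration of the BSP (bulk synchronous parallel) pattern, here is a single-machine Python sketch of vertex-centric shortest paths on an unweighted graph. The function name `bsp_shortest_paths` and the message format are assumptions made for this example, not the API of Pregel or any other system. Vertex state (distance and adjacency) stays in place between supersteps; only messages cross the synchronization barrier.

```python
INF = float("inf")


def bsp_shortest_paths(adjacency, source):
    """Single-source shortest paths in supersteps (unweighted graph)."""
    distance = {v: INF for v in adjacency}
    inbox = {source: [0]}            # messages to deliver in the next superstep
    supersteps = 0
    messages_sent = 0
    while inbox:                     # halt once no vertex has pending messages
        outbox = {}
        for vertex, messages in inbox.items():
            candidate = min(messages)
            # A vertex only acts (and sends) when its distance improves.
            if candidate < distance[vertex]:
                distance[vertex] = candidate
                for neighbor in adjacency[vertex]:
                    outbox.setdefault(neighbor, []).append(candidate + 1)
                    messages_sent += 1
        inbox = outbox               # synchronization barrier between supersteps
        supersteps += 1
    return distance, supersteps, messages_sent


# Undirected path graph a - b - c - d
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
dist, supersteps, sent = bsp_shortest_paths(path, "a")
```

The design difference from a chain of map/reduce jobs is that the adjacency lists are never re-sent: each superstep moves only the small distance messages.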

Sent from my iPhone

rjurney

Jun 6, 2011, 10:32:54 PM

to Gremlin-users

In practice, who wants the diameter of a large graph? Basically
nobody, which is why MR graph processing is so handy.


Alex Averbuch

Jun 7, 2011, 8:14:51 AM

to gremli...@googlegroups.com

To be honest I don't know, maybe nobody, but it was the simplest example I could come up with at the time that would make my point.

MR _is_ applicable to the processing of large graphs.

MR _is_ being used to process the largest graphs in the world.

MR _is_ open source, performant, and a good tool for _batch_ processing of large graphs.

But... that doesn't make it ideal for graph processing.

MR _does_ transmit a lot of graph state between stages/phases of jobs. Because it's basically a distributed functional programming model, it's stateless... in a sense.

MR is _not_ well suited to interactive querying of graphs (be they large or small)... the latency is too high.

Last weekend we had a cocktail party at our flat. Because we don't have a blender at the moment, I crushed the ice using a hammer. It did the job, but a blender would've likely made the job easier and faster.

On Tue, Jun 7, 2011 at 4:32 AM, rjurney <russell...@gmail.com> wrote:
> In practice, who wants the diameter of a large graph? Basically
> nobody, which is why MR graph processing is so handy.

Marko Rodriguez

Jun 7, 2011, 11:26:04 AM

to gremli...@googlegroups.com

Hey guys,

Russ, you might be interested in:

http://www.cs.cmu.edu/~pegasus/

Thanks,

Marko.

http://markorodriguez.com

Marko Rodriguez

Jun 7, 2011, 11:29:27 AM

to gremli...@googlegroups.com

Also, Russ, you might want to check out:

http://www.cse.usf.edu/~anda/CIS6930-S11/papers/graph-processing-w-mapreduce.pdf

See ya,

Marko.

http://markorodriguez.com
