There are three publications that I'd like to point out to people interested in the TinkerPop (and general graphdb) scene.
http://engineering.attinteractive.com/2010/12/a-graph-processing-stack/ (external blog)
http://arxiv.org/abs/1004.1001 (accepted book chapter)
http://arxiv.org/abs/1011.0390 (accepted workshop paper)
This weekend was a great weekend for TinkerPop-related writings.
Take care everyone,
Marko.
http://markorodriguez.com
http://tinkerpop.com
> "Blueprints, Pipes, Gremlin, and Rexster form a graph processing stack
> that is agnostic to the underlying graph database being used."
>
> So I checked out the blog piece, and hoped you'd have some mention of
> how this approach compares/contrasts/fits with the kind of things
> we're seeing emerge on top of map/reduce. Or more specifically, on top
> of Hadoop, eg. Apache Pig and Hive projects. I think I mentioned
> similar in a slideshare comment recently...
Ricky Ho and I, many moons ago, talked about a map/reduce(/pregel)-style backend w/ Blueprints as the front-end.
http://horicky.blogspot.com/
However, we never got around to doing anything more than go "ooooo...ahhh... neat!... yea... I totally agree, man."
> Can Blueprints & friends sit on top of hadoop, when dealing with super
> large graphs? What kind of scalability are you aiming for? (in dataset
> size, responsiveness, etc...).
That would be stellar to do --- again, Blueprints is like a JDBC, it doesn't care what the backend is, as long as the interfaces are implemented correctly. In fact, in the early days of Blueprints, we had a MongoDB representation [ http://www.mongodb.org/ ]. Unfortunately, at that time, it was very slow because documents in MongoDB are not directly linked. I know MongoDB is being heavily developed and they have a map/reduce model, so perhaps, nowadays, it might be faster for graph related processing... ?
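The JDBC analogy can be made concrete with an adapter-pattern sketch. This is a hypothetical rendering in Python (the real Blueprints interfaces are Java: Graph, Vertex, Edge, etc.); the names and the toy in-memory backend here are illustrative only, not the actual Blueprints API:

```python
# Hypothetical sketch of the Blueprints idea: application code is written
# against a Graph interface, and any backend implementing it can be swapped in.
class Graph:
    def add_vertex(self): raise NotImplementedError
    def add_edge(self, out_v, label, in_v): raise NotImplementedError

class Vertex:
    def out(self, label): raise NotImplementedError

# One concrete backend: a trivial in-memory store.
class InMemoryVertex(Vertex):
    def __init__(self):
        self._out = {}  # label -> list of adjacent vertices
    def out(self, label):
        return list(self._out.get(label, []))

class InMemoryGraph(Graph):
    def add_vertex(self):
        return InMemoryVertex()
    def add_edge(self, out_v, label, in_v):
        out_v._out.setdefault(label, []).append(in_v)

# Traversal code sees only the interface -- like JDBC, it does not care
# whether the backend is Neo4j, a Sail store, MongoDB, or this toy store.
def friends_of(v):
    return v.out("knows")

g = InMemoryGraph()
a, b, c = g.add_vertex(), g.add_vertex(), g.add_vertex()
g.add_edge(a, "knows", b)
g.add_edge(a, "knows", c)
print(len(friends_of(a)))  # 2
```

Swapping the backend means swapping only the Graph implementation; `friends_of` is untouched. That is the property that would let a Hadoop-backed implementation slot in underneath.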
In the short term, our primary desires are:
1. connect more colloquially accepted graph databases: InfiniteGraph, DeX, Sones....
2. connect more with the RDF scene so graphdbs are performant triple/quad stores (see Josh Shinavier's post: http://blog.fortytwo.net/2010/12/16/your-favorite-graph-db-as-a-triple-store ).
Any pushes/resources to go another direction would be gratefully entertained.
Thanks Dan,
Marko.
:) well, I totally agree too
>> Can Blueprints & friends sit on top of hadoop, when dealing with super
>> large graphs? What kind of scalability are you aiming for? (in dataset
>> size, responsiveness, etc...).
>
>
> That would be stellar to do --- again, Blueprints is like a JDBC, it doesn't care what the backend is, as long as the interfaces are implemented correctly. In fact, in the early days of Blueprints, we had a MongoDB representation [ http://www.mongodb.org/ ]. Unfortunately, at that time, it was very slow because documents in MongoDB are not directly linked. I know MongoDB is being heavily developed and they have a map/reduce model, so perhaps, nowadays, it might be faster for graph related processing... ?
I've never looked into MongoDB...
> In the short term, our primary desires are:
>
> 1. connect more colloquially accepted graph databases: InfiniteGraph, DeX, Sones....
Great, wouldn't want to distract you from that! I'm happy if I can
treat Blueprints as a nice abstraction for all that stuff.
> 2. connect more with the RDF scene so graphdbs are performant triple/quad stores (see Josh Shinavier's post: http://blog.fortytwo.net/2010/12/16/your-favorite-graph-db-as-a-triple-store ).
The most obvious puzzle here is where SPARQL and Gremlin fit relative
to each other, e.g. the extent to which it's practical, possible, and
useful to convert between the query language notations. You might also
look at SPARQL 1.1 before it gets frozen, as it's acquired a property
path language, e.g. see
http://www.w3.org/TR/sparql11-property-paths/#complex_paths
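For a flavor of what that property path language covers, here is a small sketch against hypothetical FOAF data (syntax per the SPARQL 1.1 working draft linked above; the URIs are invented for illustration) -- one path expression standing in for an arbitrarily long chain of hops, which is exactly the territory a Gremlin traversal lives in:

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

# All people reachable from marko via one or more foaf:knows hops.
SELECT ?person WHERE {
  <http://example.org/marko> foaf:knows+ ?person .
}
```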
I'd love to see some more worked examples showing some query use case
(ideally with a bit of provenance in the problem), then Gremlin and
SPARQL approaches to the same data. I don't see any problem with
having a query language other than SPARQL out there; it gives people
new tools and perspectives. What's harder is helping developers
understand which tools to use when.
From the FOAF side of things, I especially like where you're headed
since emphasising the graph structure nicely fits our particular
problem domain. Others working in RDF are using a network data model
but their actual data isn't always so network-oriented. Perhaps in
those situations sticking with SPARQL makes more sense? Or could
Gremlin be used as a SPARQL authoring tool, if you can convert between
the languages?
> Any pushes/resources to go another direction would be gratefully entertained.
Sounds fine to me. It's just that the Hadoop scene is getting bigger
and bigger, so this is an occasional nudge to encourage you folks to
think about building on top of it. Not that I am yet :)
cheers,
Dan
ps. somewhat in this space: http://www.few.vu.nl/~jui200/webpie.html
"WebPIE (Web-scale Parallel Inference Engine) is a MapReduce
distributed RDFS/OWL inference engine written using the Hadoop
framework. This engine applies the RDFS and OWL ter Horst rules and it
materializes all the derived statements."
...seems to be built with Sesame, if that gives you any interop advantage...
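To make the WebPIE description above a bit more concrete: its rules are applied as MapReduce joins. Here is a toy sketch of a single round for one RDFS rule, rdfs9 (?x rdf:type ?c1 and ?c1 rdfs:subClassOf ?c2 entail ?x rdf:type ?c2), with plain Python dictionaries standing in for Hadoop and invented example triples -- not WebPIE's actual code:

```python
from collections import defaultdict

# Hypothetical triples: (subject, predicate, object).
triples = [
    ("alice", "rdf:type", "Student"),
    ("Student", "rdfs:subClassOf", "Person"),
    ("Person", "rdfs:subClassOf", "Agent"),
]

def map_phase(triples):
    # Key each relevant triple by the class it mentions, so the reducer
    # can join rdf:type facts with rdfs:subClassOf facts on that class.
    for s, p, o in triples:
        if p == "rdf:type":
            yield o, ("instance", s)
        elif p == "rdfs:subClassOf":
            yield s, ("superclass", o)

def reduce_phase(grouped):
    # For each class, pair every instance with every superclass.
    for _cls, values in grouped.items():
        instances = [v for tag, v in values if tag == "instance"]
        supers = [v for tag, v in values if tag == "superclass"]
        for x in instances:
            for c2 in supers:
                yield (x, "rdf:type", c2)

def rdfs9_round(triples):
    grouped = defaultdict(list)
    for key, value in map_phase(triples):
        grouped[key].append(value)
    return list(reduce_phase(grouped))

derived = rdfs9_round(triples)
# One round derives (alice, rdf:type, Person); reaching Agent takes a
# second round over the enlarged triple set -- which is why engines in
# this space iterate to a fixpoint ("materializes all the derived
# statements").
```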
Still, it is hard to mix and match pieces seamlessly; we probably
need some experience and "tinkering" to get it right and usable.
Cheers,
/peter neubauer
GTalk: neubauer.peter
Skype peter.neubauer
Phone +46 704 106975
LinkedIn http://www.linkedin.com/in/neubauer
Twitter http://twitter.com/peterneubauer
http://www.neo4j.org - Your high performance graph database.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.
There was one more post this weekend! If anyone's interested in a little better explanation of what Pacer is all about, I've blogged it over here:
http://ofallpossibleworlds.wordpress.com/2010/12/19/introducing-pacer/
Cheers,
Darrick
I want Tinkerpop on top of Hadoop. I want a unified system for graph
processing from Hadoop at scale in the back end, to on-line graphs
sending JSON data to web browsers, and everything in between. This
would seriously amplify Tinkerpop's value.
---------- Forwarded message ----------
From: Peter Neubauer <peter.neuba...@neotechnology.com>
Date: Dec 20 2010, 10:15 am
Subject: Graphs, graphs, and unfortunately, more graphs...
To: Gremlin-users
Nice work Darrick,
looks very handy. I like the crossover of JRuby and the XPath syntax,
and that it is a very focused approach to only the traversal part.
Very cool!
Cheers,
/peter neubauer
GTalk: neubauer.peter
Skype peter.neubauer
Phone +46 704 106975
LinkedIn http://www.linkedin.com/in/neubauer
Twitter http://twitter.com/peterneubauer
http://www.neo4j.org - Your high performance graph database.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.
On Mon, Dec 20, 2010 at 5:58 PM, Darrick Wiebe
<darr...@innatesoftware.com> wrote:
> Hey,
> There was one more post this weekend! If anyone's interested in a little better explanation of what Pacer is all about, I've blogged it over here:
> http://ofallpossibleworlds.wordpress.com/2010/12/19/introducing-pacer/
> Cheers,
> Darrick
> On 2010-12-20, at 10:33a, Marko Rodriguez wrote:
>> Hi,
>> There are three publications that I'd like to point out to people interested in the TinkerPop (and general graphdb) scene.
>> http://engineering.attinteractive.com/2010/12/a-graph-processing-stack/ (external blog)
>> http://arxiv.org/abs/1004.1001 (accepted book chapter)
>> http://arxiv.org/abs/1011.0390 (accepted workshop paper)
> I want Tinkerpop on top of Hadoop. I want a unified system for graph
> processing from Hadoop at scale in the back end, to on-line graphs
> sending JSON data to web browsers, and everything in between. This
> would seriously amplify Tinkerpop's value.
HBase is a bit more than a key value store.
But I don't really think HBase is a robust enough data store for
multi-relational graphs. I am not even sure map-reduce is a robust
enough framework to handle graph traversal. Seems a bit of a square
peg in a round hole to me.
Marko?
-Daniel
In practice, who wants the diameter of a large graph? Basically
nobody, which is why MR graph processing is so handy.
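The square-peg concern about traversal on MapReduce can be made concrete: a breadth-first expansion becomes one full job per hop, rescanning the entire edge list each round. A toy sketch, with plain Python standing in for Hadoop and an invented four-vertex graph:

```python
# Each BFS hop is a full map/reduce-style pass over the *entire* edge
# list -- fine for batch analytics over a whole graph, painful for the
# local, starting-from-one-vertex traversals graph databases are built for.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")]  # hypothetical graph

def bfs_round(frontier, visited):
    # "Map": scan every edge, emit targets whose source is in the frontier.
    emitted = [dst for src, dst in edges if src in frontier]
    # "Reduce": dedupe and drop already-visited vertices.
    return {v for v in emitted if v not in visited}

def bfs(start):
    visited, frontier, rounds = {start}, {start}, 0
    while frontier:
        frontier = bfs_round(frontier, visited)
        visited |= frontier
        rounds += 1
    return visited, rounds

reachable, rounds = bfs("a")
# Reaching everything two hops away from "a" still costs repeated full
# scans of the edge list, one per round.
```

This is the trade Daniel is pointing at: the same structure that makes MR great for whole-graph batch jobs makes it a clumsy fit for interactive traversal.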