OLTP vs. OLAP

Marko Rodriguez

Mar 4, 2016, 7:57:09 AM
to gremli...@googlegroups.com, d...@tinkerpop.incubator.apache.org
Hi,

Robin Schumacher at DataStax is trying to dispel confusion around the difference between Gremlin OLTP and Gremlin OLAP. Here is what I wrote him; I thought others might like to read it.

As it relates to Gremlin's execution environment:

OLTP: Serial stream processing.
OLAP: Parallel step-wise processing.

OLTP: Low memory footprint, lazy evaluator.
OLAP: High memory footprint, eager evaluator.

OLTP: Touches as little data as possible, low strain on the database.
OLAP: Touches a lot of data, high strain on the database.

OLTP: "Pointer chasing" to enact a traversal.
OLAP: Full "table" scans with message passing to enact a traversal.

OLTP: Database can support numerous concurrent traversals.
OLAP: Database can support few concurrent traversals.

OLTP: Millisecond/second response times. 
OLAP: Minute/hour response times. 

OLTP: Uses the DSEGraph "Thrift" driver to fetch data.
OLAP: Uses the DSEGraph "CassandraInputFormat" table scanner to fetch data.
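
To make the distinction concrete, here is roughly how the same traversal is handed to each engine in TinkerPop 3 (a sketch only; SparkGraphComputer is just one GraphComputer implementation, and the 'person'/'knows' labels are made-up schema):

// OLTP: evaluated lazily by the standard traversal engine, pointer-chasing from vertex to vertex
g.V().hasLabel('person').out('knows').count()

// OLAP: the same traversal submitted to a GraphComputer, which scans the vertices in bulk
// and moves traversers between them via message passing
g.withComputer(SparkGraphComputer).V().hasLabel('person').out('knows').count()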

Why OLTP or OLAP?

When your query is going to touch everything in the graph, go OLAP. Examples include PageRank, BulkLoading, BulkDumping, Global analytics.
- "What is the average number of friends for all people?"

When your query is going to touch a very small part of the graph, go OLTP. Examples include personalized recommendations, local queries, get/put behaviors.
- "What is the average number of friends for people that work for DataStax?"

HTH,
Marko.

Abhilash Sharma

Aug 19, 2017, 12:39:32 AM
to Gremlin-users, d...@tinkerpop.incubator.apache.org
Hey, 

I am a researcher working on optimizing graph queries and am familiar with Gremlin. I have read your paper "The Gremlin Graph Traversal Machine and Language" (https://arxiv.org/pdf/1508.03843.pdf). I have two questions regarding the OLTP execution model in Gremlin. I have read your blog post on the mechanics of OLAP (https://www.datastax.com/dev/blog/the-mechanics-of-gremlin-olap). In it you mention that OLTP has a pull-based execution model. Does that mean it pulls the remote objects referenced by traversers to the client machine?

My second question is: how does lazy evaluation help in optimizing queries? I am familiar with a traverser's bulk, which essentially helps in reducing memory use as well as redundant computation, but I am not able to understand how lazy evaluation helps here.

Tiago Franco

Jun 5, 2019, 10:49:24 PM
to Gremlin-users

Marko Rodriguez

Jun 6, 2019, 12:57:54 PM
to gremli...@googlegroups.com
Hello,

I am a researcher working on optimizing graph queries and am familiar with Gremlin. I have read your paper "The Gremlin Graph Traversal Machine and Language" (https://arxiv.org/pdf/1508.03843.pdf). I have two questions regarding the OLTP execution model in Gremlin. I have read your blog post on the mechanics of OLAP (https://www.datastax.com/dev/blog/the-mechanics-of-gremlin-olap). In it you mention that OLTP has a pull-based execution model. Does that mean it pulls the remote objects referenced by traversers to the client machine?

Pull-based refers to the way in which objects are propagated through the execution pipeline. When a result is needed, the pipeline can be “next()’d”. This causes a backwards chain reaction where each step in the pipeline calls next() on its preceding step. Objects are processed on demand, when needed. Pull-based execution engines are simple to implement, but hard to thread.
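
To make that concrete, here is a toy pull-based chain in plain Groovy (just a sketch of the shape of the idea, not TinkerPop internals): each step only calls next() on its preceding step when its own next() is called.

class MapStep implements Iterator {
    Iterator previous
    Closure function
    boolean hasNext() { previous.hasNext() }
    Object next() { function.call(previous.next()) }   // work happens only when a result is demanded
}

def source  = (1..1000000).iterator()                                // like V(): a huge source
def doubled = new MapStep(previous: source,  function: { it * 2 })
def labeled = new MapStep(previous: doubled, function: { "value: " + it })

println labeled.next()   // pulls exactly one object backwards through the whole chain
println labeled.next()   // only two source elements have been consumed so far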

Push-based execution pipelines force results upon their consumer, even if the consumer is currently busy processing another object. These engines lend themselves naturally to parallelism. You can think of each computational step as an individual "actor"/thread processing objects in its "mailbox"/input-queue and then pushing results out to its subscribers. The complication of this model is having to implement buffers, back-pressure ("slow down there, buddy"), etc.
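
And a toy push-based version of the same idea, again in plain Groovy rather than anything from TinkerPop: the producer pushes results into the next step's "mailbox" whether or not anything asked for them, and the bounded queue is what supplies the back-pressure.

import java.util.concurrent.ArrayBlockingQueue

def mailbox = new ArrayBlockingQueue(16)   // the consumer's bounded "mailbox"
def DONE = new Object()                    // end-of-stream marker

def producer = Thread.start {
    (1..100).each { mailbox.put(it * 2) }  // pushes eagerly; put() blocks once the consumer falls behind (back-pressure)
    mailbox.put(DONE)
}

def consumer = Thread.start {
    while (true) {
        def msg = mailbox.take()
        if (msg.is(DONE)) break
        println "received: " + msg         // processes whatever arrives, whenever it arrives
    }
}

[producer, consumer]*.join()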

NOTE: Unlike TP3, where push-based engines are difficult to incorporate, TinkerPop4 will naturally support both push- and pull-based execution pipelines.

My second question is: how does lazy evaluation help in optimizing queries? I am familiar with a traverser's bulk, which essentially helps in reducing memory use as well as redundant computation, but I am not able to understand how lazy evaluation helps here.

Lazy evaluation is related to pull-based execution. You only execute what is needed, when it is needed. For example:

g.V().has('name','marko').limit(1)

In a pull-based system, when the first vertex with name=marko is found, the computation halts.

In a push-based system, even though the first vertex with name=marko may have already passed has(), the V() step is still pushing out vertices. Thus, extra clock cycles are spent on data that will never be used.
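
You can see the pull-based halting directly in the Gremlin Console (assuming a graph with a name=marko vertex; the trailing ;[] just stops the console from auto-iterating the assignment):

t = g.V().has('name','marko').limit(1);[]   // builds the pipeline; no work has been done yet
t.hasNext()                                 // pulls: V() emits vertices only until has() + limit(1) is satisfied
t.next()                                    // the single matching vertex; the rest of V() is never iterated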

——

Finally, yes, bulking is a very important optimization technique in graph computing. This idea has been captured more elegantly and generally in Stream Ring Theory as “object coefficient.”
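
As a rough illustration of bulking: when many traversers end up on the same element, a barrier() merges them into a single traverser whose bulk records how many they were, so the steps after the barrier execute once per distinct location rather than once per traverser (TinkerPop's LazyBarrierStrategy inserts such barriers automatically on multi-hop traversals).

// Ten traversers that land on the same vertex after both() would each walk both() again.
// A barrier() collapses them into one traverser with bulk=10, so the second both()
// (and the final count()) do their work once per distinct vertex instead of ten times.
g.V().both().barrier().both().count()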

Hope that helps,
Marko.



