Hi,
I've noticed numerous situations where developers write Gremlin traversals that aggregate data along the traverser's path. Typically, people call path() and then "reduce" the data in the path to get the specific result they want. Unfortunately, this is inefficient. Path calculations are expensive and unmergeable, and they are "post traversal": they happen after the walk instead of being innate to the act of traversing. What does all that mean?
gremlin> g.V().as('x').outE().inV().jump('x',2).path{1.0}{it.value('weight')}
==>[1.0, 1.0, 1.0, 1.0, 1.0]
==>[1.0, 1.0, 1.0, 0.4, 1.0]
-------------------------------
gremlin> g.V().as('x').outE().inV().jump('x',2).path{1.0}{it.value('weight')}.map{it.get().objects().inject(1.0){a,b -> a * b}} // OLD WAY
==>1.0
==>0.4
gremlin> g.V().withSack{1.0f}.as('x').outE().sack(mult,'weight').inV().jump('x',2).sack() // NEW WAY
In the OLD WAY, we walk a path, get that path, and then pull data out of the path elements to reduce it to a single result, i.e. the product of the weights of the edges traversed.
In the NEW WAY, as we walk, we multiply the traverser's current "sack" by each edge's weight. The sack was initialized to 1.0f via withSack(), so the reduction happens on the fly.
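To make the OLD WAY / NEW WAY contrast concrete outside of Gremlin, here is a minimal plain-Python sketch. It is a model of the idea, not TinkerPop code. It assumes the edge weights of TinkerPop's toy graph, with vertices as integers and edges as (out, in, weight) tuples, and with those weights the two 2-hop paths reproduce the 1.0 and 0.4 results above. The function names old_way and new_way are hypothetical. old_way keeps every full path and reduces it afterwards. new_way carries only the current vertex and a running product.

```python
from functools import reduce

# Assumption: TinkerPop toy-graph edges as (out_vertex, in_vertex, weight).
EDGES = [(1, 2, 0.5), (1, 4, 1.0), (1, 3, 0.4), (4, 5, 1.0), (4, 3, 0.4), (6, 3, 0.2)]
VERTICES = [1, 2, 3, 4, 5, 6]


def out_edges(v):
    return [e for e in EDGES if e[0] == v]


def old_way(hops=2):
    """Keep the whole path for every traverser, then reduce each path afterwards."""
    paths = [[v] for v in VERTICES]
    for _ in range(hops):
        # Each path grows by an edge and a vertex per hop (memory grows with depth).
        paths = [p + [e, e[1]] for p in paths for e in out_edges(p[-1])]
    results = []
    for p in paths:
        # Mirror path{1.0}{it.value('weight')}: vertices -> 1.0, edges -> weight.
        objects = [1.0 if isinstance(x, int) else x[2] for x in p]
        results.append(reduce(lambda a, b: a * b, objects, 1.0))
    return results


def new_way(hops=2):
    """Carry only (current vertex, sack) and multiply the sack while walking."""
    traversers = [(v, 1.0) for v in VERTICES]  # like withSack{1.0f}
    for _ in range(hops):
        traversers = [(e[1], sack * e[2]) for v, sack in traversers for e in out_edges(v)]
    return [sack for _, sack in traversers]


print(old_way())  # [1.0, 0.4]
print(new_way())  # [1.0, 0.4]
```

Both functions produce the same numbers. The difference is what each traverser must hold while walking: a growing list of elements versus a single float.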
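One of the primary boons of using sack() over path() is lower memory usage, since a traverser carries a single running value rather than its entire path. With merge operators (which combine the sacks of traversers that are merged), sacks also enable scalable path analysis in OLAP situations.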
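The merge-operator point can be illustrated with a small plain-Python model. This is a hypothetical sketch that does not reflect TinkerPop's internal implementation. After a hop, traversers that land on the same vertex are collapsed into one, and their sacks are combined with a merge operator, here addition. Multiplying by an edge weight distributes over addition, so the per-vertex totals are unchanged by merging, while fewer traversers need to be held. The edge list again assumes TinkerPop's toy graph. The functions step and merge are illustrative names.

```python
import operator

# Assumption: TinkerPop toy-graph edges as (out_vertex, in_vertex, weight).
EDGES = [(1, 2, 0.5), (1, 4, 1.0), (1, 3, 0.4), (4, 5, 1.0), (4, 3, 0.4), (6, 3, 0.2)]


def step(traversers):
    """Advance every (vertex, sack) traverser one hop, multiplying its sack by the edge weight."""
    return [(dst, sack * w) for v, sack in traversers for src, dst, w in EDGES if src == v]


def merge(traversers, op=operator.add):
    """Collapse traversers on the same vertex into one, combining their sacks with `op`."""
    merged = {}
    for v, sack in traversers:
        merged[v] = op(merged[v], sack) if v in merged else sack
    return list(merged.items())


start = [(v, 1.0) for v in range(1, 7)]
unmerged = step(start)    # one traverser per edge walked
merged = merge(unmerged)  # the three traversers at vertex 3 become one
print(len(unmerged), len(merged))  # 6 4
```

A path-based traversal has no analogous move: two traversers with different histories must stay distinct, which is part of why path calculations are described as unmergeable.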
You can read some more examples on the SNAPSHOT docs:
Use cases:
1. Decaying energy algorithms (Gremlins are no longer discrete but can be modulated by an "energy sack").
2. Graph data harvesting (Gremlins can pick up data as they walk).
3. In-process path analysis -- as paths are walked, statistics can be gleaned and processed.
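Use case 1 can be sketched in plain Python as well. This is a hypothetical model, not a Gremlin API. Each traverser's sack holds an "energy" that starts at 1.0. At every hop, the energy is multiplied by a decay factor and by the edge weight, and a traverser whose energy falls below a threshold stops walking. The decay factor (0.5), the threshold (0.15), the function name spread_energy, and the toy-graph edge list are all illustrative assumptions.

```python
# Assumption: TinkerPop toy-graph edges as (out_vertex, in_vertex, weight).
EDGES = [(1, 2, 0.5), (1, 4, 1.0), (1, 3, 0.4), (4, 5, 1.0), (4, 3, 0.4), (6, 3, 0.2)]


def spread_energy(start, hops, decay=0.5, threshold=0.15):
    """Decaying-energy walk: each hop multiplies a traverser's energy (its sack)
    by decay * edge weight; traversers below `threshold` are dropped."""
    traversers = [(start, 1.0)]
    reached = []
    for _ in range(hops):
        nxt = []
        for v, energy in traversers:
            for src, dst, weight in EDGES:
                if src == v:
                    e = energy * decay * weight
                    if e >= threshold:  # too little energy left: this Gremlin stops here
                        nxt.append((dst, e))
        traversers = nxt
        reached.extend(traversers)
    return reached


print(spread_energy(1, 2))  # [(2, 0.25), (4, 0.5), (3, 0.2), (5, 0.25)]
```

The traverser that would have reached vertex 3 via vertex 4 ends with energy 0.1, so it is cut off. This is the sense in which a Gremlin is "modulated" rather than discrete.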
Finally, for those wanting to know the difference between sideEffects and sacks:
Sacks are traverser local data structures.
SideEffects are traversal global data structures.
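The sack/sideEffect distinction can also be modeled in a few lines of plain Python. In this hypothetical sketch, every traverser carries its own sack value, and when a traverser branches, each child gets its own value derived from its parent's. A single side-effects dictionary is shared by the whole traversal and written to by every traverser. Names such as Traverser, walk, and edges_seen are illustrative, not TinkerPop API, and the edge list assumes TinkerPop's toy graph.

```python
from dataclasses import dataclass

# Assumption: TinkerPop toy-graph edges as (out_vertex, in_vertex, weight).
EDGES = [(1, 2, 0.5), (1, 4, 1.0), (1, 3, 0.4), (4, 5, 1.0), (4, 3, 0.4), (6, 3, 0.2)]


@dataclass
class Traverser:
    location: int
    sack: float  # traverser-local: each traverser holds its own value


def walk(traversers, side_effects):
    """Take one hop. Sacks are updated per traverser; side_effects is one dict
    shared by the whole traversal."""
    result = []
    for t in traversers:
        for src, dst, weight in EDGES:
            if src == t.location:
                # Local: the new traverser gets its own sack, derived from its parent's.
                result.append(Traverser(dst, t.sack * weight))
                # Global: every traverser updates the same counter.
                side_effects["edges_seen"] = side_effects.get("edges_seen", 0) + 1
    return result


side_effects = {}
traversers = walk([Traverser(1, 1.0), Traverser(4, 1.0)], side_effects)
print([(t.location, t.sack) for t in traversers])  # [(2, 0.5), (4, 1.0), (3, 0.4), (5, 1.0), (3, 0.4)]
print(side_effects)  # {'edges_seen': 5}
```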