Gremlin.Net Resource Leaks

139 views
Skip to first unread message

Austin Malpede

unread,
Jun 5, 2019, 2:57:52 PM6/5/19
to Gremlin-users
Hello,

I've discovered this issue after running a coverity scan on my Gremlin.Net project. 

   
1. alloc_fn: A new resource is returned from allocation method AddV.
   
2. noescape: Resource g.AddV("test") is not closed or saved in Property.
   
CID 11944 (#1 of 1): Resource leak (RESOURCE_LEAK)3. leaked_resource: Failing to save or close resource created by g.AddV("test") leaks it.
 97            g.AddV("test").Property("__graph"GraphName)


Let me explain:

With the way that the library is structured, new instances of GraphTraversal are created relatively frequently yet never get garbage collected. Let me show you an example from the source code.

        /// <summary>
        ///     Adds the addV step to this <see cref="GraphTraversal{SType, EType}" />.
        /// </summary>
        public GraphTraversal<S, Vertex> AddV (string vertexLabel)
        {
            Bytecode.AddStep("addV", vertexLabel);
            return Wrap<S, Vertex>(this);
        }

        private static GraphTraversal<S2, E2> Wrap<S2, E2>(GraphTraversal<S, E> traversal)
        {
            if (typeof(S2) == typeof(S) && typeof(E2) == typeof(E))
            {
                return traversal as GraphTraversal<S2, E2>;
            }
            // New wrapper
            return new GraphTraversal<S2, E2>(traversal.TraversalStrategies, traversal.Bytecode);
        }

Every Gremlin step always ends with the line "return Wrap<S, E>(this);". 
The Wrap function either returns the existing traversal object, or it wraps the old traversal in a new GraphTraversal object and returns that.

So, with all of this in mind we know that for rather large queries many GraphTraversal objects could be created. The issue is that they never go away because GraphTraversal implements IDisposable.
This means that each GraphTraversal object will hang around in memory unless explicitly disposed of.

Hopefully my explanation is clear. I want to understand why GraphTraversal currently implements IDisposable? With how freely new GraphTraversal objects are being created right now it causes resource leaks.

Thanks,
Austin





Stephen Mallette

unread,
Jun 6, 2019, 6:39:21 AM6/6/19
to gremli...@googlegroups.com
This email got cut-off for me after "Let me explain:" - not sure why - but Google Groups seems to have captured the whole thing. Here's the rest for those following along:

---------------------------------------  

Let me explain:

With the way that the library is structured, new instances of GraphTraversal are created relatively frequently yet never get garbage collected. Let me show you an example from the source code.

        /// <summary>
        ///     Adds the addV step to this <see cref="GraphTraversal{SType, EType}" />.
        /// </summary>
        public GraphTraversal<S, Vertex> AddV (string vertexLabel)
        {
            Bytecode.AddStep("addV", vertexLabel);
            return Wrap<S, Vertex>(this);
        }

        private static GraphTraversal<S2, E2> Wrap<S2, E2>(GraphTraversal<S, E> traversal)
        {
            if (typeof(S2) == typeof(S) && typeof(E2) == typeof(E))
            {
                return traversal as GraphTraversal<S2, E2>;
            }
            // New wrapper
            return new GraphTraversal<S2, E2>(traversal.TraversalStrategies, traversal.Bytecode);
        }

Every Gremlin step always ends with the line "return Wrap<S, E>(this);". 
The Wrap function either returns the existing traversal object, or it wraps the old traversal in a new GraphTraversal object and returns that.

So, with all of this in mind we know that for rather large queries many GraphTraversal objects could be created. The issue is that they never go away because GraphTraversal implements IDisposable.
This means that each GraphTraversal object will hang around in memory unless explicitly disposed of.

Hopefully my explanation is clear. I want to understand why GraphTraversal currently implements IDisposable? With how freely new GraphTraversal objects are being created right now it causes resource leaks.

---------------------------------------

--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/38596e8f-c64d-4b1a-bd88-336c3f0dcef2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Stephen Mallette

unread,
Jun 6, 2019, 8:24:42 AM6/6/19
to gremli...@googlegroups.com
Hi Austin, thanks for bringing this up. Coverity is correct that we don't explicitly dispose() the GraphTraversal that we create, but:

1. GraphTraversal technically isn't an "expensive" object nor does each creation hold open expensive resources so even if you generate a lot of them it shouldn't be taxing the client system
2. On the server side, there is the potential to leave open large side-effects generated by traversals. The behavior of the side-effect cache is controlled by the TraversalOpPrcessor configuration, so if you never care about retrieval of side-effects you can probably just set the cacheMaxSize to zero and that will alleviate any issues with "open resources" not being disposed

So, the summary here is that you really just need to dispose() the top level traversal if side-effects are being cached on the server as that's the only resource of concern. All the intermediate GraphTraversal instances really don't have anything expensive to close out.

Note that with respect to your point that:

The issue is that they never go away because GraphTraversal implements IDisposable. This means that each GraphTraversal object will hang around in memory unless explicitly disposed of.

Once your GraphTraversal goes out of scope, it should be eligible for garbage collection just like any other object. I suppose we've bent the rules of IDisposable a little bit in terms of our implementation because using this interface has some implied meaning about the nature of the object itself. That said....interestingly, despite what our .NET API docs say[2], we technically don't implement IDisposable on GraphTraversal (if you can see where that is in the code, please point me to it, because i keep staring at it and I don't see it). We do however have a dispose() method available to it inherited from DefaultTraversal and if you follow that, you can see that it simply calls dispose() on side-effects. Now, TraversalSideEffects does implement IDisposable and I think that it makes sense to do so as it explicitly holds the keys to a remote reference that could be closed out. I'm not sure how Coverity picked up the resource leak without explicit implementation of IDisposable on GraphTraversal, unless it realized there was a dispose() method in place that fit the signature or that the object held a TraversalSideEffects which wasn't being explicitly disposed.

If you're completely unaware of Traversal side-effects here's what I'm talking about:

gremlin> g = TinkerFactory.createModern().traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V().groupCount('a').by(label)
==>v[1]
==>v[2]
==>v[3]
==>v[4]
==>v[5]
==>v[6]
gremlin> g.V().groupCount('a').by(label).cap('a')
==>[software:2,person:4]
gremlin> t = g.V().groupCount('a').by(label)
==>v[1]
==>v[2]
==>v[3]
==>v[4]
==>v[5]
==>v[6]
gremlin> t.getSideEffects().get('a')
==>software=2
==>person=4


Most folks aren't even aware that traversals hang on to side-effects because they simply don't make use of them. They only make use of the traversal result and technically, I wonder if we should have ever even exposed this feature at all. If you want the side-effect you should cap() it to get your result rather than retrieve it after iteration. I don't know what the implications are, but exploring the idea of removing the ability to get side-effects doesn't sound like a bad idea in general. It certainly would simplify Gremlin Server operations to do so.

Given all that, I'm not sure what, if anything, we should do with this. We could:

1. Go to the extreme of looking into killing Traversal side-effects completely in 3.5.0 - then the need for IDisposable potentially goes away. interestingly for a full Gremlin VM implementation we would need to keep IDisposable. As seen in the Java case, we implement Closeable which is analogous to IDisposable because there may be open resources for the graph provider that need to be closed. it seems that this may need some review in the Gremlin Server implementation as i poke around the code :/
2. Explicitly implemented IDisposable on Traversal to match the dispose() method already in place
3. Write more user/api documentation around all of this
4. ??

Anyway, interested to see if anyone has any further thoughts on this one.

Reply all
Reply to author
Forward
0 new messages