ResultSet has no close()?


Luca Garulli

Dec 22, 2016, 4:33:05 AM
to Gremlin-users
Hi guys,

To make client-side queries more efficient, why not support a close() method on the ResultSet?

http://tinkerpop.apache.org/javadocs/3.1.4/full/org/apache/tinkerpop/gremlin/driver/ResultSet.html

The goal is to free precious resources on the server as soon as the ResultSet is no longer needed.

WDYT?

Thanks,

Luca Garulli
Founder & CEO
OrientDB

Stephen Mallette

Dec 22, 2016, 7:40:40 AM
to Gremlin-users
Hi Luca - There are two ways to look at ResultSet.close(). The first perspective is for sessionless requests and the second is for in-session requests. For sessionless requests, no state is maintained between requests, so presuming the user sent a simple traversal to the server, there really is nothing expensive memory-wise opened on the server - just an Iterator that streams back results. I suppose there could be an expensive memory issue if you eagerly evaluated a traversal, as in g.V().toList(), or purposely opened long-lived expensive resources as part of a script sent to the server. I think that these could be cases for ResultSet.close(), but it would seem that close() would then behave as a cancellation mechanism to kill a long-running g.V() (for example). As for the other scenario, a script that opened an expensive resource, I'm not sure that ResultSet.close() could help in those situations - ResultSet wouldn't know what resources had been opened by a script and so couldn't release them.

I don't know if you had other "resources" in mind that would benefit from having a ResultSet.close() - maybe I didn't address something you were thinking of. So - given what I have thought of, it sounds like close() could behave as a bit of a kill command for long-running iterators or script evaluations, which are currently controlled by timeouts. Unfortunately, we don't have the server-side capability to handle a kill request for sessionless requests. We rely on various timeouts that will stop something from being out of control for too long.

For in-session requests, I think you'd largely want to defend against the same things though there is also the notion of state being maintained in this case. I'm not sure what ResultSet.close() would apply to in the latter situation. If I use a session and I send two separate requests:

x = g.V(1).out().toList()
y = g.V(1).in().toList()

I'd get two ResultSet instances, and the results would remain available on the server for future requests, stored in x and y. Calling close() on the first ResultSet would lead me to believe that I was releasing the resources of "x" on the server, but close() wouldn't know how to do that. The problem is similar to the one I mentioned earlier with "long-lived expensive resources" in sessionless requests. It doesn't know anything about "x" or the contents of the script. It wouldn't know how to reference "x" on the server to clear the reference and make it eligible for garbage collection. Perhaps it doesn't need to do any of that, but it was just something that popped into my mind as I thought the idea of close() through.

Anyway, I guess the summary of all this is that I'm not sure what the exact semantics of ResultSet.close() should be, and where it does make sense, in my mind, we'd need to make some changes to the Gremlin Server protocol to allow it to work. Hopefully, this wasn't too confusing.


--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-users+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/4be7ae25-1020-4578-a18c-d09e71d7a841%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Paul Jackson

Dec 22, 2016, 8:28:50 AM
to Gremlin-users
I've had a similar requirement in OLTP mode. Say this query returns 5 elements and I lazily retrieve only the first 3:
x = g.V(1).out()
x.next()
x.next()
x.next()

I could have put a limit on the query in terms of count or time, I suppose. Are we saying that this is how it must be done, that all queries must be iterated to completion? If partially iterating a traversal is supported, a call to x.close() would be really handy.
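
To make the partial-iteration case concrete, here is a minimal Java sketch of what a closeable result iterator might look like. CursorIterator and its isClosed() accessor are hypothetical names, not part of TinkerPop, and the in-memory list only stands in for whatever cursor a graph implementation actually holds open.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a traversal result backed by an open cursor.
final class CursorIterator<T> implements Iterator<T>, AutoCloseable {
    private final Iterator<T> source; // stands in for the underlying cursor
    private boolean closed;

    CursorIterator(List<T> elements) {
        this.source = elements.iterator();
    }

    @Override
    public boolean hasNext() {
        return !closed && source.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        return source.next();
    }

    // Releases the cursor even when elements remain unread.
    @Override
    public void close() {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}

final class PartialIterationDemo {
    public static void main(String[] args) {
        // Five results are available, but only the first three are read.
        try (CursorIterator<String> x =
                 new CursorIterator<>(Arrays.asList("a", "b", "c", "d", "e"))) {
            x.next();
            x.next();
            x.next();
        } // close() runs here, with two elements still unread
    }
}
```

With try-with-resources the caller does not have to iterate to completion or remember to clean up on an exception path.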

-Paul

Stephen Mallette

Dec 22, 2016, 8:43:16 AM
to Gremlin-users
Sorry - I'm confused by your example. Let me state what I think you're doing. You're talking about using a session, and each of those lines in your script is submitted as a separate request. If so (and perhaps you were just using shorthand in your example), your second request would fail with NoSuchElementException, because "x" would have been iterated out on the first request. You would instead want to do:

x = g.V(1).out();x.next()
x.next()
x.next()

So, now let's consider the resource of "x". Is there really anything "expensive" in "x"? It's just an Iterator (one that isn't even chewing up CPU by actively streaming a long list of results). Of course, you may not want it hanging around. This is a session, so you could simply null the reference yourself and send:

x = null

You could probably even access the bindings on the session and remove "x" as a variable altogether, though I forget the command to do that off the top of my head. In this case, I don't see how a hypothetical ResultSet.close() would do that for you (it's the case where ResultSet isn't aware of what is in the script - e.g. what variables it is creating), or what ResultSet.close() would release otherwise, as there is no endless iteration or script evaluation taking place to kill.


Paul Jackson

Dec 22, 2016, 9:28:37 AM
to Gremlin-users
I wasn't putting this in the context of a session - I was looking at it from the perspective of using the script engine embedded in Java. I'm not familiar with how sessions work with Gremlin - I haven't worked with the Gremlin server, as my platform is its own server. It's not clear to me how it matters whether you pull the first element as part of the first request or not but I'm assuming it has something to do with calling Gremlin through a server.

When working with the embedded script engine, x is a traversal. When I call next(), the code eventually winds through TinkerPop to my implementation of Graph, where the call is delegated to an instance of org.neo4j.graphdb.index.IndexHits, which has a close() method.

The only way I can see getting close() called by nulling a reference is by overriding finalize(), which is an antipattern. Instead, I've resorted to wrapping the traversal returned by Gremlin in my own closeable iterable that tracks all iterables that were opened in my Graph and closes them once it is closed - very ugly.

I agree the resources aren't expensive, but leaving them open is still a memory leak.

Regards,
-Paul

Stephen Mallette

Dec 22, 2016, 9:37:46 AM
to Gremlin-users
ResultSet.close() is related to Gremlin Server/Driver, as are sessions/sessionless, and everything else I discussed. Speaking more specifically to the case you described:

"Instead, I've resorted to wrapping the traversal returned by Gremlin in my own closeable iterable that tracks all iterables that were opened in my Graph and closes them once it is closed - very ugly."

What do you mean by closing your own iterables? What does your wrapper call close() on?


Luca Garulli

Dec 22, 2016, 9:45:48 AM
to gremlin-users
Sorry guys, I was referring to the API that returns Iterator (or Iterable). Example in the Graph interface:

public Iterator<Vertex> vertices(Object... vertexIds) {}
public Iterator<Edge> edges(Object... edgeIds) {}

What I was looking for was a way to free the server-side cursor.
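
Because these methods are declared to return a plain Iterator, a caller can only release a cursor if the concrete iterator also happens to be closeable. The following is a minimal sketch of that idea, not TinkerPop API: IteratorSupport.closeQuietly and the TrackedIterator class are hypothetical names used purely for illustration.

```java
import java.util.Arrays;
import java.util.Iterator;

final class IteratorSupport {
    private IteratorSupport() {}

    // Closes the iterator if its implementation supports it; otherwise does nothing.
    static void closeQuietly(Iterator<?> iterator) {
        if (iterator instanceof AutoCloseable) {
            try {
                ((AutoCloseable) iterator).close();
            } catch (Exception e) {
                // Best effort: cleanup failures are ignored in this sketch.
            }
        }
    }
}

// Hypothetical provider-side iterator that holds a cursor until closed.
final class TrackedIterator<T> implements Iterator<T>, AutoCloseable {
    private final Iterator<T> delegate;
    boolean open = true;

    TrackedIterator(Iterator<T> delegate) {
        this.delegate = delegate;
    }

    @Override public boolean hasNext() { return delegate.hasNext(); }
    @Override public T next() { return delegate.next(); }
    @Override public void close() { open = false; }
}

final class CloseQuietlyDemo {
    public static void main(String[] args) {
        // The caller only sees Iterator, as with vertices() and edges().
        Iterator<Integer> vertices =
            new TrackedIterator<>(Arrays.asList(1, 2, 3).iterator());
        try {
            vertices.next(); // read only the first element
        } finally {
            IteratorSupport.closeQuietly(vertices);
        }
    }
}
```

The instanceof check keeps existing method signatures unchanged, at the cost of making the close contract invisible in the type system.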

Luca


Paul Jackson

Dec 22, 2016, 9:54:28 AM
to Gremlin-users
Sorry - not meaning to hijack Luca's thread. I use a registry class to track what's open and my iterable uses it to close resources.

My wrapper looks like this:
public class RegistryClosingIterator implements CloseableIterable {
  private Iterator<?> iterator;

  public void setResult(Object resultPipe) { this.iterator = QueryResultUtil.getIterator(resultPipe); }

  public Iterator<?> iterator() { return iterator; }

  public void close() {
    CloseableRegistry.closeAll();
  }
}

where CloseableRegistry is a container of opened resources:
public class CloseableRegistry {
  private static final ThreadLocal<CloseableRegistry> threadLocalCloseable = new ThreadLocal<>();
  private final Map<Closeable, Closeable> map = new IdentityHashMap<>();

  private CloseableRegistry() { threadLocalCloseable.set(this); }

  public static void add(Closeable closeable) {
    // Lazily create the registry for this thread; the constructor stores it in the ThreadLocal.
    CloseableRegistry closeableRegistry = threadLocalCloseable.get();
    if (closeableRegistry == null)
      closeableRegistry = new CloseableRegistry();
    closeableRegistry.map.put(closeable, closeable);
  }

  public static void remove(Closeable closeable) {
    CloseableRegistry closeableRegistry = threadLocalCloseable.get();
    if (closeableRegistry != null)
      closeableRegistry.map.remove(closeable);
  }

  static void closeAll() {
    CloseableRegistry closeableRegistry = threadLocalCloseable.get();
    if (closeableRegistry != null) {
      threadLocalCloseable.remove();
      if (!closeableRegistry.map.isEmpty())
        closeableRegistry.map.keySet().forEach(FileUtils::closeQuietly);
    }
  }
}

Finally, my Graph registers any iterable that it returns to Gremlin before returning.
    CloseableRegistry.add(iterable);

It's a hack, of course, but it was the only way I could see to ensure resources are closed when the caller is finished with the traversal.
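
The pattern above only works if the caller invokes close() reliably. A minimal, self-contained sketch of that usage with try/finally is shown here; OpenResources is a hypothetical, simplified stand-in for the CloseableRegistry above, not the original class.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified per-thread registry of resources opened during one query.
final class OpenResources {
    private static final ThreadLocal<List<AutoCloseable>> OPEN =
        ThreadLocal.withInitial(ArrayList::new);

    private OpenResources() {}

    static void register(AutoCloseable resource) {
        OPEN.get().add(resource);
    }

    // Closes everything registered on the current thread and returns the count.
    static int closeAll() {
        List<AutoCloseable> resources = OPEN.get();
        OPEN.remove();
        for (AutoCloseable resource : resources) {
            try {
                resource.close();
            } catch (Exception e) {
                // Best effort: keep closing the remaining resources.
            }
        }
        return resources.size();
    }
}

final class RegistryDemo {
    public static void main(String[] args) {
        try {
            // The graph implementation would register each iterable it hands out.
            OpenResources.register(() -> System.out.println("index hits closed"));
            // ... the caller iterates only as far as it needs ...
        } finally {
            OpenResources.closeAll();
        }
    }
}
```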

-Paul

Paul Jackson

Dec 22, 2016, 9:57:37 AM
to Gremlin-users
Wouldn't you need Gremlin to call close() on this once it had finished traversing? Gremlin 2.x had a CloseableIterable interface, but close() wasn't being called. It looks like it's been dropped in 3.x.

-Paul


Stephen Mallette

Dec 28, 2016, 8:10:30 AM
to Gremlin-users
Paul, I'm sorry, but I'm not connecting something here. I still don't really understand what you are closing. Is this code based on TinkerPop 2.x or something? I see you referencing "pipes" and "CloseableIterable" - that's not TinkerPop 3.x talk.


Stephen Mallette

Dec 28, 2016, 8:29:03 AM
to Gremlin-users
Luca, perhaps we should take a page from 2.x (which Paul reminded me of in this thread) and bring in a CloseableIterator. In a way, that would bring some unity to the Traversal API. I started a discussion on the dev list with more details. Let's move the conversation there.




Luca Garulli

Dec 28, 2016, 9:29:46 AM
to gremlin-users
Thanks for posting it to the Apache TinkerPop ML.
