GC overhead limit exceeded


Darren Govoni

Apr 7, 2012, 8:38:34 AM
to ne...@googlegroups.com
Hi,
  I am running Neo4j Community 1.7 SNAPSHOT on an 8 GB RAM quad-core Intel i7 laptop.
My database has about 5k nodes and 10k relationships. Suddenly I am getting this exception.
Sorry it's not pretty-printed, but it's what comes back. I have plenty of available memory.

Thanks for any tips.

SystemError: ({'status': '500', 'content-length': '4330', 'cache-control': 'must-revalidate,no-cache,no-store', 'content-type': 'text/html; charset=iso-8859-1', 'server': 'Jetty(6.1.25)'}, '<html>\n<head>\n<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>\n<title>Error 500 GC overhead limit exceeded</title>\n</head>\n<body><h2>HTTP ERROR 500</h2>\n<p>Problem accessing /db/data/ext/GremlinPlugin/graphdb/execute_script. Reason:\n<pre>    GC overhead limit exceeded</pre></p><h3>Caused by:</h3><pre>java.lang.OutOfMemoryError: GC overhead limit exceeded\n\tat java.lang.AbstractStringBuilder.&lt;init&gt;(AbstractStringBuilder.java:45)\n\tat java.lang.StringBuilder.&lt;init&gt;(StringBuilder.java:68)\n\tat org.neo4j.server.rest.repr.Serializer.joinBaseWithRelativePath(Serializer.java:101)\n\tat org.neo4j.server.rest.repr.Serializer.relativeUri(Serializer.java:79)\n\tat org.neo4j.server.rest.repr.MappingSerializer.putUri(MappingSerializer.java:36)\n\tat org.neo4j.server.rest.repr.ValueRepresentation$1.putTo(ValueRepresentation.java:108)\n\tat org.neo4j.server.rest.repr.ObjectRepresentation$PropertyGetter.putTo(ObjectRepresentation.java:132)\n\tat org.neo4j.server.rest.repr.ObjectRepresentation.serialize(ObjectRepresentation.java:143)\n\tat org.neo4j.server.rest.repr.Serializer.serialize(Serializer.java:40)\n\tat org.neo4j.server.rest.repr.ListSerializer.addMapping(ListSerializer.java:56)\n\tat org.neo4j.server.rest.repr.MappingRepresentation.addTo(MappingRepresentation.java:52)\n\tat org.neo4j.server.rest.repr.ListRepresentation.serialize(ListRepresentation.java:60)\n\tat org.neo4j.server.rest.repr.Serializer.serialize(Serializer.java:73)\n\tat org.neo4j.server.rest.repr.ListSerializer.addList(ListSerializer.java:61)\n\tat org.neo4j.server.rest.repr.ListRepresentation.addTo(ListRepresentation.java:67)\n\tat org.neo4j.server.rest.repr.ListRepresentation.serialize(ListRepresentation.java:60)\n\tat org.neo4j.server.rest.repr.ListRepresentation.serialize(ListRepresentation.java:51)\n\tat org.neo4j.server.rest.repr.OutputFormat.format(OutputFormat.java:123)\n\tat org.neo4j.server.rest.repr.OutputFormat.response(OutputFormat.java:100)\n\tat org.neo4j.server.rest.repr.OutputFormat.ok(OutputFormat.java:48)\n\tat org.neo4j.server.rest.web.ExtensionService.invokeGraphDatabaseExtension(ExtensionService.java:122)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)\n\tat java.lang.reflect.Method.invoke(Method.java:597)\n\tat com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)\n\tat com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)\n\tat com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)\n\tat com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)\n\tat com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)\n\tat com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)\n\tat com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)\n</pre>\n<hr /><i><small>Powered by Jetty://</small></i>\n</body>\n</html>\n')

Darren Govoni

Apr 7, 2012, 8:54:28 AM
to ne...@googlegroups.com


James Thornton

Apr 7, 2012, 12:20:05 PM
to ne...@googlegroups.com
What command did you execute?

Presumably you did something like a g.V that tried to return all the vertices.

- James

Darren Govoni

Apr 7, 2012, 1:47:36 PM
to ne...@googlegroups.com

        config = bulbs.config.Config('http://'+host+':7474/db/data')
        g = Graph(config)
        g.config.autoindex = False         
        script = 'g.V.filter{it.name=="NAME"}.inE.outV.loop(2){it.loops<=3}{true}.paths'
        results = g.client.gremlin(script)
        
        return results

But it was working fine yesterday. It's only a few thousand nodes, so even if it returned them all (which it doesn't), I would be surprised if that exceeded the GC overhead limit.
Very odd.

James Thornton

Apr 7, 2012, 2:21:21 PM
to ne...@googlegroups.com


You're using g.V.filter, which iterates over all the vertices and keeps only the ones with a certain name.

Like I said a couple of days ago, you don't want to do that (https://groups.google.com/d/msg/neo4j/n56KUtoUiyo/JQE6eAja6DAJ) -- use an index instead:

script = "g.idx('vertex')[[name:'NAME']].inE.outV.loop(2){it.loops<=3}{true}.paths"
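To make the difference concrete, here is a minimal pure-Python sketch (not Neo4j or Gremlin code) contrasting a full scan, which is roughly what g.V.filter{...} does, with an exact-match lookup, which is what an index such as g.idx('vertex') provides. The 5,000 toy vertices and the name_index dictionary are hypothetical stand-ins.

```python
from collections import defaultdict

# Hypothetical toy data: 5,000 vertices, each with a unique "name" property.
vertices = {vid: {"name": f"v{vid}"} for vid in range(5000)}


def scan_by_name(name):
    """Full scan, in the spirit of g.V.filter{it.name == name}.

    Returns (matching vertex ids, number of vertices examined).
    """
    examined = 0
    matches = []
    for vid, props in vertices.items():
        examined += 1
        if props.get("name") == name:
            matches.append(vid)
    return matches, examined


# Exact-match index built once, in the spirit of a 'vertex' index keyed on name.
name_index = defaultdict(list)
for vid, props in vertices.items():
    name_index[props["name"]].append(vid)


def lookup_by_name(name):
    """Index lookup: one dictionary access, however many vertices exist."""
    return list(name_index.get(name, [])), 1


print(scan_by_name("v42"))    # ([42], 5000)
print(lookup_by_name("v42"))  # ([42], 1)
```

The scan cost grows with the size of the graph while the lookup cost does not, which is why starting the traversal from an index is preferred.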

Also, this Bulbs command does not return results:

results = g.client.gremlin(script)

It returns a Response object:

resp = g.client.gremlin(script)

The results are in the response:

results = resp.results

And you don't need to use the low-level client directly; you can execute Gremlin scripts via the Gremlin object on the graph. 

Using the generic execute() method does the same thing as the low-level client:

resp = g.gremlin.execute(script)

The query() method returns initialized elements (but you're returning paths, so don't use it in this case):

vertices = g.gremlin.query(script)

And the command() method returns a single Result object from a Gremlin command:

result = g.gremlin.command(script)

All the Bulbs Gremlin methods are documented here: http://bulbflow.com/docs/api/bulbs/gremlin/

In all cases, you should pass values through the script params arg rather than hard-coding them in the script. 

>>> config = bulbs.config.Config('http://'+host+':7474/db/data')
>>> g = Graph(config)
>>> g.config.autoindex = False         
>>> script = "g.idx(index_name)[[name:name]].out.loop(2){it.loops<=3}{true}.paths"
>>> params = dict(index_name='vertex', name='NAME')
>>> resp = g.gremlin.execute(script, params)

Do this because the Gremlin-Groovy script engine compiles and caches individual scripts. When you execute scripts without using bind variables, it treats each call as a new script, so it recompiles and caches every one.

The script cache is presently set to 500 scripts, so not using bind params will quickly blow through your script cache and kill your performance, because every script has to be recompiled.
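A minimal sketch of the caching behaviour described above, assuming a cache keyed on the exact script text with least-recently-used eviction at 500 entries. ScriptCache is a hypothetical stand-in, not the Gremlin-Groovy engine's real implementation, and "compiling" is simulated with a counter.

```python
from collections import OrderedDict


class ScriptCache:
    """Toy LRU cache of 'compiled' scripts keyed on the exact script text."""

    def __init__(self, capacity=500):
        self.capacity = capacity
        self.compiled = OrderedDict()
        self.compile_count = 0

    def execute(self, script, params=None):
        # params never affect the cache key, so bound values reuse one entry.
        if script in self.compiled:
            self.compiled.move_to_end(script)      # hit: mark as recently used
        else:
            self.compile_count += 1                # miss: simulate a compile
            self.compiled[script] = f"compiled<{script}>"
            if len(self.compiled) > self.capacity:
                self.compiled.popitem(last=False)  # evict least recently used
        return self.compiled[script]


names = [f"name{i}" for i in range(1000)]

# Values pasted into the script text: every call looks like a new script.
inline = ScriptCache()
for n in names:
    inline.execute(f"g.idx('vertex')[[name:'{n}']].outE.inV")

# Values passed as bind params: the script text never changes.
bound = ScriptCache()
for n in names:
    bound.execute("g.idx(index_name)[[name:name]].outE.inV",
                  {"index_name": "vertex", "name": n})

print(inline.compile_count, len(inline.compiled))  # 1000 500
print(bound.compile_count)                         # 1
```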

But all of the above is predicated on you indexing vertices, so put the property conversion in the create_indexed_vertex Gremlin script like I showed you:


Then leave autoindex set to True. Your final Bulbs code will look like this:

>>> config = bulbs.config.Config('http://'+host+':7474/db/data')
>>> g = Graph(config)
>>> script = "g.idx(index_name)[[name:name]].out.loop(2){it.loops<=3}{true}.paths"
>>> params = dict(index_name='vertex', name='NAME')
>>> resp = g.gremlin.execute(script, params)

- James

Darren Govoni

Apr 7, 2012, 2:53:42 PM
to ne...@googlegroups.com
Thanks James. I will try it. I still have a lot of code to update for the new versions of things, so I'm still playing catch-up!

I tried the bulbflow link, but maybe the server is down? I was waiting for the docs there to be updated for 0.3, so that's good news!

James Thornton

Apr 7, 2012, 2:56:35 PM
to ne...@googlegroups.com


Yes, the docs on the website have been updated for 0.3. 

The link works for me...


- James 

Darren Govoni

Apr 7, 2012, 3:12:15 PM
to ne...@googlegroups.com
I tried your suggestion and got some errors. Here is my code context.

        config = bulbs.config.Config('http://'+host+':7474/db/data')
        g = Graph(config)
        g.config.autoindex = False         
        script = 'g.idx(index_name)[[name:vname]].out.loop(2){it.loops<=3}{true}.paths'
        params = dict(index_name='vertex', vname=entity)
        results = g.gremlin.execute(script, params)  

AttributeError: 'Gremlin' object has no attribute 'execute'

Then I tried.....

        config = bulbs.config.Config('http://'+host+':7474/db/data')
        g = Graph(config)
        g.config.autoindex = False         
        script = 'g.idx(index_name)[[name:vname]].out.loop(2){it.loops<=3}{true}.paths'
        params = dict(index_name='vertex', vname=entity)
        results = g.client.gremlin(script, params)  

SystemError: ({'status': '500', 'content-length': '4847', 'content-encoding': 'UTF-8', 'server': 'Jetty(6.1.25)', 'access-control-allow-origin': '*', 'content-type': 'application/json'}, '{\n  "exception" : "java.lang.NullPointerException",\n  "stacktrace" : [ "com.tinkerpop.pipes.branch.LoopPipe.getPath(LoopPipe.java:76)", "com.tinkerpop.pipes.branch.LoopPipe.processNextStart(LoopPipe.java:45)", "com.tinkerpop.pipes.AbstractPipe.next(AbstractPipe.java:75)", "com.tinkerpop.pipes.transform.PathPipe.processNextStart(PathPipe.java:24)", "com.tinkerpop.pipes.transform.PathPipe.processNextStart(PathPipe.java:16)", "com.tinkerpop.pipes.AbstractPipe.hasNext(AbstractPipe.java:84)", "com.tinkerpop.pipes.util.Pipeline.hasNext(Pipeline.java:107)", "org.neo4j.server.rest.repr.ObjectToRepresentationConverter.convertValuesToRepresentations(ObjectToRepresentationConverter.java:100)", "org.neo4j.server.rest.repr.ObjectToRepresentationConverter.getListRepresentation(ObjectToRepresentationConverter.java:87)", "org.neo4j.server.rest.repr.ObjectToRepresentationConverter.convert(ObjectToRepresentationConverter.java:46)", "org.neo4j.server.plugin.gremlin.GremlinPlugin.executeScript(GremlinPlugin.java:85)", "sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)", "sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)", "sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)", "java.lang.reflect.Method.invoke(Method.java:597)", "org.neo4j.server.plugins.PluginMethod.invoke(PluginMethod.java:57)", "org.neo4j.server.plugins.PluginManager.invoke(PluginManager.java:168)", "org.neo4j.server.rest.web.ExtensionService.invokeGraphDatabaseExtension(ExtensionService.java:300)", "org.neo4j.server.rest.web.ExtensionService.invokeGraphDatabaseExtension(ExtensionService.java:122)", "sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)", "sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)", 
"sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)", "java.lang.reflect.Method.invoke(Method.java:597)", "com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)", "com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)", "com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)", "com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)", "com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)", "com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)", "com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)", "com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)", "com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)", "com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)", "com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)", "com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)", "com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)", "com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)", "com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)", "javax.servlet.http.HttpServlet.service(HttpServlet.java:820)", "org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)", "org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1166)", 
"org.neo4j.server.statistic.StatisticFilter.doFilter(StatisticFilter.java:62)", "org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)", "org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)", "org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)", "org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)", "org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)", "org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)", "org.mortbay.jetty.Server.handle(Server.java:326)", "org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)", "org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:943)", "org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)", "org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)", "org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)", "org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)", "org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)" ]\n}')
File "/usr/lib/python2.6/threading.py", line 504, in __bootstrap


Michael Hunger

Apr 7, 2012, 3:13:38 PM
to ne...@googlegroups.com
Are you sure that you didn't change anything in the Gremlin script? It rather seems to be building up an extremely large result set in memory.

What is the memory configuration for the server in conf/neo4j-wrapper.conf?

Michael

James Thornton

Apr 7, 2012, 3:17:05 PM
to ne...@googlegroups.com


On Saturday, April 7, 2012 2:12:15 PM UTC-5, project2501 wrote:
I tried your suggestion and get some errors. Here is my code context.

        config = bulbs.config.Config('http://'+host+':7474/db/data')
        g = Graph(config)
        g.config.autoindex = False         
        script = 'g.idx(index_name)[[name:vname]].out.loop(2){it.loops<=3}{true}.paths'
        params = dict(index_name='vertex', vname=entity)
        results = g.gremlin.execute(script, params)  

AttributeError: 'Gremlin' object has no attribute 'execute'


Make sure you have pulled the latest Bulbs from GitHub.

 

Then I tried.....

        config = bulbs.config.Config('http://'+host+':7474/db/data')
        g = Graph(config)
        g.config.autoindex = False         
        script = 'g.idx(index_name)[[name:vname]].out.loop(2){it.loops<=3}{true}.paths'
        params = dict(index_name='vertex', vname=entity)
        results = g.client.gremlin(script, params)  

SystemError: ({'status': '500', 'content-length': '4847', 'content-encoding': 'UTF-8', 'server': 'Jetty(6.1.25)', 'access-control-allow-origin': '*', 'content-type': 'application/json'}, '{\n  "exception" : "java.lang.NullPointerException",\n  


I inadvertently combined outE.inV into "out" when I rewrote your script, so loop(2) didn't have enough steps to loop back over. 

Try this instead:

        config = bulbs.config.Config('http://'+host+':7474/db/data')
        g = Graph(config)
        g.config.autoindex = False         
        script = 'g.idx(index_name)[[name:vname]].outE.inV.loop(2){it.loops<=3}{true}.paths'
        params = dict(index_name='vertex', vname=entity)
        results = g.client.gremlin(script, params)  

 - James

Darren Govoni

Apr 7, 2012, 3:18:40 PM
to ne...@googlegroups.com
For some reason, I can't get to bulbflow.com right now; it times out. All my other sites load. Just FYI.

James Thornton

Apr 7, 2012, 3:24:25 PM
to ne...@googlegroups.com


That's odd -- I moved it to Heroku last week, and it works for me.

Michael and others, can you access it..?
 

- James

Darren Govoni

Apr 7, 2012, 3:40:19 PM
to ne...@googlegroups.com
FYI. Tried it again in case it was a hiccup somewhere. Still can't reach it. Can't ping it either.

darren@tungsten:~/software$ ping bulbflow.com
ping: unknown host bulbflow.com
darren@tungsten:~/software$ 

Darren Govoni

Apr 7, 2012, 4:17:36 PM
to ne...@googlegroups.com
James,
   Thanks again. So far, I'm catching up with your suggestions and improving things. The index stuff from yesterday helped, and I can abandon g.V (not sure why it exists or why it's provided in Gremlin tutorials, though).
And this current suggestion works, sort of. I intermittently get GC overhead limit exceeded and now Java heap space exceptions[1], depending on whether the vertex in question has more than a couple of edges.
Also, if I change the loop condition from it.loops<=3 to it.loops<=2, that corrects the problem in some cases.

This is leading me to think that Gremlin is neither very efficient nor scalable, and that somewhere a geometric calculation is occurring. It seems neo4j/blueprints doesn't rewrite the Gremlin logic into an optimal expression. My simple looping query over a meager 5k-vertex, 10k-edge graph is causing Neo4j to exhaust all its resources. This can't be right... so I must be missing something here again. I'll keep trying.

:/
Darren

[1] SystemError: ({'status': '500', 'content-length': '1500', 'cache-control': 'must-revalidate,no-cache,no-store', 'content-type': 'text/html; charset=iso-8859-1', 'server': 'Jetty(6.1.25)'}, '<html>\n<head>\n<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>\n<title>Error 500 Java heap space</title>\n</head>\n<body><h2>HTTP ERROR 500</h2>\n<p>Problem accessing /db/data/ext/GremlinPlugin/graphdb/execute_script. Reason:\n<pre> Java heap space</pre></p><h3>Caused by:</h3><pre>java.lang.OutOfMemoryError: Java heap space\n</pre>\n<hr

Darren Govoni

Apr 7, 2012, 4:20:07 PM
to ne...@googlegroups.com
Hmm, the queries work fine in the Neo4j web Gremlin console... but not through Bulbs and the REST endpoint...

James Thornton

Apr 7, 2012, 4:56:54 PM
to ne...@googlegroups.com


You can obviously increase the heap size, but think about what your script is doing...

g.idx(index_name)[[name:vname]].outE.inV.loop(2){it.loops<=3}{true}.paths
 
Why are you doing this? -- What are you trying to accomplish?

- James

Darren Govoni

Apr 7, 2012, 5:35:36 PM
to ne...@googlegroups.com
I was told[1] that script is the best way to "fan out" from a given vertex, up to 3 edges away....
I had to modify the script slightly so the results would include edges. Even if the script obtained ALL the edges and nodes in my database,

Michael Hunger

unread,
Apr 7, 2012, 5:49:11 PM4/7/12
to ne...@googlegroups.com
yep, works for me.

Michael

perhaps a DNS issue Darren?

Darren Govoni

Apr 7, 2012, 6:00:23 PM
to ne...@googlegroups.com
I can get to it now. Must have been a glitch in the matrix.

James Thornton

Apr 7, 2012, 6:43:14 PM
to ne...@googlegroups.com

On Saturday, April 7, 2012 4:35:36 PM UTC-5, project2501 wrote:
I was told[1] that script is the best way to "fan out" from a given vertex, up to 3 edges away....
I had to modify the script slightly so the results would include edges. Even if the script obtained ALL the edges and nodes in my database,


Your original question was regarding "attempting the recipe for paths between two nodes."

But your current script will return all paths emanating from your start node that have path lengths 1, 2, or 3.
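A minimal pure-Python sketch of what a fan-out like outE.inV.loop(2){it.loops<=3}{true}.paths hands back, following the description above: every outgoing path of length 1, 2 or 3 from the start vertex, each one emitted. The toy graph and the fan_out_paths helper are hypothetical, and real Gremlin loop semantics have more detail than this models.

```python
def fan_out_paths(out_edges, start, max_hops=3):
    """Return every outgoing path of 1..max_hops edges from start.

    out_edges maps a vertex to a list of (edge_id, target_vertex) pairs.
    Each path is a tuple alternating vertex, edge, vertex, ..., roughly
    the shape of the rows .paths returns for outE.inV.
    """
    results = []
    frontier = [(start,)]
    for _ in range(max_hops):
        next_frontier = []
        for path in frontier:
            for edge_id, target in out_edges.get(path[-1], []):
                extended = path + (edge_id, target)
                results.append(extended)        # emit every length, like {true}
                next_frontier.append(extended)  # and keep extending it
        frontier = next_frontier
    return results


# Hypothetical toy graph; note the cycle d -> a, which paths do not avoid.
out_edges = {
    "a": [("e1", "b"), ("e2", "c")],
    "b": [("e3", "d")],
    "c": [("e4", "d")],
    "d": [("e5", "a")],
}

paths = fan_out_paths(out_edges, "a")
print(len(paths))  # 6
for p in paths:
    print(p)
```

Each extra level multiplies the number of rows by the average out-degree, and every row is serialized in full in the response.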

Are you trying to find all paths between two nodes, or all paths emanating from the start node? 

- James


Darren Govoni

Apr 7, 2012, 8:24:47 PM
to ne...@googlegroups.com
Both. In different parts of my code. I actually noticed the maxmemory setting for neo4j was only 64MB.
Seems awfully low. So I raised it and it works better now, but I'm concerned what the memory requirements
are going to be for very large graph databases.

Also, I don't get any of these memory errors when running the same script from the Neo4j webadmin, so not sure what else is going on there.

I'll do more study on this.

Peter Neubauer

Apr 8, 2012, 2:34:13 AM
to ne...@googlegroups.com

How many nodes is your script returning? I guess it is the JSON serialization that is taking the memory. Returning smaller chunks of data (e.g. node properties instead of the full node representation) would be better...

Darren Govoni

Apr 8, 2012, 7:31:42 AM
to ne...@googlegroups.com
In some cases, it looks like it can return 1000 nodes. I'm reusing vertices but not edges. So there can be many duplicate edges (for counting).

How does one return just node properties? That's all I really need, but I see a lot of other stuff in nodes like URLs.

Perhaps it's a question for James as well, since I'm using the bulbs neo4jserver graph.

Michael Hunger

Apr 8, 2012, 7:37:16 AM
to ne...@googlegroups.com
You would extract the node-properties into a map and then return just this collection of maps.
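A rough pure-Python illustration of why returning property maps is lighter than returning full node representations. The link fields below are modeled loosely on the hypermedia URLs a Neo4j 1.x REST node representation carries; the exact field set, the server URL and the 1,000 toy nodes are assumptions.

```python
import json

BASE = "http://localhost:7474/db/data/node/"  # assumed server URL


def full_node_repr(node_id, props):
    """Approximate REST-style node: properties plus per-node link URLs."""
    self_url = f"{BASE}{node_id}"
    return {
        "self": self_url,
        "data": props,
        "properties": f"{self_url}/properties",
        "outgoing_relationships": f"{self_url}/relationships/out",
        "incoming_relationships": f"{self_url}/relationships/in",
        "all_relationships": f"{self_url}/relationships/all",
        "create_relationship": f"{self_url}/relationships",
    }


nodes = {i: {"name": f"entity{i}"} for i in range(1000)}

full_json = json.dumps([full_node_repr(i, p) for i, p in nodes.items()])
props_json = json.dumps(list(nodes.values()))  # just the property maps

print(len(full_json), len(props_json))
print(round(len(full_json) / len(props_json), 1))
```

With small property maps, the per-node URLs dominate the payload that has to be built in the server's heap.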

Michael

Darren Govoni

Apr 8, 2012, 11:24:00 AM
to ne...@googlegroups.com
Hi Mike,
  Interestingly enough, I'm doing that already, but in this case the exception occurs before the response even comes back, presumably because there's too much data for the heap
on the server side (i.e. inside the Neo4j server). I did raise the maxmemory setting for Neo4j, but I'm worried this is only a temporary fix and that when my data grows, the problem could
keep reappearing. I suppose I shouldn't be returning 1000 nodes anyway, so I might be able to solve this at query time.

With the adjusted maxmemory setting, I'm running again normally. It was defaulted to 64MB.
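For reference, a small sketch that reads the heap limit out of conf/neo4j-wrapper.conf text. It assumes the "maxmemory setting" is the Neo4j 1.x wrapper key wrapper.java.maxmemory, given in MB; max_heap_mb is a hypothetical helper, not part of Neo4j, and the sample file contents are made up.

```python
def max_heap_mb(conf_text, default=None):
    """Return the last active wrapper.java.maxmemory value (MB), or default.

    Lines starting with '#' are comments and are skipped, so a
    commented-out setting counts as unset.
    """
    value = default
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, val = line.partition("=")
        if sep and key.strip() == "wrapper.java.maxmemory":
            value = int(val.strip())
    return value


sample_conf = """\
# Maximum Java Heap Size (in MB)
#wrapper.java.maxmemory=64
wrapper.java.maxmemory=2048
"""

print(max_heap_mb(sample_conf))                         # 2048
print(max_heap_mb("#wrapper.java.maxmemory=64\n", 64))  # 64
```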

Peter Neubauer

Apr 8, 2012, 11:46:21 AM
to ne...@googlegroups.com
Darren,
any chance you can profile this? Would be interesting to see what
exactly is happening, or you could give us the dataset and a sample
CURL query?

Cheers,

/peter neubauer

G:  neubauer.peter
S:  peter.neubauer
P:  +46 704 106975
L:   http://www.linkedin.com/in/neubauer
T:   @peterneubauer

Neo4j                                - Graphs rule.
Program or be programmed - Computer Literacy for kids.
http://foocafe.org/#CoderDojo

James Thornton

Apr 8, 2012, 12:58:18 PM
to ne...@googlegroups.com


On Sunday, April 8, 2012 6:31:42 AM UTC-5, project2501 wrote:
In some cases, it looks like it can return 1000 nodes. I'm reusing vertices but not edges. So there can be many duplicate edges (for counting).

How does one return just node properties? That's all I really need, but I see a lot of other stuff in nodes like URLs.

Perhaps it's a question for James as well, since I'm using the bulbs neo4jserver graph.



Darren, you are returning all the paths emanating from a node -- you're not returning the nodes so returning the node properties is not relevant in this case.

To get an idea of the size of the data you are returning, return a count of the paths instead:

g.idx(index_name)[[name:vname]].outE.inV.loop(2){it.loops<=3}{true}.paths.count()

- James

 

Michael Hunger

Apr 8, 2012, 1:14:28 PM
to ne...@googlegroups.com
Also, 64 MB is far too little for a production DB; you
will get GC churn without the query ever finishing.

Run a real DB with 2 to 8 GB of RAM

and it will not add up.




Darren Govoni

Apr 9, 2012, 3:49:01 PM
to ne...@googlegroups.com
I will. I bumped the maxmemory to 2GB and things are working, but when I have a chance
I will run experiments with lower memory, because I hope not to get hit with this in a production
setting. So it's on my TODO list.

Darren Govoni

Apr 11, 2012, 4:50:53 PM
to ne...@googlegroups.com
Here's my basic query.

g.idx('vertex')[[name:'document']].bothE.bothV.loop(2){it.loops<=4}{true}.paths.count()

for it.loops<2, count is 288
for it.loops<3, count is 3,278
for it.loops<4, count is 46,960

My dashboard reads:

156 nodes
957 properties
180 relationships
61 relationship types

Darren Govoni

Apr 11, 2012, 4:53:03 PM4/11/12
to ne...@googlegroups.com
What I see happening here is a combinatorial explosion. As you can see, my data set is not that rich. :(

Peter Neubauer

Apr 11, 2012, 5:03:39 PM4/11/12
to ne...@googlegroups.com
Yes,
looks like it. What is the reason for this query, and is there any way
you can prune it down, e.g. by including directions in the relationship
traversals?

Cheers,

/peter neubauer


Darren Govoni

Apr 11, 2012, 8:20:37 PM4/11/12
to ne...@googlegroups.com
I want to display a graph centered on a vertex and fanning out some number of levels, most likely 3.
I was told to use paths for this and it seems to make sense.
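
Since the goal is a picture of the vertices and relationships within a few hops, a breadth-first expansion that visits each vertex once may be all that is needed: its result is bounded by the size of the graph, whereas the number of paths can grow exponentially with depth. A minimal sketch, assuming the graph is available client-side as a list of (edge_id, out_vertex, in_vertex) tuples; neighborhood is a hypothetical helper, not a Gremlin or Bulbs API.

```python
from collections import defaultdict, deque


def neighborhood(edges, center, depth):
    """Return (vertices, edge_ids) within `depth` hops of `center`.

    `edges` is a list of (edge_id, out_vertex, in_vertex) tuples. Direction
    is ignored, as with Gremlin's both. Every vertex is expanded at most
    once, so the work grows with the size of the neighbourhood rather than
    with the number of paths through it.
    """
    adjacent = defaultdict(list)
    for edge_id, src, dst in edges:
        adjacent[src].append((edge_id, dst))
        adjacent[dst].append((edge_id, src))

    hops = {center: 0}              # vertex -> distance from center
    kept_edges = set()
    queue = deque([center])
    while queue:
        vertex = queue.popleft()
        if hops[vertex] == depth:   # outermost ring: do not expand further
            continue
        for edge_id, other in adjacent[vertex]:
            kept_edges.add(edge_id)
            if other not in hops:
                hops[other] = hops[vertex] + 1
                queue.append(other)
    return set(hops), kept_edges


vertices, edge_ids = neighborhood([(1, "a", "b"), (2, "b", "c"), (3, "c", "d")], "a", 2)
print(sorted(vertices), sorted(edge_ids))   # ['a', 'b', 'c'] [1, 2]
```

Duplicated edges still each appear in edge_ids, but the result can never be larger than the whole graph, no matter how many paths connect its vertices.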

My graph is composed of sentence triples like:

document contains text
index has documents

But for repeat occurrences, I reuse the same vertex so that path finding will work. However,
I don't know how to reuse an edge (is it even possible?), so if "document contains text"
appears 20 times, this multiplies the number of path combinations the path finding has to go through.
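
As a worked illustration (the multiplicities are hypothetical): if a sequence of vertices crosses L hops and hop i is backed by m_i parallel edges, each choice of edge gives a distinct path, so that one vertex sequence is counted as the product of the multiplicities. The growth is multiplicative per hop rather than factorial:

```latex
N = \prod_{i=1}^{L} m_i,
\qquad m = (20, 1, 1) \;\Rightarrow\; N = 20,
\qquad m = (20, 20, 20) \;\Rightarrow\; N = 20^{3} = 8000.
```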

If I had 100 million nodes in my graph (which Neo4j supports?), would these Gremlin queries
scale? I'm thinking maybe not: if any vertex has a handful of unique edges, the combinations will be
off the charts when you add even half a dozen vertices with their own edges.

Is there a best practice, or are there known limits to Neo4j's ability to find paths involving more than a few dozen vertices?

Marko Rodriguez

Apr 11, 2012, 8:27:50 PM4/11/12
to ne...@googlegroups.com

Hi,

Gremlin is a lazy language, so you can next() results all day long. If you are trying to save all these results in memory, then you will run into problems.
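
The difference between consuming results lazily and collecting them can be modelled in Python with a generator. This is a hypothetical client-side sketch, not Gremlin itself, and the adj dict mapping each vertex to its neighbours is an assumption. Each walk is produced only when the caller asks for the next one, so taking the first few, or counting them, keeps memory proportional to the walk length, while list(...) would hold every walk at once (roughly what serializing all paths into a single REST response does).

```python
from itertools import islice


def _extend(adj, path, max_depth):
    """Recursively yield every walk that extends `path`, depth-first."""
    if len(path) > 1:
        yield tuple(path)
    if len(path) - 1 == max_depth:
        return
    for neighbour in adj.get(path[-1], ()):
        path.append(neighbour)      # step forward
        yield from _extend(adj, path, max_depth)
        path.pop()                  # backtrack


def iter_walks(adj, start, max_depth):
    """Lazily yield walks of 1..max_depth hops from `start` as vertex tuples.

    Only the current walk and one generator frame per hop are held in
    memory, however many walks exist in total.
    """
    yield from _extend(adj, [start], max_depth)


adj = {"a": ["b", "c"], "b": ["a"], "c": []}
print(list(islice(iter_walks(adj, "a", 2), 2)))   # first two walks only
# [('a', 'b'), ('a', 'b', 'a')]
print(sum(1 for _ in iter_walks(adj, "a", 2)))    # count without storing
# 3
```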

What is your query scenario?

Marko.

http://markorodriguez.com

James Thornton

Apr 11, 2012, 8:55:30 PM4/11/12
to ne...@googlegroups.com


Yes, this Gremlin "paths" expression will return all possible path combinations emanating from all vertices named "document", so it will be quite large. Is this really what you want?

Please describe your context and end goal, and maybe we can help you craft a better solution.

- James



 

Darren Govoni

Apr 11, 2012, 10:13:03 PM4/11/12
to ne...@googlegroups.com
Thanks, James. Well, I guess I want all minimal paths emanating (both ins and outs) from a vertex. For example,
let's say I have the following vertex-edge-vertex triples (directed):

v1-e1-v2
v1-e2-v2
v1-e3-v2
v2-e4-v3
v2-e5-v3 
v2-e6-v3

where e1-e3 all have the same "name" property, as do e4-e6 (but every edge has a different id).
So I'd like to reduce the above list to:

v1-e1-v2
v2-e4-v3

or perhaps

v1-"buy"-v2
v2-"likes"-v3

such that any one of e1, e2, e3 would suffice because they have the same name (e.g. "buy").
I duplicate the edges partly because I'm not sure how to reuse one, or whether that concept exists,
but also because I want to count the number of "buy" edges for PageRank purposes.
And I use properties on edges that trace back to other objects elsewhere.
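
One way to get the reduced view described above, sketched under the assumption that the edges are available as (out_vertex, name, in_vertex) triples: group parallel edges by that key and keep the multiplicity as a weight. The same shape could also be stored directly as a single edge per key with a count property, although per-edge properties that trace back to other objects would then have to be aggregated; that is a design decision, not something the thread settles. collapse_parallel_edges is a hypothetical helper.

```python
from collections import Counter


def collapse_parallel_edges(triples):
    """Merge edges that share (out_vertex, name, in_vertex).

    Returns a list of (out_vertex, name, in_vertex, weight) tuples, in order
    of first appearance, where weight is how many parallel edges were merged.
    Traversing this list visits each named relationship once, while the
    weight keeps the count available for ranking.
    """
    weights = Counter(triples)   # Counter keeps insertion order (Python 3.7+)
    return [(src, name, dst, n) for (src, name, dst), n in weights.items()]


edges = [
    ("v1", "buy", "v2"), ("v1", "buy", "v2"), ("v1", "buy", "v2"),
    ("v2", "likes", "v3"), ("v2", "likes", "v3"), ("v2", "likes", "v3"),
]
print(collapse_parallel_edges(edges))
# [('v1', 'buy', 'v2', 3), ('v2', 'likes', 'v3', 3)]
```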

The above example is only 2 edges away, but the depth can vary (e.g. it.loops<X).

I'm still wrestling with Gremlin syntax/semantics to understand how to effect this.
Maybe something with simplePath. Still trying. Slow but sure.

Marko Rodriguez

Apr 11, 2012, 4:57:36 PM4/11/12
to ne...@googlegroups.com
Hi,

That is odd. Here are some notes:
1. Use 'both' instead of bothE.bothV (and then loop(1))
2. Try without paths and simply do count() (just for testing)
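
The reason for note 1 can be modelled as follows, assuming Gremlin 1.x semantics in which bothV emits both endpoints of an edge. Under that assumption, bothE.bothV hands the starting vertex back once per incident edge in addition to the neighbour, so each step emits twice as many vertices, half of them the vertex the traverser is already on; both emits only the adjacent vertex. The two hypothetical functions below model one step of each over a list of (out_vertex, in_vertex) edges without self-loops.

```python
def step_both_e_both_v(edges, v):
    """One bothE.bothV step: both endpoints of every edge touching v."""
    out = []
    for src, dst in edges:
        if v in (src, dst):
            out.extend([src, dst])   # includes v itself
    return out


def step_both(edges, v):
    """One both step: only the vertex at the other end of each edge."""
    return [dst if src == v else src for src, dst in edges if v in (src, dst)]


edges = [("a", "b"), ("c", "a"), ("b", "c")]
print(step_both_e_both_v(edges, "a"))   # ['a', 'b', 'c', 'a']
print(step_both(edges, "a"))            # ['b', 'c']
```

Under these assumptions, switching to both with loop(1) removes that step back onto the current vertex from every iteration.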

That is such a tiny graph that I don't know why you are getting "GC overhead limit exceeded". Are you doing this from the Gremlin REPL, a Groovy class, or from the Web Admin?

Oh, reading further down, I see you are doing some Bulbs-related stuff. Can you speak more to that, as James is the Bulbs guy?

Thanks,
Marko.

James Thornton

Apr 12, 2012, 3:31:37 PM4/12/12
to ne...@googlegroups.com


Marko, he's returning all possible paths over REST (all 47,000 of them), but he doesn't really want or need them; he just wants the shortest path.

- James
 

Marko Rodriguez

Apr 12, 2012, 3:53:15 PM4/12/12
to ne...@googlegroups.com
Hey,

startVertex.both.loop(1){true}{it.object == endVertex}.paths[0]

To be safe, you can add a "max loop" guard so the traversal doesn't go on forever (e.g. if the graph is not strongly connected).

startVertex.both.loop(1){it.loops < 4}{it.object == endVertex}.paths[0]
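
For comparison, a minimal client-side sketch of the same idea in Python (a hypothetical helper, assuming an adj dict that already ignores direction, as both does): a breadth-first search with a visited set returns a fewest-hops path and never revisits a vertex, and the max_hops bound plays the role of the it.loops < 4 guard. Whether Gremlin's loop emits paths in breadth-first order, so that paths[0] is always a shortest one, is not something this sketch establishes.

```python
from collections import deque


def shortest_path(adj, start, end, max_hops):
    """Breadth-first search for a fewest-hops path from start to end.

    `adj` maps a vertex to an iterable of neighbours. Returns the list of
    vertices on one shortest path, or None if `end` is not reachable
    within `max_hops` hops.
    """
    if start == end:
        return [start]
    parent = {start: None}          # also serves as the visited set
    frontier = deque([(start, 0)])
    while frontier:
        vertex, hops = frontier.popleft()
        if hops == max_hops:        # depth guard, like it.loops < N
            continue
        for neighbour in adj.get(vertex, ()):
            if neighbour in parent:
                continue
            parent[neighbour] = vertex
            if neighbour == end:
                # Walk the parent links back to start, then reverse.
                path = [neighbour]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            frontier.append((neighbour, hops + 1))
    return None


adj = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"],
       "d": ["b", "c", "e"], "e": ["d"]}
print(shortest_path(adj, "a", "e", max_hops=4))   # ['a', 'b', 'd', 'e']
print(shortest_path(adj, "a", "e", max_hops=2))   # None
```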

Take care,
Marko.
