100% cpu on one cpu while performaing cypher queries using py2neo

874 views
Skip to first unread message

Lokesh Gidra

unread,
Dec 22, 2013, 8:06:37 AM12/22/13
to ne...@googlegroups.com
Hello,

I am running a neo4j-2.0.0 server on a linux machine with 48-cores. I run a python script on another machine. The script uses multiple threads to perform multiple shortestPath queries to the server. I am using py2neo package in the python script. In the beginning the queries are processed fine. I can see multiple cpus being used by neo4j server in the "top" output. But suddenly, the server gets into a serial phase where only 1 cpu is used 100%. During this time, the python also doesn't make any progress.

I am certain that the script is not faulty as sometimes this serial phase begins AFTER processing all the queries sent by the script, but BEFORE the script exits.

Can anyone please suggest me what causes this behaviour. And what can be done to avoid it.


Regards,
Lokesh

Michael Hunger

unread,
Dec 22, 2013, 12:48:52 PM12/22/13
to ne...@googlegroups.com
Are these queries only reading or reading and writing?

can you produce a thread dump of you neo4j server when that happens ?

either send a kill -3 <pid>
or use jstack <pid>

and send us the thread-dump and the content of your logfiles (data/log/* and data/graph.db/messages.log)

--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to neo4j+un...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Lokesh Gidra

unread,
Dec 22, 2013, 3:07:08 PM12/22/13
to ne...@googlegroups.com
All the queries are read requests. I am just trying to performance test the server. So I am using only shortestPath queries.

Please find attached the files. I have compressed the log dir as one of the file was pretty big.


Thanks,
Lokesh
jstack.output
messages.log
log.tar.bz2

Lokesh Gidra

unread,
Dec 22, 2013, 6:36:12 PM12/22/13
to ne...@googlegroups.com
I think the following thread is the issue. Can anyone please suggest what can I do to get rid of it?

 
    "qtp417900728-247" prio=10 tid=0x0000000000da1800 nid=0x3de2 runnable [0x00007dba38d51000]
   java.lang.Thread.State: RUNNABLE
        at org.neo4j.kernel.impl.cache.AutoLoadingCache.get(AutoLoadingCache.java:69)
        at org.neo4j.kernel.impl.core.NodeManager.getRelationshipForProxy(NodeManager.java:544)
        at org.neo4j.kernel.InternalAbstractGraphDatabase$6.lookupRelationship(InternalAbstractGraphDatabase.java:831)
        at org.neo4j.kernel.impl.core.RelationshipProxy.getOtherNode(RelationshipProxy.java:108)
        at org.neo4j.graphalgo.impl.path.ShortestPath$DirectionData.fetchNextOrNull(ShortestPath.java:336)
        at org.neo4j.graphalgo.impl.path.ShortestPath$DirectionData.fetchNextOrNull(ShortestPath.java:260)
        at org.neo4j.helpers.collection.PrefetchingIterator.hasNext(PrefetchingIterator.java:55)
        at org.neo4j.graphalgo.impl.path.ShortestPath.internalPaths(ShortestPath.java:156)
        at org.neo4j.graphalgo.impl.path.ShortestPath.findSinglePath(ShortestPath.java:132)
        at org.neo4j.cypher.internal.compiler.v2_0.commands.expressions.SingleShortestPathStrategy.findResult(ShortestPathExpression.scala:93)
        at org.neo4j.cypher.internal.compiler.v2_0.commands.expressions.SingleShortestPathStrategy.findResult(ShortestPathExpression.scala:90)
        at org.neo4j.cypher.internal.compiler.v2_0.commands.expressions.ShortestPathExpression.getMatches(ShortestPathExpression.scala:48)
        at org.neo4j.cypher.internal.compiler.v2_0.commands.expressions.ShortestPathExpression.apply(ShortestPathExpression.scala:41)
        at org.neo4j.cypher.internal.compiler.v2_0.pipes.ShortestPathPipe$$anonfun$internalCreateResults$1.apply(ShortestPathPipe.scala:37)
        at org.neo4j.cypher.internal.compiler.v2_0.pipes.ShortestPathPipe$$anonfun$internalCreateResults$1.apply(ShortestPathPipe.scala:36)
        at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at org.neo4j.cypher.internal.compiler.v2_0.ClosingIterator$$anonfun$hasNext$1.apply$mcZ$sp(ClosingIterator.scala:38)
        at org.neo4j.cypher.internal.compiler.v2_0.ClosingIterator$$anonfun$hasNext$1.apply(ClosingIterator.scala:37)
        at org.neo4j.cypher.internal.compiler.v2_0.ClosingIterator$$anonfun$hasNext$1.apply(ClosingIterator.scala:37)
        at org.neo4j.cypher.internal.compiler.v2_0.ClosingIterator.failIfThrows(ClosingIterator.scala:91)
        at org.neo4j.cypher.internal.compiler.v2_0.ClosingIterator.hasNext(ClosingIterator.scala:37)
        at org.neo4j.cypher.internal.compiler.v2_0.PipeExecutionResult.hasNext(PipeExecutionResult.scala:166)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:29)
        at org.neo4j.cypher.internal.compiler.v2_0.PipeExecutionResult$$anon$1.hasNext(PipeExecutionResult.scala:74)
        at org.neo4j.helpers.collection.ExceptionHandlingIterable$1.hasNext(ExceptionHandlingIterable.java:44)
        at org.neo4j.helpers.collection.IteratorWrapper.hasNext(IteratorWrapper.java:42)
        at org.neo4j.server.rest.repr.ListRepresentation.serialize(ListRepresentation.java:62)
        at org.neo4j.server.rest.repr.Serializer.serialize(Serializer.java:75)
        at org.neo4j.server.rest.repr.MappingSerializer.putList(MappingSerializer.java:61)
        at org.neo4j.server.rest.repr.CypherResultRepresentation.serialize(CypherResultRepresentation.java:83)
        at org.neo4j.server.rest.repr.MappingRepresentation.serialize(MappingRepresentation.java:41)
        at org.neo4j.server.rest.repr.OutputFormat.assemble(OutputFormat.java:215)
        at org.neo4j.server.rest.repr.OutputFormat.formatRepresentation(OutputFormat.java:147)
        at org.neo4j.server.rest.repr.OutputFormat.response(OutputFormat.java:130)
        at org.neo4j.server.rest.repr.OutputFormat.ok(OutputFormat.java:67)
        at org.neo4j.server.rest.web.CypherService.cypher(CypherService.java:101)
        at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
        at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethod
DispatchProvider.java:205)
        at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
        at org.neo4j.server.rest.transactional.TransactionalRequestDispatcher.dispatch(TransactionalRequestDispatcher.java:132)
        at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
        at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
        at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
        at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
        at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
        at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
        at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
        at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:698)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1506)
        at org.neo4j.server.rest.security.SecurityFilter.doFilter(SecurityFilter.java:112)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1477)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:211)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1096)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:432)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:175)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1030)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:136)
        at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
        at org.eclipse.jetty.server.Server.handle(Server.java:445)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:268)

       at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:229)
        at org.eclipse.jetty.io.AbstractConnection$ReadCallback.run(AbstractConnection.java:358)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:601)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:532)
        at java.lang.Thread.run(Thread.java:722)

Michael Hunger

unread,
Dec 22, 2013, 6:39:14 PM12/22/13
to ne...@googlegroups.com
Lokesh,

nice machine that you have there.

I think you might have run into a long Garbage Collector pause, as you run Neo4j with 65GB heap it takes a while to do the full-stop garbage collection.

I would run Neo4j with at most 16GB heap here and rather configure the mmio settings in neo4j.properties to conform to your (expected) store sizes, esp. for nodestore and percentage-wise for rel-store and the property-stores.

013-12-22 16:45:25.635+0000 INFO  [o.n.k.i.DiagnosticsManager]: Total Physical memory: 251.78 GB
2013-12-22 16:45:25.635+0000 INFO  [o.n.k.i.DiagnosticsManager]: Free Physical memory: 198.02 GB
2013-12-22 16:45:25.635+0000 INFO  [o.n.k.i.DiagnosticsManager]: Committed virtual memory: 734.69 GB
2013-12-22 16:45:25.635+0000 INFO  [o.n.k.i.DiagnosticsManager]: Total swap space: 59.60 GB
2013-12-22 16:45:25.635+0000 INFO  [o.n.k.i.DiagnosticsManager]: Free swap space: 59.60 GB
2013-12-22 16:45:25.636+0000 INFO  [o.n.k.i.DiagnosticsManager]: JVM memory information:
2013-12-22 16:45:25.636+0000 INFO  [o.n.k.i.DiagnosticsManager]: Free  memory: 60.69 GB
2013-12-22 16:45:25.636+0000 INFO  [o.n.k.i.DiagnosticsManager]: Total memory: 61.33 GB
2013-12-22 16:45:25.636+0000 INFO  [o.n.k.i.DiagnosticsManager]: Max   memory: 80.00 GB
2013-12-22 16:45:25.636+0000 INFO  [o.n.k.i.DiagnosticsManager]: Garbage Collector: PS Scavenge: [PS Eden Space, PS Survivor Space]
2013-12-22 16:45:25.636+0000 INFO  [o.n.k.i.DiagnosticsManager]: Garbage Collector: PS MarkSweep: [PS Eden Space, PS Survivor Space, PS Old Gen, PS Perm Gen]
2013-12-22 16:45:25.637+0000 INFO  [o.n.k.i.DiagnosticsManager]: Memory Pool: Code Cache (Non-heap memory): committed=2.44 MB, used=912.69 kB, max=48.00 MB, threshold=0.00 B
2013-12-22 16:45:25.637+0000 INFO  [o.n.k.i.DiagnosticsManager]: Memory Pool: PS Eden Space (Heap memory): committed=16.00 GB, used=655.36 MB, max=16.00 GB, threshold=?
2013-12-22 16:45:25.638+0000 INFO  [o.n.k.i.DiagnosticsManager]: Memory Pool: PS Survivor Space (Heap memory): committed=2.67 GB, used=0.00 B, max=2.67 GB, threshold=?
2013-12-22 16:45:25.638+0000 INFO  [o.n.k.i.DiagnosticsManager]: Memory Pool: PS Old Gen (Heap memory): committed=42.67 GB, used=0.00 B, max=384.00 GB, threshold=0.00 B
2013-12-22 16:45:25.638+0000 INFO  [o.n.k.i.DiagnosticsManager]: Memory Pool: PS Perm Gen (Non-heap memory): committed=66.59 MB, used=18.02 MB, max=512.00 MB, threshold=0.00 B

You've got quite a lot of VM GC arguments, where did you get them from?

2013-12-22 16:45:25.640+0000 INFO  [o.n.k.i.DiagnosticsManager]: VM Arguments: [-XX:+DisableExplicitGC, -Dorg.neo4j.server.properties=conf/neo4j-server.properties, -Djava.util.logging.config.file=conf/logging.properties, -Dlog4j.configuration=file:conf/log4j.properties, -XX:ParallelGCThreads=48, -XX:+UseParallelOldGC, -XX:+UseNUMA, -XX:-UseAdaptiveNUMAChunkSizing, -XX:+UseAdaptiveSizePolicy, -XX:+BindGCTaskThreadsToCPUs, -XX:+UseGCTaskAffinity, -XX:-UseLargePages, -XX:-UseCompressedOops, -XX:-ParallelRefProcEnabled, -XX:MaxPermSize=512m, -Xms65536m, -Xmx65536m, -Dneo4j.home=/home/lokesh/code/neo4j-community-2.0.0, -Dneo4j.instance=/home/lokesh/code/neo4j-community-2.0.0, -Dfile.encoding=UTF-8]

I would probably go with just CMS for the time being.

I continue to investigate but cannot promise too much over the holidays.

Michael

<jstack.output><messages.log><log.tar.bz2>

Lokesh Gidra

unread,
Dec 22, 2013, 6:48:06 PM12/22/13
to ne...@googlegroups.com
I am pretty sure that the the thread whose stack trace is sent in the previous post is the culprit. Can you please have a look at it and suggest me future course of action. I am sure that it is not GC.

Michael Hunger

unread,
Dec 22, 2013, 7:20:28 PM12/22/13
to ne...@googlegroups.com
I rather think this is the culprit, as there are 145 threads stuck (BLOCKED) in the same place

"qtp911690433-360" prio=10 tid=0x0000000002605800 nid=0x2fbd runnable [0x00007da10e3e1000]
   java.lang.Thread.State: RUNNABLE
at org.neo4j.kernel.impl.util.ArrayMap.synchronizedGet(ArrayMap.java:218)
- locked <0x00007e5e513d6f78> (a org.neo4j.kernel.impl.util.ArrayMap)
at org.neo4j.kernel.impl.util.ArrayMap.get(ArrayMap.java:198)
at org.neo4j.kernel.impl.persistence.PersistenceManager.getResource(PersistenceManager.java:216)
at org.neo4j.kernel.impl.persistence.PersistenceManager.currentKernelTransaction(PersistenceManager.java:84)
at org.neo4j.kernel.impl.core.ThreadToStatementContextBridge.assertInTransaction(ThreadToStatementContextBridge.java:79)
at org.neo4j.kernel.impl.core.RelationshipProxy.assertInTransaction(RelationshipProxy.java:357)
at org.neo4j.kernel.impl.core.RelationshipProxy.getOtherNode(RelationshipProxy.java:107)
at org.neo4j.graphalgo.impl.path.ShortestPath$DirectionData.fetchNextOrNull(ShortestPath.java:336)
at org.neo4j.graphalgo.impl.path.ShortestPath$DirectionData.fetchNextOrNull(ShortestPath.java:260)

Without an implementation change, which I think is due, I have right now no solution for you. Except manually assigning fewer threads to the neo4j server

currently you have 465 or so threads, you could try to run your measurements with, e.g.

org.neo4j.server.webserver.maxthreads=140


HTH

Michael

Mattias Persson

unread,
Jan 2, 2014, 1:19:26 PM1/2/14
to Neo4j Development
FYI https://github.com/neo4j/neo4j/pull/1780 should help some there.


2013/12/23 Michael Hunger <michael...@neopersistence.com>



--
Mattias Persson, [mat...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com
Reply all
Reply to author
Forward
0 new messages