Neo4j REST service stops working


Khushbu Bhatewara

Oct 3, 2012, 5:53:49 AM
to ne...@googlegroups.com
Hi,

We are facing a very strange issue with a Neo4j REST service hosted on Windows Azure. The service becomes inaccessible both through code and through the webadmin console. Analyzing messages.log revealed nothing, and the strangest part was that the GC monitor kept writing entries to messages.log even while the service was inaccessible. The service was shown as started.

When we found nothing, we simply restarted the service, and it is accessible now. But we need to find the root cause of why this happened. Are there any other logs or traces where we can find out why the service behaved abnormally?

- Khushbu
 

Peter Neubauer

Oct 3, 2012, 11:20:57 AM
to ne...@googlegroups.com
I think you should connect JConsole to it in order to see what is
happening on the JVM via JMX.
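For reference, the thread data JConsole shows comes from the JVM's ThreadMXBean. The sketch below is a minimal illustration, not a Neo4j tool: the ThreadOverview class is hypothetical and inspects the JVM it runs in, so it would have to run inside (or be adapted to connect to) the Neo4j server's JVM. It counts threads per state and asks the JVM for deadlocks. Note that findDeadlockedThreads() only sees JVM monitors and java.util.concurrent locks; a lock that is simply never released shows up as threads stuck in BLOCKED or WAITING, not as a deadlock.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper: reads the thread data JConsole displays, via the
// ThreadMXBean of the JVM this code runs in.
class ThreadOverview {

    // Counts live threads per state (RUNNABLE, BLOCKED, WAITING, ...).
    static Map<Thread.State, Integer> countByState() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
            if (info != null) { // null if the thread ended between the two calls
                counts.merge(info.getThreadState(), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Ids of threads deadlocked on monitors or java.util.concurrent locks;
    // the JMX call returns null when there are none, mapped to an empty array.
    static long[] deadlockedThreadIds() {
        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        return ids == null ? new long[0] : ids;
    }

    public static void main(String[] args) {
        System.out.println("threads by state: " + countByState());
        System.out.println("deadlocked threads: " + deadlockedThreadIds().length);
    }
}
```

Many BLOCKED or WAITING threads with no reported deadlock would be consistent with lock contention rather than a GC problem.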

Have you been running heavy load or returning large results from the server when
this happened? Try using the streaming REST format, see
http://docs.neo4j.org/chunked/snapshot/rest-api-streaming.html
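As far as we understand the streaming page linked above, a client opts in per request with an `X-Stream: true` header. A minimal sketch using Java 11's java.net.http that only builds such a request; the server address http://localhost:7474 and the /db/data/ path are assumed defaults, and StreamingRequest is a hypothetical name.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Builds (but does not send) a REST request that asks for the streaming
// JSON format through the X-Stream header.
class StreamingRequest {

    static HttpRequest build(String serverUrl, String path) {
        return HttpRequest.newBuilder(URI.create(serverUrl + path))
                .header("Accept", "application/json")
                .header("X-Stream", "true") // opt in to streaming for this request
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Assumed default server address; adjust to the actual deployment.
        HttpRequest req = build("http://localhost:7474", "/db/data/");
        System.out.println(req.method() + " " + req.uri() + " " + req.headers().map());
        // To send it: java.net.http.HttpClient.newHttpClient()
        //     .send(req, java.net.http.HttpResponse.BodyHandlers.ofInputStream());
    }
}
```

Reading the body as an InputStream, as in the commented-out call, lets the client consume results as they arrive instead of buffering the whole response.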

Cheers,

/peter neubauer

G: neubauer.peter
S: peter.neubauer
P: +46 704 106975
L: http://www.linkedin.com/in/neubauer
T: @peterneubauer

Wanna learn something new? Come to http://graphconnect.com

Khushbu Bhatewara

Apr 11, 2013, 7:10:35 AM
to ne...@googlegroups.com
Hi Peter, 

After a long time, we faced the issue again. We ran New Relic's thread profiler to find the cause and found a thread-locking issue on an object.

The screenshot follows. Please suggest what can be done to overcome the locking behavior, or whether there is any way to drill down to the Groovy script that caused the issue.

Regards,
Khushbu

Peter Neubauer

Apr 11, 2013, 7:15:33 AM
to Neo4j User
Khushbu,
I can't see the screenshot. Can you maybe put it online somewhere? It might get filtered out on this list.

/peter



The authoritative book on graph databases - http://graphdatabases.com
Neo4j questions? Please use SO - http://stackoverflow.com/search?q=neo4j


--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to neo4j+un...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Khushbu Bhatewara

Apr 11, 2013, 10:47:25 AM
to ne...@googlegroups.com
Hi Peter,

PFA the screenshot.

Regards,
Khushbu

Neo4j hang.png

Khushbu Bhatewara

Apr 19, 2013, 1:16:15 AM
to ne...@googlegroups.com
Any findings?



Peter Neubauer

Apr 19, 2013, 6:17:01 AM
to Neo4j User
Khushbu,
what kind of queries are you running?


Cheers,

/peter neubauer


Khushbu Bhatewara

Apr 19, 2013, 9:17:57 AM
to ne...@googlegroups.com
We are running Gremlin queries for reads, and Groovy scripts for transactions (creation and updates).



Peter Neubauer

Apr 19, 2013, 9:49:34 AM
to Neo4j User
Mmh,
I wonder if you are maybe starting a transaction that spans the whole database or so, and not committing it, so that a transaction lock is left hanging?
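Peter's hypothesis can be illustrated outside Neo4j. In this minimal sketch a ReentrantLock stands in for a write lock and a short-lived thread for a request whose transaction never finishes; it is an analogy, not Neo4j's lock manager, and LeakedLockDemo is a hypothetical name. Once the holder ends without releasing, nobody else can acquire the lock.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Analogy only: a ReentrantLock stands in for a database write lock and a
// short-lived thread for a request whose transaction is never finished.
class LeakedLockDemo {

    // Runs one "request" on its own thread; it takes the lock and, unless
    // `leak` is true, releases it again before the thread ends.
    static void runRequest(ReentrantLock lock, boolean leak) {
        Thread request = new Thread(() -> {
            lock.lock();
            if (!leak) {
                lock.unlock();
            }
            // with leak == true the thread ends while still owning the lock
        });
        request.start();
        try {
            request.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // True if the calling thread can get the lock within timeoutMs.
    static boolean canAcquire(ReentrantLock lock, long timeoutMs) {
        try {
            if (lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                lock.unlock();
                return true;
            }
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ReentrantLock released = new ReentrantLock();
        runRequest(released, false);
        System.out.println("after a finished request: " + canAcquire(released, 100));

        ReentrantLock leaked = new ReentrantLock();
        runRequest(leaked, true);
        System.out.println("after a leaked lock: " + canAcquire(leaked, 100));
    }
}
```

Running main prints `true` for the released lock and `false` for the leaked one: the lock stays owned by a thread that no longer exists.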

/peter



Khushbu Bhatewara

Apr 19, 2013, 11:36:28 AM
to ne...@googlegroups.com
We commit the transaction on success; on failure, the transaction is rolled back. Please let us know if the following query has any problems.

A sample query looks like this:

import org.neo4j.graphdb.*;
neo4j = g.getRawGraph();
idxManager = neo4j.index();
msgIdxObj = idxManager.forNodes('Index_MessageId');
msgNode = msgIdxObj.get('MessageId',p0).getSingle();
if(!msgNode.equals(null))
{
    try
    {
        tx = neo4j.beginTx();
        userLockObj=null;
        Relationship messageOutRel = msgNode.getSingleRelationship(DynamicRelationshipType.withName('MESSAGE'),Direction.OUTGOING);
        if(!messageOutRel.equals(null))
        {
            Node OutEndNode = messageOutRel.getEndNode();
            if(!OutEndNode.hasProperty('Recepients'))
            {
                Lock userLockObj = tx.acquireWriteLock(OutEndNode);
            };
        };
        Relationship messageOutRelationship = msgNode.getSingleRelationship(DynamicRelationshipType.withName('MESSAGE'),Direction.OUTGOING);
        Lock lockObj = tx.acquireWriteLock(msgNode);
        if(!messageOutRelationship.equals(null))
        {
            Relationship messageInRelationship = msgNode.getSingleRelationship(DynamicRelationshipType.withName('MESSAGE'),Direction.INCOMING);
            if(!messageInRelationship.equals(null))
            {
                Node InStartNode = messageInRelationship.getStartNode();
                Node OutEndNode = messageOutRelationship.getEndNode();
                InStartNode.createRelationshipTo(OutEndNode,DynamicRelationshipType.withName('MESSAGE'));
                messageInRelationship.delete();
            };
            messageOutRelationship.delete();
        };
        msgNode.delete();
        lockObj.release();
        if(!userLockObj.equals(null))
        {
            userLockObj.release();
        };
        tx.success();
    }
    catch(Exception e)
    {
        tx.failure();
        throw e;
    }
    finally
    {
        tx.finish();
    }
};



vyadav

Apr 24, 2013, 2:21:13 AM
to ne...@googlegroups.com
We are also facing a similar issue with Neo4j. Is there any solution to the problem?

We use transactions to create relationships to a node, committing and rolling back properly with try/catch. But when we put load on Neo4j and many nodes try to attach to the same node, the transactions wait to acquire the lock on that node so they can create their relationships. If the queue of pending requests gets long, Neo4j goes into a hung state. Ideally it should just work through the queue; that may take time while other requests are being processed, but instead it hangs.

We have to restart Neo4j to bring it back up.
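The pattern described here, many requests queuing for the lock on one node, could also explain why the whole service stops answering: if every server worker thread is waiting for that lock, no thread is left for unrelated requests. A minimal model under stated assumptions: a fixed thread pool stands in for the server's request threads, a ReentrantLock for the node's write lock, and WorkerExhaustionDemo is a hypothetical name. It is not how the Neo4j server is implemented.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;

// Model only: a fixed pool plays the server's request threads and a
// ReentrantLock plays the write lock on one "hot" node.
class WorkerExhaustionDemo {

    // Queues `waitingRequests` tasks that each need hotNodeLock, then one task
    // that needs no lock, and reports whether that last task finishes in time.
    static boolean healthCheckCompletes(int workers, int waitingRequests,
                                        ReentrantLock hotNodeLock, long timeoutMs) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            for (int i = 0; i < waitingRequests; i++) {
                pool.submit(() -> {
                    hotNodeLock.lockInterruptibly(); // parks while another thread holds the lock
                    try {
                        // ... create a relationship to the hot node here ...
                    } finally {
                        hotNodeLock.unlock();
                    }
                    return null;
                });
            }
            Future<?> health = pool.submit(() -> { }); // an unrelated, lock-free request
            health.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // every worker is parked on the lock
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow(); // interrupts workers still waiting in lockInterruptibly()
        }
    }

    public static void main(String[] args) {
        ReentrantLock leaked = new ReentrantLock();
        leaked.lock(); // taken by main and never released, like an unfinished transaction
        System.out.println("3 waiting, 4 workers: " + healthCheckCompletes(4, 3, leaked, 1000));
        System.out.println("4 waiting, 4 workers: " + healthCheckCompletes(4, 4, leaked, 200));
    }
}
```

In this model, raising the thread count only raises the number of queued requests needed before everything stalls; it does not help while the lock is never released.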


-Vikas Y

Michael Hunger

Apr 24, 2013, 2:45:56 AM
to ne...@googlegroups.com, ne...@googlegroups.com
Which neo4j versions are you using?

Did you change any config?

Could you share the code which causes it to hang?


Thanks a lot!


Michael


vyadav

Apr 24, 2013, 2:51:08 AM
to ne...@googlegroups.com
Neo4j: 1.7
Yes, we increased the thread count per core to 25.
The code is similar to the snippet Khushbu shared.



-Vikas Y


Khushbu Bhatewara

Apr 24, 2013, 2:55:56 AM
to ne...@googlegroups.com
Hi,

I also changed the thread configuration. Also, while Neo4j is hung, any request to the Neo4j REST URL gets an HTTP 502 response.

messages.log does not show any error. java.exe's memory usage looks normal, and the CPU is at 100%.

But as soon as Neo4j is restarted, it works again. This is becoming a mystery, as no error is found on the Neo4j machine. We also tried profiling but did not find anything significant.
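With the CPU at 100% while requests hang, it would help to know which threads are using it. A minimal sketch (CpuHogFinder is hypothetical and measures the JVM it runs in) that samples per-thread CPU time twice through ThreadMXBean and ranks threads by the difference; the ids can then be matched against a thread dump of the same process.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CpuHogFinder {

    // Returns thread ids ordered by CPU time consumed during intervalMs,
    // highest first (empty if the JVM cannot measure per-thread CPU time).
    static List<Long> busiestThreads(long intervalMs) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            return Collections.emptyList();
        }
        if (!mx.isThreadCpuTimeEnabled()) {
            mx.setThreadCpuTimeEnabled(true);
        }
        Map<Long, Long> before = new HashMap<>();
        for (long id : mx.getAllThreadIds()) {
            before.put(id, mx.getThreadCpuTime(id)); // -1 if the thread already ended
        }
        try {
            Thread.sleep(intervalMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        Map<Long, Long> used = new HashMap<>();
        for (long id : mx.getAllThreadIds()) {
            Long then = before.get(id);
            long now = mx.getThreadCpuTime(id);
            if (then != null && then >= 0 && now >= 0) {
                used.put(id, now - then); // nanoseconds of CPU in the interval
            }
        }
        List<Long> ids = new ArrayList<>(used.keySet());
        ids.sort((a, b) -> Long.compare(used.get(b), used.get(a)));
        return ids;
    }

    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<Long> top = busiestThreads(1000);
        for (long id : top.subList(0, Math.min(3, top.size()))) {
            ThreadInfo info = mx.getThreadInfo(id);
            System.out.println(id + " " + (info == null ? "(ended)" : info.getThreadName()));
        }
    }
}
```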

Regards,
Khushbu

Michael Hunger

Apr 24, 2013, 3:30:35 AM
to ne...@googlegroups.com
How many threads / parallel requests are running against this node update?

And can you produce a thread-dump of the server when that happens?

kill -3 pid 

or 
jstack pid

Thanks

Michael
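When reading a jstack dump from the hung server, the key question is which thread owns the lock the others are queued on. A minimal in-process sketch of extracting just that from ThreadMXBean.dumpAllThreads (LockWaitReport is hypothetical and inspects the JVM it runs in): it lists every thread blocked or waiting on a lock, with the lock's owner when the JVM knows it. Locks built on Object.wait() or on a custom lock manager may report no owner.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

class LockWaitReport {

    // One line per thread that is BLOCKED or (TIMED_)WAITING on a lock,
    // naming the lock and, if known, the thread that currently owns it.
    static List<String> waitingThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        for (ThreadInfo t : mx.dumpAllThreads(true, true)) {
            Thread.State s = t.getThreadState();
            boolean waiting = s == Thread.State.BLOCKED || s == Thread.State.WAITING
                    || s == Thread.State.TIMED_WAITING;
            if (waiting && t.getLockName() != null) { // sleeping threads have no lock
                String owner = t.getLockOwnerName() == null ? "unknown" : t.getLockOwnerName();
                lines.add(String.format("\"%s\" %s on %s owned by %s",
                        t.getThreadName(), s, t.getLockName(), owner));
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        waitingThreads().forEach(System.out::println);
    }
}
```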

Khushbu Bhatewara

Apr 24, 2013, 4:36:26 AM
to ne...@googlegroups.com
Around 10 parallel requests come in for update requests similar to the sample query posted earlier.

We just hit the same issue again and captured the thread profiler result from New Relic. PFA the screenshot.
Also, after restarting Neo4j, messages.log has a non-clean shutdown entry with "Internal recovery completed, scanned 118549 log entries. Recovered 6571 transactions". In the Neo4j admin console, we found some blank nodes created at the end.

Does it indicate any problem? 

Regards,
Khushbu
neo4j hang.png

Khushbu Bhatewara

Apr 25, 2013, 1:36:46 AM
to ne...@googlegroups.com
Any suggestions?



Michael Hunger

Apr 25, 2013, 1:57:00 AM
to ne...@googlegroups.com
I would still love to see the JVM thread-dump when the service is in that state.

kill -3 pid
or
jstack pid

Cheers

Michael

Khushbu Bhatewara

Apr 25, 2013, 5:33:21 AM
to ne...@googlegroups.com
Hi Michael,

Neo4j got into a hung state again. PFA the thread dump of the process.

-Khushbu


threaddump.log

Khushbu Bhatewara

Apr 25, 2013, 11:46:11 PM
to ne...@googlegroups.com
Any suggestions?



Khushbu Bhatewara

Apr 26, 2013, 2:07:57 AM
to ne...@googlegroups.com
Neo4j hung again today. PFA another thread dump.

-Khushbu


threaddump1.log

Khushbu Bhatewara

Apr 26, 2013, 4:55:06 AM
to ne...@googlegroups.com
This dump was taken while load testing a different piece of functionality, in which a Neo4j REST batch request is executed to create a node and add it to an index. This operation does not use any Gremlin query.
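For reference, a REST batch request that creates a node and adds it to an index might carry a body like the one built below. This is a sketch assuming the Neo4j 1.x batch API conventions as we understand them (a JSON array of jobs with method, to, body and id, where "{0}" refers to the location created by job 0, posted to /db/data/batch); BatchPayload is hypothetical, and storing the index key/value as a node property is an assumption for illustration.

```java
// Builds the JSON body for a batch request that creates a node and adds it
// to a node index in one round trip. indexName is assumed to be URL-safe.
class BatchPayload {

    static String createAndIndex(String indexName, String key, String value) {
        String k = esc(key);
        String v = esc(value);
        return "["
                + "{\"method\":\"POST\",\"to\":\"/node\",\"id\":0,"
                + "\"body\":{\"" + k + "\":\"" + v + "\"}},"
                + "{\"method\":\"POST\",\"to\":\"/index/node/" + indexName + "\",\"id\":1,"
                + "\"body\":{\"uri\":\"{0}\",\"key\":\"" + k + "\",\"value\":\"" + v + "\"}}"
                + "]";
    }

    // Minimal JSON string escaping: backslashes and double quotes only.
    private static String esc(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        System.out.println(createAndIndex("Index_MessageId", "MessageId", "m-1"));
    }
}
```

A real client would more likely use a JSON library; string building keeps the sketch dependency-free.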



Michael Hunger

Apr 26, 2013, 9:35:57 AM
to ne...@googlegroups.com
Thanks so much for those logs. Sorry for the late reply; I'm at a conference all day until Saturday, but I'll report it to the team immediately.

Michael

Michael Hunger

Apr 26, 2013, 9:46:45 AM
to ne...@googlegroups.com
Can you please read through your code thoroughly?

From a quick look, you are acquiring locks that you never release (because those objects only live in that scope), and locks that are not explicitly released will not get released or GC'ed.

This object only lives in this block and still holds a write lock.

You should also release your locks in finally blocks.

And certainly much more.

Michael

if(!OutEndNode.hasProperty('Recepients'))
{
Lock userLockObj = tx.acquireWriteLock(OutEndNode);
};

Khushbu Bhatewara

Apr 26, 2013, 9:49:38 AM
to ne...@googlegroups.com

Thanks, Michael. Looking forward to your response.

-Khushbu


Khushbu Bhatewara

Apr 26, 2013, 12:36:08 PM
to ne...@googlegroups.com

Hi Michael,

According to the Neo4j documentation for "acquireWriteLock", any lock still held that was not released manually gets released when the transaction finishes.

http://api.neo4j.org/current/org/neo4j/graphdb/Transaction.html

Please let me know if there is any gap in my understanding. Also, today, while load testing Neo4j with the functionality that uses the REST batch operation (no Gremlin/Groovy), Neo4j hung again. The thread dump for this case is attached to my previous email.

Please suggest.

-Khushbu

Mattias Persson

Apr 28, 2013, 3:09:21 PM
to Neo4j Development
Yup, acquiring a write lock on the transaction with acquireWriteLock requires you to do one of the following:

1) Release the returned lock manually, or
2) Make sure tx.finish() is called

If you find yourself doing anything other than that, there might be locks/transactions kept alive in the database that prevent other transactions from acquiring locks on those same resources.
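The second option above only works if tx.finish() runs on every path. A toy sketch of that structure: ToyTx is hypothetical and not the Neo4j API, with a ReentrantLock standing in for a write lock. The transaction object is created before the try, and finish() in finally releases whatever locks it still holds, whether the work succeeded or threw.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Toy model, not the Neo4j API: ToyTx hands out write locks and releases
// every lock it still holds when finish() runs.
class ToyTx {
    private final List<ReentrantLock> held = new ArrayList<>();
    private boolean markedSuccess;
    private boolean committed;

    void acquireWriteLock(ReentrantLock lock) {
        lock.lock();
        held.add(lock);
    }

    void success() { markedSuccess = true; }

    void failure() { markedSuccess = false; }

    // "Commits" if success() was the last mark, then releases all held locks.
    void finish() {
        committed = markedSuccess;
        for (int i = held.size() - 1; i >= 0; i--) {
            held.get(i).unlock();
        }
        held.clear();
    }

    boolean wasCommitted() { return committed; }

    // Transaction created before the try, success/failure inside it,
    // finish() in finally so it runs on the success and the error path.
    static boolean runUpdate(ReentrantLock nodeLock, boolean simulateError) {
        ToyTx tx = new ToyTx();
        try {
            tx.acquireWriteLock(nodeLock);
            if (simulateError) {
                throw new IllegalStateException("simulated error");
            }
            tx.success();
        } catch (RuntimeException e) {
            tx.failure();
            throw e; // rethrow, as the script does
        } finally {
            tx.finish();
        }
        return tx.wasCommitted();
    }
}
```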


Mattias Persson, [mat...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com

Michael Hunger

Apr 28, 2013, 6:36:20 PM
to ne...@googlegroups.com
So please go over your groovy/gremlin code and make sure that either tx.finish() is called in finally blocks everywhere and/or the lock is released.

Are you sure your code is working correctly?

tx is not declared before the try, so it shouldn't be accessible in the finally.

userLockObj is declared in this block, so I doubt that it will be accessible in the place where you check against it.

if(!OutEndNode.hasProperty('Recepients'))
{
Lock userLockObj = tx.acquireWriteLock(OutEndNode);
};

if(!userLockObj.equals(null))
{
userLockObj.release();
};
Michael
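A sketch of the scoping fix Michael points at, in Java with ReentrantLock standing in for Neo4j's write locks (ConditionalLock is hypothetical): the optional lock variable is declared once before the try, only assigned inside the if, and released in finally; the acquisition order of the script (user node, then message node) is kept. As far as we understand Groovy script semantics, undeclared variables such as tx and userLockObj live in the script binding, which would explain why tx is still visible in the finally block without an exception, while the inner `Lock userLockObj = ...` is a separate local that disappears at its closing brace (its lock would still be released by tx.finish(), per Mattias's reply).

```java
import java.util.concurrent.locks.ReentrantLock;

class ConditionalLock {

    // Takes userLock only when needsUserLock is true (like the 'Recepients'
    // check), then msgLock, and always releases whatever it took.
    static void update(ReentrantLock msgLock, ReentrantLock userLock,
                       boolean needsUserLock, Runnable changes) {
        ReentrantLock acquiredUserLock = null; // declared once, before the try
        try {
            if (needsUserLock) {
                userLock.lock();
                acquiredUserLock = userLock; // assignment, not a new declaration
            }
            msgLock.lock();
            try {
                changes.run(); // the relationship updates would happen here
            } finally {
                msgLock.unlock();
            }
        } finally {
            if (acquiredUserLock != null) {
                acquiredUserLock.unlock(); // runs even if changes.run() threw
            }
        }
    }
}
```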

Khushbu Bhatewara

Apr 29, 2013, 1:49:27 AM
to ne...@googlegroups.com
Hi Michael,

Thanks for the response. Yes, the transaction is declared inside the try block and thus should not be accessible in the finally block. But no exception is raised while executing the query. Shouldn't it raise an exception when the "tx" variable is not declared? Or are exceptions from the finally block not raised?

-Khushbu