Hi all,
I'm using Neo4j 1.8.2, EmbeddedGraphDatabase and the native Java API.
I have been testing how my service behaves under high load by making concurrent requests, each of which creates a user node and an INVITED_BY relationship to another user node, and then deletes them both. I get a lot of DeadlockDetectedException errors and I'm looking for a way to avoid them.
I made a simple test case that demonstrates the problem:
PersonService.java
-------------------------------------------------------------------------
package org.neo4j.test.deadlock;

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.index.Index;
import org.neo4j.test.ImpermanentGraphDatabase;

public class PersonService {
    private static final String USERNAME_KEY = "username";
    private GraphDatabaseService graphDb;
    private Index<Node> userIndex;

    public void setupGraphDb() {
        graphDb = new ImpermanentGraphDatabase();
        Runtime.getRuntime().addShutdownHook(new Thread() {
            @Override
            public void run() {
                graphDb.shutdown();
            }
        });
        userIndex = graphDb.index().forNodes("user");
    }

    public Node getUser(String username) {
        return userIndex.get(USERNAME_KEY, username).getSingle();
    }

    public void saveUser(String username, String invitedBy) {
        Transaction tx = graphDb.beginTx();
        try {
            Node user = graphDb.createNode();
            user.setProperty(USERNAME_KEY, username);
            userIndex.add(user, USERNAME_KEY, username);
            if (invitedBy != null) {
                user.createRelationshipTo(getUser(invitedBy), RelTypes.INVITED);
            }
            tx.success();
        } finally {
            tx.finish();
        }
    }

    public void deleteUser(String username) {
        Transaction tx = graphDb.beginTx();
        try {
            Node user = getUser(username);
            userIndex.remove(user);
            for (Relationship rel : user.getRelationships()) {
                rel.delete();
            }
            user.delete();
            tx.success();
        } finally {
            tx.finish();
        }
    }
}
-------------------------------------------------------------------------
PersonServiceTest.java
-------------------------------------------------------------------------
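package org.neo4j.test.deadlock;

// Imports below assume JUnit 4, Hamcrest and SLF4J.
import static org.hamcrest.CoreMatchers.is;
import static org.junit.Assert.assertThat;

import java.util.HashSet;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

import org.junit.BeforeClass;
import org.junit.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class PersonServiceTest {
    private static PersonService service;
    static final Logger log = LoggerFactory.getLogger(PersonServiceTest.class);

    @BeforeClass
    public static void setup() {
        service = new PersonService();
        service.setupGraphDb();
    }

    @Test
    public void testConcurrentCreateAndDelete() throws InterruptedException, ExecutionException {
        service.saveUser("inviter", null);
        ExecutorService threadPool = Executors.newFixedThreadPool(10);
        // Each task creates a user invited by "inviter", then deletes that user.
        Callable<Boolean> callable = new Callable<Boolean>() {
            public Boolean call() throws Exception {
                String username = UUID.randomUUID().toString();
                service.saveUser(username, "inviter");
                service.deleteUser(username);
                return true;
            }
        };
        Set<Future<Boolean>> futures = new HashSet<Future<Boolean>>();
        for (int i = 0; i < 10; i++) {
            Future<Boolean> future = threadPool.submit(callable);
            futures.add(future);
        }
        threadPool.shutdown();
        threadPool.awaitTermination(60, TimeUnit.SECONDS);
        assertThat(threadPool.isTerminated(), is(true));
        boolean success = true;
        for (Future<Boolean> future : futures) {
            try {
                success = success && future.get();
                log.info("Request executed successfully.");
            } catch (ExecutionException e) {
                success = false;
                log.error("ExecutionException: ", e);
            }
        }
        assertThat(success, is(true));
    }
}
-------------------------------------------------------------------------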
The test project is attached.
The exception that I get looks like this:
A deadlock scenario has been detected and avoided. This means that two or more transactions, which were holding locks, were wanting to await locks held by one another, which would have resulted in a deadlock between these transactions. This exception was thrown instead of ending up in that deadlock.
Details: 'Transaction(14)[STATUS_ACTIVE,Resources=2] can't wait on resource RWLock[Relationship[7]] since => Transaction(14)[STATUS_ACTIVE,Resources=2] <-[:HELD_BY]- RWLock[Node[1]] <-[:WAITING_FOR]- Transaction(15)[STATUS_ACTIVE,Resources=1] <-[:HELD_BY]- RWLock[Relationship[7]]'.
at org.neo4j.kernel.impl.transaction.RagManager.checkWaitOnRecursive(RagManager.java:218) ~[neo4j-kernel-1.8.2.jar:1.8.2]
at org.neo4j.kernel.impl.transaction.RagManager.checkWaitOnRecursive(RagManager.java:246) ~[neo4j-kernel-1.8.2.jar:1.8.2]
at org.neo4j.kernel.impl.transaction.RagManager.checkWaitOn(RagManager.java:185) ~[neo4j-kernel-1.8.2.jar:1.8.2]
at org.neo4j.kernel.impl.transaction.RWLock.acquireWriteLock(RWLock.java:349) ~[neo4j-kernel-1.8.2.jar:1.8.2]
at org.neo4j.kernel.impl.transaction.LockManager.getWriteLock(LockManager.java:164) ~[neo4j-kernel-1.8.2.jar:1.8.2]
at org.neo4j.kernel.impl.transaction.LockManager.getWriteLock(LockManager.java:130) ~[neo4j-kernel-1.8.2.jar:1.8.2]
at org.neo4j.kernel.impl.nioneo.xa.WriteTransaction.getWriteLock(WriteTransaction.java:875) ~[neo4j-kernel-1.8.2.jar:1.8.2]
at org.neo4j.kernel.impl.nioneo.xa.WriteTransaction.disconnectRelationship(WriteTransaction.java:814) ~[neo4j-kernel-1.8.2.jar:1.8.2]
at org.neo4j.kernel.impl.nioneo.xa.WriteTransaction.relDelete(WriteTransaction.java:694) ~[neo4j-kernel-1.8.2.jar:1.8.2]
at org.neo4j.kernel.impl.persistence.PersistenceManager.relDelete(PersistenceManager.java:166) ~[neo4j-kernel-1.8.2.jar:1.8.2]
at org.neo4j.kernel.impl.core.NodeManager.deleteRelationship(NodeManager.java:1005) ~[neo4j-kernel-1.8.2.jar:1.8.2]
at org.neo4j.kernel.impl.core.RelationshipImpl.delete(RelationshipImpl.java:145) ~[neo4j-kernel-1.8.2.jar:1.8.2]
at org.neo4j.kernel.impl.core.RelationshipProxy.delete(RelationshipProxy.java:62) ~[neo4j-kernel-1.8.2.jar:1.8.2]
at org.neo4j.test.deadlock.PersonService.deleteUser(PersonService.java:56) ~[classes/:na]
at org.neo4j.test.deadlock.PersonServiceTest$1.call(PersonServiceTest.java:44) ~[test-classes/:na]
at org.neo4j.test.deadlock.PersonServiceTest$1.call(PersonServiceTest.java:39) ~[test-classes/:na]
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) ~[na:1.6.0_45]
at java.util.concurrent.FutureTask.run(FutureTask.java:138) ~[na:1.6.0_45]
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) ~[na:1.6.0_45]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) ~[na:1.6.0_45]
at java.lang.Thread.run(Thread.java:662) ~[na:1.6.0_45]
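Read literally, the Details line says that Transaction(14) holds the lock on Node[1] and wants Relationship[7], while Transaction(15) holds Relationship[7] and is waiting for Node[1]. The wait cycle itself can be sketched with plain JDK locks. This is only an illustration of two parties taking the same two locks in opposite order: the class and method names are made up, it is not Neo4j's lock manager, and it does not explain why the two transactions take the locks in that order.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Illustration only: plain JDK locks standing in for Neo4j's RWLock objects.
class LockOrderSketch {

    /** Returns true when each thread found its second lock held by the other. */
    static boolean reproduceCycle() {
        final ReentrantLock node1 = new ReentrantLock();         // stand-in for RWLock[Node[1]]
        final ReentrantLock relationship7 = new ReentrantLock(); // stand-in for RWLock[Relationship[7]]
        final CountDownLatch bothHoldFirst = new CountDownLatch(2);
        final CountDownLatch bothTried = new CountDownLatch(2);
        final boolean[] gotSecond = new boolean[2];

        // "Transaction 14": holds the node, then wants the relationship.
        Thread tx14 = worker(node1, relationship7, 0, gotSecond, bothHoldFirst, bothTried);
        // "Transaction 15": holds the relationship, then wants the node.
        Thread tx15 = worker(relationship7, node1, 1, gotSecond, bothHoldFirst, bothTried);
        tx14.start();
        tx15.start();
        try {
            tx14.join();
            tx15.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !gotSecond[0] && !gotSecond[1];
    }

    private static Thread worker(final ReentrantLock first, final ReentrantLock second,
                                 final int slot, final boolean[] gotSecond,
                                 final CountDownLatch bothHoldFirst,
                                 final CountDownLatch bothTried) {
        return new Thread(new Runnable() {
            public void run() {
                first.lock();
                try {
                    // Wait until both threads hold their first lock.
                    bothHoldFirst.countDown();
                    bothHoldFirst.await();
                    // Non-blocking attempt, so the sketch ends instead of hanging.
                    gotSecond[slot] = second.tryLock();
                    if (gotSecond[slot]) {
                        second.unlock();
                    }
                    // Keep the first lock until the other thread has tried too.
                    bothTried.countDown();
                    bothTried.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    first.unlock();
                }
            }
        });
    }

    public static void main(String[] args) {
        System.out.println("cycle reproduced: " + reproduceCycle());
    }
}
```

The sketch uses tryLock() where a real transaction would block. According to the exception message, that blocking wait is the point at which Neo4j throws DeadlockDetectedException instead of letting both transactions hang.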
I've read the documentation, but I don't see why a deadlock would happen in my test case. The only shared resource that needs to be locked is the inviting user node. Why would two threads lock the same relationship, causing a deadlock?
Any help understanding the problem and possible workarounds would be greatly appreciated.
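One workaround that might apply here is to re-run the whole unit of work when the exception is thrown. A minimal sketch of such a retry helper follows, using only the JDK. RetrySketch, Work and retry are hypothetical names, not part of the Neo4j API, and the attempt count and back-off are arbitrary. With Neo4j, the exception class passed in would be DeadlockDetectedException.

```java
// Hypothetical helper: retries a unit of work when a given runtime exception is thrown.
class RetrySketch {

    /** Hypothetical callback type; each call should run one complete transaction. */
    interface Work<T> {
        T run();
    }

    static <T> T retry(Class<? extends RuntimeException> retryOn, int maxAttempts, Work<T> work) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return work.run();
            } catch (RuntimeException e) {
                if (!retryOn.isInstance(e)) {
                    throw e; // not the exception we retry on
                }
                last = e;
                if (attempt < maxAttempts) {
                    pause(10L * attempt); // short linear back-off before the next attempt
                }
            }
        }
        throw last; // every attempt failed with the retryable exception
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        // Simulated work that fails twice before succeeding.
        String result = retry(IllegalStateException.class, 5, new Work<String>() {
            public String run() {
                if (++calls[0] < 3) {
                    throw new IllegalStateException("simulated deadlock");
                }
                return "done";
            }
        });
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

In the test case above, the body of Work.run() would call service.deleteUser(username) or service.saveUser(...), so that each attempt begins and finishes its own transaction. This only works around the exception; it does not explain why the deadlock occurs.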
Best Regards,
Dorin