Delete one vertex freezes OrientDB for all users


David Carr

Aug 26, 2015, 6:32:36 PM
to OrientDB
I am doing data cleanup on OrientDB 2.1.0.

I issued the console command "delete vertex #21:121048", and for the next 60 minutes every other client connection (binary protocol, in our case) received this message:

ERROR socket_read(): unable to read from socket [10060]: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.

When the delete completed, all clients could connect normally again. The OrientDB server log files repeatedly contain:

java.net.SocketException: Broken pipe
Error during WAL background flush 
java.lang.OutOfMemoryError: Java heap space

Taking down an entire DB host just to clean up one vertex is a real problem. I'm not sure how many edges it had, but presumably there must have been a lot to cause a problem this big. This will make scheduling periodic maintenance a real burden for IT and a hardship for end users.

Anybody else out there seeing this type of system behavior?

David Carr

Aug 26, 2015, 7:42:24 PM
to OrientDB
The server has been down for 90 minutes now while this delete continues to run. I never know whether to wait it out or kill -9 the server process and deal with data corruption (which usually means restoring from backup). I will wait a bit longer and see what happens.

Andrey Lomakin

Aug 27, 2015, 3:03:56 AM
to OrientDB
Hi,
Could you send:

1. Server log.
2. Heap dump. It is created in the server installation directory once the OOM error is thrown.

--

---
You received this message because you are subscribed to the Google Groups "OrientDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to orient-databa...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

David Carr

Aug 27, 2015, 8:34:52 AM
to orient-...@googlegroups.com
Update: The delete ran for more than 6 hours before I manually killed the connection via a POST to http://192.168.1.13:2480/connection/kill/87. Upon further inspection, this vertex had more than 75,000 edges (very unusual for our app), which I suspect is why the server ran out of memory.
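
For anyone who wants to script that kill step, here is a minimal Python sketch of the POST described above. The host, port and connection ID are the ones from this post. The use of HTTP Basic authentication, the "root"/"secret" credentials and the function names are assumptions for illustration only; check what your server actually requires. The request is only built here, not sent.

```python
import base64
import urllib.request


def build_kill_request(host, port, connection_id, user, password):
    """Build (but do not send) a POST to /connection/kill/<id>.

    Assumption: the server accepts HTTP Basic auth for this endpoint.
    """
    url = f"http://{host}:{port}/connection/kill/{int(connection_id)}"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(url, data=b"", method="POST")
    req.add_header("Authorization", f"Basic {token}")
    return req


def kill_connection(req, timeout=10):
    """Send the prepared request and return the HTTP status code."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


# Hypothetical credentials; host, port and connection ID come from the post above.
req = build_kill_request("192.168.1.13", 2480, 87, "root", "secret")
print(req.get_method(), req.full_url)
```

Calling kill_connection(req) would actually send it, so only do that once you are sure the connection ID is the right one.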

Sorry, the server log was automatically deleted. If the heap dump file is /usr/bin/orientdb-community-2.1.0/bin/java_pid1357.hprof, then it's just under 2GB.

My takeaways from this experience are:

a) Be very careful when deleting things. If you delete an RID manually, make sure to inspect the item first (number of edges, etc.) to gauge the potential impact before you issue the command. If you allow end users to delete RIDs via a web interface, or have scheduled processes that do it, then you are completely hosed.

b) When a delete query fails because it ran out of memory, OrientDB seems to retry the delete over and over without end (per the server log), which exacerbates the problem. If the delete runs out of memory on the first attempt, why would it succeed on the 500th attempt?

c) Deleting one RID can definitely bring your whole database down for hours, affecting all end users by making their binary connections unusable. No known workaround.

d) If you can isolate the problem and figure out exactly which connection ID is the culprit, then doing a /connection/kill/<id> seems to resolve the situation without having to resort to a kill -9 at the operating system level and restoring the database from backup.
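
As a rough illustration of takeaway (a), here is a Python sketch that only builds SQL strings: one query to count a vertex's edges before deleting it, and a cautious plan if the count is high. The helper names, the 10,000-edge threshold and the batch size are all hypothetical. It is also an untested assumption that removing edges in LIMITed chunks first would avoid the out-of-memory condition seen here on 2.1.0.

```python
import re

# An RID looks like "#<cluster>:<position>", e.g. "#21:121048".
RID_PATTERN = re.compile(r"^#\d+:\d+$")


def check_rid(rid):
    """Reject anything that is not a plain RID before building SQL from it."""
    if not RID_PATTERN.match(rid):
        raise ValueError(f"not a valid RID: {rid!r}")
    return rid


def edge_count_query(rid):
    """SQL that counts all incoming and outgoing edges of one vertex."""
    return f"SELECT bothE().size() AS edges FROM {check_rid(rid)}"


def plan_delete(rid, edge_count, threshold=10000, batch=1000):
    """Return the statements to run, given a previously fetched edge count.

    Assumption: above the (hypothetical) threshold, edges are removed in
    chunks first. The two DELETE EDGE statements are meant to be repeated
    until edge_count_query() reports zero; only then is the vertex deleted.
    """
    check_rid(rid)
    if edge_count <= threshold:
        return [f"DELETE VERTEX {rid}"]
    return [
        f"DELETE EDGE FROM {rid} LIMIT {batch}",
        f"DELETE EDGE TO {rid} LIMIT {batch}",
        f"DELETE VERTEX {rid}",
    ]


print(edge_count_query("#21:121048"))
for statement in plan_delete("#21:121048", 75000):
    print(statement)
```

The point is just to look before you delete: run the count query first, and treat anything in the tens of thousands of edges as a maintenance-window job rather than a casual console command.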






--
David Carr
801.231.4946