OrientDB server frequently unreachable - Urgent

Ram Karthik

Aug 21, 2019, 8:57:17 AM
to OrientDB
We are using OrientDB version 2.0.18, and for the past 5 days we have been facing a critical issue. These are the problems we are seeing:
  1. The OrientDB server is frequently unreachable.
  2. We cannot shut down the server; we are forced to kill the DB process.
  3. Sometimes an index crashes (becomes corrupted).
The above issues occur when we open our application to user traffic.

This is a very critical issue: many users are unable to use the application because of it. Our product depends on OrientDB, so this is causing us serious problems.

Please help us resolve this issue soon.

Thanks,
Ram

Luigi Dell'Aquila

Aug 21, 2019, 9:28:17 AM
to orient-...@googlegroups.com
Hi Ram

It's hard to give you a quick solution with so little information.
v2.0 is EOL, so we are unlikely to release a community patch, but we can try to troubleshoot the problem and see if we can work around it.
Can you provide a bit more information, e.g. server logs, typical workload, and some information about DB size and schema?

Thanks

Luigi


--

---
You received this message because you are subscribed to the Google Groups "OrientDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to orient-databa...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/orient-database/80c7ea2d-4214-432a-9679-5e09a4a1cc99%40googlegroups.com.

Ram Karthik

Aug 21, 2019, 12:35:11 PM
to orient-...@googlegroups.com
To add more details on the problem:

We have 500+ queries/APIs built on a schema of 200+ vertex classes. All the queries were written with the right indexes, so that under normal conditions every query responds in under 50 ms; most respond in less than 20 ms. Please refer to the screenshot shared earlier.

All of a sudden, one of these queries freezes in the database indefinitely, and all subsequent queries fired from the application also freeze indefinitely. Concurrent connections to the database pile up, with none of the queries responding, until the database-level maximum connection limit is reached and the database stops accepting new connections. Meanwhile, CPU and memory on the database server remain stable; there is only a very slight CPU increase (due to the high number of concurrent connections). This suggests the queries are not executing in the database but are waiting for a resource or lock.
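
A rough way to see why one stuck query can exhaust the connection limit so quickly is Little's law: requests in flight = arrival rate times the time each request spends in the system. The sketch below is only a back-of-the-envelope model under stated assumptions: arrivals stay at the ~50 TPS reported elsewhere in this thread regardless of latency, and maxConnections = 1000 is a hypothetical limit, not an OrientDB default or the actual setting on this server.

```java
// Back-of-the-envelope model of the connection pile-up (a sketch, not OrientDB code).
// Assumptions: arrivals stay at ~50 requests/s regardless of latency, and the
// connection limit used in main() is hypothetical, chosen only for illustration.
final class PileUp {
    // Little's law: average requests in flight = arrival rate * time each spends in the system.
    static double inFlight(double arrivalsPerSecond, double latencySeconds) {
        return arrivalsPerSecond * latencySeconds;
    }

    // If no request ever completes, every arrival keeps its connection open, so the
    // open-connection count grows linearly and reaches the limit after limit / rate seconds.
    static double secondsUntilLimit(int maxConnections, double arrivalsPerSecond) {
        return maxConnections / arrivalsPerSecond;
    }

    public static void main(String[] args) {
        double rate = 50.0;         // ~50 TPS, the workload reported in this thread
        int maxConnections = 1000;  // hypothetical limit, not an OrientDB default
        System.out.printf("healthy, 20 ms per query: %.1f queries in flight%n",
                inFlight(rate, 0.020));
        System.out.printf("frozen, nothing completes: limit reached after %.0f s%n",
                secondsUntilLimit(maxConnections, rate));
    }
}
```

With 20 ms queries the model gives about one query in flight on average; once nothing completes, open connections grow by about 50 per second, so the hypothetical 1000-connection limit would be hit in about 20 seconds. That fits a sudden, total stall rather than a gradual slowdown.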

To bring the server back to normal, we have to stop the database (killing the connections) and start it again. This happens very frequently, and sometimes an index crashes during the restart, so we have to restore the database from backup.

We log every query that is executed. After restarting the server, we re-ran the frozen queries (same queries with the same parameters); they executed normally with the usual latency (10-20 ms). We also tried the first query that froze and a random sample from the whole set of frozen queries; all executed as expected.

Once the database enters this frozen state, even a simple query that should fetch a single record by primary ID freezes. We have no idea why the database suddenly freezes.

We have been using OrientDB for the last 5 years and have never faced such a situation.

We tried passing a timeout (5000 ms) with every read query, and we lowered record.locktimeout, various network-level timeouts, the session timeout, the connection timeout, etc. None of this helped: the queries do not time out. The connection breaks and the application gets a SocketTimeoutException, but the connection/query seems to stay frozen (locked) on the database side, and new connections are not accepted.
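
The client getting a SocketTimeoutException while the query stays stuck on the server is what one would expect if the deadline is enforced only on the waiting side. The Java sketch below is an analogy, not OrientDB internals, and does not confirm what 2.0.18 actually does: a hypothetical worker thread waits on a ReentrantLock that is never released, the caller gives up via Future.get with a timeout and calls cancel(true), and the worker is still parked afterwards because ReentrantLock.lock() does not respond to interrupts.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;

// Analogy only (hypothetical, not OrientDB internals): a "query" waits on a lock that is
// never released. The caller's deadline fires, but the worker is still parked afterwards.
final class ClientTimeoutDemo {
    static boolean callerTimedOut;
    static Thread.State workerStateAfterTimeout;

    static void run() {
        ReentrantLock resource = new ReentrantLock();
        resource.lock(); // the stuck "first query": holds the lock and never releases it

        Thread[] worker = new Thread[1];
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "query-worker");
            t.setDaemon(true); // lets the JVM exit even though this thread never finishes
            worker[0] = t;
            return t;
        });
        Future<?> query = pool.submit(() -> {
            resource.lock(); // lock() keeps waiting even if the thread is interrupted
            resource.unlock();
        });
        try {
            query.get(200, TimeUnit.MILLISECONDS); // client-side deadline
            callerTimedOut = false;
        } catch (TimeoutException e) {
            callerTimedOut = true;
            query.cancel(true); // interrupts the worker; lock() ignores the interrupt
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
        try {
            Thread.sleep(100); // give the worker time to react to the interrupt, if it could
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        workerStateAfterTimeout = worker[0].getState(); // still WAITING: the work was never cancelled
        pool.shutdownNow();
    }

    public static void main(String[] args) {
        run();
        System.out.println("caller timed out: " + callerTimedOut);
        System.out.println("worker state after timeout: " + workerStateAfterTimeout);
    }
}
```

The design point is that a client-side timeout only bounds how long the caller waits; it frees server-side resources only if the server's own wait honours a deadline or an interrupt.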

We tried to kill the connection using the "kill" and "interrupt" commands; both failed. The command just hangs, waiting for the server to respond about the first connection.
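
One possible reason "kill"/"interrupt" can hang too: in Java, an interrupt only frees a thread that is waiting in an interruptible call. A thread blocked entering a synchronized block, or in ReentrantLock.lock(), keeps waiting, whereas ReentrantLock.lockInterruptibly() throws InterruptedException. The sketch below (hypothetical, not OrientDB code) contrasts the two; whether the frozen OrientDB threads are in such a wait is something only a thread dump of the stuck server could show.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration (not OrientDB internals): whether an interrupt can free a
// stuck operation depends on whether that operation waits interruptibly.
final class InterruptDemo {
    // Returns true if a thread waiting on a held lock gives up after Thread.interrupt().
    static boolean interruptFreesWaiter(boolean interruptibleWait) {
        ReentrantLock lock = new ReentrantLock();
        lock.lock(); // held by the "stuck" operation and never released in this demo
        AtomicBoolean gaveUp = new AtomicBoolean(false);
        Thread waiter = new Thread(() -> {
            try {
                if (interruptibleWait) {
                    lock.lockInterruptibly(); // throws InterruptedException when interrupted
                } else {
                    lock.lock();              // ignores interrupts and keeps waiting
                }
                lock.unlock();
            } catch (InterruptedException e) {
                gaveUp.set(true);
            }
        }, "waiter");
        waiter.setDaemon(true); // an uninterruptible waiter never ends; don't keep the JVM alive
        waiter.start();
        try {
            Thread.sleep(100);  // let the waiter block on the lock
            waiter.interrupt(); // analogous to an administrative "interrupt"
            waiter.join(500);   // give it a moment to exit
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return gaveUp.get() && !waiter.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("lockInterruptibly(): freed = " + interruptFreesWaiter(true));
        System.out.println("lock():              freed = " + interruptFreesWaiter(false));
    }
}
```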


As a last resort, we are currently rebuilding all indexes for the entire database in one go.

We are a startup and built our entire product on OrientDB. Because of this issue, our service has been down for the last 5 days; we are losing our customers' trust and are in a serious crisis.

Please help us identify the root cause and overcome the issue.

Regards,
Ram

On Wed, Aug 21, 2019 at 7:36 PM Ram Karthik <ramkart...@gmail.com> wrote:
Hi Luigi,

Thanks for your reply.

Database size: 40 GB

Typical workload: 50 TPS

I have added the server logs and a sample schema below.
We have 200+ schemas.

Thanks in advance.

Luigi Dell'Aquila

Aug 22, 2019, 3:16:22 AM
to orient-...@googlegroups.com
Hi Ram,

Thank you very much for the detailed information.
The only strange thing I see in the logs is the following:

--> com.orientechnologies.orient.core.exception.OCommandExecutionException: Class 'NULL' was not found in current database [ONetworkProtocolHttpDb]
2019-08-21 12:03:39:675 SEVERE Internal server error:
com.orientechnologies.orient.core.exception.OQueryParsingException: Error on parsing query at position #5: Error on parsing query
Query:  null timeout 5000
----------^


but it looks more like a malformed query than a server problem, so it is unlikely to be the cause of your issue.
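
On the application side, the log line "Query:  null timeout 5000" suggests that a null query string had a timeout clause appended to it: in Java, null + " timeout 5000" silently produces the text "null timeout 5000". A minimal guard, assuming the application builds its queries in Java and appends the same TIMEOUT clause Ram described; withTimeout is a hypothetical helper, not part of the OrientDB API, and the sample query's class and field are made up.

```java
import java.util.Objects;

// Hypothetical helper (not from the original application or the OrientDB API):
// builds "<query> TIMEOUT <ms>" but refuses null or blank query text instead of
// sending "null TIMEOUT 5000" to the server.
final class QueryTimeout {
    static String withTimeout(String sql, long timeoutMs) {
        Objects.requireNonNull(sql, "query text is null");
        String trimmed = sql.trim();
        if (trimmed.isEmpty()) {
            throw new IllegalArgumentException("query text is blank");
        }
        if (timeoutMs <= 0) {
            throw new IllegalArgumentException("timeout must be positive: " + timeoutMs);
        }
        return trimmed + " TIMEOUT " + timeoutMs;
    }

    public static void main(String[] args) {
        String missing = null;
        System.out.println(missing + " timeout 5000"); // the bug: prints "null timeout 5000"
        // Sample query; the class and field names are hypothetical.
        System.out.println(withTimeout("SELECT FROM Customer WHERE id = 1", 5000));
        try {
            withTimeout(missing, 5000);
        } catch (NullPointerException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```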

There is one thing that could give us more information on what's actually happening: could you please take a thread dump when the server is stuck?
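
For reference, the usual way to take a thread dump of a running JVM such as the OrientDB server is the JDK's jstack tool (jstack -l <pid>), or kill -3 <pid> on Unix, which makes the JVM print the dump to its standard output. The sketch below gathers similar information in-process through the standard java.lang.management API, for example from a hypothetical diagnostic hook in the application. In a dump of a frozen server, the interesting threads are the ones in BLOCKED or WAITING state and the owner of the lock they are waiting on.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Minimal in-process thread dump using java.lang.management (standard JDK API).
// For each thread it prints the state, the lock it is waiting on, that lock's owner,
// and the stack, then reports whether the JVM detects a deadlock.
final class ThreadDump {
    static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        boolean monitors = mx.isObjectMonitorUsageSupported();
        boolean synchronizers = mx.isSynchronizerUsageSupported();
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : mx.dumpAllThreads(monitors, synchronizers)) {
            if (info == null) continue; // defensive: skip threads that vanished mid-dump
            out.append('"').append(info.getThreadName()).append("\" ").append(info.getThreadState());
            if (info.getLockName() != null) {
                out.append(" waiting on ").append(info.getLockName());
            }
            if (info.getLockOwnerName() != null) {
                out.append(" held by \"").append(info.getLockOwnerName()).append('"');
            }
            out.append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        // findDeadlockedThreads() also covers ReentrantLock-style synchronizers; both return null if none.
        long[] deadlocked = synchronizers ? mx.findDeadlockedThreads() : mx.findMonitorDeadlockedThreads();
        out.append(deadlocked == null ? "no deadlock detected" : deadlocked.length + " deadlocked thread(s)")
           .append('\n');
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Note that a stall caused by threads waiting on a lock whose holder is itself stuck (rather than a lock cycle) will not be reported as a deadlock; the "held by" owner chain in the dump is what points at the culprit in that case.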

Thanks

Luigi
