OrientDB very slow performance on basic unique key lookups, website painfully slow


Jean-Sebastien Lemay

Apr 29, 2016, 10:56:40 AM
to OrientDB
Hi all,

I'm having severe problems with OrientDB 2.1.16 at the moment. The performance of my website, a social network, has become extremely slow in the past weeks. I've enabled performance checkpoints in my code and it seems even simple unique key lookups are costing a lot of time.
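For readers who want to reproduce this kind of measurement, here is a minimal sketch of a per-step timer. The class name `StepTimer` and the step name are hypothetical, and the lambda is only a stand-in for a real database call; nothing here is taken from the poster's actual checkpoint code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Records how long each named step takes, in milliseconds. Hypothetical helper. */
final class StepTimer {
    private final Map<String, Double> elapsedMillis = new LinkedHashMap<>();

    /** Runs the step, adds its wall-clock duration to the total for that name, and returns its result. */
    <T> T time(String name, Supplier<T> step) {
        long start = System.nanoTime();
        try {
            return step.get();
        } finally {
            double millis = (System.nanoTime() - start) / 1_000_000.0;
            elapsedMillis.merge(name, millis, Double::sum);
        }
    }

    /** Accumulated time per step name, in insertion order. */
    Map<String, Double> report() {
        return elapsedMillis;
    }

    public static void main(String[] args) {
        StepTimer timer = new StepTimer();
        // Stand-in for a database call such as the lookup by accountId.
        timer.time("accountLookup", () -> "account-vertex");
        timer.report().forEach((step, ms) -> System.out.printf("%s: %.3f ms%n", step, ms));
    }
}
```

Timing each lookup and each traversal under its own name makes it easier to tell whether the cost sits in the index lookup itself or in the number of separate calls.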

Basically, all my entities have an 'ID' field of type STRING, each of which is indexed by a UNIQUE_HASH_INDEX. Vertex and edge lookups should be extremely fast! But my logs indicate delays of 10ms to 20ms per single entity. I'm loading a lot of relationships and in the end, my main page takes about 15 seconds to load, which is WAY too long for any webpage.

I've tried rebuilding all my indexes, but no change. I also tried rebooting the server.

The database is hosted on a single node on an AWS server (t2.medium instance).

I don't know if this is relevant, but I also seem to be receiving more ConcurrentModificationExceptions (which I handle by retrying the transaction) than usual. In fact, I've received quite a few while I was the only user on the entire website, which was strange.
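A retry-on-conflict handler of the kind described above might look roughly like the sketch below. `VersionConflictException` is a hypothetical stand-in for the database's concurrent-modification exception, and `TxRetry` is an invented name; the poster's real retry code is not shown in this thread.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for the database's concurrent-modification exception. */
final class VersionConflictException extends RuntimeException {
    VersionConflictException(String message) {
        super(message);
    }
}

final class TxRetry {
    /** Runs the transaction body, retrying on a version conflict up to maxAttempts times. */
    static <T> T withRetry(int maxAttempts, Supplier<T> txBody) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        VersionConflictException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return txBody.get();
            } catch (VersionConflictException e) {
                last = e;
                // Logging every conflict helps show which records collide and how often.
                System.err.println("Conflict on attempt " + attempt + ": " + e.getMessage());
            }
        }
        // All attempts failed: surface the last conflict to the caller.
        throw last;
    }
}
```

Logging each conflict, rather than retrying silently, is one way to find out why conflicts occur even with a single user online.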

I'm also running a demo version of my website, with a lot less data, and it's not as slow. So it seems to get exponentially slower the more data I have, although I really don't have that much data to begin with (about 5,000 vertices and 10,000 edges).

Can you guys please give me some pointers on how I can further debug this? The future of my website depends on it -- I can't go on with this kind of performance, and I barely have any users online to begin with.

Best regards,
Jean

scott molinari

Apr 29, 2016, 11:01:17 AM
to OrientDB
What language driver are you using?

Scott

Jean-Sebastien Lemay

Apr 29, 2016, 11:14:08 AM
to orient-...@googlegroups.com

I am using Java


scott molinari

Apr 30, 2016, 1:48:09 AM
to OrientDB
Can you give us some code examples of queries that are slow?

Scott

Jean-Sebastien Lemay

Apr 30, 2016, 4:01:46 AM
to OrientDB
Sure!

For example, let me give you more details about my structure.
Each user has an "Account" vertex. Each Account vertex is linked to one "Social Profile" vertex. And from that "Social Profile" vertex we have all the edges to the user's friends. Each "Account" is identified by a unique "accountId", which is a STRING (it's actually a UUID string). "accountId" has a UNIQUE_HASH_INDEX.

[Account] --> (1 to 1) --> [Social Profile] --> (1 to many) friends --> [Social Profile]
[Account] --> (1 to 1) --> [Public Profile]

When the page loads, I'm loading the user's Social Profile, and then loading each friend's Public Profile details. Currently this seems to take almost 100ms although my account only has 6 friends!

Here's how I create a connection to the database. First I create a single connection pool:

this.graphFactory = new OrientGraphFactory(
        parsedConfig.connString,
        parsedConfig.username,
        parsedConfig.password)
    .setupPool(
        parsedConfig.minConnPoolSize,
        parsedConfig.maxConnPoolSize);

And then I create a transaction:

this.oTx = this.graphFactory.getTx();

Here's how I get the user's Account vertex:

// Get Account vertex
return (OrientVertex) db.getVertices(
        "Account.accountId", accountId)
    .iterator().next();

And then I traverse to the linked Social Profile vertex (via edge)

// Get linked Social Profile vertex
return (OrientVertex) accountVtx.getVertices(
        Direction.OUT,
        EDGE_ACCOUNT_SOCIAL_PROFILE)
    .iterator().next();

And finally I get all the friends' account IDs and friendship status.

// Retrieve account IDs of all friends, determine friendship status
for (Direction direction
        : new Direction[]{ Direction.IN, Direction.OUT }) {
    // Retrieve the iterator for the current direction
    Iterator<Edge> friendshipEdgeIterator =
        vtx.getEdges(
                direction,
                EDGE_SOCIAL_PROFILE_FRIENDSHIP)
            .iterator();

    // For each result...
    while (friendshipEdgeIterator.hasNext()) {
        Edge friendshipEdge = friendshipEdgeIterator.next();
        // Retrieve other social profile vertex
        Vertex otherSocialProfileVtx = friendshipEdge.getVertex(
            (direction == Direction.IN ?
                Direction.OUT : Direction.IN));
        // Retrieve other account vertex
        Vertex otherAccountVtx = this.traverseVertex(
            otherSocialProfileVtx,
            Direction.IN,
            OrientDbPersistence.EDGE_ACCOUNT_SOCIAL_PROFILE);

        String otherAccountId =
            otherAccountVtx.getProperty(PROP_ACCOUNT_ID);

        // Is the friendship confirmed?
        if (friendshipEdge.getPropertyKeys().contains(PROP_CONFIRMED)
                && (boolean) friendshipEdge.getProperty(PROP_CONFIRMED)) {
            // Yes -- confirmed friendship.
            socialProfile.getFriendshipMap()
                .addConfirmedFriendship(otherAccountId);
        }
        else {
            // No -- pending friendship.
            if (direction == Direction.IN) {
                // Pending incoming friendship
                socialProfile.getFriendshipMap()
                    .addPendingIncomingFriendship(otherAccountId);
            }
            else {
                // Pending outgoing friendship
                socialProfile.getFriendshipMap()
                    .addPendingOutgoingFriendship(otherAccountId);
            }
        }
    }
}

And finally, once I have all these IDs, I load each friend's public profile.
I get their Account vertex (same way as above, by accountId key), and traverse to their Public Profile vertex.

// Get linked Public Profile vertex
return (OrientVertex) accountVtx.getVertices(
        Direction.OUT,
        EDGE_ACCOUNT_PUBLIC_PROFILE)
    .iterator().next();
            
Again, this seems to be taking way too long for a few lookups by key and edge traversals. This is compounded by the fact that my website is a single-page application (SPA), so I'm essentially loading all the user's friends and his/her channels in one go. All the 100ms delays add up to many seconds.

Can you find anything wrong with my implementation?
            

scott molinari

Apr 30, 2016, 8:32:59 AM
to OrientDB
Great explanation. 

I am not at all a Java programmer and I am also learning ODB and graph databases, so I can only suggest what might be a problem or rather point to what I think might be a problem. So, take what I say with a grain of salt.

Is there a particular reason to split the accounts from the two profile types? If the relationships are always direct, I'd not waste edges on connecting them. I'd have the profiles as subdocuments of the account. That will speed the retrieval of the profile data a lot. 

For retrieving data, do you really need a transaction? Transactions are definitely more costly from a performance perspective. I mean, would it matter if a person sees partially old data in your system? I would imagine it isn't too critical. 

Also, from a data querying perspective, constantly hitting the database in a loop is going to be slower than finding a proper query, which gets you the data you need in one shot (in most cases). To be honest, I am not sure that is actually happening, but from the code, that seems like what is happening. Unfortunately, I can't help you on a better query. I am not that good at the Java API. 

Hopefully Luca or Luigi can offer some better tips, and sorry I can't be of more help.

Scott




Jean-Sebastien Lemay

Apr 30, 2016, 8:54:03 AM
to OrientDB
Hi Scott,

Thanks for your input. The reason I'm splitting the accounts in multiple vertices is because my application is actually more complex than I'm portraying here. Each account has multiple profiles for very different aspects of my application, simply because it makes it easier to split the responsibilities instead of having a 'god' vertex which holds all the data. For instance, my users can manage relationships with a social profile, they can also participate in channels, so they have a channel profile where channels they own and moderate and connect to are linked, they can also participate in games, so they have a gaming profile, etc. The account vertex would be HUGE and would be linked by so many edges if I consolidated everything in one place. I figured a single edge traversal from the account to the appropriate profile should not add a ton of overhead in exchange for development simplicity.

For retrieving data, yes I do need a transaction because I cannot afford to read stale data. My application retrieves the 'state' for the user connecting via said transaction, and then registers for changes using a message queue system. As such, any changes to the state afterwards are no longer read from the database but through notifications from the message queue system instead. Can't have that if the initial state I receive is stale.

As for querying the database in a loop, I don't do that unless there's a concurrent access issue that requires a retry. I send the transaction once to retrieve the initial state and everything from there onwards is handled through a messaging layer, not a loop that constantly reads from the database.

I'm really looking forward to hearing from Luca, Luigi or someone else because with this performance I absolutely cannot continue growing using OrientDB, and to be honest I really like the database from a developer's perspective.

Cheers,
Jean

scott molinari

Apr 30, 2016, 3:43:03 PM
to OrientDB
Ok. Sounds like you are right about the profiles. I hope the ODB guys can help. I think I pulled enough of the right information out of you for them to help. Sorry I can't be of more help.

Scott

Luca Garulli

May 2, 2016, 2:40:12 AM
to OrientDB
You are using the remote protocol, right? In that case all your filtering happens on the client side, and you pay a lot in latency.

WDYT about executing a query/command that does all the work on the server side?

Start with this query executed in graph.command( new OCommandSQL("<query>") );

SELECT friendship.confirmed, inV( friendship ) FROM (
  SELECT out('social').outE('friend') as friendship FROM Account WHERE uuid = 'blablablablablablablablabla'
)


Best Regards,

Luca Garulli
Founder & CEO



Jean-Sebastien Lemay

May 2, 2016, 3:08:51 AM
to OrientDB
If client-side filtering is the issue, then I have another problem. I'll need to rewrite my entire back-end, and a lot of my operations have a lot of conditions, which makes it almost impossible.

For example, consider this operation: a user connects to a channel.
1) Register a new connection edge between user and channel
2) If number of connections between user and channel jumped from 0 to 1, user is now part of channel. If so, create an active user edge between user and channel as well.
3) If user became an active user and user is a moderator of the channel, the channel is now considered live (channel vertex update)
etc.

A lot of 'IFs' which are better suited to Java code than SQL code.
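To make the three rules above concrete, here is a minimal sketch that expresses them as a pure decision function, separate from any database call. The names `ChannelConnectRules`, `Outcome` and `onConnect` are hypothetical and do not come from the poster's code base; step 1 (registering the connection edge) is assumed to happen unconditionally and is therefore not modeled.

```java
final class ChannelConnectRules {
    /** What else has to happen after a user opens one more connection to a channel. */
    static final class Outcome {
        final boolean createActiveUserEdge;
        final boolean markChannelLive;

        Outcome(boolean createActiveUserEdge, boolean markChannelLive) {
            this.createActiveUserEdge = createActiveUserEdge;
            this.markChannelLive = markChannelLive;
        }
    }

    /**
     * @param connectionsBefore number of user-to-channel connections before this one was added
     * @param userIsModerator   whether the user moderates the channel
     */
    static Outcome onConnect(int connectionsBefore, boolean userIsModerator) {
        // Rule 2: the user becomes an active user when the count goes from 0 to 1.
        boolean becameActive = connectionsBefore == 0;
        // Rule 3: a moderator who just became active makes the channel live.
        boolean channelLive = becameActive && userIsModerator;
        return new Outcome(becameActive, channelLive);
    }
}
```

Keeping the conditions in a function like this means the database layer only needs to supply two inputs and apply two outputs, whichever side of the network the function ends up running on.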

What would you suggest for this situation?

Best regards,
Jean

Emrul Islam

May 2, 2016, 6:14:55 AM
to OrientDB
Have you looked at using server-side functions:


You can write them in Java and they execute within the OrientDB server.

Also consider using hooks (even more performant): http://orientdb.com/docs/last/Java-Hooks.html

Luca Garulli

May 2, 2016, 7:11:26 AM
to OrientDB
Jean,
WDYT about connecting to OrientDB in plocal mode?

Best Regards,

Luca Garulli
Founder & CEO


Jean-Sebastien Lemay

May 2, 2016, 7:26:52 AM
to OrientDB
Hi Luca, the database isn't on the same server as the web server, and will never be, so unfortunately I'm afraid I can't use plocal mode.

And Emrul, I guess I will indeed have to look at server-side functions. But that still implies a somewhat tedious rewrite of my application, since currently my logic is in my Java code and I'm interacting with OrientVertex/OrientEdge objects and making multiple calls in separate modules (each module is responsible for a specific type of entity for increased code coherence). Now I'd have to rewrite the logic in JavaScript and reconsider my current architecture.

I'm gonna have to rewrite whether it's server-side functions or if I switch to a different database, though. I'm just curious as to why it's taking so long to perform single-record fetching based on a unique key, possibly the most fundamental operation a database should be expected to perform quickly.

Best regards,
Jean

Luca Garulli

May 2, 2016, 8:25:30 AM
to OrientDB
Hi Jean,
The reason is that in that piece of code you do a lot of operations:
  1. lookup of vertex
  2. crossing edges
  3. loading all the edges, one at a time
  4. crossing to the vertices
  5. loading connected vertices, one at a time
Over the network, all these operations become expensive because of the latency.
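The effect of the steps listed above can be estimated with simple arithmetic. The sketch below is a rough model only: the number of round trips per step and the per-trip latency are assumptions made for illustration (client-side caching or fetch plans could lower the real count), and `RoundTripEstimate` is a hypothetical name.

```java
final class RoundTripEstimate {
    /** Assumed: one account lookup by key plus one hop to the social profile. */
    static final int FIXED_TRIPS = 2;

    /**
     * Assumed per friend: load the friendship edge, load the other social profile,
     * load its account, look the account up again by id, and hop to the public profile.
     */
    static final int TRIPS_PER_FRIEND = 5;

    /** Number of client-to-server round trips for a user with the given number of friends. */
    static int roundTrips(int friends) {
        return FIXED_TRIPS + TRIPS_PER_FRIEND * friends;
    }

    /** Total latency if every round trip costs millisPerTrip. */
    static double estimateMillis(int friends, double millisPerTrip) {
        return roundTrips(friends) * millisPerTrip;
    }

    public static void main(String[] args) {
        System.out.println(roundTrips(6) + " round trips for 6 friends");
        System.out.println(estimateMillis(6, 3.0) + " ms at an assumed 3 ms per trip");
    }
}
```

Under these assumptions the cost grows linearly with the number of friends, while a single server-side query would stay close to one round trip regardless of the friend count.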

So if you can't move the AppServer next to the database (but remember you can still run distributed even with plocal...), my suggestion is to implement a server-side JS function that returns what you need.

Or you could create a Server Plugin or better, Server Commands written in Java that talk with your app by using HTTP/JSON calls.



Best Regards,

Luca Garulli
Founder & CEO


Jean-Sebastien Lemay

May 2, 2016, 10:46:17 AM
to OrientDB
So if I understand correctly, the reason it's slow is because there is a back-and-forth communication between my code and the server at every step of the way? I'm not sure if that's true, because my demo website (with a lot less data) responds much faster, even though the network latency should be the same (also hosted on AWS, database on separate server, etc.) Why would my demo website, using the exact same code, respond faster?

Also, I couldn't find anything about Server Commands in the doc? Can you give me a link?

And as for the complexity of my operations, well that's inevitable, that's why transactions are important. I'm building a social network with a lot of features so I cannot further simplify my operations to be honest.

Best regards,
Jean

pieter-gmail

May 2, 2016, 4:02:04 PM
to orient-...@googlegroups.com
Hi,

I have this problem all the time. Not on OrientDB at present, but it's
inevitable when the logic is far away from the data. So I'd say the
problem is indeed your architecture. The logic should be on top of the
db, and that's one of the benefits of Java dbs. When running embedded it
outperforms any server-based architecture. So move your logic to the db
server and have a coarser-grained interaction with the UI.

If you cannot do that, then you have to give up on the power of Java
(graph, vertex, edge) based interaction, as that is by nature very
fine-grained, with latency being a killer. Giving up means you'll have to
construct SQL queries to lift as much data as possible with as few
round trips as possible. It's a pain and kills the joy of development for
me, so I'd say reconsider your architecture to have the webserver and db
collocated.

Cheers

Pieter




Jean-Sebastien Lemay

May 10, 2016, 1:31:06 PM
to orient-...@googlegroups.com
Thanks for the replies everyone. Ultimately what I need to do is come to terms with the reality that I need to rethink my data layer architecture.

As such, can someone tell me how to use OrientGraph on a unique key index with multiple values to get vertices in a SINGLE round-trip request, including vertices created within this transaction (as opposed to running a server-side SQL query)? My code below doesn't work since it seems to be taking my set of values as a single parameter (as if my index was on an EMBEDDEDLIST property instead of a STRING property)

// Input parameters
Set<String> channelIds = [set of channel IDs];

// Retrieve vertices
Iterator<Vertex> channelVtxIterator = db.getVertices(
        OrientDbPersistence.KEY_CHANNEL_ID,
        channelIds)
    .iterator();

// Parse vertices
Map<String, Channel> channels = new HashMap<>();
while (channelVtxIterator.hasNext()) {
    OrientVertex channelVtx =
        (OrientVertex) channelVtxIterator.next();
    channels.put(
        channelVtx.getProperty(PROP_CHANNEL_ID),
        this.parseChannel(channelVtx));
}
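No reply in the thread answers this question as posed. As a fallback only, the SQL route mentioned earlier in the thread can fetch several keys in one command. The sketch below just builds the query text with one positional `?` placeholder per key; the class name `Channel`, the property name `channelId` and the helper `MultiKeyQuery` are assumptions for illustration. Whether such a query uses the hash index, and whether it sees vertices created earlier in the same transaction, is not established here and would need to be tested.

```java
import java.util.StringJoiner;

final class MultiKeyQuery {
    /**
     * Builds "SELECT FROM <class> WHERE <property> IN [?, ?, ...]" with keyCount placeholders.
     * The class and property names are supplied by the caller and are not validated here.
     */
    static String selectByKeys(String className, String property, int keyCount) {
        if (keyCount < 1) {
            throw new IllegalArgumentException("keyCount must be >= 1");
        }
        StringJoiner placeholders = new StringJoiner(", ", "[", "]");
        for (int i = 0; i < keyCount; i++) {
            placeholders.add("?");
        }
        return "SELECT FROM " + className + " WHERE " + property + " IN " + placeholders;
    }

    public static void main(String[] args) {
        // Assumed names: a "Channel" class with a "channelId" property.
        System.out.println(selectByKeys("Channel", "channelId", 3));
    }
}
```

The resulting string would then be passed to graph.command( new OCommandSQL(...) ) as in the earlier reply, with the channel IDs supplied as positional arguments, so the IDs are never concatenated into the query text.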
