New to OrientDB: Object DB, Queries, Sorting and Performance


Zapp El

Jul 28, 2015, 1:13:06 PM
to orient-...@googlegroups.com
Hello Community, 
hello OrientDB developers!

I work in public services, and currently we're evaluating different technologies (OrientDB, Hibernate/postgres, Fedora4 and neo4j) in order to find the best possible backend-solution for our data. 
Over the course of the last month we developed a fairly straightforward Java class model that we would like to use regardless of the underlying technology.

In future applications we're going to have more than 1.5 million objects to persist, manage and retrieve.

Handling Java objects directly seems so much more intuitive and flexible than mapping them with JPA, so we were eager to try something new, such as OrientDB's Object Database functionality.

But somehow we can't figure out how to get decent performance out of our experimental setup.

So far we've persisted about 95,601 of our objects (books and other media), resulting in 46,995,663 ORecords (see screenshot) and about 18.8 GB of data on our NAS.
  
Our test-system:

- Virtual Machine on VMware
- SUSE 10 OS
- 1 TB of NAS.
- Java 1.8
- OrientDB PE Version 2.0.12
- QuadCore CPU



Select a book with a specific relation (like a triple): 

select from RecordImpl
where
  type.uniqueKey = 'TOME'
and
  relationNode.relations
  contains (
    (predicate.uniqueKey = 'IS_PUB_PLACE_OF')
    and
    subject.relationNodeContainers contains (uniqueKey = 'MILANO')
  )
Query executed in 9.26 sec. Returned 20 record(s)      

9.26 sec. How can we accelerate this query? We have indexes on all the uniqueKeys.



We managed to accelerate this query a little by rewriting the statement like this:

select from RecordImpl
where 
  type contains [#46:0]
and (
  relationNode.relations 
  contains (
    predicate contains [#34:0] and subject contains [#30:18]
  ) 
)
Query executed in 7.122 sec. Returned 20 record(s) 

7.122 sec., sadly not acceptable. And that is one of the simpler questions we'd like to get answered in a decent time.



Now this one with a simple order by: 

select from RecordImpl
where 
  type contains [#46:0]
and (
  relationNode.relations 
  contains (
    predicate contains [#34:0] and subject contains [#30:18]
  ) 
)
order by sortIndex desc
Query executed in 133.423 sec. Returned 20 record(s) 



So, any ideas how we could accelerate our queries? What are we doing wrong?


Best regards & thanks,

Sebastian 
             

Edit: Added the number of cores (4) to the system specs
schema.jpg
indexes.jpg

scott molinari

Jul 29, 2015, 12:28:46 AM
to orient-...@googlegroups.com, sebastian.l...@gmail.com
I'm a noob at OrientDB too, but I'd ask these questions from experience. How much RAM does your VM have? If it isn't more than 18.8 GB, then you'll have an issue with disk I/O (even with a fast NAS).

Did you add any indexes (which means you'd need considerably more RAM than 18.8 GB)?

Is the media you mentioned actually media, like pictures or video? Although you can store data like that, it would be wise to store it in something like a distributed file system and just have the locations to the media stored in the database. Edit: this last note I don't think would be a cause for your slow performance, but if you can reduce the size of the database considerably, it might help.

Scott 

Zapp El

Jul 29, 2015, 6:15:55 AM
to OrientDB, scottam...@googlemail.com
Hi Scott,

thanks for your reply!

The VM has 16 GB of RAM. Plenty, I'd say, and since we want to store more than ten times what we have now, we'll have to add new VMs for clustering pretty early.
The data model is pretty fleshed out right now; we'll have to boil it down at some point in the future. No, we don't store actual binary data in OrientDB; as you said, we just store references to files.

But I thought OrientDB's memory consumption is limited by MAXHEAP="-Xmx512m" anyway?

Thanks & best regards,

Sebastian

scott molinari

Jul 29, 2015, 7:35:59 AM
to OrientDB, sebastian.l...@gmail.com

Zapp El

Jul 29, 2015, 7:49:42 AM
to OrientDB, scottam...@googlemail.com
Yes. But thx for the hint anyway.

Just dropped all indexes and dropped all unneeded Classes. No improvement.

My greatest concern is still the "ORDER BY" clause. I think all the other queries could easily be improved by horizontal scaling (more server nodes) and a leaner class model.

Our current $ORIENTDB_HOME/bin/server.sh looks like this:

LOG_FILE=$ORIENTDB_HOME/config/orientdb-server-log.properties
WWW_PATH=$ORIENTDB_HOME/www
ORIENTDB_SETTINGS="-Dprofiler.enabled=true -XX:+AggressiveOpts -XX:CompileThreshold=200 -XX:+PerfDisableSharedMem"
JAVA_OPTS_SCRIPT="-Djna.nosys=true -XX:+HeapDumpOnOutOfMemoryError -Djava.awt.headless=true -Dfile.encoding=UTF8 -Drhino.opt.level=9"

#  -Ddb.mvcc=true -Dstorage.useWAL=false -Dstorage.wal.syncOnPageFlush=false

# ORIENTDB MAXIMUM HEAP. USE SYNTAX -Xmx<memory>, WHERE <memory> HAS THE TOTAL MEMORY AND SIZE UNIT. EXAMPLE: -Xmx512m
MAXHEAP="-Xmx512m"
# ORIENTDB MAXIMUM DISKCACHE IN MB, EXAMPLE, ENTER -Dstorage.diskCache.bufferSize=8192 FOR 8GB
MAXDISKCACHE="-Dstorage.diskCache.bufferSize=12288"

Are the MAXHEAP and MAXDISKCACHE sizes okay for 16 GB of RAM?
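
For a rough sense of those numbers, here is a small Python sketch (not part of OrientDB; the variable names are hypothetical) that adds up the two settings from the script above and compares them with the VM's 16 GB. It assumes the bufferSize value is in MB, as the script comment says, and that the disk cache is allocated outside the Java heap; other JVM overhead is ignored.

```python
# Values copied from the server.sh above; everything else is illustrative.
HEAP_MB = 512              # from MAXHEAP="-Xmx512m"
DISK_CACHE_MB = 12288      # from -Dstorage.diskCache.bufferSize=12288 (MB, per the script comment)
TOTAL_RAM_MB = 16 * 1024   # the VM has 16 GB of RAM

# Assumption: heap and disk cache are separate allocations, so they add up.
configured_mb = HEAP_MB + DISK_CACHE_MB
left_over_mb = TOTAL_RAM_MB - configured_mb

print(f"configured: {configured_mb} MB ({configured_mb / TOTAL_RAM_MB:.0%} of RAM)")
print(f"left for the OS and other JVM overhead: {left_over_mb} MB")
```

Under these assumptions the two settings claim 12800 MB, roughly 78% of the machine, which leaves about 3.5 GB for everything else.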

Regards & thxs,

Sebastian

Zapp El

Jul 31, 2015, 6:46:51 AM
to OrientDB, sebastian.l...@gmail.com

So, we officially gave up on OrientDB.

Performance is just too bad for our use case. We did try a couple more things, but none of them helped. Even with a very small amount of data (2 GB, ~ 4,859,173 records), performance is abysmal.

That is really a pity. I totally like the basic concept of OrientDB's ObjectDB.

Since I work with several hundred GBs of index data with Lucene and Solr on a daily basis and have never had a problem achieving decent performance, my only guess right now is that OrientDB's ObjectDB suffers from poorly written query optimization.

And BTW, we really hoped that one of the developers would chime in here.
In the end, we are potential customers, but as long as we can't get a basic proof of concept going, or at least get confirmation that our data isn't a complete mismatch for OrientDB, we won't buy any licences.

Best regards,

Sebastian

Luca Garulli

Jul 31, 2015, 9:32:41 AM
to OrientDB, sebastian.l...@gmail.com
Hi Sebastian,
Sorry to have seen this only now, I hope it's not too late.

Starting from your last query (about 7 seconds):

select from RecordImpl
where 
  type contains [#46:0]
and (
  relationNode.relations 
  contains (
    predicate contains [#34:0] and subject contains [#30:18]
  ) 
)

I see the bottleneck is the expression: relationNode.relations contains ( predicate contains [#34:0] and subject contains [#30:18] ). In fact, with such an expression OrientDB does a full scan of many records. You can check this by prefixing the query with EXPLAIN:

explain select from RecordImpl where type contains [#46:0] and ( relationNode.relations contains ( predicate contains [#34:0] and subject contains [#30:18] ) )

The secret to fast queries, in any DBMS, is using indexes as much as you can. When you use the dot notation (.), OrientDB can't use the indexes. Looking at the original query:

select from RecordImpl
where
  type.uniqueKey = 'TOME'
and
  relationNode.relations
  contains (
    (predicate.uniqueKey = 'IS_PUB_PLACE_OF')
    and
    subject.relationNodeContainers contains (uniqueKey = 'MILANO')
  )

You have 3 conditions to match. If you used the Graph API you'd have bidirectional edges, so you could start from any point in the graph and cross it in any direction. For example, you could look up all the places of type "IS_PUB_PLACE_OF" and start crossing the graph from there, matching the other conditions. Or you could do the same starting from "MILANO".
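
To make the idea of starting from the selective end more concrete, here is a toy Python sketch. It is not OrientDB code and does not use any OrientDB API; the data, the `incoming` map and both function names are hypothetical. It contrasts checking every record against its nested relations with following reverse links from the (predicate, subject) pair back to the records that use it.

```python
from collections import defaultdict

# Hypothetical toy data: each record has a type and a list of (predicate, subject) relations.
records = {
    "book1": {"type": "TOME", "relations": [("IS_PUB_PLACE_OF", "MILANO")]},
    "book2": {"type": "TOME", "relations": [("IS_PUB_PLACE_OF", "ROMA")]},
    "map1":  {"type": "MAP",  "relations": [("IS_PUB_PLACE_OF", "MILANO")]},
}

def forward_scan(rec_type, predicate, subject):
    """Visit every record and test its nested relations (a full scan)."""
    return sorted(
        rid for rid, rec in records.items()
        if rec["type"] == rec_type and (predicate, subject) in rec["relations"]
    )

# Reverse links: from a (predicate, subject) pair back to the records that use it.
incoming = defaultdict(set)
for rid, rec in records.items():
    for relation in rec["relations"]:
        incoming[relation].add(rid)

def reverse_lookup(rec_type, predicate, subject):
    """Start from the selective end and only check the few records reached."""
    candidates = incoming.get((predicate, subject), set())
    return sorted(rid for rid in candidates if records[rid]["type"] == rec_type)

print(forward_scan("TOME", "IS_PUB_PLACE_OF", "MILANO"))
print(reverse_lookup("TOME", "IS_PUB_PLACE_OF", "MILANO"))
```

Both calls return the same records, but the second one only touches the records that are linked to the matching relation instead of every record in the class.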

To help you more I'd need the schema of the entities involved in this query.

Best Regards,

Luca Garulli
Founder & CEO



Zapp El

Jul 31, 2015, 1:17:21 PM
to OrientDB, sebastian.l...@gmail.com, l.ga...@orientdb.com
Hi Luca,

thanks for your response.

Yeah it's kinda too late now. Hibernate/postgres guys won. This time, at least. We had to decide today because of a current project.
But for future projects, the jury is still out. We still want to, and have to, build a large, flexible network of related information for our library.

Regarding your response, I'm confused and terrified at the same time. But at least now I know that I know nothing about graph databases. (lol)

Since I had zero experience with graph databases the whole concept with these links and navigating through the data with "java-like access-paths" (like accessing java-class properties) felt super natural to me.
I was able to write queries in a few days without much learning.

I experimented a little with TRAVERSE and I was able to build a query which performs way better than the others:
select from (
  traverse * from (
    select from RelationImpl
    where predicate.uniqueKey = 'IS_PUB_PLACE_OF'
    and subject.relationNodeContainers contains (uniqueKey = 'MILANO')
  ) while $depth <= 3
) where @class = 'RecordImpl' and type.uniqueKey = 'TOME'
order by sortIndex asc

Less than 2 seconds with order by, compared to 14 seconds for the first query I've posted.
TBH, I have no idea what kind of black magic I've done there.
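
One rough way to picture what such a query does is the toy Python sketch below. It is not OrientDB code; the graph, the class names other than RecordImpl and RelationImpl, and the `traverse` helper are all hypothetical. It starts from the few nodes matched by the inner select, walks outward up to a depth limit, keeps only the nodes of the wanted class and sorts them. A breadth-first walk is used for simplicity; that is not a claim about the order in which OrientDB's TRAVERSE visits records.

```python
from collections import deque

# Hypothetical toy graph: node -> linked nodes, plus a class and sortIndex per node.
links = {
    "rel1": ["node1"],
    "node1": ["book1", "book2"],
    "book1": ["type_tome"],
    "book2": ["type_tome"],
    "type_tome": [],
}
info = {
    "rel1": {"cls": "RelationImpl"},
    "node1": {"cls": "RelationNodeImpl"},   # hypothetical class name
    "book1": {"cls": "RecordImpl", "sortIndex": 2},
    "book2": {"cls": "RecordImpl", "sortIndex": 1},
    "type_tome": {"cls": "TypeImpl"},       # hypothetical class name
}

def traverse(start_nodes, max_depth):
    """Breadth-first walk from the start nodes, not expanding beyond max_depth."""
    seen = set(start_nodes)
    queue = deque((n, 0) for n in start_nodes)
    while queue:
        node, depth = queue.popleft()
        yield node
        if depth < max_depth:
            for nxt in links[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, depth + 1))

# Inner select -> start nodes; traverse while depth <= 3; then filter and sort.
reached = traverse(["rel1"], max_depth=3)
books = [n for n in reached if info[n]["cls"] == "RecordImpl"]
books.sort(key=lambda n: info[n]["sortIndex"])
print(books)
```

The point of the sketch is only that the work is proportional to what is reachable from the starting relations, not to the total number of records in the class.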

Anyways, we have to move on for now. 

Best regards & thanks again,

Sebastian

Martin gan

Mar 14, 2016, 2:12:44 AM
to OrientDB
OrientDB really performs worse than the better databases on the market: the following query takes more than 20 seconds for 30k records. The query below is the proof.
Our company is thinking about giving up on this database for future development. Sigh... I feel very upset about this database.


select @rid.in() AS rid,
distance(lat, lng, 3.0797, 101.5186) AS distance,
@rid.in().out("act")[@class="activity"].total[0] AS activity
from place
where @rid.in()[0].@class in ['user', 'hotel'] 
ORDER BY (distance ASC), (activity DESC)
limit 10 
