Embedded List Query Performance In OrientDB

347 views
Skip to first unread message

Wise Jack

unread,
Apr 6, 2014, 10:33:05 PM4/6/14
to orient-...@googlegroups.com
I'm testing orientdb for a storage database of a knowledge base.

The database can be something like this:

[
    {
        fieldA: ['a','b','c']
    },
    {
        fieldA: ['c','d','e']
    },
]

and the query is something like this:

select from ClassA where 'c' in fieldA

The query is very very slow, the explain of the query is as below

{
    "@type":"d","@version":0,
     "involvedIndexes":["ClassA.fieldA"],
     "current":"#11:960477",
     "fetchingFromTargetElapsed":160596,
     "documentReads":959211,
     "documentAnalyzedCompatibleClass":959211,
     "recordReads":959211,
     "elapsed":160596.25,
     "resultType":"collection",
     "resultSize":1,
     "@fieldTypes":"involvedIndexes=e,fetchingFromTargetElapsed=l,documentReads=l,documentAnalyzedCompatibleClass=l,recordReads=l,elapsed=f"
 }
As you can see, even OrientDB used the fieldA index, it still costs 16 seconds to query a million records, it is unacceptable.

Is there any good way to make this query faster?

https://stackoverflow.com/questions/22896528/embedded-list-query-performance-in-orientdb

Andrey Lomakin

unread,
Apr 7, 2014, 5:25:27 AM4/7/14
to orient-database
Yes too slow.
What amount of RAM do you have ?-


--

---
You received this message because you are subscribed to the Google Groups "OrientDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to orient-databa...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Best regards,
Andrey Lomakin.

Orient Technologies
the Company behind OrientDB

Wise Jack

unread,
Apr 8, 2014, 1:51:06 AM4/8/14
to orient-...@googlegroups.com
Hi, Andrey.

Thanks for your reply. The memory information is as below:

[root@root ~]# cat /proc/meminfo
MemTotal:        8063160 kB
MemFree:          228968 kB

As you can see 

     "involvedIndexes":["ClassA.fieldA"],
     "current":"#11:960477",
     "fetchingFromTargetElapsed":160596,
     "documentReads":959211,
Even the database can see the index, but it still iterate all the documents in the database, I think that's the reason for the slow.

The same data in mysql(that using fieldA's index), can return data in 0.015second, so I think this is not the fault of the data, maybe there is a better way for creating index or querying using index for embedded list of OrientDB.

Andrey Lomakin

unread,
Apr 8, 2014, 9:09:30 AM4/8/14
to orient-database
Could you provide database sample ?

Wise Jack

unread,
Apr 8, 2014, 10:13:36 AM4/8/14
to orient-...@googlegroups.com
Hi, Andrey:

Sure. I'll send you a sample document of the database, I can't send the whole database to you since it's too large:

This is a sample record of the database, I'm immigrating a chemical compounds database from MySQL to OrientDB.
--------------------------------------------------
ODocument - Class: Compound   id: #11:5111   v.1
--------------------------------------------------
      iupac_cas_name : chloro(trifluoro)methane
         create_date : Sat Jan 17 00:00:00 CST 1970
iupac_traditional_name : chloro(trifluoro)methane
cactvs_hbond_acceptor : 3
     component_count : 1
  cactvs_tauto_count : 1
     nonstandardbond : null
    molecular_weight : 104.45891
     coordinate_type : 1
5
255
 monoisotopic_weight : 103.964066
      iupac_inchikey : AFYPFACVUDMOHA-UHFFFAOYSA-N
          exact_mass : 103.964066
              xlogp3 : 2.0
          iupac_name : chloro(trifluoro)methane
  openeye_iso_smiles : C(F)(F)(F)Cl
compound_canonicalized : 1
 isotopic_atom_count : 0
     cactvs_subskeys : AAADcQAAAYAEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAQIAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA==
atom_udef_stereo_count : 0
   cactvs_complexity : 28
iupac_systematic_name : chloranyl-tris(fluoranyl)methane
bond_udef_stereo_count : 0
bond_def_stereo_count : 0
  cactvs_hbond_donor : 0
     bondannotations : undefined
         cactvs_tpsa : 0
                 cas : [75-72-9, 185009-43-2
75-72-9, 50815-73-1, 000075-72-9, 185009-43-2, 4-01-00-00034 (Beilstein Handbook Reference)]
  openeye_can_smiles : C(F)(F)(F)Cl
    heavy_atom_count : 5
  iupac_openeye_name : chloro(trifluoro)methane
         iupac_inchi : InChI=1S/CClF3/c2-1(3,4)5
         modify_date : Sat Jan 17 00:00:00 CST 1970
   molecular_formula : CClF3
        total_charge : 0
        compound_cid : 6392
atom_def_stereo_count : 0
cactvs_rotatable_bond : 0

The embedded list field is the CAS field.

The schema of Class Compound is as the attachment.
compound.txt

Andrey Lomakin

unread,
Apr 11, 2014, 6:15:39 AM4/11/14
to orient-database
HI,
Could you try now ?

Wise Jack

unread,
Apr 14, 2014, 10:00:19 PM4/14/14
to orient-...@googlegroups.com
Hi, Andrey:

The result is the same, unacceptable slow, here is the explain:

orientdb {compounds}> explain select * from Compound where '000075-72-9' in cas

Profiled command '{involvedIndexes:[1],current:#11:960477,fetchingFromTargetElapsed:327390,documentReads:959211,documentAnalyzedCompatibleClass:959211,recordReads:959211,elapsed:327501.62,resultType:collection,resultSize:1}' in 327.528992 sec(s):
{"@type":"d","@version":0,"involvedIndexes":["Compound.cas"],"current":"#11:960477","fetchingFromTargetElapsed":327390,"documentReads":959211,"documentAnalyzedCompatibleClass":959211,"recordReads":959211,"elapsed":327501.62,"resultType":"collection","resultSize":1,"@fieldTypes":"involvedIndexes=e,fetchingFromTargetElapsed=l,documentReads=l,documentAnalyzedCompatibleClass=l,recordReads=l,elapsed=f"}

Wise Jack

unread,
Apr 20, 2014, 1:07:08 AM4/20/14
to orient-...@googlegroups.com
Has anyone got a suggestion?

Luca Garulli

unread,
Apr 20, 2014, 4:38:59 AM4/20/14
to orient-database
Hi Jack,

if you execute this: 

select count(*) from Compound where '000075-72-9' in cas

How many records are retrieved? How much time does it take?

Lvc@

Wise Jack

unread,
Apr 20, 2014, 1:12:09 PM4/20/14
to orient-...@googlegroups.com
Hi, Lvc:

Here is the result:

orientdb {compounds}> select count(*) from Compound where '000075-72-9' in cas

----+-----+-----
#   |@RID |count
----+-----+-----
0   |#-2:0|1
----+-----+-----

1 item(s) found. Query executed in 222.285 sec(s).

Wise Jack

unread,
Apr 29, 2014, 12:35:38 AM4/29/14
to orient-...@googlegroups.com
Looks this problem is not attracting many concerns.

That's fine, I'll change the schema of the database, and make CAS more important field.

I'll try to find out the problem in the index from the code when done this work.
Reply all
Reply to author
Forward
0 new messages