Indexing and Queries with Java API


John J. Szucs

Aug 16, 2016, 6:39:16 PM
to OrientDB
In my OrientDB-based application, I need to do an INSERT-IF-NOT-EXISTS operation using the Java (TinkerPop) API.

I have created a vertex type "Identifier." It has a single property, "identifier," which contains a URI (effectively a String for purposes of this discussion).

I have also created an index like this:

ParametersBuilder builder=new ParametersBuilder(); 
builder.add("class", "Identifier"); 
builder.add("type", "UNIQUE_HASH_INDEX");
graph.createKeyIndex("identifier", Vertex.class, builder.build());

Then, I perform the INSERT-IF-NOT-EXISTS operation in a loop like this. This snippet is using the Google Guava libraries and is obviously a simplification of our real application:

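int n=10000;
for (int i=0; i<n; i++)
{
    // String concatenation converts the int; a primitive int has no toString() method
    String myUriStr="http://example.org/"+i;
    Iterable<Vertex> vertices=graph.getVertices("identifier", myUriStr);
    // Pass a default so that an empty result yields null instead of throwing
    Vertex vertex=Iterables.getOnlyElement(vertices, null);
    if (null==vertex)
    {
        // Create vertex
        ...
    }
    // Use vertex
    ...
}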
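
For readers who want the INSERT-IF-NOT-EXISTS pattern in isolation, here is a minimal sketch that uses only the Java standard library. It is not OrientDB code: `IdentifierNode` and `IdentifierStore` are hypothetical names, and a `HashMap` stands in for the unique hash index, so each lookup is expected to cost about the same no matter how many entries already exist.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a vertex of type "Identifier"; not an OrientDB class.
final class IdentifierNode {
    final String identifier;

    IdentifierNode(String identifier) {
        this.identifier = identifier;
    }
}

// Hypothetical store; the map plays the role of a unique hash index on "identifier".
final class IdentifierStore {
    private final Map<String, IdentifierNode> uniqueIndex = new HashMap<>();
    private int created = 0;

    // Returns the existing node for the URI, or creates and indexes a new one.
    IdentifierNode getOrCreate(String uri) {
        IdentifierNode node = uniqueIndex.get(uri);
        if (node == null) {
            node = new IdentifierNode(uri);
            uniqueIndex.put(uri, node);
            created++;
        }
        return node;
    }

    int createdCount() {
        return created;
    }

    public static void main(String[] args) {
        IdentifierStore store = new IdentifierStore();
        int n = 10000;
        // Two passes over the same URIs: only the first pass creates nodes.
        for (int pass = 0; pass < 2; pass++) {
            for (int i = 0; i < n; i++) {
                store.getOrCreate("http://example.org/" + i);
            }
        }
        System.out.println("created=" + store.createdCount());
    }
}
```

With an index-backed lookup like this, the second pass finds every URI and creates nothing, which is the behavior the loop above is trying to get from the database.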

What I am seeing is that the throughput of this loop rapidly diminishes as more vertices are added, like this (with the throughput relative to the n=1,000 baseline):

n=1,000 throughput=100%
n=2,000 throughput=58.8%
n=5,000 throughput=29.7%
n=10,000 throughput=16.5%
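
As a rough cross-check on these numbers, the sketch below models what happens if every lookup has to scan all existing elements instead of using an index. It is a toy model in plain Java, not a measurement of OrientDB, and `ScanCostModel` is a hypothetical name. Under this model the relative throughput falls off as roughly 1/n (about 50%, 20%, and 10% for the sizes above). The measured figures fall off somewhat more slowly, which would be consistent with a scan cost plus some fixed per-operation overhead.

```java
import java.util.ArrayList;
import java.util.List;

// Toy cost model (hypothetical): get-or-create where every lookup is a linear scan.
final class ScanCostModel {
    private ScanCostModel() {}

    // Counts the string comparisons made by n get-or-create calls with distinct keys.
    static long comparisonsWithScan(int n) {
        List<String> existing = new ArrayList<>();
        long comparisons = 0;
        for (int i = 0; i < n; i++) {
            String key = "http://example.org/" + i;
            boolean found = false;
            for (String candidate : existing) {
                comparisons++;
                if (candidate.equals(key)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                existing.add(key);
            }
        }
        return comparisons;
    }

    // Operations per comparison at size n, relative to the same ratio at baselineN.
    static double relativeThroughput(int n, int baselineN) {
        double baseline = (double) baselineN / comparisonsWithScan(baselineN);
        double current = (double) n / comparisonsWithScan(n);
        return current / baseline;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1000, 2000, 5000, 10000}) {
            System.out.printf("n=%d relative throughput=%.3f%n", n, relativeThroughput(n, 1000));
        }
    }
}
```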

This obviously suggests that indexing is not working, so I tried a SQL EXPLAIN command.

explain select from identifier where identifier='http://example.org/1'
documentReads=1
fullySortedByIndex=false
documentAnalyzedCompatibleClass=1
recordReads=1
fetchingFromTargetElapsed=0
indexIsUsedInOrderBy=false
compositeIndexUsed=1
current=Identifier#153:0{identifier:http://example.org/1,out_id:[size=1]} v2
involvedIndexes=[Identifier.identifier]
limit=-1
evaluated=1
user=#5:0
elapsed=2.387001
resultType=collection
resultSize=1 
 
The documentation at http://orientdb.com/docs/master/SQL-Explain.html does not seem to be 100% current on how to interpret the output of the EXPLAIN command, but my interpretation is that the query did recognize and use the index that I created.

I also tried some profiling (with JProfiler) and see a hot spot at com.tinkerpop.blueprints.impls.orient.OrientElementIterator.hasNext.

All of this is with OrientDB running in embedded mode, on a fairly high-end Linux machine and with a fresh, empty database at the beginning of each test.

I have to believe I am doing something wrong to see such a rapid drop-off in query performance under such relatively small data volumes.

I have been struggling with this for several days off-and-on now and it's time to ask for help. Has anyone else encountered a similar issue? What can I do to address this?

Thanks in advance!

-- John

Luca Garulli

Aug 16, 2016, 7:01:10 PM
to OrientDB
It looks like you're not using the index from the Graph API. Look at the documentation:


If it's not clear, please write here again, we will help you on this ;-)

Best Regards,

Luca Garulli
Founder & CEO

Want to share your opinion about OrientDB?
Rate & review us at Gartner's Software Review


--

---
You received this message because you are subscribed to the Google Groups "OrientDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to orient-database+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

John J. Szucs

Aug 17, 2016, 8:45:48 AM
to OrientDB
Luca,

I just tried this. The only change was:
Iterable<Vertex> vertices=graph.getVertices("identifier", myUriStr);
to:

Iterable<Vertex> vertices=graph.getVertices("Identifier.identifier", myUriStr);

The results speak for themselves:

Created 10000 entities in 00:02:05.755, 79.52 per second

This is the kind of performance I was expecting!

Thank you!!!

I will note that this was a very subtle change. Essentially, it seems that for the graph API's getVertices() method to use the indices, the property names have to be qualified with the vertex type name. Would you like for me to add an issue on GitHub to improve the documentation around this?
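
To make the qualification less easy to get wrong, the key can be built in one place. The helper below is only a sketch: `IndexKeys` is a hypothetical name rather than part of OrientDB or TinkerPop, and the `graph.getVertices(...)` call in the comment is the one from the loop shown earlier in this thread.

```java
// Hypothetical helper; not part of OrientDB or TinkerPop.
final class IndexKeys {
    private IndexKeys() {}

    // Builds the "ClassName.propertyName" form of a key, e.g. "Identifier.identifier".
    static String qualified(String className, String propertyName) {
        return className + "." + propertyName;
    }

    public static void main(String[] args) {
        String key = qualified("Identifier", "identifier");
        // In the application the key would then be passed to the Graph API:
        //   Iterable<Vertex> vertices = graph.getVertices(key, myUriStr);
        System.out.println(key);
    }
}
```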

Thanks again!

-- John


Luca Garulli

Aug 17, 2016, 11:55:51 PM
to OrientDB
Hi John,

Happy to help. Yes, please, could you open a new issue for the documentation?

Best Regards,

Luca Garulli
Founder & CEO


John J. Szucs

Aug 18, 2016, 4:29:49 PM
to OrientDB
New issue opened at https://github.com/orientechnologies/orientdb/issues/6589.

BTW, the performance test results I shared yesterday were running under the debugger and with extensive instrumentation. Here are the "clean" results. Wow!

Created 10000 entities in 00:00:05.840, 1712.33 per second
Retrieving 10000 entities...
Retrieved 10000 entities in 00:00:01.561, 6406.15 per second
Deleting 10000 entities...
Deleted 10000 entities in 00:00:01.960, 5102.04 per second
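
The per-second figures in this thread are simply the entity count divided by the elapsed time. A small sketch of that arithmetic follows; `ThroughputCalc` is a hypothetical name, not the code that produced the output above. It assumes the elapsed time is written as HH:mm:ss.SSS and is under 24 hours, because it reuses `LocalTime.parse` to read it.

```java
import java.time.LocalTime;
import java.util.Locale;

// Hypothetical helper that recomputes "N per second" from a count and an elapsed time.
final class ThroughputCalc {
    private ThroughputCalc() {}

    // Converts an elapsed time such as "00:02:05.755" to seconds (must be under 24 hours).
    static double elapsedSeconds(String elapsed) {
        return LocalTime.parse(elapsed).toNanoOfDay() / 1_000_000_000.0;
    }

    // Entities processed per second.
    static double perSecond(long count, String elapsed) {
        return count / elapsedSeconds(elapsed);
    }

    // Formats the rate with two decimals, independent of the default locale.
    static String format(long count, String elapsed) {
        return String.format(Locale.ROOT, "%.2f per second", perSecond(count, elapsed));
    }

    public static void main(String[] args) {
        System.out.println(format(10000, "00:02:05.755")); // run under the debugger
        System.out.println(format(10000, "00:00:05.840")); // clean run, create
        System.out.println(format(10000, "00:00:01.561")); // clean run, retrieve
        System.out.println(format(10000, "00:00:01.960")); // clean run, delete
    }
}
```

Comparing the two creation runs, 125.755 s against 5.840 s, the clean run is roughly 21 times faster than the instrumented one.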

Thanks again!

-- John

Luca Garulli

Aug 18, 2016, 5:08:11 PM
to OrientDB
Cool that you solved it.

Anyway we have to improve the docs, because I'm sure many users just drop OrientDB after the first problem and maybe it's something trivial like this ;-)


Best Regards,

Luca Garulli
Founder & CEO


John J. Szucs

Aug 18, 2016, 5:15:42 PM
to orient-...@googlegroups.com


---
John J. Szucs (on my iPhone)