Performance level of Empire-RDF

Nicolas Delsaux

Nov 17, 2011, 10:42:33 AM
to Empire
Hi,
As a follow-up to my previous message, I have noticed some
performance issues when using Empire-RDF on my stack.
First, some details.
As you may know by now, I'm developing a client/server application relying
upon a typical Java EE 6 server-side stack: a Glassfish server hosts a
bunch of EJBs containing my business logic, and all that logic is
invoked through RMI-IIOP by way of remote EJB invocations (using
the classical JNDI lookup approach).
This foundation stack hosts an EAR containing the code that calls
Empire-RDF, which uses as its Sail graph the Blueprints implementation of
SailGraph running over Neo4J. So my persistence stack is something
like

JPA
Empire-RDF
Tinkerpop-Blueprints
Neo4J

So, during the most easily reproducible user-level action, I do exactly
two EJB method calls: a "findById" call followed by a "getSubItems"
one.
After some measurements (made using JVisualVM on my 64-bit Java 7 VM), I have
to confess that each of these calls, made through RMI, consumes an
incredible amount of time on the server side. My working hypothesis is
that this is due to my own misunderstandings, and not to the stack I
use:

findById consumes around 1700 ms per call (this method does a simple
EntityManager#find(Class, String) call)
getSubItems consumes around 3000 ms per call (this method does a
somewhat more complex Query#getResultList())
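
For anyone wanting to cross-check such figures outside the profiler, here
is a minimal sketch of a timing helper that invokes a call directly on the
server side, so that RMI and serialization costs stay out of the
measurement. CallTimer and medianMillis are made-up names for
illustration, not part of Empire or JPA, and the warm-up and run counts
are arbitrary.

```java
import java.util.Arrays;
import java.util.concurrent.Callable;

// Hypothetical helper (not part of Empire or JPA): times a call in-process.
final class CallTimer {

    /**
     * Invokes the call `warmups` times untimed, then `runs` times timed,
     * and returns the median duration in milliseconds.
     */
    static double medianMillis(Callable<?> call, int warmups, int runs) throws Exception {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be >= 1");
        }
        for (int i = 0; i < warmups; i++) {
            call.call(); // untimed: lets the JIT and any caches settle
        }
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            call.call();
            nanos[i] = System.nanoTime() - start;
        }
        Arrays.sort(nanos);
        int mid = runs / 2;
        // Median is less sensitive to GC pauses than the mean.
        long median = (runs % 2 == 1) ? nanos[mid] : (nanos[mid - 1] + nanos[mid]) / 2;
        return median / 1000000.0;
    }
}
```

Something like CallTimer.medianMillis(() -> em.find(Item.class, id), 5, 20)
would then give a median for the find alone, where em and Item stand in
for the application's own EntityManager and entity class (the lambda needs
Java 8; on Java 7 an anonymous Callable does the same job).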

Well, I don't know about you, but to me, those numbers are two or three
orders of magnitude too big.
A little drill-down in the call tree reveals that each call to
DataSourceUtil#describe consumes roughly 200 ms, most of which (120 ms)
is spent in
org.neo4j.kernel.EmbeddedGraphDbImpl$AllNodesIterator#hasNext().
These figures look a little too big to me, though I can't tell whether
they are actually abnormal.
Further drill-down suggests that, as far as I understand things,
Empire-RDF may not be the root cause, but rather the poor
messenger of Neo4J's lack of performance in my context.
I've already filed a ticket with neo4j-jca-connector asking for
optimized connections
(https://github.com/alexsmirnov/neo4j-connector/issues/1).
Anyway, does anyone here have any advice on possible performance
optimizations for Empire-RDF?

Mike Grove

Nov 17, 2011, 11:07:54 AM
to empir...@googlegroups.com

There are some shortcuts/cleanups that Empire could take, caching some
work, lazy loading, etc. which might help spread out the work load,
but at its core, Empire is just sending SPARQL queries to the database
and more often than not, the database is the performance bottleneck
and not Empire. As you noticed with that call to
DataSourceUtil.describe, 60% of the execution time was spent in a
single method call on the database. Similarly, Empire's
EntityManager.find method is just a SPARQL query against the database
to retrieve what you're attempting to find.
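
To make that concrete, a find by id boils down to a query of roughly the
following shape. This is a sketch only: the class and method names are
invented for illustration, and the query text is an assumption about the
general form, not the exact string Empire generates.

```java
// Illustration only: the exact query Empire sends is not shown in this thread.
final class FindQuerySketch {

    /** Builds a SPARQL query that fetches every statement about one subject IRI. */
    static String describeById(String iri) {
        // Reject characters that would break out of the <...> IRI delimiters.
        if (iri.indexOf('<') >= 0 || iri.indexOf('>') >= 0 || iri.indexOf(' ') >= 0) {
            throw new IllegalArgumentException("not a plain IRI: " + iri);
        }
        return "CONSTRUCT { <" + iri + "> ?p ?o } WHERE { <" + iri + "> ?p ?o }";
    }
}
```

How quickly the database answers a query like this depends on whether it
can look the subject up in an index or has to scan for it, and that part
is outside Empire's control.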

In my experience, the neo4j Sail implementation is rubbish. The data
load performance was horrible -- I could not load datasets of a few
million triples at all, and query performance was by far worse than
every other dedicated RDF database on the market. Granted, this was
over a year ago, and things could have improved since then, but it was
not a promising start.

I'd first recommend that, unless it's an absolute requirement that you
use neo4j as an RDF database, you switch to an actual RDF
database and re-measure performance. The easiest thing to switch to
for prototyping and for evaluating this performance issue is
Sesame's memory Sail repository. The blueprints stuff should work
fine on top of that if you need to use it, and I know Empire works
fine with the normal Sesame connectors. I suspect this will be
significantly more performant in your application than using neo4j as
an RDF database.

Cheers,

Mike
