There are some shortcuts and cleanups that Empire could take, such as
caching some work or lazy loading, which might help spread out the
workload. But at its core, Empire is just sending SPARQL queries to
the database, and more often than not the database is the performance
bottleneck, not Empire. As you noticed with that call to
DataSourceUtil.describe, 60% of the execution time was spent in a
single method call on the database. Similarly, Empire's
EntityManager.find method is just a SPARQL query against the database
to retrieve what you're attempting to find.

In my experience, the neo4j Sail implementation is rubbish. The data
load performance was horrible -- I could not load datasets of a few
million triples at all, and query performance was far worse than that
of any dedicated RDF database on the market. Granted, this was over a
year ago, and things could have improved since then, but it was not a
promising start.

I'd first recommend that, unless it's an absolute requirement that you
use neo4j as an RDF database, you switch to an actual RDF database and
re-measure performance. The easiest thing to switch to for prototyping
and for evaluating this performance issue is Sesame's memory Sail
repository. The Blueprints stuff should work fine on top of that if
you need to use it, and I know Empire works fine with the normal
Sesame connectors. I suspect this will perform significantly better in
your application than using neo4j as an RDF database.
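
If it helps with the re-measuring, here is a minimal sketch of a
timing wrapper in plain Java. The names CallTimer, Timed, time and
sharePercent are made up for illustration and are not part of Empire
or Sesame. The idea is to wrap whatever call you are profiling (for
example your find call) in the lambda, run it against each backend,
and compare the numbers. It measures wall-clock time for a single
call only, so treat the results as a rough indication rather than a
proper benchmark.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of Empire or Sesame.
public final class CallTimer {

    /** The value returned by a timed call plus its elapsed wall-clock time. */
    public static final class Timed<T> {
        public final T value;
        public final long nanos;

        Timed(T value, long nanos) {
            this.value = value;
            this.nanos = nanos;
        }
    }

    private CallTimer() {
    }

    /** Runs the given work once and records how long it took. */
    public static <T> Timed<T> time(Supplier<T> work) {
        long start = System.nanoTime();
        T value = work.get();
        return new Timed<>(value, System.nanoTime() - start);
    }

    /** Percentage of totalNanos that partNanos accounts for. */
    public static double sharePercent(long partNanos, long totalNanos) {
        if (totalNanos <= 0) {
            throw new IllegalArgumentException("totalNanos must be positive");
        }
        return 100.0 * partNanos / totalNanos;
    }
}
```

You would time the whole request with one call to time and the
database call inside it with another, then pass the two nanos values
to sharePercent to see how much of the total the database accounts
for on each store.
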
Cheers,
Mike