Performance

uoccou

Jun 16, 2010, 7:53:41 AM
to jenabean-dev
Hi - are there any performance figures for JenaBean?
I'm looking at the performance of my webapp with a MySQL-backed SDB
repository. Writes seem to be pretty slow. I'm not sure whether this is to
be expected, given that Jena etc. is really built for publishing data. Do
you have any figures for, say, persisting a foaf:Person with two of
the Document properties and a Collection property set?

Here's some data from a test I did profiling saves of an Object that
has an Id, a name, 2 Double properties and references to 3 other
objects - one based on foaf:Person, the other 2 based on foaf:Agent.
I'm calling Bean2RDF.save(Object).

The figures you see are for only 10-12 writes of these objects, the only
difference being the Id and the values for the Doubles. As you can see,
most of the time is spent on socket reads and in Jena's
LoaderTuplesNodes. I've tuned MySQL some, and played with various
driver settings, but it seems that Jena is taking about 0.6-1 sec for
each Object. The Jena code is taking about 25-30% of the time, sockets
(MySQL readAheadBuffer, I think) the rest. This seems excessive compared
with other ORM-style database apps I've worked with. How does it compare to
any tests anyone else has done? I want to ask here first, so I can
say I've had your opinions before I ask the Jena dev team.

I'm not tied to MySQL, so running against Postgres (or perhaps with
Neo4j?) is an option.

Hot Spots

Self time [%]   Self time      Invocations   Method
68.449265       15729.795 ms   18376         java.net.SocketInputStream.read(byte[], int, int)
23.56751        5415.867 ms    9             com.hp.hpl.jena.sdb.layout2.LoaderTuplesNodes$Commiter.run()
0.75534946      173.581 ms     9189          com.mysql.jdbc.MysqlIO.send(com.mysql.jdbc.Buffer, int)
0.6250809       143.645 ms     36754         java.net.PlainSocketImpl.available()
0.29060203      66.781 ms      5184          com.hp.hpl.jena.sdb.layout2.NodeLayout2.hash(String, String, String, int)
0.19704768      45.282 ms      7020          com.mysql.jdbc.PreparedStatement.fillSendPacket(byte[][], java.io.InputStream[], boolean[], int[])
0.18906255      43.447 ms      109512        com.mysql.jdbc.ConnectionPropertiesImpl$BooleanConnectionProperty.getValueAsBoolean()
0.17081644      39.254 ms      49068         com.mysql.jdbc.Buffer.writeBytesNoNull(byte[])
0.17009407      39.088 ms      36754         java.net.SocketInputStream.available()
0.14851464      34.129 ms      9180          com.mysql.jdbc.MysqlIO.buildResultSetWithUpdates(com.mysql.jdbc.StatementImpl, com.mysql.jdbc.Buffer)
0.13861483      31.854 ms      18376         com.mysql.jdbc.util.ReadAheadInputStream.fill(int)
0.1267307       29.123 ms      78804         com.mysql.jdbc.Buffer.ensureCapacity(int)
0.122044064     28.046 ms      73510         com.mysql.jdbc.util.ReadAheadInputStream.checkClosed()
0.11212684      25.767 ms      9189          com.mysql.jdbc.MysqlIO.sqlQueryDirect(com.mysql.jdbc.StatementImpl, String, String, com.mysql.jdbc.Buffer, int, int, int, boolean, String, com.mysql.jdbc.Field[])
0.11210073      25.761 ms      18378         com.mysql.jdbc.util.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(byte[], int, int)
0.10843671      24.919 ms      72            org.apache.catalina.startup.HostConfig.checkResources(org.apache.catalina.startup.HostConfig.DeployedApplication)
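
As a rough reading aid for the hot-spot figures above, the sketch below just divides a few of the quoted numbers. It works out to roughly 0.86 ms per socket read and 1021 MysqlIO.send calls per Commiter.run() invocation. That would be consistent with many small round trips rather than a few slow ones, though the profile alone does not prove it. The class and constant names are made up for illustration; only the numbers come from the post.

```java
// Back-of-envelope arithmetic on the profiler figures quoted above.
// Every input is copied from the hot-spot listing; nothing is measured here.
class ProfileArithmetic {
    static final double SOCKET_READ_MS = 15729.795; // SocketInputStream.read self time
    static final long SOCKET_READS = 18376;         // SocketInputStream.read invocations
    static final long MYSQL_SENDS = 9189;           // MysqlIO.send invocations
    static final long COMMITTER_RUNS = 9;           // LoaderTuplesNodes$Commiter.run invocations

    // Average self time of one socket read, in milliseconds.
    static double msPerSocketRead() {
        return SOCKET_READ_MS / SOCKET_READS;
    }

    // Packets sent to MySQL per Commiter.run() invocation.
    static long sendsPerCommitterRun() {
        return MYSQL_SENDS / COMMITTER_RUNS;
    }

    public static void main(String[] args) {
        System.out.printf("ms per socket read: %.3f%n", msPerSocketRead());
        System.out.println("sends per Commiter.run(): " + sendsPerCommitterRun());
    }
}
```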

Taylor Cowan

Jun 16, 2010, 10:25:18 AM
to jenabe...@googlegroups.com
This feels like a good question for the Jena developers group.
JenaBean should perform almost as well as raw Jena, except when querying
very large result sets.

I use TDB; it's much faster. I've never been impressed with triple
store performance... it's always been an issue when trying to create a
consumer-facing web portal with Jena.

Most mature Jena deployments place Jena fairly far back, in tiers;
there was a discussion along these lines at the SFO semantic web users
group.

Neo4j performs better; it's not RDBMS-backed but a full-fledged graph
database, but you lose some of the inferencing goodness of Jena.

Taylor

> --
> You received this message because you are subscribed to the Google Groups "jenabean-dev" group.
> To post to this group, send email to jenabe...@googlegroups.com.
> To unsubscribe from this group, send email to jenabean-dev...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/jenabean-dev?hl=en.
>
>

uoc

Jun 16, 2010, 10:55:17 AM
to jenabe...@googlegroups.com, Taylor Cowan
Thanks - interesting.

First I'm going to try putting my database on a more powerful machine, rather than the prehistoric one I'm running on at the moment. I don't have comparable results for the same app running on, say, Hibernate, so I can't really say whether Jena is better or worse. It just doesn't feel good.

I've also tried creating skeleton objects that have only an id property set, for when I need reference objects while composing others, and I'm assuming that a save on such an object won't overwrite or wipe out the properties already persisted for that id. That is, take an existing, persisted object a of type A that will be a property, by reference, of an object of type B, and that itself has properties a.p1, a.p2, a.p3 etc. If I create a new A, call it "aa", and set its Id to that of "a" (i.e. it's a clone by virtue of its Id, but an empty one),

// a or its id comes from a webapp param, or by RDF2Bean.load(id)
A aa = new A(a.getId());  // skeleton: only the id is set
B b = new B();
b.setA(aa);
Bean2RDF.save(b);

then the existing instance of A in the repository, and all its attendant properties a.p1, a.p2, a.p3 etc., will remain and not be zapped just because aa has the same id but empty properties. Initial tests seem to support this and, where A is a "heavy" object, it helps performance, not insignificantly. Am I correct?

If you're using TDB/Neo4j in a webapp cluster, how do you work it? What do you do about concurrent, multithreaded, per-JVM access - aren't those parallel requests now funnelled into a sequence? How do you get multiple webapp instances to scale with it, across JVMs? The app I'm working on is going to need to scale and be geodistributed (hopefully).

Do you have a ref to that doc from the SFO? I'm interested in understanding what you mean by having Jena far back, in tiers. I hope my assumption to date, that semantic web technologies are capable of servicing a cloud-based webapp in toto, isn't wrong! I don't want to have to stick an RDBMS in there as a shoehorn, or do all my work in SPARQL against Joseki or some such...

thewebs...@gmail.com

Jun 24, 2010, 9:38:00 PM
to jenabean-dev
Good questions on the concurrency stuff. This is where Jena shows
its immaturity. You handle concurrency in the app (your app) using
model.enterCriticalSection() (or something to that effect). The Jena
docs are clear on that. JenaBean does this for you, so interactions
with multiple threads against one model will block until the first
comer ends the locked section of code. Scaling out Jena requires you
to have one update model that is replicated into multiple shared
static models, or you might wrap the model in some kind of RESTful API
that services a web tier.

http://jena.sourceforge.net/how-to/concurrency.html
"Applications need to be aware of the concurrency issues in access
Jena models"
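
To make the locking pattern concrete: in Jena the idiom from that how-to is model.enterCriticalSection(Lock.READ) or Lock.WRITE, with model.leaveCriticalSection() in a finally block. The sketch below is not Jena code. It uses the JDK's ReentrantReadWriteLock as a stand-in for the same many-readers-or-one-writer discipline, so it runs without Jena on the classpath. GuardedStore and its methods are hypothetical names invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Stand-in for a shared in-memory model: many readers OR one writer at a time.
class GuardedStore {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> statements = new ArrayList<>();

    // Writers take the exclusive lock, so they block readers and other writers.
    void add(String statement) {
        lock.writeLock().lock();
        try {
            statements.add(statement);
        } finally {
            lock.writeLock().unlock(); // always release, even on exception
        }
    }

    // Readers share the lock with each other but wait for any active writer.
    int size() {
        lock.readLock().lock();
        try {
            return statements.size();
        } finally {
            lock.readLock().unlock();
        }
    }

    // Starts several writer threads against one store and returns the final size.
    static int runConcurrentWriters(int threads, int perThread) {
        GuardedStore store = new GuardedStore();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    store.add("s" + id + "-" + i);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for writers", e);
            }
        }
        return store.size();
    }

    public static void main(String[] args) {
        System.out.println(runConcurrentWriters(4, 1000));
    }
}
```

With four writers adding 1000 statements each, the store should report 4000 entries. Without the write lock the unsynchronized ArrayList could lose updates or throw.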

Sorry, but this is do-it-yourself stuff. If you search the Jena user
group list you'll find similar questions being asked (even by me a few
years ago).

For Neo4j you might use the RESTful API to service your web tier;
that's how the Ruby apps utilize Neo4j.

One thing we all need to learn regarding Jena apps is that good Jena
apps have many models. It's difficult for those of us steeped in
relational data applications. The nice part regarding models is that
it is very easy to merge them. The presenter of the SFO discussion was
Sanjiva Nath
(http://www.meetup.com/The-San-Francisco-Semantic-Web-Meetup/members/8918624/).
I cannot find his slides but I believe he would share them if asked.

uoc

Jun 25, 2010, 5:46:14 AM
to jenabe...@googlegroups.com, thewebs...@gmail.com
Thanks Taylor.

I got around my problems by using a batching technique with the SDB
model. This gives acceptable performance on writes, and is a common
technique (some might say gotcha) for improving ORM-layer performance
too. I was surprised at the relative difference even with a small
Object like the one I described earlier.
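
The exact SDB call used for batching is not shown in this thread, so the sketch below is library-independent and only illustrates why batching helps: it buffers statements and sends them in groups, counting one simulated round trip per group. CountingStore and BatchingWriter are hypothetical names, and the "round trip" is an assumption standing in for a database call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a remote store: each write() call is one round trip.
class CountingStore {
    private int roundTrips = 0;
    private int stored = 0;

    void write(List<String> statements) {
        roundTrips++;
        stored += statements.size();
    }

    int roundTrips() { return roundTrips; }
    int stored() { return stored; }
}

// Buffers statements and forwards them to the store in groups of batchSize.
class BatchingWriter {
    private final CountingStore store;
    private final int batchSize;
    private final List<String> pending = new ArrayList<>();

    BatchingWriter(CountingStore store, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        this.store = store;
        this.batchSize = batchSize;
    }

    void add(String statement) {
        pending.add(statement);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Sends whatever is buffered; a no-op when nothing is pending.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        store.write(new ArrayList<>(pending));
        pending.clear();
    }

    // Writes the given number of statements and reports the round trips used.
    static int roundTripsFor(int statements, int batchSize) {
        CountingStore store = new CountingStore();
        BatchingWriter writer = new BatchingWriter(store, batchSize);
        for (int i = 0; i < statements; i++) {
            writer.add("statement-" + i);
        }
        writer.flush(); // send any partial final batch
        return store.roundTrips();
    }

    public static void main(String[] args) {
        System.out.println("unbatched: " + roundTripsFor(1000, 1));
        System.out.println("batched:   " + roundTripsFor(1000, 100));
    }
}
```

In this model 1000 statements cost 1000 round trips unbatched and 10 with a batch size of 100. The real saving depends on how the store and driver handle each call.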

So now, for me, each web connection gets a pooled database connection,
which in turn is used to build an SDB-backed model for the duration of that
request. Where the request doesn't spawn other threads that need access
to the model, enterCriticalSection and method synchronization don't seem
necessary to me - any conflict is ultimately handled by the database's
ACID guarantees (I hope! - I still need to load test this). That said, I'm
not about to change any code until I need to.

I still can't see how an in-JVM repository can scale across a "cluster"
or farm of webapps. As you say, blocking on write is arguably just not
good enough, but replicating an update model to several read-only
models, and having to restart or reload the JVM, won't cut the mustard
either. (Tell me if I've misunderstood you, please.) At least with a
database you can replicate without webapp/JVM restarts, e.g. MySQL
shards, and this approach is very useful where speed and responsiveness
expectations differ depending on what the client is doing. A
client request to post a deep object can be expected to be a slower
operation than reading 100 items out of the repository.

I had also started to look at AllegroGraph, which uses HTTP rather than
JDBC to provide access to a repository, but on reflection I can't see how
this would be any better - there's still a need to batch up changes to
avoid chatty network and repository calls, and some of the services
provided by Allegro I've also tackled with Joseki and LARQ, for instance.
Still, it looks interesting and I've asked Jeremy Carroll for comment on
this.[1] I think he's busy at SemTech though...

[1] http://tech.groups.yahoo.com/group/jena-dev/message/44330
