NHibernate is taking an unexpectedly long time to load a large number
of entities. I'm willing to investigate further, but I'd like
suggestions on what to investigate.
First off, my scenario is to load a large number of objects at process
start and then cache the data for hours - this is because this
particular entity represents a tree, i.e. not an ideal structure for
querying with SQL.
* # Entities: 61,000
* Time to materialize entities from local DB: 10s
So, obviously, NHibernate is very very fast here. Simply retrieving
the data takes about 2 seconds locally.
* Time for Toad to execute the same query & download rows from remote
DB: 62s
Yes - perhaps I'm working over a slow connection to the remote
database - but I would then expect NHibernate to take a further 8
seconds to materialize these entities.
* Time for NHibernate to materialize entities from remote DB: 246s
eh? That doesn't make sense. Can anyone suggest areas I can probe to
determine the cause?
I don't think it's relevant, but NH is configured to use LinFu. My
query is a simple "from Entity e" HQL query.
Thanks,
Richard
Questions:
Do you need all 61000 at once?
Can you process the rows in batch?
Can the processing be done offline and then persisted to local
storage?
Do you require NH to retrieve the rows?
I ask because what you describe above is an intensive operation that
is more akin to ETL processing than ORM data mapping. You may want to
consider processing the tree offline through an asynchronous process
and storing the results in a local persistent storage.
On Mar 19, 2:13 pm, Jason Meckley <jasonmeck...@gmail.com> wrote:
> 10 seconds just to retrieve/hydrate the data isn't fast at all. I
> would want that to take milliseconds. Caching 61000 entities doesn't
> solve the problem either, you have just moved the problem to another
> location.
>
> Questions:
> Do you need all 61000 at once?
As I explained, yes, they are cached for hours
> Can you process the rows in batch?
No - they need to be loaded into a process space all at once and kept
in that process space for hours.
> Can the processing be done offline and then persisted to local storage?
No. This is not an offline scenario
> Do you require NH to retrieve the rows?
Ideally I would like to keep NH in charge for other reasons (e.g.
database agnosticism). I'm pragmatic about the solution - I don't have
any problems using a DataReader. However NHibernate is a better
architectural fit for this problem (1st level cache/domain entities
etc.) and thus I'd like to keep it.
>
> I ask because what you describe above is an intensive operation that
> is more akin to ETL processing than ORM data mapping. You may want to
No it's not ETL. As I said, it's a tree graph which needs to be
queried in real time (milliseconds). And we can afford the data to be
hours out of date.
> consider processing the tree offline through an asynchronous process
> and storing the results in a local persistent storage.
The use case of my domain is correct and it is not ETL or a batch job.
So, can you suggest which parts of the NH source code I should
investigate to determine why it is taking 180 seconds longer than
would be expected? Obviously, I'm looking at NHibernate.Loader.Loader
- and possibly at the Batchers - but it's difficult to tell whether
it's the materialization, the IDataReader, or a combination thereof
that is adding this extra 3 minutes. Logging at DEBUG doesn't help
much.
Richard
--
You received this message because you are subscribed to the Google Groups "nhusers" group.
To post to this group, send email to nhu...@googlegroups.com.
To unsubscribe from this group, send email to nhusers+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/nhusers?hl=en.
On Mar 19, 4:05 pm, John Davidson <jwdavid...@gmail.com> wrote:
> You really need a profiler to figure out where execution times are being
> consumed
>
> http://www.red-gate.com/products/ants_performance_profiler/index.htm
> http://www.nhprof.com/
>
> both options provide free trial downloads
>
> John Davidson
>
First off I didn't design the system, those people are long long gone.
Whether it's right or wrong to store a tree in a database is
irrelevant. The tree needs to be processed at speed. Our use case
allows this data to be many hours out of date.
This is standard practice for a scalable system. Ask Google how up to
date their in-memory caches are compared to their MySQL databases.
> Are you really using the 1st level cache for 61k objects for hours? This
> means you have an incredibly bloated singleton session. Remarkably bad
> design.
No - the session only lives for the time to load this data in. These
entities are then transformed into a set of immutable entities for
concurrency reasons. Ignoring the fact that it may be better to use a
NOSQL graph store, this is a Good Design.
Richard
What is your round-trip-time to the remote DB machine?
Can you have a look at the network traffic generated for the database
connection to the remote database, both for the NHibernate case and
when using the query analyzer? Is the number of packets sent in each
direction significantly different for the two cases? Could it be that
NHibernate generates more round trips to the server in order to get
all the data?
/Oskar
2010/3/19 RichB <rbi...@gmail.com>:
local = sql
remote = oracle
I elided that from my original description to keep the problem
simpler...
> Are you using some sort of query analyzer to
> get the 2s and 62s figures?
Wall-clock time
> What is your round-trip-time to the remote DB machine?
96ms
> Can you have a look at the network traffic generated for the database
> connection to the remote database, both for the NHibernate case and
> when using the query analyzer? Is the number of packets sent in each
> direction significantly different for the two cases? Could it be that
> NHibernate generates more round trips to the server in order to get
> all the data?
That's certainly something to try, but I think I've got other steps to
try before then.
Richard
There was no lazy loading/select n+1.
I watched the NH logging at DEBUG and hydration was very simple.
Richard
Would that be mysql, postgresql or mssql? Never mind actually, it's
probably not important.
> remote = oracle
>
> I elided that from my original description to keep the problem
> simpler...
>
>> Are you using some sort of query analyzer to
>> get the 2s and 62s figures?
>
> Wall-clock time
I meant, what type of program? Is it
type-in-any-sql-and-get-a-basic-grid-of-results?
>
>> What is your round-trip-time to the remote DB machine?
>
>
> 96ms
That really is quite a lot.
Doesn't Oracle have something like "fetchsize" which governs how many
rows are transmitted at a time? I don't know a lot about Oracle, I
just tried googling a bit for it. If the Oracle driver for .NET is
using a default fetchsize of about 25 rows, you're going to get
(61000/25)*96ms = ~234s of delay in network latency alone. This is
what I was hinting at below.
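That estimate is easy to sanity-check (a sketch only; the figures come from this thread, and the one-round-trip-per-fetch model is a deliberate simplification of how the driver actually works):

```csharp
using System;

// Back-of-the-envelope: assume one 96 ms network round trip per
// batch of fetched rows, and see how total latency varies with the
// number of rows pulled per round trip.
class FetchSizeEstimate
{
    static void Main()
    {
        const int rows = 61000;
        const double rttSeconds = 0.096;

        foreach (int fetchSize in new[] { 25, 25 * 50 })
        {
            int roundTrips = (rows + fetchSize - 1) / fetchSize; // ceiling
            Console.WriteLine("fetch size {0,5}: ~{1:F0}s of latency",
                              fetchSize, roundTrips * rttSeconds);
        }
    }
}
```

With a 50x larger fetch the latency component drops from roughly 234s to about 5s, which is consistent with the improvement reported later in the thread once FetchSize was enlarged.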
>
>> Can you have a look at the network traffic generated for the database
>> connection to the remote database, both for the NHibernate case and
>> when using the query analyzer? Is the number of packets sent in each
>> direction significantly different for the two cases? Could it be that
>> NHibernate generates more round trips to the server in order to get
>> all the data?
>
> That's certainly something to try, but I think I've got other steps to
> try before then.
>
>
/Oskar
Cheers,
John
Oskar, thanks for being extremely helpful.
I've overridden OracleDataClientDriver.OnBeforePrepare and added:
OracleCommand cmd = command as OracleCommand;
if (cmd != null)
    cmd.FetchSize *= 50; // pull ~50x more data per network round trip
and with this my 246s time has come down to 60-80s.
I also found that NHibernate has this to say about FetchSize:
// H2.1 handles FetchSize here - not ported
Again, thanks for the hint
Richard
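For anyone wanting to reproduce this, the override lives most naturally in a custom driver subclass - roughly like the sketch below. The class name and the multiplier are illustrative, and it assumes the NHibernate 2.x-era IDbCommand-based driver API that was current when this thread was written:

```csharp
using System.Data;
using NHibernate.Driver;
using Oracle.DataAccess.Client;

// Sketch: enlarge ODP.NET's fetch buffer so each network round trip
// carries more rows, reducing total latency for large result sets.
public class LargeFetchOracleDriver : OracleDataClientDriver
{
    protected override void OnBeforePrepare(IDbCommand command)
    {
        base.OnBeforePrepare(command);

        var oracleCommand = command as OracleCommand;
        if (oracleCommand != null)
            oracleCommand.FetchSize *= 50; // 50x chosen by experiment above
    }
}
```

NHibernate would then be pointed at this driver via the connection.driver_class configuration property.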
If the information is cached it's stale - after all, that's what a
cache is. If you can serve stale data to the user, why not move the
processing completely offline so a user never has to wait for the
full processing? The processed data can be stored so all the UI needs
to do is a 'select * from' to display it - no aggregation, mapping,
transforming, etc. before displaying, because it was all done
offline. The service could run this process every 60 minutes to keep
in line with the cache lifetime. With the results in a persistent
store, all ready to go, there is no need for in-memory caching.
If this approach seems too foreign/abstract/complex then you don't
have to solve it this way. It's simply another option.
There is no UI. This is a service. A service in which all 61,000 nodes
in a tree must be visited in milliseconds.
I totally agree an architecture should be constantly re-evaluated both
in response to changing technologies, but also to new domain insights.
> If the information is cached it's stale. After all that's what a cache
> is. If you can serve stale data to the user, why not move the
> processing completely off line so a user never has to wait for the
> full processing?
Not if the client expects the server to respond in milliseconds with a
tailored response.
> the processed data can be stored so all the UI needs
> to do is 'select * from' display, no aggregation, mapping,
> transforming, etc. before displaying. It was done off line.
It sounds like you're suggesting a true CQRS architecture. In my case,
the client sends a complex set of inputs to the server. The server
will use the data in the tree along with the inputs to create a
response. This can't be done ahead of time as the possible number of
outputs is close to infinity.
> If this approach seems too foreign/abstract/complex then you don't
> have to solve it this way. It's simply another option.
Thanks for trying to put a different slant on things.
Richard
You could try flagging these entities as mutable="false" and using a
stateless session; that should resolve your NH bottleneck (or at
least give you the maximum speed for creating entities).
This could gain you some extra ms.
With no information on your data structure, it is difficult to tell
whether you also have a db bottleneck.
The batch size could also be tuned.
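A minimal sketch of that combination, assuming an `Entity` class (its mapping would carry mutable="false") and an already-built ISessionFactory - the names are illustrative, not from the thread:

```csharp
using System.Collections.Generic;
using NHibernate;

// Sketch: bulk-load read-only entities through a stateless session.
// IStatelessSession bypasses the first-level cache, dirty checking
// and event listeners, so hydrating 61,000 rows is cheaper than in
// a regular ISession.
public static class TreeLoader
{
    public static IList<Entity> LoadAll(ISessionFactory sessionFactory)
    {
        using (IStatelessSession session =
                   sessionFactory.OpenStatelessSession())
        {
            return session.CreateQuery("from Entity e").List<Entity>();
        }
    }
}
```

The returned list can then be transformed into the immutable in-memory tree described earlier, and the session discarded.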
Fred.