SV: [nhusers] SQL Server Identity and Nhibernate.... what is the latest story?

339 views
Skip to first unread message

Roger Kratz

unread,
Nov 27, 2012, 3:08:44 PM11/27/12
to nhu...@googlegroups.com
The link you refer to (primarly) says it's a bad choice choosing a not sequential guid identifier strategy. Are you aware of guid.comb to avoid (most of the) fragmentation?

Probably noone can't give you a perfect answer "what's the best" though. Do some measurements on your important use cases. However - I doubt that using identity as primary key is the best choise (unless you have very few writes making the extra roundtrips negliable). If index size is important you should probably have a look at hilo strategy.


________________________________
Från: nhu...@googlegroups.com [nhu...@googlegroups.com] för Sam Jonesom [rafael.z...@gmail.com]
Skickat: den 27 november 2012 20:13
Till: nhu...@googlegroups.com
Ämne: [nhusers] SQL Server Identity and Nhibernate.... what is the latest story?

Hello All!

I do not mean to beat a dead horse, but the blog posts and articles I have read on this are 3 or more years old, and refer to NH 2.x.

Why I am asking:
We have a large SQL DB. We are finding that the use of GUIDs as primary keys and clustered indexes is creating LARGE indexes. Very large (many GB). Our research has clearly shown that using an int / identity as a primary key will dramatically cut our db size (like by %50), and simultaneously give our SQL server a huge perf boost (like over %100).

(You can see one write up on this topic here: http://www.sqlskills.com/BLOGS/KIMBERLY/post/GUIDs-as-PRIMARY-KEYs-andor-the-clustering-key.aspx )

So, I have started an internal process to evaluate migrating our .NET-4 / NH-3.2 application to use SQL Server Identity columns as primary keys.

Please advise as to any write ups, success/horror stories, etc.

Thank you very much!




--
You received this message because you are subscribed to the Google Groups "nhusers" group.
To view this discussion on the web visit https://groups.google.com/d/msg/nhusers/-/K1K_5Ylri4YJ.
To post to this group, send email to nhu...@googlegroups.com.
To unsubscribe from this group, send email to nhusers+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/nhusers?hl=en.

costa

unread,
Nov 28, 2012, 1:31:45 PM11/28/12
to nhu...@googlegroups.com
I personally don't understand why people try to use guid PKs everywhere and I hate them with a passion. They've probably seen Microsoft using them all over the place and have thought they should use them in all tables because it's smart. It's a brain fart.

If you need surrogate keys use int identity field (or the appropriate type, if you have billions of records, even though in that case I would probably question your database of choice hint, hint - that's a discussion for another time and only over beer). If you have varchar PKs use meaningful codes.

I wouldn't think twice about getting rid of them. They might be appropriate in some cases (maybe you have some data that needs to interface with a external system?)  but don't use them in all of your tables!



costa

unread,
Nov 28, 2012, 1:54:54 PM11/28/12
to nhu...@googlegroups.com
One more thing, your database designer should always use the simplest type for the surrogate primary keys. If (s)he decides to use PK Guids instead (s)he has to justify the use of them. You should always question the use of the GUIDs and add them only if there is a real benefit.


Gunnar Liljas

unread,
Nov 28, 2012, 2:55:01 PM11/28/12
to nhu...@googlegroups.com
It's very easy to justify the use of something that doesn't require
database roundtrips in order to retreive a new PK value.

2012/11/28 costa <costa...@gmail.com>:
> --
> You received this message because you are subscribed to the Google Groups
> "nhusers" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/nhusers/-/HojLX00brgsJ.

costa

unread,
Nov 28, 2012, 3:39:28 PM11/28/12
to nhu...@googlegroups.com
@Gunnar Liljas: If you are joking ignore my message. If you are not joking this is the query that gets executed by NH when a new record is inserted in a table with an identity(1, 1) field:

    INSERT
    INTO
        [
        dbo].[DataType] (
            [Name]
        )
    VALUES
        (@p0);
    select
        SCOPE_IDENTITY();

Note the select SCOPE_IDENTITY(); statement that's returning the new id in the same database call. Yes, it's an additional statement to get it but it's embedded in the same call and it should be very fast to get that value.

As Sam pointed out, there is a price to pay for using GUIDS. I actually have done performance tests because I had an idiot at work that started to use them everywhere, and the performance starts to degrade pretty soon. They are also ugly when you write and run queries.




Oskar Berggren

unread,
Nov 29, 2012, 4:10:13 AM11/29/12
to nhu...@googlegroups.com
Using 'identity' or similar will still disable INSERT batching. Using
hilo or pooled will give you integer identifiers and allow batching.

/Oskar


2012/11/28 costa <costa...@gmail.com>:
> --
> You received this message because you are subscribed to the Google Groups
> "nhusers" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/nhusers/-/FtEb2VQpsYYJ.

Oskar Berggren

unread,
Nov 29, 2012, 5:02:27 AM11/29/12
to nhu...@googlegroups.com
2012/11/28 costa <costa...@gmail.com>:
> I wouldn't think twice about getting rid of them. They might be appropriate
> in some cases (maybe you have some data that needs to interface with a
> external system?) but don't use them in all of your tables!

"interface to external system" should not be underestimated. (Don't
most systems nowadays? Or often will do at a later stage?) Anyway, I
find that everything becomes so much easier when "any" involved
component can easily create a new identifier and reference it even
before some particular storage component have blessed the object. GUID
is one possible solution.

/Oskar

Pete Appleton

unread,
Nov 29, 2012, 5:20:15 AM11/29/12
to nhu...@googlegroups.com
Ooh, this has all the makings of a religious war <mg>.

Just to weigh in, I think that the 'right' strategy is use-case
dependent. They all have their well-known pro's and con's, and the
appropriate balance depends upon your application. As an example, one
of our systems uses identity columns exclusively because making data
manipulation easy for a DBA is very important in that application;
another uses a mix of hi/lo integers for efficiency with GUID's in a
couple of tables that need 'guaranteed' uniqueness. It really depends
on your particular circumstance, and IMHO all these blanket statements
simply show a narrow-minded approach to problem solving.

That said, I personally loathe GUID's because I think that they're a
'lazy' and fragile solution to the guaranteed-unique-id problem (and
they're only unique if generated in the "approved" manner - which is not
what happens with most of them). I'd prefer some form of namespacing
construct (think SNMP OID's, or DNS if you're not old enough) for things
which _must_ be unique, but in the real world GUID's are the best we've
got so I use them when appropriate.

Gunnar - sorry, looks like I've brought out my bucket now!!

/Pete

-----Original Message-----
From: nhu...@googlegroups.com [mailto:nhu...@googlegroups.com] On
Behalf Of Oskar Berggren
Sent: 29 November 2012 10:02
To: nhu...@googlegroups.com
Subject: Re: [nhusers] Re: SQL Server Identity and Nhibernate.... what
is the latest story?

--
You received this message because you are subscribed to the Google
Groups "nhusers" group.
To post to this group, send email to nhu...@googlegroups.com.
To unsubscribe from this group, send email to
nhusers+u...@googlegroups.com.
For more options, visit this group at
http://groups.google.com/group/nhusers?hl=en.


!DSPAM:1,50b732d05501085516068!


Elio Batista

unread,
Nov 29, 2012, 5:35:36 AM11/29/12
to nhu...@googlegroups.com
Are you awared of CpBT? When you have such requirement in your application Identity PK are the last choise. Generally if i cannot use guid.comb i then use HILO instead, but never identity int. 

Gunnar Liljas

unread,
Nov 29, 2012, 10:13:10 AM11/29/12
to nhu...@googlegroups.com
" IMHO all these blanket statements simply show a narrow-minded
approach to problem solving."


That sums it up perfectly. Know how the keys work, in NHibernate and
on the DBMS, and make an informed choice.

/G

2012/11/29 eag <enrique.alb...@gmail.com>:
> Costa, Oskar is correct, using Identity columns does not perform, you are
> better off using HiLo instead. You can easily prove it, if you have a long
> transaction inserting many records in the table with an identity column,
> even the INSERT commands are delayed to the end of the process, the Save
> method in NHibernate returns the Id, that round trip is a killer. I have
> seen processes were 10k records had to be created that were 10 times slower
> using Identity instead of HiLo. Any communication between the AppServer and
> Database server on the wire is expensive.
>
> As Oskar also mentions, batching can also improve performance, again,
> because you might reduce the number of loops between AppServer and Database.
> --
> You received this message because you are subscribed to the Google Groups
> "nhusers" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/nhusers/-/HoICwo6my_8J.

costa

unread,
Nov 29, 2012, 3:09:16 PM11/29/12
to nhu...@googlegroups.com
@PeteaA & Gunnar: Agreed.

You are right, each problem is different and it needs a specific solution and guids might be good in some scenarios and they might be bad in others. As we say, the devil is in the details. Perhaps I am bit more sensitive about GUIDs because of my painful past experience working with people that used them just because. I.e. no reasoning, no thought. So, sorry, I didn't want this to be another religious war, even though sometimes is good to argue philosophically while drinking beer ;-)

I am just curious about one thing, people here seem to come from an ORM background. I am a bit surprised to see the posts against the use of identity PKs (see @Elio Batista's post and I also found other articles) for performance reasons. In my case, if performance was an issue I would probably not use nhibernate at all because I like to be in full control of the database access. If you have a table whose PK is updated by, let's say a HiLo algorithm, what happens if you have to have another process that that needs to insert data in the same table, and this process doesn't use NHibernate or even .Net? You really need to know how it works in NHibernate...

Oskar Berggren

unread,
Nov 29, 2012, 3:15:37 PM11/29/12
to nhu...@googlegroups.com
2012/11/29 costa <costa...@gmail.com>:
> performance reasons. In my case, if performance was an issue I would
> probably not use nhibernate at all because I like to be in full control of

I suppose that should be understood as "ok, we use nhibernate for
SomeReason - how do we get best performance from i?" But as someone
mentioned earlier, it is mostly relevant if you insert a reasonable
large amount of data in the same unit.


> the database access. If you have a table whose PK is updated by, let's say a
> HiLo algorithm, what happens if you have to have another process that that
> needs to insert data in the same table, and this process doesn't use
> NHibernate or even .Net? You really need to know how it works in
> NHibernate...

That is a drawback. HiLo is documented and not too difficult, but in
my opinion the "pooled_low" generator should be the primary option,
since then you don't really have to implement any "mathematics" - you
just reserve a block of numbers by incrementing a counter by the
amount of numbers you need.

/Oskar

costa

unread,
Nov 29, 2012, 3:26:56 PM11/29/12
to nhu...@googlegroups.com
@Oskar:

Do you have more information about the pooled_low generator?

Also, what happens if you have server generated version columns? Do they lead to extra-trips to the server?

Thanks

Oskar Berggren

unread,
Nov 30, 2012, 6:34:13 AM11/30/12
to nhu...@googlegroups.com
2012/11/29 costa <costa...@gmail.com>:
> @Oskar:
>
> Do you have more information about the pooled_low generator?

It's part of the enhanced id generators. Not mentioned in the
nhibernate docs unfortunately, but they are supported since NH 3.3.

The documentation from Hibernate, section 5.1.2.3 and 5.1.2.3.1 applies:
http://docs.jboss.org/hibernate/orm/4.1/manual/en-US/html/ch05.html#mapping-declaration-id-enhanced

Basically you can choose between a table based generator or a sequence
like generator. The latter will use
sequences if supported by the DB, otherwise emulate sequence semantics
using a table.

In addition to the optimizers mentioned in the documentation, there is
also the "pooled-lo" optimizer.

The pooled optimizers are IMHO nicer than HiLo since the relationship
between used id values and the current value stored in the generator
table/sequence is more clear. Also, it becomes easier to use different
increment sizes over time. If you use the enhanced TableGenerator and
pooled-lo (not pooled), you can even let different concurrent clients
use different increment sizes. And the DBA can use increment-by-1 to
reserve a single value to use for some table.

pooled and pooled-lo will grab the "nextvalue" from the sequence or
table. The sequence (automatically) or table (by UPDATE from NH) will
then be updated so that the next nextvalue is the read value +
increment size.
IIRC, pooled will now use values in the range
(readvalue-incrementsize) up to (readvalue-1) (so all clients must
agree on the increment size).
pooled-lo will instead use the range readvalue up to (readvalue +
incrementsize -1), which allows for different clients to use different
increments.

/Oskar


> Also, what happens if you have server generated version columns? Do they
> lead to extra-trips to the server?
>
> Thanks
>
> --
> You received this message because you are subscribed to the Google Groups
> "nhusers" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/nhusers/-/k0AqXOOg70QJ.
Reply all
Reply to author
Forward
0 new messages