int vs varchar recordset retrieval performance differences

Peter

unread,

Oct 12, 2009, 1:16:01 PM10/12/09

to

Hello.

We are creating a new SS 2005 database from 3 ragtag ss2000 databases. Some
of the tables used between the 3 are almost identical, but not exact. Our
goal is to create the fastest performance table possible. For example, a
Customer table exists in all 3 but each has a different "ID" field, custnum,
customernumber and customernum, all int data types. We want to name the new
database field customerID and give it a varchar data type, but only if
there's not a performance hit by creating the "ID" field as a varchar.

Ignoring indexing, is there a performance hit by having the "linkable" field
a varchar instead of an int?

Russell Fields

unread,

Oct 12, 2009, 2:17:44 PM10/12/09

to

Peter,

Yes, there is some difference because of I/O. A varchar column will take
more space than an integer, requiring more I/O to read and write data.
(Therefore, it does involve indexes, too.)

Namely, the longer the entry in the index, the fewer the number of rows kept
in an index page, the more pages needed for the index, and the longer it
takes to do the I/O (physical or virtual) across the additional pages. (Or,
to reverse the equation, the smaller the column used for the index, the
faster the indexes can be traversed.)

Now, how much difference it will make to you depends on the size of the
database, the complexity of the joins, and so forth. If there are just a
few thousand rows the measurable impact will be smaller (maybe not even
noticable) than if there are millions of rows.

If the customerID is intended to be strictly an integer value, then an
integer will perform better and it cannot be polluted by non-integer values.
But if the customerID is more complex and may contain alphabetics and
numerics, then you have to use varchar.

Definitely just for what it is worth,
RLF

"Peter" <Pe...@discussions.microsoft.com> wrote in message
news:4391FA07-03E7-47DD...@microsoft.com...

Peter

unread,

Oct 12, 2009, 3:11:01 PM10/12/09

to

Thanks Russell!

Our Customer table has only about 10k rows, but we will want to link to our
Sales table which is about 3M rows. (again we would want to have a
varchar(8) CustomerID in the "new" Sales table as well.

I googled "SQL Server data type best practices" and "int vs varchar
performance difference" with sporadic almost-hits at best. Is there any MSDN
link or Books Online subject that I can use as a reference in my research?

Pete

Russell Fields

unread,

Oct 12, 2009, 4:20:11 PM10/12/09

to

Peter,

You will find a lot of people expressing the opinion (correct) that INT is
better than VARCHAR for performance, but your mileage may vary. I don't
have a good link for you, but in a similar (but not identical) vein, here is
a post from Kimberley Tripp that may help.
http://www.sqlskills.com/BLOGS/KIMBERLY/post/GUIDs-as-PRIMARY-KEYs-andor-the-clustering-key.aspx

If you read her blog and search indexes you will get a lot of good advice.

Assuming 3,000,000 rows times INT (4 bytes) the leaf index entries take up
12 MB
Assuming 3,000,000 rows times VARCHAR with an average of 5 bytes, plus 2
bytes for the length, the leaf entries take up 21 MB.

Provided I did not mess up, the VARCHAR will be approaching twice the size
of the INT.

RLF

"Peter" <Pe...@discussions.microsoft.com> wrote in message

news:46047380-630F-4FE7...@microsoft.com...

Erland Sommarskog

unread,

Oct 12, 2009, 5:39:16 PM10/12/09

to

In additions to Russel's posts, comparing integers is faster than
comparing varchar(8), since character data is more complex to compare.
You can cut some of this time, by choosing a binary collation for the
id. This also has the advantage that an ID cannot appears as "alfki"
in one place and "ALFKI" in another.

There is a second thing to be aware of. A common problem with varchar
data is that many modern client API tend to favour Unicode types. Java
is the prime examples, but I've seen it happen with .Net too. This
can lead to code like:

WHERE indexedvarcharcol = @nvarcharvalue

This results in an implicit conversion of the varchar column to nvarchar.
If you have a Windows collation, the index is still used, but in a more
roundabout way. Expect an overhead of 200-300%. But if you have an
SQL collation, and this is the default when you use install SQL Server
with US English as you system locale, the index is not usable, and this
can lead to severe performance issues.

My recommendation is that you go with int, unless you have a compelling
business reason to use varchar.

--
Erland Sommarskog, SQL Server MVP, esq...@sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx