Compare two tables of equal structure?

Roland Zumkeller

unread,

Jan 31, 2000, 3:00:00 AM1/31/00

to

I got two tables of equal structure (same field names, same datatypes).
The second table is in fact the same table as the first one, but some of
the content may have changed.
Now I want to get all records that have changed, i.e. which differ in at
least one field. Is there a way to accomplish this without writing
100-line long SQL-Query with a WHERE clause like this?:

...
WHERE
(old.a <> new.a) OR
(old.b <> new.b) OR
(old.c <> new.c) OR
(old.d <> new.d) OR
...

I'm grateful for any hint.

DaveSatz

unread,

Jan 31, 2000, 3:00:00 AM1/31/00

to

This code we use for audit tables should get you started:
/*
'Description: this will generate code to do a column by column comparison
against a table
*/
SET NOCOUNT ON
DECLARE @x sysname

SELECT @x = 'training_invoice_det_breakout' -- invoice_detail_breakout' --
invoice_detail' -- header' -- training_invoice_header' -- detail' --
training_invoice_det_breakout' -- invoice_detail_breakout' --
invoice_detail' -- = 'invoice_header'

CREATE TABLE #x
(SQL varchar(255)
, colid int )

INSERT #x
select 'OR IsNull( i.' + name + ', "12/31/9999" ) <> IsNull( d.' + name +
', "12/31/9999" )'
, colid
from syscolumns
where id = object_id(@x)
and status = 8
AND name <> 'timestamp'
AND usertype = 12
union
select 'OR IsNull( i.' + name + ', -9999) <> IsNull( d.' + name +
', -9999)'
, colid
from syscolumns
where id = object_id(@x)
and status = 8
AND name <> 'timestamp'
and usertype = 8
union
select 'OR IsNull( i.' + name + ', "" ) <> IsNull( d.' + name + ', "" )'
, colid
from syscolumns
where id = object_id(@x)
and status = 8
AND name <> 'timestamp'
and scale IS NULL
AND usertype NOT IN ( 12, 8 )
union
select 'OR IsNull( i.' + name + ', 0) <> IsNull( d.' + name + ', 0)'
, colid
from syscolumns
where id = object_id(@x)
and status = 8
AND name <> 'timestamp'
and prec = 3
AND scale = 0
AND usertype NOT IN ( 12, 8 )
union
select 'OR IsNull( i.' + name + ', -9999) <> IsNull( d.' + name +
', -9999)'
, colid
from syscolumns
where id = object_id(@x)
and status = 8
AND name <> 'timestamp'
and scale IS NOT NULL
and prec <> 3
AND usertype NOT IN ( 12, 8 )
union
select 'OR i.' + name + ' <> d.' + name
, colid
from syscolumns
where id = object_id(@x)
and status <> 8
AND name <> 'timestamp'
ORDER BY 2

SELECT SQL
FROM #x

DROP TABLE #X
SET NOCOUNT OFF
GO

--
Thanks,
David Satz
Principal Software Engineer
Hyperion Solutions
(using VB6 SP3/MTS/SQL Server 6.5 SP5a)
(Please respond to the newsgroup.)

Roland Zumkeller <Zumk...@cu-muc.de> wrote in message
news:3895931E...@cu-muc.de...

joe_...@my-deja.com

unread,

Jan 31, 2000, 3:00:00 AM1/31/00

to

>> I got two tables of equal structure (same field names, same
datatypes). The second table is in fact the same table as the first
one, but some of the content may have changed. Now I want to get all
records that have changed, i.e. which differ in at least one field. Is
there a way to accomplish this without writing 100-line long SQL-Query
with a WHERE clause <<

I am not sure if this will work, but here goes:

SELECT 'old: ', OldTable.*
FROM (SELECT *
FROM OldTable
UNION ALL
SELECT 'New: ', NewTable.*
FROM NewTable) AS EveryThing
GROUP BY <all the columns>
HAVING COUNT(*) = 1;

--CELKO--

Sent via Deja.com http://www.deja.com/
Before you buy.

Alan Enderby

unread,

Jan 31, 2000, 3:00:00 AM1/31/00

to

You can use the getchecksum function (see BOL for details).
It is used in validating the synchonization of replicated databases.

Here is an example that will compare 2 tables of same definition.
As long as you have a unique column (primary key) it will highlight
different rows,

We use this technique for comparing all out test, acceptace and live
versions against one another.

We can comapre the data in two 30Gig databases in under 15mins

select TOP 10 DB="Database1",m.* ," "=char(10) + "Database2" ,
s.*, " "=char(10)
from database1..tablename m
full outer join database2..tablename s
on m.pkeycolumnname = s.pkeycolumnname
where isnull(getchecksum(m.pkeycolumnname,1),0) <> isnull(getchecksum
(s.pkeycolumnname,1),0)

Where
- database1 and database 2 are the databases you are comparing accross
- tablename is the table you are comparing
- pkeycolumnname is the name of the column(s) comprising the primary
key or some column containg a unique value.
- the top 10 is just to restrict the errors coming back.

In article <3895931E...@cu-muc.de>,

Roland Zumkeller <Zumk...@cu-muc.de> wrote:
> I got two tables of equal structure (same field names, same
datatypes).
> The second table is in fact the same table as the first one, but some
of
> the content may have changed.
> Now I want to get all records that have changed, i.e. which differ in
at
> least one field. Is there a way to accomplish this without writing

> 100-line long SQL-Query with a WHERE clause like this?:
>
> ...
> WHERE
> (old.a <> new.a) OR
> (old.b <> new.b) OR
> (old.c <> new.c) OR
> (old.d <> new.d) OR
> ...
>
> I'm grateful for any hint.
>
>

Roland Zumkeller

unread,

Jan 31, 2000, 3:00:00 AM1/31/00

to

The idea sounds good. But I couldn't find any documentation on the
getchecksum function, not in MSDN, even not on MS's website. How comes
you know it?
In case this is a secret you found out, can you please tell me what the
two parameters for getchecksum mean?

Thanks a lot!

Roland

Alan Enderby

unread,

Jan 31, 2000, 3:00:00 AM1/31/00

to

Do a search for checksum in BOL
Should get about 15 hits

Description of function is found under "sp_table_validation"
And example similar to my code under "Troubleshooting Replication"

By the way its a SQL 7 function not ^.5

Heres the definition.

Checksum computes a 32-bit cyclic redundancy check (CRC) on the entire
row image on the page. It does not selectively check columns and cannot
operate on a view or vertical partition of the table. Also, the
checksum skips the contents of text and image columns (by design).

When doing a checksum, the structure of the table must be identical
between the two servers; that is, the tables must have the same columns
created and existing in the same order, same data types and lengths,
and same NULL/NOT NULL conditions. For example, if the Publisher did a
CREATE TABLE, then an ALTER TABLE to add columns, but the script
applied at the Publisher is a simple CREATE table, the structure is NOT
the same. If you are not certain that the structure of the two tables
is identical, look at syscolumns and confirm that the offset in each
table is the same.

Floating point values are likely to generate checksum differences if
character-mode bcp was used, which is the case if the publication has
heterogeneous subscribers. These are due to minor and unavoidable
differences in precision when doing conversion to and from character
mode.

In article <3895C2D0...@cu-muc.de>,

Alan Enderby

unread,

Jan 31, 2000, 3:00:00 AM1/31/00

to

And in answer to what are the 2 parameters for getchecksum

I've no idea

I just used the the example in BOL , looked at SP_TABLE_VALIDATION and
did some experimenting.

Here is the example

SELECT authors.au_id AS Local_PK, remote.au_id AS Remote_PK,

(getchecksum(authors.au_id,1),0) AS Local_chksum,remchksum

FROM authors

FULL OUTER JOIN

OPENROWSET('SQLOLEDB','SUBSRV';'sa';'sa_password',"SELECT au_id,
(getchecksum(NULL,1),0) AS remchksum FROM pubs.dbo.authors" ) AS remote

ON (authors.au_id=remote.au_id)

WHERE

-- Find rows with same primary key but different checksums

(getchecksum(authors.au_id,1),0) <> remchksum

OR

-- Find rows which do not exist on one side or the other

authors.au_id IS NULL OR remote.au_id IS NULL