Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

How To Find Duplicate Entries Using SQL

1,385 views
Skip to first unread message

aleatory

unread,
Mar 6, 2002, 5:50:31 AM3/6/02
to
Hello all,

My question could be simple, but I don't know how to approach
the task I've just asked.

I'm asked to write an SQL report which is for finding duplicate
serial numbers. All the serial numbers are supposed to be unique,
but for some reason, there are duplicate serial numbers in the
database.

I first thought the following code was okay since I was looking
for not distinct serial numbers:

| SELECT NOT DISTINCT serial_no
| FROM serial

This does not run. I get a syntax error message.

If there is a keyword for finding identical items, I would appreciate
your advice.

Thanks in advance!

alea

Allen

unread,
Mar 6, 2002, 9:39:24 AM3/6/02
to
select serial_no from serial a where 1 <
(select count(*) from serial b where a.serial_no = b.serial_no) ;

Make sure you have an index on the serial_no column...

Allen

Paul Watson

unread,
Mar 6, 2002, 9:45:05 AM3/6/02
to
select serial_no, count(*)
from eric
group by 1
having count(*) != 1;

--
Paul Watson #
Oninit Ltd # Growing old is mandatory
Tel: +44 1436 672201 # Growing up is optional
Fax: +44 1436 678693 #
www.oninit.com #

Bill Dare

unread,
Mar 6, 2002, 10:02:25 AM3/6/02
to

This message is in MIME format. Since your mail reader does not understand
this format, some or all of this message may not be legible.

------_=_NextPart_001_01C1C51F.ED52D010
Content-Type: text/plain

> -----Original Message-----
> From: alea...@hotmail.com [SMTP:alea...@hotmail.com]
> Sent: Wednesday, March 06, 2002 5:51 AM
> To: inform...@iiug.org
> Subject: How To Find Duplicate Entries Using SQL
>
> Hello all,
>
> My question could be simple, but I don't know how to approach
> the task I've just asked.
>
> I'm asked to write an SQL report which is for finding duplicate
> serial numbers. All the serial numbers are supposed to be unique,
> but for some reason, there are duplicate serial numbers in the
> database.
>
> I first thought the following code was okay since I was looking
> for not distinct serial numbers:
>
> | SELECT NOT DISTINCT serial_no
> | FROM serial
>
> This does not run. I get a syntax error message.
>
> If there is a keyword for finding identical items, I would appreciate
> your advice.
>
> Thanks in advance!
>
> alea

------_=_NextPart_001_01C1C51F.ED52D010
Content-Type: text/html
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV=3D"Content-Type" CONTENT=3D"text/html; =
charset=3Dus-ascii">
<META NAME=3D"Generator" CONTENT=3D"MS Exchange Server version =
5.5.2653.12">
<TITLE>RE: How To Find Duplicate Entries Using SQL</TITLE>
</HEAD>
<BODY>
<BR>
<BR>
<UL>
<P><FONT SIZE=3D1 FACE=3D"Arial">-----Original Message-----</FONT>
<BR><B><FONT SIZE=3D1 FACE=3D"Arial">From:&nbsp;&nbsp;</FONT></B> <FONT =
SIZE=3D1 FACE=3D"Arial">alea...@hotmail.com =
[SMTP:alea...@hotmail.com]</FONT>
<BR><B><FONT SIZE=3D1 FACE=3D"Arial">Sent:&nbsp;&nbsp;</FONT></B> <FONT =
SIZE=3D1 FACE=3D"Arial">Wednesday, March 06, 2002 5:51 AM</FONT>
<BR><B><FONT SIZE=3D1 =
FACE=3D"Arial">To:&nbsp;&nbsp;&nbsp;&nbsp;</FONT></B> <FONT SIZE=3D1 =
FACE=3D"Arial">inform...@iiug.org</FONT>
<BR><B><FONT SIZE=3D1 =
FACE=3D"Arial">Subject:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</FONT>=
</B> <FONT SIZE=3D1 FACE=3D"Arial">How To Find Duplicate Entries Using =
SQL</FONT>
</P>

<P><FONT SIZE=3D2 FACE=3D"Arial">Hello all,</FONT>
</P>

<P><FONT SIZE=3D2 FACE=3D"Arial">My question could be simple, but I =
don't know how to approach</FONT>
<BR><FONT SIZE=3D2 FACE=3D"Arial">the task I've just asked.</FONT>
</P>

<P><FONT SIZE=3D2 FACE=3D"Arial">I'm asked to write an SQL report which =
is for finding duplicate</FONT>
<BR><FONT SIZE=3D2 FACE=3D"Arial">serial numbers. All the serial =
numbers are supposed to be unique,</FONT>
<BR><FONT SIZE=3D2 FACE=3D"Arial">but for some reason, there are =
duplicate serial numbers in the</FONT>
<BR><FONT SIZE=3D2 FACE=3D"Arial">database.</FONT>
</P>

<P><FONT SIZE=3D2 FACE=3D"Arial">I first thought the following code was =
okay since I was looking</FONT>
<BR><FONT SIZE=3D2 FACE=3D"Arial">for not distinct serial =
numbers:</FONT>
</P>

<P><FONT SIZE=3D2 FACE=3D"Arial">&nbsp;&nbsp;&nbsp;&nbsp; | SELECT NOT =
DISTINCT serial_no</FONT>
<BR><FONT SIZE=3D2 FACE=3D"Arial">&nbsp;&nbsp;&nbsp;&nbsp; =
|&nbsp;&nbsp; FROM serial</FONT>
</P>

<P><FONT SIZE=3D2 FACE=3D"Arial">This does not run. I get a syntax =
error message.</FONT>
</P>

<P><FONT SIZE=3D2 FACE=3D"Arial">If there is a keyword for finding =
identical items, I would appreciate</FONT>
<BR><FONT SIZE=3D2 FACE=3D"Arial">your advice.</FONT>
</P>

<P><FONT SIZE=3D2 FACE=3D"Arial">Thanks in advance!</FONT>
</P>

<P><FONT SIZE=3D2 FACE=3D"Arial">alea</FONT>
</P>
</UL>
</BODY>
</HTML>
------_=_NextPart_001_01C1C51F.ED52D010--

Bill Dare

unread,
Mar 6, 2002, 10:04:35 AM3/6/02
to


> -----Original Message-----
> From: alea...@hotmail.com [SMTP:alea...@hotmail.com]
> Sent: Wednesday, March 06, 2002 5:51 AM
> To: inform...@iiug.org
> Subject: How To Find Duplicate Entries Using SQL
>
> Hello all,
>
> My question could be simple, but I don't know how to approach
> the task I've just asked.
>
> I'm asked to write an SQL report which is for finding duplicate
> serial numbers. All the serial numbers are supposed to be unique,
> but for some reason, there are duplicate serial numbers in the
> database.
>
> I first thought the following code was okay since I was looking
> for not distinct serial numbers:
>
> | SELECT NOT DISTINCT serial_no
> | FROM serial
>

Try this:

select unique COLUMN_NAME,count(*) from TABLE_NAME group by 1 order
by 2 desc

Will give you each unique value and the number of duplicates for
each value.

Regards,
Bill

Dirk Moolman

unread,
Mar 6, 2002, 10:08:13 AM3/6/02
to

Are you talking about the "serial" data type, or do you have your own user
defined column called serial number ? The serial data type should never
duplicate, and is handled internally by Informix.
Or are you perhaps talking about the rowid column ?


Anyway, one simple select will be

select serial_no, count(*)
from tablename
group by 1
having count(*) > 1


Dirk

-----Original Message-----
From: alea...@hotmail.com [mailto:alea...@hotmail.com]
Sent: 06 March 2002 12:51
To: inform...@iiug.org
Subject: How To Find Duplicate Entries Using SQL


Hello all,

My question could be simple, but I don't know how to approach the task I've
just asked.

I'm asked to write an SQL report which is for finding duplicate serial
numbers. All the serial numbers are supposed to be unique, but for some
reason, there are duplicate serial numbers in the database.

I first thought the following code was okay since I was looking for not
distinct serial numbers:

| SELECT NOT DISTINCT serial_no
| FROM serial

This does not run. I get a syntax error message.

Simmons, Keith

unread,
Mar 6, 2002, 10:39:22 AM3/6/02
to

Alea

Try the following
SELECT serial_no, count(*)
FROM serial
GROUP BY serial_no
HAVING count(*) > 1

Keith Simmons
Banner Business Supplies Limited
Informix Certified Professional


-> -----Original Message-----
-> From: alea...@hotmail.com [mailto:alea...@hotmail.com]
-> Sent: Wednesday, March 06, 2002 10:51 AM
-> To: inform...@iiug.org
-> Subject: How To Find Duplicate Entries Using SQL
->
->
-> Hello all,
->
-> My question could be simple, but I don't know how to approach
-> the task I've just asked.
->
-> I'm asked to write an SQL report which is for finding duplicate
-> serial numbers. All the serial numbers are supposed to be unique,
-> but for some reason, there are duplicate serial numbers in the
-> database.
->
-> I first thought the following code was okay since I was looking
-> for not distinct serial numbers:
->
-> | SELECT NOT DISTINCT serial_no
-> | FROM serial
->
-> This does not run. I get a syntax error message.
->
-> If there is a keyword for finding identical items, I would appreciate
-> your advice.
->
-> Thanks in advance!
->
-> alea
->

**********************************************************************************
This message may contain information which is confidential and subject to
legal privilege. If you are not the intended recipient, you may not peruse,
use, disseminate, distribute or copy this message. If you have received this
message in error, please notify the sender immediately by email, facsimile
or telephone and return and/or destroy all copies of the message.
This footnote also confirms that this email message has been swept for
the presence of computer viruses, however we cannot guarantee that this
message is free from such problems.
**********************************************************************************

Bernstein, Rick

unread,
Mar 6, 2002, 10:44:40 AM3/6/02
to

There is not a keyword for finding duplicates.
Instead try:
SELECT count(*) Count, serial_no
FROM serial
GROUP BY 2
HAVING COUNT(*) > 1
ORDER BY 2

-----Original Message-----
From: alea...@hotmail.com [mailto:alea...@hotmail.com]

Sent: Wednesday, March 06, 2002 02:51
To: inform...@iiug.org
Subject: How To Find Duplicate Entries Using SQL


Hello all,

My question could be simple, but I don't know how to approach

the task I've just asked.

I'm asked to write an SQL report which is for finding duplicate


serial numbers. All the serial numbers are supposed to be unique,

but for some reason, there are duplicate serial numbers in the

database.

I first thought the following code was okay since I was looking

for not distinct serial numbers:

| SELECT NOT DISTINCT serial_no
| FROM serial

This does not run. I get a syntax error message.

If there is a keyword for finding identical items, I would appreciate

vivek.c...@st.com

unread,
Mar 6, 2002, 10:57:17 AM3/6/02
to

SELECT serial_no,count(*)

FROM serial
GROUP BY serial_no
HAVING COUNT(*) > 1;

Vivek Chaudhary
STMicroelectronics

Laurie Gustin

unread,
Mar 6, 2002, 11:01:40 AM3/6/02
to

The easiest way I have found is just to join the table to itself and look for
matching values, but they have different rowids.

Following is an example of a query that I run all the time. Hope it helps

select serial_no
from serial_table a, serial_table b
where a.serial_no = b.serial_no
and a.rowid < b.rowid

Laurie Gustin
Department of Public Safety
801-965-4410

>>> aleatory <alea...@hotmail.com> 03/06/02 03:50AM >>>

Colin Marchant Parker

unread,
Mar 6, 2002, 11:43:58 AM3/6/02
to
alea...@hotmail.com (aleatory) wrote in message news:<a68a4ee0.02030...@posting.google.com>...

easy

select unique(serial_no),count(*) from serial
group by serial_no
having count(*) > 1

enjoy

Jack Parker

unread,
Mar 6, 2002, 11:05:46 AM3/6/02
to

Select column_name from table group by column_name having count(*) > 1;

cheers
j.


----- Original Message -----
From: "aleatory" <alea...@hotmail.com>
To: <inform...@iiug.org>
Sent: Wednesday, March 06, 2002 5:50 AM
Subject: How To Find Duplicate Entries Using SQL

Paul Watson

unread,
Mar 6, 2002, 12:39:58 PM3/6/02
to
This is wrong, there is no guarentee a serial is unique unless you
put the unique index on it. When you insert with a zero you will
get the next number, if you hit the 2G limit you will wrap around
to 1 and start again. Assuming that you've delete most of the rows
but for some reason the serial id 1 is still there then you will
get a duplicate.

We have site's that rely on the duplication of serial's, they the
first row, get a new serial and then insert the rest of the data
with the same serial.

--

Leon Pappadopoulos

unread,
Mar 6, 2002, 11:53:04 AM3/6/02
to

Hi Alea

Here's yet another way

select distinct a.rowid, a.serial_no, b.rowid, b.serial_no
from serial a,serial b
where a.serial_no = b.serial_no
and a.rowid != b.rowid -- very important

This way you wil not only get the duplicates but also their rowids
you can then use these to trash the duplicates that you want to trash
remember that one of the results returned will be
the row that you wanna keep.

eg: four duplicate rows:

a.rowid a.serial_no b.rowid b.serial_no
256 100 312 100
257 100 417 100

You will find that at least one of these will be the one you want eg rowid
256
you then need to delete rowid's 312 257 and 417 etc

Be sure to make a backup before you begin deleteing...

Hope this helps

Leon

CPP

unread,
Mar 7, 2002, 9:54:36 AM3/7/02
to


-----Původní zpráva-----
Od: Dirk Moolman
Odesláno: 6. března 2002 16:08
Komu: 'alea...@hotmail.com'; inform...@iiug.org
Předmět: RE: How To Find Duplicate Entries Using SQL

Are you talking about the "serial" data type, or do you have your own user
defined column called serial number ? The serial data type should never
duplicate, and is handled internally by Informix.

It is not true. If you have e.g. table a with columns b serial, c int and you perform e.g.
Insert into a values (1,100) x times, you will have x duplicate values in serial-column b.
Regards,
Vaclav

Or are you perhaps talking about the rowid column ?


Anyway, one simple select will be

select serial_no, count(*)
from tablename
group by 1
having count(*) > 1


Dirk

-----Original Message-----
From: alea...@hotmail.com [mailto:alea...@hotmail.com]
Sent: 06 March 2002 12:51

Andrew Hamm

unread,
Mar 10, 2002, 8:15:14 PM3/10/02
to
Paul Watson wrote in message <3C86546E...@oninit.com>...

>
>We have site's that rely on the duplication of serial's, they the
>first row, get a new serial and then insert the rest of the data
>with the same serial.
>
That's a cute trick we haven't considered..... although i would expect you'd
typically have a master record with the serial, and related detail rows
storing the stuff. Smacks of redundancy as such...

Andrew Hamm

unread,
Mar 10, 2002, 8:33:56 PM3/10/02
to
aleatory wrote in message ...
Many people have mentioned that serials should automagically have a
uniqueness enforced by the engine, but this is not infact true.

If you build the table and add the indexes using isql or dbaccess, their
table editors automatically and silently add a unique index to a serial
column if you define one, but that's a "value added" action on their part,
not something that the engine does all by itself.

A serial type is however enforced to be NOT NULL by the engine.

All the suggestions to use the count(*) and having clauses is the correct
way to *find* all duplicate values of fields (whatever they may be) but the
ugly part is then deleting or correcting the sucky rows.

first, let me restate the suggested SQLS (in my slightly rearranged and
preferred manner, 'cos it's more flexible as you'll maybe be able to work
out for yourself):

select count(*), key1, key2, .....
from dirty_table
group by 2, 3, ....
having count(*) > 1
into temp duplicate_sods with no log;

NOTE: the ...... represent as many fields as you are trying to dup-check,
and also the .... in the group by clause must be filled with 2 ...
number-of-key-fields. Gettit? Good.

Now, you've got the dup sets listed in a temp table. The really ugly part
starts now. If all rows with duplicate keys are truly duplicate, then you
really don't give a damn which of the duplicates stays behind when you
delete all but one of the duplicates. So, you can do this (NOTE: this
depends on having a rowid, so it can't work for fragmented tables that do
not have rowids):

select dt.rowid dt_rowid, dt.key1, dt.key2, ....
from dirty_table dt, duplicate_sods sods
where dt.key1 = sods.key1
and dt.key2 = sods.key2
and .....
into temp getting_closer with no log;

Now you have a selection of all the rowids from the table, still associated
with the keys.

Next step is to get a temp table which contains only the lowest rowid (an
arbitrary choice):

select min(dt_rowid) dt_rowid
from getting_closer
group by key1, key2 ...
into temp really_close_now with no log;

REMOVE those rowids from the previous table:

delete from getting_closer
where dt_rowid in (select rcn.dt_rowid from really_close_now rcn);

Finally, delete the remaining rows from the original table:

delete from dirty_table
where rowid in (select dt_rowid from getting_closer);

This is a purely mechanical procedure to remove duplicates. You should first
check that you can trust the mechanical selection of rows; if there is a
reason to prefer one row over the other, then you need to put human input,
or at least chew around with smarter selects etc in the intermediate stages.

At the hardest level, you might need to make a little program which selects
the dups, presents them to a poor user, and they must decide which ones are
to be saved. Or unload them to files and mess around in editors. Whatever. I
hate it when you need human intervention for this process ;-)

There is also a method using violations tables, which is quite good. Ask if
you are interested in an essay about that.

PS - ruthlessly check the logic presented here - I've just typed it straight
in! Somebody's gotta test it carefully. The best advice given in this thread
so far is to take a backup copy of the entire table before you start.

0 new messages