My question could be simple, but I don't know how to approach
the task I've just asked.
I'm asked to write an SQL report which is for finding duplicate
serial numbers. All the serial numbers are supposed to be unique,
but for some reason, there are duplicate serial numbers in the
database.
I first thought the following code was okay since I was looking
for not distinct serial numbers:
| SELECT NOT DISTINCT serial_no
| FROM serial
This does not run. I get a syntax error message.
If there is a keyword for finding identical items, I would appreciate
your advice.
Thanks in advance!
alea
Make sure you have an index on the serial_no column...
Allen
--
Paul Watson #
Oninit Ltd # Growing old is mandatory
Tel: +44 1436 672201 # Growing up is optional
Fax: +44 1436 678693 #
www.oninit.com #
------_=_NextPart_001_01C1C51F.ED52D010
Content-Type: text/plain
> -----Original Message-----
> From: alea...@hotmail.com [SMTP:alea...@hotmail.com]
> Sent: Wednesday, March 06, 2002 5:51 AM
> To: inform...@iiug.org
> Subject: How To Find Duplicate Entries Using SQL
>
> Hello all,
>
> My question could be simple, but I don't know how to approach
> the task I've just asked.
>
> I'm asked to write an SQL report which is for finding duplicate
> serial numbers. All the serial numbers are supposed to be unique,
> but for some reason, there are duplicate serial numbers in the
> database.
>
> I first thought the following code was okay since I was looking
> for not distinct serial numbers:
>
> | SELECT NOT DISTINCT serial_no
> | FROM serial
>
> This does not run. I get a syntax error message.
>
> If there is a keyword for finding identical items, I would appreciate
> your advice.
>
> Thanks in advance!
>
> alea
------_=_NextPart_001_01C1C51F.ED52D010
Content-Type: text/html
Content-Transfer-Encoding: quoted-printable
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV=3D"Content-Type" CONTENT=3D"text/html; =
charset=3Dus-ascii">
<META NAME=3D"Generator" CONTENT=3D"MS Exchange Server version =
5.5.2653.12">
<TITLE>RE: How To Find Duplicate Entries Using SQL</TITLE>
</HEAD>
<BODY>
<BR>
<BR>
<UL>
<P><FONT SIZE=3D1 FACE=3D"Arial">-----Original Message-----</FONT>
<BR><B><FONT SIZE=3D1 FACE=3D"Arial">From: </FONT></B> <FONT =
SIZE=3D1 FACE=3D"Arial">alea...@hotmail.com =
[SMTP:alea...@hotmail.com]</FONT>
<BR><B><FONT SIZE=3D1 FACE=3D"Arial">Sent: </FONT></B> <FONT =
SIZE=3D1 FACE=3D"Arial">Wednesday, March 06, 2002 5:51 AM</FONT>
<BR><B><FONT SIZE=3D1 =
FACE=3D"Arial">To: </FONT></B> <FONT SIZE=3D1 =
FACE=3D"Arial">inform...@iiug.org</FONT>
<BR><B><FONT SIZE=3D1 =
FACE=3D"Arial">Subject: </FONT>=
</B> <FONT SIZE=3D1 FACE=3D"Arial">How To Find Duplicate Entries Using =
SQL</FONT>
</P>
<P><FONT SIZE=3D2 FACE=3D"Arial">Hello all,</FONT>
</P>
<P><FONT SIZE=3D2 FACE=3D"Arial">My question could be simple, but I =
don't know how to approach</FONT>
<BR><FONT SIZE=3D2 FACE=3D"Arial">the task I've just asked.</FONT>
</P>
<P><FONT SIZE=3D2 FACE=3D"Arial">I'm asked to write an SQL report which =
is for finding duplicate</FONT>
<BR><FONT SIZE=3D2 FACE=3D"Arial">serial numbers. All the serial =
numbers are supposed to be unique,</FONT>
<BR><FONT SIZE=3D2 FACE=3D"Arial">but for some reason, there are =
duplicate serial numbers in the</FONT>
<BR><FONT SIZE=3D2 FACE=3D"Arial">database.</FONT>
</P>
<P><FONT SIZE=3D2 FACE=3D"Arial">I first thought the following code was =
okay since I was looking</FONT>
<BR><FONT SIZE=3D2 FACE=3D"Arial">for not distinct serial =
numbers:</FONT>
</P>
<P><FONT SIZE=3D2 FACE=3D"Arial"> | SELECT NOT =
DISTINCT serial_no</FONT>
<BR><FONT SIZE=3D2 FACE=3D"Arial"> =
| FROM serial</FONT>
</P>
<P><FONT SIZE=3D2 FACE=3D"Arial">This does not run. I get a syntax =
error message.</FONT>
</P>
<P><FONT SIZE=3D2 FACE=3D"Arial">If there is a keyword for finding =
identical items, I would appreciate</FONT>
<BR><FONT SIZE=3D2 FACE=3D"Arial">your advice.</FONT>
</P>
<P><FONT SIZE=3D2 FACE=3D"Arial">Thanks in advance!</FONT>
</P>
<P><FONT SIZE=3D2 FACE=3D"Arial">alea</FONT>
</P>
</UL>
</BODY>
</HTML>
------_=_NextPart_001_01C1C51F.ED52D010--
> -----Original Message-----
> From: alea...@hotmail.com [SMTP:alea...@hotmail.com]
> Sent: Wednesday, March 06, 2002 5:51 AM
> To: inform...@iiug.org
> Subject: How To Find Duplicate Entries Using SQL
>
> Hello all,
>
> My question could be simple, but I don't know how to approach
> the task I've just asked.
>
> I'm asked to write an SQL report which is for finding duplicate
> serial numbers. All the serial numbers are supposed to be unique,
> but for some reason, there are duplicate serial numbers in the
> database.
>
> I first thought the following code was okay since I was looking
> for not distinct serial numbers:
>
> | SELECT NOT DISTINCT serial_no
> | FROM serial
>
Try this:
select unique COLUMN_NAME,count(*) from TABLE_NAME group by 1 order
by 2 desc
Will give you each unique value and the number of duplicates for
each value.
Regards,
Bill
Anyway, one simple select will be
select serial_no, count(*)
from tablename
group by 1
having count(*) > 1
Dirk
-----Original Message-----
From: alea...@hotmail.com [mailto:alea...@hotmail.com]
Sent: 06 March 2002 12:51
To: inform...@iiug.org
Subject: How To Find Duplicate Entries Using SQL
Hello all,
My question could be simple, but I don't know how to approach the task I've
just asked.
I'm asked to write an SQL report which is for finding duplicate serial
numbers. All the serial numbers are supposed to be unique, but for some
reason, there are duplicate serial numbers in the database.
I first thought the following code was okay since I was looking for not
distinct serial numbers:
| SELECT NOT DISTINCT serial_no
| FROM serial
This does not run. I get a syntax error message.
Try the following
SELECT serial_no, count(*)
FROM serial
GROUP BY serial_no
HAVING count(*) > 1
Keith Simmons
Banner Business Supplies Limited
Informix Certified Professional
-> -----Original Message-----
-> From: alea...@hotmail.com [mailto:alea...@hotmail.com]
-> Sent: Wednesday, March 06, 2002 10:51 AM
-> To: inform...@iiug.org
-> Subject: How To Find Duplicate Entries Using SQL
->
->
-> Hello all,
->
-> My question could be simple, but I don't know how to approach
-> the task I've just asked.
->
-> I'm asked to write an SQL report which is for finding duplicate
-> serial numbers. All the serial numbers are supposed to be unique,
-> but for some reason, there are duplicate serial numbers in the
-> database.
->
-> I first thought the following code was okay since I was looking
-> for not distinct serial numbers:
->
-> | SELECT NOT DISTINCT serial_no
-> | FROM serial
->
-> This does not run. I get a syntax error message.
->
-> If there is a keyword for finding identical items, I would appreciate
-> your advice.
->
-> Thanks in advance!
->
-> alea
->
**********************************************************************************
This message may contain information which is confidential and subject to
legal privilege. If you are not the intended recipient, you may not peruse,
use, disseminate, distribute or copy this message. If you have received this
message in error, please notify the sender immediately by email, facsimile
or telephone and return and/or destroy all copies of the message.
This footnote also confirms that this email message has been swept for
the presence of computer viruses, however we cannot guarantee that this
message is free from such problems.
**********************************************************************************
-----Original Message-----
From: alea...@hotmail.com [mailto:alea...@hotmail.com]
Sent: Wednesday, March 06, 2002 02:51
To: inform...@iiug.org
Subject: How To Find Duplicate Entries Using SQL
Hello all,
My question could be simple, but I don't know how to approach
the task I've just asked.
I'm asked to write an SQL report which is for finding duplicate
serial numbers. All the serial numbers are supposed to be unique,
but for some reason, there are duplicate serial numbers in the
database.
I first thought the following code was okay since I was looking
for not distinct serial numbers:
| SELECT NOT DISTINCT serial_no
| FROM serial
This does not run. I get a syntax error message.
If there is a keyword for finding identical items, I would appreciate
Vivek Chaudhary
STMicroelectronics
Following is an example of a query that I run all the time. Hope it helps
select serial_no
from serial_table a, serial_table b
where a.serial_no = b.serial_no
and a.rowid < b.rowid
Laurie Gustin
Department of Public Safety
801-965-4410
>>> aleatory <alea...@hotmail.com> 03/06/02 03:50AM >>>
easy
select unique(serial_no),count(*) from serial
group by serial_no
having count(*) > 1
enjoy
Select column_name from table group by column_name having count(*) > 1;
cheers
j.
----- Original Message -----
From: "aleatory" <alea...@hotmail.com>
To: <inform...@iiug.org>
Sent: Wednesday, March 06, 2002 5:50 AM
Subject: How To Find Duplicate Entries Using SQL
We have site's that rely on the duplication of serial's, they the
first row, get a new serial and then insert the rest of the data
with the same serial.
--
Here's yet another way
select distinct a.rowid, a.serial_no, b.rowid, b.serial_no
from serial a,serial b
where a.serial_no = b.serial_no
and a.rowid != b.rowid -- very important
This way you wil not only get the duplicates but also their rowids
you can then use these to trash the duplicates that you want to trash
remember that one of the results returned will be
the row that you wanna keep.
eg: four duplicate rows:
a.rowid a.serial_no b.rowid b.serial_no
256 100 312 100
257 100 417 100
You will find that at least one of these will be the one you want eg rowid
256
you then need to delete rowid's 312 257 and 417 etc
Be sure to make a backup before you begin deleteing...
Hope this helps
Leon
-----Původní zpráva-----
Od: Dirk Moolman
Odesláno: 6. března 2002 16:08
Komu: 'alea...@hotmail.com'; inform...@iiug.org
Předmět: RE: How To Find Duplicate Entries Using SQL
Are you talking about the "serial" data type, or do you have your own user
defined column called serial number ? The serial data type should never
duplicate, and is handled internally by Informix.
It is not true. If you have e.g. table a with columns b serial, c int and you perform e.g.
Insert into a values (1,100) x times, you will have x duplicate values in serial-column b.
Regards,
Vaclav
Or are you perhaps talking about the rowid column ?
Anyway, one simple select will be
select serial_no, count(*)
from tablename
group by 1
having count(*) > 1
Dirk
-----Original Message-----
From: alea...@hotmail.com [mailto:alea...@hotmail.com]
Sent: 06 March 2002 12:51
If you build the table and add the indexes using isql or dbaccess, their
table editors automatically and silently add a unique index to a serial
column if you define one, but that's a "value added" action on their part,
not something that the engine does all by itself.
A serial type is however enforced to be NOT NULL by the engine.
All the suggestions to use the count(*) and having clauses is the correct
way to *find* all duplicate values of fields (whatever they may be) but the
ugly part is then deleting or correcting the sucky rows.
first, let me restate the suggested SQLS (in my slightly rearranged and
preferred manner, 'cos it's more flexible as you'll maybe be able to work
out for yourself):
select count(*), key1, key2, .....
from dirty_table
group by 2, 3, ....
having count(*) > 1
into temp duplicate_sods with no log;
NOTE: the ...... represent as many fields as you are trying to dup-check,
and also the .... in the group by clause must be filled with 2 ...
number-of-key-fields. Gettit? Good.
Now, you've got the dup sets listed in a temp table. The really ugly part
starts now. If all rows with duplicate keys are truly duplicate, then you
really don't give a damn which of the duplicates stays behind when you
delete all but one of the duplicates. So, you can do this (NOTE: this
depends on having a rowid, so it can't work for fragmented tables that do
not have rowids):
select dt.rowid dt_rowid, dt.key1, dt.key2, ....
from dirty_table dt, duplicate_sods sods
where dt.key1 = sods.key1
and dt.key2 = sods.key2
and .....
into temp getting_closer with no log;
Now you have a selection of all the rowids from the table, still associated
with the keys.
Next step is to get a temp table which contains only the lowest rowid (an
arbitrary choice):
select min(dt_rowid) dt_rowid
from getting_closer
group by key1, key2 ...
into temp really_close_now with no log;
REMOVE those rowids from the previous table:
delete from getting_closer
where dt_rowid in (select rcn.dt_rowid from really_close_now rcn);
Finally, delete the remaining rows from the original table:
delete from dirty_table
where rowid in (select dt_rowid from getting_closer);
This is a purely mechanical procedure to remove duplicates. You should first
check that you can trust the mechanical selection of rows; if there is a
reason to prefer one row over the other, then you need to put human input,
or at least chew around with smarter selects etc in the intermediate stages.
At the hardest level, you might need to make a little program which selects
the dups, presents them to a poor user, and they must decide which ones are
to be saved. Or unload them to files and mess around in editors. Whatever. I
hate it when you need human intervention for this process ;-)
There is also a method using violations tables, which is quite good. Ask if
you are interested in an essay about that.
PS - ruthlessly check the logic presented here - I've just typed it straight
in! Somebody's gotta test it carefully. The best advice given in this thread
so far is to take a backup copy of the entire table before you start.