How to delete duplicate rows with sql

-S.POTHIREDDY

unread,

May 5, 1995, 3:00:00 AM5/5/95

to

To all sql experts:

Could any body tell me how to delete duplicate rows in a table. I was
able to list the duplicate rows in table like this:

select rowid, a.*
from a, b {a & b are aliases for sam table. basically a self join}
where a.rowid != b.rowid;

I can't this select statement as subquery because we can not the update
the table in a subquery. Any ideas ... please. BTW we have not yet
installed 4gl yet.

Thanks in advance.

Siva Pothireddy

AT&T

Lester Knutsen

unread,

May 6, 1995, 3:00:00 AM5/6/95

to

Siva,

This is a quick version of what I have done in the past. Please check
it carefully as I don't have my old code with me. I assume you want to
save one of the duplicates and delete all the other duplicates.

{ select the duplicates }
select keyfield, count(*)
from table
group by keyfield haveing count(*) > 1
into temp A;

{ select rowids of duplicates }
select rowid, keyfield
from table
where keyfield in ( select keyfield from A)
into temp B;

{ save one of the duplicates so it does not get deleted }
select keyfield, min(rowid) save_rowid
from B
group by keyfield
into temp C;

{ delete the duplicates }
delete from table
where rowid in
( select rowid from B where rowid not in
( select save_rowid from C ));

Again, test this carefully and make a backup before you delete.

Regards - lester

#############################################################################
# Lester Knutsen les...@access.digex.net #
# Advanced DataTools Corporation Voice: 703-256-0267 #
# Grant group privileges for Informix databases with DB Privileges #
#############################################################################

Wayne Harlech-Jones

unread,

May 6, 1995, 3:00:00 AM5/6/95

to

Siva Pothireddy asked:

>Could any body tell me how to delete duplicate rows in a table.

One method I use, which is long winded an inelegant, but does work is as
follows:
In SQL do
unload to "dupl.unl"
select unique * from dup_table;
delete from dup_table where 1=1;
load from "dupl.unl"
insert into dup_table;

Dup_table is the name of the table woth the duplicates.

You can speed up this process by dropping you indexes first nad recreating
them afterward. It is also a good idea to turn off logging or have
continous logging on as this process fills up the logs incredibly quikly.

Yours
Wayne

|\/\/ayne
|Wayne Harlech-Jones
|Windhoek, Namibia, Africa
|wa...@jones.mac.alt.na

John Prideaux

unread,

May 8, 1995, 3:00:00 AM5/8/95

to

In <D84ED...@nntpa.cb.att.com> si...@hogpe.ho.att.com (-S.POTHIREDDY)
writes:

>Could any body tell me how to delete duplicate rows in a table. I was
>able to list the duplicate rows in table like this:
>
>select rowid, a.*
> from a, b {a & b are aliases for sam table. basically a self
join}
> where a.rowid != b.rowid;
>

Use another table hold your unique rows. Two choices:
1) Use a temp table to hold the unique rows, delete from the
permanent
table, and insert from the temp table
or 2) Use a permanent table, insert the unique rows, drop the
original,
and rename the other.

Hope this helps.

John Prideaux
ala...@ix.netcom.com

Catalin Badea

unread,

May 8, 1995, 3:00:00 AM5/8/95

to

-S.POTHIREDDY (si...@hogpe.ho.att.com) wrote:
: Could any body tell me how to delete duplicate rows in a table. I was

: able to list the duplicate rows in table like this:

: Siva Pothireddy

Here's one solution:

select min(rowid) tmp_rowid from _tablename_ group by _key_fields_
into temp tmp_table with no log;

delete from _tablename_
where rowid not in
(select tmp_rowid from tmp_table);

Replace _tablename_ and _key_fields_ with appropriate names.

Catalin.

--
===========================================================================

Catalin Badea _/_/_/ _/_/_/ _/_/_/ _/_/_/ _/ _/ _/_/_/
_/ _/ _/ _/ _/_/ _/
Senior Technical Architect _/ _/_/ _/ _/_/_/ _/ _/_/_/
_/ _/ _/ _/ _/ _/
Tecsys Inc. _/ _/_/_/ _/_/_/ _/_/_/ _/ _/_/_/

====== c...@tecsys.com =========== include system "disclaimer.4gh" ======

John H. Frantz

unread,

May 8, 1995, 3:00:00 AM5/8/95

to

c...@tecsys.com (Catalin Badea) wrote:
>
> -S.POTHIREDDY (si...@hogpe.ho.att.com) wrote:
> : Could any body tell me how to delete duplicate rows in a table. I was
> : able to list the duplicate rows in table like this:
> : Siva Pothireddy
>
> Here's one solution:
>
> select min(rowid) tmp_rowid from _tablename_ group by _key_fields_
> into temp tmp_table with no log;
>
> delete from _tablename_
> where rowid not in
> (select tmp_rowid from tmp_table);
>
> Replace _tablename_ and _key_fields_ with appropriate names.
>
> Catalin.
>

Many solutions have appeared using 'rowid' (I like this one the best).
However, what happens when Informix stops using 'rowid' as in version
7.1 with the table fragmentation feature? How can one then distinguish
rows from one another?

Rodrigo Lopez Musa

unread,

May 10, 1995, 3:00:00 AM5/10/95

to

In article <D84ED...@nntpa.cb.att.com>, si...@hogpe.ho.att.com (-S.POTHIREDDY) says:
>
>To all sql experts:

>
>Could any body tell me how to delete duplicate rows in a table. I was
>able to list the duplicate rows in table like this:
>

>select rowid, a.*
> from a, b {a & b are aliases for sam table. basically a self join}
> where a.rowid != b.rowid;
>

I recommend you to create a temporary table with unique rows like this:
select unique a.*
from a
into temp uniq_a;
drop table a; |You can replace this two instructiona by
create table a (....); | a one delete but it is a time consuming
| process.
then
insert into a (select * from uniq_a);
If anyone kwons a better way, please tell me how!!

>I can't this select statement as subquery because we can not the update
>the table in a subquery. Any ideas ... please. BTW we have not yet
>installed 4gl yet.
>
>Thanks in advance.
>

>Siva Pothireddy
>
>AT&T

Rodrigo Lopez Musa

unread,

May 10, 1995, 3:00:00 AM5/10/95

to

In article <D84ED...@nntpa.cb.att.com>, si...@hogpe.ho.att.com (-S.POTHIREDDY) says:
>
>To all sql experts:
>
>Could any body tell me how to delete duplicate rows in a table. I was
>able to list the duplicate rows in table like this:
>
>select rowid, a.*
> from a, b {a & b are aliases for sam table. basically a self join}
> where a.rowid != b.rowid;
>

Rodrigo Lopez Musa

unread,

May 10, 1995, 3:00:00 AM5/10/95

to

Gnana1

unread,

May 19, 1995, 3:00:00 AM5/19/95

to

From the example provided by you, I am unable to appreciate the difficulty
experienced by you. What is the purpose of this query ? Could you give
us a realistic example of how you intend to use a similar query in real
life ?

Thanx.

Ravi.