Delete from table only if there are duplicates

61 views
Skip to first unread message

Luigi Siciliano

unread,
Jun 28, 2024, 7:02:34 AM (5 days ago) Jun 28
to firebird...@googlegroups.com
Hello,

  I have a table populated like:

ID     NOME
---    -----------
FEI    CARLA
FEI    CARLA
FEI    GIULIA
FEI    SABRINA
FEI    WILMA


I need to delete the rows with column NOME duplicate:

FEI    CARLA
FEI    CARLA

I tried with this query but delete all records:

delete from
  TABLE
where
  ((SELECT
    COUNT(*)
   from
     TABLE
   where
     NOME = 'CARLA') > 1)

What I'm wrong?

Thanks

--
Luigi
--------------------------

Dimitry Sibiryakov

unread,
Jun 28, 2024, 7:15:33 AM (5 days ago) Jun 28
to firebird...@googlegroups.com
Luigi Siciliano wrote 28.06.2024 13:02:
> What I'm wrong?

Wrong condition: "delete every record if number of 'CARLA's in table is more
than one". You got exactly what you asked for.
Try to use EXISTS predicate.

https://firebirdsql.org/file/documentation/chunk/en/refdocs/fblangref50/fblangref50-commons-predicates.html#fblangref50-commons-exists

--
WBR, SD.

Virgo Pärna

unread,
Jun 28, 2024, 7:17:01 AM (5 days ago) Jun 28
to firebird...@googlegroups.com
On 28.06.2024 14:02, Luigi Siciliano wrote:
> Hello,
>
>   I have a table populated like:
>
> ID     NOME
> ---    -----------
> FEI    CARLA
> FEI    CARLA
> FEI    GIULIA
> FEI    SABRINA
> FEI    WILMA
>
>
> I need to delete the rows with column NOME duplicate:
>
> FEI    CARLA
> FEI    CARLA
>
> I tried with this query but delete all records:
>
> delete from
>   TABLE
> where
>   ((SELECT
>     COUNT(*)
>    from
>      TABLE
>    where
>      NOME = 'CARLA') > 1)
>

You are deleting records, when table has more than one NOME = 'CARLA'
records.
I'm suprised that it deletes all records. I would have expected one
CARLA record to remain.

execute block as
declare nome type of column TABLE.NOME;
begin
for select X.NOME
from (
select T.NOME, COUNT(*)
from TABLE t
group by T.NOME
having COUNT(*) > 1
) X
into :nome DO
begin
delete from TABLE where NOME = :nome;
end
end

should delete records with repeating NOME. Although it does it even, if
ID is different.

--
Virgo Pärna
Gaiasoft OÜ
vi...@gaiasoft.ee

Tomasz Tyrakowski

unread,
Jun 28, 2024, 7:19:31 AM (5 days ago) Jun 28
to firebird...@googlegroups.com, Luigi Siciliano
On 28.06.2024 at 13:02, Luigi Siciliano wrote:
>   I have a table populated like:
>
> ID     NOME
> ---    -----------
> FEI    CARLA
> FEI    CARLA
> FEI    GIULIA
> FEI    SABRINA
> FEI    WILMA
>
>
> I need to delete the rows with column NOME duplicate:
>
> FEI    CARLA
> FEI    CARLA
>

If you wish to delete both occurences of CARLA (and that's how I
understood your description), this should work:

delete from TABLE
where NOME in (
select NOME from TABLE
group by NOME
having count(*) > 1
)

regards
Tomasz

Virgo Pärna

unread,
Jun 28, 2024, 7:26:47 AM (5 days ago) Jun 28
to firebird...@googlegroups.com
On 28.06.2024 14:16, Virgo Pärna wrote:
>
> You are deleting records, when table has more than one NOME = 'CARLA'
> records.
> I'm suprised that it deletes all records. I would have expected one
> CARLA record to remain.
>

My mistake, I was thinking the way exists works, where subselect is
executed again for every row (if correctly referencing original table).
But this was not the case.

Tomasz Tyrakowski

unread,
Jun 28, 2024, 7:35:08 AM (5 days ago) Jun 28
to firebird...@googlegroups.com, Luigi Siciliano
On the other hand, if you wish to just de-duplicate CARLA (leave just
one CARLA out of N occurences), it would be nice if the following worked:

delete from TABLE t
where NOME in (
select NOME from TABLE
group by NOME
having count(*) > 1
)
rows (select count(*) from TABLE tt where tt.NOME=t.nome)-1;

but it doesn't (error: Invalid parameter to FETCH or FIRST).

So, to de-dupe you'd probably need to execute a block or a stored
procedure (unless someone can point out a clever DELETE - I'd love to
learn it).

set term ^;
execute block
as
declare variable N varchar(50); -- whatever the type of NOME is
declare variable C integer;
begin
for select NOME, count(*)
from TABLE
group by NOME
having count(*) > 1
into :N, :C
do
begin
delete from TABLE
where NOME=:N
rows :C-1;
end
end
^

If the ID for all duplicated rows is the same (and it is in your
example), you can't use it do point out which row is to stay, and which
ones are to be deleted.

regards
Tomasz


Dimitry Sibiryakov

unread,
Jun 28, 2024, 7:45:26 AM (5 days ago) Jun 28
to firebird...@googlegroups.com
Tomasz Tyrakowski wrote 28.06.2024 13:35:
> On the other hand, if you wish to just de-duplicate CARLA (leave just one CARLA
> out of N occurences), it would be nice if the following worked:
>
> delete from TABLE t
> where NOME in (
>   select NOME from TABLE
>   group by NOME
>   having count(*) > 1
> )
> rows (select count(*) from TABLE tt where tt.NOME=t.nome)-1;
>
> but it doesn't (error: Invalid parameter to FETCH or FIRST).

It can work if you change condition:

delete from TABLE t
where exists (
select * from TABLE t2 where t.NOME=t2.NOME and t2.RDB$DB_KEY>t.RDB$DB_KEY
)

RDB$DB_KEY is a handy way to distinguish two absolutely similar records.

--
WBR, SD.

Tomasz Tyrakowski

unread,
Jun 28, 2024, 8:13:02 AM (5 days ago) Jun 28
to firebird...@googlegroups.com
On 28.06.2024 at 13:45, 'Dimitry Sibiryakov' via firebird-support wrote:
>> [...]
>
>   It can work if you change condition:
>
> delete from TABLE t
> where exists (
>    select * from TABLE t2 where t.NOME=t2.NOME and
> t2.RDB$DB_KEY>t.RDB$DB_KEY
> )
>
>   RDB$DB_KEY is a handy way to distinguish two absolutely similar records.
>

Ah, yes! Forgot about the RDB$DB_KEY "super-id" :)
Thanks! Definitely a simpler solution than a block or a procedure.

regards
Tomasz

Luigi Siciliano

unread,
Jun 28, 2024, 11:52:27 AM (5 days ago) Jun 28
to firebird...@googlegroups.com
Hello,

Il 28/06/2024 13:19, Tomasz Tyrakowski ha scritto:
> If you wish to delete both occurences of CARLA (and that's how I
> understood your description), this should work:
>
> delete from TABLE
> where NOME in (
>   select NOME from TABLE
>   group by NOME
>   having count(*) > 1
> )
>
This query works fine to de-duplicate. That's effectively I want.

Thanks

--
Luigi
--------------------------

Dimitry Sibiryakov

unread,
Jun 28, 2024, 11:58:34 AM (5 days ago) Jun 28
to firebird...@googlegroups.com
Luigi Siciliano wrote 28.06.2024 17:52:
>>
>> delete from TABLE
>> where NOME in (
>>   select NOME from TABLE
>>   group by NOME
>>   having count(*) > 1
>> )
>>
> This query works fine to de-duplicate. That's effectively I want.

Be aware that behaviour of this query depends on Firebird version and in
recent ones it may be not what you want (or vice versa).

--
WBR, SD.

Elmar Haneke

unread,
Jun 28, 2024, 12:51:33 PM (5 days ago) Jun 28
to firebird...@googlegroups.com


Am 28.06.24 um 17:52 schrieb Luigi Siciliano:
Hello,

Il 28/06/2024 13:19, Tomasz Tyrakowski ha scritto:
If you wish to delete both occurences of CARLA (and that's how I understood your description), this should work:

delete from TABLE
where NOME in (
  select NOME from TABLE
  group by NOME
  having count(*) > 1
)

This query works fine to de-duplicate. That's effectively I want.


Logically from SQL standard it removes all rows for values of NOME which do have duplicates.

If there is one left over, it is a specific implementation dependent side effect.



Luigi Siciliano

unread,
Jun 28, 2024, 12:53:56 PM (5 days ago) Jun 28
to firebird...@googlegroups.com
Hello,

Il 28/06/2024 17:58, 'Dimitry Sibiryakov' via firebird-support ha scritto:
>   Be aware that behaviour of this query depends on Firebird version
> and in recent ones it may be not what you want (or vice versa).

I used Firebird 2.5.9

--
Luigi
--------------------------

moorhouse

unread,
Jul 1, 2024, 12:45:06 PM (2 days ago) Jul 1
to firebird-support
Depends what you want to do with the table once you've deduped it.  Why not do a SELECT DISTINCT view on it?

Luigi Siciliano

unread,
Jul 1, 2024, 1:07:22 PM (2 days ago) Jul 1
to firebird...@googlegroups.com
Hello,

Il 30/06/2024 14:27, moorhouse ha scritto:
> Why not do a SELECT DISTINCT view on it?

Because the table is a temporary table that is mismanaged (can accept
duplicates) by a software whose code I don't have :( but I can manage
the database eliminating the duplicates that is generated sometimes but
I cant add a primary key

Thanks

--
Luigi
--------------------------
Reply all
Reply to author
Forward
0 new messages