
FAQ 4.41 How can I remove duplicate elements from a list or array?

PerlFAQ Server

May 18, 2008, 3:03:02 AM

This is an excerpt from the latest version of perlfaq4.pod, which
comes with the standard Perl distribution. These postings aim to
reduce the number of repeated questions as well as allow the community
to review and update the answers. The latest version of the complete
perlfaq is at http://faq.perl.org .

--------------------------------------------------------------------

4.41: How can I remove duplicate elements from a list or array?

(contributed by brian d foy)

Use a hash. When you think the words "unique" or "duplicated", think
"hash keys".

If you don't care about the order of the elements, you could just create
the hash then extract the keys. It's not important how you create that
hash: just that you use "keys" to get the unique elements.

    my %hash = map { $_, 1 } @array;
    # or a hash slice: @hash{ @array } = ();
    # or a foreach: $hash{$_} = 1 foreach ( @array );

    my @unique = keys %hash;

If you want to use a module, try the "uniq" function from
"List::MoreUtils". In list context it returns the unique elements,
preserving their order in the list. In scalar context, it returns the
number of unique elements.

    use List::MoreUtils qw(uniq);

    my @unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 1,2,3,4,5,6,7
    my $unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 7
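
As a hedged aside, the core List::Util module provides a "uniq" function with the same behaviour. The sketch below assumes List::Util 1.45 or later is installed (older releases do not have it); the variable names are only for illustration.

```perl
use strict;
use warnings;
use List::Util 1.45 qw(uniq);   # assumption: List::Util 1.45 or later

my @array  = ( 1, 2, 3, 4, 4, 5, 6, 5, 7 );
my @unique = uniq(@array);   # list context: first occurrences, original order
my $count  = uniq(@array);   # scalar context: number of unique elements

print "@unique\n";   # 1 2 3 4 5 6 7
print "$count\n";    # 7
```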

You can also go through each element and skip the ones you've seen
before. Use a hash to keep track. The first time the loop sees an
element, that element has no key in %seen. The postfix "++" in the
"next" statement creates the key and returns its old value, which is
"undef" (false), so the "next" does not fire and the loop continues to
the "push"; as a side effect, the value for that key is incremented
to 1. The next time the loop sees that same element, its key exists in
the hash *and* the value for that key is true (since it's not 0 or
"undef"), so the "next" skips that iteration and the loop goes to the
next element.

    my @unique = ();
    my %seen = ();

    foreach my $elem ( @array )
    {
        next if $seen{ $elem }++;
        push @unique, $elem;
    }

You can write this more briefly using a grep, which does the same thing.

    my %seen = ();
    my @unique = grep { ! $seen{ $_ }++ } @array;
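
If the same idiom is needed in several places, one possible way to package it is a small subroutine with its own lexical %seen. This is only a sketch: the name "uniq_in_order" is hypothetical and does not come from the FAQ.

```perl
use strict;
use warnings;

# Hypothetical helper name; not part of the FAQ.
sub uniq_in_order {
    my %seen;                          # fresh lexical hash on every call
    return grep { !$seen{$_}++ } @_;   # keep only first occurrences
}

my @array  = ( 1, 2, 3, 6, 4, 4, 5, 6, 5, 7 );
my @first  = uniq_in_order(@array);
my @second = uniq_in_order(@array);    # no state carries over between calls

print join( ', ', @first ),  "\n";     # 1, 2, 3, 6, 4, 5, 7
print join( ', ', @second ), "\n";     # 1, 2, 3, 6, 4, 5, 7
```

Because %seen is declared inside the subroutine, each call starts with an empty hash, so repeated calls give the same answer.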

--------------------------------------------------------------------

The perlfaq-workers, a group of volunteers, maintain the perlfaq. They
are not necessarily experts in every domain where Perl might show up,
so please include as much information as possible and relevant in any
corrections. The perlfaq-workers also don't have access to every
operating system or platform, so please include relevant details for
corrections to examples that do not work on particular platforms.
Working code is greatly appreciated.

If you'd like to help maintain the perlfaq, see the details in
perlfaq.pod.

szr

May 18, 2008, 3:29:39 AM

PerlFAQ Server wrote:

> 4.41: How can I remove duplicate elements from a list or array?

[...]


> You can write this more briefly using a grep, which does the same
> thing.
>
> my %seen = ();
> my @unique = grep { ! $seen{ $_ }++ } @array;

How about this method, which eliminates the need to declare a seperate
%seen hash?

my @array = (1, 2, 3, 6, 4, 4, 5, 6, 5, 7);
my @unique = grep { ! $::_{$_}++; } @array;
print join ', ', @unique;

[or]

my @array = (1, 2, 3, 6, 4, 4, 5, 6, 5, 7);
my @unique = grep { ! $::{seen}{$_}++; } @array;
print join ', ', @unique;


Output:

1, 2, 3, 6, 4, 5, 7

--
szr


Dr.Ruud

May 18, 2008, 7:19:32 AM

szr wrote:

> How about this method, which eliminates the need to declare a seperate
> %seen hash?
>
> my @array = (1, 2, 3, 6, 4, 4, 5, 6, 5, 7);

> my @unique = grep { ! $::{seen}{$_}++; } @array;
> print join ', ', @unique;
>
> Output:
>
> 1, 2, 3, 6, 4, 5, 7


Bad practice. You are still building up a sep*a*rate hash, called
%::seen.
And if you would pick a name that was defined somewhere else with 'our',
you would be using the same variable space.

perl -Mstrict -Mwarnings -MData::Dumper -le'
my @array = qw( 1 2 3 4 6 4 4 6 5 7 );
my @unique = grep !$::seen{$_}++, @array;


print join ", ", @unique;

print Dumper(\%::seen);
our %seen; print Dumper(\%seen);
'
1, 2, 3, 4, 6, 5, 7
$VAR1 = {
'6' => 2,
'4' => 3,
'1' => 1,
'3' => 1,
'7' => 1,
'2' => 1,
'5' => 1
};

$VAR1 = {
'6' => 2,
'4' => 3,
'1' => 1,
'3' => 1,
'7' => 1,
'2' => 1,
'5' => 1
};

--
Affijn, Ruud

"Gewoon is een tijger."

shei...@my-deja.com

May 18, 2008, 9:06:06 AM

What I don't quite understand here, is why perl chokes (using strict)
on

%perl
use strict;
use warnings;


my @array = (1, 2, 3, 6, 4, 4, 5, 6, 5, 7);

my @unique = grep { ! $seen{$_}++; } @array;
print join ', ', keys %main::seen;
__END__
Global symbol "%seen" requires explicit package name at - line 4.
Execution of - aborted due to compilation errors.

but not on

%perl
use strict;
use warnings;


my @array = (1, 2, 3, 6, 4, 4, 5, 6, 5, 7);
my @unique = grep { ! $::seen{$_}++; } @array;

print join ', ', keys %main::seen;
__END__
6, 4, 1, 3, 7, 2, 5

Both instances are referencing the undeclared variable %main::seen.
Why is the 2nd notation allowed?


As for the use of "$::_{$_}++" (or "$_{$_}++" for short) I understand
that the *_ typeglob is always predeclared and thus compiles.

Steffen

Ben Morrow

May 18, 2008, 9:33:06 AM

Quoth shei...@my-deja.com:

>
> What I don't quite understand here, is why perl chokes (using strict)
> on
>
<code trimmed>
> %perl
> use strict;

> my @unique = grep { ! $seen{$_}++; } @array;
>
> but not on
>
> %perl
> use strict;
> my @unique = grep { ! $::seen{$_}++; } @array;
>
> Both instances are referencing the undeclared variable %main::seen.
> Why is the 2nd notation allowed?

Fully-qualified references are always allowed. %::seen is a fully-
qualified package variable, in the null package (which happens to also
be called 'main'). It's exactly equivalent to %main::seen.
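
A minimal sketch of that point, assuming a script running in package main. It checks that %::seen and %main::seen are one and the same hash, and that the unqualified name is still refused under strict (the string eval is only there so the compile error can be caught at run time).

```perl
use strict;
use warnings;

my @array = ( 1, 2, 3, 6, 4, 4, 5, 6, 5, 7 );

# Fully qualified, so "use strict" accepts it without a declaration.
my @unique = grep { !$::seen{$_}++ } @array;

# %::seen and %main::seen name the same package variable.
my $same = \%::seen == \%main::seen ? "same" : "different";
print "$same\n";                                   # same
print join( ', ', sort keys %main::seen ), "\n";   # 1, 2, 3, 4, 5, 6, 7

# The unqualified name is still undeclared as far as strict is concerned.
my $compiled = eval q{ my $n = $seen{1}; 1 };
print $compiled ? "compiled\n" : "rejected by strict\n";   # rejected by strict
```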

Ben

--
You poor take courage, you rich take care:
The Earth was made a common treasury for everyone to share
All things in common, all people one.
'We come in peace'---the order came to cut them down. [b...@morrow.me.uk]

shei...@my-deja.com

May 18, 2008, 10:18:45 AM

On May 18, 3:33 pm, Ben Morrow <b...@morrow.me.uk> wrote:
> Quoth sheinr...@my-deja.com:

>
> > Both instances are referencing the undeclared variable %main::seen.
> > Why is the 2nd notation allowed?
>
> Fully-qualified references are always allowed. %::seen is a fully-
> qualified package variable, in the null package (which happens to also
> be called 'main'). It's exactly equivalent to %main::seen.
>
> Ben
>

Oh! I never knew that.

Thank you Ben.

Steffen

szr

May 18, 2008, 3:30:47 PM

Dr.Ruud wrote:
> szr wrote:
>> How about this method, which eliminates the need to declare a
>> seperate %seen hash?
>>
>> my @array = (1, 2, 3, 6, 4, 4, 5, 6, 5, 7);
>> my @unique = grep { ! $::{seen}{$_}++; } @array;
>> print join ', ', @unique;
>>
>> Output:
>>
>> 1, 2, 3, 6, 4, 5, 7
>
> Bad practice. You are still building up a sep*a*rate hash, called
> %::seen.
> And if you would pick a name that was defined somewhere else with
> 'our', you would be using the same variable space.
>
> perl -Mstrict -Mwarnings -MData::Dumper -le'
> my @array = qw( 1 2 3 4 6 4 4 6 5 7 );
> my @unique = grep !$::seen{$_}++, @array;
> print join ", ", @unique;
> print Dumper(\%::seen);
> our %seen; print Dumper(\%seen);

Point taken, though my goal was to show how to do what was shown at the
end of the FAQ with one less line.

And how about?

my @array = (1, 2, 3, 6, 4, 4, 5, 6, 5, 7);

my @unique = grep { ! $::_{$_}++; } @array;


print join ', ', @unique;


--
szr


Uri Guttman

May 18, 2008, 6:42:39 PM

>>>>> "s" == szr <sz...@szromanMO.comVE> writes:

s> Dr.Ruud wrote:

>> Bad practice. You are still building up a sep*a*rate hash, called
>> %::seen.

s> Point taken, though my goal was to show how to do what was shown at the
s> end of the FAQ with one less line.

s> And how about?

s> my @array = (1, 2, 3, 6, 4, 4, 5, 6, 5, 7);
s> my @unique = grep { ! $::_{$_}++; } @array;
s> print join ', ', @unique;

so you changed the name to %_. it is still a global and could be full of
data before you run that code as it is not cleared first.

and trying to save a declare line is losing sight of the whole
concept of hashes being a way to uniquify data.
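
A small sketch of the stale-global problem, assuming nothing else in the program has touched %_ beforehand. Running the same grep twice shows the leftover counts from the first pass hiding every element in the second.

```perl
use strict;
use warnings;

my @array = ( 1, 2, 3, 6, 4, 4, 5, 6, 5, 7 );

# First pass: the global %_ starts empty here, so this works.
my @first = grep { !$::_{$_}++ } @array;

# Second pass: %_ still holds the counts from the first pass,
# so every element looks like a duplicate.
my @second = grep { !$::_{$_}++ } @array;

print scalar(@first),  "\n";   # 7
print scalar(@second), "\n";   # 0
```

A lexical "my %seen" declared right before the grep avoids this, because it starts empty each time it is declared.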

uri

--
Uri Guttman ------ u...@stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Free Perl Training --- http://perlhunter.com/college.html ---------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------

brian d foy

May 20, 2008, 5:56:47 AM

In article <g0q05...@news4.newsguy.com>, szr <sz...@szromanMO.comVE>
wrote:

> Point taken, though my goal was to show how to do what was shown at the
> end of the FAQ with one less line.

Generally I don't golf perlfaq answers, and even take more lines than I
would in my normal programming. Beginners can see more discrete steps,
and advanced people can golf it as much as they like. :)

szr

May 20, 2008, 11:52:39 AM

Agreed. It was more of a fun little mini-challenge, if you will :-)

--
szr

