--------------------------------------------------------------------
4.41: How can I remove duplicate elements from a list or array?
(contributed by brian d foy)
Use a hash. When you think the words "unique" or "duplicated", think
"hash keys".
If you don't care about the order of the elements, you can just create
the hash and then extract the keys. It's not important how you create that
hash, only that you use "keys" to get the unique elements.
my %hash = map { $_, 1 } @array;
# or a hash slice: @hash{ @array } = ();
# or a foreach: $hash{$_} = 1 foreach ( @array );
my @unique = keys %hash;
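Here is a self-contained version of the hash-slice variant, assuming a sample @array. Because "keys" returns the keys in no particular order, this sketch sorts them numerically so the output is repeatable. That sort is an addition for the demo, not part of the FAQ answer.

```perl
use strict;
use warnings;

my @array = ( 1, 2, 3, 6, 4, 4, 5, 6, 5, 7 );   # sample data (assumed)

# A hash slice creates one key per element; duplicates collapse onto
# the same key. The values are all undef, which is fine: only keys matter.
my %hash;
@hash{ @array } = ();

# keys() returns the keys in no particular order, so sort them
# numerically when a stable result is wanted.
my @unique = sort { $a <=> $b } keys %hash;

print join( ', ', @unique ), "\n";   # 1, 2, 3, 4, 5, 6, 7
```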
If you want to use a module, try the "uniq" function from
"List::MoreUtils". In list context it returns the unique elements,
preserving their order in the list. In scalar context, it returns the
number of unique elements.
use List::MoreUtils qw(uniq);
my @unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 1,2,3,4,5,6,7
my $unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 7
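If List::MoreUtils isn't installed, the core module List::Util has had its own "uniq" since version 1.45, which ships with Perl 5.26 and later. This sketch assumes such a version. It is also order-preserving and returns the count in scalar context.

```perl
use strict;
use warnings;
use List::Util 1.45 qw(uniq);   # uniq first appeared in List::Util 1.45

my @list = ( 1, 2, 3, 4, 4, 5, 6, 5, 7 );

my @unique = uniq(@list);   # list context: first occurrences, in order
my $count  = uniq(@list);   # scalar context: number of unique elements

print "@unique\n";   # 1 2 3 4 5 6 7
print "$count\n";    # 7
```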
You can also go through each element and skip the ones you've seen
before. Use a hash to keep track. The first time the loop sees an
element, that element has no key in %seen. The post-increment in the
"next" statement creates the key and returns the key's old value.
Since the key did not exist, that value is false, so the loop does not
skip and continues to the "push"; the increment has already set the
key's value to 1. The next time the loop sees that same element, its
key exists in the hash *and* the value for that key is true (since it's
not 0 or "undef"), so the "next" skips that iteration and the loop goes
to the next element.
my @unique = ();
my %seen = ();
foreach my $elem ( @array )
{
next if $seen{ $elem }++;
push @unique, $elem;
}
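To watch that happen, here is a runnable trace of the loop above, assuming a small sample list of strings. It stores the post-increment result in a variable so that each decision can be printed.

```perl
use strict;
use warnings;

my @array = qw( a b a c b );   # sample data (assumed)

my @unique;
my %seen;
foreach my $elem (@array) {
    # Post-increment returns the OLD value, then bumps the count.
    # First sighting: the old value is false, so "next" does not fire.
    # Later sightings: the old value is at least 1, so it does.
    my $was_seen = $seen{$elem}++;
    printf "%s: %s\n", $elem, $was_seen ? 'skip' : 'keep';
    next if $was_seen;
    push @unique, $elem;
}

print "@unique\n";   # a b c
```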
You can write this more briefly using a grep, which does the same thing.
my %seen = ();
my @unique = grep { ! $seen{ $_ }++ } @array;
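If you need this idiom more than once, you can wrap it in a subroutine that declares %seen lexically, so each call starts with an empty hash and nothing leaks into package variables. The name "uniq_ordered" is hypothetical and is used only for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: the grep idiom with a fresh lexical %seen on
# every call, so earlier calls cannot affect later ones.
sub uniq_ordered {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

my @array  = ( 1, 2, 3, 6, 4, 4, 5, 6, 5, 7 );
my @unique = uniq_ordered(@array);

print join( ', ', @unique ), "\n";   # 1, 2, 3, 6, 4, 5, 7
```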
--------------------------------------------------------------------
The perlfaq-workers, a group of volunteers, maintain the perlfaq. They
are not necessarily experts in every domain where Perl might show up,
so please include as much relevant information as possible in any
corrections. The perlfaq-workers also don't have access to every
operating system or platform, so please include relevant details for
corrections to examples that do not work on particular platforms.
Working code is greatly appreciated.
If you'd like to help maintain the perlfaq, see the details in
perlfaq.pod.
> 4.41: How can I remove duplicate elements from a list or array?
[...]
> You can write this more briefly using a grep, which does the same
> thing.
>
> my %seen = ();
> my @unique = grep { ! $seen{ $_ }++ } @array;
How about this method, which eliminates the need to declare a seperate
%seen hash?
my @array = (1, 2, 3, 6, 4, 4, 5, 6, 5, 7);
my @unique = grep { ! $::_{$_}++; } @array;
print join ', ', @unique;
[or]
my @array = (1, 2, 3, 6, 4, 4, 5, 6, 5, 7);
my @unique = grep { ! $::{seen}{$_}++; } @array;
print join ', ', @unique;
Output:
1, 2, 3, 6, 4, 5, 7
--
szr
> How about this method, which eliminates the need to declare a seperate
> %seen hash?
>
> my @array = (1, 2, 3, 6, 4, 4, 5, 6, 5, 7);
> my @unique = grep { ! $::{seen}{$_}++; } @array;
> print join ', ', @unique;
>
> Output:
>
> 1, 2, 3, 6, 4, 5, 7
Bad practice. You are still building up a sep*a*rate hash, called
%::seen.
And if you happened to pick a name that is also declared elsewhere with
'our', you would be sharing that same variable.
perl -Mstrict -Mwarnings -MData::Dumper -le'
my @array = qw( 1 2 3 4 6 4 4 6 5 7 );
my @unique = grep !$::seen{$_}++, @array;
print join ", ", @unique;
print Dumper(\%::seen);
our %seen; print Dumper(\%seen);
'
1, 2, 3, 4, 6, 5, 7
$VAR1 = {
'6' => 2,
'4' => 3,
'1' => 1,
'3' => 1,
'7' => 1,
'2' => 1,
'5' => 1
};
$VAR1 = {
'6' => 2,
'4' => 3,
'1' => 1,
'3' => 1,
'7' => 1,
'2' => 1,
'5' => 1
};
--
Anyway, Ruud
"Ordinary is a tiger."
What I don't quite understand is why perl chokes (under strict)
on
%perl
use strict;
use warnings;
my @array = (1, 2, 3, 6, 4, 4, 5, 6, 5, 7);
my @unique = grep { ! $seen{$_}++; } @array;
print join ', ', keys %main::seen;
__END__
Global symbol "%seen" requires explicit package name at - line 4.
Execution of - aborted due to compilation errors.
but not on
%perl
use strict;
use warnings;
my @array = (1, 2, 3, 6, 4, 4, 5, 6, 5, 7);
my @unique = grep { ! $::seen{$_}++; } @array;
print join ', ', keys %main::seen;
__END__
6, 4, 1, 3, 7, 2, 5
Both instances are referencing the undeclared variable %main::seen.
Why is the 2nd notation allowed?
As for the use of "$::_{$_}++" (or "$_{$_}++" for short) I understand
that the *_ typeglob is always predeclared and thus compiles.
Steffen
Fully-qualified references are always allowed. %::seen is a fully-
qualified package variable, in the null package (which happens to also
be called 'main'). It's exactly equivalent to %main::seen.
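The following minimal sketch checks Ben's explanation under strict. It assumes nothing beyond core Perl: %::seen and %main::seen are one hash, and both spellings compile without a declaration. It also covers the %_ case Steffen mentioned. Like $_ and @_, %_ belongs to the *main::_ glob and needs no declaration under strict.

```perl
use strict;
use warnings;

# Fully qualified names are exempt from strict 'vars', and the empty
# package name in %::seen means main, so both lines touch one hash.
$::seen{apple}++;
$main::seen{apple}++;

print "$::seen{apple}\n";   # 2
print( ( \%::seen == \%main::seen ) ? "same hash\n" : "different hashes\n" );

# %_ shares the *main::_ glob with $_ and @_, so it also compiles
# under strict with no declaration.
$_{pear}++;
print "$main::_{pear}\n";   # 1
```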
Ben
--
[b...@morrow.me.uk]
Oh! I never knew that.
Thank you Ben.
Steffen
Point taken, though my goal was to show how to do what was shown at the
end of the FAQ with one less line.
And how about this?
my @array = (1, 2, 3, 6, 4, 4, 5, 6, 5, 7);
my @unique = grep { ! $::_{$_}++; } @array;
print join ', ', @unique;
--
szr
s> Dr.Ruud wrote:
>> Bad practice. You are still building up a sep*a*rate hash, called
>> %::seen.
s> Point taken, though my goal was to show how to do what was shown at the
s> end of the FAQ with one less line.
s> And how about?
s> my @array = (1, 2, 3, 6, 4, 4, 5, 6, 5, 7);
s> my @unique = grep { ! $::_{$_}++; } @array;
s> print join ', ', @unique;
so you changed the name to %_. it is still a global and could be full of
data before you run that code as it is not cleared first.
and trying to save a declare line is losing sight of the whole
concept of hashes being a way to uniquify data.
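Uri's point about %_ not being cleared shows up as soon as the idiom runs twice. This sketch uses two hypothetical subroutines, "uniq_global" and "uniq_lexical", that are written only for the comparison. The first uses the global %_ as in szr's version. The second uses a lexical hash.

```perl
use strict;
use warnings;

# Hypothetical: uses the global %_ as the "seen" hash, as in the post above.
sub uniq_global { return grep { !$_{$_}++ } @_ }

# Hypothetical: same idea with a lexical hash that starts empty each call.
sub uniq_lexical { my %seen; return grep { !$seen{$_}++ } @_ }

my @g1 = uniq_global( 1, 2, 3 );    # 1 2 3
my @g2 = uniq_global( 2, 3, 4 );    # only 4: %_ still remembers 2 and 3
my @l2 = uniq_lexical( 2, 3, 4 );   # 2 3 4

print "@g1 | @g2 | @l2\n";
```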
uri
--
Uri Guttman ------ u...@stemsystems.com -------- http://www.sysarch.com --
> Point taken, though my goal was to show how to do what was shown at the
> end of the FAQ with one less line.
Generally I don't golf perlfaq answers, and even take more lines than I
would in my normal programming. Beginners can see more discrete steps,
and advanced people can golf it as much as they like. :)
Agreed. It was more of a fun little mini-challenge, if you will :-)
--
szr