
Re: Uniq is not unique ?


Eduardo M KALINOWSKI

Apr 12, 2008, 1:20:05 PM
Bhasker C V wrote:
> Hi,
>
>
>
> For fairly large file 100K+ lines
> uniq command does not filter the repetitive lines.
>
> Am I doing anything wrong on the usage ?
>
> For eg:-
>
> I had run this script in my home dir
>
> find . -name \* -type f -exec basename {} \; | uniq
> or send the output to a file and then run uniq on the file
>
> Both cases, the o/p shows repeated lines

From man uniq(1):

DESCRIPTION
Discard all but one of successive identical lines from INPUT (or
standard input), writing to OUTPUT (or standard output).

And, later:

Note: ’uniq’ does not detect repeated lines unless they are adjacent.
You may want to sort the input first, or use ‘sort -u’ without ‘uniq’.

Since find will output the names in no particular order, you'll have to
sort first.
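
A minimal sketch of the corrected pipeline, assuming the standard find, sort and uniq utilities. The function name unique_basenames is only illustrative and is not from the original post, and the `-name \*` test is left out on the assumption that it matches every name anyway:

```shell
# Hypothetical helper: print each distinct file basename under a directory once.
# Usage: unique_basenames [DIR]   (DIR defaults to the current directory)
unique_basenames() {
    # sort places identical names next to each other, so uniq can collapse them
    find "${1:-.}" -type f -exec basename {} \; | sort | uniq
}
```

Calling `unique_basenames ~` would be the sorted equivalent of the command in the question.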


--
America works less, when you say "Union Yes!"

Eduardo M KALINOWSKI
eka...@gmail.com
http://move.to/hpkb


--
To UNSUBSCRIBE, email to debian-us...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org

Chris Henry

Apr 12, 2008, 1:20:10 PM
Hi,

On Sun, Apr 13, 2008 at 12:57 AM, Bhasker C V <bha...@unixindia.com> wrote:
> For fairly large file 100K+ lines
> uniq command does not filter the repetitive lines.
>
> Am I doing anything wrong on the usage ?
>
> For eg:-
>
> I had run this script in my home dir
>
> find . -name \* -type f -exec basename {} \; | uniq
> or send the output to a file and then run uniq on the file
>
> Both cases, the o/p shows repeated lines

I happen to know the source code for uniq, and it should filter
repeated lines. By repeated lines, do you mean consecutive repeated
lines or lines separated by other lines? Uniq only filters consecutive
repeated lines, e.g.

A
A
B
A

will become

A
B
A

If you need it to filter such that only 1 unique line remains, you
will need to sort first then pipe to uniq (not a good solution for
really large files).
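
The example above can be reproduced with a short sketch. The two function names are made up for illustration; both feed the same four lines (A, A, B, A) through the pipeline being compared:

```shell
# Hypothetical demo functions, both using the input lines A, A, B, A.

# uniq alone: only the adjacent pair of A lines is collapsed
adjacent_only() { printf 'A\nA\nB\nA\n' | uniq; }

# sort first: all A lines become adjacent, so a single A remains
fully_unique() { printf 'A\nA\nB\nA\n' | sort | uniq; }
```

Running adjacent_only prints A, B, A, while fully_unique prints A, B.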

Regards,
Chris

--
contact: +65 97553292
e-mail: chrish...@gmail.com / ch_m...@yahoo.com / chris...@nus.edu.sg
facebook: http://nus.facebook.com/profile.php?id=502687583

Bhasker C V

Apr 12, 2008, 1:20:10 PM
Hi,



For fairly large file 100K+ lines
uniq command does not filter the repetitive lines.

Am I doing anything wrong on the usage ?

For eg:-

I had run this script in my home dir

find . -name \* -type f -exec basename {} \; | uniq
or send the output to a file and then run uniq on the file

Both cases, the o/p shows repeated lines

Allan Wind

Apr 12, 2008, 2:30:21 PM
On 2008-04-12T22:27:46+0530, Bhasker C V wrote:
> For fairly large file 100K+ lines
> uniq command does not filter the repetitive lines.

If you need to sort it anyways then `sort -u` might be of interest.
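
As a rough sketch of that suggestion: `sort -u` sorts its input and drops duplicate lines in one step, so no separate uniq process is needed. The wrapper name dedupe_sorted is hypothetical:

```shell
# Hypothetical wrapper: read lines on stdin, print each distinct line once, sorted.
dedupe_sorted() {
    sort -u
}
```

For example, `printf 'b\na\nb\n' | dedupe_sorted` prints a and then b.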


/Allan

Urs Thuermann

Aug 14, 2008, 5:10:07 PM
"Chris Henry" <chrish...@gmail.com> writes:

> Uniq only filters consecutive repeated lines, e.g.
>
> A
> A
> B
> A
>
> will become
>
> A
> B
> A
>
> If you need it to filter such that only 1 unique line remains, you
> will need to sort first then pipe to uniq (not a good solution for
> really large files).

I sometimes need to filter repeated lines that are not consecutive,
and I use the following simple perl script for this purpose. It runs
reasonably fast even for large files (a few tens of MB):

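#!/usr/bin/perl
use strict;
use warnings;

my %h;    # lines that have already been printed

while (<>) {
    # print a line only the first time it is seen
    if (!$h{$_}) {
        $h{$_} = 1;
        print;
    }
}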
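
For comparison, the same first-occurrence filter is often written as an awk one-liner. This is a swapped-in alternative rather than the script above, and the function name dedupe_keep_order is hypothetical:

```shell
# Hypothetical wrapper: keep the first occurrence of each line, preserving input order.
# Reads the files given as arguments, or stdin if there are none.
dedupe_keep_order() {
    # seen[$0]++ evaluates to 0 the first time a line appears, so the negated
    # pattern is true and awk prints the line; later repeats are skipped
    awk '!seen[$0]++' "$@"
}
```

Like the perl script, this keeps every distinct line in memory, so memory use grows with the number of distinct lines rather than with the total input size.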

HTH,
urs
