Re: Uniq is not unique ?

Showing 1-5 of 5 messages
Re: Uniq is not unique ? Eduardo M KALINOWSKI 4/12/08 10:20 AM
Bhasker C V wrote:
> Hi,
>
>  For a fairly large file (100K+ lines),
>  the uniq command does not filter out repetitive lines.
>
>  Am I doing anything wrong in my usage?
>
>  For example:
>
>  I ran this command in my home directory:
>
>  find . -name \* -type f -exec basename {} \; | uniq
>
>  or sent the output to a file and then ran uniq on the file.
>
>  In both cases, the output shows repeated lines.

From the uniq(1) man page:

DESCRIPTION
       Discard all but one of successive identical lines from INPUT (or
       standard input), writing to OUTPUT (or standard output).

And, later:

       Note: ’uniq’ does not detect repeated lines unless they are
       adjacent.  You may want to sort the input first, or use ‘sort -u’
       without ‘uniq’.

Since find will output the names in no particular order, you'll have to
sort first.
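
For anyone following along, the difference is easy to see directly in a
shell (illustrative commands, not taken from the original post):

```shell
# uniq alone leaves non-adjacent duplicates in place:
printf 'a\nb\na\n' | uniq          # a, b, a -- nothing removed
# Sorting first makes duplicates adjacent, so uniq can drop them:
printf 'a\nb\na\n' | sort | uniq   # a, b
# sort -u does both steps in one command:
printf 'a\nb\na\n' | sort -u       # a, b
```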


--
America works less, when you say "Union Yes!"

Eduardo M KALINOWSKI
eka...@gmail.com
http://move.to/hpkb


--
To UNSUBSCRIBE, email to debian-us...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org

Re: Uniq is not unique ? Chris Henry 4/12/08 10:20 AM
Hi,

On Sun, Apr 13, 2008 at 12:57 AM, Bhasker C V <bha...@unixindia.com> wrote:
>   For a fairly large file (100K+ lines),
>   the uniq command does not filter out repetitive lines.
>
>   Am I doing anything wrong in my usage?
>
>   For example:
>
>   I ran this command in my home directory:
>
>   find . -name \* -type f -exec basename {} \; | uniq
>
>   or sent the output to a file and then ran uniq on the file.
>
>   In both cases, the output shows repeated lines.
I happen to know the source code for uniq, and it does filter repeated
lines. By repeated lines, do you mean consecutive repeated lines, or
lines separated by other lines? uniq only filters consecutive repeated
lines, e.g.

A
A
B
A

will become

A
B
A

If you need it to filter so that only one copy of each line remains, you
will need to sort first and then pipe to uniq (not a good solution for
really large files).
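
For what it's worth, one way to keep only the first occurrence of every
line without sorting at all (an alternative not mentioned above,
assuming awk is available) is a one-pass awk filter:

```shell
# awk prints a line only when its seen-count is still zero, i.e. the
# first time the line appears; later duplicates are skipped and the
# original order is preserved.  The cost is holding one copy of each
# distinct line in memory.
printf 'A\nA\nB\nA\n' | awk '!seen[$0]++'   # prints A then B
```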

Regards,
Chris

--
contact: +65 97553292
e-mail: chrish...@gmail.com / ch_m...@yahoo.com / chris...@nus.edu.sg
facebook: http://nus.facebook.com/profile.php?id=502687583



Uniq is not unique ? Bhasker C V 4/12/08 10:20 AM
Hi,


 
 For a fairly large file (100K+ lines),
 the uniq command does not filter out repetitive lines.

 Am I doing anything wrong in my usage?

 For example:

 I ran this command in my home directory:

 find . -name \* -type f -exec basename {} \; | uniq

 or sent the output to a file and then ran uniq on the file.

 In both cases, the output shows repeated lines.


--
Bhasker C V
Registered Linux user: #306349 (counter.li.org)
The box said "Requires Windows 95, NT, or better", so I installed Linux.



Re: Uniq is not unique ? Allan Wind 4/12/08 11:30 AM
On 2008-04-12T22:27:46+0530, Bhasker C V wrote:
>  For a fairly large file (100K+ lines),
>  the uniq command does not filter out repetitive lines.

If you need to sort it anyway, then `sort -u` might be of interest.


/Allan



Re: Uniq is not unique ? Urs Thuermann 8/14/08 2:10 PM
"Chris Henry" <chrish...@gmail.com> writes:

> Uniq only filters consecutive repeated lines, e.g.
>
> A
> A
> B
> A
>
> will become
>
> A
> B
> A
>
> If you need it to filter such that only 1 unique line remains, you
> will need to sort first then pipe to uniq (not a good solution for
> really large files).

I sometimes need to filter repeated lines that are not consecutive,
and I use the following simple Perl script for this purpose.  It runs
reasonably fast even for large (tens of MB) files:

#!/usr/bin/perl

# Print each line only the first time it is seen; the hash %h
# remembers every line read so far.
while (<>) {
    if (!$h{$_}) {
        $h{$_} = 1;
        print;
    }
}

HTH,
urs

