DESCRIPTION
Discard all but one of successive identical lines from INPUT (or
standard input), writing to OUTPUT (or standard output).
And, later:
Note: 'uniq' does not detect repeated lines unless they are adjacent.
You may want to sort the input first, or use 'sort -u' without 'uniq'.
Since find will output the names in no particular order, you'll have to
sort first.
--
America works less, when you say "Union Yes!"
Eduardo M KALINOWSKI
eka...@gmail.com
http://move.to/hpkb
--
To UNSUBSCRIBE, email to debian-us...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org
On Sun, Apr 13, 2008 at 12:57 AM, Bhasker C V <bha...@unixindia.com> wrote:
> For fairly large file 100K+ lines
> uniq command does not filter the repetitive lines.
>
> Am I doing anything wrong on the usage ?
>
> For eg:-
>
> I had run this script in my home dir
>
> find . -name \* -type f -exec basename {} \; | uniq
> or send the output to a file and then run uniq on the file
>
> Both cases, the o/p shows repeated lines
I happen to know the source code for uniq, and it should filter
repeated lines. By repeated lines, do you mean consecutive repeated
lines, or ones separated by other lines? Uniq only filters consecutive
repeated lines, e.g.
A
A
B
A
will become
A
B
A
If you need it to filter such that only 1 unique line remains, you
will need to sort first then pipe to uniq (not a good solution for
really large files).
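A quick way to see the difference for yourself is the sketch below. It just feeds the four sample lines above through both pipelines; the variable names are only for illustration.

```shell
#!/bin/sh
# Four sample lines: A, A, B, A (the last A is not adjacent to the first two).
input=$(printf 'A\nA\nB\nA\n')

# uniq alone only collapses the adjacent pair, so the trailing A survives.
adjacent_only=$(printf '%s\n' "$input" | uniq)

# Sorting first makes all identical lines adjacent, so uniq removes every repeat.
fully_unique=$(printf '%s\n' "$input" | sort | uniq)

printf 'uniq alone:\n%s\n' "$adjacent_only"
printf 'sort | uniq:\n%s\n' "$fully_unique"
```

Note that the sorted variant loses the original line order, which may or may not matter for your use.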
Regards,
Chris
--
contact: +65 97553292
e-mail: chrish...@gmail.com / ch_m...@yahoo.com / chris...@nus.edu.sg
facebook: http://nus.facebook.com/profile.php?id=502687583
For a fairly large file (100K+ lines), the uniq command does not filter
out the repeated lines.
Am I doing anything wrong in the usage?
For example:
I ran this command in my home dir:
find . -name \* -type f -exec basename {} \; | uniq
or I send the output to a file and then run uniq on the file.
In both cases, the output shows repeated lines.
If you need to sort it anyways then `sort -u` might be of interest.
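Applied to the find command from the original question, that could look roughly like the sketch below. The temporary directory and file names are made up purely for the demonstration, and I left out `-name \*` on the assumption that it matches every file anyway.

```shell
#!/bin/sh
# Build a throwaway tree in which the same file name occurs in two directories.
tmp=$(mktemp -d)
mkdir -p "$tmp/a" "$tmp/b"
touch "$tmp/a/notes.txt" "$tmp/b/notes.txt" "$tmp/b/todo.txt"

# find emits names in no particular order; sort -u sorts and de-duplicates
# in one step, so no separate uniq is needed.
names=$(find "$tmp" -type f -exec basename {} \; | sort -u)
printf '%s\n' "$names"

# Clean up the demonstration tree.
rm -rf "$tmp"
```

In your home dir you would replace "$tmp" with . again.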
/Allan
> Uniq only filters consecutive repeated lines, e.g.
>
> A
> A
> B
> A
>
> will become
>
> A
> B
> A
>
> If you need it to filter such that only 1 unique line remains, you
> will need to sort first then pipe to uniq (not a good solution for
> really large files).
I sometimes need to filter repeated lines that are not consecutive,
and I use the following simple perl script for this purpose. It runs
reasonably fast even for large (a couple of tens of MB) files:
#!/usr/bin/perl
use strict;
use warnings;

my %h;    # lines seen so far
while (<>) {
    if (!$h{$_}) {    # first time this line appears
        $h{$_} = 1;
        print;
    }
}
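If perl is not at hand, the same idea (remember every line seen so far, print only first occurrences) can be sketched as an awk one-liner in the shell. The array name `seen` is arbitrary, and like the hash above it grows with the number of distinct lines.

```shell
#!/bin/sh
# Keep only the first occurrence of each line, preserving input order.
# seen[$0]++ evaluates to 0 (false) the first time a line appears, so
# !seen[$0]++ is true only then, and awk's default action prints the line.
result=$(printf 'A\nA\nB\nA\nC\nB\n' | awk '!seen[$0]++')
printf '%s\n' "$result"
```

Unlike sort | uniq, this keeps the lines in their original order.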
HTH,
urs