
prepending a counter for number of lines that match the first field


Lloyd Houghton

Apr 29, 2023, 2:17:28 AM
Hi, I had a script for this purpose from about 30 years ago, which was the last time I needed it. It doesn't seem to work and I'm very rusty, so I wonder if someone could offer a solution.

I have a file where each line has two fields. The first field is sometimes identical between one line and the next. I need to prepend a new field on every line to say how many lines (including the current one) share the same first field. We can assume the file is sorted. For example, if the file is:

abc 647389
abc 12354
abd 7563
cdf 152384
cdf 8761523
cdf 1253
ghj 78654
klm 12634
pqr 9864

then when I run the script, the output should be:

2 abc 647389
2 abc 12354
1 abd 7563
3 cdf 152384
3 cdf 8761523
3 cdf 1253
1 ghj 78654
1 klm 12634
1 pqr 9864

The script that I used to do this (as best I can guess from looking in the directory with my data) looks like this:

sort -o tempid tempid
awk 'NR>1 && $1 != key { for (i=0; ++i<n) print n, line[i]; n=0 }
{ key=$1; line[++n]=$0 }
END { for (i=0; ++i<n) print n, line[i] }' tempid >tempid2

I can't say that I understand the loop specification format, or even the overall behaviour (someone must have helped me), but this script was in the directory and appears to be related to the task...

Could anyone help me to fix this?

Many many thanks.

Janis Papanagnou

Apr 29, 2023, 4:44:48 AM
This script has obvious syntactical errors.

> I can't say that I understand the loop specification format, or even the overall behaviour (someone must have helped me), but this script was in the directory and appears to be related to the task...

Each output line needs information that can only be determined from
later lines, so you need to (temporarily) store the contents of the
lines, as you seem to have tried.
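
For comparison, here is a minimal sketch of that buffering idea. It assumes the original script meant ordinary three-part `for` loops that print every buffered line of a group; the two-part `for (i=0; ++i<n)` header is not valid awk syntax. The helper name `print_group` is made up for this illustration, and the file names are the ones used in the question.

```shell
# Sample data from the question, written to the file name used there.
printf '%s\n' 'abc 647389' 'abc 12354' 'abd 7563' 'cdf 152384' \
    'cdf 8761523' 'cdf 1253' 'ghj 78654' 'klm 12634' 'pqr 9864' > tempid

awk '
# print_group (hypothetical helper): print each buffered line of the
# current group, prefixed by the group size, then reset the buffer.
function print_group(   i) {
    for (i = 1; i <= n; i++) print n, line[i]
    n = 0
}
NR > 1 && $1 != key { print_group() }   # first field changed: group is complete
{ key = $1; line[++n] = $0 }            # buffer the current line
END { print_group() }                   # emit the final group
' tempid > tempid2
```

This version reads the input only once, but it relies on the file being sorted (or at least grouped) by the first field.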

>
> Could anyone help me to fix this?

No, because there's a much simpler and more obvious solution; two-pass
processing across your (sorted) data.

awk '
NR==FNR { n[$1]++ ; next }
{ print n[$1], $0 }
' tempid tempid >tempid2
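
A short annotated run may help to show how the two passes work. `NR` counts records across all input files while `FNR` restarts at 1 for each file, so `NR==FNR` holds only while the first file argument is being read. Since the counts are kept in an array indexed by the first field, the approach should not depend on the sort order either; the sketch below uses a small unsorted file to illustrate that. The file names `unsorted.txt` and `counted.txt` and their contents are invented for this example.

```shell
# Hypothetical unsorted input, just for illustration.
printf '%s\n' 'cdf 1' 'abc 2' 'cdf 3' > unsorted.txt

awk '
NR == FNR { n[$1]++; next }   # first read of the file: count lines per first field
{ print n[$1], $0 }           # second read: prepend the count to each line
' unsorted.txt unsorted.txt > counted.txt

cat counted.txt
```

This prints `2 cdf 1`, `1 abc 2` and `2 cdf 3`, in that order. Because the file is named twice on the command line, the input has to be a regular file that can be read twice; it cannot come from a pipe.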


Janis

>
> Many many thanks.
>

Lloyd Houghton

Apr 29, 2023, 6:00:42 PM
Thank you very much Janis, this has solved my problem.

I remember your name from helping me in this same forum many years ago with a shell script. As a hobbyist, I end up needing such scripts no more than 2 or 3 times a decade, and I'm grateful to people like you who help others with problems that must seem tediously obvious to you.

regards - Lloyd

Janis Papanagnou

Apr 29, 2023, 6:32:03 PM
Thanks for your feedback. Glad my suggestion helped. (It's not tedious,
don't worry.)

Janis