Forum: >>> Magnum BBS <<<

prepending a counter for number of lines that match the first field

From Lloyd Houghton@21:1/5 to All on Fri Apr 28 23:17:24 2023

Hi, I had a script for this purpose from about 30 years ago, which was the last time I needed it. It doesn't seem to work, I'm very rusty, and I wonder if someone could offer a solution.

I have a file where each line has two fields. The first field is sometimes identical between one line and the next. I need to prepend a new field on every line to say how many lines (including the current one) share the same first field. We can assume the file is sorted. For example, if the file is:

abc 647389
abc 12354
abd 7563
cdf 152384
cdf 8761523
cdf 1253
ghj 78654
klm 12634
pqr 9864

then when I run the script, the output should be:

2 abc 647389
2 abc 12354
1 abd 7563
3 cdf 152384
3 cdf 8761523
3 cdf 1253
1 ghj 78654
1 klm 12634
1 pqr 9864

The script that I used to do this (as best I can guess from looking in the directory with my data) looks like this:

sort -o tempid tempid
awk 'NR>1 && $1 != key { for (i=0; ++i<n) print n, line[i]; n=0 }
{ key=$1; line[++n]=$0 }
END { for (i=0; ++i<n) print n, line[i] }' tempid >tempid2

I can't say that I understand the loop specification format, or even the overall behaviour (someone must have helped me), but this script was in the directory and appears to be related to the task...

Could anyone help me to fix this?

Many many thanks.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Janis Papanagnou@21:1/5 to Lloyd Houghton on Sat Apr 29 10:44:41 2023

On 29.04.2023 08:17, Lloyd Houghton wrote:

The script that I used to do this (as best as I guess from looking in the directory with my data) looks like this:

sort -o tempid tempid
awk 'NR>1 && $1 != key { for (i=0; ++i<n) print n, line[i]; n=0 }
{ key=$1; line[++n]=$0 }
END { for (i=0; ++i<n) print n, line[i] }' tempid >tempid2

This script has obvious syntactical errors.

I can't say that I understand the loop specification format, or even the overall behaviour (someone must have helped me), but this script was in the directory and appears to be related to the task...

You need information in the lines that you can only determine by later
lines, so you need to (temporarily) store the contents of the lines as
you seem to have tried.
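
A hedged aside on the buffering idea: the for (i=0; ++i<n) headers in the quoted script are missing their second semicolon, and even with it the condition would stop one line short of each group. What follows is only a guess at what the old script was meant to do, not the original. The count_runs wrapper and the emit function are names made up for this sketch, and it assumes the input is already sorted on the first field.

```shell
# Single-pass sketch: buffer each run of equal keys, then print the run
# once its size is known. Assumes input is grouped by the first field.
count_runs() {
    awk '
    # Print every buffered line of the current group, prefixed by the
    # group size, then reset the buffer counter.
    function emit(   i) {
        for (i = 1; i <= n; i++) print n, line[i]
        n = 0
    }
    NR > 1 && $1 != key { emit() }      # key changed: output previous group
    { key = $1; line[++n] = $0 }        # remember the current line
    END { emit() }                      # output the final group
    ' "$@"
}

# Small demonstration using the first three sample lines.
printf '%s\n' 'abc 647389' 'abc 12354' 'abd 7563' | count_runs
# prints:
# 2 abc 647389
# 2 abc 12354
# 1 abd 7563
```

Because it reads the data only once, a version like this could also take its input from a pipe, at the cost of holding one whole group in memory.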

Could anyone help me to fix this?

No, because there's a much simpler and more obvious solution: two-pass processing across your (sorted) data.

awk '
NR==FNR { n[$1]++ ; next }
{ print n[$1], $0 }
' tempid tempid >tempid2
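
For anyone unsure how the two passes interact: the file is named twice, so awk reads it twice. While it reads the first copy, NR (records read so far in total) equals FNR (records read in the current file), so only the counting rule runs and next skips the print. On the second copy NR is larger than FNR, so each line is printed with the count gathered earlier. A small demonstration, as a sketch only; the scratch file made with mktemp is an assumption standing in for tempid, and result is just a variable name chosen here.

```shell
# Demo of the two-pass idiom on a throwaway copy of part of the sample data.
tmp=$(mktemp)    # hypothetical scratch file standing in for tempid
cat >"$tmp" <<'EOF'
abc 647389
abc 12354
abd 7563
cdf 152384
cdf 8761523
cdf 1253
EOF

# Pass 1 (NR==FNR): count lines per first field.
# Pass 2: prefix each line with the count for its first field.
result=$(awk 'NR==FNR { n[$1]++; next } { print n[$1], $0 }' "$tmp" "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

Since the input has to be opened twice, this form needs a real file rather than a pipe.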

Janis

From Lloyd Houghton@21:1/5 to Janis Papanagnou on Sat Apr 29 15:00:39 2023

Thank you very much Janis, this has solved my problem.

I remember your name from helping me in this same forum many years ago with a shell script. For a hobby, I end up needing such scripts no more than 2 or 3 times a decade, and I'm grateful to people like you who help others with problems that must seem tediously obvious to you.

regards - Lloyd

From Janis Papanagnou@21:1/5 to Lloyd Houghton on Sun Apr 30 00:31:56 2023

Thanks for your feedback. Glad my suggestion helped. (It's not tedious,
don't worry.)

Janis
