Numbering Words


Marsden Broadbent

Jul 12, 2024, 7:35:21 AM
to BBEdit Talk
Hi, I have no experience working with filters or AppleScript, and I’m not quite sure who to ask about my problem. Can you help?

The sample below shows the problem.

01-1:2 # τὸν # 3588 # T-ASM 
01-1:2 # Ἰσαάκ, # 2464 # N-PRI 
01-1:2 # Ἰσαὰκ # 2464 # N-PRI 
01-1:2 # δὲ # 1161 # CONJ 
01-1:2 # ἐγέννησεν # 1080 # V-AAI-3S 
01-1:2 # τὸν # 3588 # T-ASM 
01-1:2 # Ἰακώβ, # 2384 # N-PRI 
01-1:2 # Ἰακὼβ # 2384 # N-PRI 
01-1:2 # δὲ # 1161 # CONJ 
01-1:2 # ἐγέννησεν # 1080 # V-AAI-3S 
01-1:2 # τὸν # 3588 # T-ASM 
01-1:2 # Ἰούδαν # 2455 # N-ASM 
01-1:2 # καὶ # 2532 # CONJ 
01-1:2 # τοὺς # 3588 # T-APM 
01-1:2 # ἀδελφοὺς # 80 # N-APM 
01-1:2 # αὐτοῦ, # 846 # P-GSM 
01-1:3 # Ἰούδας # 2455 # N-NSM 
01-1:3 # δὲ # 1161 # CONJ 
01-1:3 # ἐγέννησεν # 1080 # V-AAI-3S 
01-1:3 # τὸν # 3588 # T-ASM 
01-1:3 # Φάρες # 5329 # N-PRI 
01-1:3 # καὶ # 2532 # CONJ 
01-1:3 # τὸν # 3588 # T-ASM 
01-1:3 # Ζάρα # 2196 # N-PRI 
01-1:3 # ἐκ # 1537 # PREP 
01-1:3 # τῆς # 3588 # T-GSF 
01-1:3 # Θαμάρ, # 2283 # N-PRI 
01-1:3 # Φάρες # 5329 # N-PRI 
01-1:3 # δὲ # 1161 # CONJ 
01-1:3 # ἐγέννησεν # 1080 # V-AAI-3S 
01-1:3 # τὸν # 3588 # T-ASM 
01-1:3 # Ἑσρώμ, # 2074 # N-PRI 
01-1:3 # Ἑσρὼμ # 2074 # N-PRI 
01-1:3 # δὲ # 1161 # CONJ 
01-1:3 # ἐγέννησεν # 1080 # V-AAI-3S 
01-1:3 # τὸν # 3588 # T-ASM 
01-1:3 # Ἀράμ, # 689 # N-PRI 
01-1:4 # Ἀρὰμ # 689 # N-PRI 
01-1:4 # δὲ # 1161 # CONJ 
01-1:4 # ἐγέννησεν # 1080 # V-AAI-3S 
01-1:4 # τὸν # 3588 # T-ASM 
01-1:4 # Ἀμιναδάβ, # 284 # N-PRI 
01-1:4 # Ἀμιναδὰβ # 284 # N-PRI 
01-1:4 # δὲ # 1161 # CONJ 
01-1:4 # ἐγέννησεν # 1080 # V-AAI-3S 
01-1:4 # τὸν # 3588 # T-ASM 
01-1:4 # Ναασσών, # 3476 # N-PRI 
01-1:4 # Ναασσὼν # 3476 # N-PRI 
01-1:4 # δὲ # 1161 # CONJ 
01-1:4 # ἐγέννησεν # 1080 # V-AAI-3S 

I have many lines with the same starting pattern of (\d+-\d+:\d+ # )(Word etc.). To be exact, there are 7957 sets in each file, and I have about 25 files, each with about 140000 lines. The shortest set is two lines and the longest might be about 50.

What I need is a simple way of numbering the lines within each set consecutively. So in the list above, the last 01-1:2 # line would be numbered 16, the first 01-1:3 # line would be 1, and so on. (Preferably with the letter W in front of the number, to give 01-1:2 # W16 # word etc., although the W can be added later if needed.)

I am using BBEdit 12 on an iMac running High Sierra 10.13.6. The files in question are currently .csv files.

If you could point me in the right direction I would find that very helpful.

jj

Jul 12, 2024, 11:00:29 AM
to BBEdit Talk
AppleScript would be quite slow for so many lines.

Provided Perl is installed on your system, here is an example Perl text filter.

Copy it to ~/Library/Application Support/BBEdit/Text Filters/add_w_column.pl

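    #!/usr/bin/env perl

    use strict;
    use warnings;

    my $set_number  = 1;   # position of the current line within its set
    my $current_set = "";  # prefix (e.g. "01-1:2 # ") of the set being numbered

    while (my $line = <>) {
        # Only lines that start with a reference such as "01-1:2 # " are numbered.
        if (my ($set) = $line =~ /^(\d+-\d+:\d+ # )/) {
            if ($current_set eq $set) {
                $set_number++;
            } else {
                # A new prefix starts a new set, so restart the count.
                $set_number  = 1;
                $current_set = $set;
            }
            # Zero-pad to two digits: W01, W02, ... W16.
            my $padded_set_number = sprintf("%02d", $set_number);
            $line =~ s/^(\d+-\d+:\d+ # )/$1W$padded_set_number # /;
        }
        print $line;
    }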


Once installed, you should be able to run it from the menu Text > Apply Text Filter > add_w_column.
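
If unpadded numbers are preferred (W1 to W16, as written in the original request, rather than W01 to W16), a minimal variant might look like the sketch below. The helper name number_lines and the three sample lines are only for illustration and are not part of the filter itself; it assumes the same "NN-N:N # " prefix pattern. To use it as a BBEdit filter you would feed it the lines read from <> instead of the sample.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;                              # the sample lines below contain Greek text
binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical helper: returns a numbered copy of the given lines,
# restarting the counter whenever the "NN-N:N # " prefix changes.
sub number_lines {
    my @lines = @_;
    my ($current_set, $n) = ('', 0);
    my @out;
    for my $line (@lines) {
        if ($line =~ /^(\d+-\d+:\d+ # )/) {
            my $set = $1;
            # Same prefix: keep counting. New prefix: start again at 1.
            $n = ($set eq $current_set) ? $n + 1 : 1;
            $current_set = $set;
            # No sprintf here, so the number is not zero-padded.
            $line =~ s/^\Q$set\E/${set}W$n # /;
        }
        push @out, $line;
    }
    return @out;
}

# Three lines taken from the sample in the question.
my @sample = (
    "01-1:2 # ἀδελφοὺς # 80 # N-APM\n",
    "01-1:2 # αὐτοῦ, # 846 # P-GSM\n",
    "01-1:3 # Ἰούδας # 2455 # N-NSM\n",
);
print number_lines(@sample);
```

With these three lines the output should start with "01-1:2 # W1 # ", "01-1:2 # W2 # " and "01-1:3 # W1 # " respectively. Zero-padding, as in the filter above, has the advantage that the W column sorts correctly as text when no set has more than 99 lines.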


HTH

Jean Jourdain
