Need way of counting number of occurrences of names

Howard

Sep 8, 2021, 11:05:08 AM

to BBEdit Talk

I have multiple instances of this sample data (below) in which each observation has at least three lines and is separated by a blank line. In each observation, the first line contains a name, the second a time, and then there is one or more lines of text:

SAMPLE DATA

Peter Frost
25:34
yes

Elle Claus
30:05
Third line

Elle Claus
30:25
Third line
Fourth line

Cary Clark
31:21
done

________________

First, I would like the data organized so that each observation's lines are combined and separated by two dashes:

SAMPLE COMBINED DATA

Peter Frost -- 25:34 -- yes

Elle Claus -- 30:05 -- Third line

Elle Claus -- 30:25 -- Third line -- Fourth line

Cary Clark -- 31:21 -- done

________________

Then, I would like a count of the observations by name:

SAMPLE COUNTED DATA

Peter Frost 1

Elle Claus 2

Cary Clark 1

I am familiar with using Unix command solutions.

Thanks,

Howard

Media Mouth

Sep 8, 2021, 11:59:41 AM

to bbe...@googlegroups.com

Hi Howard, here's a NodeJS version

Put the following in a file named "converter.js"

To run in Terminal: node path/to/converter.js path/to/source

const fs = require("fs"); // Load two built-in Node.js modules (no npm install needed)
const path = require("path");

const sourcePath = process.argv[2]; // Set file paths
const formattedPath = path.join(path.dirname(sourcePath), path.parse(sourcePath).name + "-Formatted" + path.extname(sourcePath));
const talliedPath = path.join(path.dirname(sourcePath), path.parse(sourcePath).name + "-Tallied" + path.extname(sourcePath));

const nameTally = {};

// Get, process, and write the formatted content
const content = fs.readFileSync(sourcePath).toString().trim().split("\n\n");
content.forEach((thisEntry, entryCount) => {
  const lines = thisEntry.split("\n");
  const thisName = lines[0]; // The first line of each record is the name
  nameTally[thisName] = nameTally[thisName] ? nameTally[thisName] + 1 : 1;
  content[entryCount] = lines.join(" -- ");
});
fs.writeFileSync(formattedPath, content.join("\n"));

// Prep and write the nameTally file (tallies only, not the formatted records)
const nameTallyOutput = [];
for (const thisName in nameTally) nameTallyOutput.push(thisName + ": " + nameTally[thisName]);
fs.writeFileSync(talliedPath, nameTallyOutput.join("\n"));

--
This is the BBEdit Talk public discussion group. If you have a feature request or need technical support, please email "sup...@barebones.com" rather than posting here. Follow @bbedit on Twitter: <https://twitter.com/bbedit>
---
You received this message because you are subscribed to the Google Groups "BBEdit Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bbedit+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bbedit/89eb8047-72c1-4d21-b46d-fa9ad0225425n%40googlegroups.com.

Kerri Hicks

Sep 8, 2021, 12:01:07 PM

to bbe...@googlegroups.com

This worked for me, for the first part.

[Attached image: Screen Shot 2021-09-08 at 11.59.25 AM.png]

The second part I think would have to be scripted.
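The counting step can be scripted in a few lines. Here is a minimal JavaScript sketch, assuming the records have already been combined into one line each with " -- " as the separator. The function name countNames is made up for illustration and is not part of BBEdit or of any reply in this thread.

```javascript
// Count how many combined records start with each name.
// Assumes one record per line in the form "Name -- time -- text ...".
function countNames(combinedText) {
  const counts = new Map(); // a Map keeps names in first-seen order
  for (const line of combinedText.split("\n")) {
    if (line.trim() === "") continue; // skip blank lines between records
    const name = line.split(" -- ")[0].trim();
    counts.set(name, (counts.get(name) || 0) + 1);
  }
  return counts;
}

const combined = [
  "Peter Frost -- 25:34 -- yes",
  "Elle Claus -- 30:05 -- Third line",
  "Elle Claus -- 30:25 -- Third line -- Fourth line",
  "Cary Clark -- 31:21 -- done",
].join("\n");

for (const [name, count] of countNames(combined)) {
  console.log(`${name} ${count}`);
}
// Prints:
// Peter Frost 1
// Elle Claus 2
// Cary Clark 1
```

A Map is used rather than a plain object so that the names come out in the order they first appear, which matches the sample counted data in the original question.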

--Kerri

Christopher Stone

Sep 8, 2021, 4:40:42 PM

to BBEdit-Talk

On Sep 08, 2021, at 10:05, Howard <leadwi...@gmail.com> wrote:

> I have multiple instances of this sample data (below) in which each observation has at least three lines and is separated by a blank line. In each observation, the first line contains a name, the second a time, and then there is one or more lines of text:

Hey Howard,

This is a bit of a chore but not uber difficult using a Perl text filter.

--

Best Regards,

Chris

#!/usr/bin/env perl -0777 -nsw
# -------------------------------------------------------------
# Auth: Christopher Stone <script...@thestoneforge.com>
# dCre: 2021/09/08 15:31
# dMod: 2021/09/08 15:31
# Task: Concatenate Data Records and Count Same-Name-Keys.
# Tags: @ccstone, @Shell, @Script, @Perl, @Howard, @BBEdit-Talk
# -------------------------------------------------------------

# Trim vertical leading and trailing whitespace.
s!\A\s+|\s+\Z!!g;

# Trim trailing horizontal whitespace.
s!\h+$!!gm;

# Split the text records into an array.
my @array = split(/\n{2}/, $_);

# Concatenate the individual text records into a single line.
s!\n! -- !gm for @array;

# Set the Output Separator character.
$, = "\n";

print @array;
print "\n\nCounts:\n\n";

# Remove all text from records other than the name.
s!\h*--.+$!!gm for @array;

# Acquire counts for each record name (key) and print:
my %counts = ();

for (@array) {
    $counts{$_}++;
}

foreach my $keys (keys %counts) {
    print "$keys = $counts{$keys}\n";
}
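The whitespace trimming at the top of the filter matters when real data has trailing spaces or extra blank lines. Below is a rough JavaScript equivalent of those normalization steps, offered only as a sketch. The name splitRecords is hypothetical, and two details are additions that the Perl filter does not perform: the line-ending normalization and the split on two or more newlines.

```javascript
// Split raw text into records (arrays of lines), tolerating stray whitespace.
function splitRecords(rawText) {
  return rawText
    .replace(/\r\n?/g, "\n")   // assumption: normalize CRLF or CR endings to LF
    .replace(/[ \t]+$/gm, "")  // trim trailing spaces and tabs on every line
    .trim()                    // drop leading and trailing blank lines
    .split(/\n{2,}/)           // one or more blank lines separate records
    .map((record) => record.split("\n"));
}

const messy = "\n\nPeter Frost  \r\n25:34\r\nyes\r\n \r\n\r\nElle Claus\r\n30:05\r\nThird line\r\n";
for (const lines of splitRecords(messy)) {
  console.log(lines.join(" -- "));
}
// Prints:
// Peter Frost -- 25:34 -- yes
// Elle Claus -- 30:05 -- Third line
```

Trimming trailing whitespace before splitting is what keeps a line containing only spaces from hiding a record boundary.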

Tim A

Sep 9, 2021, 9:01:39 PM

to BBEdit Talk

Just when I think I am making progress understanding regular expressions ...

This surely works, but I see it as attempting a match of just three characters - a non return followed by a return followed by another non return. Can you offer some insight why it does work?

Bruce Van Allen

Sep 10, 2021, 12:15:17 AM

to BBEdit Talk

Tim A wrote on 2021-09-09 5:57 PM:
> Just when I think I am making progress understanding regular
> expressions ...
> This surely works, but I see it as attempting a match of just three
> characters - a non return followed by a return followed by another non
> return. Can you offer some insight why it does work?

>> On Sep 9, 2021, at 6:29 PM, Bruce Van Allen <b...@cruzio.com>
>> Does it work? Not over here,

I was wrong.

Here's what happens:

Find: ([^\r])\r([^\r])

Replace: \1 ** \2

Using Replace All, it goes through these steps:

Starting:

Peter Frost
25:34
yes

Here it finds the last 't' in Frost, the return following that, and the
'2' that starts the next line, yielding:

Peter Frost ** 25:34
yes

Then it finds the '4' in 25:34, the return after that, and the 'y' that
starts the next line.

Bingo:
Peter Frost ** 25:34 ** yes

Kerri's method was correct!
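The same behavior can be reproduced outside BBEdit. Here is a small JavaScript sketch, assuming line feeds (\n) in place of the \r used in BBEdit's Find field and " -- " as the separator. The name joinLines is invented for this example.

```javascript
// Join the lines of each record while leaving blank separator lines alone.
// The pattern needs a non-newline character on BOTH sides of the newline,
// so the two consecutive newlines between records never match.
function joinLines(text) {
  return text.replace(/([^\n])\n([^\n])/g, "$1 -- $2");
}

const sample = "Peter Frost\n25:34\nyes\n\nElle Claus\n30:05\nThird line";
console.log(joinLines(sample));
// Prints:
// Peter Frost -- 25:34 -- yes
//
// Elle Claus -- 30:05 -- Third line
```

One limitation of this sketch: each match consumes the character after the newline, so a line that is only one character long cannot also begin the next match, and the newline after it is left in place. A lookaround form such as /(?<=[^\n])\n(?=[^\n])/g, replaced with " -- ", avoids that in JavaScript because it consumes only the newline itself.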

--
- Bruce

_bruce__van_allen__santa_cruz__ca_

Christopher Stone

Sep 10, 2021, 1:01:05 AM

to BBEdit-Talk

On Sep 09, 2021, at 23:15, Bruce Van Allen <b...@cruzio.com> wrote:

> Kerri's method was correct!

Indeed.

It's a great example of how useful a very simple regular expression can be.

:-)

--

Best Regards,

Chris

Howard

Sep 11, 2021, 12:29:08 PM

to BBEdit Talk

When I run Kerri's method, I am getting these results, which differ from hers:

Peter Fros t -- 25:3 4 -- yes

Elle Clau s -- 30:0 5 -- Third line

Elle Clau s -- 30:2 5 -- Third lin e -- Fourth line

Cary Clar k -- 31:2 1 -- done

Anyone know why I am getting the extra spaces, e.g., the one before the "t" in "Frost"? My Find/Replace code matches hers.

Bruce Van Allen

Sep 11, 2021, 1:03:12 PM

to bbe...@googlegroups.com

Howard wrote on 2021-09-11 9:29 AM:
> When I run Kerri's method, I am getting these results, which differ from
> hers:
>
> Peter Fros t -- 25:3 4 -- yes
> Elle Clau s -- 30:0 5 -- Third line
> Elle Clau s -- 30:2 5 -- Third lin e -- Fourth line
> Cary Clar k -- 31:2 1 -- done
>
> Anyone know why I am getting the extra spaces, e.g., the one before the
> "t" in "Frost"? My Find/Replace code matches hers.

Check to see whether your Replace expression has a space at the start. I
can duplicate your results with this (within the single quotes):
' \1 -- \2'

Note the space at the start.
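Why a single leading space produces that exact output: \1 holds the last character of the line, so a space in front of \1 lands just before that character. Here is a short JavaScript sketch of the effect, assuming \n line endings and JavaScript's $1/$2 in place of \1/\2. The helper name joinWith is hypothetical.

```javascript
// Apply the find pattern with a caller-supplied replacement string.
function joinWith(text, replacement) {
  return text.replace(/([^\n])\n([^\n])/g, replacement);
}

const record = "Peter Frost\n25:34\nyes";
console.log(joinWith(record, "$1 -- $2"));  // Peter Frost -- 25:34 -- yes
console.log(joinWith(record, " $1 -- $2")); // Peter Fros t -- 25:3 4 -- yes
```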

HTH

--
- Bruce

_bruce__van_allen__santa_cruz__ca_

Howard

Sep 11, 2021, 2:22:32 PM

to BBEdit Talk

Bruce,

That fixed it. Thank you very much for spotting the problem,

Howard
