Need way of counting number of occurrences of names


Howard

Sep 8, 2021, 11:05:08 AM
to BBEdit Talk
I have multiple instances of this sample data (below), in which each observation has at least three lines and is separated from the next by a blank line. In each observation, the first line contains a name, the second a time, and then there are one or more lines of text:

SAMPLE DATA
Peter Frost
25:34
yes

Elle Claus
30:05
Third line

Elle Claus
30:25
Third line
Fourth line

Cary Clark
31:21
done
________________
First, I would like the data organized so that each observation's lines are combined and separated by two dashes: 

SAMPLE COMBINED DATA
Peter Frost -- 25:34 -- yes
Elle Claus -- 30:05 -- Third line
Elle Claus -- 30:25 -- Third line -- Fourth line
Cary Clark -- 31:21 -- done
________________
Then, I would like a count of the observations by name:

SAMPLE COUNTED DATA
Peter Frost 1
Elle Claus 2
Cary Clark 1

I am familiar with using Unix command solutions.

Thanks,
Howard

Media Mouth

Sep 8, 2021, 11:59:41 AM
to bbe...@googlegroups.com
Hi Howard, here's a Node.js version.
Put the following in a file named "converter.js".
To run in Terminal: node path/to/converter.js path/to/source


const fs = require("fs"); // Load two built-in Node.js modules (no npm install needed)
const path = require("path");

const sourcePath = process.argv[2]; // Set file paths
const formattedPath = path.join(path.dirname(sourcePath), path.parse(sourcePath).name + "-Formatted" + path.extname(sourcePath));
const talliedPath = path.join(path.dirname(sourcePath), path.parse(sourcePath).name + "-Tallied" + path.extname(sourcePath));
const nameTally = {};

// Get the source, split it into blank-line-separated entries, then process and write the formatted content
const content = fs.readFileSync(sourcePath).toString().trim().split("\n\n");
content.forEach((thisEntry, entryCount) => {
  const lines = thisEntry.split("\n");
  const thisName = lines[0]; // The first line of each entry is the name
  nameTally[thisName] = (nameTally[thisName] || 0) + 1;
  content[entryCount] = lines.join(" -- ");
});
fs.writeFileSync(formattedPath, content.join("\n"));

// Prep and write the nameTally file (tallies only, not the formatted lines)
const nameTallyOutput = [];
for (const thisName in nameTally) nameTallyOutput.push(thisName + ": " + nameTally[thisName]);
fs.writeFileSync(talliedPath, nameTallyOutput.join("\n"));


--
This is the BBEdit Talk public discussion group. If you have a feature request or need technical support, please email "sup...@barebones.com" rather than posting here. Follow @bbedit on Twitter: <https://twitter.com/bbedit>
---
You received this message because you are subscribed to the Google Groups "BBEdit Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bbedit+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bbedit/89eb8047-72c1-4d21-b46d-fa9ad0225425n%40googlegroups.com.

Kerri Hicks

Sep 8, 2021, 12:01:07 PM
to bbe...@googlegroups.com
This worked for me, for the first part.

[Screenshot attachment: Screen Shot 2021-09-08 at 11.59.25 AM.png]

The second part I think would have to be scripted.
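
For what it's worth, here is a minimal Node.js sketch of how that second part might be scripted. It assumes the data has already been combined into one line per observation with " -- " as the separator, as in Howard's SAMPLE COMBINED DATA. The function name countNames is made up for illustration and does not come from BBEdit or any library.

```javascript
// Hypothetical helper: tally observations by name from combined lines.
// Assumes each non-empty line looks like "Name -- time -- text ...".
function countNames(combinedText) {
  const counts = new Map(); // a Map keeps names in first-seen order
  for (const line of combinedText.split("\n")) {
    if (line.trim() === "") continue; // skip blank lines
    const name = line.split(" -- ")[0].trim(); // text before the first separator
    counts.set(name, (counts.get(name) || 0) + 1);
  }
  return counts;
}

const combined = [
  "Peter Frost -- 25:34 -- yes",
  "Elle Claus -- 30:05 -- Third line",
  "Elle Claus -- 30:25 -- Third line -- Fourth line",
  "Cary Clark -- 31:21 -- done",
].join("\n");

// Print one "Name count" line per distinct name.
for (const [name, count] of countNames(combined)) {
  console.log(`${name} ${count}`);
}
```

A name that itself contains " -- " would be cut short by this sketch, so it only suits data shaped like the sample.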

--Kerri


Christopher Stone

Sep 8, 2021, 4:40:42 PM
to BBEdit-Talk
On Sep 08, 2021, at 10:05, Howard <leadwi...@gmail.com> wrote:
> I have multiple instances of this sample data (below) in which each observation has at least three lines and is separated by a blank line. In each observation, the first line contains a name, the second a time, and then there is one or more lines of text:


Hey Howard,

This is a bit of a chore but not uber difficult using a Perl text filter.


--
Best Regards,
Chris



#!/usr/bin/env perl -0777 -nsw
# -------------------------------------------------------------
# Auth: Christopher Stone <script...@thestoneforge.com>
# dCre: 2021/09/08 15:31
# dMod: 2021/09/08 15:31 
# Task: Concatenate Data Records and Count Same-Name-Keys.
# Tags: @ccstone, @Shell, @Script, @Perl, @Howard, @BBEdit-Talk
# -------------------------------------------------------------

# Trim vertical leading and trailing whitespace.
s!\A\s+|\s+\Z!!g;

# Trim trailing horizontal whitespace.
s!\h+$!!gm;

# Split the text records into an array.
my @array = split(/\n{2}/, $_);

# Concatenate the individual text records into a single line.
s!\n! -- !gm for @array;

# Set the Output Separator character.
$, = "\n";

print @array;
print "\n\nCounts:\n\n";

# Remove all text from records other than the name.
s!\h*--.+$!!gm for @array;

# Acquire counts for each record name (key) and print:
my %counts = ();

for (@array) {
   $counts{$_}++;
}

foreach my $keys (keys %counts) {
   print "$keys = $counts{$keys}\n";
}



Tim A

Sep 9, 2021, 9:01:39 PM
to BBEdit Talk

Just when I think I am making progress understanding regular expressions ...
This surely works, but I see it as attempting a match of just three characters - a non return followed by a return followed by another non return. Can you offer some insight why it does work?

Bruce Van Allen

Sep 10, 2021, 12:15:17 AM
to BBEdit Talk
Tim A wrote on 2021-09-09 5:57 PM:
> Just when I think I am making progress understanding regular
> expressions ...
> This surely works, but I see it as attempting a match of just three
> characters - a non return followed by a return followed by another non
> return. Can you offer some insight why it does work?

>> On Sep 9, 2021, at 6:29 PM, Bruce Van Allen <b...@cruzio.com>
>> Does it work? Not over here,

I was wrong.

Here's what happens:

Find: ([^\r])\r([^\r])

Replace: \1 ** \2

Using Replace All, it goes through these steps:

Starting:

Peter Frost
25:34
yes

Here it finds the last 't' in Frost, the return following that, and the
'2' that starts the next line, yielding:

Peter Frost ** 25:34
yes

Then it finds the '4' in 25:34, the return after that, and the 'y' that
starts the next line.

Bingo:
Peter Frost ** 25:34 ** yes

Kerri's method was correct!
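
The steps above can be reproduced outside BBEdit with a small Node.js sketch. Two things here are assumptions: the pattern uses \n where BBEdit's pattern uses \r, because a line break inside a JavaScript string is "\n", and the helper name joinLines is invented for this example.

```javascript
// Hypothetical helper mirroring Find: ([^\r])\r([^\r]) with Replace All.
// The "g" flag finds every match, left to right, without overlapping.
function joinLines(text, separator) {
  return text.replace(
    /([^\n])\n([^\n])/g,
    // Keep the character before and after the line break, and put the
    // separator where the line break used to be.
    (match, before, after) => before + separator + after
  );
}

const sample = "Peter Frost\n25:34\nyes\n\nElle Claus\n30:05\nThird line";
console.log(joinLines(sample, " ** "));
// Peter Frost ** 25:34 ** yes
//
// Elle Claus ** 30:05 ** Third line
```

In this sketch the blank line between observations is left alone: the character after the first line break is itself a line break, so it fails the second [^\n] and no match is made there.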



--
    - Bruce

_bruce__van_allen__santa_cruz__ca_

Christopher Stone

Sep 10, 2021, 1:01:05 AM
to BBEdit-Talk
On Sep 09, 2021, at 23:15, Bruce Van Allen <b...@cruzio.com> wrote:

> Kerri's method was correct!


Indeed.

It's a great example of how useful a very simple regular expression can be.

:-)


--
Best Regards,
Chris

Howard

Sep 11, 2021, 12:29:08 PM
to BBEdit Talk
When I run Kerri's method, I am getting these results, which differ from hers:

Peter Fros t -- 25:3 4 -- yes
Elle Clau s -- 30:0 5 -- Third line
Elle Clau s -- 30:2 5 -- Third lin e -- Fourth line
Cary Clar k -- 31:2 1 -- done

Anyone know why I am getting the extra spaces, e.g., the one before the "t" in "Frost"? My Find/Replace code matches hers.

Bruce Van Allen

Sep 11, 2021, 1:03:12 PM
to bbe...@googlegroups.com
Howard wrote on 2021-09-11 9:29 AM:
> When I run Kerri's method, I am getting these results, which differ from
> hers:
>
> Peter Fros t -- 25:3 4 -- yes
> Elle Clau s -- 30:0 5 -- Third line
> Elle Clau s -- 30:2 5 -- Third lin e -- Fourth line
> Cary Clar k -- 31:2 1 -- done
>
> Anyone know why I am getting the extra spaces, e.g., the one before the
> "t" in "Frost"? My Find/Replace code matches hers.

Check to see whether your Replace expression has a space at the start. I
can duplicate your results with this (within the single quotes):
' \1 -- \2'

Note the space at the start.
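
A small Node.js sketch shows why the stray space lands before the last letter. It assumes \n stands in for BBEdit's \r, and joinWith is a made-up helper name. Each match captures the last character of a line, so anything placed before \1 in the Replace expression ends up in front of that character.

```javascript
// Hypothetical helper: apply the thread's Find pattern and build each
// replacement from the two captured characters (BBEdit's \1 and \2).
function joinWith(text, build) {
  return text.replace(/([^\n])\n([^\n])/g, (match, before, after) =>
    build(before, after)
  );
}

const record = "Peter Frost\n25:34\nyes";

// Replace expression '\1 -- \2'
console.log(joinWith(record, (a, b) => `${a} -- ${b}`));
// Peter Frost -- 25:34 -- yes

// Replace expression ' \1 -- \2' (note the leading space)
console.log(joinWith(record, (a, b) => ` ${a} -- ${b}`));
// Peter Fros t -- 25:3 4 -- yes
```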

HTH

--
    - Bruce

_bruce__van_allen__santa_cruz__ca_

Howard

Sep 11, 2021, 2:22:32 PM
to BBEdit Talk
Bruce,

That fixed it. Thank you very much for spotting the problem.

Howard
