Counting occurrences of names

Howard

Oct 6, 2020, 2:29:54 PM
to BBEdit Talk
I have the following data (shown below). I want to count how many times each of the four names appears (Harvey Haney, Apple Jones, Banana Herb, Sam Blue). Can this be done in BBEdit? If not, I would appreciate suggestions on how it can be done.

10:45:57 From Harvey Haney : Good morning. How is everyone today?
10:46:08 From Apple Jones : I'm doing good. How are you?
10:46:12 From Banana Herb : Good how are you!
10:46:18 From Sam Blue : I'm doing fine. How are you?
10:45:57 From Harvey Haney : Good morning. How is everyone today?
10:46:08 From Harvey Haney : I'm doing good. How are you?
10:46:12 From Apple Jones : Good how are you!
10:46:18 From Sam Blue : I'm doing fine. How are you?
10:45:57 From Harvey Haney : Good morning.

Fletcher Sandbeck

Oct 6, 2020, 2:42:38 PM
to bbe...@googlegroups.com
I'd do this in two steps. First, isolate all the names using a search/replace with the following pattern with "Grep" turned on. Use the "Extract" button to create a new file.

Pattern: ^.*From (.*?) :.*$
Replace: \1

That gives you a file with just the names in it on separate lines.

Harvey Haney
Apple Jones
...

Save that to your desktop, e.g. to "names.txt" and then use the Terminal to run this command on that file.

cat ~/Desktop/names.txt | sort | uniq -c | bbedit

That pipes your file through sort to put the lines in order, then through uniq -c to collapse each run of identical lines into a single line prefixed with its count, and then pipes the result to a new document in BBEdit, which you can save wherever you want.
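
For anyone who would rather stay in Terminal for both steps, the extraction and the counting can be sketched as one pipeline. This is only an illustration under a few assumptions: the function name count_names is made up, sed stands in for BBEdit's Extract, and the sed pattern uses [^:]* for the name because sed has no equivalent of the non-greedy .*? in the BBEdit pattern.

```shell
# Hypothetical helper: read chat lines on stdin, print one "count name" line per sender.
count_names() {
    # Keep only the sender's name, i.e. the text between "From " and " :".
    # With -n and the trailing p, lines that do not match are dropped.
    sed -n -E 's/^.*From ([^:]*) :.*$/\1/p' |
        sort |     # bring identical names together
        uniq -c    # collapse each run into a single line prefixed by its count
}

# Sample data copied from the first message in this thread.
count_names <<'EOF'
10:45:57 From Harvey Haney : Good morning. How is everyone today?
10:46:08 From Apple Jones : I'm doing good. How are you?
10:46:12 From Banana Herb : Good how are you!
10:46:18 From Sam Blue : I'm doing fine. How are you?
10:45:57 From Harvey Haney : Good morning. How is everyone today?
10:46:08 From Harvey Haney : I'm doing good. How are you?
10:46:12 From Apple Jones : Good how are you!
10:46:18 From Sam Blue : I'm doing fine. How are you?
10:45:57 From Harvey Haney : Good morning.
EOF
```

On the sample data this should report 2 for Apple Jones, 1 for Banana Herb, 4 for Harvey Haney, and 2 for Sam Blue. The amount of padding in front of each count differs between systems.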

[fletcher]

MediaMouth

Oct 6, 2020, 2:46:38 PM
to bbe...@googlegroups.com
You would probably be better off tackling this with a spreadsheet and a couple of calculated columns.
If it's an ongoing tally you want to maintain and/or the dataset is too big for a spreadsheet, consider a database.
But the fastest immediate solution: use BBEdit not to do the calculation but to scribble up a Node.js script. From the looks of your dataset, it would take a few minutes to write and a few seconds to execute.

Craig W. Johnson

Oct 6, 2020, 3:06:28 PM
to BBEdit Talk

There may be a better way, but what I would do is open up a search window, turn on Grep, enter your target name in parentheses in the find field (so that \1 has a group to refer to), enter “\1\r” in the replace field, and then hit Extract. Every hit will then be on its own line, and, presuming you have line numbering enabled, you could just check the last line number for the count.

You could also just have the replace field be “\r” — you’ll still get the count — but that’s less reassuring.

If you only want occurrences within the “From” field, include that text in your find query.

Craig Johnson
c...@remexpublishing.com


Jeffrey Jones

Oct 6, 2020, 3:44:13 PM
to bbe...@googlegroups.com
On 2020 Oct 6, at 14:42, Fletcher Sandbeck <flet...@cumuli.com> wrote:

> I'd do this in two steps.

Simplifying on Fletcher's solution, and staying in BBEdit:

Use Find and Extract as Fletcher describes. Don't bother to save the file or go into Terminal. Instead, use 

Text > Run Unix Command…

Enter the command:

sort | uniq -c

Bruce Van Allen

Oct 6, 2020, 3:45:51 PM
to bbe...@googlegroups.com
Hi Howard,

Try a Text Filter.

Here's one in Perl.

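###### Save in a file
#!/usr/bin/perl

use strict;
use warnings;

my %names;    # name => number of lines from that sender
while (<>) {
    # Capture the text between "From " and the " :" that ends the sender field.
    my ($name) = /From ([^:]+?)\s+:/;
    next unless defined $name;    # skip blank or non-matching lines
    $names{$name}++;
}

# Print each name and its count, tab-separated, in alphabetical order.
for my $n (sort keys %names) {
    printf qq{%s\t%d\n} => $n, $names{$n};
}
#####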

Save this as a BBEdit Text Filter, with a recognizable name such
as count_names.pl; the standard location for Text Filters is:
~/Library/Application Support/BBEdit/Text Filters/

Then you'll see it in BBEdit under Window -> Palettes -> Text Filters.

Just have your data file open and frontmost, then pick this from
the Text Filters palette; you can assign it a keyboard combo if
you want.

You could also run it from the command line, with your data file
as the argument.

Happy to explain the script if that helps.

HTH
--

- Bruce

_bruce__van_allen__santa_cruz__ca_

Howard

Oct 6, 2020, 10:30:49 PM
to BBEdit Talk
Bruce Van Allen,

Thanks for the Perl solution. I'm new to Perl, but plan to try your solution. It looks quite interesting.

Howard

Howard

Oct 6, 2020, 10:30:49 PM
to BBEdit Talk
Hi Craig,

Thanks for the solution. I plan to try it.

Howard

Howard

Oct 6, 2020, 10:30:49 PM
to BBEdit Talk
Harvey,

What you suggested sounds great; however, I have no idea how to write or run a Node JS script.

Howard

Howard

Oct 6, 2020, 10:30:49 PM
to BBEdit Talk
fletcher,

Thanks for the quick response. I used 

Pattern: ^.*From (.*?) :.*$ 
Replace: \1 

to create a file, then saved it as a text file and imported that into R, where I used the table function to get the output I needed.

Howard

Howard

Oct 6, 2020, 10:30:49 PM
to BBEdit Talk
jajls,

I just tried your solution. It is very easy to use.

Thanks,
Howard

Howard

Oct 6, 2020, 10:31:12 PM
to BBEdit Talk
How can I revise the pattern below so that I can also find all the lines that end with a question mark?

Pattern: ^.*From (.*?) :.*$ 
Replace: \1 


Tom Robinson

Oct 6, 2020, 11:15:05 PM
to BBEdit Talk
But that pattern already finds lines ending with a question mark…

> On 2020-10-07, at 15:26, Howard <leadwi...@gmail.com> wrote:
>
> How can I revise the pattern below so that I can also find all the lines that end with a question mark?
>
> Pattern: ^.*From (.*?) :.*$
> Replace: \1

If you mean only the lines that end with a question mark, then you have to ‘escape’ the question mark in the pattern with a backslash, because an unescaped ? is a regex quantifier:

^.*From (.*?) :.*\?$
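
The same idea can be tried outside BBEdit. The following is a rough sketch, assuming the chat lines have no trailing whitespace after the question mark; count_questions is a made-up name, and grep and sed stand in for BBEdit's Find and Extract.

```shell
# Hypothetical helper: count, per sender, only the lines that end in "?".
count_questions() {
    # \? is a literal question mark; $ anchors it to the end of the line.
    grep -E ' From .* : .*\?$' |
        sed -E 's/^.*From ([^:]*) :.*$/\1/' |   # reduce each kept line to the sender's name
        sort |
        uniq -c
}

# First four lines of the sample data from this thread.
count_questions <<'EOF'
10:45:57 From Harvey Haney : Good morning. How is everyone today?
10:46:08 From Apple Jones : I'm doing good. How are you?
10:46:12 From Banana Herb : Good how are you!
10:46:18 From Sam Blue : I'm doing fine. How are you?
EOF
```

Banana Herb's line ends with an exclamation mark, so it is filtered out before the counting happens.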

Cheers

Darren Duncan

Oct 7, 2020, 8:03:39 AM
to bbe...@googlegroups.com, Howard
One simple way in BBEdit is to do a Find using a regular expression that can
match any of the 4 names, then click the Extract button, which will produce a
document with each occurrence on a line, then sort the lines. If the
document is displaying line numbers, it's easy to see how many occurrences of each
of the 4 there are from the first and last line numbers for each. -- Darren Duncan

On 2020-10-06 11:23 a.m., Howard wrote:
> I have the following data (shown below). I want to count how many times each of
> the four names appear (Harvey Haney, Apple Jones, Banana Herb, Sam Blue). Can
> this be done in BBEdit? If not, I would appreciate suggestions on how it can be
> done.
>
> 10:45:57 From Harvey Haney : Good morning. How is everyone today?
> 10:46:08 From Apple Jones : I'm doing good. How are you?
> 10:46:12 From Banana Herb : Good how are you!
> 10:46:18 From Sam Blue : I'm doing fine. How are you?
> 10:45:57 From Harvey Haney : Good morning. How is everyone today?
> 10:46:08 From Harvey Haney : I'm doing good. How are you?
> 10:46:12 From Apple Jones : Good how are you!
> 10:46:18 From Sam Blue : I'm doing fine. How are you?
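
The find-any-of-the-four-names approach in this message has a rough command-line counterpart. This is a sketch only: count_known_names is a made-up name, and grep -o plays the role of Find plus Extract. Note that it counts a name wherever it appears, including inside the message text, not just in the "From" field.

```shell
# Hypothetical helper: count every occurrence of the four known names.
count_known_names() {
    # -E enables the | alternation; -o prints each match on its own line.
    grep -oE 'Harvey Haney|Apple Jones|Banana Herb|Sam Blue' |
        sort |     # bring identical names together
        uniq -c    # one line per name, prefixed by its count
}

# First four lines of the sample data from this thread.
count_known_names <<'EOF'
10:45:57 From Harvey Haney : Good morning. How is everyone today?
10:46:08 From Apple Jones : I'm doing good. How are you?
10:46:12 From Banana Herb : Good how are you!
10:46:18 From Sam Blue : I'm doing fine. How are you?
EOF
```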

Rod Buchanan

Oct 7, 2020, 10:11:31 AM
to bbe...@googlegroups.com

Click Text -> Run Unix Command…, then run this:

cut -d' ' -f3-4 | sort | uniq -c

When I copy/paste your data I get these results:

   2 Apple Jones
   1 Banana Herb
   4 Harvey Haney
   2 Sam Blue

-- 
Rod

Howard

Oct 7, 2020, 12:09:35 PM
to BBEdit Talk
Rod,

What does the line below do?
cut -d' ' -f3-4 

Howard

Rod Buchanan

Oct 7, 2020, 11:12:30 PM
to 'Duane Murphy' via BBEdit Talk

-d' ' tells cut to use the space character as the column/field delimiter.

-f3-4 tells it to only return columns/fields 3 and 4.


From the man page. (Type “man cut” in Terminal.app for complete info.)

cut -- cut out selected portions of each line of a file.  

SYNOPSIS
     cut -f list [-d delim] [-s] [file ...]

DESCRIPTION
     The cut utility cuts out selected portions of each line (as specified by list)
     from each file and writes them to the standard output.  If no file arguments are
     specified, or a file argument is a single dash (`-'), cut reads from the standard
     input.  The items specified by list can be in terms of column position or in
     terms of fields delimited by a special character.  Column numbering starts from
     1.

-d delim
             Use delim as the field delimiter character instead of the tab character.

-f list
             The list specifies fields, separated in the input by the field delimiter
             character (see the -d option.)  Output fields are separated by a single
             occurrence of the field delimiter character.
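
To make the field numbering concrete, here is a small sketch that runs cut on a single line of the sample data. The second line, with the three-word name "Mary Ann Smith", is invented purely to show a limitation and does not come from the original data.

```shell
# One line from the sample data, split on single spaces:
#   field 1 = 10:45:57, field 2 = From, field 3 = Harvey, field 4 = Haney
line='10:45:57 From Harvey Haney : Good morning.'

# Fields 3 through 4, rejoined with the delimiter (a space).
name=$(printf '%s\n' "$line" | cut -d' ' -f3-4)
printf '%s\n' "$name"    # prints: Harvey Haney

# A made-up three-word name is truncated, because only fields 3 and 4 are kept.
long=$(printf '%s\n' '10:47:00 From Mary Ann Smith : Hello' | cut -d' ' -f3-4)
printf '%s\n' "$long"    # prints: Mary Ann
```

So the cut approach is fine as long as every name is exactly two words, which holds for the four names in this thread.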

-- 
Rod

Howard

Oct 8, 2020, 7:54:21 AM
to BBEdit Talk
Rod,

Your explanation is very helpful.

Here is the result your code yields:
   2 Apple Jones
   1 Banana Herb
   4 Harvey Haney
   2 Sam Blue

What if I want the output displayed like this: (How would I get that?)
   4 Harvey Haney
   2 Apple Jones
   2 Sam Blue
   1 Banana Herb

Thus, the names are ordered by the number of times they appear, and if two or more names have the same number of occurrences, they are displayed in alphabetical order.

Howard

Rod Buchanan

Oct 8, 2020, 2:08:56 PM
to 'Duane Murphy' via BBEdit Talk

Add:

| sort -k 1nr,2

To the end of the UNIX command, i.e.

cut -d' ' -f3-4 | sort | uniq -c | sort -k 1nr,2

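'-k 1nr,2' defines a sort key that runs from the first field through the second and compares it numerically ('n') in reverse order ('r'), so the highest counts come first. Lines with equal counts are then ordered by sort's fallback comparison of the whole line, which leaves the names in alphabetical order.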

Again, “man sort” for more info.
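
For comparison, here is a sketch that spells the two sort keys out separately. It uses -k1,1nr -k2 rather than the -k 1nr,2 form given in this message, so treat it as an alternative spelling and not as the command from the thread; rank_counts is a made-up name.

```shell
# Hypothetical helper: order "count name" lines by count (high to low),
# then by name (A to Z) when counts are equal.
rank_counts() {
    # -k1,1nr : first key is field 1 only, compared numerically, reversed
    # -k2     : second key runs from field 2 to the end of the line
    sort -k1,1nr -k2
}

# The counts reported earlier in the thread, in their original alphabetical order.
printf '%s\n' '   2 Apple Jones' '   1 Banana Herb' '   4 Harvey Haney' '   2 Sam Blue' | rank_counts
```

This should list Harvey Haney first, then Apple Jones and Sam Blue (tied at 2, in alphabetical order), then Banana Herb.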

-- 
Rod

@lbutlr

Oct 10, 2020, 9:42:21 AM
to BBEdit Talk
On 07 Oct 2020, at 08:10, Rod Buchanan <li...@sofstats.com> wrote:
> cut -d' ' -f3-4 | sort | uniq -c

And, since there are many ways to do this, I generally fall back to awk:

awk '{print $3, $4}' | sort | uniq -c

Which does the same thing for this data.
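
The two commands agree on the sample data but can differ when the spacing is irregular. Here is a small sketch of that difference; the line with two spaces after the timestamp is a made-up variation, not something from the original data.

```shell
# Same sender twice; the second line has two spaces after the timestamp.
single='10:45:57 From Harvey Haney : Good morning.'
double='10:45:57  From Harvey Haney : Good morning.'

# awk treats any run of blanks as one separator, so both lines yield the name.
awk_single=$(printf '%s\n' "$single" | awk '{print $3, $4}')
awk_double=$(printf '%s\n' "$double" | awk '{print $3, $4}')

# cut treats every single space as a separator, so the extra space shifts the fields.
cut_single=$(printf '%s\n' "$single" | cut -d' ' -f3-4)
cut_double=$(printf '%s\n' "$double" | cut -d' ' -f3-4)

printf '%s\n' "$awk_single" "$awk_double" "$cut_single" "$cut_double"
```

With the doubled space, cut sees an empty second field and returns "From Harvey", while awk still returns "Harvey Haney".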


--
showing snuffy is when Sesame Street jumped the shark
