Newbie question: Find terms in list and highlight


Zoe Barnett

Jun 22, 2021, 9:25:53 AM
to BBEdit Talk

Hello. I'm a BBEdit user and not familiar with scripting. I'm looking for a script to do this:

I have a file list.txt like this:

------------

dog

cat

mouse

bear

report

popular pet

recent report

----------


And another file content.txt like this:

-------------

A recent report showed that a dog is the most popular pet, closely followed by cat and bird. Unsurprisingly, no survey respondents liked snakes.

(para 2)

(para 3)

-----------------


I want a script that finds all the phrases in list.txt, highlights them in content.txt, and outputs a text file like this:

---------------

dog 1

cat 1

mouse 0

bear 0

report 1

popular pet 1

recent report 1

------------------


The main thing is that the search terms can be more than one word.

Is there a script out there?

MediaMouth

Jun 22, 2021, 11:02:27 AM
to bbe...@googlegroups.com
Not necessarily a ready-made script, but plenty of scripting languages can do that job in a few lines of code.

Here's a question for you:
How would the word "bearmouse" be counted?  As neither bear nor mouse because it's one word? Or as 1 bear and 1 mouse?

--
This is the BBEdit Talk public discussion group. If you have a feature request or need technical support, please email "sup...@barebones.com" rather than posting here. Follow @bbedit on Twitter: <https://twitter.com/bbedit>
---
You received this message because you are subscribed to the Google Groups "BBEdit Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bbedit+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bbedit/79f5f025-8a5c-4c10-8261-49119ea49111n%40googlegroups.com.

Zoe Barnett

Jun 22, 2021, 4:02:44 PM
to BBEdit Talk
On Tuesday, 22 June 2021 at 18:02:27 UTC+3 Harvey Pikelberger wrote:

Here's a question for you:
How would the word "bearmouse" be counted?  As neither bear nor mouse because it's one word? Or as 1 bear and 1 mouse?

As neither bear nor mouse. I need to count distinct words or exact phrases that have a space on either side or are followed by punctuation such as , : ; or a full stop. So
popular pet,
(followed by a comma) would get a hit, but
popular petshops
wouldn't.
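
The matching rule described above can be sketched in JavaScript as follows. This is only an illustration under stated assumptions: `countTerm` is a hypothetical helper name, and "not part of a longer word" is taken to mean "not directly preceded or followed by an ASCII letter".

```javascript
// Hypothetical helper: count whole-word / whole-phrase occurrences of `term` in `text`.
function countTerm(text, term) {
  // Escape regex metacharacters so the term is matched literally.
  const escaped = term.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Assumption: a hit must not have an ASCII letter directly before or after it.
  const regex = new RegExp('(?<![a-zA-Z])' + escaped + '(?![a-zA-Z])', 'g');
  const matches = text.match(regex);
  return matches === null ? 0 : matches.length;
}

console.log(countTerm('The most popular pet, by far.', 'popular pet')); // 1
console.log(countTerm('Visit popular petshops.', 'popular pet'));       // 0
console.log(countTerm('A bearmouse is not a bear.', 'bear'));           // 1
```

The lookbehind and lookahead only check the neighbouring characters without consuming them, so two hits separated by a single space are both counted.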

Media Mouth

Jun 22, 2021, 10:22:31 PM
to bbe...@googlegroups.com
There are a lot of ways to approach this.
The BBEdit folks are brilliant w/ AppleScript and integrating that with BBEdit.
There are also ways to approach this from the command line, which is very fast but a little cryptic.
Here's a very simple, probably overly simple NodeJS approach, which can be modified to work in a web browser.
You would no doubt customize this to suit the nuances and particulars of your workflow.

const fs = require('fs'); // file handling library for NodeJS
// get the values in "list.txt", one term per line, skipping blank lines
const wordList = fs.readFileSync(__dirname + '/list.txt').toString().split('\n').map(s => s.trim()).filter(s => s !== '');
const content = fs.readFileSync(__dirname + '/content.txt').toString(); // gets the content
const output = []; // initialize output variable
wordList.forEach(thisSearch => { // iterate through your list of words
  const escaped = thisSearch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // escape regex metacharacters in the term
  // Set up the search / count. The lookarounds require that no letter sits directly before or after the term,
  // without consuming the neighbouring character, so "dog dog" counts as 2.
  const thisRegex = new RegExp('(?<![a-zA-Z])' + escaped + '(?![a-zA-Z])', 'g');
  const thisRslt = content.match(thisRegex); // execute the count
  const thisCount = (thisRslt == null ? 0 : thisRslt.length); // turn "null" results into zero
  console.log(thisSearch, thisCount, thisRslt); // progress outputted to console
  output.push(thisSearch + "\t" + thisCount); // add to the output variable
});
fs.writeFileSync(__dirname + '/output.txt', output.join('\n')); // puts the final result in a file named "output.txt"



Zoe Barnett

Jun 23, 2021, 3:08:44 AM
to BBEdit Talk
Hello Harvey... Wow. This is very kind of you. Thank you! 

I hope to take some time this weekend to see if I can make this work. I'll let you know how it goes.

Zoe

MediaMouth

Jun 23, 2021, 12:27:42 PM
to bbe...@googlegroups.com
Great, LMK

jj

Jun 27, 2021, 2:47:24 PM
to BBEdit Talk
Hi Zoe,

Very interesting topic. The more you think about it, the more questions it raises:

  • what about capitalization, should "Dog" or "DOG" count as "dog"?
  • what about plurals, should "dogs" count as "dog"?
  • what about expressions containing sub-expressions, should the "dog" in "dog owner", "hot dog", etc. count as dog?
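
For comparison, here is a rough JavaScript sketch of one way to answer these three questions. It is not the AppleScript below; `tallyTerms` is a hypothetical name, plurals are handled naively with an optional trailing "s", and the longest term wins when expressions overlap.

```javascript
// Hypothetical helper: tally terms, treating "Dog", "DOG" and "dogs" as "dog",
// and letting a longer expression such as "hot dog" win over "dog".
function tallyTerms(text, terms) {
  const escape = s => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Longest terms first, so the alternation tries "hot dog" before "dog".
  const sorted = [...terms].sort((a, b) => b.length - a.length);
  const pattern = '(?<![a-zA-Z])(' + sorted.map(escape).join('|') + ')s?(?![a-zA-Z])';
  const regex = new RegExp(pattern, 'gi'); // "i" makes the match case-insensitive
  const counts = new Map(terms.map(t => [t.toLowerCase(), 0]));
  for (const match of text.matchAll(regex)) {
    const key = match[1].toLowerCase(); // group 1 excludes the optional plural "s"
    counts.set(key, counts.get(key) + 1);
  }
  return counts;
}

const counts = tallyTerms('Dogs and a DOG ate a hot dog.', ['dog', 'hot dog']);
console.log(counts.get('dog'));     // 2
console.log(counts.get('hot dog')); // 1
```

Because each stretch of text is consumed by at most one match, the "dog" inside "hot dog" is not counted a second time. Whether that is the desired behaviour depends on the answer to the third question.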
  
Here is a bit of AppleScript trying (naively) to solve those questions.

 1. It asks for a list.txt file.
 2. It asks for a content.txt file.
 3. It massages both files to produce the required statistics.
 4. It presents the statistics in a new BBEdit window.
 5. It copies the regular expression used to find the terms into BBEdit's Find window,
    so you can see what was found when you search the content.txt file.
    Check the "Show matches" option in the Find window if you want the terms highlighted.
    
Copy it to the Script Editor, or save it in BBEdit's Scripts folder if you want to be able to use it from BBEdit's script menu.

I have left out this last question:

    • what about synonyms, should "chihuahua" count as "dog"?

It can be solved by using the Text > Canonize... function of BBEdit when preparing the content.txt file.
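
For anyone working outside BBEdit, the same idea can be approximated in a few lines of JavaScript. This is only a sketch of the general technique (rewrite each synonym to a canonical term before counting), not a description of how BBEdit's Canonize works; the `synonyms` table and the `canonize` function are hypothetical.

```javascript
// Hypothetical synonym table: each key is rewritten to its canonical term.
const synonyms = new Map([
  ['chihuahua', 'dog'],
  ['tabby', 'cat'],
]);

// Replace every whole-word occurrence of a synonym with its canonical term.
function canonize(text, table) {
  let result = text;
  for (const [from, to] of table) {
    const escaped = from.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    const regex = new RegExp('(?<![a-zA-Z])' + escaped + '(?![a-zA-Z])', 'gi');
    // The function form keeps "$" in the replacement from being treated specially.
    result = result.replace(regex, () => to);
  }
  return result;
}

console.log(canonize('A Chihuahua chased the tabby.', synonyms));
// A dog chased the cat.
```

The canonized text can then be fed to whichever counting script is in use.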

HTH

Jean Jourdain

--

```applescript
set vListFile to choose file with prompt "Choose a Terms File (with list of terms to count):" of type {"Txt", "Text"}
set vListPath to POSIX path of (vListFile as string)

set vCommand to "cat" & space
set vCommand to vCommand & (the quoted form of vListPath) & space
set vCommand to vCommand & "| sed '/^$/d'" & space -- Remove blank lines.
set vCommand to vCommand & "| sort --reverse" -- Sort in reverse order so that longer expressions are tried before shorter ones, e.g. "dog owner" before "dog".
set vList to do shell script vCommand
set vTerms to (paragraphs of vList) as list
set {vPreviousDelimiters, AppleScript's text item delimiters} to {AppleScript's text item delimiters, "|"}
set vRegex to "(" & (vTerms as string) & ")s?" -- Naive use of s? to include plural forms.
log vRegex
set AppleScript's text item delimiters to vPreviousDelimiters

set vContentsFile to choose file with prompt "Choose a Contents File (with contents to analyze):" of type {"Txt", "Text"}
set vContentsPath to POSIX path of (vContentsFile as string)
set vCommand to "/usr/local/bin/bbfind" & space
set vCommand to vCommand & (the quoted form of vRegex) & space
set vCommand to vCommand & "--grep --match-words --extract" & space
set vCommand to vCommand & (the quoted form of vContentsPath) & space
set vCommand to vCommand & "| tr '[:upper:]' '[:lower:]'" & space -- Convert result to lowercase so we count together "dog", "Dog", "DOG", etc.
set vCommand to vCommand & "| sed 's/s$//'" & space -- Remove trailing 's' so we count together "dog" and "dogs".
set vCommand to vCommand & "| sort" & space -- Sort ascending.
set vCommand to vCommand & "| uniq -c" & space -- Count unique values.
set vCommand to vCommand & "| sort --numeric-sort --reverse" -- Sort results numerically descending.
set vResult to do shell script vCommand

tell application "BBEdit"
	make new text document with properties {contents:vResult}
	activate find window
	set vWindow to find window
	delay 0.2
	set text 1 of vWindow to vRegex
end tell
```

Zoe Barnett

Jun 29, 2021, 5:33:52 AM
to BBEdit Talk
Hello again Harvey. Thanks to some wonderful resources on the internet, even without understanding a thing about node.js I got this working. Thank you!

Is there a way to make it case insensitive? So "Dog", "DOG" and "dog" all get counted as dog?

Zoe

MediaMouth

Jun 29, 2021, 8:36:36 AM
to bbe...@googlegroups.com
Glad that worked!

Try this...

const fs = require('fs'); // file handling library for NodeJS
// get the values in "list.txt", one term per line, skipping blank lines
const wordList = fs.readFileSync(__dirname + '/list.txt').toString().split('\n').map(s => s.trim()).filter(s => s !== '');
const content = fs.readFileSync(__dirname + '/content.txt').toString(); // gets the content
const output = []; // initialize output variable
wordList.forEach(thisSearch => { // iterate through your list of words
  // lowercase the term, then escape regex metacharacters in it
  const escaped = thisSearch.toLowerCase().replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Set up the search / count. The lookarounds require that no letter sits directly before or after the term,
  // without consuming the neighbouring character, so "dog dog" counts as 2.
  const thisRegex = new RegExp('(?<![a-zA-Z])' + escaped + '(?![a-zA-Z])', 'g');
  const thisRslt = content.toLowerCase().match(thisRegex); // execute the count on the lowercased content
  const thisCount = (thisRslt == null ? 0 : thisRslt.length); // turn "null" results into zero
  console.log(thisSearch, thisCount, thisRslt); // progress outputted to console
  output.push(thisSearch + "\t" + thisCount); // add to the output variable
});
fs.writeFileSync(__dirname + '/output.txt', output.join('\n')); // puts the final result in a file named "output.txt"
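
The original question also asked for the terms to be highlighted in the content. Plain text has no real highlighting, so the following is only a sketch, assuming that wrapping each hit in marker characters is good enough. The function name `highlightTerms` and the `[[` / `]]` markers are made up for illustration.

```javascript
// Hypothetical helper: wrap every whole-word hit in marker strings.
function highlightTerms(text, terms, open = '[[', close = ']]') {
  const escape = s => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Longest terms first, so "recent report" is marked as one unit rather than just "report".
  const sorted = [...terms].sort((a, b) => b.length - a.length);
  const regex = new RegExp(
    '(?<![a-zA-Z])(?:' + sorted.map(escape).join('|') + ')(?![a-zA-Z])',
    'gi' // case-insensitive, keeps the original capitalization in the output
  );
  return text.replace(regex, match => open + match + close);
}

const sample = 'A recent report showed that a dog is the most popular pet.';
console.log(highlightTerms(sample, ['dog', 'report', 'popular pet', 'recent report']));
// A [[recent report]] showed that a [[dog]] is the most [[popular pet]].
```

Note that overlapping terms are marked only once here (the "report" inside "recent report" gets no markers of its own), whereas the counting script above counts each term independently.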