Process Duplicate Lines and count them


James Reynolds

Jul 20, 2022, 4:47:43 PM
to BBEdit Talk
I use the Process Duplicate Lines feature a lot and I'm discovering I'm trying to count them more and more. Does anyone know if there is a way to remove duplicate lines and put the number of lines that were removed next to them?

James Reynolds

Roland Küffner

Jul 26, 2022, 10:52:22 AM
to BBEdit Talk
Hi James,

my approach would include two steps:
1. Text menu > "Add/Remove Line Numbers" (adding them, of course, with one space after the number)
2. Text > "Process Duplicate Lines" (options: Leaving one; Duplicates to new document; Delete duplicate lines; Match using pattern (All sub-patterns)):
^\d+ (.+)

This creates a new document with the duplicate lines and their line numbers. You might record these easy steps with the Script Editor and save them for future use. My recording gave me this script:

tell application "BBEdit"
    select text 1 of text window 1
    add line numbers selection of text window 1 with adding space
    process duplicate lines selection of text window 1 duplicates options {match mode:leaving_one, match pattern:"^\\d+ (.+)", match subpattern key:all_subpatterns} output options {deleting duplicates:true, duplicates to new document:true}
end tell


Of course you could change the options to your liking (one option would be to copy the duplicates to the clipboard and add a step that removes the line numbers from the original document).
(And just a friendly reminder: it is always a good idea to try such things on a copy, not on the original data. I cannot be responsible for the integrity of your data and potential mischief …)
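
To make the intended result of the two steps concrete, here is a minimal Python sketch. It is not BBEdit's implementation, only an illustration under the assumption that "Leaving one" keeps the first occurrence and that lines are compared by the text after the number. The function name `split_duplicates` is made up for this example.

```python
def split_duplicates(lines):
    """Number the lines, keep the first occurrence of each text,
    and collect every later occurrence (with its line number)."""
    seen = set()
    kept, duplicates = [], []
    for number, text in enumerate(lines, start=1):
        numbered = f"{number} {text}"   # step 1: line number plus one space
        if text in seen:                # step 2: compare only the text part
            duplicates.append(numbered)
        else:
            seen.add(text)
            kept.append(numbered)
    return kept, duplicates


kept, duplicates = split_duplicates(["apple", "pear", "apple", "apple"])
print(kept)        # ['1 apple', '2 pear']
print(duplicates)  # ['3 apple', '4 apple']
```

The `duplicates` list corresponds to what would land in the new document; counting its entries per text gives the number of removed lines.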





James Reynolds

Jul 26, 2022, 1:04:31 PM
to bbe...@googlegroups.com
Thank you! Although I didn't use your solution, trying to get it to work the way I wanted led me down a rabbit hole until I finally came up with this text filter (it's been a long time since I've written AppleScript.)

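#!/usr/bin/perl -w

use strict;

# Count runs of identical adjacent lines (so the input should be sorted).
my $bucket;
my $counter;
while (<>) {
    if ( not defined $bucket ) {
        $bucket = $_;
        $counter = 1;
    } elsif ( $_ ne $bucket ) {
        print "$counter $bucket";
        $counter = 1;
        $bucket = $_;
    } else {
        $counter++;
    }
}
# $bucket still carries its own newline, so none is added; skip empty input.
print "$counter $bucket" if defined $bucket;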
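
For comparison, the same run-counting idea can be sketched in Python with `itertools.groupby`, which groups adjacent equal items. This is only an illustration, not part of the filter above; the function name `count_runs` is made up, and like the Perl version it assumes the input is already sorted.

```python
from itertools import groupby


def count_runs(lines):
    """Collapse each run of identical adjacent lines into 'count line'."""
    return [f"{sum(1 for _ in group)} {line}" for line, group in groupby(lines)]


# Adjacent duplicates are merged; a later repeat starts a new run.
print(count_runs(["a", "a", "b", "a"]))  # ['2 a', '1 b', '1 a']
```

To use it as a text filter, the lines would be read from standard input and the results printed one per line.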

Thanks again!

James

James Reynolds

Jul 26, 2022, 1:07:56 PM
to BBEdit Talk
Just for completeness, I decided to make a version that didn't require sorted lines.

#!/usr/bin/perl -w

use strict;

my $buckets = {};
while (<>) {
    if ( not defined $buckets->{$_} ) {
        $buckets->{$_} = 1;
    } else {
        $buckets->{$_}++;
    }
}
for my $key ( sort keys %$buckets ) {
    print "$buckets->{$key} $key";
}
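
A rough Python equivalent of this unsorted version, again only as a sketch with a made-up function name (`count_lines`), can lean on `collections.Counter` to do the tallying:

```python
from collections import Counter


def count_lines(lines):
    """Count every distinct line, regardless of order, and
    return 'count line' entries sorted by line text."""
    counts = Counter(lines)
    return [f"{counts[line]} {line}" for line in sorted(counts)]


print(count_lines(["b", "a", "b"]))  # ['1 a', '2 b']
```

As in the Perl script, the output comes back sorted by line text, so the original line order is not preserved.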

James
