Including a Unix filter in a recorded script


fgf

May 28, 2018, 12:59:44 PM
to BBEdit Talk
I have a very large file (a 20 MB Thunderbird .mbox mail file) which contains many base64 sections interleaved in the text. 
I want to decode in-line the base64 sections back into readable text while keeping the non-base64 text.

I made a Find pattern to select the base64 sections:
  (^[0-9a-zA-Z+/]{76}\r)+.*\r

Then I run a Unix Filter "decode_selected_base64.sh" to replace the selection with the decoded base64:
  #!/bin/sh
  base64 -D "$1"

 I can thus process them one-by-one by
  cmd-G to find next block of base64
  clicking Run on "decode_selected_base64.sh" in the Unix Filters palette to convert it to text

This works, but is very laborious!

Does anyone know of a way to
 a) combine both the Find and Filter into a single Filter or Script, and
 b) have it run through the entire file converting all instances?

I'm using BBEdit 9.6.3 under Mac OS X 10.12.6, but could upgrade BBEdit if necessary.

Many thanks,

Luc Bressinck

May 28, 2018, 1:04:40 PM
to bbe...@googlegroups.com
Maybe AppleScript is the answer.
BBEdit is scriptable.


--
This is the BBEdit Talk public discussion group. If you have a
feature request or would like to report a problem, please email
"sup...@barebones.com" rather than posting to the group.
Follow @bbedit on Twitter: <http://www.twitter.com/bbedit>
---
You received this message because you are subscribed to the Google Groups "BBEdit Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bbedit+un...@googlegroups.com.
To post to this group, send email to bbe...@googlegroups.com.
Visit this group at https://groups.google.com/group/bbedit.

Luc Bressinck
August Wautersstraat 40C
B-9140 Temse
Belgium
+32 3 771 20 10 
+32 497 55 08 66
bressi...@scarlet.be



Christopher Stone

May 29, 2018, 4:43:41 PM
to BBEdit-Talk
On 05/28/2018, at 11:59, fgf <ga...@jacqcad.com> wrote:
I have a very large file (a 20 MB Thunderbird .mbox mail file) which contains many base64 sections interleaved in the text.  

I want to decode in-line the base64 sections back into readable text while keeping the non-base64 text.

I made a Find pattern to select the base64 sections:


Hey There,

You could do something like this with AppleScript.

----------------------------------------------------------------
# Auth: Christopher Stone
# dCre: 2018/05/29 15:00
# dMod: 2018/05/29 15:00 
# Appl: BBEdit
# Task: Decode Base64 Segments in front document.
# Libs: None
# Osax: None
# Tags: @Applescript, @Script, @BBEdit, @Decode, @Base64, @Segments, @Front, @Document
----------------------------------------------------------------

set itemFound to true

tell application "BBEdit"
    tell front text window's text
        
        select insertion point before character 1
        
        repeat while itemFound
            set findRecord to find "(^[0-9a-zA-Z+/]{76}\\r)+.*\\r" options {search mode:grep, case sensitive:false}
            
            if found of findRecord ≠ true then
                set itemFound to false
            else
                select found object of findRecord
                set dataStr to contents of text of found object of findRecord
                set decodedStr to decodeBase64(dataStr) of me
                if decodedStr ≠ dataStr then set contents of selection to decodedStr
            end if
            
        end repeat
        
    end tell
end tell

----------------------------------------------------------------
--» HANDLERS
----------------------------------------------------------------
on decodeBase64(dataStr)
    set shCMD to "
   base64 -D <<< " & quoted form of dataStr
    do shell script shCMD
end decodeBase64
----------------------------------------------------------------

It will take your regex pattern, find the next match, transform it, and repeat until done.
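
For comparison, the same find-and-decode pass could probably also be done as one stdin-to-stdout shell filter, which avoids driving BBEdit match by match. What follows is only a sketch under assumptions that are not from the script above: the function names (is_b64, flush_block, decode_b64_blocks) are made up for illustration, the input is assumed to have LF line endings, the decoded parts are assumed to be text, and it relies on `base64 --decode`, the long option that both the macOS and GNU tools accept as far as I know. It also treats the line after a run as part of the block only when that line itself looks like base64, which is slightly stricter than the `.*\r` at the end of the Find pattern.

```sh
#!/bin/sh
# Sketch: decode runs of 76-character base64 lines, pass everything else through.
# Assumptions: LF line endings, text payloads, `base64 --decode` is available.

# True when $1 is non-empty and holds only base64 alphabet characters or '='.
is_b64() {
    case $1 in
        ''|*[!0-9A-Za-z+/=]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Emit the decoded block, or the untouched raw lines if the tool reports failure.
flush_block() {
    [ -n "$block" ] || return 0
    if decoded=$(printf '%s' "$block" | base64 --decode 2>/dev/null); then
        printf '%s\n' "$decoded"   # $(...) trims trailing newlines; add one back
    else
        printf '%s' "$raw"
    fi
    block="" raw=""
}

decode_b64_blocks() {
    block="" raw=""
    while IFS= read -r line || [ -n "$line" ]; do
        if [ "${#line}" -eq 76 ] && is_b64 "$line"; then
            # Full-width line: part of a run.
            block="$block$line"
            raw="$raw$line
"
        elif [ -n "$block" ] && is_b64 "$line" && [ $(( ${#line} % 4 )) -eq 0 ]; then
            # Shorter final line of the run (may carry '=' padding).
            block="$block$line"
            raw="$raw$line
"
            flush_block
        else
            # Ordinary text: finish any pending run, then copy the line.
            flush_block
            printf '%s\n' "$line"
        fi
    done
    flush_block
}
```

With the functions loaded, something like `decode_b64_blocks < Inbox.mbox > Inbox.decoded` would process a whole file (both file names are placeholders). Work on a copy, since binary attachments will not survive the command substitution intact.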

You can run it from Script Editor.app or from BBEdit's Scripts menu.

On a 20 MB file it will probably take quite a while to run.

I am concerned that some of your base64 blocks will be embedded images or files, so you may end up with some gibberish in the decoded material.
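
One way to guard against that might be to test each block before replacing it and skip anything that does not decode to text. The following is a rough sketch, not something I have run against a real mailbox: looks_like_text is a made-up helper name, and the heuristic (no control bytes other than tab, LF and CR in the decoded data) is an assumption that will misjudge some files.

```sh
#!/bin/sh
# Sketch: succeed when the base64 text on stdin decodes to something text-like.
# Heuristic (an assumption): text has no control bytes other than tab, LF, CR.
# Bytes 0x80-0xFF are allowed so that UTF-8 and Latin-1 text still passes.
# Decode errors are not detected here; empty output counts as text.
looks_like_text() {
    ctrl=$(base64 --decode 2>/dev/null |
        LC_ALL=C tr -d '\11\12\15\40-\176\200-\377' |
        wc -c | tr -d ' ')
    [ "$ctrl" -eq 0 ]
}
```

For example, `printf '%s' "$block" | looks_like_text` could gate the replacement, where `$block` is a placeholder for the selected base64 text. An image such as a PNG should fail the check because its header already contains control bytes.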



If I was doing this on any regular basis I'd take a careful look at this:


It works well for simple cases. Unfortunately, I don't know offhand how to make it work in a big file with interspersed base64 sections.

Even so – someone has done this job before – and some research ought to bear fruit.

--
Best Regards,
Chris
