Need help with BBEdit script to delete large blocks of text
From: "shiras...@earthlink.net" <shiras...@earthlink.net>
Date: Tue, 4 Jan 2011 08:14:37 -0800 (PST)
Local: Tues, Jan 4 2011 11:14 am
Subject: Need help with BBEdit script to delete large blocks of text
Hello:

Just joined the group. I hope someone here can help.
The problem: when I archive my e-mail Inbox (Apple Mail), images and
graphics are saved as enormously long, unintelligible strings of
alphanumeric characters. I want to keep the archived "mbox" text files
but remove these big blocks of text, to reduce the files' sizes.
Can BBEdit, on its own or using Unix command-line operations, do
this?
I suppose the solution is something like this: many of the text
strings begin with the same set of three or four characters. I want
the BBEdit script or command to search for these sets, then delete
them and everything following until it reaches the string "--Apple-
Mail."
I'll insanely appreciate any help.

Cheers,

shirasagi


 
From: LuKreme <krem...@kreme.com>
Date: Tue, 4 Jan 2011 12:06:04 -0700
Local: Tues, Jan 4 2011 2:06 pm
Subject: Re: Need help with BBEdit script to delete large blocks of text
On 4-Jan-2011, at 09:14, shiras...@earthlink.net wrote:

> Can BBEdit, on its own or using Unix command-line operations, do
> this?

Yes, but there are better tools, specifically MIME-aware tools. Or even perl.

A grep for lines that are 76 characters long and contain no spaces or punctuation would match the actual encoded attachments, but grabbing the MIME boundaries is trickier.

^[+a-z\/0-9]{76}$

This will match all the lines in the attachment except the last one.
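A rough sketch of that line filter in Python, for anyone who would rather run it outside BBEdit. The pattern as posted lists only lower-case letters, so it presumably relies on a case-insensitive search; Python's `re` is case-sensitive by default, so the character class below spells out both cases. The function name and the sample text are made up for illustration.

```python
import re

# One full line of base64 data: exactly 76 characters from the base64
# alphabet. Upper case is listed explicitly because Python's re module is
# case-sensitive unless told otherwise.
BASE64_LINE = re.compile(r"^[A-Za-z0-9+/]{76}$")


def strip_full_base64_lines(text: str) -> str:
    """Return text with every full 76-character base64 line removed."""
    kept = [line for line in text.splitlines() if not BASE64_LINE.match(line)]
    return "\n".join(kept)


# Hypothetical sample: two full data lines, one short final line, a boundary.
sample = "\n".join([
    "Content-Transfer-Encoding: base64",
    "",
    "A" * 76,
    "B" * 76,
    "QUJD",  # short last line of the attachment, not matched by the pattern
    "--Apple-Mail-2-649664677--",
])

print(strip_full_base64_lines(sample))
```

The short final line survives, which is the "except the last one" caveat above; the headers and boundary lines are untouched.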

> I suppose the solution is something like this: many of the text
> strings begin with the same set of three or four characters. I want
> the BBEdit script or command to search for these sets, then delete
> them and everything following until it reaches the string "--Apple-
> Mail."

That way madness lies. Either get a MIME aware tool that will strip the MIME attachments from the mbox file, or simply strip the encoded lines and don't worry about the boundary lines (or at least deal with those in a second step).

your attachment will look something like this:

--Apple-Mail-2-649664677
Content-Disposition: inline;
        filename*=iso-8859-1''GMT%A0Receipt.pdf
Content-Type: application/pdf;
        name="=?iso-8859-1?Q?GMT=A0Receipt.pdf?="
Content-Transfer-Encoding: base64

JVZERi0xLjMKJcTl8uXrp/Og0MTGCjQgMCBvYmoKPDwgL0xlbmd0aCA1IDAgUiAvRmlsdGVyIC9G
bGF0ZURlY29kZSA+PgpddHJlYW0KeAHNW1lz3LgRfsevQLn8QLkyIx7g5afIWtlxsutL46Qq2TzI
knVsNBrZI23if5+vG2gQIClrOONUxXYNSRBEd3994vAX/V5/0Vmj23peFZXRZV7Ym6qu5gWa9NfP
+m/6Ru8frjN9utYZ/12ffvcjhY/Ox9uDhj3X+68w6MVap/MqLdo8q8buFBFr56Y0ZalNwXeVXuo8

BUT that --Apple-Mail line will appear multiple times in the email (in one message with a single attachment, "--Apple-Mail-" appeared 8 times), so you cannot just willy-nilly delete everything up to one of those lines.
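To see why the boundary shows up more than once, here is a small sketch using Python's standard `email` package on a made-up two-part message (the addresses, subject, and payload are invented; only the boundary string is borrowed from the example above). It counts the lines that start with the boundary marker and lists the parts the parser finds.

```python
from email import message_from_string

# Hypothetical message: one text part plus one base64 attachment.
raw = (
    "From: sender@example.com\n"
    "Subject: receipt\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="Apple-Mail-2-649664677"\n'
    "\n"
    "--Apple-Mail-2-649664677\n"
    "Content-Type: text/plain; charset=us-ascii\n"
    "\n"
    "Receipt attached.\n"
    "--Apple-Mail-2-649664677\n"
    'Content-Type: application/pdf; name="Receipt.pdf"\n'
    'Content-Disposition: inline; filename="Receipt.pdf"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "JVBERi0xLjMK\n"
    "--Apple-Mail-2-649664677--\n"
)

msg = message_from_string(raw)
boundary = msg.get_boundary()

# Every part is introduced by "--boundary"; the last one is closed by
# "--boundary--". So even one attachment means several boundary lines.
boundary_lines = [
    line for line in raw.splitlines() if line.startswith("--" + boundary)
]

# Leaf parts only (skip the multipart container itself).
leaf_types = [
    part.get_content_type() for part in msg.walk() if not part.is_multipart()
]

print(len(boundary_lines), leaf_types)
```

Nested multipart messages (for example multipart/alternative inside multipart/mixed) carry their own boundaries, which is presumably how a single-attachment message ends up with eight of them.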

--
Elves are wonderful. They provoke wonder.  Elves are marvellous. They
cause marvels.  Elves are fantastic. They create fantasies.  Elves are
glamorous. They project glamour.  Elves are enchanting. They weave
enchantment.  Elves are terrific. They beget terror.


 
From: shirasagi <shiras...@earthlink.net>
Date: Tue, 4 Jan 2011 12:07:05 -0800 (PST)
Local: Tues, Jan 4 2011 3:07 pm
Subject: Re: Need help with BBEdit script to delete large blocks of text
LuKreme:

Thanks for your help.

> Yes, but there are better tools, specifically MIME-aware tools. Or even perl.

Can you recommend a "MIME-aware tool"? I did a Google search and
quickly found PINE, but I'd like to know if there are others better
suited to my task. I'll look into PERL.

> A grep for lines that are 76 characters long and contain no spaces or punctuation would match the actual encoded attachments, but grabbing the MIME boundaries is trickier.

> ^[+a-z\/0-9]{76}$

> This will match all the lines in the attachment except the last one.

Thanks for the sample script. My best guess was something similar, but
I would never have thought of including a limit to the number of
characters in a line.

> your attachment will look something like this:

[...]

> BUT that --Apple-Mail line will appear multiple times in the email (in one message with a single attachment, "--Apple-Mail-" appeared 8 times), so you cannot just willy-nilly delete everything up to one of those lines.

Interesting. Is it thus too difficult, or impossible, in BBEdit to write
a script that will find lines, e.g., 76 characters long, with no
spaces or punctuation, delete them until it comes to a line that begins
"--Apple-Mail," move on to the next matching lines, and repeat the
process until the end of file?
Thanks again for your help.

Cheers,

shirasagi


 
From: Matt Martini <matt.mart...@gmail.com>
Date: Tue, 4 Jan 2011 17:19:05 -0500
Local: Tues, Jan 4 2011 5:19 pm
Subject: Re: Need help with BBEdit script to delete large blocks of text
Shirasagi,

I agree with LuKreme that you probably don't want to start mucking with
the mbox files without a MIME-aware tool, lest you run the risk of
corrupting the mbox. I would not even attempt this w/o perl and
MIME::Tools (or equiv.).

It might be simpler for you to change the Prefs in Apple Mail so that
the "Keep copies of messages for offline viewing" option (under
Accounts->Advanced) is set to "All messages, but omit attachments".

Good Luck
Matt



 
From: David Alexander <listm...@cox.net>
Date: Tue, 04 Jan 2011 18:03:30 -0700
Local: Tues, Jan 4 2011 8:03 pm
Subject: Re: Need help with BBEdit script to delete large blocks of text
On Tue, 4 Jan 2011 08:14:37 -0800 (PST), "shiras...@earthlink.net" <shiras...@earthlink.net> wrote:
>Hello:

>Just joined the group. I hope someone here can help.
>The problem: when I archive my e-mail Inbox (Apple Mail), images and
>graphics are saved as enormously long, unintelligible strings of
>alphanumeric characters. I want to keep the archived "mbox" text files
>but remove these big blocks of text, to reduce the files' sizes.

It might be a lot easier to remove the attachments before archiving.
In Mail create a new smart folder with the rule "Contains
Attachments".  You could restrict it to searching the folders you
intend to archive if needed.  Then go into that smart folder, select
all the emails, go to the "Messages" menu and select "Remove
Attachments".  Done.

Then archive them.


 
From: LuKreme <krem...@kreme.com>
Date: Tue, 4 Jan 2011 18:24:50 -0700
Local: Tues, Jan 4 2011 8:24 pm
Subject: Re: Need help with BBEdit script to delete large blocks of text
On 4-Jan-2011, at 13:07, shirasagi wrote:

> Interesting. Is it thus too difficult, or impossible, in BBEdit to write
> a script that will find lines, e.g., 76 characters long, with no
> spaces or punctuation, delete them until it comes to a line that begins
> "--Apple-Mail," move on to the next matching lines, and repeat the
> process until the end of file?

The trouble is, those MIME lines are BOUNDARIES, so they exist at the beginning and end of each MIME part. Also, the message is marked as multipart. If you simply delete the content of the MIME part and the ending boundary, you will effectively prevent most programs from reading the message properly.
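One way to leave the boundaries alone is to let a MIME parser do the surgery. The sketch below uses Python's standard `email` package to swap each non-text part for a one-line text placeholder and then re-serialise the message. It is an illustration under assumptions, not a tested mbox tool: `strip_attachments` is a hypothetical helper, the sample message is invented, and real mail (signed messages, odd charsets) may need more care.

```python
from email import message_from_string


def strip_attachments(msg):
    """Replace every non-text leaf part with a short text placeholder.

    Returns the number of parts replaced. Boundary lines are written by
    the email package, so the multipart structure stays intact.
    """
    replaced = 0
    for part in msg.walk():
        if part.is_multipart() or part.get_content_maintype() == "text":
            continue
        name = part.get_filename() or part.get_content_type()
        # Keep the placeholder pure ASCII so it matches the declared charset.
        name = name.encode("ascii", "replace").decode("ascii")
        for header in ("Content-Type", "Content-Transfer-Encoding",
                       "Content-Disposition"):
            del part[header]  # deleting an absent header is not an error
        part["Content-Type"] = 'text/plain; charset="us-ascii"'
        part.set_payload("[attachment removed: " + name + "]\n")
        replaced += 1
    return replaced


# Hypothetical message with one text part and one base64 attachment.
raw = (
    "From: sender@example.com\n"
    "Subject: receipt\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="Apple-Mail-2-649664677"\n'
    "\n"
    "--Apple-Mail-2-649664677\n"
    "Content-Type: text/plain; charset=us-ascii\n"
    "\n"
    "Receipt attached.\n"
    "--Apple-Mail-2-649664677\n"
    'Content-Type: application/pdf; name="Receipt.pdf"\n'
    'Content-Disposition: inline; filename="Receipt.pdf"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "JVBERi0xLjMK\n"
    "--Apple-Mail-2-649664677--\n"
)

msg = message_from_string(raw)
replaced = strip_attachments(msg)
cleaned = msg.as_string()
print(cleaned)
```

The message keeps its opening and closing boundaries and is still multipart; only the encoded payload is gone, and the placeholder records which file used to be there.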

I don't have specific recommendations as the tools to do this sort of manipulation on messages are 1) command-line tools or libraries 2) tricksy 3) dangerous.

I would start over by thinking about exactly what problem you're trying to solve (personally, I don't want to keep emails without keeping all their contents, but that's not to say others won't feel differently).

Just as an example, if your actual need is an mbox of just plain-text emails without HTML, attachments, or any 'extraneous' data, then I would pipe the mbox through formail -s procmail and call a simple procmail recipe that calls the command-line web browser links (or lynx) with a -dump option. I used to do this automatically for HTML email 15 years ago or so.

$ links -dump www.google.com
   _________________________________________
   _________________________________________
   _________________________________________
   _________________________________________
   _________________________________________
   _________________________________________
   _________________________________________
   Web Images Videos Maps News Shopping Gmail more >>
   iGoogle | Settings | Sign in
                                     Google

  __________________________________________________________ Advanced            
           [ Google Search ] [ I'm Feeling Lucky ]           SearchLanguage      
                                                             Tools              

               Advertising ProgramsBusiness SolutionsAbout Google

                               (c) 2011 - Privacy

There is MIMEdefang, a tool designed to work with sendmail (or sendmail replacements that support milters, like postfix), and also demime, which may or may not help. But as I said, these are low-level tools designed to be used by people who really REALLY know what they are doing, and I don't recommend them. They are also beyond the scope of this list.

Short answer: apart from writing a perl script that you execute from within BBEdit, anything other than simply deleting the data lines is likely to screw up the mbox file. Deleting the data lines should not alter the messages other than to remove all encoded content. Be aware, however, that some emails will ONLY be encoded content. Depending on how the messages were encoded, you may lose the entire body of the message by doing this.
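Before deleting data lines blindly, it may be worth checking which messages would lose their readable body. The following sketch uses Python's standard `mailbox` module to list text parts that are themselves base64-encoded. The function name, the file name `archive.mbox`, and the two sample messages are all invented for the demonstration; on a real archive you would point it at a copy of the file.

```python
import mailbox
import os
import tempfile


def base64_text_parts(mbox_path):
    """Return (subject, content_type) for each base64-encoded text part."""
    hits = []
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            for part in message.walk():
                if part.is_multipart():
                    continue
                encoding = str(
                    part.get("Content-Transfer-Encoding", "")
                ).strip().lower()
                if encoding == "base64" and part.get_content_maintype() == "text":
                    hits.append((message["Subject"], part.get_content_type()))
    finally:
        box.close()
    return hits


# Build a throwaway mbox with one plain message and one whose body is
# base64-encoded text ("SGVsbG8K" is "Hello" plus a newline).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "archive.mbox")
    box = mailbox.mbox(path)
    box.add("From: a@example.com\nSubject: plain\n"
            "Content-Type: text/plain\n\nHello\n")
    box.add("From: b@example.com\nSubject: encoded\n"
            "Content-Type: text/plain; charset=utf-8\n"
            "Content-Transfer-Encoding: base64\n\nSGVsbG8K\n")
    box.close()
    hits = base64_text_parts(path)

print(hits)
```

Any message that shows up in the list is one where stripping the encoded lines would also strip the legible text.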

--
Be careful what you wish for. You never know who will be listening.  Or
what, for that matter.


 
From: Marc Reavis <shiras...@earthlink.net>
Date: Wed, 5 Jan 2011 14:18:04 -0800
Local: Wed, Jan 5 2011 5:18 pm
Subject: Re: Need help with BBEdit script to delete large blocks of text
LuKreme:

Thanks for your reply. There's much useful information there.
I'll look into writing a PERL script from within BBEdit, as you suggest. I'm interested in removing only the gobbledygook text into which Mail renders attached or embedded graphics, not the (legible) content of the e-mail message itself.
I've already looked over formail's man page, and will give that option a try.
As for MIMEdefang and other MIME-aware tools, they look pretty formidable and beyond my needs (not to mention my comprehension).

Regards,

shirasagi



 
From: <luminos...@gmail.com>
Date: Sat, 8 Jan 2011 09:04:53 -0800
Local: Sat, Jan 8 2011 12:04 pm
Subject: Re: Need help with BBEdit script to delete large blocks of text
Hi Marc

On 1/5/11 at 2:18 PM, shiras...@earthlink.net (Marc Reavis) wrote:

>I'm interested in removing only the gobbledygook text into
>which Mail renders attached or embedded graphics

I was browsing an instructional video site and saw the following
snippet. Perhaps it will be helpful for you.

>Enhanced AppleScript for Extracting Email Attachments

http://www.screencastsonline.com/index_files/SCO0250-macmontage15.php

-Said


 