Just joined the group. I hope someone here can help.
The problem: when I archive my e-mail Inbox (Apple Mail), images and
graphics are saved as enormously long, unintelligible strings of
alphanumeric characters. I want to keep the archived "mbox" text files
but remove these big blocks of text, to reduce the files' sizes.
Can BBEdit, on its own or using Unix command-line operations, do
this?
I suppose the solution is something like this: many of the text
strings begin with the same set of three or four characters. I want
the BBEdit script or command to search for these sets, then delete
them and everything following until it reaches the string
"--Apple-Mail."
I'll insanely appreciate any help.
On 4-Jan-2011, at 09:14, shiras...@earthlink.net wrote:
> Can BBEdit, on its own or using Unix command-line operations, do this?
Yes, but there are better tools, specifically MIME-aware tools. Or even perl.
A grep for lines that are 76 characters long and contain no spaces or punctuation would match the actual encoded attachments, but grabbing the MIME boundaries is trickier.
^[+a-z\/0-9]{76}$
This will match every line of the attachment except the last one, which is usually shorter than 76 characters.
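For anyone who would rather try the same idea outside BBEdit, here is a minimal Python sketch of the line-stripping approach. It assumes the attachment data is base64 wrapped at 76 characters, as the pattern above does. The function name strip_full_base64_lines and the sample text are made up for illustration. The character class lists upper and lower case explicitly, so it does not depend on a case-insensitive search setting.

```python
import re

# A full-length base64 line: exactly 76 characters from the base64 alphabet.
# (Assumption: attachments are wrapped at 76 characters, as in the grep above.)
FULL_B64_LINE = re.compile(r"^[A-Za-z0-9+/]{76}$")


def strip_full_base64_lines(text):
    """Return text without the lines that look like full-length base64 data."""
    kept = [line for line in text.split("\n") if not FULL_B64_LINE.match(line)]
    return "\n".join(kept)


# Made-up sample: one full data line, one short final line, one boundary line.
sample = "\n".join([
    "Content-Transfer-Encoding: base64",
    "",
    "A" * 76,
    "QUJD",
    "--Apple-Mail-1",
])
print(strip_full_base64_lines(sample))
```

In the sample, the 76-character line is dropped, while the short final line (QUJD) and the boundary line stay, which is the same limitation described above.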
> I suppose the solution is something like this: many of the text
> strings begin with the same set of three or four characters. I want
> the BBEdit script or command to search for these sets, then delete
> them and everything following until it reaches the string
> "--Apple-Mail."
That way madness lies. Either get a MIME-aware tool that will strip the MIME attachments from the mbox file, or simply strip the encoded lines and don't worry about the boundary lines (or at least deal with those in a second step).
Your attachment will look something like this:
JVZERi0xLjMKJcTl8uXrp/Og0MTGCjQgMCBvYmoKPDwgL0xlbmd0aCA1IDAgUiAvRmlsdGVyIC9G
bGF0ZURlY29kZSA+PgpddHJlYW0KeAHNW1lz3LgRfsevQLn8QLkyIx7g5afIWtlxsutL46Qq2TzI
knVsNBrZI23if5+vG2gQIClrOONUxXYNSRBEd3994vAX/V5/0Vmj23peFZXRZV7Ym6qu5gWa9NfP
+m/6Ru8frjN9utYZ/12ffvcjhY/Ox9uDhj3X+68w6MVap/MqLdo8q8buFBFr56Y0ZalNwXeVXuo8
BUT, that --Apple-Mail line will appear multiple times in the email (in one message with a single attachment, "--Apple-Mail-" appeared 8 times), so you cannot just willy-nilly delete everything up to one of those lines.
-- Elves are wonderful. They provoke wonder. Elves are marvellous. They cause marvels. Elves are fantastic. They create fantasies. Elves are glamorous. They project glamour. Elves are enchanting. They weave enchantment. Elves are terrific. They beget terror.
> Yes, but there are better tools, specifically MIME-aware tools. Or even perl.
Can you recommend a "MIME-aware tool"? I did a Google search and
quickly found PINE, but I'd like to know if there are others better
suited to my task. I'll look into Perl.
> A grep for lines that are 76 characters long and contain no spaces or punctuation would match the actual encoded attachments, but grabbing the MIME boundaries is trickier.
> ^[+a-z\/0-9]{76}$
> This will match all the lines in the attachment except the last one.
Thanks for the sample script. My best guess was something similar, but
I would never have thought of including a limit to the number of
characters in a line.
> your attachment will look something like this:
[...]
> BUT, that --Apple-Mail line will appear multiple times in the email (In one message with a single attachment, "--Apple-Mail-" appeared 8 times), so you cannot just willy-nilly delete everything up to one of those line.
Interesting. Is it thus too difficult, or impossible, in BBEdit to write
a script that will find lines, e.g., 76 characters long, with no
spaces or punctuation, delete them until it comes to a line that begins
"--Apple-Mail," move on to the next matching lines, and repeat the
process until the end of file?
Thanks again for your help.
I agree with LuKreme that you probably don't want to start mucking with the mbox files without a MIME-aware tool, lest you run the risk of corrupting the mbox. I would not even attempt this without Perl and MIME::Tools (or an equivalent).
It might be simpler for you to change the Prefs in Apple Mail so that the "Keep copies of messages for offline viewing" option (under Accounts->Advanced) is set to "All messages, but omit attachments".
Good luck,
Matt
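To give an idea of what a MIME-aware approach looks like, here is a rough sketch that swaps in Python's standard mailbox and email modules in place of Perl and MIME::Tools. It is not anything posted in this thread, and the names strip_attachments and strip_mbox are hypothetical. It replaces every non-text part with a one-line note and leaves the multipart structure and boundaries alone. Embedded images referenced from HTML parts are replaced too, so HTML bodies may show broken images afterwards.

```python
import mailbox


def strip_attachments(msg):
    """Replace each non-text leaf part of msg with a short note, in place."""
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers only hold other parts; keep the structure
        if part.get_content_maintype() == "text":
            continue  # keep the readable body (text/plain, text/html)
        name = part.get_filename() or part.get_content_type()
        # Keep the note pure ASCII so it can be written back without errors.
        name = name.encode("ascii", "replace").decode("ascii")
        # Drop the old headers so the note is not treated as base64 data.
        del part["Content-Transfer-Encoding"]
        del part["Content-Type"]
        part["Content-Type"] = "text/plain"
        part.set_payload("[attachment removed: %s]\n" % name)


def strip_mbox(src_path, dst_path):
    """Copy every message from src_path to dst_path without attachment data.

    The source file is only read. If dst_path already exists, messages
    are appended to it.
    """
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)
    try:
        for msg in src:
            strip_attachments(msg)
            dst.add(msg)
        dst.flush()
    finally:
        dst.close()
        src.close()
```

A hypothetical call would be strip_mbox("Inbox.mbox", "Inbox-stripped.mbox"). The file names are placeholders, and it is safest to run this on a copy of the archive and compare a few messages before discarding the original.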
> Just joined the group. I hope someone here can help.
> The problem: when I archive my e-mail Inbox (Apple Mail), images and
> graphics are saved as enormously long, unintelligible strings of
> alphanumeric characters. I want to keep the archived "mbox" text files
> but remove these big blocks of text, to reduce the files' sizes.
It might be a lot easier to remove the attachments before archiving. In Mail create a new smart folder with the rule "Contains Attachments". You could restrict it to searching the folders you intend to archive if needed. Then go into that smart folder, select all the emails, go to the "Messages" menu and select "Remove Attachments". Done.
> Interesting. Is it thus too difficult, or impossible, in BBEdit to write
> a script that will find lines, e.g., 76 characters long, with no
> spaces or punctuation, delete them until it comes to a line that begins
> "--Apple-Mail," move on to the next matching lines, and repeat the
> process until the end of file?
The trouble is, those MIME lines are BOUNDARIES, so they exist at the beginning and end of each MIME part. Also, the message is marked as multipart. If you simply delete the content of the MIME part and the ending boundary, you will effectively prevent most programs from reading the message properly.
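A small Python illustration of why the boundary string is a poor stopping point: the sketch below builds a made-up two-part message with Python's standard email package and lists the delimiter lines. The boundary value "Apple-Mail-1", the body text, and the attachment bytes are all invented for the example. Real Apple Mail messages often nest several multipart containers, so they have more delimiter lines than this.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Made-up message: one text body and one small fake attachment.
msg = MIMEMultipart(boundary="Apple-Mail-1")
msg.attach(MIMEText("See the attached file."))
msg.attach(MIMEApplication(b"fake pdf bytes", Name="report.pdf"))

text = msg.as_string()
delimiter_lines = [
    line for line in text.splitlines() if line.startswith("--Apple-Mail-1")
]
print(delimiter_lines)
# ['--Apple-Mail-1', '--Apple-Mail-1', '--Apple-Mail-1--']
```

Even this minimal message has a delimiter before each part plus a closing delimiter (the one with the trailing "--"). Deleting "everything up to the next boundary" would remove one of the lines that tells a mail reader where a part starts or ends.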
I don't have specific recommendations, as the tools to do this sort of manipulation on messages are 1) command-line tools or libraries, 2) tricksy, and 3) dangerous.
I would start over by thinking about exactly what problem you're trying to solve (personally, I don't want to keep emails without keeping all their contents, but others may feel differently).
Just as an example, if your actual need is an mbox of just plain-text emails without HTML, attachments, or any 'extraneous' data, then I would pipe the mbox through formail -s procmail and call a simple procmail recipe that calls the command-line web browser links (or lynx) with a -dump option. I used to do this automatically for HTML email about 15 years ago.
$ links -dump www.google.com
_________________________________________ _________________________________________ _________________________________________ _________________________________________ _________________________________________ _________________________________________ _________________________________________
Web Images Videos Maps News Shopping Gmail more >>
iGoogle | Settings | Sign in
Google
Advertising Programs   Business Solutions   About Google
(c) 2011 - Privacy
There is MIMEdefang, which is a tool designed to work with sendmail (or sendmail replacements that support milters, like postfix), and also demime, which may or may not help. But as I said, these are low-level tools designed to be used by people who really REALLY know what they are doing, and I don't recommend them. And they are beyond the scope of this list.
Short answer: other than writing a perl script that you execute from within BBEdit, anything other than simply deleting the data lines is likely to screw up the mbox file. Deleting the data lines should not alter the messages other than to remove all encoded content. Be aware that some emails will ONLY be encoded content, however. It is possible you will lose the entire body of the message doing this, depending on how the messages were encoded.
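One way to check for the "ONLY encoded content" case before deleting anything is to look for text parts that are themselves stored as base64. The Python sketch below is a hedged illustration using the standard email package. The function name base64_text_parts and the sample message are made up, and it only inspects the Content-Transfer-Encoding header of each part.

```python
from email import message_from_string


def base64_text_parts(msg):
    """Return the text/* leaf parts of msg whose body is stored as base64."""
    found = []
    for part in msg.walk():
        if part.is_multipart():
            continue
        encoding = str(part.get("Content-Transfer-Encoding", "")).strip().lower()
        if part.get_content_maintype() == "text" and encoding == "base64":
            found.append(part)
    return found


# Made-up message whose readable body is itself base64-encoded.
raw = (
    "From: someone@example.com\n"
    "Subject: hello\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "SGVsbG8sIHdvcmxkIQ==\n"
)
msg = message_from_string(raw)
for part in base64_text_parts(msg):
    print(part.get_payload(decode=True).decode("utf-8"))
# Hello, world!
```

A message that turns up here has a body that looks just like attachment data in the mbox file, so a long enough body would lose its 76-character lines to the line-deleting grep.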
-- Be careful what you wish for. You never know who will be listening. Or what, for that matter.
Thanks for your reply. There's much useful information there. I'll look into writing a Perl script to run from within BBEdit, as you suggest. I'm interested in removing only the gobbledygook text into which Mail renders attached or embedded graphics, not the (legible) content of the e-mail message itself. I've already looked over formail's man page, and will give that option a try. As for MIMEdefang and other MIME-aware tools, they look pretty formidable and beyond my needs (not to mention my comprehension).