Deleting unwanted message headers from saved email
Steve Hayes  
Newsgroups: comp.lang.awk
From: Steve Hayes <hayes...@telkomsa.net>
Date: Wed, 19 Sep 2012 21:46:47 +0200
Local: Wed, Sep 19 2012 3:46 pm
Subject: Re: Deleting unwanted message headers from saved email
On Wed, 19 Sep 2012 12:56:13 +0100, dave.gma+news...@googlemail.com.invalid
(Dave Gibson) wrote:
>  # Insert the following line here
>/^-- End --/ { print ; body = 0 ; next }

Brilliant, thanks.

>The gibberish is the message body encoded as base64 -- it's not associated
>with a specific header.

Ah, yes, with that addition I can see that.

--
Steve Hayes from Tshwane, South Africa
Blog: http://khanya.wordpress.com
E-mail - see web page, or parse: shayes at dunelm full stop org full stop uk


 
Discussion subject changed to "Deleting unwanted message headers from saved email (was: Re: Is awk suitable for this?)" by Ed Morton
Ed Morton  
Newsgroups: comp.lang.awk
From: "Ed Morton" <mortons...@gmail.com>
Date: Wed, 19 Sep 2012 20:26:06 GMT
Local: Wed, Sep 19 2012 4:26 pm
Subject: Re: Deleting unwanted message headers from saved email (was: Re: Is awk suitable for this?)

Strange indeed! Can you install gawk? If tolower() and reasonable support for
"--version" are missing there's no telling what else might be less than ideal
about that awk version and gawk provides a lot of VERY useful additional
functionality.

     Ed.

Posted using www.webuse.net


 
Discussion subject changed to "Deleting unwanted message headers from saved email" by Steve Hayes
Steve Hayes  
Newsgroups: comp.lang.awk
From: Steve Hayes <hayes...@telkomsa.net>
Date: Wed, 19 Sep 2012 22:37:57 +0200
Local: Wed, Sep 19 2012 4:37 pm
Subject: Re: Deleting unwanted message headers from saved email
On Wed, 19 Sep 2012 12:56:13 +0100, dave.gma+news...@googlemail.com.invalid
(Dave Gibson) wrote:
>Steve Hayes <hayes...@telkomsa.net> wrote:
>> I'm not sure what that X-CC-Diagnistic thingy is. It seems big.

>The gibberish is the message body encoded as base64 -- it's not associated
>with a specific header.

I've just been checking some of the messages I've been trying to save.

These ones are hard to read and save:

Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64

These are not quite as hard to read or save, but still cause some problems:

Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

These ones are easy to read and save:

Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

The ones that are hardest to read and save appear to be produced by Windows
Live Mail.

Perhaps one could tell awk to delete such messages. Would it also be able to
convert "quoted printable" into something more readable?



 
Discussion subject changed to "Is awk suitable for this?" by The Natural Philosopher
The Natural Philosopher  
Newsgroups: comp.lang.awk, comp.os.linux.misc
From: The Natural Philosopher <t...@invalid.invalid>
Date: Wed, 19 Sep 2012 22:54:32 +0100
Local: Wed, Sep 19 2012 5:54 pm
Subject: Re: Is awk suitable for this?

I wrote some stuff in awk once. I've even got the manual. It took me
longer to get it working than the replacement which I wrote in C....and
ran a lot slower.

--
Ineptocracy

(in-ep-toc’-ra-cy) – a system of government where the least capable to
lead are elected by the least capable of producing, and where the
members of society least likely to sustain themselves or succeed, are
rewarded with goods and services paid for by the confiscated wealth of a
diminishing number of producers.


 
unruh  
Newsgroups: comp.lang.awk, comp.os.linux.misc
From: unruh <un...@invalid.ca>
Date: Thu, 20 Sep 2012 00:07:13 GMT
Local: Wed, Sep 19 2012 8:07 pm
Subject: Re: Is awk suitable for this?
On 2012-09-19, Steve Hayes <hayes...@telkomsa.net> wrote:

> On Wed, 19 Sep 2012 17:11:08 GMT, unruh <un...@invalid.ca> wrote:

>>Capturing the full lines of the headers even if they stretched over more
>>than one line is more difficult and I am sure not going to spend time
>>thinking about it since the OP never said why he wanted this, or whether
>>it was more than simply a passing curiosity to him. You are welcome to
>>do it if you care to.

> I've been wanting to do something like this for 20 years, and when I saw AWK
> and its description I thought it might be able to do something like this, but
> I didn't see how.

Well, now you have seen how. But the chance that someone is going to
write the program for you is small. The headers are not the space
problem in saving emails. The body is. Even a very large header is going
to be less than 1K, whereas the body these days is more like 1M or more.
So it is a pretty silly task (straining at the gnats while ignoring the
camel) to reduce the headers.

> When someone asked if awk could perform a somewhat similar task, and there
> appeared to be some awk fundis who knew how to make the thing work, I then
> asked if it could do what I wanted it to do - in other words remove extraneous
> headers from saved e-mail messages, which would make it easier to import them
> into a database.

> As I said elsewhere, in spite of having a version of awk lurking on my
> computer for 20 years or so, I've never known how to use it, and I'm a
> complete novice, but I hope to learn something from those who do know how to
> use it.

Buy the Awk book and read it.


 
Steve Hayes  
Newsgroups: comp.lang.awk, comp.os.linux.misc
Followup-To: za.flame
From: Steve Hayes <hayes...@telkomsa.net>
Date: Thu, 20 Sep 2012 09:48:57 +0200
Local: Thurs, Sep 20 2012 3:48 am
Subject: Re: Is awk suitable for this?

On Thu, 20 Sep 2012 00:07:13 GMT, unruh <un...@invalid.ca> wrote:
>On 2012-09-19, Steve Hayes <hayes...@telkomsa.net> wrote:
>> On Wed, 19 Sep 2012 17:11:08 GMT, unruh <un...@invalid.ca> wrote:
>Well, now you have seen how. But the chance that someone is going to
>write the program for you is small. The headers are not the space
>problem in saving emails. The body is. Even a very large header is going
>to be less than 1K, whereas the body these days is more like 1M or more.
>So it is a pretty silly task (straining at the gnats while ignoring the
>camel) to reduce the headers.

It's not a space problem; it's a readability problem. Ten years after a
message has been sent, the routing information etc. will be of little interest.

>> When someone asked if awk could perform a somewhat similar task, and there
>> appeared to be some awk fundis who knew how to make the thing work, I then
>> asked if it could do what I wanted it to do - in other words remove extraneous
>> headers from saved e-mail messages, which would make it easier to import them
>> into a database.

>> As I said elsewhere, in spite of having a version of awk lurking on my
>> computer for 20 years or so, I've never known how to use it, and I'm a
>> complete novice, but I hope to learn something from those who do know how to
>> use it.

>Buy the Awk book and read it.

That's quite an expensive exercise if it turns out that awk is, after all, not
suitable for the task.

I'm glad that not all awk users are as rude and unhelpful as you.

[follow ups set]



 
Discussion subject changed to "Re (7): Is awk suitable for this?" by no.top.p...@gmail.com
no.top.p...@gmail.com  
Newsgroups: comp.lang.awk, comp.os.linux.misc
From: no.top.p...@gmail.com
Date: Thu, 20 Sep 2012 12:00:53 +0000 (UTC)
Local: Thurs, Sep 20 2012 8:00 am
Subject: Re (7): Is awk suitable for this?
In article <pspci9xcs2....@perseus.wenlock-data.co.uk>, dave.gma+news...@googlemail.com.invalid (Dave Gibson) wrote:

> In comp.lang.awk, no.top.p...@gmail.com wrote:
> > In article <k2v9om$l3...@dont-email.me>, Ed Morton <mortons...@gmail.com> wrote:
> > --snip --

Dave Gibson's current script is:---

This non-trivial task is part of a family of tasks that
we need to do all the time: clean out redundant/repeated stuff.

I've previously got some very useful scripts from Usenet
collaborations, like:
  list all files in dir-tree $1
  which are less than N days old,
  and which contain String1,
  ...
  and which contain StringN

I use those scripts every day, and THIS one will be valuable too.
So, I've made a test-script, to help beta-test the versions.

The task is to delete repeated/redundant blocks of lines
[although as Ed Morton pointed out, awk is not limited to lines]
from text files.

Input files have the format of:  Ha|Hb|Hc|Hd|...Hn|
where H is the repeated/redundant block of text,
and a,b...n is the valuable text, to be kept,
and | represents a one-line-section-separator:
typically "<><><><>"

My test script, which assembles the a, b, ... and H blocks into the Infile
using the human-edited H block, gave the following results, which
should probably be ignored, as they are difficult to understand.

The test conclusions, so far, are that:
   if chars "(", "]", "[" are in the DeleteFile: H,
   this gives problems.

Are these special characters for `bash`?

Thanks,

== Chris Glur.

Copy existing files for: a, b, c, d, e
  use simple 1-char: H

-> ./BuildI == construct FileIn from parts:
 len-I = 713

-> cp H R

-> TstDG ==
len Infile = 713
len DeleteFile = 1
len ouTfile = 687
==> 713 - 687 == 26 <-- expect 713 - 4 == 699
===> perhaps extra 'H' files were found.
====> test with unusual type of 'H' file

-> echo qzxv >> H
-> ./BuildI == construct FileIn from parts:
len-I = 718
-> cp H R
-> TstDG ==
len Infile = 718
len DeleteFile = 2
len ouTfile = 710
==> 718 - 710 == 8 == 2*4 == OK

====> now use a big random: H
-> ./BuildI == construct FileIn from parts:
len-I = 1538
-> cp H R
-> ./TstDG ==
len Infile = 1538
len DeleteFile = 166
len ouTfile = 1538
==> suspect <special chars> in R

====> combine
-> ./BuildNtest ==
construct FileIn from parts:
len-I = 1538
len Infile = 1538
len DeleteFile = 166
len ouTfile = 1538
==> as expected/confirmed

==> keep problematic 'H' as Horg & edit out suspected line/s
==> Let 'H'contain NO-square-brackets.
-> ./BuildNtest ==
construct FileIn from parts:
len-I = 763
len Infile = 763
len DeleteFile = 11
len ouTfile = 708
==> 763 - 708 == 55 == 5*11
===> suspect that now-reduced: H appears 6-times in FileIn
==> Yes, with difficulty, an editor confirms: H appears 6-times

-> cp H Horg2
=> edit/select 'H' to contain line:"   [44]Gravatar"
-> ./BuildNtest ==
construct FileIn from parts:
len-I = 838
len Infile = 838
len DeleteFile = 26
len ouTfile = 838
=> confirm problem with chars: "[","]"
=?=> which one or both: == deleting BOTH still FAILS
=> iteratively delete-1st-line of H until notFAIL
--------------- one line deleted between tests ---------------
construct FileIn from parts:
len-I = 793
len Infile = 793
len DeleteFile = 17
len ouTfile = 793
bash-3.1# ./BuildNtest
construct FileIn from parts:
len-I = 783
len Infile = 783
len DeleteFile = 15
len ouTfile = 723
---------------------------------------------------------
=?=!=> the line that caused the FAIL:-
 Name (required)

==> adding an un-matched "(" 10 lines before end of 'H' causes:
-> ./BuildNtest ==
construct FileIn from parts:
len-I = 783
awk: DGscript:15: (FILENAME=I FNR=7) fatal: Unmatched ( or \(: /     * (/
len Infile = 783
len DeleteFile = 15
len ouTfile = 0

==> remove "(" & test "]"
-> ./BuildNtest ==
construct FileIn from parts:
len-I = 783
len Infile = 783
len DeleteFile = 15
len ouTfile = 783

=> replace with "[]"
->  DGscript ==
:15: (FILENAME=I FNR=7) fatal: Unmatched [ or [^: /     * []/
len Infile = 783
len DeleteFile = 15
len ouTfile = 0

====================


 
Discussion subject changed to "MIME-encoded messages in a digest (was: Re: Deleting unwanted message headers from saved email)" by Dave Gibson
Dave Gibson  
Newsgroups: comp.lang.awk
From: dave.gma+news...@googlemail.com.invalid (Dave Gibson)
Date: Thu, 20 Sep 2012 14:40:04 +0100
Local: Thurs, Sep 20 2012 9:40 am
Subject: MIME-encoded messages in a digest (was: Re: Deleting unwanted message headers from saved email)

They're standard MIME encodings intended to prevent message data from being
corrupted in transit.

  <http://tools.ietf.org/html/rfc2045>

Your mail user agent should be able to convert them to local format
while saving them.

Have a look at these:

  <http://www.convertstring.com/EncodeDecode/Base64Decode>
  <http://www.convertstring.com/EncodeDecode/QuotedPrintableDecode>

> Perhaps one could tell awk to delete such messages.

Wouldn't you rather decode them?

  <http://www.fourmilab.ch/webtools/base64/>

Anyway, assuming the messages are in a digest, separated by lines containing
the string "-- End --", and that none of them are multipart messages, this
script drops every message with a base64-encoded body:

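#v+
----script begins on next line
# Drop every message whose body is base64-encoded; print the others.
/^-- End --/ {
  # Message separator: print it unless the message was dropped.
  if (!b64)
    print
  body = 0
  b64 = 0
  next
}

# Inside a base64 message: discard everything up to the separator.
b64 { next }

# End of the header: print the buffered header lines, then the body.
!body && /^$/ {
  for (n = 1; n <= hlines; n++)
    print header[n]
  hlines = 0
  body = 1
}

body { print ; next }

# Content-Transfer-Encoding: base64, matched case-insensitively with
# bracket expressions: forget the buffered header and skip the message.
/^[Cc][Oo][Nn][Tt][Ee][Nn][Tt]-[Tt][Rr][Aa][Nn][Ss][Ff][Ee][Rr]-[Ee][Nn][Cc][Oo][Dd][Ii][Nn][Gg]: [Bb][Aa][Ss][Ee]64/ {
  b64 = 1
  hlines = 0
  next
}

# Buffer the other header lines until we know whether to keep them.
{ header[++hlines] = $0 }
----script ends on previous line
#v-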

>  Would it also
> be able to convert "quoted printable" into something more readable?

Perl has modules for dealing with various mail formats, so it may well be
better suited to your requirements. Meanwhile, this awk script decodes
quoted-printable bodies in the same kind of digest:

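#v+
----script begins on next line
# Decode quoted-printable message bodies in a "-- End --" digest.
# With gawk in a UTF-8 locale, run it under LC_ALL=C so that %c below
# yields single bytes for values above 127.
BEGIN {
  hex["0"] = 0  ; hex["1"] = 1  ; hex["2"] = 2  ; hex["3"] = 3
  hex["4"] = 4  ; hex["5"] = 5  ; hex["6"] = 6  ; hex["7"] = 7
  hex["8"] = 8  ; hex["9"] = 9  ; hex["A"] = 10 ; hex["B"] = 11
  hex["C"] = 12 ; hex["D"] = 13 ; hex["E"] = 14 ; hex["F"] = 15
  for (n = 0 ; n <= 255; n++)
    ch[n] = sprintf("%c", n)
}

# Message separator: reset the per-message state.
/^-- End --/ { qp = 0 ; body = 0 }

# The first empty line ends the header.
/^$/ { body = 1 }

# The header says the body is quoted-printable: remember that, and relabel
# the encoding as 8bit because the body will be printed decoded.
!body && /^[Cc][Oo][Nn][Tt][Ee][Nn][Tt]-[Tt][Rr][Aa][Nn][Ss][Ff][Ee][Rr]-[Ee][Nn][Cc][Oo][Dd][Ii][Nn][Gg]: [Qq][Uu][Oo][Tt][Ee][Dd]-[Pp][Rr][Ii][Nn][Tt][Aa][Bb][Ll][Ee]/ {
  qp = 1
  $NF = "8bit"
}

body && qp && /=/ {
  s = $0
  # A trailing "=", optionally followed by spaces or tabs, is a soft
  # line break: remove it and remember that in u.
  u = sub(/=[ \t]*$/, "", s)
  t = ""
  # Replace each =XX escape with the character it encodes.
  while (match(s, /=[0-9A-F][0-9A-F]/)) {
    t = t substr(s, 1, RSTART - 1) \
        ch[hex[substr(s, RSTART + 1, 1)] * 16 + hex[substr(s, RSTART + 2, 1)]]
    s = substr(s, RSTART + RLENGTH)
  }
  $0 = t s
  # Soft line break: print without a newline so the next line joins on.
  if (u) {
    printf "%s", $0
    next
  }
}

{ print }
----script ends on previous line
#v-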

 
Discussion subject changed to "Deleting unwanted message headers from saved email" by Loki Harfagr
Loki Harfagr  
Newsgroups: comp.lang.awk
From: Loki Harfagr <l...@thedarkdesign.free.fr.INVALID>
Date: 20 Sep 2012 14:05:12 GMT
Local: Thurs, Sep 20 2012 10:05 am
Subject: Re: Deleting unwanted message headers from saved email
Wed, 19 Sep 2012 21:46:47 +0200, Steve Hayes did cat :

> On Wed, 19 Sep 2012 12:56:13 +0100, dave.gma+news...@googlemail.com.invalid
> (Dave Gibson) wrote:

>>  # Insert the following line here
>>/^-- End --/ { print ; body = 0 ; next }

> Brilliant, thanks.

>>The gibberish is the message body encoded as base64 -- it's not associated
>>with a specific header.

> Ah, yes, with that addition I can see that.

If you want some more fun, you might like to try to finish a little game
I started some time ago; I stopped as soon as it served my needs at the
time, instead of making it fully RFC compliant ;-)
It is a shell wrapper around some awk code that encodes or decodes
base64.
At least it might amuse some people here ;-)
------------
$ cat base64_in_awk.sh

###     not yet complete nor compliant ;-)
###
###     this is a simple base64 enc/dec in awk
###     mostly made for fun but actually used in a few awk scripts I use
###     in some other tools I wrote for fine grain analysis of
###     texts, mainly emails, mainly spams to try and generate some
###     synthetic regexps (or ideas of) regarding false positives or reinforcements.
###     (and yes I know some tools exist in perl and I even use some of
###     them which is another reason why I also do it in awk ;-)
###     the wrapping is set at 72 like the 'mimencode' usage.
###     (so, to avoid wrapping do set ORS to nil).
###
Aargh(){
        r=$1
        shift
        printf "\n%s\nThat's all, folks...\n\n" "${@}"
        exit $r
}

[ $# -gt 1 ] || Aargh 1 "something is direly in the unseen world"
###     most used way by default, anyway as 2 parms are mandatory this is
### only belting the suspenders
WOT=${1:-d}
shift
###     gawk -v wot=$WOT -v ORS='' '
###     gawk -v wot=$WOT -v ORS='µ' '
gawk -v wot=$WOT '
function _ba64dec(_b64str,_BASE64,_wrap,_res,_ba,_by,_len,_i,_j)
{
        _len=split(_b64str,_ba,"")  
        while (_i<=_len){
                if( 0==(++_wrap) %72){++_i;continue}
                ###     get the 4 _bytes values and find their position in BASE64 base  
                for(_j=1;_j<5;_j++){
                        _by[_j] = index(_BASE64, _ba[++_i])
                        _by[_j]--
                }
                ###     Reconstruct ASCII string  
                _res = _res sprintf( "%c", lshift(and(_by[1], 63), 2) + rshift(and(_by[2], 48), 4) )
                _res = _res sprintf( "%c", lshift(and(_by[2], 15), 4) + rshift(and(_by[3], 60), 2) )
                _res = _res sprintf( "%c", lshift(and(_by[3], 3), 6) + _by[4] )
                gsub(/[\x00\xff\xbf\x0f]/,"",_res)
        }
        return _res
}

function _ord(_char, i)
{
        while (++i < 256)
                if (sprintf("%c", i) == _char)
                        return i
}

function _ba64enc(_b64str,_BASE64,_wrap, _ba1,_ba2,_ba3,_ba4,_by1,_by2,_by3,_by4, _res)
{
        while (length(_b64str) > 0){
                ###     find the values  
                _by1 = _ord(substr(_b64str, 1, 1))
                if (length(_b64str) == 1){
                        _by2 = 0
                        _by3 = 0
                }
                if (length(_b64str) == 2){
                        _by2 = _ord(substr(_b64str, 2, 1))
                        _by3 = 0
                }
                if (length(_b64str) >= 3){
                        _by2 = _ord(substr(_b64str, 2, 1))
                        _by3 = _ord(substr(_b64str, 3, 1))
                }

                ###     transform to BASE64 values  
                _ba1 = rshift(_by1, 2)
                _ba2 = lshift(and(_by1, 3), 4) + rshift(and(_by2, 240), 4)
                _ba3 = lshift(and(_by2, 15), 2) + rshift(and(_by3, 192), 6)
                _ba4 = and(_by3, 63)

                ###     transmute values to BASE64 string  
                _res = _res substr(_BASE64, _ba1 + 1, 1)
                _res = _res substr(_BASE64, _ba2 + 1, 1)
                if (length(_b64str) == 1){
                        _res = _res "=="
                        _b64str = ""
                }
                if (length(_b64str) == 2){
                        _res = _res substr(_BASE64, _ba3 + 1, 1)
                        _res = _res "="
                        _b64str = ""
                }
                if (length(_b64str) >= 3){
                        _res = _res substr(_BASE64, _ba3 + 1, 1)
                        _res = _res substr(_BASE64, _ba4 + 1, 1)
                        _b64str = substr(_b64str, 4)
                }
                if( 0==(++_wrap) %18) _res=_res ORS
        }
        return _res
}

BEGIN{_w=0}
{      
        ###     Base64 for filenames given as alternate example, see RFC4648
        ###     _BASE64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"
        _BASE64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
        print wot=="d"?_ba64dec($0,_BASE64,_w):_ba64enc($0,_BASE64,_w)
}

' "$@"
------------
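Loki's decoder relies on gawk's bit functions (`lshift()`, `rshift()`, `and()`), and other awks may not have them. Ed's post also suggests that Steve's awk lacks even `tolower()`. The sketch below is an assumption-laden alternative, not Loki's code. It decodes the standard RFC 4648 alphabet using only arithmetic, so it should run in any POSIX awk. The name `b64_decode` is made up for illustration. The sketch silently skips every character outside the alphabet, including line breaks and `=` padding, does no error checking, and only decodes.

```sh
#!/bin/sh
# b64_decode (hypothetical name): decode base64 text read from stdin,
# using only POSIX awk arithmetic (no gawk bit functions).
b64_decode() {
  # LC_ALL=C makes printf "%c" emit single bytes even in gawk.
  LC_ALL=C awk '
    BEGIN {
      alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
      for (i = 1; i <= 64; i++)
        val[substr(alphabet, i, 1)] = i - 1
    }
    {
      for (i = 1; i <= length($0); i++) {
        c = substr($0, i, 1)
        if (!(c in val))
          continue                 # skip "=", blanks and other noise
        acc = acc * 64 + val[c]    # append 6 more bits
        bits += 6
        if (bits >= 8) {           # a whole byte is available
          bits -= 8
          byte = int(acc / 2 ^ bits)
          printf "%c", byte
          acc -= byte * 2 ^ bits   # keep only the leftover bits
        }
      }
    }
  '
}

# Usage example: prints "hello" (no trailing newline).
printf 'aGVs\nbG8=\n' | b64_decode
```

The accumulator never holds more than 13 bits, so plain floating-point arithmetic is exact. Leftover bits carry over from one input line to the next, which is why the example's line break in the middle of the data does no harm.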

 
Discussion subject changed to "Is awk suitable for this?" by Aharon Robbins
Aharon Robbins  
Newsgroups: comp.lang.awk, comp.os.linux.misc
From: arn...@skeeve.com (Aharon Robbins)
Date: Thu, 20 Sep 2012 16:48:59 +0000 (UTC)
Local: Thurs, Sep 20 2012 12:48 pm
Subject: Re: Is awk suitable for this?
Hello Steve.

The short answer is that indeed awk can do what you want.  The trick in
processing mail headers is that continuation lines are marked by a leading
space or tab *after* the header they are part of. You should also take into
account that header names are case insensitive - "To:", "to:" and "TO:" are
all the same.
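Arnold's two points, continuation lines and case-insensitive names, are the core of Steve's original task. The sketch below illustrates them and is not Arnold's program. It assumes the following:

- standard input holds a single message with Unix line endings;
- the header ends at the first empty line;
- the wanted headers are passed as a comma-separated list to a hypothetical `strip_headers` function.

The sketch uses `tolower()`, so it needs an awk that has it. gawk does, but Ed's post suggests Steve's current awk may not.

```sh
#!/bin/sh
# strip_headers (hypothetical name): keep only the listed headers of one
# message read from stdin; the body is passed through unchanged.
# Usage: strip_headers "from,to,date,subject" < message.txt
strip_headers() {
  awk -v keep="${1:-from,to,cc,date,subject}" '
    BEGIN {
      # Header names are case insensitive, so compare them in lower case.
      n = split(tolower(keep), names, ",")
      for (i = 1; i <= n; i++)
        wanted[names[i]] = 1
    }
    # Print the header collected so far if its name is wanted.
    function emit_header(    name) {
      if (cur != "") {
        name = tolower(substr(cur, 1, index(cur, ":") - 1))
        if (name in wanted)
          print cur
      }
      cur = ""
    }
    in_body { print; next }
    # The first empty line ends the header block.
    /^$/ { emit_header(); in_body = 1; print; next }
    # A leading space or tab marks a continuation line: join it to the
    # current header (the leading blanks become a single space).
    /^[ \t]/ { sub(/^[ \t]+/, " "); cur = cur $0; next }
    { emit_header(); cur = $0 }
    END { if (!in_body) emit_header() }
  '
}

# Usage example: keeps From and the unfolded Subject, drops Received.
printf 'Received: from a\n\tby b\nFrom: x@example.com\nSUBJECT: hi\n there\n\nbody\n' |
  strip_headers "from,subject"
```

Each header is buffered until the next non-continuation line arrives, because only then is it known to be complete. The header name is the text before the first colon.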

You can see a fairly fancy program at http://www.skeeve.com/sendout3.ps.gz
which I wrote a long time ago - part of it processes Unix mailbox files,
including the headers.  (Note that it is quite old, and tailored just for
a personal situation. Also note that all of the example email addresses
in it are invalid.)

Converting quoted printable in awk is also not hard. Basically, the "="
sign either precedes a newline that was added, or is followed by two
hexadecimal digits indicating an encoded character.
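Arnold's description translates almost directly into code. Dave's script earlier in the thread does this inside a digest. The sketch below is a stand-alone variant that decodes all of standard input; it is an illustration, not code anyone posted, and the name `qp_decode` is hypothetical.

It differs from Dave's script in two ways. First, it also accepts lower-case hex digits. RFC 2045 asks encoders for upper case, so this is only leniency. Second, it runs awk under `LC_ALL=C`, because gawk in a UTF-8 locale would otherwise turn values above 127 into multibyte characters.

```sh
#!/bin/sh
# qp_decode (hypothetical name): decode quoted-printable text from stdin.
qp_decode() {
  # LC_ALL=C makes printf "%c" emit single bytes even in gawk.
  LC_ALL=C awk '
    BEGIN {
      # Hex digit values; lower case is accepted too, to be lenient.
      for (i = 0; i < 16; i++) {
        d = substr("0123456789ABCDEF", i + 1, 1)
        hex[d] = i
        hex[tolower(d)] = i
      }
    }
    {
      line = $0
      # "=" at the end of a line (plus any trailing blanks) is a soft
      # line break added by the encoder: join this line with the next.
      soft = sub(/=[ \t]*$/, "", line)
      out = ""
      # "=XX" stands for the byte with hexadecimal value XX.
      while (match(line, /=[0-9A-Fa-f][0-9A-Fa-f]/)) {
        code = hex[substr(line, RSTART + 1, 1)] * 16 + hex[substr(line, RSTART + 2, 1)]
        out = out substr(line, 1, RSTART - 1) sprintf("%c", code)
        line = substr(line, RSTART + RLENGTH)
      }
      printf "%s%s", out line, (soft ? "" : "\n")
    }
  '
}

# Usage example: prints "a=bc" ("=3D" is "=", and "b=" joins onto "c").
printf 'a=3Db=\nc\n' | qp_decode
```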

You do not need to buy any awk book. The gawk documentation is available
on line, free of charge, in a variety of formats at

        http://www.gnu.org/software/gawk/manual/

Although I'm biased, I think this is a great way to learn awk.

As a general plan of action, I recommend reading the gawk doc first,
in order to come up to speed on the language (hopefully in a gentle fashion :-)
and then attempting to write some code to do what you want.  Once you have
that, if it doesn't work, come back here to ask questions.

I also recommend using the latest released version of gawk.

Good luck,

Arnold

In article <f97k58hdtpqnct0ihqvhmobn15qb54s...@4ax.com>,
Steve Hayes  <hayes...@yahoo.com> wrote:

--
Aharon (Arnold) Robbins                         arnold AT skeeve DOT com
P.O. Box 354            Home Phone: +972  8 979-0381
Nof Ayalon              Cell Phone: +972 50 729-7545
D.N. Shimshon 99785     ISRAEL

 
Discussion subject changed to "Deleting unwanted message headers from saved email" by Steve Hayes
Steve Hayes  
Newsgroups: comp.lang.awk
From: Steve Hayes <hayes...@telkomsa.net>
Date: Thu, 20 Sep 2012 20:54:04 +0200
Local: Thurs, Sep 20 2012 2:54 pm
Subject: Re: Deleting unwanted message headers from saved email
On 20 Sep 2012 14:05:12 GMT, Loki Harfagr <l...@thedarkdesign.free.fr.INVALID>
wrote:

Wow, and I was just thinking of playing with something that might discard the
whole message if it had base64 stuff.

But when I've learnt a bit more of the basics I might try it.



 
Discussion subject changed to "Is awk suitable for this?" by Steve Hayes
Steve Hayes  
Newsgroups: comp.lang.awk, comp.os.linux.misc
From: Steve Hayes <hayes...@telkomsa.net>
Date: Thu, 20 Sep 2012 21:00:12 +0200
Local: Thurs, Sep 20 2012 3:00 pm
Subject: Re: Is awk suitable for this?
On Thu, 20 Sep 2012 16:48:59 +0000 (UTC), arn...@skeeve.com (Aharon Robbins)
wrote:

It seems to be capable of doing a lot more than I imagined it could.

>You do not need to buy any awk book. The gawk documentation is available
>on line, free of charge, in a variety of formats at

>    http://www.gnu.org/software/gawk/manual/

>Although I'm biased, I think this a great way to learn awk.

I've got a book on Linux (actually a library book), which has a chapter on
gawk, and I've been re-reading it now that I've seen some samples of code and
what it does. But it's just a bare-bones summary.

>As a general plan of action, I recommend reading the gawk doc first,
>in order to come up to speed on the language (hopefully in a gentle fashion :-)
>and then attempting to write some code to do what you want.  Once you have
>that, if it doesn't work, come back here to ask questions.

>I also recommend using the latest released version of gawk.

I've probably got that in my Linux partition, but most of the things I want to
use it for are in DOS.



 
Aharon Robbins  
Newsgroups: comp.lang.awk, comp.os.linux.misc
From: arn...@skeeve.com (Aharon Robbins)
Date: Thu, 20 Sep 2012 19:05:54 +0000 (UTC)
Local: Thurs, Sep 20 2012 3:05 pm
Subject: Re: Is awk suitable for this?

>>You can see a fairly fancy program at http://www.skeeve.com/sendout3.ps.gz
>>...

>It seems to be capable of doing a lot more than I imagined it could.

Yes. :-)

>I've got a book on Linux (actually a library book), which has a chapter on
>gawk, and I've been re-reading it now that I've seen some samples of code and
>what it does. But it's just a bare-bones summary.

Invest some time in the gawk doc. I think it will return your investment.

>>I also recommend using the latest released version of gawk.

>I've probably got that in my Linux partition, but most of the things I want to
>use it for are in DOS.

See http://sourceforge.net/projects/ezwinports/ for MS-Windows binaries that
will run from a DOS prompt.

If you mean honest-to-goodness actual MS-DOS, then getting a version for
it will be harder. I believe that current sources can be compiled with
DJGPP, but I don't know if that will get you what you want.

You can probably find something on the Internet, but it's likely to be
an older version, and often such versions have bugs...  So, Caveat Emptor. :-)

Good luck,

Arnold


 
Discussion subject changed to "Re (7): Is awk suitable for this?" by Dave Gibson
Dave Gibson  
Newsgroups: comp.lang.awk, comp.os.linux.misc
Followup-To: comp.lang.awk
From: dave.gma+news...@googlemail.com.invalid (Dave Gibson)
Date: Thu, 20 Sep 2012 22:02:27 +0100
Local: Thurs, Sep 20 2012 5:02 pm
Subject: Re: Re (7): Is awk suitable for this?
[ Followup-To: set to comp.lang.awk ]

bufpos is global; discard and n are local to that function (awk has no
local-variable declarations, so locals are written as extra parameters).

>> NR == FNR {
>>   delete_list[++delmax] = $0  \\ 1stReadFile -> delete_list: ARRAY
>>   next
>> }
>>                                  \\ AFTER 1stReadFile DONE & 2ndReadFile
>> $0 ~ delete_list[bufpos + 1] {  \\ IF CurrentLine ~

The ~ is awk's match operator.

Maybe think of it as:

  IF RegexCompare(CurrentLine, delete_list[bufpos + 1]) THEN

>>   buffer[++bufpos] = $0
>>   if (bufpos >= delmax)
>>     flush_buffer(blocks_seen++)
>>   next
>> }

>> bufpos {
>>   flush_buffer(0)
>> }

That's a bug.  It's necessary to check whether $0 matches delete_list[1]
(and restart buffering if it does) after flushing the buffer.

>> { print }

>> END {
>>   flush_buffer(0)
>> }
>> ----script ends on previous line
> The test conclusions, so far, are that:
>   if chars "(", "]", "[" are in the DeleteFile: H,
>   this gives problems.

They are regular expression metacharacters with special meaning to
awk's match operator.
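This small demonstration of Dave's point uses two lines that caused trouble in Chris's test log. Used as a regular expression, each line fails to match itself. Compared as a plain string with `==`, each line matches. The wrapper name `compare_match` is only for illustration.

```sh
#!/bin/sh
# compare_match (hypothetical name): show how two lines from Chris's
# delete file behave under awk's regex match (~) and string equality (==).
compare_match() {
  awk 'BEGIN {
    n = split("Name (required)|   [44]Gravatar", lines, "|")
    for (i = 1; i <= n; i++) {
      s = lines[i]
      # s ~ s uses s as a regular expression: "(...)" is a group and
      # "[44]" is a bracket expression matching a single "4".
      printf "%s: ~ %s, == %s\n", s, (s ~ s ? "match" : "no match"), (s == s ? "match" : "no match")
    }
  }'
}

compare_match
```

As a regex, "Name (required)" needs the text "Name required" with no parentheses. Likewise, "   [44]Gravatar" needs three spaces, one "4" and then "Gravatar". Neither line contains that text, which explains why those blocks were never deleted.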

Here's the fixed version of the script:

----script begins on next line
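#! /usr/bin/awk -f

# Print the buffered lines (or throw them away if discard is non-zero),
# then empty the buffer.  discard and n are locals; bufpos is global.
function flush_buffer(discard,     n) {
  if (!discard)
    for (n = 1; n <= bufpos; n++)
      print buffer[n]
  bufpos = 0
}

# If s matches the next pattern in the sequence, buffer it and return 1.
# A complete sequence is printed the first time (blocks_seen is 0)
# and discarded on every later occurrence.
function try_seq(s) {
  if (s ~ delete_list[bufpos + 1]) {
    buffer[++bufpos] = s
    if (bufpos >= delmax)
      flush_buffer(blocks_seen++)
    return 1
  }
  return 0
}

# First file: load each line into the array of regular expressions.
NR == FNR {
  delete_list[++delmax] = $0
  next
}

# Second file: the line continues (or starts) a match sequence.
try_seq($0) { next }

# The sequence broke: print the partial match, then check whether the
# current line starts a new sequence.
bufpos {
  flush_buffer(0)
  if (try_seq($0))
    next
}

{ print }

# Print any partial match left over at end of input.
END {
  flush_buffer(0)
}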

----script ends on previous line
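The sketch below runs the quoted (pre-fix) rules on a small hypothetical input to show the bug described earlier. After the complete block a 1, b 1, c 1, the partial match a 2 is broken by a 3. The old rules print a 3 instead of re-testing it against delete_list[1], so the repeated block a 3, b 3, c 3 slips through. Given the same input, the fixed script prints only a 1, b 1, c 1 and a 2. The sketch assumes a POSIX shell with mktemp -d.

```sh
#!/bin/sh
# Hypothetical demo of the bug in the quoted (pre-fix) version: a line
# that breaks a partial match is printed without being re-tested
# against delete_list[1].
dir=$(mktemp -d)
printf '%s\n' a b '[cz]' > "$dir/del"
printf '%s\n' 'a 1' 'b 1' 'c 1' 'a 2' 'a 3' 'b 3' 'c 3' > "$dir/in"

out=$(awk '
function flush_buffer(discard,    n) {
  if (!discard)
    for (n = 1; n <= bufpos; n++)
      print buffer[n]
  bufpos = 0
}
NR == FNR { delete_list[++delmax] = $0; next }
$0 ~ delete_list[bufpos + 1] {
  buffer[++bufpos] = $0
  if (bufpos >= delmax)
    flush_buffer(blocks_seen++)
  next
}
bufpos { flush_buffer(0) }   # BUG: "a 3" is never re-tested here
{ print }
END { flush_buffer(0) }
' "$dir/del" "$dir/in")
rm -rf "$dir"
printf '%s\n' "$out"   # all seven lines are printed
```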

The script works by loading the first file's contents into an
array.  The array is a sequence of regular expressions.

The second file is scanned for sequences of lines in which each
line matches the corresponding entry in the array of regular
expressions.

When a complete sequence of matches is made, the matched lines are
discarded if they are not the first occurrence of a match-sequence.

Input file 1 (FileDelete2) contains three lines:
a
b
[cz]

Input file 2 (FileIn2) contains 17 lines:
1 : nothing
2 : a           MATCHES THE FIRST PATTERN IN THE SEQUENCE
3 : b           MATCHES THE SECOND PATTERN IN THE SEQUENCE
4 : k
5 : a 2         SEQUENCE BEGINS
6 : b 2
7 : c 1         FULL SEQUENCE MATCHES (FIRST TIME: 5,6,7 PRINTED)
8 : a 3         SEQUENCE BEGINS
9 : a 4         SEQUENCE FAILS, LINE 8 PRINTED, NEW SEQUENCE BEGINS
10: b 3
11: c 2         FULL SEQUENCE MATCHES (NOT FIRST TIME: 9,10,11 OMITTED)
12: NO MATCH
13: c 3         OUT OF SEQUENCE, NO MATCH
14: a 5         FIRST IN SEQUENCE
15: b 4         SECOND IN SEQUENCE
16: z 1         THIRD IN SEQUENCE (14, 15, 16 DROPPED)
17: example     SEQUENCE BEGINS, FAILS DUE TO END-OF-INPUT, 17 PRINTED

The command

  awk -f the_above_script FileDelete2 FileIn2

will print lines 1 to 8, 12, 13 and 17.
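To follow the decisions step by step, here is a hypothetical variant of the fixed script. It adds a trace() helper that writes each buffering decision to standard error and leaves standard output unchanged. The two input files are a made-up miniature of the example above. The sketch assumes a shell with mktemp -d and an awk that accepts the file name /dev/stderr (gawk and mawk do).

```sh
#!/bin/sh
# Hypothetical trace harness: same rules as the fixed script, plus
# TRACE lines on stderr showing each buffering decision.
dir=$(mktemp -d)
printf '%s\n' a b '[cz]' > "$dir/del"
printf '%s\n' 'a 1' 'b 1' 'c 1' 'k' 'a 2' 'b 2' 'z 2' > "$dir/in"

cat > "$dir/trace.awk" <<'EOF'
# trace() is a hypothetical helper; it does not touch stdout
function trace(msg) { print "TRACE " FNR ": " msg > "/dev/stderr" }
function flush_buffer(discard,    n) {
  trace((discard ? "discard " : "print ") bufpos " buffered line(s)")
  if (!discard)
    for (n = 1; n <= bufpos; n++)
      print buffer[n]
  bufpos = 0
}
function try_seq(s) {
  if (s ~ delete_list[bufpos + 1]) {
    buffer[++bufpos] = s
    trace("matched pattern " bufpos " (" delete_list[bufpos] ")")
    if (bufpos >= delmax)
      flush_buffer(blocks_seen++)
    return 1
  }
  return 0
}
NR == FNR { delete_list[++delmax] = $0; next }
try_seq($0) { next }
bufpos { flush_buffer(0); if (try_seq($0)) next }
{ print }
END { flush_buffer(0) }
EOF

out=$(awk -f "$dir/trace.awk" "$dir/del" "$dir/in")
rm -rf "$dir"
printf '%s\n' "$out"   # stdout: a 1, b 1, c 1, k
```

The lines a 2, b 2, z 2 form the second complete match, so they are discarded. Standard error carries lines such as "TRACE 3: matched pattern 3 ([cz])".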


 
Discussion subject changed to "LP sample - was: Is awk suitable for this?" by Manuel Collado
Manuel Collado  
Newsgroups: comp.lang.awk, comp.os.linux.misc
From: Manuel Collado <m.coll...@domain.invalid>
Date: Fri, 21 Sep 2012 10:47:56 +0200
Local: Fri, Sep 21 2012 4:47 am
Subject: [OT] LP sample - was: Is awk suitable for this?
On 20/09/2012 18:48, Aharon Robbins wrote:

> ...
> You can see a fairly fancy program at http://www.skeeve.com/sendout3.ps.gz
> which I wrote a long time ago ...

It seems that this file is a woven noweb Literate Programming document.
Is the noweb source code also available? I'm still interested in LP,
and there are very few real LP examples on the net.

Thanks,
--
Manuel Collado - http://lml.ls.fi.upm.es/~mcollado


 
Discussion subject changed to "Re (7): Is awk suitable for this?" by f...@informatik.uni-bremen.de
f...@informatik.uni-bremen.de  
Newsgroups: comp.lang.awk, comp.os.linux.misc
From: f...@informatik.uni-bremen.de
Date: Fri, 28 Sep 2012 19:23:46 +0000 (UTC)
Local: Fri, Sep 28 2012 3:23 pm
Subject: Re: Re (7): Is awk suitable for this?
In article <3scsi9xcs3....@perseus.wenlock-data.co.uk>, dave.gma+news...@googlemail.com.invalid (Dave Gibson) wrote:

Someone contaminated my thread.

Let's move this to:
Newsgroups: comp.lang.awk,comp.os.linux.misc
Subject: awk: DeleteRepeatingTextBlocks

Let's try to add-value for the *nix community by revealing
methods which others can modify and use for their problems.

===========

> >>       print buffer[n]
> >>   bufpos = 0  \\ local var
> >> }

> bufpos is global, discard and n are only visible within that function.

-> man awk | grep bufpos == <empty>
So 'bufpos' is not a reserved-word [mentioned in man].
So how does awk's syntax make it global, whereas
'discard' and 'n' are local?  I see 'bufpos' further in the code.
=================

> The script works by loading the first file's contents into an
> array.  The array is a sequence of regular expressions.

"loading the first file's contents into an array."
is an action intended to help achieve a HIGHER goal.
It's better to state the higher goal FIRST.

It's called top-down-design.
The implementation details, which must be bottom-up,
are best not explained until the top-down-design is
known.  Here's my STRUCTURED view:

SpeedUp knowledge absorption from http-fetched text
   Delete noise/distracting repeated garbage
     Identify garbage
       Must be done by human intelligence
        use a standard editor -- while reading/studying the text
     Automate the removal of further garbage-repeats
       Search the InFile for further matches of human-identified-trash

==> PS. I'm writing this WHILE I'm trying to decode your explanation.
The decomposition-chain is: Delete needs Match needs Regex.

You are going to compare the DeleteFile with the InFile.
To handle the regex requirement, you are building
"an array of regular expressions".
Apparently to <match the array with InFile parts> ?

My test results for your new script are dumb, since I have no
intermediate output traces.
See Subject: awk: DeleteRepeatingTextBlocks

Thanks,

== Chris Glur.


 
Kaz Kylheku  
Newsgroups: comp.lang.awk, comp.os.linux.misc
From: Kaz Kylheku <k...@kylheku.com>
Date: Fri, 28 Sep 2012 23:14:46 +0000 (UTC)
Local: Fri, Sep 28 2012 7:14 pm
Subject: Re: Re (7): Is awk suitable for this?
On 2012-09-28, f...@informatik.uni-bremen.de <f...@informatik.uni-bremen.de> wrote:

>      Identify garbage

Easy: pretty much everything posted by the incompetent originator of this
retarded thread.

 
Keith Keller  
Newsgroups: comp.lang.awk, comp.os.linux.misc
Followup-To: comp.os.linux.misc
From: Keith Keller <kkeller-use...@wombat.san-francisco.ca.us>
Date: Fri, 28 Sep 2012 19:12:52 -0700
Local: Fri, Sep 28 2012 10:12 pm
Subject: Re: Re (7): Is awk suitable for this?
["Followup-To:" header set to comp.os.linux.misc.]

On 2012-09-28, f...@informatik.uni-bremen.de <f...@informatik.uni-bremen.de> wrote:

> Let's try to add-value for the *nix community by revealing
> methods which others can modify and use for their problems.

I think the best way to add value for the *nix community is for you to
stop asking these poorly phrased questions, and for the rest of us to
stop answering them.

--keith

--
kkeller-use...@wombat.san-francisco.ca.us
(try just my userid to email me)
AOLSFAQ=http://www.therockgarden.ca/aolsfaq.txt
see X- headers for PGP signature information


 
Chick Tower  
Newsgroups: comp.lang.awk, comp.os.linux.misc
From: Chick Tower <c.to...@deadspam.com>
Date: Sat, 29 Sep 2012 19:41:41 +0000 (UTC)
Local: Sat, Sep 29 2012 3:41 pm
Subject: Re: Re (7): Is awk suitable for this?
On 2012-09-28, Kaz Kylheku <k...@kylheku.com> wrote:

> On 2012-09-28, f...@informatik.uni-bremen.de <f...@informatik.uni-bremen.de> wrote:
>>      Identify garbage

> Easy: pretty much everything posted by the incompetent originator of this
> retarded thread.

That was him using another pseudonym.
--
                                 Chick Tower

For e-mail:  colm DOT sent DOT towerboy AT xoxy DOT net


 