procmail help! $MATCH grabs newline too

Ronald Fischer

Nov 2, 2003, 9:10:29 AM
In a procmail recipe, I would like to autogenerate a reply and put the
original subject in the subject of my reply mail. This is what I have
so far:

:0
.... various conditions
* ^Subject: *\/.*$
{
REPLYSUBJECT="Your mail was rejected (Re: $MATCH)"
LOG="+++ Rejecting mail ($MATCH)"
:0
|(formail ....)|$SENDMAIL -t
}

To my surprise I found that $MATCH also contained the closing newline character!
The logfile says for example:

+++ Rejecting mail (ORIGINAL SUBJECT
)

and using VERBOSE=yes shows that the assignment to REPLYSUBJECT also contains
a newline in front of the closing parenthesis. Probably as a consequence of
this, I get the error message in the logfile

procmail: Error while writing to "(formail -r -A ... etc.

which is not surprising, because the extra newline causes the
mail to be malformed.

Is procmail broken here, or did I misunderstand something?

I'm running procmail v3.11pre7 under SUSE Linux 6.2

Ronald

DWT

Nov 2, 2003, 11:04:36 AM
ron...@eml.cc (Ronald Fischer) wrote in
<219750c.0311...@posting.google.com> that he has this code:

* ^Subject: *\/.*$

yet was surprised to find the newline included in $MATCH.

In procmail regexps, ^ and $ match the newline character, not the end-of-
the-line position as in other utilities that use regular expressions.

Leave out the dollar sign. To the right of "\/" procmail matches greedily,
so you'll get the rest of the line with just ".*" and no dollar sign.

Also, consider using ".+" rather than ".*" there so that you don't get a
match if the Subject: header is empty. Actually, this is the best way to do
it (the first brackets enclose space and tab, and the second pair enclose
caret, space, tab), so that your extraction begins with the first non-blank
character and happens only if there is one:

* ^Subject:[ ]*\/[^ ].*
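Dropped into the original recipe, that condition might look like the sketch below. The LOGFILE path is an assumption for illustration, and the formail/sendmail action is left out because its arguments were elided in the original post. As written, the block only sets variables and logs.

```
# Sketch only: the LOGFILE path is an assumption, not from the original post.
LOGFILE=$HOME/procmail.log

# The first brackets hold a space and a tab; the second hold caret, space, tab.
:0
* ^Subject:[ 	]*\/[^ 	].*
{
  # No $ (or ^) follows the \/ token, so MATCH stops before the newline.
  REPLYSUBJECT="Your mail was rejected (Re: $MATCH)"

  # LOG adds no newline of its own, so one is embedded on purpose.
  LOG="+++ Rejecting mail ($MATCH)
"
}
```

With this condition the log entry should come out on a single line, with the closing parenthesis right after the subject text.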

| Is procmail broken here, or did I misunderstand something?

The latter.

--
David W. Tamkin

The Reply-To: address expires at midnight US Central Time on 09Nov2003.

Ronald Fischer

Nov 2, 2003, 12:12:02 PM
Thank you very much for your explanation. Looks as if I am too "perlish"
with my approach to regexps....

Ronald
--
To reduce spam in my inbox, the address given in the Reply-To: header is
not guaranteed to live longer than 1 month after the article was
posted. My permanent address is (after deleting the XXX):
Ronald Otto Valentin Fischer <rov...@operamail.com>

DWT

Nov 2, 2003, 1:39:52 PM
ro...@feriasonline.pt wrote in <m2ekwqz...@karfiol.glp.de>:

| Thank you very much for your explanation. Looks as if I am too "perlish"
| with my approach to regexps....

You're welcome, but I think most people would feel that procmail's approach
isn't perlish enough.

Nancy McGough

Nov 3, 2003, 6:53:12 AM
On 2 Nov 2003 DWT (nobody@[127.0.0.1]) wrote:
>
> In procmail regexps, ^ and $ match the newline character, not the end-of-
> the-line position as in other utilities that use regular expressions.

Thanks for this info David -- I didn't realize this about
Procmail and I just updated my section about regular expressions,
which is here

<http://www.ii.com/internet/robots/procmail/qs/#RE>

so it now discusses this.

I have a question: Does this mean that ^ and $ are equivalent in
Procmail regular expressions? I.e., can they be used
interchangeably?

Thanks,
Nancy

--
Nancy McGough
Infinite Ink ~ <http://www.ii.com>
Deflexion & Reflexion ~ <http://deflexion.com>

Sven Guckes

Nov 3, 2003, 8:00:54 AM
* Nancy McGough <nm-reverse-...@ii.deflexion.com>:

> On 2 Nov 2003 DWT (nobody@[127.0.0.1]) wrote:
>> In procmail regexps, ^ and $ match the newline character, not the end-of-
>> the-line position as in other utilities that use regular expressions.
>
> Does this mean that ^ and $ are equivalent in Procmail regular
> expressions? I.e., can they be used interchangeably?

no - definitely not.
'^' is still BOL and
'$' is for EOL.

Sven

David W. Tamkin

Nov 3, 2003, 9:33:22 AM
When Nancy McGough <nm-reverse-...@ii.deflexion.com> asked:

M> Does this mean that ^ and $ are equivalent in Procmail regular
M> expressions? I.e., can they be used interchangeably?

Sven Guckes <usenet...@guckes.net> replied incorrectly in
<2003-11-0...@guckes.net>:

G> no - definitely not.
G> '^' is still BOL and
G> '$' is for EOL.

Sorry, Sven; while that's true for most regexp engines -- procmail might be
the only exception -- it is not for procmail. To match a real or putative
newline, even in the middle of a pattern, procmail will happily use either
a caret or a dollar sign.

Both of those punctuation marks have additional uses in procmail's regexp
engine, but to match a newline, either works.
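
A small sketch of what "either works" means in the middle of a pattern. X-First and X-Second are hypothetical header names chosen for illustration, and the recipes only log. Both conditions ask for an "X-First: yes" line directly followed by an "X-Second: yes" line:

```
# Dollar sign standing for the newline between the two header lines.
:0
* ^X-First: yes$X-Second: yes
{
  LOG="matched with a dollar sign in the middle
"
}

# Caret standing for the same newline; this should match the same messages.
:0
* ^X-First: yes^X-Second: yes
{
  LOG="matched with a caret in the middle
"
}
```

Neither condition begins with the $ modifier, so the "$X" in the first one is not treated as a variable reference.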

Sven Guckes

Nov 3, 2003, 3:16:41 PM
* David W. Tamkin <nobody@[>:

> When Nancy McGough <nm-reverse-...@ii.deflexion.com> asked:
> M> Does this mean that ^ and $ are equivalent in Procmail regular
> M> expressions? I.e., can they be used interchangeably?
>
> Sven Guckes <usenet...@guckes.net> replied incorrectly:

> G> no - definitely not.
> G> '^' is still BOL and
> G> '$' is for EOL.
>
> Sorry, Sven; while that's true for most regexp engines -- procmail
> might be the only exception -- it is not for procmail. To match
> a real or putative newline, even in the middle of a pattern,
> procmail will happily use either a caret or a dollar sign.

so "^^" is the same as "$$" and "^$" or "$^"?

> Both of those punctuation marks have additional uses in
> procmail's regexp engine, but to match a newline, either works.

what about matches with "^TOfoo" then?

Sven

David W. Tamkin

Nov 3, 2003, 4:10:14 PM
As I said, Sven,

| > Both of those punctuation marks have additional uses in
| > procmail's regexp engine,

... which are not the same as the other's, so there are things a caret can do
that a dollar sign cannot and vice versa ...

| > but to match a newline, either works.

And that's still true. Your original statement that ^ matches only at BOL
and $ only at EOL is still incorrect for procmail.

Note that ^^ matches only a putative newline, not a real one.

Timo Salmi

Nov 4, 2003, 12:30:56 AM
Nancy McGough <nm-reverse-...@ii.deflexion.com> wrote:
> On 2 Nov 2003 DWT (nobody@[127.0.0.1]) wrote:
> > In procmail regexps, ^ and $ match the newline character, not the end-of-
> > the-line position as in other utilities that use regular expressions.

I beg your pardon? There are countless recipes where ^ matches the
start of the line:

* ^Subject: whatever

As for $ it has a special meaning, at least at the start of the
line. It means evaluate the variables on the line (or something like
that).
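
A rough sketch of that leading-$ usage, for reference. PATTERN is a hypothetical variable name and the recipe only logs:

```
# PATTERN is a hypothetical variable name, used only for this example.
PATTERN="whatever"

# The leading $ makes procmail expand variables in the rest of the
# condition first, so the regexp it finally tests is: ^Subject: whatever
:0
* $ ^Subject: $PATTERN
{
  LOG="Subject matched the expanded pattern
"
}
```

Here the dollar sign is a condition modifier at the start of the line, which is a different job from a dollar sign inside the regexp itself.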

> Procmail and I just updated my section about regular expressions,
> which is here
> <http://www.ii.com/internet/robots/procmail/qs/#RE>
> so it now discusses this.

You might wish to do some further testing to be sure, e.g. as per
http://www.uwasa.fi/~ts/info/proctips.html#testbench

Test, e.g., these two under varying circumstances (e.g. as the very
first line of the test message)
* ^Subject: whatever
* Subject: whatever

Of course there are procmail-specific cases like ^^whatever^^

All the best, Timo

--
Prof. Timo Salmi ftp & http://garbo.uwasa.fi/ archives 193.166.120.5
Department of Accounting and Business Finance ; University of Vaasa
mailto:t...@uwasa.fi <http://www.uwasa.fi/~ts/> ; FIN-65101, Finland
Timo's FAQ materials at http://www.uwasa.fi/~ts/http/tsfaq.html

DWT

Nov 4, 2003, 11:30:52 AM
t...@UWasa.Fi (Timo Salmi) wrote in <bo7dig$p...@poiju.uwasa.fi>:

| > On 2 Nov 2003 DWT (nobody@[127.0.0.1]) wrote:
| > > In procmail regexps, ^ and $ match the newline character, not the end-of-
| > > the-line position as in other utilities that use regular expressions.

| I beg your pardon? There are countless recipes where ^ matches the
| start of the line:

| * ^Subject: whatever

It looks that way, Timo, but procmail's regexp engine is different from
the others. If you used the same expression in grep, egrep, vi, perl, or
what-have-you, the caret would be matching the position at the beginning of
the line, but in procmail it needs an actual character. That caret is
actually matching the newline that precedes the line that starts
"Subject: whatever".

In order to make ^ and $ match the beginning and end of the search area,
procmail imagines an additional newline at the very beginning and another
at the very end. We call those the putative newlines; a putative newline
can be matched by a caret, a dollar sign, or two carets in the regexp.
The double caret is a special procmailism that will match only a putative
newline and not a real one.
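
For anyone who wants to try this on a test message, here is a small sketch of the difference between a single and a double caret. The log texts are made up for illustration, and the recipes only log; they deliver nothing.

```
# Single caret: matches the real newline before any header line, or the
# putative newline before the first one, so Subject: may sit anywhere.
:0
* ^Subject: whatever
{
  LOG="Subject: whatever found somewhere in the header
"
}

# Double caret: matches only the putative newline at the very start, so this
# fires only when Subject: whatever begins the search area.
:0
* ^^Subject: whatever
{
  LOG="Subject: whatever is the very first header line
"
}
```

Moving the Subject: line up and down in the test message's header should change which of the two log entries appear.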

--
David W. Tamkin

The Reply-To: address expires at midnight US Central Time on 11Nov2003.
