[40tude] Edit -> Find regular expression?

mike

Jan 18, 2021, 11:33:56 PM
When searching for a character to replace in 40Tude Dialog, is there a way
to search for more than one disjoint character at a time?

As an example I might want to replace all starting left side curly
doublequotes and all ending right side curly doublequotes with straight
doublequotes.

Can the 40Tude Dialog "Edit -> Find" be made to see both types of opening
and ending doublequotes at the same time to replace both types with
straight quotes in a single Find command?

VanguardLH

Jan 19, 2021, 3:41:37 AM
Why do you think Dialog's Edit -> Find function has a replace operation?
It just finds. It does not replace. To replace means another field
would have to be included to show with just what to replace the found
substring. Edit -> Find has only one input field: "Text to find".
There is no "[Text to] replace with" field.

Or do you mean you'll use the Find function, and then manually edit the
found string? You could do a regex search using:

“.*”


The dot (.) matches any character, and the * means zero or more of the
preceding item, so .* matches any run of characters. The above would find:

“”
“X”
“abc def. Right you are!”
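
What that pattern matches can be checked outside Dialog. This is only a
sketch in Python, not Dialog's own engine: Python's re module spells the
code points \u201C and \u201D rather than PCRE's \x{201C} and \x{201D}, and
the sample strings are made up for illustration. It also shows that .* is
greedy, so with two quoted phrases on one line it runs to the last closing
quote.

```python
import re

# Python spelling of the pattern; PCRE would write \x{201C}.*\x{201D}.
PATTERN = re.compile("\u201C.*\u201D")

# The three examples listed above, written with explicit code points.
samples = [
    "\u201C\u201D",
    "\u201CX\u201D",
    "\u201Cabc def. Right you are!\u201D",
]
matched = [bool(PATTERN.fullmatch(s)) for s in samples]

# Hypothetical line with two quoted phrases.
line = "\u201Cone\u201D and \u201Ctwo\u201D"
greedy = PATTERN.search(line).group(0)              # spans both phrases
lazy = re.search("\u201C.*?\u201D", line).group(0)  # stops at first closing quote
```

If only one phrase at a time is wanted, the lazy form .*? is the usual fix.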

However, regex rules are plain text, so you won't be able to add the
curly quotes to a rule. You'll need to use the encoded or numeric value
for the characters. My recollection is regex in Dialog uses the PCRE
variant.

https://www.regular-expressions.info/unicode.html

You would have to replace the curly quotes with their Unicode code
points. According to the above article, PCRE does not support \uFFFF to
specify the hexadecimal number for a Unicode character. Instead you use
\x{FFFF} (include the curly braces).

201C = left curly quote character
201D = right curly quote character

So, my guess is that the regex to find (not replace) a left curly quote,
followed by zero or more characters, followed by a right curly quote,
probably looks like:

\x{201C}.*\x{201D}

Since all the headers are ASCII characters (I think there is some
encoding prefix identifier for the Subject header, but the string itself
is all ASCII), you must be trying to find the curly quoted strings in
the body of message. Make sure when you use Edit -> Find that you
select the "Article body pane" tab in the Find dialog.

I don't know of any posts that use the double curly quote characters,
but as a test I did a Find on the straight double quote characters, like
the "Edit -> Find" string in your post. I opened the Find dialog,
selected the "Article body pane" tab, and searched on:

\x{0022}.*\x{0022}

0022 is the hex Unicode value of the double straight quote ("). It
found your string in your post. I could've used " to make it easier,
but I wanted to test using the Unicode encoded format to specify the
straight double quote character.
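
On the original question of matching two disjoint characters in one search:
in regex terms that is a character class. A minimal sketch in Python
follows; the function name is made up for illustration, and the PCRE
spelling [\x{201C}\x{201D}] is an assumption that has not been tested in
Dialog's Find dialog.

```python
import re

# One character class covers both the opening and the closing curly quote.
# Python writes the code points as \u201C / \u201D.
CURLY_DOUBLE = re.compile("[\u201C\u201D]")

def straighten_double_quotes(text: str) -> str:
    """Replace every curly double quote with a straight one."""
    return CURLY_DOUBLE.sub('"', text)

demo = "\u201CCurly quotes\u201D look better than \"straight quotes\""
result = straighten_double_quotes(demo)
```

Unlike the .* pattern, the class matches one quote character at a time, so
each hit is exactly the character to be overtyped.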

mike

Jan 20, 2021, 7:07:44 PM
VanguardLH wrote:

> Since all the headers are ASCII characters (I think there is some
> encoding prefix identifier for the Subject header, but the string itself
> is all ASCII), you must be trying to find the curly quoted strings in
> the body of message. Make sure when you use Edit -> Find that you
> select the "Article body pane" tab in the Find dialog.

Yes. Body. When I cut and paste from various sources (usually web pages) I
want consistency in the quoting so that every cut and paste uses the
simplest consistent type of straight quotes and single dashes (why they
think we need even bigger dashes is lost on me).

Here's an example of something I might cut and paste into the body
https://www.lifewire.com/typing-quotes-apostrophes-and-primes-1074104
https://typographyforlawyers.com/straight-and-curly-quotes.html
https://usefulangle.com/post/217/html-curly-quotes
https://www.enotes.com/homework-help/what-are-two-quotes-from-curley-in-of-mice-and-men-302038

> Why do you think Dialog's Edit -> Find function has a replace operation?
> It just finds. It does not replace. To replace means another field
> would have to be included to show with just what to replace the found
> substring. Edit -> Find has only one input field: "Text to find".
> There is no "[Text to] replace with" field.
> Or do you mean you'll use the Find function, and then manually edit the
> found string?

My mistake for not being clear that I press control F to find and then I
repeatedly press F3 as many times as needed to find more, and then as it is
finding these things, I replace manually by typing the straight doublequote
(or the straight singlequote) to replace the curly characters (and then move
on using F3 until I get to the end).

> You could do a regex search using:
>
> “.*”
>
> The dot (.) means any character, and the * means zero, or more, of any
> character. The above would find:
>
> “”
> “X”
> “abc def. Right you are!”
> However, regex rules are plain text, so you won't be able to add the
> curly quotes to a rule. You'll need to use the encoded or numeric value
> for the characters. My recollection is regex in Dialog uses the PCRE
> variant.
> https://www.regular-expressions.info/unicode.html
>
> You would have to replace the curly quotes with their Unicode
> equivalents. From the above article, PCRE does not handle \uFFFF to
> specify the hexadecimal number for a Unicode character. Instead you use
> \x{FFFF} (include the curly braces).
>
> 201C = left curly quote character
> 201D = right curly quote character
>
> So, my guess is you would use the following regex to find (not replace)
> any string of zero, or more, characters where the order is left curly,
> followed by zero or more characters, followed by a right curly, and
> probably looks like:
>
> \x{201C}.*\x{201D}

When I typed "control + f" and then pasted that "\x{201C}.*\x{201D}"
(without the quotes), it didn't find the curly quotes.

What's the key sequence to enter those special characters?

> I don't know of any posts that use the double curly quote characters,
> but as a test I did a Find on the straight double quote characters, like
> the "Edit -> Find" string in your post. I opened the Find dialog,
> selected the "Article body pane" tab, and searched on:
>
> \x{0022}.*\x{0022}

I pasted something from here
https://typographyforlawyers.com/straight-and-curly-quotes.html
and then searched with what you suggested. I must be missing a special
character, because I copied and pasted exactly what you wrote above, and it
found only what you wrote above, not what I pasted.

control
f
(then I pasted)\x{0022}.*\x{0022}
(then I pressed)OK

Dialog said - Search term "\x{0022}.*\x{0022}" not found.

> 0022 is the hex Unicode value of the double straight quote ("). It
> found your string in your post. I could've used " to make it easier,
> but I wanted to test using the Unicode encoded format to specify the
> straight double quote character.

Can you just let me know HOW to type the \x{0022} stuff?

I must be missing a key on my keyboard as I pasted it exactly as you wrote
it but it found only itself. I must have missed a critical step.

VanguardLH

Jan 20, 2021, 7:53:15 PM
When you hit Ctrl+F to bring up the Find dialog, and made sure to pick the
"Article body pane" tab, did you also make sure to select the "Regular
expressions" option? Without that option, you would be searching for the
literal ASCII string \x{201C} instead of searching with a regex that
specifies an escaped x followed by the braced numeric value for the
Unicode character.

Ctrl+F (to show Find dialog).
Select "Article body pane" tab.
Select the "Regular expressions" option.
Enter "\x{201C}.*\x{201D}" (sans quotes) in the "Text to find" field.
Pick where to search: selected groups, all groups, selected body.

The search proceeds forward, not from the top. If you are past the
group or message with the doubled quotes, the search finds the *next*
article, if any, that has those characters. The search does not loop
around back to the top after reaching the bottom of the list.
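
The difference the "Regular expressions" option makes can be sketched
outside Dialog. The Python below is an illustration only, with made-up
helper names and sample bodies, and it uses Python's \u201C spelling in
place of PCRE's \x{201C}. A literal search finds the search term only in a
post that quotes the term itself, which would explain a search that "found
only itself".

```python
import re

# Python spelling of the thread's \x{201C}.*\x{201D}, kept as raw text.
TERM = r"\u201C.*\u201D"

# Hypothetical bodies: one with real curly quotes, one that only quotes the term.
pasted_body = "\u201CCurly quotes\u201D look better"
reply_body = "Enter " + TERM + " in the Text to find field"

def literal_find(term: str, text: str) -> bool:
    """Plain substring search: the term is compared character by character."""
    return term in text

def regex_find(term: str, text: str) -> bool:
    """Regex search: escapes in the term are interpreted before matching."""
    return re.search(term, text) is not None

literal_hits = (literal_find(TERM, pasted_body), literal_find(TERM, reply_body))
regex_hits = (regex_find(TERM, pasted_body), regex_find(TERM, reply_body))
```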

mike

Jan 21, 2021, 8:15:05 AM
VanguardLH wrote:

> When you hit Ctrl+F to bring up the Find dialog, and made sure to pick the
> "Article body pane" tab, did you also make sure to select the "Regular
> expressions" option? Without that option, you would be searching on the
> ASCII string of \x{201C} instead of searching by a regex specifying an
> escaped x followed by the braced numeric value for the Unicode char.
>
> Ctrl+F (to show Find dialog).
> Select "Article body pane" tab.
> Select the "Regular expressions" option.
> Enter "\x{201C}.*\x{201D}" (sans quotes) in the "Text to find" field.
> Pick where to search: selected groups, all groups, selected body.
>
> The search proceeds forward, not from the top. If you are past the
> group or message with the doubled quotes, the search finds the *next*
> article, if any, that has those characters. The search does not loop
> around back to the top after reaching the bottom of the list.

Thank you for explaining there are two different "Find" boxes, where below I
will call one of them the "short" Find Box & the other the "long" Find Box.

(1) In a browser I visit this page & copy the Demo sentence to my clipboard
https://www.lifewire.com/typing-quotes-apostrophes-and-primes-1074104
“Curly quotes” look better than "straight quotes"
(2) I then read your message & pressed F to followup to your message
(3) That followup takes up the entire Dialog window (on top of the "panes")
(4) I pasted that Demo sentence into the article body of that followup
(5) I hit control + f to bring up the Dialog "short" Find Box
(6) There is no option for "Regular Expression" in that "short" Find Box.
Text to find
Match case
Whole words
Backward search
(7) I hit Action -> Save and close draft
(8) I go back to the three dialog panes (group, messages, body)
(9) When hitting control F after putting the cursor in the resultant body
pane of the three pane Dialog window I get a "long" Find Box.
Find text in...
[Group list pane][Article header pane][Article body pane]<<<<
Text to find:
Options:
Match case
Whole words
Regular expression <<<<
Reset View first

Scope:
Selected group
All groups
Selected body only <<<<
(10) I select "Article body pane"
(11) I select "Regular expression"
(12) The default is "Selected body only"
(13) I paste this into "Text to find:" \x{201C}.*\x{201D}
(14) That selects the first curlyquoted half of the Demo sentence.
(15) I copy just the opening curly doublequote character into my clipboard.

But since I'm not in an editable window, I can't fix anything using that
"long" Find Box. I can only fix text when using the "short" Find Box.

I go back to my editable window in my Drafts folder, bring up the "short"
Find Box, paste the previously copied opening curly doublequote into it,
and hit "OK".

This finds and selects the opening curly doublequote where I can then type a
straight doublequote to replace it. Since I had put the cursor at the top of
the message, I successively hit F3 to keep finding any more instances, and
then to replace them I just type the straight doublequote.

I do the same for the closing curly doublequote (replacing it with a
straight doublequote by typing it once), which is pretty much exactly what I
was doing all along unfortunately.

This is the result
"Curly quotes" look better than "straight quotes"

mike

Jan 21, 2021, 8:28:49 AM
mike wrote:

> (1) In a browser I visit this page & copy the Demo sentence to my clipboard
> https://www.lifewire.com/typing-quotes-apostrophes-and-primes-1074104

I made a mistake on the URL that I copied the "Demo" sentence from!

The "Demo" sentence came from this web page.
https://usefulangle.com/post/217/html-curly-quotes

I'm not a programmer but do you think a Dialog script might be able to
substitute curly quotes to straight quotes automatically just before sending
the message to the nntp server?

Bernd Rose

Jan 21, 2021, 11:57:04 AM
On Thu, 21st Jan 2021 18:58:47 +0530, mike wrote:

> I'm not a programmer but do you think a Dialog script might be able to
> substitute curly quotes to straight quotes automatically just before sending
> the message to the nntp server?

Shouldn't be too hard to adjust this script:

http://web.archive.org/web/20120127150224/http://dialog.datalist.org/scripts/ScriptreplaceUmlaut.html

/If/ you have other conversion scripts installed that deal with charset
manipulation, you may need a more sophisticated approach. In that case I
suggest you take the boxquote script as a basis:

http://4d.vollmeier.at/scripte/ereignisscripte/onbeforesendingmessage/boxquote.html

HTH.
Bernd

VanguardLH

Jan 21, 2021, 5:00:48 PM
mike <th...@address.is.invalid> wrote:

> But since I'm not in an editable window, I can't fix anything using that
> "long" Find Box. I can only fix text when using the "short" Find Box.

Didn't know you were in compose mode when using the short Find dialog.
Yeah, that doesn't let you specify using regex to let you use Unicode to
find characters in your compose window.

Seems if you're pasting into your compose window, that you could first
paste into something else, like a word processor, where you could
specify the string to find and what to replace it with, copy the result,
and paste that into your compose window.

mike

Jan 21, 2021, 5:44:22 PM
Bernd Rose wrote:

> Shouldn't be too hard to adjust this script:
> http://web.archive.org/web/20120127150224/http://dialog.datalist.org/scripts/ScriptreplaceUmlaut.html

That script looks good since it removes things like the umlaut:
http://web.archive.org/web/20120127150224/http://dialog.datalist.org:80/scripts/ScriptreplaceUmlaut.html

(1) The script at that URL has a minor syntax error (it's missing a single quote):
CHANGE THIS: s:=StringReplace(s,'x,'yy');
TO THIS: s:=StringReplace(s,'x','yy');

(2) I already have an OnBeforeSending script, and since I'm NOT a
programmer, I didn't want to mess with the existing
OnBeforeSending script. So I added the script you found as an
OnBeforeSavingMessage script instead (with the quick syntax fix above,
so that it compiled and saved).

(3) I then tested it using a cut and paste of German language web pages
https://www.studying-in-germany.org/german-umlauts/
https://blogs.transparent.com/german/umlauted-vowels-next-to-ordinary-vowels/
https://howtotypeanything.com/umlaut-letters/

My problem is when I press "Action -> Save & close draft", I expected
this "OnBeforeSavingMessage" script to change the umlaut to ue before
saving the message as a draft - but it didn't change anything.

Do you know a better test to invoke the "OnBeforeSavingMessage" script
than "Action -> Save & close draft"?

What do I need to do to invoke the OnBeforeSavingMessage script below
so that when I save the message, the script removes the special characters?
--
program OnBeforeSavingMessage;

function StringReplace(S, OldPattern, NewPattern: string): string;
var
  SearchStr, Patt, NewStr: string;
  Offset: Integer;
begin
  SearchStr := S;
  Patt := OldPattern;
  NewStr := S;
  Result := '';
  while SearchStr <> '' do
  begin
    Offset := AnsiPos(Patt, SearchStr);
    if Offset = 0 then
    begin
      Result := Result + NewStr;
      Break;
    end;
    Result := Result + Copy(NewStr, 1, Offset - 1) + NewPattern;
    NewStr := Copy(NewStr, Offset + Length(OldPattern), 2147483647);
    SearchStr := Copy(SearchStr, Offset + Length(Patt), 2147483647);
  end;
end;

function OnBeforeSendingMessage(var Message: TStringlist; Servername:
  string; IsEmail: boolean):boolean;
var s:string;
begin
  result:=true;
  s:=message.text;
  s:=StringReplace(s,'ü','ue');
  s:=stringreplace(s,'ö','oe');
  s:=stringreplace(s,'ä','ae');
  s:=stringreplace(s,'Ü','Ue');
  s:=stringreplace(s,'Ö','Oe');
  s:=stringreplace(s,'Ä','Ae');
  s:=stringreplace(s,'ß','ss');
  message.text:=s;
end;

begin
end.

mike

Jan 21, 2021, 6:40:30 PM
VanguardLH wrote:

> Seems if you're pasting into your compose window, that you could first
> paste into something else, like a word processor, where you could
> specify the string to find and what to replace it with, copy the result,
> and paste that into your compose window.

This is a test of OnBeforeSendingMessage with ä, ö, ü

If I can't get the script Bernd suggested to work, then I have to manually
clean out the body as you said, by pasting into my gVIM editor on Windows.
:%s/[control+-q+-147,control+-q+-148]/"/g
That will replace all instances of opening or closing curly doublequotes
with straight doublequotes.
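
The same clean-up can be sketched as a small Python filter, for anyone who
would rather run the pasted text through a script than an editor. This is
an illustration only: the table and function names are made up, and the
choice of characters (double and single curly quotes plus the longer
dashes) is an assumption based on what was asked for earlier in the thread.

```python
# Map typographic punctuation to plain ASCII equivalents.
# The decimal numbers are the Windows-1252 codes for the same characters.
ASCII_PUNCTUATION = str.maketrans({
    "\u201C": '"',  # left double quote (147)
    "\u201D": '"',  # right double quote (148)
    "\u2018": "'",  # left single quote (145)
    "\u2019": "'",  # right single quote (146)
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
})

def to_plain_punctuation(text: str) -> str:
    """Return text with curly quotes and long dashes replaced by ASCII."""
    return text.translate(ASCII_PUNCTUATION)

cleaned = to_plain_punctuation("\u201CIt\u2019s fine\u201D \u2013 really")
```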

But I was hoping the script that Bernd Rose suggested would work. I replaced
my existing OnBeforeSendingMessage script in its entirety with the one from:
http://web.archive.org/web/20120127150224/http://dialog.datalist.org/scripts/ScriptreplaceUmlaut.html

But it doesn't work if you still see the special characters I'm pasting
below from https://www.studying-in-germany.org/german-umlauts/

This is a test of OnBeforeSendingMessage with ä, ö, ü

program OnBeforeSendingMessage;

function StringReplace(S, OldPattern, NewPattern: string): string;
var
  SearchStr, Patt, NewStr: string;
  Offset: Integer;
begin
  SearchStr := S;
  Patt := OldPattern;
  NewStr := S;
  Result := '';
  while SearchStr <> '' do
  begin
    Offset := AnsiPos(Patt, SearchStr);
    if Offset = 0 then
    begin
      Result := Result + NewStr;
      Break;
    end;
    Result := Result + Copy(NewStr, 1, Offset - 1) + NewPattern;
    NewStr := Copy(NewStr, Offset + Length(OldPattern), 2147483647);
    SearchStr := Copy(SearchStr, Offset + Length(Patt), 2147483647);
  end;
end;

function OnBeforeSending(var Message: TStringlist; Servername: string;

mike

Jan 22, 2021, 1:50:19 AM
mike wrote:

> program OnBeforeSavingMessage;
> function StringReplace(S, OldPattern, NewPattern: string): string;
> function OnBeforeSendingMessage(var Message: TStringlist; Servername:

Compilation error:
Failed when compiling [Error] (43:5): Type mismatch

Does anyone know why I get a "type mismatch" on these lines when I use them
as an OnBeforeSavingMessage but I do NOT get the type mismatch error when I
compile the same script as an OnBeforeSendingMessage?
NewStr := Copy(NewStr, Offset + Length(OldPattern), 2147483647);
SearchStr := Copy(SearchStr, Offset + Length(Patt), 2147483647);
The type mismatch is shown in red for "2147483647".

(1) I only get the type mismatch when I compile the script below
as an OnBeforeSavingMessage.
(2) When I compile the script below as an OnBeforeSendingMessage,
I don't get the compilation error.
(3) But, either way, the umlaut testcase doesn't get changed to "ue"
TEST ä ö ü ü ö ä

I'm trying to figure out what the code does so I can figure out why it's not
working, but I don't know what the purpose of the "2147483647" is yet.

Do you?
--
program OnBeforeSavingMessage;

function StringReplace(S, OldPattern, NewPattern: string): string;
var
  SearchStr, Patt, NewStr: string;
  Offset: Integer;
begin
  SearchStr := S;
  Patt := OldPattern;
  NewStr := S;
  Result := '';
  while SearchStr <> '' do
  begin
    Offset := AnsiPos(Patt, SearchStr);
    if Offset = 0 then
    begin
      Result := Result + NewStr;
      Break;
    end;
    Result := Result + Copy(NewStr, 1, Offset - 1) + NewPattern;
    NewStr := Copy(NewStr, Offset + Length(OldPattern), 2147483647);
    SearchStr := Copy(SearchStr, Offset + Length(Patt), 2147483647);
  end;
end;

function OnBeforeSavingMessage(var Message: TStringlist; Servername:
  string; IsEmail: boolean):boolean;
var s:string;
begin
  result:=true;
  s:=message.text;
  s:=StringReplace(s,'ü','ue');
  s:=stringreplace(s,'ö','oe');
  s:=stringreplace(s,'ä','ae');
  s:=stringreplace(s,'Ü','Ue');
  s:=stringreplace(s,'Ö','Oe');
  s:=stringreplace(s,'Ä','Ae');
  s:=stringreplace(s,'ß','ss');
  message.text:=s;
end;

begin
end.
--
program OnBeforeSendingMessage;
--

Bernd Rose

Jan 23, 2021, 3:52:05 AM
On Fri, 22nd Jan 2021 12:20:16 +0530, mike wrote:

Hm. Where to start? In your message Message-ID: <rud3db$ggn$1...@solani.org>
you used a function name that differs from the program name. (Maybe a
copy/paste error?) But this way, the main function will never be called.

Using OnBeforeSavingMessage will /not/ accomplish what you have in
mind, because it only fires when saving /incoming/ messages to the
Dialog database. It does /not/ work when saving drafts. (Drafts are
saved from the editor window as you type and are neither checked nor
altered when saving them without sending.)

You get a compile error for the (seemingly identical) OnBeforeSavingMessage
script (compared to OnBeforeSendingMessage), because the latter program
is written so that it can be canceled, while the former is not. Instead
of using a main function with a return value as in OnBeforeSendingMessage,
you need to use a main /procedure/ with OnBeforeSavingMessage:

procedure OnBeforeSavingMessage(var Message: TStringlist; Servername:string; IsEmail: boolean);

And this procedure must /not/ have a line like:
result:=true;

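The number 2147483647 (2^31 - 1, the largest 32-bit signed integer) is
passed as the character count to the Copy function. Copy stops at the end
of the string, so in effect it means "everything from the given offset to
the end of the string".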
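
To make the role of that number concrete, here is a rough Python
transcription of the StringReplace helper from the script. It is a sketch
for reading along, not Dialog code: pascal_copy is a made-up stand-in for
Pascal's Copy (1-based index, count clipped at the end of the string), and
the guard for an empty search pattern is an addition of this sketch.

```python
MAX_INT = 2147483647  # 2**31 - 1, the count the script passes to Copy

def pascal_copy(s: str, index: int, count: int) -> str:
    """Stand-in for Pascal Copy: 1-based index, count clipped at end of string."""
    start = max(index, 1) - 1
    return s[start:start + count]

def string_replace(s: str, old: str, new: str) -> str:
    """Replace every occurrence of old in s with new, as the script's loop does."""
    if old == "":
        return s  # guard added here: an empty pattern would never advance
    result = ""
    rest = s
    while rest != "":
        offset = rest.find(old) + 1  # 1-based like AnsiPos; 0 means not found
        if offset == 0:
            result += rest
            break
        result += pascal_copy(rest, 1, offset - 1) + new
        # "Everything after the match": the huge count just means "to the end".
        rest = pascal_copy(rest, offset + len(old), MAX_INT)
    return result

example = string_replace("f\u00FCr \u00FCber", "\u00FC", "ue")
```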


Back to your original question:

What has to be replaced depends on the encoding of your message. The
replacement occurs right before sending. Therefore it is done on the raw
outgoing message (including all headers!!).

To replace the German umlaut "ä" in several encodings, you'd need to
adjust the search/replace strings with sth. like:

s:=stringreplace(s,'ä','ae');
s:=stringreplace(s,'=E4','ae');
s:=StringReplace(s,'=C3=A4','ae');
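
Where the =E4 and =C3=A4 forms come from can be checked with Python's
standard quopri module. This is a sketch outside Dialog; the helper name is
made up, and which charset and transfer encoding Dialog actually uses for a
given message is not something this shows. The curly quotes from the
original question are included for comparison.

```python
import quopri

def qp(text: str, charset: str) -> str:
    """Quoted-printable form of text after encoding it in the given charset."""
    return quopri.encodestring(text.encode(charset)).decode("ascii")

# a-umlaut (U+00E4) in two common charsets
umlaut_forms = {cs: qp("\u00E4", cs) for cs in ("iso-8859-1", "utf-8")}

# opening and closing curly double quotes in UTF-8 and Windows-1252
quote_forms = {
    "utf-8": (qp("\u201C", "utf-8"), qp("\u201D", "utf-8")),
    "cp1252": (qp("\u201C", "cp1252"), qp("\u201D", "cp1252")),
}
```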

Be aware, that this kind of alteration on the raw message must only be
used for characters or strings, which may /not/ be found inside the
header texts!! Else, anything can happen: From posting invalid messages
to stray messages reaching the wrong recipient!
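
One way to reduce that risk is to touch only the body. The sketch below
shows the idea in Python rather than in Dialog's Pascal scripting: the
function name is made up, and it relies on the standard article layout in
which the headers end at the first empty line. How to do the equivalent
split on the TStringlist that Dialog passes to the script is not covered
here.

```python
def replace_in_body(raw: str, old: str, new: str) -> str:
    """Replace old with new only after the first empty line (the body)."""
    # Find the earliest header/body separator, for CRLF or LF line endings.
    candidates = [(raw.find(sep), sep) for sep in ("\r\n\r\n", "\n\n")]
    candidates = [(pos, sep) for pos, sep in candidates if pos != -1]
    if not candidates:
        return raw  # no empty line found: leave the message untouched
    pos, sep = min(candidates)
    split = pos + len(sep)
    return raw[:split] + raw[split:].replace(old, new)

# Hypothetical raw message; the header happens to contain the same =E4 sequence.
raw_message = "Subject: caf=E4\r\nFrom: a@example.invalid\r\n\r\nbody =E4 text\r\n"
fixed = replace_in_body(raw_message, "=E4", "ae")
```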

On another thought: Maybe you can use the LastMessageCheck script to see,
how the characters are encoded in your outgoing messages. This script
opens a preview of the raw message right before sending (if it is included
correctly). In case of an error, you can cancel the sending process. Be
aware, though, that everything written in this message will be lost on
cancel. So you need to start typing from scratch. (Or keep the whole
message in clipboard right before sending.) The (German) download site
for the LastMessageCheck script is:

http://4d.vollmeier.at/scripte/ereignisscripte/onbeforesendingmessage/lastmessagecheck.html

Hopefully, I have addressed all matters...
Bernd