Tcl way to escape search strings in regexp

heinrichmartin

Aug 18, 2015, 10:32:19 AM
I came up with the following proc to escape search strings for regexp quite some time ago:

# escapes all regex special characters. This procedure is useful when
# literally matching against otherwise interpreted characters of a regular
# expression.
# @param from the string to be escaped
# @return a string which represents <code>from</code> in a regular expression
# @see http://tcl.tk/man/tcl8.5/TclCmd/re_syntax.htm
proc regexp_escape {from} {
    set ret ""
    foreach char [split $from {}] {
        switch -glob -- $char {
            "[A-Za-z0-9\r\n_!/ @]" -
            - {
                append ret $char
            }
            [().*?{}|] {
                append ret "\\$char"
            }
            default {
                append ret [format {(?:\x%x)} [scan $char %c]]
            }
        }
    }
    return $ret
}

Is there a built-in way to achieve the same?

PS: http://core.tcl.tk/tcllib/tktview/06bf459d36c909846d3577d0cfd7fec261ab8c1e

Christian Gollwitzer

Aug 18, 2015, 10:41:31 AM
On 18.08.15 at 16:32, heinrichmartin wrote:
> I came up with the following proc to escape search strings for regexp quite some time ago:
> [...code...]
> Is there a built-in way to achieve the same?

I'm not aware of a built-in method, but I think you are working too hard. Is it
not sufficient to just escape the metacharacters via string map? I.e.

set esc [string map {. \. ? \? ....} $from]

where the braces hold the mapping for each metacharacter?
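A minimal sketch of how the string map idea might be spelled out. The proc name regexp_escape_map and the particular set of metacharacters are assumptions; the reply above leaves the full mapping elided.

```tcl
# Hypothetical helper: backslash-escape each metacharacter via string map.
# The character set below is an assumed list of ARE metacharacters,
# not the list the reply had in mind.
proc regexp_escape_map {from} {
    set map {}
    # Inside braces the backslash is literal, so the first character is "\".
    foreach ch [split {\.?*+^$|()[]{}} {}] {
        # Map each metacharacter to itself preceded by a backslash.
        lappend map $ch \\$ch
    }
    return [string map $map $from]
}

puts [regexp_escape_map {a.b*c}]   ;# prints a\.b\*c
```

Building the map in a loop avoids hand-quoting braces and backslashes inside a literal list, which is easy to get wrong.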

Christian

> PS: http://core.tcl.tk/tcllib/tktview/06bf459d36c909846d3577d0cfd7fec261ab8c1e
>

heinrichmartin

Aug 18, 2015, 11:47:13 AM
On Tuesday, August 18, 2015 at 4:41:31 PM UTC+2, Christian Gollwitzer wrote:
> I'm not aware of a built-in method, but I think you are working too hard. Is it
> not sufficient to just escape the metacharacters via string map? i.e.
>
> set esc [string map {. \. ? \? ....} $from]
>
> where in the braces you map the metacharacters?

If I understand your question correctly, then the answer is yes. The switch leaves known non-metacharacters untouched, backslash-escapes the ones that can be handled that way, and uses hexadecimal \x escapes for the rest.

I can't recall exactly how the code evolved, but I think that I initially used \x for all characters (very ugly indeed!). That seemed to be the safe option ...

The list hidden behind the dots in your reply is exactly what I'd like to see maintained alongside the RE processing code (and it might depend on the switches!).
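For the special case where the entire pattern is a literal string, Tcl's RE syntax has the ***= director, which leaves the decision about what counts as a metacharacter to the RE engine. A small sketch follows; the values of needle and haystack are made up for illustration, and it does not help when the escaped text is only one part of a larger expression.

```tcl
# Hypothetical example values.
set needle   {1+1=2 (really?)}
set haystack {We know that 1+1=2 (really?) holds.}

# With the ***= director, the rest of the RE is taken as a literal string.
puts [regexp -- "***=$needle" $haystack]   ;# prints 1

# Without it, + ? ( ) are interpreted as metacharacters and the match fails.
puts [regexp -- $needle $haystack]         ;# prints 0
```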

Martin

Alexandre Ferrieux

Aug 18, 2015, 12:25:45 PM
On Tuesday, August 18, 2015 at 4:41:31 PM UTC+2, Christian Gollwitzer wrote:
>
> set esc [string map {. \. ? \? ....} $from]
>

Or:

regsub -all {[][{}()^$.+*?]} $from {\\&} esc
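The one-liner can be wrapped in a proc, sketched below. The name regexp_escape_regsub is hypothetical. The bracket expression as posted does not cover | or the backslash itself, so the sketch adds both; that is an assumption about the intent, not part of the original suggestion.

```tcl
# Hypothetical wrapper around the regsub approach.
# Assumption: | and \ are added to the posted character set.
proc regexp_escape_regsub {from} {
    # {\\&} inserts a literal backslash before each matched metacharacter.
    regsub -all {[][{}()^$.+*?|\\]} $from {\\&} esc
    return $esc
}

puts [regexp_escape_regsub {a|b\c[0]}]   ;# prints a\|b\\c\[0\]
```

The escaped result can then be embedded in a larger expression, for example `regexp -- "^[regexp_escape_regsub $prefix]\\d+" $line`, where prefix and line are placeholder variables.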