Re: [erlang-questions] Why is it necessary to "double-escape" [ characters in regular expressions?

Gaspar Chilingarov

Mar 30, 2009, 2:57:40 AM
to David Mitchell, erlang-questions Questions
Hello!

Let's try a simple example:

2> "\q".
"q"
3> "\\q".
"\\q"

But does "\\q" really contain 3 characters, or 2?

6> lists:map(fun(X) -> erlang:display(X) end, "\\q").
92
113
[true,true]

So in your regular expression you are putting exactly 2 characters,
\ and [, which is what the regexp engine requires in order to interpret
[ as a literal character and not as the start of a character class.

So it works quite predictably ;)
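
The same check can be applied to the "\\[" from the original question. A minimal sketch to paste into the Erlang shell; the result comments assume the standard string literal rules, where an unrecognised escape such as \[ simply yields the character itself:

```erlang
% Each expression can be pasted into the Erlang shell (erl) as-is.
length("\\[").         % 2: a backslash followed by [
"\\[" =:= [92, 91].    % true: 92 is \ and 91 is [
"\[" =:= "[".          % true: the single backslash is consumed by the string syntax
```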

/Gaspar

--
Gaspar Chilingarov

tel +37493 419763 (mobile - leave voice mail message)
icq 63174784
skype://gasparch
e mailto:n...@web.am mailto:gasp...@gmail.com
w http://gasparchilingarov.com/
_______________________________________________
erlang-questions mailing list
erlang-q...@erlang.org
http://www.erlang.org/mailman/listinfo/erlang-questions

Sverker Eriksson

Mar 30, 2009, 8:06:15 AM
to David Mitchell, erlang-questions Questions
David Mitchell wrote:
> However, I didn't expect that "double escaping" it would be the solution to
> my problem.
>
>
From http://erlang.org/doc/man/re.html:

Note

The Erlang literal syntax for strings gives special meaning to the "\"
(backslash) character. To literally write a regular expression or a
replacement string containing a backslash in your code or in the shell,
two backslashes have to be written: "\\".
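
For the replacement-string case mentioned in the note, here is a short sketch along the lines of the examples in the re documentation. In a replacement string, & stands for the whole match, so a literal & needs a backslash in front of it, which in turn has to be written as "\\" in the Erlang literal:

```erlang
% & in the replacement inserts the matched text.
re:replace("abcd", "c", "[&]", [{return, list}]).     % "ab[c]d"

% "\\&" is the two characters \& at run time, which re treats as a literal &.
re:replace("abcd", "c", "[\\&]", [{return, list}]).   % "ab[&]d"
```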


/Sverker, Erlang/OTP Ericsson

Zvi

Mar 30, 2009, 6:18:46 AM
to erlang-q...@erlang.org

Why type so many characters? Try:

[{C}||C<-"\\q"].

output:
[{92},{113}]

Zvi


Gaspar Chilingarov wrote:
>
> 6> lists:map(fun(X) -> erlang:display(X) end, "\\q").
> 92
> 113
> [true,true]
>

--
View this message in context: http://www.nabble.com/Re%3A-Why-is-it-necessary-to-%22double-escape%22--%09characters-in-regular-expressions--tp22777475p22780122.html
Sent from the Erlang Questions mailing list archive at Nabble.com.

Zvi

Mar 30, 2009, 6:53:57 AM
to erlang-q...@erlang.org

Why type so many characters? When in doubt about escaping rules, try
something like this:

1> [[C]||C<-"\\q"].
["\\","q"]

2> [{C}||C<-"\\q"].
[{92},{113}]

Zvi


Gaspar Chilingarov wrote:
>
> 6> lists:map(fun(X) -> erlang:display(X) end, "\\q").
> 92
> 113
> [true,true]
>


Johnny Billquist

Apr 1, 2009, 9:34:27 AM
to Richard Andrews, erlang-questions Questions
I'm not sure I would call it "escaping", since [] in a regular
expression actually has a meaning: it expresses a set of valid chars.
However, the characters inside [] are interpreted/parsed differently
from those outside, which causes a [ inside to be accepted literally. ]
is a little ugly in that it must be the first character in the set
specified inside a [], otherwise it won't work. (So you could say [abc[]
to match any of a, b, c or [, but you couldn't say [abc]]; you would
have to write it as []abc].)
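
The bracket rules above can be tried out with re:run/2. This is only a sketch, assuming the default behaviour of the PCRE-based re module, and other regexp implementations may differ:

```erlang
re:run("[", "[abc[]").     % {match,[{0,1}]}: [ is taken literally inside a class
re:run("]", "[]abc]").     % {match,[{0,1}]}: ] is literal when it comes first
re:run("]", "[abc]]").     % nomatch: this is the class [abc] followed by a literal ]
re:run("a]", "[abc]]").    % {match,[{0,2}]}: one of a, b, c and then ]
```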

Whether \ can be used to escape brackets seems to vary between the
different implementations of regexps that I have looked at.

As for the original question, others have already pointed it out, but in
order to get a \ in the actual string you create, you need to put a
double \ in the literal. And that's escaping. :-)

Johnny

Richard Andrews wrote:
> IIRC the way to escape [ in regular expressions is [[] not \[.
> Similarly []] not \].
>
> Never tried with erlang re application though.
>
> ------------------------------------------------------------------------
> *From:* David Mitchell <monc...@gmail.com>
> *To:* erlang-questions Questions <erlang-q...@erlang.org>
> *Sent:* Monday, 30 March, 2009 2:19:28 PM
> *Subject:* [erlang-questions] Why is it necessary to "double-escape" [
> characters in regular expressions?
>
> Hello group,
>
> Running 5.6.5 under Windows...
>
> I've got a bunch of code that's "almost but not quite syntactically
> correct" XML, and I'm trying to convert it to valid XML. Part of this
> process involves removing some invalid CDATA tags.
>
> My code fragment:
> re:replace("abc123", "<!\[CDATA\[<", "<", [{return, list}]).
> is giving me "exception error: bad argument in function re:replace/4".
>
> Trial and error shows that removing the escaped [ characters:
> re:replace("abc123 <![CDATA[< abc123", "<!CDATA<", "<", [{return, list}]).
> works as expected, but it's obviously not what I want.
>
> However, "double-escaping" the [ characters (by adding a second \ prior
> to the [ character) does exactly what I want:
> re:replace("abc123 <![CDATA[< abc123", "<!\\[CDATA\\[<", "<",
> [{return, list}])
> returns "abc123 < abc123", which is the result I'm after.
>
> In this context, I guess it's conceivable that the [ character can be
> misinterpreted in two distinct ways in a regular expression:
> - it could denote the start of an Erlang list
> - it could denote the start of a character grouping within a regular
> expression


> However, I didn't expect that "double escaping" it would be the solution
> to my problem.
>

> Is this expected behaviour, or some sort of anomaly? In any case,
> sending this email to the mailing list should help out the next person
> who falls into this trap, but who can use Google to track down the
> solution...
>
> Regards
>
> David Mitchell
>


--
Johnny Billquist || "I'm on a bus
|| on a psychedelic trip
email: b...@softjar.se || Reading murder books
pdp is alive! || tryin' to stay hip" - B. Idol

Jachym Holecek

Mar 30, 2009, 3:21:18 AM
to David Mitchell, erlang-questions Questions
# David Mitchell 2009-03-30:

> However, "double-escaping" the [ characters (by adding a second \ prior to
> the [ character) does exactly what I want:
> re:replace("abc123 <![CDATA[< abc123", "<!\\[CDATA\\[<", "<", [{return,
> list}])
> returns "abc123 < abc123", which is the result I'm after.

Backslash is an escape character within string syntax, like in C.
So if you want your literal string to contain a backslash (you
do, in order to remove the special meaning '[' has in REs),
you need to write "\\", as you discovered. See section 2.14 of
the Erlang Reference Manual for more details.

HTH,
-- Jachym

Tony Finch

Mar 30, 2009, 11:06:49 AM
to David Mitchell, erlang-questions Questions
On Mon, 30 Mar 2009, David Mitchell wrote:
>
> However, "double-escaping" the [ characters (by adding a second \ prior to
> the [ character) does exactly what I want:
> re:replace("abc123 <![CDATA[< abc123", "<!\\[CDATA\\[<", "<", [{return, list}])
> returns "abc123 < abc123", which is the result I'm after.

There are two levels of backslash escaping here, one for string literals
and one for regular expressions. The \\ in the string literal becomes \ in
the run-time string, which the regex implementation treats as an escape
character for the following [. If you only write \[ in the string literal
then this becomes [ in the run-time string which the regex implementation
treats as the start of a character class specifier, and since there's no
closing ] it throws a syntax error.
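
The two levels can be seen separately with re:compile/1, which returns the syntax error as a value instead of raising badarg. A small sketch for the shell; the error details are left unmatched here because the exact message may vary between OTP releases:

```erlang
% One backslash: the run-time pattern is just "[", an unterminated class.
{error, _} = re:compile("\[").

% Two backslashes: the run-time pattern is "\[", which matches a literal [.
{ok, _} = re:compile("\\[").
re:run("a[b", "\\[").    % {match,[{1,1}]}
```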

Tony.
--
f.anthony.n.finch <d...@dotat.at> http://dotat.at/
GERMAN BIGHT HUMBER: SOUTHWEST 5 TO 7. MODERATE OR ROUGH. SQUALLY SHOWERS.
MODERATE OR GOOD.

David Mitchell

Mar 29, 2009, 11:19:28 PM
to erlang-questions Questions
Hello group,

Running 5.6.5 under Windows...

I've got a bunch of code that's "almost but not quite syntactically correct" XML, and I'm trying to convert it to valid XML.  Part of this process involves removing some invalid CDATA tags.

My code fragment:
  re:replace("abc123", "<!\[CDATA\[<", "<", [{return, list}]).
is giving me "exception error: bad argument in function re:replace/4".

Trial and error shows that removing the escaped [ characters:
  re:replace("abc123 <![CDATA[< abc123", "<!CDATA<", "<", [{return, list}]).
works as expected, but it's obviously not what I want.

However, "double-escaping" the [ characters (by adding a second \ prior to the [ character) does exactly what I want:
  re:replace("abc123 <![CDATA[< abc123", "<!\\[CDATA\\[<", "<", [{return, list}])
returns "abc123 < abc123", which is the result I'm after.

Richard Andrews

Mar 30, 2009, 6:01:57 AM
to David Mitchell, erlang-questions Questions
IIRC the way to escape [ in regular expressions is [[] not \[.
Similarly []] not \].

Never tried with erlang re application though.
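
A sketch of how the bracket form could be tried with re, using the subject string from the original question. It assumes the PCRE-based re module, where [[] is a character class containing only [, so no backslashes are needed at either level. The name Subject is introduced here only for illustration:

```erlang
Subject = "abc123 <![CDATA[< abc123".

% Character-class form: [[] matches a literal [.
re:replace(Subject, "<![[]CDATA[[]<", "<", [{return, list}]).    % "abc123 < abc123"

% Backslash form from the original post, for comparison.
re:replace(Subject, "<!\\[CDATA\\[<", "<", [{return, list}]).    % "abc123 < abc123"
```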


From: David Mitchell <monc...@gmail.com>
To: erlang-questions Questions <erlang-q...@erlang.org>
Sent: Monday, 30 March, 2009 2:19:28 PM
Subject: [erlang-questions] Why is it necessary to "double-escape" [ characters in regular expressions?

