[boost] [regex] embed string/char in regex w/o escaping?


Arno Schödl

Mar 1, 2011, 2:50:38 PM
to bo...@lists.boost.org
As far as I can see, there is no way to embed a plain string into a regex, without escaping the string. Same with characters. Isn't that a bad omission for a library that becomes the new C++ standard? Escaping something just to unescape it on the other end of the function call seems unnecessary, and it opens up the possibility of ills like regex code injection if someone forgets to escape or does it wrong.
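
To make the injection concern concrete, here is a minimal sketch using std::regex. The helper name matches_unescaped is hypothetical and exists only for illustration; it pastes caller-supplied text into a pattern without escaping, which is exactly the mistake described above.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of any library): builds a pattern by
// pasting user text in unescaped. Shown only to illustrate the risk.
bool matches_unescaped(const std::string& user_text,
                       const std::string& subject) {
    // user_text is parsed as regex syntax, not as a literal string.
    std::regex re("^" + user_text + "$");
    return std::regex_search(subject, re);
}
```

With this helper, the text "a.c" also matches "abc", because the dot is read as a wildcard, and the text ".*" matches any single-line subject.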

I know about Boost.Xpressive, but that won't be the new standard. This is not a lobbying effort for Boost.Xpressive. I have never used it precisely because I wanted to stick to the standard. Now I am a bit worried about pouring something into C++ standard concrete that has a pretty obvious omission in it.

Arno

--
Dr. Arno Schödl | asch...@think-cell.com
Technical Director

think-cell Software GmbH | Chausseestr. 8/E | 10115 Berlin | Germany
http://www.think-cell.com | phone +49 30 666473-10 | US phone +1 800 891 8091

Amtsgericht Berlin-Charlottenburg, HRB 85229 | European Union VAT Id DE813474306
Directors: Dr. Markus Hannebauer, Dr. Arno Schoedl

Jim Bell

Mar 1, 2011, 11:01:21 PM
to bo...@lists.boost.org
On 1:59 PM, Arno Schödl wrote:
> As far as I can see, there is no way to embed a plain string into a regex, without escaping the string. Same with characters. Isn't that a bad omission for a library that becomes the new C++ standard? Escaping something just to unescape it on the other end of the function call seems unnecessary, and it opens up the possibility of ills like regex code injection if someone forgets to escape or does it wrong.

I was just pondering regex security risks
(<http://lists.boost.org/boost-users/2011/02/66533.php>).

Has anyone studied regex code injection and its implications?

How about
<http://www.boost.org/doc/libs/1_46_0/libs/regex/doc/html/boost_regex/ref/syntax_option_type/syntax_option_type_literal.html>?
It treats the whole string as literal. Is that what you're seeking?

regex has to contend with in-band signaling in general, and it's a
thorny issue.

To your point of escaping a string wrong, I fiddled with a
regex_replace() that would remove all '\E' (end-of-quoted-sequence),
including '\\\E', but not touch '\\E' (i.e., even numbers of '\'
prefixing), and couldn't get it.

John Maddock

Mar 2, 2011, 4:19:53 AM
to bo...@lists.boost.org
>To your point of escaping a string wrong, I fiddled with a
>regex_replace() that would remove all '\E' (end-of-quoted-sequence),
>including '\\\E', but not touch '\\E' (i.e., even numbers of '\'
>prefixing), and couldn't get it.

I'm not sure I understand what you're trying to achieve there, can you
explain?

John.

John Maddock

Mar 2, 2011, 4:29:19 AM
to bo...@lists.boost.org
>As far as I can see, there is no way to embed a plain string into a regex,
>without escaping the string. Same with characters.
>Isn't that a bad omission for a library that becomes the new C++ standard?
>Escaping something just to unescape it on the
>other end of the function call seems unnecessary, and it opens up the
>possibility of ills like regex code injection if someone
>forgets to escape or does it wrong.

Good question - if this is Boost.Regex rather than the std then there is an
option to treat a whole string as a literal (as Jim Bell mentioned), or you
can enclose part of a string that has to be treated as a literal in \Q...\E
as in Perl.

Otherwise you're looking at a call to regex_replace to quote things for you,
off the top of my head something like:

regex e("[.\\[\\]{}()\\\\*+?|^$]");
std::string my_escaped_string = "(?:" + regex_replace(my_string, e, "\\\\$&") + ")";

Should do the trick.
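
For readers on the standard library rather than Boost.Regex, the same idea can be sketched with std::regex. The function name escape_for_regex is hypothetical. One difference worth noting: in std::regex's default (ECMAScript) format string only '$' is special, so a single backslash before "$&" is enough, whereas a Boost Perl-style format string needs the backslash doubled.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: backslash-escapes the regex metacharacters in
// `text` so it matches only itself, then wraps the result in a
// non-capturing group so it can be embedded in a larger pattern.
std::string escape_for_regex(const std::string& text) {
    // Character class listing the characters that need a backslash.
    static const std::regex special(R"([.\[\]{}()\\*+?|^$])");
    // ECMAScript format: "$&" is the whole match; '\' is copied as-is.
    return "(?:" + std::regex_replace(text, special, R"(\$&)") + ")";
}
```

A pattern built as std::regex("^" + escape_for_regex(input) + "$") then matches the input text literally.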

HTH, John.

Jim Bell

Mar 2, 2011, 10:43:24 AM
to bo...@lists.boost.org

On 1:59 PM, John Maddock wrote:
>> To your point of escaping a string wrong, I fiddled with a
>> regex_replace() that would remove all '\E' (end-of-quoted-sequence),
>> including '\\\E', but not touch '\\E' (i.e., even numbers of '\'
>> prefixing), and couldn't get it.
>
> I'm not sure I understand what you're trying to achieve there, can you
> explain?
>

I was going for a different form, but (hopefully) the same result as yours:

std::string my_escaped_string = "\\Q" + regex_replace(my_string, e,
"\\\\$&") + "\\E";

Yechezkel Mett

Mar 6, 2011, 6:16:56 AM
to bo...@lists.boost.org
On Wed, Mar 2, 2011 at 5:43 PM, Jim Bell <J...@jc-bell.com> wrote:
> On 1:59 PM, John Maddock wrote:
>>> To your point of escaping a string wrong, I fiddled with a
>>> regex_replace() that would remove all '\E' (end-of-quoted-sequence),
>>> including '\\\E', but not touch '\\E' (i.e., even numbers of '\'
>>> prefixing), and couldn't get it.
>>
>> I'm not sure I understand what you're trying to achieve there, can you
>> explain?
>>
>
> I was going for different form, but (hopefully) same result as yours:
>
> std::string my_escaped_string = "\\Q" + regex_replace(my_string, e,
> "\\\\$&") + "\\E";

I use something like

"\\Q" + boost::replace_all_copy(text, "\\E", "\\E\\\\E\\Q") + "\\E"
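
The same quoting can be sketched without Boost.StringAlgo. The function name quote_literal is hypothetical. Inside \Q...\E a backslash is literal, so the only sequence that needs care is \E itself: the quote is closed, a backslash-escaped \E is emitted, and the quote is reopened. The result is meant for engines that support \Q...\E, such as Boost.Regex with Perl syntax; std::regex's ECMAScript grammar does not.

```cpp
#include <string>

// Hypothetical helper: wraps `text` in \Q...\E, splitting the quote
// around every embedded "\E" so it cannot end the quoted run early.
std::string quote_literal(const std::string& text) {
    std::string out = "\\Q";
    std::string::size_type pos = 0;
    std::string::size_type hit;
    while ((hit = text.find("\\E", pos)) != std::string::npos) {
        out.append(text, pos, hit - pos);  // copy text before the "\E"
        out += "\\E\\\\E\\Q";              // close, literal \E, reopen
        pos = hit + 2;                     // skip past the "\E"
    }
    out.append(text, pos, std::string::npos);  // copy the remainder
    return out + "\\E";
}
```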

Yechezkel Mett
