
boost::regex_replace vs. std::regex_replace: Who is right?


Ralf Goertz

Jun 14, 2016, 4:31:58 AM
Hi

after upgrading gcc from version 4.8.5 to version 5.3.1 I thought I
could get rid of boost's regex-implementation (boost version 1.54.0) and
use the one provided by gcc (it didn't work with gcc before version 4.9
AFAIK). However, this turned out to be a problem because those two
implementations behave differently:


#include <regex>
#include <boost/regex.hpp>
#include <iostream>
#include <string>

int main() {
    std::string s = "\\needs_another_backslash";  // starts with one literal backslash
    std::string reg("^\\\\");                     // regex ^\\ : one backslash at the start
    std::string rep("\\\\");                      // replacement text: two backslashes
    std::regex sr(reg);
    boost::regex br(reg);
    std::cout << "string before replacement:\n" << s << std::endl
              << "std::regex_replace:\n" << std::regex_replace(s, sr, rep) << std::endl
              << "boost::regex_replace:\n" << boost::regex_replace(s, br, rep) << std::endl;
    return 0;
}

The output of that program is:

string before replacement:
\needs_another_backslash
std::regex_replace:
\\needs_another_backslash
boost::regex_replace:
\needs_another_backslash


It seems as if boost treats the '\' in a replacement string specially
whereas gcc does not. According to
http://www.cplusplus.com/reference/regex/regex_replace/ the magical
character for backreference etc. in the replacement string is '$'. So I
tend to think that gcc is right. However, in other programs (like vim
e.g.) it is '\'. So boost might have a point in treating '\' specially.
So who is right?

Ralf
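
For reference: with its default flags, std::regex_replace uses the ECMAScript replacement rules, where '$' introduces substitutions ($&, $1, $$) and a backslash is copied through unchanged. Below is a minimal sketch of that behaviour using only the standard library. The helper name replace_leading_backslash is made up for illustration and is not part of the program above.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replaces one leading backslash in `s` with `rep`,
// using the default (ECMAScript) format rules of std::regex_replace.
std::string replace_leading_backslash(const std::string& s,
                                      const std::string& rep) {
    // Regex ^\\ : a single backslash at the start of the input.
    static const std::regex leading(R"(^\\)");
    return std::regex_replace(s, leading, rep);
}
```

Called with a replacement consisting of two backslashes, this should return the input with its leading backslash doubled, as gcc does in the output above. A replacement of "$$" should yield a single '$', and "$&" the matched text.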

Marcel Mueller

Jun 18, 2016, 5:34:21 PM
On 14.06.16 10.31, Ralf Goertz wrote:
> It seems as if boost treats the '\' in a replacement string specially
> whereas gcc does not. According to
> http://www.cplusplus.com/reference/regex/regex_replace/ the magical
> character for backreference etc. in the replacement string is '$'. So I
> tend to think that gcc is right. However, in other programs (like vim
> e.g.) it is '\'. So boost might have a point in treating '\' specially.
> So who is right?

It depends on the specification.

Most implementations I know use \1, \2 ... for back references in the
pattern and $1, $2 ... in the replacement string.


Marcel
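
The convention described here can be sketched with std::regex: \1 refers back to the first group inside the pattern, while $1 refers to it in the replacement string. The function name collapse_repeated_words and the choice of pattern are assumptions made for illustration only.

```cpp
#include <regex>
#include <string>

// Hypothetical example: collapses an immediately repeated word
// ("the the" -> "the").
std::string collapse_repeated_words(const std::string& text) {
    // Pattern: a whole word, one space, then the same word again (\1).
    static const std::regex doubled(R"(\b(\w+) \1\b)");
    // Replacement: keep a single copy of the captured word ($1).
    return std::regex_replace(text, doubled, "$1");
}
```

The \b anchors keep the pattern from matching inside longer words, so "this is" is left alone.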

Ralf Goertz

Jun 20, 2016, 5:10:46 AM
Am Sat, 18 Jun 2016 23:32:12 +0200
schrieb Marcel Mueller <news.5...@spamgourmet.org>:

> It depends on the specification.
>
> Most implementations I know use \1, \2 ... for back references in the
> pattern and $1, $2 ... in the replacement string.
>
>

But that's the problem. Boost also uses $ for back references. So why
would I need to escape the backslash?

Öö Tiib

Jun 20, 2016, 8:58:46 AM
It is best to assume that 'boost::regex_replace' and 'std::regex_replace'
are totally different functions and that anything they have in common
is purely accidental.

Questions about escape symbols always come down to details of the
grammar. Regex grammars can apparently be configured with constants
(your posted code uses the defaults):
http://www.cplusplus.com/reference/regex/regex_constants/
There may be (I'm speculating now) some sort of difference, for example
that 'format_default' means slightly different things for 'boost' and
'std'.

Such differences are no problem but the endless source of work (and
so bread) for you. ;-)
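
One such constant is std::regex_constants::format_sed, which the standard describes as using the replacement rules of the POSIX sed utility: '&' stands for the whole match and a backslash acts as an escape character. The sketch below assumes the implementation follows those rules for the backslash, as current libstdc++ and libc++ appear to do. The function name sed_style_replace is invented for the example.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: runs std::regex_replace with sed-style format rules
// instead of the default ECMAScript ones.
std::string sed_style_replace(const std::string& s,
                              const std::string& pattern,
                              const std::string& fmt) {
    const std::regex re(pattern);
    // format_sed: '&' is the whole match, "\1".."\9" are sub-matches,
    // and a backslash before any other character yields that character.
    return std::regex_replace(s, re, fmt, std::regex_constants::format_sed);
}
```

Under this flag, a replacement of two backslashes should produce a single backslash, which resembles the Boost output in the original post. It is offered as an illustration of how a format flag changes the treatment of '\', not as a statement about what Boost does internally.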

Ralf Goertz

Jun 21, 2016, 4:29:48 AM
Am Mon, 20 Jun 2016 05:58:28 -0700 (PDT)
schrieb Öö Tiib <oot...@hot.ee>:

> Best is to assume that 'boost::regex_replace' and
> 'std::regex_replace' are totally different functions and those having
> something in common is purely accidental.

Well, isn't boost the forerunner in implementing new features that later
become standard? So I would like to think that these two functions are
merely two implementations of the same (standard) thing.

> Issue about escape symbols is always about details of grammar. Regex
> grammars can be seemingly configured with constants (your posted
> code goes with defaults).
> http://www.cplusplus.com/reference/regex/regex_constants/ There can
> be (I'm speculating now) some sort of difference. For example that
> the 'format_default' means slightly different things for 'boost' and
> 'std'.

Yeah, but so far I have failed to find a standard or boost document that
mentions explicitly that I have to escape the backslash. The only hint,
given by someone at stackoverflow, is that boost does it the Perl way and
in Perl we also need to escape '\' in the replacement string. I only
have gcc at hand, so I don't really know what the standard says. It might
still be that boost is correctly interpreting the standard and gcc
got it wrong. That's my real question. What is the right behaviour of
that function according to the standard?

> Such differences are no problem but the endless source of work (and so
> bread) for you. ;-)

Well, for me they are indeed a pain in the ass because I am not paid to
write code per se but to solve problems. For that I write C++ programs
and try to stick to the standard.


Öö Tiib

Jun 21, 2016, 12:19:31 PM
On Tuesday, 21 June 2016 11:29:48 UTC+3, Ralf Goertz wrote:
> Am Mon, 20 Jun 2016 05:58:28 -0700 (PDT)
> schrieb Öö Tiib <oot...@hot.ee>:
>
> > Best is to assume that 'boost::regex_replace' and
> > 'std::regex_replace' are totally different functions and those having
> > something in common is purely accidental.
>
> Well, isn't boost the forerunner in implementing new features that later
> become standard? So I would like to think that these two functions are
> merely two implementations of the same (standard) thing.

You seem to think that first there is the standard and then Boost. Actually
it is the other way around. Boost was made to produce candidate
classes and libraries for the C++ standard library. It is quite successful
at that. That does not mean that a successful candidate is accepted into
the standard library without changes. It seems more common that there
are some changes. Also, Boost tries to be platform-neutral, so usually
there will be several platform-specific implementations after something
from it is standardized.

>
> > Issue about escape symbols is always about details of grammar. Regex
> > grammars can be seemingly configured with constants (your posted
> > code goes with defaults).
> > http://www.cplusplus.com/reference/regex/regex_constants/ There can
> > be (I'm speculating now) some sort of difference. For example that
> > the 'format_default' means slightly different things for 'boost' and
> > 'std'.
>
> Yeah, but so far I have failed to find a standard or boost document that
> mentions explicitly that I have to escape the backslash. The only hint
> given by someone at stackoverflow is that boost does it the perl way and
> in perl we also need to escape '\' in the replacement string.

I have also read gossip that Boost added some Perl-like features to its
"ECMAScript" grammar that std::regex is not required to support, but I'm
unsure whether that concerns those backslashes.

> I only have gcc at hand I don't really know what the standard says. It
> might still be that boost is correctly interpreting the standard and the
> gcc got it wrong. That's my real question. What is the right behaviour
> of that function according to the standard?

Perl is not mentioned anywhere. I see "ECMAScript", "POSIX BRE",
"POSIX ERE", "grep", "egrep" and "awk" mentioned.

>
> > Such differences are no problem but the endless source of work (and so
> > bread) for you. ;-)
>
> Well, for me they are indeed a pain in the ass because I am not paid to
> write code per se but to solve problems. For that I write c++ programs
> and try to stick to the standard.

When you write programs to solve problems, what a function or library is
documented to do is likely less important than what it actually does. These
two things are never exactly the same, and both documentation and behavior
will change over time. Also, what problem requires mixing boost::regex
and std::regex?

Ralf Goertz

Jun 22, 2016, 3:56:39 AM
Am Tue, 21 Jun 2016 09:19:16 -0700 (PDT)
schrieb Öö Tiib <oot...@hot.ee>:


> You seem to think that first there is standard and then boost.
> Actually it is the other way around. Boost was made to produce
> candidates of classes and libraries into C++ standard library. It is
> quite successful in that. That does not mean that successful
> candidate is accepted into standard library without changes. It
> seems more common that there are some changes. Also, Boost tries to
> be platform-neutral so usually there will be several
> platform-specific implementations after something of it is
> standardized.

I am well aware of boost's role in the advancement of the standard. Of
course there can be modifications during that process. That is one of
the reasons why I try to get rid of boost functions as soon as there are
std alternatives. Not that I don't like boost. On the contrary, I am
very grateful for the work of its programmers.

> I have also read gossip that boost had some Perl-like features to that
> "ECMAScript" that std::regex is not required to contain but I'm
> unsure if it is about those backslashes.

It seems I hadn't looked hard enough. Boost clearly states that the
backslash is an escape character in all three variants of the format
string (Sed, Perl, Boost-Extended). What puzzled me was that it is also
(and in the Sed-case only) used for the 'normal' escape sequences like
'\t' which means that the string literals "\t" and "\\t" give the same
result in boost::regex_replace when used as a replacement (format)
string.

> When you write programs to solve problems then what a function or
> library is documented to do is likely less important than what it
> actually does. These two things are never exactly same and also both
> documentation and behavior will change over time. Also, what problem
> needs usage of both boost::regex and std::regex in mix?

None, of course. But as I said in my initial post, I am switching from
boost::regex to std::regex now that upgrading gcc has made the latter
available. During that process I discovered the problem.
