regex usage in C++

Mike Frysinger

Oct 5, 2017, 1:57:02 PM
to chromium-os-dev
do we have any preference when it comes to using regex in C++ code?  i was thinking of updating our docs to suggest people start with the C library's regex.h, and if that bare-bones approach isn't sufficient, use PCRE.  looking at platform2, those look like the only ones we use currently.
-mike
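
For reference, a minimal sketch of the bare-bones regex.h approach mentioned above, as it might be called from C++. The helper name `PosixRegexSearch` is made up for illustration; `regcomp`, `regexec`, and `regfree` are the standard POSIX calls.

```cpp
#include <regex.h>

#include <string>

// Hypothetical helper: returns true if `text` contains a match for the POSIX
// extended regex `pattern`. Returns false on no match or if the pattern
// fails to compile.
bool PosixRegexSearch(const std::string& pattern, const std::string& text) {
  regex_t re;
  // REG_EXTENDED selects ERE syntax; REG_NOSUB skips capture bookkeeping.
  if (regcomp(&re, pattern.c_str(), REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  const int rc = regexec(&re, text.c_str(), 0, nullptr, 0);
  regfree(&re);  // regcomp allocates, so always pair it with regfree.
  return rc == 0;
}
```

Real code would likely compile the pattern once and reuse the `regex_t` rather than recompiling on every call as this sketch does.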

Mike Frysinger

Oct 5, 2017, 2:28:34 PM
to chromium-os-dev
Chun-ta pointed out that with C++11, we also now have <regex>.  i guess that would supersede any regex.h usage in C++ code.

so how about:
- for basic C code, use regex.h
- for basic C++ code, <regex>
- if you need more functionality, or speed is critical, use PCRE
-mike
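
As an illustration of the "basic C++ code" tier in the list above, here is a small sketch using C++11 `<regex>`. The function name `ParseKeyValue` and the `key=value` line format are assumptions chosen for the example, not something from platform2.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: splits a "key=value" line into its two parts.
// Returns false (leaving the outputs untouched) if the line does not match.
bool ParseKeyValue(const std::string& line, std::string* key,
                   std::string* value) {
  // Compiled once on first use; the pattern is fixed, not caller-supplied.
  static const std::regex kPattern(R"(^(\w+)=(.*)$)");
  std::smatch m;
  if (!std::regex_match(line, m, kPattern)) return false;
  *key = m[1].str();
  *value = m[2].str();
  return true;
}
```

`std::regex_match` requires the whole string to match; `std::regex_search` would be the call to use for finding a match anywhere inside a larger string.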

Nam Nguyen

Oct 5, 2017, 4:06:49 PM
to Mike Frysinger, chromium-os-dev
On Thu, Oct 5, 2017 at 11:27 AM, Mike Frysinger <vap...@chromium.org> wrote:
Chun-ta pointed out that with C++11, we also now have <regex>.  i guess that would supersede any regex.h usage in C++ code.

so how about:
- for basic C code, use regex.h
- for basic C++ code, <regex>
- if you need more functionality, or speed is critical, use PCRE

​That last point, I think, should favor Google's RE2.
Nam



--
--
Chromium OS Developers mailing list: chromiu...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-os-dev?hl=en


Mike Frysinger

Oct 5, 2017, 4:28:47 PM
to Nam Nguyen, chromium-os-dev
On Thu, Oct 5, 2017 at 4:06 PM, Nam Nguyen <namn...@chromium.org> wrote:
​That last point, I think, should favor Google's RE2.

only reason i don't favor RE2 is that no one is using it in CrOS today whereas we have projects using PCRE.  in general, the inertia seems to favor PCRE over RE2.

Daniel Erat

Oct 5, 2017, 9:34:13 PM
to Mike Frysinger, Nam Nguyen, chromium-os-dev
On Thu, Oct 5, 2017 at 2:28 PM, Mike Frysinger <vap...@chromium.org> wrote:
only reason i don't favor RE2 is that no one is using it in CrOS today whereas we have projects using PCRE.  in general, the inertia seems to favor PCRE over RE2.
-mike

RE2 is used heavily within Chrome, and <regex> is banned by https://chromium-cpp.appspot.com/. The justification for the latter decision just seems to be the use of other similar libraries in Chrome, though, so I don't know how applicable it should be to Chrome OS.

I do lean towards following Chrome's lead in the absence of strong reasons to the contrary. dev-libs/re2 looks like it's present in portage-stable, although I also don't see anyone using it.

Hidehiko Abe

Oct 6, 2017, 12:37:02 AM
to Daniel Erat, Mike Frysinger, Nam Nguyen, chromium-os-dev
On Fri, Oct 6, 2017 at 10:33 AM, Daniel Erat <de...@chromium.org> wrote:
RE2 is used heavily within Chrome, and <regex> is banned by https://chromium-cpp.appspot.com/. The justification for the latter decision just seems to be the use of other similar libraries in Chrome, though, so I don't know how applicable it should be to Chrome OS.

I do lean towards following Chrome's lead in the absence of strong reasons to the contrary. dev-libs/re2 looks like it's present in portage-stable, although I also don't see anyone using it.
 

My two cents: in my personal experience, I'd recommend RE2 over <regex> in C++ at the moment.
In some cases (such as a large regex), the C++ standard <regex> (as provided with clang) didn't work (i.e. it crashed).
I'm not sure how much we want to use regular expressions in C/C++, though.

In the code we manage (i.e. platform/ and platform2/), we could technically migrate to a single regex lib if necessary.
That doesn't help with third_party libs that use pcre, though, so if we decide to use RE2, we may need to maintain at least two regex libs?
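
A hedged sketch related to the `<regex>` failures described above: a crash such as a stack overflow inside the matcher cannot be caught, but the failures that the standard library does report arrive as `std::regex_error`, thrown either when the pattern is compiled or during matching. The helper names `TryCompile` and `SafeSearch` are made up for illustration, and the sketch assumes the code is built with exceptions enabled.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: compiles `pattern` into `*out`.
// Returns false instead of throwing if the pattern is rejected.
bool TryCompile(const std::string& pattern, std::regex* out) {
  try {
    *out = std::regex(pattern);
    return true;
  } catch (const std::regex_error&) {
    return false;
  }
}

// Hypothetical helper: searches `text`, treating a matcher-reported failure
// (for example a complexity or stack limit) as "no match".
bool SafeSearch(const std::regex& re, const std::string& text) {
  try {
    return std::regex_search(text, re);
  } catch (const std::regex_error&) {
    return false;
  }
}
```

This only covers errors the implementation chooses to report; it is not a substitute for a library with bounded resource use.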

Jorge Lucangeli Obes

Oct 6, 2017, 10:24:27 AM
to Hidehiko Abe, Daniel Erat, Mike Frysinger, Nam Nguyen, chromium-os-dev
On Fri, Oct 6, 2017 at 12:36 AM, Hidehiko Abe <hide...@chromium.org> wrote:


My two cents: in my personal experience, I'd recommend RE2 over <regex> in C++ at the moment.
In some cases (such as a large regex), the C++ standard <regex> (as provided with clang) didn't work (i.e. it crashed).
I'm not sure how much we want to use regular expressions in C/C++, though.

In the code we manage (i.e. platform/ and platform2/), we could technically migrate to a single regex lib if necessary.
That doesn't help with third_party libs that use pcre, though, so if we decide to use RE2, we may need to maintain at least two regex libs?
 

I'd support using re2 both because of the stability guarantees (https://github.com/google/re2/wiki/WhyRE2) and to avoid diverging from Chrome's style for no reason.

Mike Frysinger

Oct 6, 2017, 6:03:46 PM
to Jorge Lucangeli Obes, Hidehiko Abe, Daniel Erat, Nam Nguyen, chromium-os-dev
On Fri, Oct 6, 2017 at 10:23 AM, Jorge Lucangeli Obes <jor...@chromium.org> wrote:
I'd support using re2 both because of the stability guarantees (https://github.com/google/re2/wiki/WhyRE2) and to avoid diverging from Chrome's style for no reason.

i'm not anti-RE2, but to be clear, we'll still need to support PCRE because things like git, glib, wget, libselinux, and the wider open source community rely on it.  not that we're authoring that code.
-mike

Will Drewry

Oct 11, 2017, 12:14:16 PM
to Mike Frysinger, Jorge Lucangeli Obes, Hidehiko Abe, Daniel Erat, Nam Nguyen, chromium-os-dev
On Fri, Oct 6, 2017 at 5:03 PM, Mike Frysinger <vap...@chromium.org> wrote:
i'm not anti-RE2, but to be clear, we'll still need to support PCRE because things like git, glib, wget, libselinux, and the wider open source community rely on it.  not that we're authoring that code.

In general, PCRE is a security nightmare. If it is used in software, we need to sandbox that software and/or isolate it from attacker-controlled input (be it user supplied, dbus, from stateful, etc).  GNU libc's regex isn't much better, but has a lot less attack surface. Their DFA construction is still pretty sloppy iirc - DoSes are much more common.  In cases of critical code, I'd recommend investing the effort in moving to RE2.

Julius Werner

Oct 11, 2017, 8:32:47 PM
to Will Drewry, Mike Frysinger, Jorge Lucangeli Obes, Hidehiko Abe, Daniel Erat, Nam Nguyen, chromium-os-dev
>> i'm not anti-RE2, but to be clear, we'll still need to support PCRE
>> because things like git, glib, wget, libselinux, and the wider open source
>> community rely on it. not that we're authoring that code.
>
> In general, PCRE is a security nightmare. If it is used in software, we need
> to sandbox that software and/or isolate it from attacker-controlled input
> (be it user supplied, dbus, from stateful, etc). GNU libc's regex isn't
> much better, but has a lot less attack surface. Their DFA construction is
> still pretty sloppy iirc - DoSes are much more common. In cases of critical
> code, I'd recommend investing the effort in moving to RE2.

Are they just completely unsafe in general or can we maybe reduce this
to a list of constructs that need to be avoided? Not every third-party
project we use may be willing to accept a new dependency on a custom
regex library. For example, we use a userspace tool bundled with
coreboot that runs glibc POSIX regexes of the form "\n literals [^\n]*
more literals \n" on an untrusted memory buffer of a few hundred KB at
most... is something that simple something we would need to worry
about? Or are we only talking complicated regexes with multiple
repetitions in the same pattern or certain advanced features here? And
is the concern only DoS, or do they have actual code execution
vulnerabilities?
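
To make the pattern shape in the question concrete, here is a sketch of a glibc/POSIX regex of the form "\n literals [^\n]* more literals \n" run over a buffer. The markers "BEGIN " and " END" and the function name `ExtractField` are hypothetical stand-ins; the actual literals used by the coreboot tool are not given in this thread. Note that `regexec` stops at the first NUL byte, so a raw memory buffer with embedded NULs would need different handling.

```cpp
#include <regex.h>

#include <string>

// Hypothetical helper: finds the first line of the form "BEGIN <text> END"
// that is both preceded and followed by a newline, and returns <text>.
// Returns an empty string if nothing matches.
std::string ExtractField(const std::string& buf) {
  regex_t re;
  // The "\n" escapes are handled by the C++ compiler, so regcomp sees real
  // newline characters, both as literals and inside the bracket expression.
  if (regcomp(&re, "\nBEGIN ([^\n]*) END\n", REG_EXTENDED) != 0) return "";
  regmatch_t m[2];  // m[0] is the whole match, m[1] the captured group.
  std::string result;
  if (regexec(&re, buf.c_str(), 2, m, 0) == 0 && m[1].rm_so != -1) {
    const auto start = static_cast<std::string::size_type>(m[1].rm_so);
    const auto len = static_cast<std::string::size_type>(m[1].rm_eo - m[1].rm_so);
    result = buf.substr(start, len);
  }
  regfree(&re);
  return result;
}
```

The pattern is a compile-time constant here, so only the searched text, not the pattern, comes from the untrusted buffer.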

Will Drewry

Oct 11, 2017, 9:57:16 PM
to Julius Werner, Mike Frysinger, Jorge Lucangeli Obes, Hidehiko Abe, Daniel Erat, Nam Nguyen, chromium-os-dev
Historically, PCRE has had both (DoS and code execution issues), but the project is responsive and patches them (thankfully!). The main concern is control over regex patterns -- and a reduced pattern language does make a difference. There is also some risk from pathological input to complex but static patterns, though it is smaller in my experience.

Just given the complexity in PCRE (which is why it is also so alluring :), it is good practice to be careful with its exposure.  I'd need to dig through some older notes and dust off some neurons to make explicit recommendations, but the '\n literals [^\n]*' example is likely fine! 

Not the most helpful, but maybe a little clarifying?
thanks!
will

Julius Werner

Oct 12, 2017, 4:09:22 PM
to Will Drewry, Julius Werner, Mike Frysinger, Jorge Lucangeli Obes, Hidehiko Abe, Daniel Erat, Nam Nguyen, chromium-os-dev
> Historically, PCRE has had both, but the project is responsive and patches
> them (thankfully!). The main concern is control over regex patterns -- and a
> reduced pattern language does make a difference. There is also some risk
> from pathological input to complex but static patterns, though it is
> smaller in my experience.
>
> Just given the complexity in PCRE (which is why it is also so alluring :),
> it is good practice to be careful with its exposure. I'd need to dig
> through some older notes and dust off some neurons to make explicit
> recommendations, but the '\n literals [^\n]*' example is likely fine!
>
> Not the most helpful, but maybe a little clarifying?

Thanks, yeah, that's what I thought. I just wanted to confirm that we
don't need to start worrying about every case of PCRE on untrusted
input we have right now. I understand there may be DoS risks for
certain patterns, but depending on where it is used that's not always
that big of a deal.

I don't know of any cases where we're using untrusted input as a
pattern anywhere; that should hopefully be very rare. If I ever see
one I'll make sure to ask for a security review.
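
One possible mitigation for the rare case above, sketched under assumptions: if untrusted text only needs to appear literally inside an otherwise fixed pattern, it can be escaped before the pattern is built. The function name `EscapeForRegex` is made up, and the metacharacter list targets the ECMAScript syntax that `std::regex` uses by default; other engines would need their own list. This is not a replacement for the security review mentioned above.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: backslash-escapes every ECMAScript regex
// metacharacter in `raw`, so the result matches `raw` literally when it is
// embedded in a std::regex pattern.
std::string EscapeForRegex(const std::string& raw) {
  static const std::string kMeta = R"(\^$.|?*+()[]{})";
  std::string out;
  out.reserve(raw.size() * 2);
  for (char c : raw) {
    if (kMeta.find(c) != std::string::npos) out.push_back('\\');
    out.push_back(c);
  }
  return out;
}
```

With this, `std::regex("^name=" + EscapeForRegex(user_text) + "$")` treats `user_text` (a hypothetical variable) as plain characters rather than as pattern syntax.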