
Looking for recommended approach to using Regular Expressions.


Cameron_C

Jun 22, 2010, 11:56:12 AM
Hello again folks,
I am working through things in my application, and I would appreciate any
recommended approaches to using Regular Expressions in my code.
This is an MFC application. I am using Visual Studio 2008 pro.
I have read about references to a "boost" library, and I have read
references to an ATL regex class, and I have read something about Regular
Expressions being included in SP1 of VS2008.
I want to use regular expressions to edit telephone numbers, and Postal
Codes, and dollar amount fields.

Anyway, if anyone has any experience with any of the above, and can offer
some advice or recommendation, I would appreciate it.

Joseph M. Newcomer

Jun 22, 2010, 12:16:51 PM
Regular expressions are most powerful when you are doing either searches for a match or a
set of edits that are characterized by a simple "programmatic" replacement of many, many
instances. While I can certainly see how a regexp might help find things in a directory
of people and telephone numbers, I'm not at all sure how it would help in
editing-by-way-of-regexp.
joe

Joseph M. Newcomer [MVP]
email: newc...@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm

Giovanni Dicanio

Jun 22, 2010, 12:27:07 PM
On 22/06/2010 17:56, Cameron_C wrote:

> I have read about references to a "boost" library, and I have read
> references to an ATL regex class, and I have read something about Regular
> Expressions being included in SP1 of VS2008.

The ATL regular expression class is CAtlRegExp, and it was available in
VC8 (i.e. VS2005), too:

http://msdn.microsoft.com/en-us/library/k3zs4axe(VS.80).aspx

I don't know about boost's regex, but it is correct that with the C++
Feature Pack (and then SP1) for VS2008 a regular expression template
well integrated with STL was offered (this is available in VS2010 as well):

http://msdn.microsoft.com/en-us/library/bb982727.aspx
http://msdn.microsoft.com/en-us/library/bb982382.aspx


If you plan to write portable C++ code you may want to consider the
TR1's regex engine. If you don't like STL style and are not interested
in multiplatform C++ code, you may consider the CAtlRegExp class instead.

Note that TR1's regex engine uses C++ exceptions (e.g. the regex_error class
is thrown in some cases), whereas ATL's CAtlRegExp tends to use error
return codes (like BOOLeans). So if your programming style tends to
prefer error codes over exceptions, you may want to choose CAtlRegExp.
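
A minimal sketch of that difference, assuming a compiler where the standardized std::regex is available (in VS2008 SP1 the same names live in namespace std::tr1). TryCompile is a hypothetical helper, not part of either library; it shows how the exception thrown for a malformed pattern could be folded into the BOOL-style result that the ATL class tends to use:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: compiles a pattern and reports failure as a bool,
// so callers that are not exception-safe never see regex_error.
bool TryCompile(const std::string& pattern, std::regex& out)
{
    try {
        out = std::regex(pattern);   // throws std::regex_error on a malformed pattern
        return true;
    } catch (const std::regex_error&) {
        return false;                // 'out' is left unchanged
    }
}
```

With a wrapper like this, the exception never leaves the function, which may matter in code that was not written to be exception-safe.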


HTH,
Giovanni

Pete Delgado

Jun 22, 2010, 12:26:45 PM

"Cameron_C" <Came...@discussions.microsoft.com> wrote in message
news:54293FD0-0333-40D0...@microsoft.com...

Any of the above approaches that you have mentioned will work; however, as
with any engineering decision, there are tradeoffs. I would recommend
against using the ATL regular expression library because it uses a
non-standard syntax and is fairly limited in its abilities. If you can handle
the limitations of the library and are willing to work with its peculiar
regular expression syntax, it will probably work as well.

I personally prefer to use the std::tr1 regular expression library, but I
have used the boost regular expression library as well. A quick look on the
net shows more examples of usage for the boost libraries than for the tr1
libraries so this may be a determining factor for you!

-Pete

TR1
http://www.johndcook.com/cpp_regex.html
http://msdn.microsoft.com/en-us/library/bb982727.aspx


Boost
http://www.boost.org/doc/libs/1_43_0/libs/regex/doc/html/index.html
http://onlamp.com/pub/a/onlamp/2006/04/06/boostregex.html


Pete Delgado

Jun 22, 2010, 12:43:43 PM

"Giovanni Dicanio" <giovanniD...@REMOVEMEgmail.com> wrote in message
news:%23IDkFfi...@TK2MSFTNGP02.phx.gbl...


> On 22/06/2010 17:56, Cameron_C wrote:
>
>> I have read about references to a "boost" library, and I have read
>> references to an ATL regex class, and I have read something about Regular
>> Expressions being included in SP1 of VS2008.
>
> The ATL regular expression class is CAtlRegExp, and it was available in VC8
> (i.e. VS2005), too:
>
> http://msdn.microsoft.com/en-us/library/k3zs4axe(VS.80).aspx
>
> I don't know about boost's regex, but it is correct that with the C++
> Feature Pack (and then SP1) for VS2008 a regular expression template well
> integrated with STL was offered (this is available in VS2010 as well):
>
> http://msdn.microsoft.com/en-us/library/bb982727.aspx
> http://msdn.microsoft.com/en-us/library/bb982382.aspx
>
>
> If you plan to write portable C++ code you may want to consider the TR1's
> regex engine. If you don't like STL style and are not interested in
> multiplatform C++ code, you may consider the CAtlRegExp class instead.

The boost library implementation is as portable as the stl version...

>
> Note that TR1's regex engine uses C++ exceptions (e.g. the regex_error class
> is thrown in some cases), whereas ATL's CAtlRegExp tends to use error
> return codes (like BOOLeans). So if your programming style tends to prefer
> error codes over exceptions, you may want to choose CAtlRegExp.

CAtlRegExp uses a limited, non-standard syntax while both boost and the TR1
regular expression libraries support a variety of standard regular
expression syntaxes. Personally, I don't think I will ever use the ATL
regular expression library again because of its limitations.
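
As an illustration of the "variety of syntaxes" point, here is a small sketch, assuming std::regex (std::tr1::regex in VS2008 SP1). The two function names are made up for the example; the grammar is selected with a flag passed to the regex constructor:

```cpp
#include <regex>
#include <string>

// The same three-digit check written in two of the standard grammars.
bool IsThreeDigitsEcma(const std::string& s)
{
    // ECMAScript is the default grammar, so no flag is needed.
    static const std::regex re("\\d{3}");
    return std::regex_match(s, re);
}

bool IsThreeDigitsPosix(const std::string& s)
{
    // POSIX extended syntax uses a bracketed character class for digits.
    static const std::regex re("[[:digit:]]{3}", std::regex::extended);
    return std::regex_match(s, re);
}
```

Both functions accept "123" and reject "12a"; only the pattern dialect differs.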

-Pete


Cameron_C

Jun 22, 2010, 1:33:28 PM

Are there any performance differences among the rainbow of choices?
Or all about the same?
I am guessing that the Feature Pack implementation would be the best choice
of direction overall, since it is incorporated into the MFC framework?

"Pete Delgado" wrote:

> .
>

Pete Delgado

Jun 22, 2010, 3:59:07 PM

"Cameron_C" <Came...@discussions.microsoft.com> wrote in message
news:1C7F2250-657A-429E...@microsoft.com...

>
> Are there any performance differences among the rainbow of choices?
> Or all about the same?
> I am guessing that the Feature Pack implementation would be the best
> choice
> of direction overall, since it is incorporated into the MFC framework?
>

I've never run a performance test on any of the packages since for my needs,
any of the packages would have been "fast enough". Unless you are doing some
heavy text processing as the primary function of your application, it will
likely be the same for you!

The "feature pack" implementation was not a part of MFC but rather is an
implementation of the C++ TR1 addition to the ISO 2003 C++ standard. The
two were simply delivered together.

Why not take a look at all three libraries and see which one you find to be
easiest in both syntax and usage? The library that you are comfortable using
is perhaps the best choice with all else being equal!

-Pete


Joseph M. Newcomer

Jun 22, 2010, 5:13:23 PM
See below...

On Tue, 22 Jun 2010 12:26:45 -0400, "Pete Delgado" <Peter....@NoSpam.com> wrote:

>
>"Cameron_C" <Came...@discussions.microsoft.com> wrote in message
>news:54293FD0-0333-40D0...@microsoft.com...
>> Hello again folks,
>> I am working through things in my application, and I would appreciate any
>> recommended approaches to using Regular Expressions in my code.
>> This is an MFC application. I am using Visual Studio 2008 pro.
>> I have read about references to a "boost" library, and I have read
>> references to an ATL regex class, and I have read something about Regular
>> Expressions being included in SP1 of VS2008.
>> I want to use regular expressions to edit telephone numbers, and Postal
>> Codes, and dollar amount fields.
>>
>> Anyway, if anyone has any experience with any of the above, and can offer
>> some advice or recommendation, I would appreciate it.
>>
>
>Any of the above approaches that you have mentioned will work, however as
>with any engineering decision, there are tradeoffs. I would recommend
>against using the ATL regular expression library because it uses a
>non-standard syntax

****
With decades of regular expression tradition, it continues to amaze me that someone could
be STUPID enough to invent a nonstandard syntax and think it could possibly make sense. I
do not use the library for the same reason!
****


>and is fairly limited in its abilities. If you can handle
>the limitations of the library and are willing to work with the peculiar
>regular expression syntax for the library, it will probably work as well.
>
>I personally prefer to use the std::tr1 regular expression library, but I
>have used the boost regular expression library as well. A quick look on the
>net shows more examples of usage for the boost libraries than for the tr1
>libraries so this may be a determining factor for you!

****
There are several variants of the FreeBSD regular expression library out there, as well; I
don't know if boost uses one of them. But I have my own adaptation of the FreeBSD library
and I use it, because it uses the historically established regex syntax that everyone
already knows.
joe
****

Joseph M. Newcomer

Jun 22, 2010, 5:16:30 PM
See below...

On Tue, 22 Jun 2010 10:33:28 -0700, Cameron_C <Came...@discussions.microsoft.com> wrote:

>
>Are there any performance differences among the rainbow of choices?

****
Why do you think it matters? How many millions of matches are you going to need to make
for each regexp? (Note: if you cannot express it in terms of integer multiples of
millions, the performance probably won't matter)
****


>Or all about the same?
>I am guessing that the Feature Pack implementation would be the best choice
>of direction overall, since it is incorporated into the MFC framework?

****
Avoid anything nonstandard. So the TR1 design (which probably involved intelligent,
thinking human beings) should be a reasonable choice.
joe
****

Giovanni Dicanio

Jun 22, 2010, 5:44:34 PM
On 22/06/2010 23:16, Joseph M. Newcomer wrote:

> Avoid anything nonstandard. So the TR1 design (which probably involved intelligent,
> thinking human beings) should be a reasonable choice.

IMHO, if the OP doesn't have exception-safe code, he may consider the
ATL engine, which uses error codes instead of C++ exceptions to signal
error condition (the TR1 engine uses exceptions instead and is highly
integrated with STL).

I think the question is not what is best or worst in absolute terms, but
what is best or worst in the context of the OP's existing code.

I agree with Joe about the performance stuff (probably, considering that
the ATL engine is more lightweight than the TR1, it could be more
efficient... but as already noted, does it really make a difference in
working code? How many matches/second are requested?? :)

Giovanni

Cameron_C

Jun 23, 2010, 12:30:40 PM

Everyone, thanks for the feedback.
What I really "needed" to hear was that performance should only be factored
in, if I am considering a heavy weight implementation of the Regex facilities.
To clarify things a bit, I only want to edit a few input fields in a
dialog. Maybe fifteen or twenty in a window.
I want to ensure proper formatting of telephone numbers (999-999-9999),
Canadian Postal Codes (A9A 9A9), and dollar amount fields ($99999.99). If the
fields do not appear to be formatted correctly, I pop up an error message.
I realize this is small stuff in the overall scheme of things.
Maybe I am being lazy here. It seemed to me to be a perfect fit for the
Regex functionality.
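
For the three formats described above, the patterns might look like the following sketch. It assumes std::regex (std::tr1::regex under VS2008 SP1), uppercase letters in postal codes, and that "$99999.99" means one to five digits before the decimal point; the function names are hypothetical.

```cpp
#include <regex>
#include <string>

// Telephone number in the form 999-999-9999.
bool IsPhone(const std::string& s)
{
    static const std::regex re("\\d{3}-\\d{3}-\\d{4}");
    return std::regex_match(s, re);   // regex_match requires the whole string to match
}

// Canadian postal code in the form A9A 9A9 (uppercase letters assumed).
bool IsPostalCode(const std::string& s)
{
    static const std::regex re("[A-Z]\\d[A-Z] \\d[A-Z]\\d");
    return std::regex_match(s, re);
}

// Dollar amount: '$', one to five digits, '.', exactly two digits (assumed limits).
bool IsDollarAmount(const std::string& s)
{
    static const std::regex re("\\$\\d{1,5}\\.\\d{2}");
    return std::regex_match(s, re);
}
```

Each regex is built once as a function-local static, so a dialog with fifteen or twenty fields pays the compilation cost only on first use.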

Pete Delgado

Jun 23, 2010, 12:55:32 PM

"Cameron_C" <Came...@discussions.microsoft.com> wrote in message
news:671FE8FD-E3B0-492F...@microsoft.com...

> Everyone, thanks for the feedback.
> What I really "needed" to hear was that performance should only be
> factored
> in, if I am considering a heavy weight implementation of the Regex
> facilities.
> To clarify things a bit, I only want to edit a few input fields in a
> dialog. Maybe fifteen or twenty in a window.
> I want to ensure proper formatting of telephone numbers (999-999-9999),
> Canadian Postal Codes (A9A 9A9), and dollar amount fields ($99999.99). If
> the
> fields do not appear to be formatted correctly, I pop up an error message.
> I realize this is small stuff in the overall scheme of things.
> Maybe I am being lazy here. It seemed to me to be a perfect fit for the
> Regex functionality.

That is a perfectly legitimate use of regular expressions. I think what most
people were simply saying was that you should not let perceived performance
guide your decision on *which* library to use because it was unlikely that
you were using the library heavily enough to make performance an issue. Your
problem description shows that this was indeed the case.

If your purpose is to validate input fields on a form, assuming that you are
taking into account I18N issues, perhaps custom controls for the fields would
be appropriate. If you perform validation at the control level, you can
design the controls in such a way as to eliminate error message popups
because the controls will only accept valid inputs.

Joe has an example of a validating edit control on his site and there are
many others out there as well. Here's a link to Joe's example and one that
is on MSDN.

http://www.flounder.com/validating_edit_control.htm
http://msdn.microsoft.com/en-us/magazine/cc300635.aspx

-Pete


Joseph M. Newcomer

Jun 23, 2010, 2:15:01 PM
See my Validating Edit Control on my MVP Tips site.

I consider the "pop up a dialog box if the field is invalid" to be one of the worst human
interfaces ever conceived. I believe in "real-time" validation, and the OK button is not
even *enabled* if every control doesn't have valid data in it.

I first developed the edit control for a client who had a highly-stylized part number,
sort of along the lines of ABC-1234X where there were three letters, a hyphen, 3 or 4
digits, and a possible letter following (which turned out to be a color designator).

When I delivered it, a number of data entry people complained about the changing colors,
and asked that they be removed. So I did. And all the *rest* of the data entry people
complained that this incredibly useful feature had been removed! So we made it a
user-selectable option.

What I did was do validation every time I got an EN_CHANGE notification, by parsing the
text. If it parsed as valid, the background went (very light) green. If it was
syntactically illegal, the background went (very light) red. If the input was incomplete,
the background went (very light) yellow.
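
A hand-written parser with three outcomes could look like the sketch below. This is not the code from the Validating Edit Control article; Validity and CheckPartNumber are hypothetical names, and the format is taken from the part-number description above (three letters, a hyphen, three or four digits, an optional trailing letter).

```cpp
#include <cctype>
#include <string>

enum class Validity { Valid, Incomplete, Invalid };

// Classifies the current text of the field. Running out of input before the
// mandatory part is finished is Incomplete; a wrong character is Invalid.
Validity CheckPartNumber(const std::string& text)
{
    std::size_t i = 0;
    const std::size_t n = text.size();

    // Three leading letters.
    for (int k = 0; k < 3; ++k, ++i) {
        if (i == n) return Validity::Incomplete;
        if (!std::isalpha(static_cast<unsigned char>(text[i]))) return Validity::Invalid;
    }

    // The hyphen.
    if (i == n) return Validity::Incomplete;
    if (text[i] != '-') return Validity::Invalid;
    ++i;

    // Three mandatory digits.
    for (int k = 0; k < 3; ++k, ++i) {
        if (i == n) return Validity::Incomplete;
        if (!std::isdigit(static_cast<unsigned char>(text[i]))) return Validity::Invalid;
    }

    // Optional fourth digit, then optional trailing letter.
    if (i < n && std::isdigit(static_cast<unsigned char>(text[i]))) ++i;
    if (i < n && std::isalpha(static_cast<unsigned char>(text[i]))) ++i;

    // Anything left over means the text does not fit the format.
    return i == n ? Validity::Valid : Validity::Invalid;
}
```

In an EN_CHANGE handler the three results would map to the green, yellow, and red backgrounds described above.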

If data is invalid and the OK button is enabled, I consider that the dialog is defective
in its basic design.

I've used a number of techniques to handle validation. In addition to the validating edit
control changing colors, I've used tooltips over the control to tell why ("incomplete part
number" and several variants of this; "invalid part number" complete with an explanation),
using a CStatic which is normally invisible but if there is an error displays the error
text with yellow letters on a red background, and having a tooltip display if you hover
over the (disabled) OK button. In one complex dialog, a listbox would appear listing
every error in every control.

Never do validation in the OnKillFocus handler. This is bad taste, and will lead to a
real nightmare. For example, if you try to pop up a dialog box in an OnKillFocus handler,
you are in very deep trouble (it used to just lock up the entire app! I don't know if
this has been fixed).

Dialog boxes to report errors are SO 1980s! Users won't tell you this unless you ask
them, but they find them offensive.

Note that a regexp can give you a "correct" or "not correct" result, but not an
"incomplete" result. Fortunately, the cases you are describing are truly trivial variants
of my validating edit control, and what I would do is rip out the floating-point
validation code from my example and use a virtual method, then subclass the dialogs for
"SSN", "Dollars", and so on, and put the validating code in the overridden virtual method
of each subclass.
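
The virtual-method arrangement might be sketched as follows. This is only an outline under stated assumptions: ValidatingField stands in for the edit control (a real one would derive from CEdit and call the check from its EN_CHANGE handling), all names are hypothetical, and the five-digit limit on dollars is assumed from the "$99999.99" example earlier in the thread.

```cpp
#include <cctype>
#include <string>

enum class FieldState { Valid, Incomplete, Invalid };

// Stand-in for the validating control: shared behavior lives here,
// and each field type supplies only its own Check().
class ValidatingField {
public:
    virtual ~ValidatingField() {}
    // Would be invoked whenever the text changes.
    FieldState OnTextChanged(const std::string& text) const { return Check(text); }
protected:
    virtual FieldState Check(const std::string& text) const = 0;
};

// Dollar amounts: '$', one to five digits, '.', exactly two digits.
class DollarField : public ValidatingField {
protected:
    FieldState Check(const std::string& text) const override
    {
        std::size_t i = 0;
        const std::size_t n = text.size();
        if (i == n) return FieldState::Incomplete;
        if (text[i] != '$') return FieldState::Invalid;
        ++i;

        std::size_t dollars = 0;
        while (i < n && std::isdigit(static_cast<unsigned char>(text[i]))) { ++i; ++dollars; }
        if (dollars > 5) return FieldState::Invalid;
        if (i == n) return FieldState::Incomplete;   // still typing the dollars part
        if (dollars == 0 || text[i] != '.') return FieldState::Invalid;
        ++i;

        std::size_t cents = 0;
        while (i < n && std::isdigit(static_cast<unsigned char>(text[i]))) { ++i; ++cents; }
        if (i != n || cents > 2) return FieldState::Invalid;
        return cents == 2 ? FieldState::Valid : FieldState::Incomplete;
    }
};
```

A telephone or postal-code field would be another small subclass overriding Check() in the same way.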

Alternatively, you can use a "masked edit control" from a third-party vendor; in these
controls, if you type the first three digits of an SSN, it inserts the hyphen for you.
Don't Try This At Home. They are not easy to write (I've done a couple, and consider them
not worth the effort to re-create from scratch, when you can buy them cheaper).

Given how trivial the validation code is, I'd consider a regexp to be technological
overkill and also inadequate, because it can't recognize partially-correct input.
joe

Joseph M. Newcomer

Jun 23, 2010, 2:17:09 PM
But it is so trivial to write an FSM parser; for the cases cited, I can write an FSM
parser as fast as I can type. A regexp is technological overkill, particularly because
the flexibility of a programmable pattern is not required!
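
For fixed-width fields such as the telephone number and postal code, one possible shape for such a parser is a linear state machine whose state is simply the position in a mask. The sketch below is an illustration only; the mask convention ('9' for a digit, 'A' for an uppercase letter, anything else a literal) and the names are assumptions made for the example.

```cpp
#include <cctype>
#include <string>

enum class Result { Valid, Incomplete, Invalid };

// Each character advances the machine one state; the mask says which
// class of character is accepted in that state.
Result CheckAgainstMask(const std::string& text, const std::string& mask)
{
    if (text.size() > mask.size()) return Result::Invalid;

    for (std::size_t state = 0; state < text.size(); ++state) {
        const unsigned char c = static_cast<unsigned char>(text[state]);
        bool ok;
        switch (mask[state]) {
        case '9': ok = std::isdigit(c) != 0; break;            // digit expected
        case 'A': ok = std::isupper(c) != 0; break;            // uppercase letter expected
        default:  ok = (text[state] == mask[state]); break;    // literal such as '-' or ' '
        }
        if (!ok) return Result::Invalid;
    }

    // Every character so far was acceptable; Valid only if the mask is used up.
    return text.size() == mask.size() ? Result::Valid : Result::Incomplete;
}
```

Called as CheckAgainstMask(text, "999-999-9999") or CheckAgainstMask(text, "A9A 9A9"), it also reports the "incomplete" case while the user is still typing.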
joe

Pete Delgado

Jun 23, 2010, 3:25:22 PM

"Joseph M. Newcomer" <newc...@flounder.com> wrote in message
news:cqj426tf8a7iiggmr...@4ax.com...

> But it is so trivial to write an FSM parser; for the cases cited, I can
> write an FSM
> parser as fast as I can type. A regexp is technological overkill,
> particularly because
> the flexibility of a programmable pattern is not required!
> joe

Joe,
If you take into account internationalization and the differences in format
pattern required for the fields cited, the FSM approach becomes more complex
than using one of the regular expression libraries. Of course, I'm assuming
that the author is familiar with writing validating regular expressions
because it is just as easy to write a buggy expression as it is to write
buggy code!

However, if this particular piece of code will *never* be used anywhere else
except where the author intends, then I see the wisdom in your approach and
completely agree with your conclusion that a regexp is overkill and cannot
handle incomplete data cases while your method can.

-Pete


Joseph M. Newcomer

Jun 23, 2010, 3:33:14 PM
See below...

On Wed, 23 Jun 2010 15:25:22 -0400, "Pete Delgado" <Peter....@NoSpam.com> wrote:

>
>"Joseph M. Newcomer" <newc...@flounder.com> wrote in message
>news:cqj426tf8a7iiggmr...@4ax.com...
>> But it is so trivial to write an FSM parser; for the cases cited, I can
>> write an FSM
>> parser as fast as I can type. A regexp is technological overkill,
>> particularly because
>> the flexibility of a programmable pattern is not required!
>> joe
>
>Joe,
>If you take into account internationalization and the differences in format
>pattern required for the fields cited, the FSM approach becomes more complex
>than using one of the regular expression libraries. Of course, I'm assuming
>that the author is familiar with writing validating regular expressions
>because it is just as easy to write a buggy expression as it is to write
>buggy code!

****
And that means someone has to translate the date-time format of the default user locale to
a regexp...
joe
****


>
>However, if this particular piece of code will *never* be used anywhere else
>except where the author intends, then I see the wisdom in your approach and
>completely agree with your conclusion that a regexp is overkill and cannot
>handle incomplete data cases while your method can.
>
>-Pete
>

Jeff Flinn

Jun 24, 2010, 8:00:36 AM

Joseph M. Newcomer wrote:
> See below...
> On Tue, 22 Jun 2010 10:33:28 -0700, Cameron_C <Came...@discussions.microsoft.com> wrote:
>
>> Are there any performance differences among the rainbow of choices?
> ****
> Why do you think it matters? How many millions of matches are you going to need to make
> for each regexp? (Note: if you cannot express it in terms of integer multiples of
> millions, the performance probably won't matter)
> ****
>> Or all about the same?
>> I am guessing that the Feature Pack implementation would be the best choice
>> of direction overall, since it is incorporated into the MFC framework?
> ****
> Avoid anything nonstandard. So the TR1 design (which probably involved intelligent,
> thinking human beings) should be a reasonable choice.

The C++ TR1 design was driven by John Maddock's boost RegEx lib; in fact
there is a TR1 lib in boost as well that contains it. Yes, he's an
intelligent thinking human. Each compiler manufacturer provides its own
implementation, so I'm not sure who developed MS's implementation.

Boost also has the Xpressive lib developed by Eric Niebler who developed
Greta, a regex engine, while at Microsoft. Xpressive has both dynamic
and static regex engines. The latter is implemented as a Domain Specific
Embedded Language. This allows you to have your regular expression
evaluated at compile time.

There was a recent thread on the boost developer mailing list concerning
regex performance. They mentioned iregexp (sp?) from google as a top
performer, along with xpressive and regex.

Also the OP might look to see if the boost spirit parser library might
also address his needs.

Jeff
