Can anybody explain these patterns (RegExp) to me?

Ahmad

Jul 27, 2009, 11:26:44 PM

Hi,

Can anyone please explain the following patterns to me in some
detail:

% regsub -all {\d(?=(\d{3})+($|\.))} 1234567.89 {\0,} ==> 1,234,567.89

Or in general:

(?!pattern)

(?=pattern)

(?:pattern)

'\<' and '\>'

-inline

-extended

What is the easiest, quickest, and most efficient RE tutorial that I
can study?

Thanks a lot in advance,
Ahmad

slebetman

Jul 27, 2009, 11:44:58 PM

On Jul 28, 11:26 am, Ahmad <ahmad.abdulgh...@gmail.com> wrote:
> Hi,
>
> Can anyone please explain the following patterns to me in some
> details:
>
> % regsub -all {\d(?=(\d{3})+($|\.))} 1234567.89 {\0,} ==> 1,234,567.89
>
> Or in general:
>
> (?!pattern)
>
> (?=pattern)
>
> (?:pattern)
>
> '\<' and '\>'
>
> -inline
>
> -extended
>

http://www.tcl.tk/man/tcl8.5/TclCmd/re_syntax.htm
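
As a rough illustration of the constructs asked about, here is a minimal
sketch, assuming the Tcl 8.5 ARE syntax described on that page. The sample
strings are made up. Two caveats, as far as I can tell: Tcl AREs spell the
word-boundary escapes \m and \M rather than \< and \>, and the regexp
switch is called -expanded, not -extended.

```tcl
# (?=pattern) positive lookahead: "foo" matches only when "bar" follows,
# and "bar" itself is not part of the match.
puts [regexp -inline {foo(?=bar)} "foobar"]                   ;# foo

# (?!pattern) negative lookahead: "foo" matches only when "bar" does NOT follow.
# -indices reports the character range of the match instead of its text.
puts [regexp -inline -indices {foo(?!bar)} "foobar foobaz"]   ;# {7 9}

# (?:pattern) groups without capturing, so no submatch is reported for it.
puts [regexp -inline {(\d+)-(\d+)} "10-20"]                   ;# 10-20 10 20
puts [regexp -inline {(?:\d+)-(\d+)} "10-20"]                 ;# 10-20 20

# \m and \M match at the start and end of a word.
# -all -inline returns every match as a list instead of setting variables.
puts [regexp -all -inline {\mcat\M} "cat concat cats cat"]    ;# cat cat
```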

Ahmad

Jul 27, 2009, 11:58:24 PM

Thanks for this. I would like to have one with more extensive
explanation/examples.

Thanks,
Ahmad

tom.rmadilo

Jul 28, 2009, 1:37:08 AM

On Jul 27, 8:58 pm, Ahmad <ahmad.abdulgh...@gmail.com> wrote:
> Thanks for this. I would like to have one with more extensive
> explanation/examples.

The only way to really learn them is to try out smaller sub-
expressions and then paste them together to get longer ones. You also
have to read the documentation...carefully. There is nothing really
intuitive about the syntax, until you learn it.

{\d(?=(\d{3})+($|\.))} 1234567.89 {\0,}

This basically means: look for a decimal digit that is followed (the
(?=...) lookahead) by one or more groups of three decimal digits and
then either the end of the string ($) or a decimal point (\.). Only
that first digit is matched and replaced; the lookahead is checked but
not consumed. The -all switch makes regsub replace every such digit
rather than just the first one. So no commas are inserted until you
have four digits.
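
To try this out, a minimal sketch along these lines can be pasted into
tclsh. The input values are just examples chosen here; the pattern is the
one from the question.

```tcl
set re {\d(?=(\d{3})+($|\.))}

# With -all, every digit that satisfies the lookahead gets a comma after it.
foreach n {123 1234 1234567 1234567.89} {
    puts "$n -> [regsub -all $re $n {\0,}]"
}
# 123 -> 123
# 1234 -> 1,234
# 1234567 -> 1,234,567
# 1234567.89 -> 1,234,567.89

# Without -all, only the first matching digit is replaced.
puts [regsub $re 1234567 {\0,}]   ;# 1,234567
```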

Cameron Laird

Jul 28, 2009, 7:55:34 AM

In article <01402718-81a4-4939...@c2g2000yqi.googlegroups.com>,
Ahmad <ahmad.ab...@gmail.com> wrote:
.
.
.

>What is the most easy, quick and efficient RE tutorial that I can
>study?

.
.
.
'Depends on your personal style; 'least, that's the message of
<URL: http://regularexpressions.com/#what_is_is >: do you want
a Tcl-oriented tutorial? One that's Web-based? Emphasizes
abstract theory? All this, and more, are available.

Ahmad

Jul 29, 2009, 5:11:16 AM

> do you want a Tcl-oriented tutorial? One that's Web-based?
> Emphasizes abstract theory? All this, and more, are available.

I need a TCL oriented one.

Thanks,
Ahmad


Cameron Laird

Jul 29, 2009, 8:16:28 AM

In article <00a3630b-21ea-4532...@c29g2000yqd.googlegroups.com>,

Ahmad <ahmad.ab...@gmail.com> wrote:
>> do you want a Tcl-oriented tutorial? One that's Web-based?
>> Emphasizes abstract theory? All this, and more, are available.
>
>I need a TCL oriented one.

.
.
.

>> <URL:http://regularexpressions.com/#what_is_is>: do you want

.
.
.
I think you'll want to read the section already
mentioned above; beyond that, though, Bill Poser's
redet <URL: http://wiki.tcl.tk/14566 > and Doulos'
trev <URL: http://www.doulos.com/knowhow/tcltk/examples/trev/ >
are definite winners. Rx Toolkit <URL:
http://docs.activestate.com/komodo/4.4/regex.html >
has a lot to like, too.

rocket777

Jul 29, 2009, 12:42:24 PM

I occasionally run into regular expressions that I can't figure out.
In the early days on Unix (the 1970s), these were not too complicated.
Today, however, there are several varieties and extensions, along with
a whole maze of special escapes.

So, given that I do need to deal with these at times, I looked for
a tool that could help, and I found one that is NOT FREE and ONLY runs
on WINDOWS, but is quite good. I pasted the expression above into
the tool and it explained each component in English.

It says that you start with a single digit 0..9 and then use positive
lookahead, the (?=...), to ensure that this digit is followed by one
or more groups of 3 digits ending in either a decimal point or the end
of the string. It then replaces ONLY the part that matched before the
lookahead (e.g., in 1234567 it matches the 1 but requires groups of 3
digits following, in this case the 234 and 567). So it replaces the
1 with "1,", and the -all then tells it to continue with the 234567 and
repeat this. The 2 and 3 do not match because they are not followed
by one or more groups of (exactly) 3 digits, and so it moves on until it
gets to the 4, which does match since it is followed by the 567, a
single group of 3 digits.

The $|\. is an "or" between the end of string and a dot. So the groups
of 3 digits must run up to the decimal point, or up to the end of the
string if there is no decimal point. With two fractional digits, as
here, nothing after the decimal point is changed.
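
One way to check this walk-through is to ask regexp which characters
actually match. The sketch below is an illustration, not output from the
tool: it rewrites the groups as non-capturing (?:...) so that -inline lists
only the matched digits, and the second input is a made-up case showing
that a fractional part with four or more digits can also pick up a comma.

```tcl
set re {\d(?=(?:\d{3})+(?:$|\.))}

# Only single digits match; the lookahead text is checked but not consumed.
puts [regexp -all -inline $re 1234567.89]            ;# 1 4
puts [regexp -all -inline -indices $re 1234567.89]   ;# {0 0} {3 3}

# Digits after the decimal point are tested the same way: here the 6 is
# followed by 789 and then the end of the string, so it matches too.
puts [regsub -all $re 1234.56789 {\0,}]              ;# 1,234.56,789
```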

Actually, I like this regular expression. At a high level, it takes a
decimal number and adds commas in the expected way. It does change
1234 to 1,234, which often is not done until there are at least
5 digits in a number. But this is a nice way to comma-tize numbers and
is a lot less code than the commas proc I'd been using.

=====================================
WARNING - commercial recommendation follows
=====================================

The tool I use will also let you select portions of the r.e., right-
click, and choose help. The documentation alone is worth the price, as
it explains in detail, with examples, what the particular expression
does. (The (?= construct, for example, certainly wasn't something I
remembered.) It also allowed me to specify that the variety of regular
expressions I was using is Tcl ARE. And it has a template facility that
will write fragments of Tcl code to use in a program.

The tool's name is RegexBuddy, and while it has no free evaluation, it
has a no-questions-asked refund within the first 90 days.

I have no affiliation with the author other than I have purchased
several of his programs. Apologies if this forum frowns on suggestions
that lead to commercial products.

Cameron Laird

Jul 29, 2009, 2:22:29 PM

In article <c36cb07c-eb60-4ecf...@l5g2000pra.googlegroups.com>,
rocket777 <google...@rocketship1.biz> wrote:
.
.
.

>several of his programs. Apologies if this forum frowns on suggestions
>that lead to commercial products.
>

'Best I can tell, this forum smiles on working code and
anything that helps Tcl programmers. Thank you for
sharing your experience.

tom.rmadilo

Jul 29, 2009, 3:10:47 PM

On Jul 29, 9:42 am, rocket777 <googlegro...@rocketship1.biz> wrote:
> Actually, I like this regular expression, which at a high level, it
> takes a decimal number and adds commas in the expected way. It does
> change 1234 to 1,234 which often is not done until there are at least
> 5 digits in a number.

You can fix that with a small change:

{\d\d(?=(\d{3})+($|\.))}
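
One thing worth checking before relying on the two-digit variant: because
each match now consumes two digits, some lengths seem to lose a comma
(1234567 comes out as 1234,567 when traced by hand). A hedged alternative
is to keep the original pattern and test the length separately. The proc
name commify and its mindigits parameter below are made up for
illustration, and the sketch assumes an unsigned number with an optional
fractional part.

```tcl
# The two-digit variant skips a comma when the digit count is 7, 10, ...
puts [regsub -all {\d\d(?=(\d{3})+($|\.))} 1234567 {\0,}]   ;# 1234,567

proc commify {num {mindigits 5}} {
    # Split into the leading run of digits and whatever follows it.
    if {![regexp {^(\d+)(.*)$} $num -> int rest]} { return $num }
    # Short numbers such as 1234 are returned unchanged.
    if {[string length $int] < $mindigits} { return $num }
    # Apply the original pattern to the integer part only.
    return [regsub -all {\d(?=(\d{3})+$)} $int {\0,}]$rest
}

puts [commify 1234]         ;# 1234
puts [commify 12345]        ;# 12,345
puts [commify 1234567.89]   ;# 1,234,567.89
```

Working on the integer part alone also keeps commas out of a long
fractional part.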

I've only used a visualizer once, and although they seem useful for
explaining how a regular expression works, they don't explain the ones
that don't work (obviously). But I don't remember if they show the sub-
expression matches, which would help a lot in debugging one that isn't
working as desired. Another problem is matching too much, or failing
in certain cases.

This is why I consider regular expressions as mini-programs which
tell the regular expression engine what to do. As such, it is
important to go out of your way to document how they work, or don't
work, because they also fall into the category of brittle code. I
don't think the documentation should try to explain things like look-
aheads, replacement strings, etc. This is covered in the man pages or
tutorials. In the example case, not much documentation would be
required, maybe name the regular expression "commify_decimal".

One simple documentation technique that I use is to define the "atoms"
or sub-expressions separately and then combine them into one big
expression, stored under a more useful variable name than "re". That
technique doesn't really apply as well to [regsub].
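
A minimal sketch of that technique, applied to the comma pattern from this
thread purely as an illustration. The atom variable names are made up
here; commify_decimal is the name suggested above.

```tcl
# Named "atoms", each documented on its own.
set digit      {\d}          ;# a single decimal digit
set triple     {(\d{3})}     ;# one group of exactly three digits
set terminator {($|\.)}      ;# end of string, or a decimal point

# Combined: a digit followed by one or more triples and then a terminator.
set commify_decimal "${digit}(?=${triple}+${terminator})"

puts $commify_decimal                                   ;# \d(?=(\d{3})+($|\.))
puts [regsub -all $commify_decimal 1234567.89 {\0,}]    ;# 1,234,567.89
```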