Can anyone please explain the following patterns to me in some
detail:
% regsub -all {\d(?=(\d{3})+($|\.))} 1234567.89 {\0,} ==> 1,234,567.89
Or in general:
(?!pattern)
(?=pattern)
(?:pattern)
'\<' and '\>'
-inline
-extended
What is the easiest, quickest and most efficient RE tutorial that I can
study?
Thanks a lot in advance,
Ahmad
The only way to really learn them is to try out smaller sub-
expressions and then paste them together to get longer ones. You also
have to read the documentation...carefully. There is nothing really
intuitive about the syntax, until you learn it.
{\d(?=(\d{3})+($|\.))} 1234567.89 {\0,}
This basically means: look for a decimal digit that is followed (the
lookahead, (?=...)) by one or more groups of three decimal digits and
then either the end of the string ($) or a decimal point (\.). Only
that first digit is part of the match; the lookahead checks what
follows without consuming it. The -all switch repeats the substitution
for every match in the string. So no commas are inserted until you
have four digits.
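To try the "smaller sub-expressions first" approach on this pattern, something like the following could be pasted into tclsh. It is only a sketch: the variable names are made up for illustration, and I have swapped the capturing groups for (?:...) so that -inline lists just the matched digits.

```tcl
set s 1234567.89

# Step 1: on its own, \d matches every digit in the string.
set digits [regexp -all -inline {\d} $s]
puts $digits   ;# 1 2 3 4 5 6 7 8 9

# Step 2: add the lookahead. (?:...) groups without capturing, so -inline
# reports only the matched digits and no sub-matches.
set hits [regexp -all -inline {\d(?=(?:\d{3})+(?:$|\.))} $s]
puts $hits     ;# 1 4

# Step 3: the same pattern in regsub; \0 stands for the matched digit.
set result [regsub -all {\d(?=(?:\d{3})+(?:$|\.))} $s {\0,}]
puts $result   ;# 1,234,567.89
```

Step 2 shows which digits will receive a comma before any substitution is done, which makes it easier to see what the lookahead contributes.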
I need a Tcl-oriented one.
Thanks,
Ahmad
On Jul 28, 2:55 pm, cla...@lairds.us (Cameron Laird) wrote:
I occasionally run into regular expressions that I can't figure out.
In the early days, on unix (1970's), these were not too complicated.
However, today, there are several varieties and extensions along with
a whole maze of special escapes.
So.... given that I do need to deal with these at times, I looked for
a tool that could help, and I found one that is NOT FREE AND ONLY runs
on WINDOWS, but is quite good. I took the above and pasted it into
the tool and it explained each component in English.
It says that the pattern starts with a single digit 0..9 and then uses
positive lookahead, the (?=....), to ensure that this digit is followed
by 1 or more groups of 3 digits ending in either a decimal point or the
end of the string. It then replaces ONLY the part that matched before
the lookahead (e.g. in 1234567 it matches the 1 but requires groups of
3 digits following, in this case the 234 and 567). It replaces the
1 with "1,", and the -all then tells it to continue with the 234567 and
repeat this. The 2 and 3 do not match because they are not followed by
one or more groups of (exactly) 3 digits, and so it moves on until it
gets to the 4, which does match since it is followed by the 567, a
single group of 3 digits.
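The $|\. is an "or" between the end of string and a literal dot. So
the groups of 3 digits must run right up to the decimal point, or up to
the end of the string if no decimal point is found. The 89 in the
example is left alone because neither digit is followed by a full
group of 3.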
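The walk-through above can be replayed digit by digit. This is a rough sketch, not how regsub works internally: it anchors the pattern with ^ and tests it against the rest of the string at each position. The variable names are my own.

```tcl
set s 1234567
set hits {}
for {set i 0} {$i < [string length $s]} {incr i} {
    # Test the pattern against the text starting at position i only.
    set rest [string range $s $i end]
    if {[regexp {^\d(?=(\d{3})+($|\.))} $rest]} {
        lappend hits [string index $s $i]
    }
}
puts $hits   ;# 1 4
```

The 1 qualifies because 234567 is two full groups, and the 4 because 567 is one group; every other digit is followed by a digit count that is not a multiple of 3.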
Actually, I like this regular expression: at a high level, it
takes a decimal number and adds commas in the expected way. It does
change 1234 to 1,234, which often is not done until there are at least
5 digits in a number. But this is a nice way to comma-tize numbers and
is a lot less code than the commas proc I'd been using.
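For what it's worth, wrapping the one-liner in a proc might look like this. The name commify is just a placeholder of mine, not the commas proc mentioned above.

```tcl
# Insert thousands separators into the integer part of a number.
proc commify {num} {
    return [regsub -all {\d(?=(\d{3})+($|\.))} $num {\0,}]
}

puts [commify 1234567.89]   ;# 1,234,567.89
puts [commify 1234]         ;# 1,234
puts [commify 999]          ;# 999
puts [commify 0.12345]      ;# 0.12,345
```

The last call shows a limitation: a fractional part with four or more digits also picks up commas, so it may be worth splitting off the fraction first if such inputs are possible.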
=====================================
WARNING - commercial recommendation follows
=====================================
The tool I use will also let one select portions of the r.e. and right
click and choose help. The documentation is worth the price alone as
it explains in detail with examples what the particular expression
does. For example, the (?= certainly wasn't something I remembered.
It also allowed me to specify that the variety of regular expressions
I was using is Tcl ARE. And it has a template facility that will write
fragments of Tcl code to use in a program.
The tool's name is regexbuddy and while it has no free evaluation, it
has a no questions asked refund within the first 90 days.
I have no affiliation with the author other than I have purchased
several of his programs. Apologies if this forum frowns on suggestions
that lead to commercial products.
'Best I can tell, this forum smiles on working code and
anything that helps Tcl programmers. Thank you for
sharing your experience.
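You can fix that with a small change:
{\d\d(?=(\d{3})+($|\.))}
Note that this consumes two digits per match, so an integer part of
7, 10, ... digits loses its leading comma (1234567 becomes 1234,567).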
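Another way to get the "no comma until five digits" behaviour is to keep the original one-digit pattern and guard it with a digit-count check. This is only a sketch; commify5 is a hypothetical name, and it assumes an unsigned number whose integer part starts the string.

```tcl
# Add commas only when the integer part has five or more digits.
proc commify5 {num} {
    # Guard: fewer than five leading digits means the value is returned as is.
    if {![regexp {^\d{5,}($|\.)} $num]} {
        return $num
    }
    return [regsub -all {\d(?=(\d{3})+($|\.))} $num {\0,}]
}

puts [commify5 1234]         ;# 1234
puts [commify5 12345]        ;# 12,345
puts [commify5 1234567.89]   ;# 1,234,567.89
```

Keeping the guard separate from the substitution means the comma logic itself stays the same as in the original one-liner.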
I've only used a visualizer once, and although they seem useful for
explaining how a regular expression works, they don't explain the ones
that don't work (obviously). But I don't remember if they show the sub-
expression matches, which would help a lot in debugging one that isn't
working as desired. Another problem is matching too much, or failing
in certain cases.
This is why I consider regular expressions as mini-programs which
tell the regular expression engine what to do. As such, it is
important to go out of your way to document how they work, or don't
work, because they also fall into the category of brittle code. I
don't think the documentation should try to explain things like look-
aheads, replacement strings, etc. This is covered in the man pages or
tutorials. In the example case, not much documentation would be
required, maybe name the regular expression "commify_decimal".
One simple documentation technique that I use is to separately define
the "atoms" or sub-expressions and then combine them into one big
expression, something with a more useful variable name than "re". That
technique doesn't really apply as well to [regsub].
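Applied to the example from this thread, the "atoms" technique might look roughly like this. The variable names are my own invention; [format] is used for the assembly so that none of the backslashes or dollar signs need extra quoting.

```tcl
set digit      {\d}         ;# the digit that receives the comma
set triples    {(\d{3})+}   ;# one or more complete groups of three digits
set terminator {($|\.)}     ;# end of string or a literal decimal point

# Assemble the atoms: a digit, then a lookahead for triples and terminator.
set commify_decimal [format {%s(?=%s%s)} $digit $triples $terminator]

puts $commify_decimal
puts [regsub -all $commify_decimal 1234567.89 {\0,}]   ;# 1,234,567.89
```

The first puts prints the assembled pattern, which should be identical to the original one-liner, so it is easy to check that the pieces were combined correctly.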