
regexp problem


arnon...@gmail.com

Dec 13, 2005, 6:26:15 AM
Hi

I am trying to parse a Verilog (hardware description language) file
using Tcl, and I have a problem with a non-greedy regexp.

Here's the string I am looking in:

set a {module a
...
inst1 abc
...
endmodule


module b
...
inst2 xyz
...
endmodule
}


I would like to find the name of the module containing an instance of
type "inst2" called "xyz". For that I use the following non-greedy regexp
(in this example I am trying to get module b as the answer):

regexp "module.*?inst2 xyz" $a match

What I get in $match is the following:


module a
...
inst1 abc
...
endmodule


module b
...
inst2 xyz

Even though I used a non-greedy quantifier, I got what looks like a greedy
match: instead of starting at "module b", the match starts at "module a".
I have had success with non-greedy regexps before, but this one simply
doesn't work. I wonder what I am doing wrong.

arnon

Arjen Markus

Dec 13, 2005, 8:31:17 AM
I think you should not use a regexp for this.

Why not:

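set module_name ""
foreach line [split $a "\n"] {
    # Remember the most recent module header seen so far.
    if { [lindex $line 0] eq "module" } {
        set module_name [lindex $line 1]
    }
    # Stop at the instance; module_name is then its enclosing module.
    if { [string trim $line] eq "inst2 xyz" } {
        break
    }
}
puts "Module: $module_name"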

It may not be as compact as a regexp, but IMHO it is clearer.
Regards,

Arjen

arnon...@gmail.com

Dec 13, 2005, 9:12:34 AM
Thanks, but the example I gave above is quite simple. In real life I
would have files with thousands of lines. The solution you gave would
make things quite slow, as the regexp engine is faster. Is it possible
that there is still a bug in the Tcl regexp engine?

arnon

suchenwi

Dec 13, 2005, 9:39:12 AM
Even though you use a non-greedy regexp, the "first" part of the
(greedy) ditty "first, then longest" still seems to apply. I tried this
simpler example:

% regexp -inline {a.*?b} acabad
acab

"ab" alone would have been a match too, but "acab" is encountered
first...
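
A small sketch of that rule, using only the string from the example above. The match always begins at the leftmost position where any match is possible, and the non-greedy quantifier only decides how far it extends from there. The negated-class pattern at the end is an added illustration, not something from the original question, and the variable names are hypothetical.

```tcl
set s acabad

# Non-greedy still starts at the leftmost "a" that can begin a match.
set lazy [regexp -inline {a.*?b} $s]
puts $lazy                                  ;# acab

# -indices shows where the match begins: character 0, not character 2.
puts [regexp -inline -indices {a.*?b} $s]   ;# {0 3}

# Excluding "a" from the middle forces the start forward to the last "a".
set tight [regexp -inline {a[^a]*b} $s]
puts $tight                                 ;# ab
```

The negated class only works when the start token is a single character, so it does not carry over directly to a multi-character token such as "module".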

arnon...@gmail.com

Dec 13, 2005, 9:48:33 AM
OK. Now that I know what's wrong, is there a way to bypass it using
regexp?

Bruce Hartweg

Dec 13, 2005, 2:05:23 PM

arnon...@gmail.com wrote:
> OK. Now that I know what's wrong, is there a way to bypass it using
> regexp?
>

Yeah, add .* at the beginning to eat up the extra stuff before the section you care about.
(This assumes there is only one section you care about. If you want to match multiple ones,
then you need to adjust your RE and use the -inline and -all options to loop through the matches.)

Bruce


arnon...@gmail.com

Dec 13, 2005, 2:52:12 PM
Thanks Bruce, but I don't think I follow you. Can you please be more
specific, and give the exact regexp?

arnon

Donald Arseneau

Dec 14, 2005, 1:26:00 AM
arnon...@gmail.com writes:

> Thanks Bruce, but I don't think I follow you. Can you please be more
> specific, and give the exact regexp?

Hey that's good Bruce!

For the exact regexp, instead of

regexp "module.*?inst2 xyz" $a match

use

regexp ".*(module.*?inst2 xyz)" $a whole match
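
A quick check of this pattern against the sample string from the original post. This is a minimal sketch that writes the pattern in braces, which is a stylistic assumption: the double-quoted form behaves the same here because the pattern contains no backslashes, brackets, or dollar signs.

```tcl
set a {module a
...
inst1 abc
...
endmodule


module b
...
inst2 xyz
...
endmodule
}

# The leading .* swallows everything up to the last "module" that can still
# be followed by "inst2 xyz"; the parentheses capture only the wanted part.
if {[regexp {.*(module.*?inst2 xyz)} $a whole match]} {
    puts $match
}
```

On this sample, $match should hold the text from "module b" through "inst2 xyz", while $whole still starts at "module a".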

--
Donald Arseneau as...@triumf.ca

Bruce Hartweg

Dec 14, 2005, 1:57:47 PM

OK, sorry to be terse,

if there is only one match in the whole thing then

regexp {.*module\s(\S+).*?inst2\sxyz} $input ignoreThis moduleName

will give you the name
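
A quoting detail that matters for any pattern containing \s or \S: inside double quotes Tcl performs backslash substitution before regexp ever sees the pattern, so \s arrives as a plain "s", whereas braces pass the pattern through unchanged. A minimal sketch; the variable names `quoted` and `braced` and the sample string "module b" are illustrative assumptions.

```tcl
# Double quotes: Tcl drops the backslash from \s and \S.
set quoted "module\s(\S+)"
# Braces: the pattern reaches the regexp engine unchanged.
set braced {module\s(\S+)}

puts $quoted    ;# modules(S+)
puts $braced    ;# module\s(\S+)

# With braces the capture group picks up the module name.
regexp $braced "module b" -> moduleName
puts $moduleName    ;# b
```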

and if there are multiple matches, then I would use

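set names {}
# Braces keep \s and \S intact; every quantifier is non-greedy because Tcl
# takes the preference of the whole RE from its first quantifier.
foreach {block name} [regexp -all -inline {\mmodule\s+?(\S+?)\s.*?\mendmodule\M} $input] {
    # Keep the module only if its body contains the instance.
    if {[string first "inst2 xyz" $block] >= 0} {
        lappend names $name
    }
}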

to get the list of names.

Bruce
