dear perl users: I want to write a function that extracts "ordinary"
subroutines from perl code. (an equivalent task is extracting all
macros from a latex file.) I am not trying to be too clever. let's
presume I can recognize subs because subs and only subs always start at
the first character of a line and are not anonymous. a sub is followed
by a name and can contain nested expressions.
I can do plain pattern matching to find the first occurance of the
first sub: '^sub \w+'. but now I am stuck. I need to continue
on with a Text::Balanced expression right after, and after the
text::balanced is done, continue on with my regex search (\G).
my $text=
"
{ text }
expressions
sub a {
if (1==1) { print "true"; } # if string or comment could contain
unbalanced paren, even better
}
more expressions
sub b {
if (0) { print "false" }
}
";
and I want to call a subroutine getnextsub() that first will return
"sub a {\n if (1==1) { print \"true\"; } # if string or
comment could contain unbalanced paren, even better\n }"
and on the next call will return
"sub b {\n if (0) { print \"false\" }\n }";
this must be a fairly common problem (e.g., extracting tex macro
arguments, etc.), but short of mimicking C in writing very low-level
paren counter subroutines on individual characters, I cannot see how to
solve this. I do understand the issue of how to treat nested subs, but
this right now is only a secondary concern.
help appreciated.
sincerely, /iaw
> I want to write a function that extracts "ordinary"
> subroutines from perl code.
Search CPAN on 'balanced' or on 'parse'.
See also
http://search.cpan.org/search?module=PPI
--
Affijn, Ruud
"Gewoon is een tijger."
Here is one way you may use(the iterator way from the book HOP):
#########################
use strict;
use warnings;
my $text= <<'END_TEST';
{ text }
expressions
sub a {
if (1==1) { print "true"; } # if string or comment could contain
unbalanced paren, even better
}
more expressions
sub b {
if (0) { print "false" }
}
safdsf
END_TEST
local our $n;
# pattern to track embedded braces
my $pattern = qr/
(?> (?{$n = 0})
(?:
[^{}]
|
\{ (?{$n++})
|
\} (?(?{$n != 0}) (?{$n--}) | (?!) )
)*
)(?(?{$n != 0})(?!))
/x;
# set the iterator
my $it = getnextsub($text);
my $count = 0;
# loop through the text and print out all functions
while (my $next_sub = $it->()) {
print "Subroutine-", ++$count, " is:\n${next_sub}\n\n";
}
# subroutine to set the iteratior
sub getnextsub {
my $text = shift;
return sub {
my $sub_def;
if ($text =~ s/(sub\s*\S+\s*{$pattern})//) {
$sub_def = $1;
}
$sub_def;
}
}
####################################
BTW. you can make the subroutine "getnextsub" to skip any number of
function definitions.
#################
sub getnextsub {
my $text = shift;
return sub {
my $num_subs = shift || 1;
my ($sub_def, $cnt);
while ($text =~ s/(sub\s*\S+\s*{$pattern})//) {
if (++$cnt == $num_subs) {
$sub_def = $1;
last;
}
}
return $sub_def || "undefined\n";
}
}
################
# if you have subroutine definitions a, b, c, d, e, f
# and in that order, then
$it = getnextsub($text); # set the iterator
$next_sub = $it->(); #get sub a {...}
$next_sub = $it->(3); #get sub d {...}
$next_sub = $it->(); #get sub e {...}
$next_sub = $it->(2); #return "undefined"
$it = getnextsub($text); # reset the iterator
Xicheng :-)
=> > if ($text =~ s/(sub\s*\S+\s*{$pattern})//) {
should change from \s* to \s+, and \w+ is enough to replace \S+
if ($text =~ s/(sub\s+\w+\s*{$pattern})//) {
Xicheng
if ($text =~ s/.*?(sub\s+\w+\s*{$pattern})//) {
Xicheng
regards,
/iaw
You mentioned Text::Balanced; how does extract_codeblock not do what you
want?
Ben
--
The cosmos, at best, is like a rubbish heap scattered at random.
Heraclitus
benm...@tiscali.co.uk
sure you can. the key for this method (you might want to read [1] for
more introduction about iteration) is how to use "closure" in Perl
subroutines. I revised the previous subroutine again and fixed some
bugs. :
1) 's' modifier is added in the s/// expression, otherwise .*? can not
match multiple lines;
2) capture two parts: $1, and $2, and use something like $1 =~ tr/\n//;
to count the number of newlines in a substring.
3) "return" statement is revised so that you can use the iterator in a
while loop;
4) two variables added: $line_num to count newlines in the whole
matched text block. $lineno is the line_number containing the keyword
'sub' of your function declaration...
###################################
sub getnextsub {
my $text = shift;
my $line_num = 0;
return sub {
my $num_subs = shift || 1;
my ($sub_def, $cnt) = ("", 0);
while ($text =~ s/(.*?(sub\s*\S+\s*{$pattern}))//s) {
$line_num += ($1 =~ tr/\n//);
if (++$cnt == $num_subs) {
$sub_def = $2;
my $lineno = $line_num + 1 - ($sub_def =~ tr/\n//);
print "line_number is $lineno\n";
last;
}
}
print "undefined\n" if not $sub_def;
return $sub_def;
}
} # end of getnextsub #
###################################
don't know where you want the line numbers to go, so just print them
out.
Good luck
Xicheng
[1] "Higher-Order Perl: Transforming Programs with Programs", by M.J.
Dominus.