It's easy to add 'invalid' code to .pmc files. E.g. I had defined:
METHOD parent() {
return PMC_pmc_val(SELF) ? PMC_pmc_val(SELF) : PMCNULL;
}
Because the return type is missing, the PMC compiler just ignores this
'method' without further notice.
The same happens if there's just whitespace before the '*':
METHOD PMC *parent() {
return PMC_pmc_val(SELF) ? PMC_pmc_val(SELF) : PMCNULL;
}
This totally valid C type declaration is just ignored.
Fixes welcome,
leo
I had a look at this, but I'm not that good at Perl, and regular
expressions. However, I found where things go wrong, so someone who
really groks REs may fix it.
The problem is (well, at least I think it is) at about line 440 in pmc2c.pl:
sub parse_pmc {
my $code = shift;
my $signature_re = qr{
^
(?: #blank spaces and comments and spurious semicolons
[;\n\s]*
(?:/\*.*?\*/)? # C-like comments
)*
(METHOD\s+)? #method flag
(\w+\**) #type <<<========== I'd say this should be (\w+\s*\**),
# so it matches a word (the return type), optional spaces, and then
# optional *'s to indicate a pointer
\s+
(\w+) #method name
\s*
\( ([^\(]*) \) #parameters
}sx;
If the fix as I noted above is done, things don't compile anymore.
I'm sorry I can't provide a real fix, but at least it's easier to fix
now, hopefully.
A more kludgy fix may be to check whether $type equals "METHOD", if so,
then there is something wrong. And it may be that not everything is
handled by this.
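A minimal sketch of that check, run against the regex exactly as quoted above. The helper name try_signature and the three sample inputs are made up for illustration; this is not code from pmc2c.pl itself.

```perl
use strict;
use warnings;

# The signature regex as quoted above, unchanged.
my $signature_re = qr{
    ^
    (?:                     # blank space, comments, spurious semicolons
        [;\n\s]*
        (?:/\*.*?\*/)?      # C-like comments
    )*
    (METHOD\s+)?            # method flag
    (\w+\**)                # type
    \s+
    (\w+)                   # method name
    \s*
    \( ([^\(]*) \)          # parameters
}sx;

# Hypothetical helper: returns (flag, type, name), or an empty list
# when the regex does not match at all.
sub try_signature {
    my ($code) = @_;
    my ($flag, $type, $name) = $code =~ $signature_re
        or return;
    return ($flag, $type, $name);
}

for my $src ('METHOD PMC* parent() {',
             'METHOD parent() {',
             'METHOD PMC *parent() {') {
    my ($flag, $type, $name) = try_signature($src);
    if (!defined $type) {
        print "no match:   $src\n";
    }
    elsif ($type eq 'METHOD') {
        # The kludge check: the keyword itself was captured as the type.
        print "suspicious: $src (type captured as 'METHOD')\n";
    }
    else {
        print "ok:         $src (type=$type, name=$name)\n";
    }
}
```

With this regex the first input parses normally and the second matches with 'METHOD' captured as the type, so the check catches it. The third input does not match at all, so a check on $type never gets to see it.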
kind regards,
klaas-jan
> I had a look at this, but I'm not that good at Perl, and regular
> expressions. However, I found where things go wrong, so someone who
> really groks REs may fix it.
I'm no Abigail, :-) but I'll try to help.
> The problem is (well, at least I think it is) at about line 440 in
> pmc2c.pl
>
> sub parse_pmc {
> my $code = shift;
>
> my $signature_re = qr{
> ^
> (?: #blank spaces and comments and spurious semicolons
> [;\n\s]*
> (?:/\*.*?\*/)? # C-like comments
> )*
You're asking for multiple instances of something that could be
empty. I don't know if this is problematic, but I suspect it might
cause unnecessary backtracking. I would write:
(?: [;\n\s] | (?:/\*.*?\*/) )*
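A small sketch comparing the two prefix patterns on one sample string. The variable names, the helper skipped_length, and the sample text are assumptions for illustration. It only checks that both forms consume the same leading text on this input, and says nothing about how much backtracking either one does.

```perl
use strict;
use warnings;

# Prefix as in the quoted regex: a star around a group that can be empty.
my $nested = qr{ ^ (?: [;\n\s]* (?:/\*.*?\*/)? )* }sx;

# Flattened form: each iteration consumes one character or one comment.
my $flat   = qr{ ^ (?: [;\n\s] | /\*.*?\*/ )* }sx;

# Hypothetical helper: number of leading characters the pattern consumes.
# $+[0] is the end offset of the last successful match.
sub skipped_length {
    my ($re, $text) = @_;
    return $text =~ $re ? $+[0] : -1;
}

my $text = "  /* a */ ; /* b */\nMETHOD void foo()";

for my $pair ([nested => $nested], [flat => $flat]) {
    my ($label, $re) = @$pair;
    my $n = skipped_length($re, $text);
    printf "%-6s skips %d chars, rest starts with '%s'\n",
        $label, $n, substr($text, $n, 6);
}
```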
> (METHOD\s+)? #method flag
>
> (\w+\**) #type <<<========== I'd say this should be (\w+\s*\**),
> # so it matches a word (the return type), optional spaces, and then
> # optional *'s to indicate a pointer
> \s+
> (\w+) #method name
In the case where there are no '*'s in the text, the pattern '\s*'
eats up all the whitespace, so the following '\s+' has nothing left to
match. That said, I don't understand why backtracking wouldn't kick in
and make things match up, albeit inefficiently.
Try writing: ( \w+ (?: \s* \*+ )? )
(Some word characters optionally followed by any whitespace and some
'*'.)
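To compare the type patterns discussed so far, here is a sketch that plugs each one into a cut-down version of the signature regex (without the comment-skipping prefix). The helpers make_re and type_for and the 'relaxed' variant are my own assumptions, not anything from pmc2c.pl. The 'relaxed' form drops the requirement for whitespace after the type when the type ends in '*', and has not been tried against the real .pmc files.

```perl
use strict;
use warnings;

# Hypothetical helper: wraps a "type + separator" fragment in a cut-down
# signature regex. Captures: $1 = METHOD flag, $2 = type, $3 = name.
sub make_re {
    my ($type_and_sep) = @_;
    return qr{ ^ (METHOD\s+)? $type_and_sep (\w+) \s* \( ([^\(]*) \) }sx;
}

my %variant = (
    original  => make_re(qr{ (\w+\**)           \s+ }x),
    klaas_jan => make_re(qr{ (\w+\s*\**)        \s+ }x),
    josh      => make_re(qr{ (\w+ (?:\s*\*+)? ) \s+ }x),
    # Assumption: whitespace is optional right after a '*',
    # and still required after a plain word.
    relaxed   => make_re(qr{ (\w+ (?:\s*\*+)? ) (?: (?<=\*) \s* | \s+ ) }x),
);

# Hypothetical helper: the captured type, or undef if there is no match.
sub type_for {
    my ($which, $code) = @_;
    return $code =~ $variant{$which} ? $2 : undef;
}

for my $src ('METHOD PMC* parent() {', 'METHOD PMC *parent() {') {
    for my $which (qw(original klaas_jan josh relaxed)) {
        my $type = type_for($which, $src);
        printf "%-10s %-24s %s\n", $which, $src,
            defined $type ? "type='$type'" : 'no match';
    }
}
```

In this cut-down form, all four variants accept 'PMC* parent', but only 'relaxed' accepts 'PMC *parent'. The other three still demand at least one whitespace character between the type and the name, and there is none after the '*'.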
> \s*
> \( ([^\(]*) \) #parameters
> }sx;
>
> If the fix as I noted above is done, things don't compile anymore.
> I'm sorry I can't provide a real fix, but at least it's easier to
> fix now, hopefully.
The real solution would use regular expressions but not rely on them.
I've been reading Higher Order Perl, by Mark Jason Dominus. It has a
chapter on writing parsers which is applicable to this discussion,
and I cannot recommend it highly enough.
Higher Order Perl
http://hop.perl.plover.com/
Josh