Anyone who wanted to have a shot at doing any kind of mass-processing
of Perl 5 code to produce docs or pugs AST trees or what have you should
be able to safely build on top of PPI now.
To summarize, "Parsing and Analyzing are getting quite usable. I'll get
back to you later on Manipulating".
Also, I saw another mention recently (possibly on TPF request for
donations) about the Perl 5 to Perl 6 converter, and it being 40%
completed? ... Larry?
Is anybody working on it? If it's built on something other than PPI, is
there anything I can see, so I can steal any parsing tricks I don't know
of yet. :)
Well, by one reckoning it's 0% done. At the moment I'm just working
on a Perl 5 to Perl 5 converter. When I get that right, I can start
on the Perl 6 converter. It's quite likely I've already done 40%
of the work, though.
: Is anybody working on it? If it's built on something other than PPI, is
: there anything I can see, so I can steal any parsing tricks I don't know
: of yet. :)
Er, I'm not sure you will want to--I'm using PPI's evil twin brother,
"PPD" (the actual Perl parser). I've just modified it so it doesn't
forget anything I want it to remember. (As you know, the standard
parser throws away gobs of useful information, everything from
whitespace and comments to pruned opcode subtrees. I have a version
that doesn't do that, by and large, though I'm still finding fiddly
spots.) When it's done parsing, it can spit out all the information
in rough-and-ready XML. From there I've currently got two passes, one
to turn the XML into an AST very much like what PPI uses, and another
to turn the AST into whatever the target language is, Perl 5 for now.
When I can run any (un-source-filtered) Perl 5 program through it and
get the exact same program out, including whitespace and comments,
then I'll know I have a good platform for translation to Perl 6.
Essentially I'll be 80% done at that point, and ready for the next 80%.
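The pipeline described above — a parser that keeps everything, an XML dump, a pass to build an AST, and a pass to emit source, with exact round-tripping as the correctness test — can be sketched roughly like this. This is an illustrative Python toy, not the real C/XML tooling; every name here is made up:

```python
import re

def lex(src):
    """Tokenize while keeping every byte: code tokens AND
    whitespace/comments, so nothing the parser sees is thrown away."""
    token_re = re.compile(r"#[^\n]*|\s+|\w+|.", re.S)
    return token_re.findall(src)

def to_ast(tokens):
    # Pass one: wrap tokens in a trivial tree (the real pass would
    # reattach the trivia to the right nodes).
    return {"type": "program", "children": tokens}

def emit(ast):
    # Pass two: serialize back out in the target language -- here the
    # identity "p5-to-p5" case, the easy target to test for.
    return "".join(ast["children"])

src = "my $x = 1;  # the answer\nprint $x;\n"
assert emit(to_ast(lex(src))) == src  # exact round-trip, comments and all
```

The point of the round-trip test is that any information loss anywhere in the chain shows up immediately as a byte difference.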
Doing a literal translation to Perl 6 won't be terribly hard, except
where we've actually removed things that were in Perl 5, but we can
always emulate any features that are missing, probably pulling them
out of some EVIL::PERL5::EMULATION module to discourage people from
using the emulations in new code.
Source filters are always going to be a problem, but if we translate
the underlying standard Perl 5 code (which is what my current setup
will do), it should at least run, if not produce good looking code.
We can look at the existing source filters on a case-by-case basis
and perhaps install recognizers that refactor the ugly code back into
something more like the pre-source-filtered code. Most of the hard
work in the second 80% is going to be in figuring out how (and whether)
to refactor Perl 5 idioms into Perl 6 idioms, particularly for OO stuff.
We can probably arrange not to duplicate the back-end refactoring
work, if we can get our respective ASTs to line up. But I do think
the front end of the "standard" translator must be based on the actual
parser they're using. I'm sure there's plenty of room for alternate
approaches, though. There might be classes of lightly-source-filtered
programs that PPI would translate better than what I'm working on.
But what I've got isn't ready to release even in preliminary form.
I'm still tweaking too many things in parallel along the whole chain,
and the design is still doing cartwheels occasionally. Anyone else
working on it would have to have an interest extending from the
insanity of Perl 5 parser internals all the way to various deep Perl 6
design issues, and I don't think anyone else besides me has the
requisite multiple personality disorder.
Hmm, that sounds like I'll never release it till it's perfect.
It's really only the p5-to-p5 part that has to be close-to-perfect,
but it's a really easy target to test for. I figure the translator
can go alpha when it can do the wooden literal translation, and we
can then do all the refactoring work in parallel. Or I might figure
out an earlier point where I could use people's help. Right now
it would just be a distraction, though.
Unfortunately, I don't have a lot of spare time to work on it these
days because of having to put bread on the table. And quite apart from
the bread, I am deeply indebted to many people for letting me design
Perl 6 for the last several years--but I mean that all too literally.
As the bumper sticker says: I owe, I owe, so off to work I go... :-)
So I'm presuming that you don't intend this as a tool that can do mass
porting of code (due to the dependency issues), but rather as something
for helping individual module authors port individual files/modules.
Also curious how you handle BEGIN and friends... I take it they are
executed and then pruned, and end up unpruned in your XML?
Also curious if you have managed to keep comments, POD etc...
With the existence of Ponie, my hope is that people can port things
piecemeal and retest for regressions at every stage along the way,
presuming they have something that actually has regression tests.
I think "translate everything and hope for the best" is a recipe
for disaster on any project larger than one person's head.
That being said, there's nothing that says the translator has to
support only one kind of output, which means there's no reason
you can't have some kind of overall policy driving the individual
translations, so I don't see why dependency mapping should be a
big problem. It just forces your translation granularity to chunks
of modules that require the same support, when that support is of a
nature that can't be split between Ponie and Perl 6. Only in the limit
does that mean you have to translate everything all at once, and you'd
still probably want some kind of overall policy file to control it,
if only so you can tweak it and try the whole mess some other way.
: Also curious how you handle BEGIN and friends... I take it they are
: executed and then pruned, and end up unpruned in your XML?
I just intercept the op_free() routine with another routine that knows
where to store the op tree that was about to be freed, to a first
approximation. I also install null nodes in the tree as "pegs" on which
to hang the exact location of declarations like BEGIN, use, subs, etc.
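The real interception lives in C inside the Perl internals; the idea itself is simple enough to sketch in Python (toy names, not the actual internals):

```python
stash = []  # subtrees the optimizer wanted to discard

def op_free(node):
    # Instead of actually freeing the pruned subtree, remember it.
    stash.append(node)

def constant_fold(node):
    # Toy optimizer: fold ["+", 1, 2] into 3, "freeing" the old subtree.
    if isinstance(node, list) and node[0] == "+" \
            and all(isinstance(a, int) for a in node[1:]):
        op_free(node)
        return sum(node[1:])
    return node

def peg(kind):
    # A null node that only marks the source location of a declaration
    # (BEGIN, use, sub, ...) without affecting execution.
    return {"null": True, "peg": kind}

tree = ["list", constant_fold(["+", 1, 2]), peg("use")]
assert tree[1] == 3            # folded result is still in the tree
assert stash == [["+", 1, 2]]  # but the pruned subtree was remembered
```

The execution semantics are untouched; the only change is that information which would have been destroyed is parked somewhere it can be dumped later.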
: Also curious if you have managed to keep comments, POD etc...
Certainly. It takes MAD skills, where MAD stands for Miscellaneous
Attribute Decorations. (Doing anything with toke.c requires madness.)
Well, actually, speaking of doing things piecemeal, I haven't tested
the POD part yet, just the comments. And I'm quite sure I haven't
captured the __DATA__ yet, but that'll have to happen too. But conceptually
it's all there. :-)
The thing is that these MAD props are hung on whatever node is handy at
the time, which might be the token before, but usually is the token after,
but usually *wants* to be somewhere up higher in the tree that doesn't
exist yet. The changes to Perl internals are intentionally very minimal
so as not to influence parsing behavior more than .5 iota, so I don't
try to do any tree rearrangement in the parser. The XML is just the
raw dump of the tree with its misplaced madprops. That's the main
reason for the first pass of translator, to reattach the madprops
at a more appropriate place in the tree.
Interesting issues arise, such as deciding when a comment goes with the
previous code and when it goes with the next code, or when you just
stick it into the interstices for now. At the moment my tendency is
to hoist leading and trailing whitespace into the interstices of the
higher list when that's practical. But with comments you'd like them
to travel with the code they're commenting, in cases where refactoring
moves code around.
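The reattachment pass described above — hoisting trivia off whatever token it happened to get hung on, sending comments to travel with their code and whitespace to the interstices — might look something like this in miniature (again an illustrative Python sketch with invented names):

```python
# Each token may carry trivia ("madprops") that the lexer hung on it.
stmt = {
    "type": "statement",
    "tokens": [
        {"text": "print", "trivia": ["# say hello\n", "\n"]},
        {"text": "$x", "trivia": []},
    ],
}

def reattach(stmt):
    """Hoist trivia from the statement's first token up to the
    statement itself: comments travel with the code they comment
    (so refactoring moves them too), while bare whitespace is left
    behind for the enclosing list's interstices."""
    trivia = stmt["tokens"][0]["trivia"]
    stmt["leading_comments"] = [t for t in trivia if t.startswith("#")]
    stmt["tokens"][0]["trivia"] = [t for t in trivia if not t.startswith("#")]
    return stmt

reattach(stmt)
assert stmt["leading_comments"] == ["# say hello\n"]
assert stmt["tokens"][0]["trivia"] == ["\n"]
```

The heuristic question of whether a given comment belongs to the previous code, the next code, or the interstices is exactly the judgment call this pass has to make.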
The basic problem is that there's no one level that's right to do
the translation. You have to take into account both shallow and
deep information and everything in between simultaneously, because
all of those things are important to the programmer at some point.
I'm aiming for a deeply correct translation that tries to preserve
as much surface detail as possible, but when push comes to shove,
it's the surface detail that has to get shoved, even if that screws
up their pretty formatting. The nice thing about a deep translation
is that you can know when you're guessing, and at least mark it
so the programmer can double-check the translation. A surface-level
translator is always guessing, and doesn't always know it. I dare
say most Perl 5 could be translated to Perl 6 with a series of s///,
but it would always be getting stupid just when you want it to be smart.
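To make the guessing-and-not-knowing-it concrete, here is a one-rule surface translator (sketched with Python's re.sub rather than Perl's s///, and with a made-up rule for illustration) that rewrites Perl 5 `->` method calls to Perl 6 `.`:

```python
import re

def surface_translate(src):
    # A purely surface-level rule: Perl 5 method calls use ->,
    # Perl 6 method calls use .
    return re.sub(r"->(\w+)", r".\1", src)

# Fine on real code...
assert surface_translate('$obj->frobnicate;') == '$obj.frobnicate;'

# ...but inside a string literal it guesses wrong, and -- having no
# parse tree -- it has no way to know it guessed at all:
assert surface_translate('print "try ->next here";') \
    == 'print "try .next here";'
```

A deep translator knows the second `->` is inside a string and can either leave it alone or flag it for the programmer; the s/// version silently mangles it.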
Gee, it looks like you found my hot button, or at least my warm button.
Maybe I should work up a talk about all this someday...
> The thing is that these MAD props are hung on whatever node is handy
> at the time, [...]. That's the main reason for the first pass of
> translator, to reattach the madprops at a more appropriate place in
> the tree.
> But with comments you'd like them
> to travel with the code they're commenting, in cases where refactoring
> moves code around.
So, what I'm hearing you say is that you have just written the very
very basic skeleton--maybe even just the backbone--of a Perl
refactoring browser. Is that correct?
Once you're done, could the community take this tool that you're
producing and flesh it out into something that would allow for
straightforward refactoring and reformatting of Perl code?