A summary of this lengthy mail:
(1) Why type-enriched Camlp4 is an unreasonable idea
(2) We should extract the typedtree; why it's hard
(3) A fictional narrative of the camlp4/camlp5 history
(4) Why you don't want to become Camlp4 maintainer
(5) How we could try not to use Camlp4 in the future
(6) Syntax extension survival advice
# (1) Why type-enriched Camlp4 is an unreasonable idea
Wojciech, your idea of having type information at the Camlp4 level is
absolutely unreasonable. You are not speaking about a "minor change"
here, but a major rewrite that would affect the compiler internals as
well. It would really be a new (and interesting) project.
Camlp4 is, and I guess will remain, a syntax-level preprocessing
tool. You have to accept the fact that you can't use type information
at this level (but you can certainly "interact" in some way with the
type system by producing/transforming pieces of code in a way that you
know will have interesting typing effects; for example, you may want
to generate code that is purposely ill-typed in some cases, to
disallow certain uses of your syntax extension). I'm not even sure
what it would mean to access type information at the camlp4 level, as
you're producing and transforming untyped AST; would you want
partially typed ASTs? How is the typer supposed to work on the parts
that you haven't transformed yet, and that are therefore not valid
OCaml syntax? I suppose you could have a "preprocessing and transformation"
tool at the typedtree level, but that would be a different tool with
different uses, distinct from the syntactic preprocessing part (though
you may develop "extensions" that act on both fronts).
I'm not aware of many Camlp4 situations that would really require
typing information. I would be interested in good examples if you have
some. One problem that I have had with Camlp4 is that you don't have
identifier resolution information (eg. you don't know if the
identifier "(@)" you're seeing is really list concatenation, or has
been redefined/shadowed in the context); this makes uses of Camlp4 for
inlining, for example, quirky and fragile. That's still a simpler
problem than type information.
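
To make the shadowing problem concrete, here is a minimal sketch in
plain OCaml (no Camlp4 involved). The module and function names are
made up for the illustration. Both functions contain exactly the same
tokens "[] @ l", so a purely syntactic tool that rewrites "[] @ l"
into "l" cannot tell them apart, yet the rewrite is only correct for
the first one.

```ocaml
(* Hypothetical illustration: all names below are made up. *)

(* Here (@) is the standard list concatenation, so rewriting
   [[] @ l] into [l] would preserve the meaning. *)
let with_stdlib_append (l : int list) : int list = [] @ l

module Shadowed = struct
  (* In this scope (@) has been redefined. The expression below is
     token-for-token the same as above, but means something else. *)
  let ( @ ) (_ : 'a list) (l : 'a list) : 'a list = List.rev l

  let with_shadowed_append (l : int list) : int list = [] @ l
end

let () =
  assert (with_stdlib_append [1; 2; 3] = [1; 2; 3]);
  assert (Shadowed.with_shadowed_append [1; 2; 3] = [3; 2; 1])
```

Without scope information, the preprocessor only sees the identifier
"(@)" in both places.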
# (2) We should extract the typedtree; why it's hard
If you really want to play with type information and scope-resolved
identifiers, AST-manipulating tools are probably not the way to go: you
indeed want full access to the typedtree. Currently this is only
possible by hacking the compiler, and this is what for example Jun
Furuse's Ocamlspotter project does. This kind of tool could be made
less intrusive if it were possible to pass typedtree-like information
in and out of the compiler.
I remember reading that some people (OcamlPro, I suppose) have this on
their target list. The problem however is that the compiler's current
internal typedtree representation is not at all suited to external
communication. If you want a kind of tool that is robust and
future-proof in any sense (you could probably get something working by
just marshalling the current typedtree, but then it could break
awfully after minor language changes, make the compiler choke, etc.;
I certainly wouldn't want to use that), you have to design a clean and
efficient representation for OCaml programs after the type inference
phase. Having a solid proposal on this topic will be an awful lot of work.
# (3) A fictional narrative of the camlp4/camlp5 history
Jérémie Dimino wrote:
> But there is something I don't understand here. Why is there camlp4 and
> camlp5 ? These two projects do exactly the same thing and are
> incompatible. So i don't see the point of maintaining them both. We
> should at least deprecate one.
Let me repeat the story as I know it (with possible mistakes, I was
still a caml baby in the <3.10 times) in a hopefully compact form for
those on the list who have no idea about it. DISCLAIMER: this is only
a fictional storytelling, meant to give a reasonable idea (or at least
my vision) of the situation. I may be wrong about the chronology of
events, people's names, hard facts, and of course English spelling
and grammar. The story is complicated and I don't know the gory
details.
If you know a better story, feel free to add important details,
correct the obvious mistakes, etc. I also welcome suggestions to make
it a funny, entertaining read; finally, a few romantic details could
clearly turn it into a blockbuster.
The original Camlp4 tool was mostly developed by Daniel de
Rauglaudre. Apparently, personal relations between Daniel and the
OCaml team were not easy, and Camlp4 was gradually becoming more and
more external to the OCaml distribution (in the past, the stream
syntax was available as part of the core language, but it was moved to
Camlp4; the O'Reilly book was written before that move) and its
maintenance status uncertain.
In the 3.09/3.10 transition, Nicolas Pouillard started working on
a refactoring of the Camlp4 codebase (which was mostly a silent,
non-moving animal at that time) to make it easier to evolve and
maintain. The refactoring maybe went "a bit too far", in that it
brought a number of changes to the external interface and, in
particular, broke existing Camlp4 extensions, as well as the (quite
good) Camlp4 documentation. Of course there were also advantages, in
that the new design was modular and, for example, the bootstrapping
process was made easier, and Nicolas Pouillard could maintain the tool
as offered by the distribution. The 3.10 transition was however very
painful for people using existing Camlp4 extensions (I'm thinking of
eg. Martin Jambon, who had extensive Camlp4 extensions, and the Coq
team, which has user-defined notations using Camlp4 and, huh, I really
don't want to know the details); basically they didn't upgrade to
3.10 -- instead of porting the extensions, as was originally hoped.
Personal note: I learned Camlp4 in this period, just after the release
of OCaml 3.10. I'm honestly unaware of how pre-3.10 Camlp4 was (though
I guess it is not too difficult to move from one to the other) so my
interpretations of previous times are all based on reading the
mailing-list.
Daniel, who apparently did not agree with some of the changes made,
relatively suddenly restarted development of "his" branch of Camlp4,
taken from the old sources, before refactoring. This was done as
a separate project, outside the OCaml distribution (apparently Daniel
and the OCaml team prefer not to work together). In a quite creative
move, he named his version "camlp5" so that it could be easily
distinguished from the "upstream camlp4", the incompatible new version
being distributed with OCaml. Camlp5 is therefore the continuation of
the *old* camlp4. Development, however, continued (while still
preserving or mostly preserving compatibility with pre-3.10
extensions, ensuring grateful thanks from the users) with
non-negligible changes (eg. addition of a library of non-destructive
streams), and is currently ongoing (as is maintenance of the >=3.10
camlp4 in the OCaml distribution).
http://cristal.inria.fr/~ddr/camlp5/CHANGES
Of the projects that relied on Camlp4 before 3.10, some ported
their extensions to >=3.10 Camlp4, and lived happily ever after, and
some of them jumped on camlp5 as a lower-cost migration opportunity
and, I suppose, also lived happily ever after; or at least, as happily
as you can knowing that your codebase depends on camlp{4,5}.
I make no technical judgment on which version (Camlp4 or Camlp5) is
"better". I know and (try not to) use Camlp4. I also understand the
position of Camlp5 users for compatibility reasons. I would advise
newcomers to try Camlp4 first as it has a larger user base (being
distributed with the OCaml distribution). In any case, see the end of
this mail for why you may not want to use camlp* at all.
# (4) Why you don't want to become Camlp4 maintainer
> I am volunteer for the maintenance of Camlp4.
Jérémie, I am deeply impressed by your sense of sacrifice.
Camlp4 is a piece of devil beauty. It does incredibly clever things,
and is incredibly complex inside: Daniel is clearly a remarkable
hacker, but his code is not easy to understand. I know that
maintaining the whole thing is very hard; and that is the reason why
Camlp4 tends to have trouble moving from one version to another,
when non-negligible syntactic changes are made to the language.
The core internals of Camlp4 are quite imperative: parsing is done by
erasing tokens from the input stream (which makes for lovely Stream
debugging), but, more importantly, grammar extensions are destructive
mutations of the grammar. It is sometimes painful when using Camlp4,
and I suppose it must be hell maintaining it.
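
To give an idea of what "erasing tokens from the input stream" means
in practice, here is a minimal sketch. It is NOT Camlp4's code: the
token type, the hand-rolled mutable stream and the parse_binop
function are all assumptions made up for the illustration. The point
is only that once a parser has consumed a token and then fails, the
token is gone, so the next parser you try starts in the middle of the
input.

```ocaml
(* Hypothetical sketch of a destructive token stream; this is not
   Camlp4's implementation, and all names are made up. *)
type token = Int of int | Plus | Star

type stream = { mutable rest : token list }

(* Look at the next token without consuming it. *)
let peek (s : stream) : token option =
  match s.rest with
  | [] -> None
  | t :: _ -> Some t

(* Drop the next token. This mutation cannot be undone. *)
let junk (s : stream) : unit =
  match s.rest with
  | [] -> ()
  | _ :: tl -> s.rest <- tl

(* Try to parse "INT op INT". On failure, whatever was consumed so
   far stays consumed: there is no backtracking. *)
let parse_binop (op : token) (f : int -> int -> int) (s : stream)
  : int option =
  match peek s with
  | Some (Int a) ->
    junk s;
    (match peek s with
     | Some t when t = op ->
       junk s;
       (match peek s with
        | Some (Int b) -> junk s; Some (f a b)
        | _ -> None)
     | _ -> None)
  | _ -> None

let () =
  let s = { rest = [Int 2; Star; Int 3] } in
  (* The "sum" parser eats [Int 2] before noticing there is no [Plus]. *)
  assert (parse_binop Plus ( + ) s = None);
  assert (s.rest = [Star; Int 3]);
  (* The "product" parser now fails too, although the original input
     was a perfectly valid product. *)
  assert (parse_binop Star ( * ) s = None)
```

On a fresh stream with the same tokens, the product parser succeeds;
the failure above comes only from the tokens erased by the first
attempt.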
# (5) How we could try not to use Camlp4 in the future
I think Alain's idea of moving out of Camlp4 is actually a good
thing. After having used Camlp4 extensively, I came away with the
general feeling that allowing general extension of the grammar is not
a good thing, because it is too complex and the solutions are too
fragile. On the contrary, the quotation mechanism of Camlp4 is
a wonderful tool, that acts as a *controlled* extension point for the
grammar, and is therefore much more robust. The other excellent use of
Camlp4 is generating OCaml code out of "annotations" on certain AST
nodes, as the 'type-conv' extension does; this is also a form of
controlled extensibility that could, I think, be taken into account in
the base language or a simple tool, and doesn't require full-fledged
grammar mutation.
I think we should isolate such restricted uses of syntax extensibility
and allow them through simpler, more robust tools. Alain's idea, and
the reactions from Camlp4 users (Martin Jambon, Nicolas Pouillard),
can be found on his blog:
http://www.lexifi.com/blog/syntax-extensions-without-camlp4
I'm personally not sure his solution is quite ready to replace Camlp4
yet; or at least not in the state it was in at that time. He has an
annotation mechanism, but it is not quite restricted enough to
guarantee a reasonable behaviour; as long as people are tempted to use
"annotations" to define try..finally, we will have fragile extensions
floating around.
(And I'm not sure it's a good idea to move Camlp4 out of the
distribution until we have a viable alternative proposal
and some users have started moving to it. Camlp4 will still need to be
supported by someone anyway, and it needs to evolve in lockstep with
OCaml language changes.)
Yet I do agree that "without Camlp4" is the future. It may take some
time, but what I would personally like to see in a few years is an
OCaml world where we don't need Camlp4 anymore, because we have other,
simpler tools to do the reasonable things, and have learned to live
without the rest. The whole "maintain Camlp4" enterprise will still be
useful, indeed necessary, until that time, but it won't stay, I hope,
at the center of attention.
# (6) Syntax extension survival advice
To the reader considering use of a new syntax extension in his next project:
- don't
- if you really must, try to make sure that your code is also
reasonable to use *without* a syntax extension (eg. by producing
a library with a clean interface, making your extension desugar to
uses of it, but also making sure that it can be used by the human
user)
- if you really must, try to get it in the form of a quotation; the
rest is fragile
- alternatively, try to plug into existing "flexible" syntax
extensions such as Markus Mottl's 'type-conv' and Jeremy Yallop's
'patterns': you are relatively safe if you don't write any line of
code modifying OCaml's syntax yourself.
http://hg.ocaml.info/release/type-conv
http://code.google.com/p/ocaml-patterns/
- do not hesitate to send your code for review and/or ask for help on the
list
- don't
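
As an illustration of the "library with a clean interface" advice,
here is a minimal sketch using try..finally as the example. The name
with_finally is made up; the idea is that a hypothetical
"try .. finally .." extension would do nothing more than desugar to
calls to such a function, which remains pleasant to call by hand when
the extension is not available.

```ocaml
(* Hypothetical library function; the name [with_finally] is made up.
   It runs [body ()], then always runs [finally ()], whether [body]
   returned normally or raised an exception. *)
let with_finally ~(finally : unit -> unit) (body : unit -> 'a) : 'a =
  let result =
    try body () with e -> finally (); raise e
  in
  finally ();
  result

(* Direct use by a human, without any syntax extension. *)
let () =
  let cleaned = ref 0 in
  let r = with_finally ~finally:(fun () -> incr cleaned) (fun () -> 1 + 1) in
  assert (r = 2);
  assert (!cleaned = 1);
  (* The cleanup also runs when the body raises. *)
  (try
     with_finally ~finally:(fun () -> incr cleaned)
       (fun () -> failwith "boom")
   with Failure _ -> ());
  assert (!cleaned = 2)
```

If the extension breaks on the next OCaml release, code written
against the function keeps compiling. Recent standard libraries (OCaml
4.08 and later) provide Fun.protect, which plays the same role.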