Yes, it's true. The generation of the parser is now completely
template-based. But N.B., not the generation of the lexer, which is
still only partly completed.
Here are the two templates that generate the parser:
http://code.google.com/p/kawadd/source/browse/trunk/kawadd/src/kawadd/output/Parser.java.ftl
http://code.google.com/p/kawadd/source/browse/trunk/kawadd/src/kawadd/output/javacode.ftl
It's about a thousand lines of FTL. I think this can also gradually
evolve into a more modularized, clearer state. My basic idea is that
the master template will still be pretty much target-language agnostic
and there will be javacode.ftl and pythoncode.ftl and rubycode.ftl
macro libraries that do the real work of generating language-specific
code. Of course, I'm still a ways away from that, but I did achieve
one initial goal. Note, of course, that the template works by running
over data structures that are built up on the Java side. But still,
it's interesting to note that the code in kawadd.parsegen that handles
that is not terribly bulky. I mean, the whole generation of the KawaDD
parser, leaving aside the parsing of the grammar, and all the lexer
side, is about 2000 lines of Java and 1000 lines of FreeMarker. And
note that that Java code is not specific to generating a parser in
Java. In principle, all you need to do to target another language is
to translate the templates. Again, note that the lexer side is not
finished.
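To make that division of labor concrete, here is a rough sketch in plain Java (not FTL, and not code from KawaDD) of a language-agnostic driver that delegates to per-language helpers, roughly the way the master template would delegate to javacode.ftl or pythoncode.ftl. Every name in it (ProductionModel, TargetLanguage, JavaTarget, PythonTarget, MasterTemplateSketch) is made up for illustration.

```java
import java.util.List;

// Hypothetical stand-in for the data structures built on the Java side.
final class ProductionModel {
    final String name;
    ProductionModel(String name) { this.name = name; }
}

// Hypothetical counterpart of a per-language macro library.
interface TargetLanguage {
    String emptyMethod(String name);
}

final class JavaTarget implements TargetLanguage {
    public String emptyMethod(String name) {
        return "void " + name + "() {\n}\n";
    }
}

final class PythonTarget implements TargetLanguage {
    public String emptyMethod(String name) {
        return "def " + name + "(self):\n    pass\n";
    }
}

// Counterpart of the language-agnostic master template: it only walks
// the model and never spells out target-language syntax itself.
final class MasterTemplateSketch {
    static String render(List<ProductionModel> productions, TargetLanguage target) {
        StringBuilder out = new StringBuilder();
        for (ProductionModel p : productions) {
            out.append(target.emptyMethod(p.name));
        }
        return out.toString();
    }
}
```

Swapping JavaTarget for PythonTarget changes the output language without touching the driver, which is the property the template split is after.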
This was much much harder than I anticipated. The thing was that when
I undertook this, I didn't realize how utterly entangled the code was.
I mean, the JavaCC code was utterly unreadable but I figured that this
was mostly because I had very little understanding of how parser
generators work under the hood. Now that I've undertaken this cleanup,
I actually see that it is mostly not that complicated. The problem is
that the whole thing was so utterly entangled -- really, implemented
in a rather amateurish way. I mean, it works, but the code was in a
state where nothing could be done with it.
I think that any experienced implementor would break down the problem
into a few stages. I mean, first, you parse the grammar, next, you run
one or more passes over the grammar that build up data structures for
the code generation step. Then you walk those data structures and you
output actual code. And then, as a final and more or less minor thing,
you run a beautifier over the code so that it looks reasonable.
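The staging described above can be sketched as a chain of small functions, each consuming only the previous stage's output. This is a toy in plain Java, assuming a made-up "grammar" with one production name per line; it is not how KawaDD or JavaCC is actually organized, and the class and method names are hypothetical. The beautifier stage is left out here.

```java
import java.util.ArrayList;
import java.util.List;

final class StagedGeneratorSketch {
    // Stage 1: "parse" the toy grammar (one production name per non-blank line).
    static List<String> parseGrammar(String grammarText) {
        List<String> names = new ArrayList<>();
        for (String line : grammarText.split("\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) names.add(trimmed);
        }
        return names;
    }

    // Stage 2: a pass that builds data for code generation and emits no text.
    static List<String> buildMethodNames(List<String> productions) {
        List<String> result = new ArrayList<>();
        for (String p : productions) result.add("parse" + p);
        return result;
    }

    // Stage 3: walk the prepared data and output code, ignoring layout.
    static String generate(List<String> methodNames) {
        StringBuilder out = new StringBuilder("class Parser {");
        for (String m : methodNames) out.append("void ").append(m).append("() {}");
        return out.append("}").toString();
    }

    // The stages compose; none of them reaches into another's internals.
    static String run(String grammarText) {
        return generate(buildMethodNames(parseGrammar(grammarText)));
    }
}
```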
But no, there is no notion in the JavaCC codebase of this kind of
separation of concerns. The above concerns are basically all
entangled together. I mean, if you look in the original JavaCC
codebase at this thing called ParseEngine.java, you see code that
intersperses the building up of data structures with the generation
of actual code. I mean, the code that generates code is also building
up data structures for the next code to be generated. And then to top
it off
there is this really funky scheme of outputting to an intermediate
buffer with these \u0001 and \u0002 symbols that are later replaced
with indents and outdents. And all these things are mixed together!
Unbelievable.
The generated code is messy since I basically chopped out that whole
indent/outdent scheme. Ultimately, I don't think that is very
important. At some point, I'll just add in a final pass on the
generated output that indents everything properly. A typical user has
fairly little reason to read the generated code. OTOH, I had plenty of
reason to look at the generated code, since when I broke something, I
would compare the code generated by the previous version that worked
with the current version that didn't. But, basically, whenever I
eyeballed a generated source file, in Eclipse or sometimes in jEdit, I
simply beautified the source, so that's hardly a problem anyway.
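For what it's worth, a final indentation pass of the kind mentioned above could start out as something this simple. This is only a naive sketch (the NaiveIndenter name is made up): it looks at a leading '}' and a trailing '{' on each line and deliberately ignores braces inside strings, comments, and character literals, so a real beautifier would need more than this.

```java
final class NaiveIndenter {
    // Re-indents source text line by line: a line starting with '}' closes
    // a level before it is printed, a line ending with '{' opens one after.
    static String reindent(String source) {
        StringBuilder out = new StringBuilder();
        int depth = 0;
        for (String rawLine : source.split("\n")) {
            String line = rawLine.trim();
            if (line.isEmpty()) {
                out.append('\n');
                continue;
            }
            if (line.startsWith("}")) depth = Math.max(0, depth - 1);
            for (int i = 0; i < depth; i++) out.append("    ");
            out.append(line).append('\n');
            if (line.endsWith("{")) depth++;
        }
        return out.toString();
    }
}
```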
Well, functionally, KawaDD is still almost exactly the same as
JavaCC (except for the removal of static parsers and all the ReInit()
methods to reuse existing non-static parsers.) I would be curious to
know if anybody has any issues using KawaDD on existing JavaCC
grammars. Basically, if you go:
svn co http://kawadd.googlecode.com/svn/trunk/kawadd
cd kawadd
ant
then it should check out and build. You can use the kawadd or
kawadd.bat scripts in the ./bin directory, as in: kawadd MyGrammar.jj
Best Regards,
JR
P.S. A final caveat is that the FTL templates are written against the
as-yet unreleased FreeMarker 2.4. That is why some of the directives,
such as #var and #set, will look unfamiliar if you know FreeMarker.
Brian, the boilerplate on top of the source files is essentially the
same as what was on top of the JavaCC files as well as what appears
here, for example:
http://en.wikipedia.org/wiki/BSD_license
I'm not a lawyer, so I am not sure exactly what "all rights reserved"
means in this context. I guess it means all rights that are not
specifically granted in the license that follows, but I dunno really.
As a practical matter, AFAICS, and again, I'm not a lawyer, the
license verges on putting the thing in the public domain. You can do
whatever you want with it as long as you acknowledge the original
authorship.
>
> Obviously kawadd requires Java 1.5 to run, but as of now seems to
> generate Java 1.4 code. Do you plan to have a multiple Java templates
> ala ANTLR, or only support 1.4 or 1.5?
Well, one of the things about the template approach is that it should
make it fairly easy for people to customize as they wish. Frankly, as
a user myself, I don't feel that the ability to generate Java code
that uses 1.5 constructs is all that important or useful, so, no, it's
not something that I am very concerned with. Basically, the Java code
that is generated is an intermediate format. Ultimately, what is run
is the bytecode. So, whether the code you *generate* uses generics or
not, for example, is pretty much irrelevant, since all the generics
information gets erased in the bytecode anyway. I mean, if it
generated extra robustness in the resulting bytecode, it would be one
thing, but as it happens...
The main advantage of generics is to make the source clearer and
more readable, but one should not have to read the generated Java code
that often, so IMHO the whole issue is rather marginal.
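A small illustration of the erasure point, in plain Java (the ErasureDemo class is just a made-up name for the example): two lists declared with different type arguments are instances of the very same class at run time.

```java
import java.util.ArrayList;
import java.util.List;

final class ErasureDemo {
    // Returns true when the two differently parameterized lists share
    // one runtime class, which is what type erasure implies.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<String>();
        List<Integer> numbers = new ArrayList<Integer>();
        return strings.getClass() == numbers.getClass();
    }
}
```

The type arguments help the compiler and the human reader; the running list objects carry no trace of them.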
Besides targeting other languages such as C# or Ruby or whatever,
one initial goal I have is to consolidate javacc/jjtree/jjdoc as a
single program. I think that jjtree and jjdoc will just melt away as
separate programs because they can be thought of as, in the jjtree
case, just turning on the parts of the template that contain
tree-building actions, and jjdoc is just a question of using a
different set of templates that, instead of generating Java (or
whatever language) code, generate a navigable HTML view of the
grammar.
Another thing I want to do is to improve the error handling, which is
really fairly lame in JavaCC.
>
> I'm not too familiar with the #set ftl directive. Isn't this going
> away from the idea that the presentation layer shouldn't contain
> logic?
Well, if the #set directive goes against that, then the #assign
directive would as well, so I don't see that as much of an issue. In
FreeMarker 2.3 (and earlier) you have the directives #assign and
#local which exist to create or define variables at the namespace and
macro levels respectively. In FreeMarker 2.4 (still unreleased) those
directives still work as before, but are superseded by the newer
directives #set and #var where #var exists to explicitly declare a
variable in the innermost scope containing that declaration. And then
#set affects the value of that variable. So:
<#list users as user>
<#var foo>
.....
<#set foo = "bar">
...
</#list>
This declares a variable foo in the "list users as" loop, and the set
directive then sets that foo variable. This differs from assign/local
in two basic ways. First, assign/local had a glitch that made a
certain kind of mistake easy:
<#macro x>
<#local foo = "bar">
....
<#assign foo = "baz">
...
</#macro>
In the above, the assign directive actually creates a new variable
called foo at the namespace level, rather than resetting the one that
was declared higher up in the macro. This, I think, is a very easy
mistake to make. Oh, and BTW, the #set directive in the absence of a
corresponding var directive simply creates a variable at the namespace
level. However, in strict_vars mode, which I am using in those
templates, (that's what the <#ftl strict_vars = true> means) you
cannot set a variable if it was not already declared previously via
#var. This prevents a certain class of mistakes involving misspelling,
where you write:
<#set filename = "foo.bar">
...
and then further down, you have:
<#set fileName = "baz.bar">
and instead of changing the existing filename variable, you create a
new variable called fileName. In strict_vars mode, the above mistake
gets flagged because you declared a variable via <#var filename> and
you didn't declare one named fileName, so the latter blows up and
alerts you to the mistake.
Another basic improvement is that you can declare a variable in any
block scope, like within a loop, or an if condition, or whatever,
rather than just at a macro level, so it's more general.
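For anyone who finds the rules easier to follow as code, here is a toy model in Java of the #var/#set behavior as described above. To be clear, this is not FreeMarker's implementation and ScopeModel is a made-up class; it only mimics three rules: #var declares in the innermost block, #set assigns to the nearest enclosing declaration, and an undeclared #set either falls through to the namespace level or fails when strict_vars is on.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

final class ScopeModel {
    // Head of the deque is the innermost block; tail is the namespace level.
    private final Deque<Map<String, Object>> scopes = new ArrayDeque<>();
    private final boolean strictVars;

    ScopeModel(boolean strictVars) {
        this.strictVars = strictVars;
        scopes.push(new HashMap<>()); // namespace-level scope
    }

    void enterBlock() { scopes.push(new HashMap<>()); }
    void exitBlock() { scopes.pop(); }

    // Models #var: declare the name in the innermost scope.
    void var(String name) { scopes.peek().put(name, null); }

    // Models #set: assign to the nearest enclosing declaration.
    void set(String name, Object value) {
        for (Map<String, Object> scope : scopes) { // innermost first
            if (scope.containsKey(name)) {
                scope.put(name, value);
                return;
            }
        }
        if (strictVars) {
            throw new IllegalStateException("undeclared variable: " + name);
        }
        scopes.peekLast().put(name, value); // falls through to namespace level
    }

    Object get(String name) {
        for (Map<String, Object> scope : scopes) {
            if (scope.containsKey(name)) return scope.get(name);
        }
        return null;
    }
}
```

With strictVars set to true, declaring filename and then setting fileName throws, which corresponds to the misspelling case being flagged.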
> In classic MVC, isn't the view just smart enough to put the
> model into the right holes?
I don't really think so. Some people claim that MVC means no logic in
the view. I don't really think that's tenable, so my own take on this
is that MVC means no logic in the view that is not specifically
view-related. The view may have to be fairly smart to handle
presentation issues.
What it boils down to is separation of concerns. If you have broken
down the problem into different pieces, you want it to be feasible for
people to work on a given piece independently. In the typical
ecommerce web app use of FreeMarker, you want web page designers and
programmers to be able to work independently without stepping on each
other's toes. In my view, a lot of the MVC blah blah misses a
fundamental aspect of this. If the view layer is too underpowered, you
violate SoC (separation of concerns) because the people who work on
that layer will have to resort to asking the Java programmers for help
to solve typical view-related problems.
There's that paper T. Parr wrote about model-view separation, which
(at least if I really understand it, big if :-)), is mostly fallacious
IMO. Basically, that kind of thing generalizes (fallaciously) from "no
business logic in the view" to "no logic in the view" and, as a
consequence, draws a whole bunch of fallacious conclusions.
Of course, as it relates to FreeMarker, FreeMarker basically came into
existence to separate out the view layer when developing more or less
typical kinds of e-commerce web sites, so using it to generate Java
code in this case takes it away from its original use case. Probably
the model-view separation in this case is that you want the Java code
to build up data structures that specifically relate to the parser
grammar at hand. And then the templates walk over that data and
generate Java code or Python code or a navigable HTML presentation of
the grammar or whatever.
So, you know, it's not 100% obvious exactly how much the partitioning
of the problem to facilitate the building of ecommerce web sites
transfers over to partitioning the problem of generating parsers. At
first blush, they're very different things. But still, there is a core
thing in common, which is that, in either case, it is highly desirable
to partition the problem so that you separate out the generation of
output from the backend machinery.
Hope that answers your questions.... :-)
JR