Thanks for your interest. A draft of the "Newspeak" paper is here: http://xlr.sourceforge.net/newspeak. However, please be informed that when I submitted it for review, it was rated "Bad" to "Weak". I didn't have time since then to try and improve it. Nevertheless, I will certainly appreciate any helpful comments to make it better.
Regards
Christophe
> --
> You received this message because you are subscribed to the Google Groups "XL programming language and runtime" group.
> To post to this group, send email to xlr-...@googlegroups.com.
> To unsubscribe from this group, send email to xlr-talk+u...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/xlr-talk?hl=en.
>
As indicated in my previous reply, this reply is going to be quite long. It's also answering questions that are of general interest to the XL community, so I CC xlr-talk as well.
On 27 févr. 2011, at 22:23, Marc Coram wrote:
> Anyway, the aspects I'm most interested in at this point and that I'd
> like to see in a future paper are more "bottom-up".
> For example, if we
> unroll all of the language-extensions and present a minimal XLR, what
> exactly is left?
1) A scanner/parser generating a parse tree with 8 node types, 4 terminals (integer, real, text, symbol/name) and 4 structure constructors (infix, prefix, postfix, block).
2) A rewrite engine transforming one tree shape into another. There are currently 4 different implementations of this rewrite engine that I will detail below.
3) Primitive forms for declaring rewrites. Currently, we have ->, where 'foo 0 -> 1' means that the tree with shape 'foo 0' is rewritten as 1, and the 'data' prefix, where 'data x,y' means that no further rewrite occurs when we hit a form that matches. Those have to be known by the compiler. If you want a language with mutable values, assignment also goes here.
4) A way to enter / declare other primitive operations. Here, the various implementations differ. In XL2, it's a declaration which refers to the XL.BYTECODE module. In the dyncompile XLR branch, it's a rewrite with 'opcode' or C on the right, e.g. 'x < y -> opcode ICmpLT' or 'allocate N:integer -> C malloc'. Those also have to be known to the compiler.
Of course, the language also requires a basic runtime library, e.g. arithmetic, basic math, text, I/O, ... But with the above element, it's possible to construct that entire library. So I believe that's basically it.
The existing rewrite engines are the following:
a) The XL2 rewrite engine runs only at compile-time, and has multiple instances used to distinguish translation phase (macros, semantics, codegen, generics, ...). It integrates the XL2 type system in the selection of possible forms.
b) The XLR interpreter uses a similar, but slightly simpler and faster rewrite selection algorithm, and uses it to transform trees in such a way as to implement evaluation. It's basically the XL2 macro engine used as a program evaluator. The simplification is that the XL2 algorithm takes the 'best match', whereas the XLR algorithm takes the 'first match'. For example, in XLR, the following factorial definition leads to an infinite loop (but it would terminate with the XL2 rewrite engine):
N! -> N * (N-1)!
0! -> 1
c) The XLR "old" compiler implements a large part of the rewrite engine with LLVM-generated code. Basically, the compile-time part determines the possible candidates at compile-time, and the run-time part compares actual tree values to select the first match. All typing in that "old" compiler is dynamic, i.e. if we have X:integer+Y:integer and X:real+Y:real, we will actually check if X and Y are integer or real at runtime. The internal representation of all elements is always a dynamically Tree *, including for types such as integer. This puts a huge burden on the garbage collector.
d) Finally, the current work on the compiler (in the dyncompile branch) implements a Haskell-style type inference mechanism to better narrow the possible value types. As a result, it often can narrow actual types enough to be able to use native data types (e.g. integers, floating-point numbers). It automatically boxes/unboxes them from the corresponding XL tree classes (e.g. Integer or Real) as required. As a result, on some micro-benchmark, this version has been within 15% of optimized C performance-wise.
> How is the precedence of reduction rules decided?
In all cases, it is implemented in the symbol table.
The complex solution (in XL2) is summarized as "largest and more specialized wins". Basically, it means that X+Y*Z wins over X+Y, and X+0 wins over the more general X+Y. This is achieved by computing a "score" for each candidate. The details are here: http://xlr.git.sourceforge.net/git/gitweb.cgi?p=xlr/xlr;a=blob;f=xl2/native/xl.symbols.xl#l166. The Lookup function is the one that gets to pick the candidate.
The slightly simpler and faster solution in XLR is similarly implemented in the Context class, see http://xlr.git.sourceforge.net/git/gitweb.cgi?p=xlr/xlr;a=blob;f=xlr/include/context.h;hb=dyncompile#241. The lookup implementation is a template member so that we can configure it for a variety of uses.
The interpreted evaluation is here: http://xlr.git.sourceforge.net/git/gitweb.cgi?p=xlr/xlr;a=blob;f=xlr/context.cpp;hb=dyncompile#l545.
The compilation goes through the same template code, ending in http://xlr.git.sourceforge.net/git/gitweb.cgi?p=xlr/xlr;a=blob;f=xlr/expred.cpp;hb=dyncompile#l300 through the RewriteCalls class used as an evaluator for Context, see declaration in http://xlr.git.sourceforge.net/git/gitweb.cgi?p=xlr/xlr;a=blob;f=xlr/args.h;hb=dyncompile#l78.
> How is parsing done?
The first pass is scanning, which decomposes an input UTF-8 stream into tokens, see http://xlr.git.sourceforge.net/git/gitweb.cgi?p=xlr/xlr;a=blob;f=xlr/include/scanner.h;hb=dyncompile.
Then, the stream of tokens is turned into a tree by the parser, see http://xlr.git.sourceforge.net/git/gitweb.cgi?p=xlr/xlr;a=blob;f=xlr/include/parser.h;hb=dyncompile.
This parsing is configured by precedence tables read from the xl.syntax file, and stored in the Syntax class, see http://xlr.git.sourceforge.net/git/gitweb.cgi?p=xlr/xlr;a=blob;f=xlr/include/syntax.h;hb=dyncompile. That syntax can also be dynamically altered by the program using the syntax keyword, see http://xlr.git.sourceforge.net/git/gitweb.cgi?p=xlr/xlr;a=blob;f=xlr/tests/05.Control/syntax.xl;hb=dyncompile for an example.
The structure of the parse tree is defined by the tree.h file: http://xlr.git.sourceforge.net/git/gitweb.cgi?p=xlr/xlr;a=blob;f=xlr/include/tree.h;hb=dyncompile.
All this has an XL2 implementation in the xl2/native directory.
> How does a pragma function? [can it modify the parser?]
Pragmas only exist in XL2, not in XLR. They work by invoking lookup with the given pragma name (it's the 'category' argument to XL.SYMBOLS.Lookup). In other words, {XLSemantics} foo is equivalent to looking up the rewrites generated by all the 'translation XLSemantics' statements.
When you define a compiler extension, the same principle applies. For example, the differentiation plugin is declared with something like this: http://xlr.git.sourceforge.net/git/gitweb.cgi?p=xlr/xlr;a=blob;f=xl2/native/xl.plugin.differentiation.xl#l161
161 translation Differentiation
162 // ------------------------------------------------------------------------
163 // Translate differentiation statements
164 // ------------------------------------------------------------------------
165
166 when
167 'dFn' / 'dVar' ('dExpr')
168 where
169 (dFn.kind = PT.xlNAME and dVar.kind = PT.xlNAME and
170 IsDName(dFn, 1) and IsDName(dVar))
171 then
172 // General case
173 return Differentiate (dFn, dExpr, dVar)
174
175 when
176 'dFn' / 'dVar'
This in turns causes the XL2 compiler to populate its symbol table with rewrite entries that will match the 'dFn / dVar (dExpr)' forms for differentiation. Now, when a program contains {differentiation} d/dx (1+x), the parse tree form will match dFn/dVar(dExpr) with dFn=d, dVar=dx and dExpr=1+x.
The 'when' condition will be evaluated (through a compiler-generated callback that matches the condition after 'when'). Since it matches in that case,
The symbol lookup will then invoke the corresponding callback, which is compiler-generated code corresponding to the "return Differentiate..." part of the source code.
However, a pragma cannot modify the parser. The way to reconfigure the parser is through the syntax, as explained above. However, as far as I remember, it cannot be reconfigured dynamically in the XL2 compiler.
> Is "..." built-in?
Depends on what you call "built-in". It's part of the standard XLSemantics phase, yes. It's detected here: http://xlr.git.sourceforge.net/git/gitweb.cgi?p=xlr/xlr;a=blob;f=xl2/native/xl.semantics.overload.xl#l61. This sets the 'variadic' flag, and this is used to adjust scoring in the scoring callback for this symbol.
> How/when does the code generation pipeline function?
In the XL2 case, we generate high-level code (C or Java currently, XLR planned but not implemented yet).
Special functions declared as belonging to XL.BYTECODE are flagged so that when we invoke them, we emit the associated "bytecode text". Most "typically useful" operators are declared in xl_builtins.xs, which is loaded by default: http://xlr.git.sourceforge.net/git/gitweb.cgi?p=xlr/xlr;a=blob;f=xl2/native/library/runtime/C/xl_builtins.xs.
The corresponding generated text is configured via the xl.bytecode file, see
http://xlr.git.sourceforge.net/git/gitweb.cgi?p=xlr/xlr;a=blob;f=xl2/native/library/runtime/C/xl.bytecode
Changing the target runtime is simply changing the set of files used for output, i.e. to emit Java instead of C, we switch to native/library/runtime/Java/xl.bytecode instead of native/library/runtime/C/xl.bytecode.
XLR gets its name from "XL runtime", because it's intended to be a way to generate native code dynamically, i.e. to be a way for a new "runtime" in XL2 to generate code on the fly. However, it sort of took a life of its own as an independent language :-)
In the XLR case, we generate code by calling LLVM. It's mostly implemented in compiler.cpp, and the underlying logic is very different between the "old" and "new" compiler.
> What is a sequence of extensions from that core that result in an acceptable conceptual programming language?
I'd say, mostly adding a big library. Something which I never managed to complete in any version so far...
> When does one introduce typing?
I hope some of the answers above illustrate that. Having an extensible type system is something that I have not managed to design just yet...
> Is the re-write system similar to macro-systems?
In XL2, yes, it's the "XLMacro" phase of the compiler. In XLR, it's just the standard rewrite and tree manipulation mechanism. With a few escaping mechanism, e.g. the 'parse_tree' built-in, which is conceptually similar to quote in Lisp.
> Is it "hygenic"?
Just as hygenic as the standard function binding rules, I believe.
> What would a staged computation look like?
Not entirely sure :-)
> These are issues that I guess, for now, I have to answer from the code base,
> which is daunting, but certainly more achievable than for truly large/
> messy languages.
Hope that the pointers I gave will help you.
> More structurally, it might be helpful to compare with other language extension efforts. The Racket people enable custom parsers and sophisticated mechanisms to increase the modularity of Scheme macros (this includes very practical concerns such as debugging and code tracing).
I'm not there yet, unfortunately.
> They've got a typed-scheme variant, and java-esque dialect,
> and even a presentation dialect which is relevant to Taodyne, I
> suppose.
I looked it up, it's very interesting. I need to spend more time reading it. For readers on the list, see http://docs.racket-lang.org/quick.
> The language Kernel (Schutt) revitalizes the fexpr approach (with pleasing minimalism). Other approaches utilize Pratt parsers or packrat parsers to produce dynamically programmable languages. Probably you're already aware of many of these.
Yes, but this is a field with a lot of activity going on.
> Oh, I'd add one thing to your list of pseudo-metrics: Does the language facilitate extremely DRY code?
That's part of the "bandwidth" metric, isn't it?
> I.e. are there cases where, for practical reasons other than for intentional conceptual clarity, programmers are forced to emit roughly the same code more than once?
I think this happens mostly for primitives, because they are almost, but not entirely identical. For example, integer comparisons differ slightly from their FP counterparts, signed and unsigned shifts are not exactly symmetric, ...
> Anyway, best wishes,
Thanks a lot. Keep asking good questions.
Christophe