I understand your fears.
But, in my opinion, you are looking for the compiler's performance
problems in the wrong place.
The main compiler performance problems are caused by the following factors:
1. Use of unsuitable algorithms:
1.1. Simple defects: using loops to search for elements instead of a
table lookup, i.e. using improper data structures (for example, lists
instead of hash tables).
1.2. Use of universal (but terribly expensive) algorithms. For example,
the overload resolution algorithm is rather inefficient in the case of
extension method overload resolution (and it could easily be improved).
1.3. Repeated re-typing in the case of speculative typing. Today the
compiler cannot reuse the typed expressions obtained during
speculative typing.
2. Functional overhead (closures and virtual calls are much more
costly than plain loops).
3. Generation of suboptimal code. For example, not emitting switch
instructions.
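A minimal sketch of point 1.1 (in Python, purely illustrative; the symbol-table names are hypothetical, not actual compiler code). The same lookup is answered by a linear scan over a list and by a hash table; both give the same result, only the cost differs:

```python
import timeit

# Hypothetical symbol table: 10000 (name, id) pairs.
symbols = [("sym%d" % i, i) for i in range(10000)]  # list of pairs
symbol_map = dict(symbols)                          # same data as a hash table

def lookup_list(name):
    # O(n): scans the whole list in the worst case
    for n, v in symbols:
        if n == name:
            return v
    return None

def lookup_dict(name):
    # O(1) on average: single hash-table lookup
    return symbol_map.get(name)

# Same answer, very different cost for a worst-case (last) element:
slow = timeit.timeit(lambda: lookup_list("sym9999"), number=100)
fast = timeit.timeit(lambda: lookup_dict("sym9999"), number=100)
print(lookup_list("sym9999") == lookup_dict("sym9999"))
print(slow > fast)
```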
The current compiler code is very difficult to analyze and optimize
because of its high coupling and fogginess.
After the refactoring (or during it) this will become much easier.
> Similarly
> separation of delayed typing from immediate typing and error reporting
> as separate pass - this is a way to make compilation of most common
> cases (code not requiring hard type inference and without errors)
> fast.
It is necessary to introduce the concept of «speculative compilation».
Errors need to be hidden only during speculative compilation. The
presence of a «silent mode» only complicates the compiler code. This
technique is very difficult for other programmers to understand. I
spent a lot of time understanding this silent mode, and I found many
bugs in its implementation.
At the very least, we should reconsider it.
Two passes are an obvious harm. The two-pass scheme is poorly
compatible with macros.
Macros should always work in normal mode, and they should not have to
take into account the problems of error-message handling (including
those which appear during speculative typing).
I think that by rewriting the compiler from scratch with performance
in mind, we will make a much faster compiler that is free of hacks.
> It will also require touching lots of code (e.g. MType to operate
> needs Solver object, so either you pass it in constructor or pass it
> in every method you call, etc.
Good example! TyVar (from which MType is derived) already has a
reference to the Manager (ManagerClass). TyVar gets a reference to the
Solver through the Manager. This is a big design bug! It is necessary
to store a direct reference to the Solver in TyVar. That alone will
improve performance.
It would be better to think about how to optimize the type graph and
the Solver.
By the way, it is necessary to add the Location to TyVar. Today we
simply lose the location information for type annotations.
> Some of it could potentially be
> simplified by macros, e.g. one which change every call of given method
> to another method with additional argument passed from current
> context: mtype.Require(ty) would be rewritten by macro to
> mtype.Require(solver, ty))
There is simply no need for that. In such cases it is better not to
hide reality from the programmer.
> If you do this you might as well eliminte Preparser I guess. It was
> created mainly to group code parts into {} () [] chunks, which could
> be then used in macros. If you don't constrain yourself to those
> groups and just pass something like TokenStream to macros, then I
> think you don't need preparse stage.
> I think this is what you are planning, since with macros able to kick
> in at lexer stage you will be able to move most of the parsing into
> macros.
I think we can treat the PreParser as lexer macros which emit a
processed stream of tokens. If we want to support Python-like syntax,
this feature is just as necessary for us.
DSLs incompatible with Nemerle syntax can be expressed in the form of
lexer macros.
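A hypothetical sketch (in Python, not the actual Nemerle pipeline) of what such a lexer macro could look like: a transformer over a token stream that rewrites Python-like indentation into explicit "{" / "}" tokens, so that later stages see ordinary brace-delimited blocks. All names here are illustrative assumptions:

```python
def layout_macro(lines):
    """Turn indentation changes into '{' and '}' tokens plus the words."""
    indents = [0]   # stack of currently open indentation levels
    tokens = []
    for line in lines:
        stripped = line.lstrip(" ")
        if not stripped:
            continue                      # skip blank lines
        indent = len(line) - len(stripped)
        if indent > indents[-1]:          # deeper indent opens a block
            indents.append(indent)
            tokens.append("{")
        while indent < indents[-1]:       # shallower indent closes blocks
            indents.pop()
            tokens.append("}")
        tokens.extend(stripped.split())
    while len(indents) > 1:               # close blocks still open at EOF
        indents.pop()
        tokens.append("}")
    return tokens

src = [
    "if cond",
    "    doA",
    "    doB",
    "done",
]
print(layout_macro(src))  # → ['if', 'cond', '{', 'doA', 'doB', '}', 'done']
```

A real lexer macro would of course operate on token objects with locations rather than on strings; the point is only that the transformation is a plain stream-to-stream function.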
> > • Support macros in pattern matching patterns.
> Oh, this is kind of strange feature, but why not... Any example
> reasoning why this could be useful?
Pattern matching on XML. For example:
| <a><b>$x</b></a> => doSomething(x)
or
| /a/b/$x => doSomething(x)
> • Make macro syntax similar to EBNF.
> This is going to be a hard at implementation stage (you will need to
> come up with parser for general grammars and additionally make it very
> dynamic to modifications - e.g. every time you import some namespace
> you need to rebuild the parser). I'm not sure how you would like to
> make it work with macros processing arbitrary token stream, e.g. EBNF
> rules specified by user will definitely contain things like:
>
> E1 => A Arbitrary1 C
> E2 => A Arbitrary2 D
>
> and even if normally you could decide on E1 vs E2 rule looking ahead
> at C vs D, now you won't be able, since you need to choose between
> Arbitrary1 and Arbitraty2 - i.e. you can't look ahead of any arbitrary
> (macro eaten) stream because you cannot know where does it end.
> Adding full EBNF support is just a huge task (people writing stuff
> like ANTLR eaten years developing it)
I have some ideas about this. For dynamic construction of the parser
(a parser which merges a number of grammars at runtime) it is possible
to use the algorithm known as the "Earley parser":
http://en.wikipedia.org/wiki/Earley_parser
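A minimal Earley recognizer sketch (in Python, purely illustrative: grammars are plain dicts that could be merged at runtime, and all performance concerns are ignored):

```python
def earley_recognize(grammar, start, tokens):
    """grammar: dict nonterminal -> list of right-hand sides (tuples).
    A symbol is a nonterminal (a grammar key) or a terminal (anything else).
    Returns True iff tokens is derivable from start."""
    # An Earley item is (head, rhs, dot, origin).
    chart = [set() for _ in range(len(tokens) + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))
    for i in range(len(tokens) + 1):
        changed = True
        while changed:                      # iterate to a fixpoint at position i
            changed = False
            for head, rhs, dot, origin in list(chart[i]):
                if dot < len(rhs):
                    sym = rhs[dot]
                    if sym in grammar:      # PREDICT: expand a nonterminal
                        for r in grammar[sym]:
                            new = (sym, r, 0, i)
                            if new not in chart[i]:
                                chart[i].add(new)
                                changed = True
                    elif i < len(tokens) and tokens[i] == sym:
                        # SCAN: consume a matching terminal
                        chart[i + 1].add((head, rhs, dot + 1, origin))
                else:                       # COMPLETE: advance waiting items
                    for h2, r2, d2, o2 in list(chart[origin]):
                        if d2 < len(r2) and r2[d2] == head:
                            new = (h2, r2, d2 + 1, o2)
                            if new not in chart[i]:
                                chart[i].add(new)
                                changed = True
    return any(h == start and d == len(r) and o == 0
               for (h, r, d, o) in chart[len(tokens)])

# Tiny expression grammar; being a plain dict, several such grammars
# could be merged at runtime before recognition starts.
grammar = {"E": [("E", "+", "T"), ("T",)], "T": [("int",)]}
print(earley_recognize(grammar, "E", ["int", "+", "int"]))
```

This is only a recognizer; a real implementation would additionally build parse trees and use the usual indexing tricks to stay fast.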
But I agree that this is a serious change which does not need to be
made at the current stage. First it is necessary to refactor the
existing compiler.
> You will probably want to get rid of notion of "keyword", e.g. "class"
> would no longer be a special keyword recognized always, but you would
> at each moment of parsing have a map of currently expected tokens,
> which might terminate current parsing rule.
Yes. But we can convert the standard keywords into macros and move
them into the global namespace.
In that case we could switch off a standard keyword with a
preprocessor directive or some similar mechanism. This would allow
building full-fledged DSLs on the basis of Nemerle.
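The idea of switchable keywords can be sketched like this (in Python, all names hypothetical): instead of a fixed keyword list baked into the lexer, the set of currently active keywords is ordinary mutable state, so a DSL can turn a standard keyword off:

```python
class Lexer:
    """Toy lexer where the keyword set is mutable state, not a constant."""
    def __init__(self, keywords):
        self.keywords = set(keywords)   # can be changed at any moment

    def classify(self, word):
        # A word is a keyword only if it is currently in the active set.
        return ("KEYWORD", word) if word in self.keywords else ("IDENT", word)

lex = Lexer({"class", "match", "if"})
print(lex.classify("class"))            # treated as a keyword...
lex.keywords.discard("class")           # ...until a DSL switches it off
print(lex.classify("class"))            # now an ordinary identifier
```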
Probably this improvement, too, can be postponed until better times.
What is important now is to make the compiler code:
1) clear (simpler and more straightforward);
2) loosely coupled;
3) fast (so as not to lose performance).