Macro Code Generator

Ilse Marseau

Aug 4, 2024, 6:16:19 PM

to bliterinem

However, you should be aware that this is merely a limitation of the C preprocessor. For example, in the Lisp family of languages, macro expansion is code generation; they're exactly the same thing. To write a macro, you write a program (in Lisp) that transforms an S-expression input into another S-expression, which is then passed to the compiler.

It's a tradeoff. Let me give an example. I stumbled on the technique of differential execution around 1985, and I think it's a really good tool for programming user interfaces. Basically, it takes simple structured programs like this:

Now, a really good way to do that would have been to write a parser for C or whatever the base language is, and then walk the parse tree, generating the code I want. (When I did it in Lisp, that part was easy.)
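To make the "walk the parse tree, generating the code" idea concrete, here is a minimal sketch in Rust. Everything in it is an assumption for illustration: the tiny Node tree and the emitted draw_label/draw_button/begin_group/end_group calls are hypothetical and are not the structured programs or the generated code from the post.

```rust
// Hypothetical miniature parse tree for a structured UI program.
// The variant names are illustrative, not taken from the original post.
pub enum Node {
    Label(String),
    Button(String),
    Group(Vec<Node>),
}

// Walk the tree recursively and append target code to `out`,
// indenting nested groups by two spaces per level.
fn emit(node: &Node, depth: usize, out: &mut String) {
    let pad = "  ".repeat(depth);
    match node {
        Node::Label(text) => out.push_str(&format!("{pad}draw_label({text:?});\n")),
        Node::Button(text) => out.push_str(&format!("{pad}draw_button({text:?});\n")),
        Node::Group(children) => {
            out.push_str(&format!("{pad}begin_group();\n"));
            for child in children {
                emit(child, depth + 1, out);
            }
            out.push_str(&format!("{pad}end_group();\n"));
        }
    }
}

// Entry point: generate the full target text for one tree.
pub fn generate(root: &Node) -> String {
    let mut out = String::new();
    emit(root, 0, &mut out);
    out
}
```

Calling generate on a Group holding a Label and a Button yields a begin_group()/end_group() pair with the two indented draw calls in between. The hard part the post complains about, getting the tree in the first place, is exactly what this sketch skips.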

However, when I do this in C#, somebody in their wisdom decided macros were bad, and I don't want to write a C# parser, so I gotta do the code generation by hand. This is a royal pain, but it's still worth it compared to the usual way of coding these things.

Okay, we all know why macros are evil and why no one likes them. What I would like to present are reasons why I think Kotlin should have them anyway. Not as a tool for everyday use, perhaps, but as a set of high-powered tools for when nothing else works. I would implement them the same way it is done in Nemerle (a little-known, fantastic language for .NET): as compiler plug-ins. The IDE should be able to expand every macro to show the generated code. The principal reasons for macro systems are:

(2) There are some very good use cases. A macro could log in to a relational database at compile time and check your SQL queries to see whether all tables and fields really exist. Also, writing most design patterns could be automated via macros. I know an IDE can automate design patterns as well, but when it does so it generates code, which increases LOC and boilerplate. Macros, on the other hand, generate code at compile time, so it is not as visible (but you could still expand the macro if you want to know what it outputs). The cancer that is killing Java is not the lack of any specific feature but boilerplate, so I am all for radical steps to prevent boilerplate.

I would like to generate no_std Rust code (instead of C), ideally using Rust (instead of Python). The generated code consists of algorithms (mostly multiplication of small dense matrices, for which I could use ndarray/nalgebra or roll my own implementation), with some data-specific parts. The approach I have followed so far would work using Rust as well. My question is:

Is there a better way? Should I invest time learning Rust macros or other tools (does that make sense in my particular case)?

Rust macros are a convenient way to, from within a Rust program, ask for code generation incorporated into that program. If you have a tool that is generating the entire program, then there is much less reason to use macros.
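As a small illustration of asking for code generation from within a program, here is a sketch using a declarative macro (macro_rules!). The Dim trait, the define_dims! macro, and the type names are all hypothetical, made up for this example.

```rust
// Hypothetical trait: each implementing type records a fixed dimension.
pub trait Dim {
    const N: usize;
}

// Declarative macro that expands, at compile time, into one unit struct
// and one `Dim` impl per `Name => value` pair it is given.
macro_rules! define_dims {
    ($($name:ident => $n:expr),* $(,)?) => {
        $(
            pub struct $name;

            impl Dim for $name {
                const N: usize = $n;
            }
        )*
    };
}

// One line of input asks for two struct definitions and two impls.
define_dims!(Two => 2, Three => 3);
```

After expansion, Two::N and Three::N are ordinary constants. This only pays off when the surrounding program is hand-written; if an external tool already emits the whole program, it can just as well emit the expanded code directly.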

However, some of the tools used for Rust macros may be useful for your code generation. For example, quote can build up programs from fragments, while ensuring that there are no lexical errors of the sort string concatenation can produce.

where SomeType is a generated type that implements a trait whose functions are the data-specific numerical functions your generator generated. (Or in other kinds of problems, the thing to pass to library::main might be a static variable containing a data table.)

This way, as much of the code as possible is normal Rust code in a library, which can be maintained using all of the usual tooling, and problems which involve tinkering with the generator and rerunning it are kept to a minimum.
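A minimal sketch of what such a split might look like, using only core-language features so it would also fit no_std. The trait name Kernel, its apply function, and the driver run (standing in for library::main) are assumptions for illustration; only SomeType is a name taken from the post.

```rust
// --- Hand-written library side (ordinary, maintainable Rust) ---

/// Hypothetical trait: the data-specific numerical step.
pub trait Kernel {
    /// Multiply a fixed, data-specific 2x2 matrix by `x`.
    fn apply(x: [f64; 2]) -> [f64; 2];
}

/// Generic driver; plays the role of `library::main` in the post.
pub fn run<K: Kernel>(x0: [f64; 2], steps: usize) -> [f64; 2] {
    let mut x = x0;
    for _ in 0..steps {
        x = K::apply(x);
    }
    x
}

// --- Generator output side (the only part a generator would write) ---

/// A type carrying no data; it exists only to select the impl below.
pub struct SomeType;

impl Kernel for SomeType {
    fn apply(x: [f64; 2]) -> [f64; 2] {
        // Unrolled product with the matrix [[2, 0], [1, 1]];
        // the term for the zero entry is simply left out.
        [2.0 * x[0], x[0] + x[1]]
    }
}
```

With this layout, run::<SomeType>([1.0, 1.0], 2) applies the generated step twice, and regenerating for a new data set only replaces the small impl block.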

Just for the sake of my understanding, consider how I used to do things in the C/Python scenario. To keep the compiled-code size as small as possible, I basically generated code without branches (e.g., no if-else) using conditional compilation. In C, it was relatively straightforward using #ifdef. That is, depending on the special structure of the data set, I could generate a few #defines, and this would compile only the code relevant to handling the data set in question.

In Rust, this could be roughly replicated using features, right? That is, instead of the 3 points you suggested, the process would be something like this

It may be worth at least investigating the ease and output quality you get by using Rust traits and generics, since they seem to match up logically with replacing parts of a template, and the result is a bit nicer to actually edit. They mostly get talked about as a way to swap out methods, but you can also use associated constants.

If you need to generate and use fancy long byte arrays for those constants, you can use include_bytes! to include data on disk, or even write your own procedural macro to instead generate the values on the fly.
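Here is a rough sketch of the associated-constants idea. The Problem trait, its DIAGONAL and A constants, and the two example types are hypothetical. The branch on P::DIAGONAL is a compile-time constant for each instantiation, so the optimizer can typically drop the unused arm, though that is an optimization and not something the language guarantees.

```rust
/// Hypothetical description of one data set, expressed as constants.
pub trait Problem {
    /// True when the matrix is diagonal, so off-diagonal work can be skipped.
    const DIAGONAL: bool;
    /// Row-major entries of a 2x2 matrix.
    const A: [f64; 4];
}

/// Generic matrix-vector product, written once for every `Problem`.
pub fn mat_vec<P: Problem>(x: [f64; 2]) -> [f64; 2] {
    if P::DIAGONAL {
        // Only the diagonal entries contribute.
        [P::A[0] * x[0], P::A[3] * x[1]]
    } else {
        [
            P::A[0] * x[0] + P::A[1] * x[1],
            P::A[2] * x[0] + P::A[3] * x[1],
        ]
    }
}

/// Example data set with diagonal structure.
pub struct Diag;

impl Problem for Diag {
    const DIAGONAL: bool = true;
    const A: [f64; 4] = [2.0, 0.0, 0.0, 3.0];
}

/// Example data set with no special structure.
pub struct Dense;

impl Problem for Dense {
    const DIAGONAL: bool = false;
    const A: [f64; 4] = [1.0, 2.0, 3.0, 4.0];
}
```

Both mat_vec::<Diag> and mat_vec::<Dense> are type-checked from the same source, which is the main difference from selecting code with the preprocessor.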

In Rust, this could be roughly replicated using features, right?

When editing a library that uses features, it is very easy to make a change that stops it compiling under some particular combination of features, and not notice. If you made heavy use of features, you would want to write a test script that builds and tests the library under many combinations of features.
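For reference, the feature-based equivalent of the #ifdef approach might look like the sketch below. The feature name "diagonal" and the function name are hypothetical; the feature would also have to be declared in the [features] table of Cargo.toml.

```rust
// Compiled only when the (hypothetical) "diagonal" feature is enabled.
#[cfg(feature = "diagonal")]
pub fn mat_vec_cfg(a: [f64; 4], x: [f64; 2]) -> [f64; 2] {
    // Diagonal data: off-diagonal terms are never compiled in.
    [a[0] * x[0], a[3] * x[1]]
}

// Compiled only when the feature is absent. In any single build the
// compiler sees exactly one of these two bodies, so an error in the
// other one goes unnoticed until that configuration is built.
#[cfg(not(feature = "diagonal"))]
pub fn mat_vec_cfg(a: [f64; 4], x: [f64; 2]) -> [f64; 2] {
    [a[0] * x[0] + a[1] * x[1], a[2] * x[0] + a[3] * x[1]]
}
```

A test script for this would run the build and tests at least twice, for example once with plain cargo test and once with cargo test --features diagonal, and the number of runs grows quickly as features are added.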

If you instead write ordinary generic code, the compiler can and will check that it is valid for all possible instantiations (though of course, that does not prevent there possibly being a bug that is detectable at run time only).

I am unfortunately not yet proficient in writing code using traits/generics. But in principle, in my particular case, it could be possible to write most of my code using traits and generic types. The idea would be as follows:

GATs are new enough (on stable) that I've not yet used them really at all, so I don't have a good feel on where you can use them other than those examples, but certainly I've heard they can be used for some really fancy stuff.

That isn't quite what I meant. I was suggesting your code generator could generate a type (possibly one carrying no data itself) and accompanying trait implementation (whose functions can have whatever bodies the generator wants).
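On the generator side, a minimal sketch could be a plain function that returns the source text for such a type and its impl. It assumes a hand-written trait named Kernel with a function apply exists in the library; both names, and generate_kernel itself, are hypothetical. It uses string formatting only, so it offers none of the lexical safety mentioned earlier for quote.

```rust
/// Emit Rust source for a unit struct plus a `Kernel` impl whose body is
/// the unrolled product of the row-major 2x2 matrix `a` with `x`.
/// Assumes finite coefficients (NaN or infinity would not print as literals).
pub fn generate_kernel(type_name: &str, a: [f64; 4]) -> String {
    // Build one output row, leaving out terms whose coefficient is zero.
    let row = |c0: f64, c1: f64| -> String {
        let mut terms = Vec::new();
        if c0 != 0.0 {
            // `{:?}` keeps the decimal point, e.g. "2.0" rather than "2".
            terms.push(format!("{c0:?} * x[0]"));
        }
        if c1 != 0.0 {
            terms.push(format!("{c1:?} * x[1]"));
        }
        if terms.is_empty() {
            "0.0".to_string()
        } else {
            terms.join(" + ")
        }
    };

    format!(
        "pub struct {name};\n\nimpl Kernel for {name} {{\n    fn apply(x: [f64; 2]) -> [f64; 2] {{\n        [{r0}, {r1}]\n    }}\n}}\n",
        name = type_name,
        r0 = row(a[0], a[1]),
        r1 = row(a[2], a[3]),
    )
}
```

The returned string could be written to a file by a build script or a separate tool and then compiled alongside the hand-written library; which of those fits best depends on the project.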

Thanks a lot for your input and time. It really enlightened me. I will try to implement something simple using your suggestions. I might come back to this thread in a few weeks in case I have more questions.

Cheers!

It's one thing to iterate and improve on a solution, but I'm not here to rehash solved problems. Especially not problems I've already solved. (Side note, that's why I love a healthy, high quality open source ecosystem.)

Copy and pasting code means you get code drift and conceptual duplication, which is problematic. It also means if you find a bug in that copy pasted code, congratulations, that bug now exists in multiple places. I hope you find them all.

Tradeoffs here include less immediately accessible readability, and increased compile time, but used judiciously this can be a worthwhile trade. If you're not generating hex docs for your team's usage, the accessibility drops off even more dramatically.

Code generators as they are commonly used in Elixir don't attempt to update all the instances where you generated code en masse. This means if you want to make an improvement anywhere you're having to find all the places this was used manually, and update it manually. You're basically back to copy/paste, but possibly more convenient.

Future exploration here would include using the AST to compare the older generated code with the potential newly generated code, using a framework that generates the migrations and applies them appropriately across the entire codebase automatically. This would be a bigger win than both the macros and the code generators.

Hypothetically, a generator framework could be used to support either macros or generated code, as described via templating. Win/win. It could run the code version migrations everywhere the old code version matches across the whole codebase, until a generator version is marked deprecated, at which point it is never used.

The existing converter is very bad. It looks like a macro generator program that produces pretty awful HTML code. So there are two questions. The existing code output is compliant with HTML 4.0; will this be changed to HTML 5? And has the code generator been modified to produce better HTML?

Now, suppose we want to customize the f1/1 function. The way to do it is to edit the source code directly, but the problem with macros that generate code is that there is no source code for us to edit! However, this is where the special features in the CodeGen module become useful. Remember we have defined a number of named blocks. First, we can query the module to see which block names are available (of course, the author of the template module should make that clear in the documentation, but querying the block names is always helpful):

This CodeGen module is useful in any situation where you want to put some code in a module without adding literal code to the file, but in which you think you might have to customize some of the functions by editing the source code.

The idea is precisely not to have to pass optional arguments, or even to customize what the default value of the parameter is! This is precisely the case with my PhoenixComponents example. One may want to specify a default width for form fields (i.e., set a default value dynamically at compile time) and allow the user to override it on a case-by-case basis (by making it an optional parameter in the component).

What is it that you say can be achieved with functions only? The end result of useing CodeGen is to add functions to the current module without the user having to spell them out (just like use GenServer does, for example). That has to be done with metaprogramming, AFAIK.