A path to "Hello, World!" for the bootstrap Rune compiler

Bill Cox

Jun 30, 2023, 6:08:27 PM
to Rune language discussion
So far, we've built out each module of Rune's bootstrap compiler (rewriting Rune in Rune) to a pretty good level of detail before moving to the next phase.  I would like to propose changing that: instead, build out just barely enough functionality in each of several phases to enable the new Rune bootstrap compiler to compile the tests/helloworld.rn program.  For example, instead of finishing the HIR builder in full, we can begin making progress on the next modules after simply building support for print statements capable of printing strings.

I did this for the original C Rune compiler.  It allowed me to spend a random hour here and there adding functionality to all the different phases over time, while always having a working compiler.  Having a Hello World capable compiler will unblock progress on all the phases, and contributors can then feel free to work on any phase they like.

The bootstrap compiler phases should include:
  • What we have now: lexing, PEG-parsing, HIR-building.
  • Loading additional syntax files for DSL extensions.
  • Traversing top-level statements to find import/use statements, and parsing these dependencies, including building HIR for them.  We can skip this for now.
  • Executing transform and relation statements, which do many things to the HIR, such as adding linked lists between classes and lowering DSL parse-tree extensions to HIR.  We can skip this for now.
  • Discovering variables and class data members, which we can skip for now.
  • Semantic checking (skip for now).
  • Build out the datatype module.  Just support for strings is enough for now.
  • Type inference, where we should do just the minimum for now:
    • Write a function to queue the top level function, which is the auto-generated "main" function.
    • Create an inference engine that can propagate type information in any direction, even between statements.
    • Needs to deal with functions taking no parameters that return nothing (the None type).
    • Should be able to infer the type of a string constant.
    • Needs to deal with println parameters.
    • We can skip support for type constraints for now.
  • Post-processing of bound signatures.
    • For now, just add missing return statements, without reachability analysis.  The helloworld sub-function of main is not legal without a return statement.
  • Add memory management (skip for now)
    • For "primary" classes, create a structure containing arrays for data members of all class objects it owns.
    • We'll just create the global primary class structure for now.
    • Generate allocate/free functions per class (not template), and add calls to constructors/destructors.
  • Inlining iterators (skip for now).
  • Lowering to LIR (Low-level IR) pass.
    • For now, assume we'll use the HIR data structures and do the lowering in-place.
    • Println post-processing needs to be done to determine a format string.
    • Convert println statements to calls to runtime_sprintf, a C function in runtime/io.c.
    • Explode expressions into statements that perform only a single operation (skip for now).
    • Convert statements that operate on dynamic arrays to calls into the runtime (skip for now).
    • Convert integer types wider than 64 bits to runtime.BigintArray, and change operators to runtime calls (skip for now).
    • When using the C code generator backend, convert integer types of at most 64 bits to C-compatible widths.
    • Add temp array allocation and freeing around statements that generate temporary array results.
    • Note that big integers and strings are also just dynamic arrays.  All Rune heap data is stored in dynamic arrays.
  • Generate C code from the LIR.
    • For now, only needs to support calling runtime builtin functions like runtime_sprintf.
    • Later on, we should add debug annotation so that gdb can step through the original Rune files.
  • Generating LLVM IR from the LIR (can skip for now)
    • Doing this well is unfortunately a career in itself, as we have to duplicate the huge number of optimizations performed in Clang's front-end such as vectorization and loop unrolling.
  • Upgrading the runtime library (skip for now)
    • We need ultra-fast constant and non-constant operations on Bigints.  The CTTK library works, but is too slow as-is.
      • Either upgrade CTTK, or link in additional libraries for this.
    • Upgrade dynamic array allocation/free to be much faster (I wrote a doc on this some time back).  Right now, it just calls calloc/free.
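For the type-inference item, here is a minimal C sketch of an engine that propagates type information in any direction. It uses union-find unification, a common technique that the post does not prescribe, and every name in it (`TypeKind`, `newVar`, `setType`, `unify`, `typeOf`) is hypothetical rather than part of the Rune bootstrap compiler. Each expression, variable, or return slot gets a type variable; once two slots are unified, a type learned for either one is visible from both, regardless of which statement was visited first.

```c
#include <assert.h>

/* Hypothetical: the only datatypes the hello-world milestone needs. */
typedef enum { TYPE_UNKNOWN, TYPE_NONE, TYPE_STRING } TypeKind;

#define MAX_VARS 256

/* One type variable per expression, variable, parameter, or return slot.
   Slots that must share a type are merged into one set (union-find), so a
   type learned anywhere reaches every member of the set. */
static int parent[MAX_VARS];
static TypeKind kind[MAX_VARS];  /* meaningful only at set roots */
static int numVars = 0;

static int newVar(void) {
  assert(numVars < MAX_VARS);
  parent[numVars] = numVars;
  kind[numVars] = TYPE_UNKNOWN;
  return numVars++;
}

static int findRoot(int v) {
  while (parent[v] != v) {
    parent[v] = parent[parent[v]];  /* path halving keeps trees shallow */
    v = parent[v];
  }
  return v;
}

/* Record a known type, e.g. TYPE_STRING for a string constant or TYPE_NONE
   for the return slot of a function that returns nothing.
   Returns 0 on a conflict with what is already known. */
static int setType(int v, TypeKind k) {
  int r = findRoot(v);
  if (kind[r] != TYPE_UNKNOWN && kind[r] != k) return 0;
  kind[r] = k;
  return 1;
}

/* Require two slots to have the same type. Returns 0 on a conflict. */
static int unify(int a, int b) {
  int ra = findRoot(a), rb = findRoot(b);
  if (ra == rb) return 1;
  if (kind[ra] != TYPE_UNKNOWN && kind[rb] != TYPE_UNKNOWN && kind[ra] != kind[rb])
    return 0;
  if (kind[ra] == TYPE_UNKNOWN) kind[ra] = kind[rb];
  parent[rb] = ra;
  return 1;
}

static TypeKind typeOf(int v) { return kind[findRoot(v)]; }
```

For an assignment of a string constant to a variable `s` followed by a println of `s`, the compiler would unify the println argument with `s` and `s` with the constant, in whatever order it reaches them, and the argument ends up typed as a string either way.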
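The missing-return item can be handled without reachability analysis by looking only at the last statement of each function body. A minimal sketch, assuming a hypothetical flat statement list (`Block`, `StmtKind`, `addMissingReturn`) in place of the real HIR, and covering only functions that return None:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, minimal stand-in for HIR: a function body as a growable
   list of statement kinds. */
typedef enum { STMT_PRINTLN, STMT_CALL, STMT_RETURN } StmtKind;

typedef struct {
  StmtKind *stmts;
  size_t count;
  size_t capacity;
} Block;

static void appendStmt(Block *block, StmtKind kind) {
  if (block->count == block->capacity) {
    size_t newCap = block->capacity ? 2 * block->capacity : 4;
    StmtKind *p = realloc(block->stmts, newCap * sizeof *p);
    if (p == NULL) abort();
    block->stmts = p;
    block->capacity = newCap;
  }
  block->stmts[block->count++] = kind;
}

/* No reachability analysis: if the body does not already end in a return,
   append a bare return (valid for functions returning None, the only case
   hello world needs). Returns 1 if a statement was added. */
static int addMissingReturn(Block *body) {
  assert(body != NULL);
  if (body->count > 0 && body->stmts[body->count - 1] == STMT_RETURN) return 0;
  appendStmt(body, STMT_RETURN);
  return 1;
}
```

If a body already returns on every path through some earlier branch, the appended `return` is merely unreachable, which is harmless until reachability analysis exists.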
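To make the "primary class" memory layout more concrete, below is a hand-written guess at what generated C might look like for a hypothetical class `Point` with two `i64` data members. The names (`RootStruct`, `Point_alloc`, `Point_free`, `Point_new`) and the free-list reuse policy are assumptions for illustration, not the compiler's actual output. Objects are 32-bit indices into one array per data member, and each class gets its own allocate/free functions rather than a shared template.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical generated code for a Rune class Point with members x, y (i64).
   An object is a 32-bit index into parallel per-member arrays. */
typedef uint32_t Point;

/* The single global primary-class structure: one array per data member. */
struct RootStruct {
  int64_t *Point_x;
  int64_t *Point_y;
  uint32_t *Point_freeList;  /* indices of freed objects, reused first */
  uint32_t Point_numFree;
  uint32_t Point_used;       /* slots handed out so far */
  uint32_t Point_capacity;
};

static struct RootStruct root;  /* zero-initialized: no objects yet */

/* Grow every member array together so an index is valid in all of them. */
static void Point_grow(void) {
  uint32_t newCap = root.Point_capacity ? 2 * root.Point_capacity : 16;
  int64_t *x = realloc(root.Point_x, newCap * sizeof *x);
  if (x == NULL) abort();
  root.Point_x = x;
  int64_t *y = realloc(root.Point_y, newCap * sizeof *y);
  if (y == NULL) abort();
  root.Point_y = y;
  uint32_t *f = realloc(root.Point_freeList, newCap * sizeof *f);
  if (f == NULL) abort();
  root.Point_freeList = f;
  root.Point_capacity = newCap;
}

/* Per-class allocator: reuse a freed slot if possible, otherwise take the
   next fresh index. Members start zeroed. */
static Point Point_alloc(void) {
  Point p;
  if (root.Point_numFree > 0) {
    p = root.Point_freeList[--root.Point_numFree];
  } else {
    if (root.Point_used == root.Point_capacity) Point_grow();
    p = root.Point_used++;
  }
  root.Point_x[p] = 0;
  root.Point_y[p] = 0;
  return p;
}

/* Per-class free; the compiler would insert a destructor call before it. */
static void Point_free(Point p) {
  assert(p < root.Point_used);
  root.Point_freeList[root.Point_numFree++] = p;
}

/* The compiler inserts the constructor body right after allocation. */
static Point Point_new(int64_t x, int64_t y) {
  Point p = Point_alloc();
  root.Point_x[p] = x;
  root.Point_y[p] = y;
  return p;
}
```

Because the free list never holds more entries than slots handed out, it can share the member arrays' capacity.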
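For the println post-processing step, one plausible way to derive the format string is sketched below. Constant string arguments are folded into the format itself (escaping `%`), and non-constant strings become `%s` directives. The directive syntax that `runtime_sprintf` actually accepts is not described in the post, so `%s` and `%%` are assumptions here, as are the hypothetical `PrintArg` and `buildPrintlnFormat`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical println argument after type inference. */
typedef enum { ARG_CONST_STRING, ARG_STRING } ArgKind;

typedef struct {
  ArgKind kind;
  const char *text;  /* constant text, used only for ARG_CONST_STRING */
} PrintArg;

/* Append s at *pos, keeping buf NUL-terminated. Returns 0 if it won't fit. */
static int appendText(char *buf, size_t bufSize, size_t *pos, const char *s) {
  size_t len = strlen(s);
  if (*pos + len + 1 > bufSize) return 0;
  memcpy(buf + *pos, s, len + 1);  /* copies the NUL too */
  *pos += len;
  return 1;
}

/* Build a printf-style format string for one println statement: constant
   strings are folded in with '%' escaped as "%%", other string arguments
   become "%s", and a trailing newline is added.
   Returns 1 on success, 0 if buf is too small. */
static int buildPrintlnFormat(const PrintArg *args, size_t numArgs,
                              char *buf, size_t bufSize) {
  size_t pos = 0;
  assert(args != NULL || numArgs == 0);
  if (bufSize == 0) return 0;
  buf[0] = '\0';
  for (size_t i = 0; i < numArgs; i++) {
    if (args[i].kind == ARG_CONST_STRING) {
      for (const char *c = args[i].text; *c != '\0'; c++) {
        char one[2] = {*c, '\0'};
        if (!appendText(buf, bufSize, &pos, *c == '%' ? "%%" : one)) return 0;
      }
    } else if (!appendText(buf, bufSize, &pos, "%s")) {
      return 0;
    }
  }
  return appendText(buf, bufSize, &pos, "\n");
}
```

With this approach the hello-world statement needs no runtime arguments at all: its format string is just `Hello, World!` followed by a newline.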
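Exploding expressions into statements that each perform a single operation is essentially conversion to three-address code. The sketch below uses a hypothetical `Expr` tree and emits text in place of real HIR statements; a post-order walk introduces one temporary per operator, so operands are always computed before the operation that uses them.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical expression tree: leaves name variables, inner nodes are
   binary operators. */
typedef struct Expr {
  char op;                          /* '\0' for a leaf, else '+', '-', '*', ... */
  const char *name;                 /* leaf variable name */
  const struct Expr *left, *right;  /* operands of an inner node */
} Expr;

/* Collects the emitted single-operation statements as text. */
typedef struct {
  char text[1024];
  size_t len;
  int nextTemp;
} Emitter;

/* Post-order walk: emit operands first, then one statement for this node.
   Writes the name holding e's value (a variable or new temp) into result. */
static void explode(const Expr *e, Emitter *out, char *result, size_t resultSize) {
  assert(e != NULL);
  if (e->op == '\0') {
    snprintf(result, resultSize, "%s", e->name);
    return;
  }
  char lhs[32], rhs[32];
  explode(e->left, out, lhs, sizeof lhs);
  explode(e->right, out, rhs, sizeof rhs);
  snprintf(result, resultSize, "t%d", out->nextTemp++);
  size_t room = sizeof out->text - out->len;
  int n = snprintf(out->text + out->len, room, "%s = %s %c %s\n",
                   result, lhs, e->op, rhs);
  if (n < 0 || (size_t)n >= room) abort();  /* sketch: fixed-size buffer */
  out->len += (size_t)n;
}
```

For `a + b * c`, this emits `t0 = b * c` followed by `t1 = a + t0`, and reports `t1` as the name holding the result.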
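When targeting C, an integer type of at most 64 bits has to be stored in one of C's fixed-width types. A minimal sketch, assuming a Rune integer type can be described by its bit width and a signedness flag (the function name `cIntegerType` is hypothetical), that rounds up to the smallest `<stdint.h>` type:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Map a Rune integer type of the given bit width and signedness (e.g. u5,
   i33) to the smallest C <stdint.h> type that can hold every value.
   Returns NULL for widths above 64, which are lowered to runtime bigint
   calls instead. */
static const char *cIntegerType(unsigned width, bool isSigned) {
  assert(width > 0);
  if (width > 64) return NULL;
  if (width <= 8) return isSigned ? "int8_t" : "uint8_t";
  if (width <= 16) return isSigned ? "int16_t" : "uint16_t";
  if (width <= 32) return isSigned ? "int32_t" : "uint32_t";
  return isSigned ? "int64_t" : "uint64_t";
}
```

This only picks the storage type. Arithmetic on an odd width such as `u5` still needs extra generated code (overflow checks or masking) to keep results within the declared range.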
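The temporary-array item can be illustrated by hand-lowering a statement that prints the concatenation of three strings. This sketch assumes a hypothetical `ByteArray` stand-in for Rune's dynamic arrays and hypothetical helpers `arrayConcat` and `arrayFree`; allocation uses `calloc`, matching the note that the runtime currently just calls calloc/free. Each intermediate result gets a temporary, and every temporary is freed once the statement finishes. To keep the sketch checkable, the "print" copies into a caller-supplied buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a Rune dynamic array of bytes. */
typedef struct {
  uint8_t *data;
  size_t len;
} ByteArray;

/* Hypothetical runtime helper: a new array holding a followed by b. */
static ByteArray arrayConcat(ByteArray a, ByteArray b) {
  ByteArray r;
  r.len = a.len + b.len;
  r.data = calloc(r.len ? r.len : 1, 1);
  if (r.data == NULL) abort();
  if (a.len > 0) memcpy(r.data, a.data, a.len);
  if (b.len > 0) memcpy(r.data + a.len, b.data, b.len);
  return r;
}

static void arrayFree(ByteArray *a) {
  free(a->data);
  a->data = NULL;
  a->len = 0;
}

/* Hand-lowered form of printing x + y + z. The "print" copies into buf
   (NUL-terminated, truncated if needed) and returns the bytes copied. */
static size_t loweredPrintln(ByteArray x, ByteArray y, ByteArray z,
                             char *buf, size_t bufSize) {
  assert(bufSize > 0);
  ByteArray tmp0 = arrayConcat(x, y);     /* temp for x + y */
  ByteArray tmp1 = arrayConcat(tmp0, z);  /* temp for (x + y) + z */
  size_t n = tmp1.len < bufSize - 1 ? tmp1.len : bufSize - 1;
  memcpy(buf, tmp1.data, n);
  buf[n] = '\0';
  arrayFree(&tmp1);  /* frees inserted after the statement */
  arrayFree(&tmp0);
  return n;
}
```

With `x`, `y`, and `z` holding `Hello`, `, `, and `World!`, the buffer receives `Hello, World!` and both temporaries are released before the function returns.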

I'll add some documentation about the various passes in the g3doc directory.

Bill
