Greetings from Berlin :)
2013/6/5 David Pineau <dav.p...@gmail.com>:
> Greetings, people !
>
> As you may all know, we planned to start the rewrite of the compiler in May
> or June. Unfortunately, we haven't heard from Lionel since the last
> hackathon.
>
> Anyway, I think it's time to start thinking about how we are going to
> design the new compiler. Since the current code grew out of trial and
> error, its design leaves plenty of room for improvement, and we plan to
> make the most of this opportunity.
>
> ~~~~
>
> First of all, we need to identify every single component or compilation
> step, in order to isolate them (both for testing purposes and ease of
> development). Here is the list I can identify (complete/comment it where
> you see fit):
>
> Compilation steps:
> - Parsing (Takes the code in, builds an AST)
> - Placeholder Parsing (Identifies back-end placeholders, and parses them)
> - Introspection (Identifies C/RTX declarations, annotates type trees)
Doesn't the identification (as well as the placeholder identification)
take place during parsing?
Wouldn't these 3 steps all be part of the parsing box?
> - Type Checking (Tries to fully match, i.e. qualify, rathaxes types against
> available interfaces)
Yup (and we would look up in the cache whether some of the referenced
interfaces are already compiled?)
> - Registration (Not actually a compilation step; registers
> interfaces/templates/drivers in the cache)
> - Generation (Takes the parsed driver, generates the C code for it)
For the generation I agree, but should we already think about generating
the build tooling too? (I mean the Makefile, for example)
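A minimal sketch of the "isolated steps" idea, assuming every step is a plain function that takes and returns a shared compilation context. The `Context` fields and the toy `parse`/`typecheck`/`generate` bodies are hypothetical stand-ins, not the real rathaxes logic. Parsing, placeholder parsing and introspection are folded into one parsing box as suggested above, and registration is left out since it is not a compilation step:

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Context:
    """Hypothetical state handed from one compilation step to the next."""
    source: str
    ast: list = field(default_factory=list)
    errors: list = field(default_factory=list)
    output: str = ""


Step = Callable[[Context], Context]


def parse(ctx: Context) -> Context:
    # Toy parser: one "node" per non-empty line of source.
    ctx.ast = [line.strip() for line in ctx.source.splitlines() if line.strip()]
    return ctx


def typecheck(ctx: Context) -> Context:
    # Toy check: reject any node mentioning a type we cannot resolve.
    for node in ctx.ast:
        if "unknown_t" in node:
            ctx.errors.append(f"unresolved type in: {node}")
    return ctx


def generate(ctx: Context) -> Context:
    # Toy generator: emit each node as a C comment.
    ctx.output = "\n".join(f"/* {node} */" for node in ctx.ast)
    return ctx


PIPELINE: List[Step] = [parse, typecheck, generate]


def compile_source(source: str, steps: List[Step] = PIPELINE) -> Context:
    ctx = Context(source)
    for step in steps:
        ctx = step(ctx)
        if ctx.errors:
            break  # stop at the first step that reports errors
    return ctx
```

Since a step only sees the context, it can be tested alone, e.g. calling `parse` on a fresh `Context` without running the rest of the pipeline.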
>
> Components:
> - Tree structure (AST nodes definitions)
> - Parser
> - Cache (registers and loads a bunch of info about the saved interfaces and
> templates)
> - Linker (Uses the cache to relate templates for any other component that
> needs it)
> - Type checker (compilation step; ensures that the types are consistent
> with registered interfaces)
> - Resolver
Didn't you forget the structure representing an interface, too? Maybe
something more "functionally aware" than an AST would be simpler to
manipulate. I imagine this "class" as a wrapper over an AST.
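One possible shape for that wrapper, as a sketch: `Node` stands in for whatever AST node type the new parser produces, and the `"interface"`, `"type"` and `"sequence"` node kinds are assumptions made for the example. The wrapper only answers questions (name, declared types, provided sequences) and never hands the raw tree to callers:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Stand-in for the parser's AST node type."""
    kind: str
    name: str = ""
    children: List["Node"] = field(default_factory=list)


class Interface:
    """Read-only, "functionally aware" view over an interface's AST."""

    def __init__(self, ast: Node):
        if ast.kind != "interface":
            raise ValueError(f"expected an interface node, got {ast.kind!r}")
        self._ast = ast

    @property
    def name(self) -> str:
        return self._ast.name

    def _names_of(self, kind: str) -> List[str]:
        # Names of the direct children of the given kind, in source order.
        return [child.name for child in self._ast.children if child.kind == kind]

    @property
    def types(self) -> List[str]:
        return self._names_of("type")

    @property
    def sequences(self) -> List[str]:
        return self._names_of("sequence")

    def provides(self, sequence: str) -> bool:
        return sequence in self.sequences
```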
>
> Currently unsolved issues (with their proposed solution):
> - The generation process does not properly take optional/required
> sequences into account.
> -> The solution is to rework the whole generation process (more about
> that afterwards)
> - We have no definite way to manage unknown types (kernel-defined types
> for instance) in the code of a template.
Can't cnorm parse them?
> -> A new syntax to declare a C type could serve as a placeholder, which
> would then be checked when the trees are woven together.
>
> ~~~~
>
> For now, that's all for this list. Now, allow me to describe my thoughts
> about the redesign, and how I envision things. I do not necessarily believe
> that each component I identified should translate to a class, but each one
> has to be clearly identifiable in the code.
Agreed.
>
> Code Organization:
> Currently, the code is organized by compilation step. This purely
> procedural way of organizing code is half helpful and half a hindrance to
> development.
>
> Let me elaborate on that: it is helpful because it makes it easy to find
> the code of a specific compilation step. But at the same time, it's
> harmful because when we try to add/fix/modify a feature, we may have to
> touch multiple files, while no document tells which files contain the code
> for which feature.
Can't we take inspiration from the way LLVM and Clang are organized? I'm
not competent to tell how we should do it, but maybe Lionel can help us
with that.
I think that, since we will be using Python, we should have clearly
identified blocks of features. Otherwise, it would be a nightmare to
understand the call graph.
>
> This is why I believe that, as part of the rewrite, we should completely
> reorganize our code. We were working with a purely procedural language, so
> our choices were limited. Now that we're moving on to Python, we have a
> full-fledged object-oriented language at our disposal. I want your opinion
> on this:
> ~
> Do you think it would be wise to map one object to one feature?
Or a module.
> That is, to use hooks within each compilation step to run
> feature-specific code
Yes
> where relevant. This way, we could concentrate a feature's code in one
> place, hopefully making it easier to add/fix/refactor/modify.
Sounds good.
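A sketch of the hook approach, assuming hooks are plain functions registered per step with a decorator (the `hook`/`run_hooks` helpers, the step names and the dict-based nodes are all hypothetical). Everything belonging to one feature, here a toy "log" statement, sits together in one place, while each compilation step only calls `run_hooks` at the points where features may act:

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical registry: step name -> hooks to run during that step.
_HOOKS: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)


def hook(step: str):
    """Decorator registering a feature function for a compilation step."""
    def register(fn):
        _HOOKS[step].append(fn)
        return fn
    return register


def run_hooks(step: str, node: dict) -> None:
    # Called by each compilation step wherever features may act on a node.
    for fn in _HOOKS[step]:
        fn(node)


# --- Everything about the toy "log" feature lives here, in one place. ---

@hook("typecheck")
def check_log_level(node: dict) -> None:
    if node.get("kind") == "log" and node.get("level") not in {"info", "error"}:
        raise TypeError(f"bad log level: {node.get('level')!r}")


@hook("generate")
def emit_log(node: dict) -> None:
    if node.get("kind") == "log":
        node["c_code"] = f'printk("{node["msg"]}");'
```

A decorator keeps registration right next to the feature's code; the trade-off is that hooks run in import order, so the order of features within a step would have to be made explicit if it ever matters.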
> ~
> This is one of the topics I am currently thinking about, and I would really
> appreciate your input on it.
>
>
> Code Generation:
> As I wrote in the list above, the code is currently generated in a rather
> brute-force way: we recursively pull in everything that fits into a
> placeholder (pointcut, and such). The drawback is that this does not fully
> comply with our multi-platform and flexibility goals, namely: only include
> the code that actually needs to be included.
Is it really a problem at this point?
>
> During previous hackathons we discussed it a few times, and a new
> generation model began to take shape. The basic idea is to split the
> generation process into multiple steps: first, a pre-selection of the
> templates depending on the generation configuration; then a second
> selection, based on which sequences are implemented and/or used in the
> driver's code; and finally, weaving the final code tree using only those
> selected templates (and those that they explicitly use).
>
Logical.
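The first of those steps, the pre-selection by configuration, could look like the sketch below. It assumes that a template's constraints are a mapping from configuration keys (such as `os` or `arch`) to the set of values it supports, and that a template with no constraint matches every configuration; both the `Template` record and the key names are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class Template:
    """Hypothetical template record; constraints map config keys to allowed values."""
    name: str
    constraints: dict = field(default_factory=dict)


def preselect(templates, config):
    """Keep the templates whose every constraint is satisfied by `config`.

    A key missing from `config` fails the constraint, so a template is
    never selected on the basis of information we do not have.
    """
    return [
        t for t in templates
        if all(config.get(key) in allowed for key, allowed in t.constraints.items())
    ]


TEMPLATES = [
    Template("dma_linux", {"os": {"linux"}, "arch": {"x86", "arm"}}),
    Template("dma_openbsd", {"os": {"openbsd"}}),
    Template("log_generic"),  # no constraint: always a candidate
]
```

For example, pre-selecting `TEMPLATES` for `{"os": "linux", "arch": "arm"}` keeps `dma_linux` and `log_generic`, in their original order.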
> This method is not fully designed yet, but it will rely heavily on the
> typing system (on the side of the middle-end language), as well as on the
> linker, which will have to use the cache to build the proper pre-selection
> of templates. This means that we have to take special care while designing
> the cache, the linker, and the typing system.
I know the cache is important, but in the beginning, if we decide to
reparse everything, it may not be a problem if we can't come up with a
definitive AST in the first steps. Or am I missing something?
>
> ~~~~
>
> This leads me to the last part of this long mail: in order to redesign, we
> must identify every requirement of each component (and data structure?).
> This may be the part where everyone's help is most precious.
>
> The focus here will be on the Cache and the Linker, the two central
> components. Since their roles are closely related and hard to separate,
> I'll simply list the known requirements, and we'll separate them later.
>
> What I know for sure for the Cache/Linker is:
> - We need to register a bunch of data:
> > Source files (Interfaces, templates, drivers)
> > Compiled files (idem)
> - It has to be structured/indexed:
> > By Interface (for interface search and the typing system)
> > By configuration (for pre-selection by config)
> > By Template (for explicit template selection)
> > By dependency (for the resolver to easily select what's used or not)
It will be fun... It reminds me of graph databases.
> - We need to be able to do some structured queries, for an informational
> purpose (listing the support for a given config/template for instance):
> > List the currently registered / not registered templates for a given
>   interface and a given configuration
> > List the configurations supported by the registered code for a given
>   template
> - We need to identify multiple flavors for the cache:
> > System cache (library of Interfaces, Templates and drivers stored
>   system-wide)
> > User cache (registered Interfaces, Templates and drivers stored in the
>   user's data)
>
We may want to start thinking about an actual database...
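Whatever the storage ends up being (in memory, files, or a real database), the indexes listed above can be sketched as plain dictionaries of sets. This is a hypothetical in-memory version: `Entry`, its fields and the two query methods are assumptions, and the "not registered" half of the first query is left out since it would also need the list of templates an interface expects. The reverse dependency index is the graph-like part Thomas mentions:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical cache record for one registered template."""
    template: str
    interface: str
    configs: frozenset
    depends_on: frozenset = frozenset()


class Cache:
    def __init__(self):
        self.by_template = {}                  # template name -> Entry
        self.by_interface = defaultdict(set)   # interface -> template names
        self.by_config = defaultdict(set)      # configuration -> template names
        self.dependents = defaultdict(set)     # template -> templates using it

    def register(self, entry: Entry) -> None:
        self.by_template[entry.template] = entry
        self.by_interface[entry.interface].add(entry.template)
        for config in entry.configs:
            self.by_config[config].add(entry.template)
        for dep in entry.depends_on:
            self.dependents[dep].add(entry.template)

    def templates_for(self, interface: str, config: str) -> list:
        """Registered templates implementing `interface` for `config`."""
        return sorted(self.by_interface.get(interface, set())
                      & self.by_config.get(config, set()))

    def configs_for(self, template: str) -> list:
        """Configurations supported by a registered template."""
        return sorted(self.by_template[template].configs)
```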
>
> Which leads me to ask:
> - Which functionalities belong to the Linker, and which to the Cache?
> - Is it necessary/useful/doable to separate both?
By the end of my reading, I'm not sure anymore what the cache is.
Can you remind me?
For me, it was just an optimization that avoids recompiling the back-end
files every time. If I'm right, can we start without it, and add it once
compilation is more stable?
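If the cache is mainly a way to avoid recompiling back-end files, as Thomas describes it, it can be added later as a thin layer around a compile function, which would allow starting without it. A sketch under that assumption: results are keyed on the file's path plus a SHA-256 of its content (so a modified file is recompiled), and the user cache is consulted before the system-wide one through `collections.ChainMap`. `CompileCache`, `source_key` and `compile_fn` are hypothetical names:

```python
import hashlib
from collections import ChainMap


def source_key(path: str, text: str) -> str:
    # Key on the content rather than on timestamps.
    return path + ":" + hashlib.sha256(text.encode("utf-8")).hexdigest()


class CompileCache:
    def __init__(self, system=None):
        self.user = {}                 # results compiled for this user
        self.system = system or {}     # read-only, system-wide results
        # Lookups try the user cache first, then the system cache.
        self.view = ChainMap(self.user, self.system)
        self.misses = 0

    def get(self, path: str, text: str, compile_fn):
        key = source_key(path, text)
        if key not in self.view:
            self.misses += 1
            self.user[key] = compile_fn(text)  # new results go to the user cache
        return self.view[key]
```

Without the cache, callers would simply call `compile_fn(text)` directly, so the layer can indeed be postponed until compilation is more stable.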
>
>
>
> Next is the new resolver algorithm.
> The aim is to get a finer resolution that does not include unused code in
> the generated driver. Meaning: generate only what you need/want.
>
> First step: a first, coarse selection is made on the cache by matching the
> configuration against the various templates' constraints.
>
> Second step: by using the cache's dependency system, and selecting through
> the templates and sequences used/implemented (or not) in the driver, we
> keep only what's used for the last step.
>
> Finally, the last step is the one closest to the current resolver
> algorithm, since it recursively descends through the placeholders and
> resolves them using the cache/linker. The only new part here is that it
> won't take everything in, but only what has been previously selected.
>
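The last step can be sketched as a recursive substitution that only ever descends into templates from the selected set. To make it concrete, this assumes a toy placeholder syntax, `${name}` for a required sequence and `${?name}` for an optional one, invented for the example and not the actual rathaxes syntax. A missing optional sequence is simply dropped, while a missing required one is an error, which is also one possible way to handle the optional/required issue listed earlier:

```python
import re

# Hypothetical syntax: ${name} is required, ${?name} is optional.
PLACEHOLDER = re.compile(r"\$\{(\??)(\w+)\}")
MAX_DEPTH = 32  # guard against placeholder cycles


def weave(name, templates, selected, depth=0):
    """Expand the placeholders of `name`, using only `selected` templates."""
    if depth > MAX_DEPTH:
        raise RecursionError(f"placeholder cycle near {name!r}")

    def fill(match):
        optional, target = match.group(1) == "?", match.group(2)
        if target not in selected:
            if optional:
                return ""  # unselected optional sequence: generate nothing
            raise LookupError(f"required template {target!r} was not selected")
        return weave(target, templates, selected, depth + 1)

    return PLACEHOLDER.sub(fill, templates[name])
```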
If I understand properly, the resolver matches the code from the rtx
with the code in the back-end, BUT we won't be using everything in the
back-end.
I can't see how you can detect unused code without false positives:
you will need to follow each call/reference and add what is needed. But
what is your entry point? How can you tell whether a symbol is not
required by the kernel?
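One possible answer to the entry-point question, as a sketch: treat the sequences the driver implements or uses as the roots (plus, by assumption, whatever the interfaces declare as called by the kernel, such as a probe or interrupt handler), then keep everything reachable through explicit dependencies. This is a plain breadth-first graph traversal. It errs on the side of keeping code: a false positive only means an unused template gets generated, never that a needed one is dropped, provided every use is declared explicitly. `depends_on` is a hypothetical mapping that the cache's dependency index would provide:

```python
from collections import deque


def reachable(entry_points, depends_on):
    """Return every template reachable from the entry points (breadth-first).

    `depends_on` maps a template name to the names it explicitly uses.
    """
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        current = queue.popleft()
        for dep in depends_on.get(current, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


GRAPH = {
    "probe": ["dma_map", "irq_setup"],
    "irq_setup": ["log"],
    "suspend": ["log"],  # never referenced from the "probe" entry point
}
```

With `["probe"]` as the only entry point, `suspend` is left out while `log` is kept, because `irq_setup` uses it.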
>
>
> The last part I want to elaborate on is my thoughts about how we are going
> to structure our code and how we are going to work with a new language
> that offers object-oriented features.
>
> I explained that I was thinking about how to easily add a new feature,
> since we currently need to dive into multiple files for each feature that
> we want to add, which is actually quite an efficiency sink. I want us to be
> able to add new features at a much faster rate, thanks to a new and useful
> design.
>
> Sadly, I have no miracle solution, and I feel that "one object per
> feature" is kind of a stupid mistake we must avoid at all costs. Do you
> have any proposal for this specific issue? We could have, for example,
> say... one base class per compilation step, whose feature-specific
> subclasses would be added to an internal list of features within that
> compilation step's code. Then we could group each feature's classes in
> one file, or something like that?
>
> Don't hesitate to comment on these ideas. I am a mere beginner in Python,
> and every single piece of advice is welcome, be it a guideline or design
> advice.
For Python, let's say that we work with attributes rather than
interfaces/classes.
However, I don't have your experience with parsers/compilers to answer
this question.
That's why we should look at how others have achieved this goal.
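The "one base class per compilation step" proposal can be sketched with `__init_subclass__` (Python 3.6+), so that defining a feature's subclass is enough to add it to that step's internal list. All class and method names here are hypothetical, and the dict-based nodes stand in for real AST nodes. In a real tree, the two `Log...` classes would live together in a per-feature file (say, a hypothetical `features/log.py`):

```python
class StepPass:
    """Hypothetical root of all per-step feature passes."""

    def __init_subclass__(cls, step_base: bool = False, **kwargs):
        super().__init_subclass__(**kwargs)
        if step_base:
            cls.registry = []          # declares a new compilation step
        else:
            cls.registry.append(cls)   # a feature's pass for an existing step


class TypeCheckPass(StepPass, step_base=True):
    def check(self, node: dict) -> list:
        return []


class GenerationPass(StepPass, step_base=True):
    def emit(self, node: dict) -> str:
        return ""


def run_typecheck(node: dict) -> list:
    return [err for p in TypeCheckPass.registry for err in p().check(node)]


def run_generation(node: dict) -> str:
    return "".join(p().emit(node) for p in GenerationPass.registry)


# --- e.g. features/log.py: both passes of the "log" feature, side by side ---

class LogTypeCheck(TypeCheckPass):
    def check(self, node):
        if node.get("kind") == "log" and "msg" not in node:
            return ["log statement without a message"]
        return []


class LogGeneration(GenerationPass):
    def emit(self, node):
        return f'printk("{node["msg"]}");' if node.get("kind") == "log" else ""
```

A step's own code only iterates over its `registry`, so adding a feature means adding one file without touching the step itself.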
>
>
> ~~~~
>
> Well, that was a bit long, but I hope it wasn't too hard to read through :)
Hard as fuck in the morning, but really interesting though.
>
> I may have missed many things, but you are of course all welcome to fix
> it by talking about it.
>
> TL;DR: Here are my thoughts, to use as a base for the redesign discussions
> for the soon-to-come Python rewrite of rathaxes. Please read them, and
> let's discuss!
Next time, put it at the beginning :D
--
Thomas Sanchez