Hey Paul,
> 1: improve regex parsing of the source code
Like others, I'm hesitant to rely further on regex
parsing, since it's tricky to get right. It also does not solve the
"#include inside #if" problem, which the next two options do solve.
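To make the "#include inside #if" problem concrete, here is a minimal Java sketch. The regex is a simplified pattern assumed for illustration, not necessarily the one the IDE uses today, and `other_io.h` is a made-up header name. Because a regex only sees text, it reports includes from every branch of a conditional.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexIncludeScan {
    // Simplified include regex, assumed for illustration only; it is not
    // necessarily the exact pattern the IDE currently uses.
    private static final Pattern INCLUDE = Pattern.compile(
            "^\\s*#\\s*include\\s*[<\"]([^>\"]+)[>\"]", Pattern.MULTILINE);

    /** Returns every #include target found in the text, ignoring #if/#else entirely. */
    public static List<String> findIncludes(String source) {
        List<String> result = new ArrayList<>();
        Matcher m = INCLUDE.matcher(source);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String source = String.join("\n",
                "#if defined(__AVR__)",
                "#include <avr/io.h>",
                "#else",
                "#include <other_io.h>",
                "#endif");
        // Both headers are reported, even though the preprocessor keeps only one branch.
        System.out.println(findIncludes(source)); // prints [avr/io.h, other_io.h]
    }
}
```

Handling this correctly would mean evaluating `#if` expressions, which is exactly what a real preprocessor already does.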
> 2: replace regex parsing with coan
> 3: replace regex parsing with gcc -M (run only the preprocessor)
I hadn't looked at coan before, but it seems useful. Essentially, I
think it's a kind of preprocessor that gives you more control over
what it outputs. In particular, it can output a list of
includes, which seems a good fit.
I think both of these options would work in a similar way, as you
already implemented in #2792: process the source, find the
first missing include, add the library that provides it to the include path,
find the next missing include, and so on.
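The loop described above can be sketched as follows. This is a minimal Java illustration, not the code from #2792: the `Preprocessor` interface is a hypothetical stand-in for "run coan or gcc -M and report the first header that cannot be found", and the header-to-library map and `libraries/...` paths are made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class IncludeResolver {
    /**
     * Hypothetical abstraction over running coan or gcc -M on the sources:
     * returns the first #include that cannot be found with the given include path.
     */
    public interface Preprocessor {
        Optional<String> firstMissingInclude(List<String> includePath);
    }

    /** Adds one library per iteration until the preprocessor finds every header. */
    public static List<String> resolve(Preprocessor pp, Map<String, String> headerToLibrary) {
        List<String> includePath = new ArrayList<>();
        while (true) {
            Optional<String> missing = pp.firstMissingInclude(includePath);
            if (!missing.isPresent()) {
                return includePath; // all includes resolved
            }
            String lib = headerToLibrary.get(missing.get());
            if (lib == null || includePath.contains(lib)) {
                // No library provides this header, or adding it did not help: stop instead of looping forever.
                throw new IllegalStateException("Cannot resolve #include <" + missing.get() + ">");
            }
            includePath.add(lib);
        }
    }

    public static void main(String[] args) {
        Map<String, String> libs = new HashMap<>();
        libs.put("Foo.h", "libraries/Foo");
        libs.put("Bar.h", "libraries/Bar");
        // Fake preprocessor: the sketch includes Foo.h, and Foo.h in turn includes Bar.h,
        // which only becomes visible once libraries/Foo is on the include path.
        Preprocessor fake = path -> {
            if (!path.contains("libraries/Foo")) return Optional.of("Foo.h");
            if (!path.contains("libraries/Bar")) return Optional.of("Bar.h");
            return Optional.empty();
        };
        System.out.println(resolve(fake, libs)); // prints [libraries/Foo, libraries/Bar]
    }
}
```

The key property is that nested includes (a library's header pulling in another library) are discovered only after the outer library is on the include path, which is why this has to be iterative rather than a single scan.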
When looking at the gcc -M approach, I do have some doubts: where should
the gcc -M command be stored? Hardcoding it in the Java source seems
counter to the recipe approach used for normal compilation (and does
the Java code even know the name of the gcc binary?). The
current PR takes the regular compilation command and then modifies it by
removing and adding options, which looks like it will cause problems at
some point. What if there is a platform that uses clang or some other
non-gcc compiler (I'm not sure whether any exist)? OTOH, if such a
platform exists, it would need some alternative way of generating
dependencies and we might add the needed recipes for that once it
actually pops up... A similar argument applies to the hardcoding of the
gcc -M output parsing.
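For reference, hardcoding the gcc -M output parsing would look roughly like this minimal Java sketch. gcc -M prints a make rule (`target: prerequisites`, with backslash-newline continuations and backslash-escaped spaces in file names). The sketch only handles escaped spaces and ignores gcc's other escapes; the example paths are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class DepParser {
    /**
     * Parses the make rule printed by gcc -M ("target: prereq1 prereq2 \ ...")
     * into a list of file names. The first entry is the source file itself.
     * Only backslash-escaped spaces are handled; gcc's other escapes
     * (e.g. "$$" for "$") are ignored in this sketch.
     */
    public static List<String> parseDependencies(String makeRule) {
        // Join continuation lines (a backslash directly before the line break).
        String joined = makeRule.replace("\\\r\n", " ").replace("\\\n", " ");
        // The target ends at the first ": " (a Windows drive letter "C:\" is not followed by a space).
        int colon = joined.indexOf(": ");
        if (colon < 0) {
            throw new IllegalArgumentException("Not a make rule: " + makeRule);
        }
        String rest = joined.substring(colon + 2);
        List<String> deps = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < rest.length(); i++) {
            char c = rest.charAt(i);
            if (c == '\\' && i + 1 < rest.length() && rest.charAt(i + 1) == ' ') {
                current.append(' '); // escaped space inside a file name
                i++;
            } else if (Character.isWhitespace(c)) {
                if (current.length() > 0) {
                    deps.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            deps.add(current.toString());
        }
        return deps;
    }

    public static void main(String[] args) {
        String output = "sketch.o: sketch.cpp /usr/lib/avr/include/avr/io.h \\\n"
                + " libraries/My\\ Lib/MyLib.h\n";
        System.out.println(parseDependencies(output));
        // prints [sketch.cpp, /usr/lib/avr/include/avr/io.h, libraries/My Lib/MyLib.h]
    }
}
```

Even this small parser bakes in make-rule syntax, which is the gcc-specific format a non-gcc platform might not produce.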
I think the above can also be solved by using coan. By using a separate,
platform-independent tool, the needed commands and output formats can be
hardcoded (or at least configured outside of platform.txt). This would
make it easier to implement non-gcc based platforms.
However, using coan does pose new challenges: the actual compiler used
contains some implicit defines and other knowledge (int sizes, builtin
functions, etc). Since we're only interested in the preprocessor here,
I think the problem is limited to any macros that the compiler
implicitly defines. In particular, avr-gcc defines a macro for the
current CPU based on -mmcu (e.g. __AVR_ATmega328P__), which is in turn used by avr/io.h to
select the right io header file. There are also many other predefined macros.
GCC has a -dM option which, combined with -E, outputs all macros defined during
preprocessing (including predefined macros), so you can run that
with an empty file to get the implicit macros, which you can then feed
into coan. But now we're back to the same problem as the gcc -M
approach: Where to store this -dM commandline?
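The "feed the implicit macros into coan" step could look like this minimal Java sketch. It parses `#define` lines in the format gcc prints for something like `avr-gcc -mmcu=atmega328p -dM -E -x c++ /dev/null` and turns them into gcc-style `-D` options. The sample input lines are illustrative, and whether coan accepts every such `-D` option (function-like macros in particular) is an assumption to verify.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MacroDump {
    /** Parses "#define NAME VALUE" lines as printed by gcc -dM -E. */
    public static Map<String, String> parseMacros(String dmOutput) {
        Map<String, String> macros = new LinkedHashMap<>();
        for (String line : dmOutput.split("\\R")) {
            if (!line.startsWith("#define ")) {
                continue;
            }
            String body = line.substring("#define ".length());
            int space = body.indexOf(' ');
            int paren = body.indexOf('(');
            int end;
            if (paren >= 0 && (space < 0 || paren < space)) {
                // Function-like macro: the name includes its parameter list.
                int close = body.indexOf(')', paren);
                if (close < 0) {
                    continue; // malformed line, skip it
                }
                end = close + 1;
            } else {
                end = space < 0 ? body.length() : space;
            }
            macros.put(body.substring(0, end), body.substring(end).trim());
        }
        return macros;
    }

    /** Turns the macros into gcc-style -D options. */
    public static List<String> toDefineOptions(Map<String, String> macros) {
        List<String> options = new ArrayList<>();
        for (Map.Entry<String, String> e : macros.entrySet()) {
            // Always emit "=": a bare -DNAME would define NAME as 1, not as empty.
            options.add("-D" + e.getKey() + "=" + e.getValue());
        }
        return options;
    }

    public static void main(String[] args) {
        String dump = String.join("\n",
                "#define __AVR_ATmega328P__ 1",
                "#define __AVR__ 1",
                "#define __INT64_C(c) c ## LL",
                "#define EMPTY_MACRO");
        System.out.println(toDefineOptions(parseMacros(dump)));
        // prints [-D__AVR_ATmega328P__=1, -D__AVR__=1, -D__INT64_C(c)=c ## LL, -DEMPTY_MACRO=]
    }
}
```

The parsing itself is easy; the open question remains where the `-dM` command line that produces this input should live.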
I believe that the general approach of #2792 is good, but we should give
some thought to the above issues (even if we leave some for the future,
it's good to have an idea how they can be solved then). I'm not entirely
sure what the best approach is yet. Perhaps a good compromise would be
to use gcc -M, add a new recipe to platform.txt for it, and use Paul's
current implementation that munges the existing compiler command for
platforms that do not define this new recipe? The preprocessor output
parsing can probably be hardcoded; we can't really generalize that until
there is a real use case for it.
Doing this doesn't fully prepare for all possible future platforms, but
at least avoids hardcoding options in the source (except for backward
compatibility, which I think is acceptable).
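A minimal Java sketch of the proposed compromise: prefer a dedicated recipe from platform.txt and only fall back to munging the compile recipe when it is missing. The key `recipe.preproc.includes` is a hypothetical name for the new recipe, the munging (swap `-c` for `-M`, drop the `-o` output) is an illustrative guess rather than what #2792 actually does, and the simplified recipe string is made up. Real recipes are built from variables, so string munging like this is exactly the brittle part.

```java
import java.util.HashMap;
import java.util.Map;

public class PreprocRecipe {
    // Hypothetical name for the proposed new platform.txt recipe.
    static final String PREPROC_KEY = "recipe.preproc.includes";
    // Existing compile recipe key from the platform.txt format.
    static final String COMPILE_KEY = "recipe.cpp.o.pattern";

    /** Returns the dependency-generation recipe, falling back to munging the compile recipe. */
    public static String dependencyRecipe(Map<String, String> platform) {
        String recipe = platform.get(PREPROC_KEY);
        if (recipe != null) {
            return recipe; // the platform knows best how to list its dependencies
        }
        String compile = platform.get(COMPILE_KEY);
        if (compile == null) {
            throw new IllegalArgumentException("Platform defines neither " + PREPROC_KEY + " nor " + COMPILE_KEY);
        }
        // Backward-compatibility fallback (illustrative and brittle): swap "-c" for "-M"
        // and drop the object-file output so gcc prints the make rule to stdout.
        return compile.replace(" -c ", " -M ").replace(" -o \"{object_file}\"", "");
    }

    public static void main(String[] args) {
        Map<String, String> legacy = new HashMap<>();
        legacy.put(COMPILE_KEY, "{compiler.cpp.cmd} -c {includes} \"{source_file}\" -o \"{object_file}\"");
        System.out.println(dependencyRecipe(legacy));
        // prints {compiler.cpp.cmd} -M {includes} "{source_file}"
    }
}
```

Platforms that define the new recipe never hit the munging path, so the hardcoded options only survive for backward compatibility.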
Furthermore, I'd like to apply this approach to sketches as well, since
it will likely work better than the regex-based approach and support
conditional includes. There is a small chance of compatibility problems,
but I think these would only be corner cases of no practical relevance.
It seems fine to first implement this for libraries and then sketches,
though.
Gr.
Matthijs