build dependency improvement


Paul Stoffregen

May 2, 2015, 8:29:00 AM
to devel...@arduino.cc
Arduino has always used a simple but limited way to construct its list
of libraries a sketch requires. In the last two years, at least 5
efforts have been made to improve it (links below). All have languished.
My hope, within about a month, is to complete one of these to solve the
long-standing issues. To do this, consensus or an authoritative
decision is required, so the 6th attempt can avoid the same indecisive
fate...

Conversation on this mail list often goes terribly off-topic, especially
regarding Arduino's build system. Please, before anyone suggests
"simply" using a system like Make or Ant, or that Arduino should follow
conventions of other IDEs, please read ALL of this message. Please, I
beg of everyone, refrain from straying off-topic to Arduino's many
controversial but unrelated design choices. Let's work towards a
consensus on specifically how Arduino should handle build dependency.

Traditional dependency analysis, as Make does, determines which files
must be recompiled. Let's call that "compile dependency". Make requires
a makefile as input, which is the "build dependency" information that
tells which code/modules/libraries should be used. While some impressive
self-modifying makefile hacks exist, generally you don't use Make to
discover build dependency. That's the input *you* give to Make.

Other IDEs generate makefiles. When thinking of Arduino's design in
terms of other systems, as I know many on this mail list love to do, the
process of automatically generating makefiles is the closest analogy to
"build dependency". Most other IDEs present a folders+files style GUI
hierarchy, typically on the left side, which provides visibility and
control over what code the IDE will attempt to build. While powerful,
that approach runs counter to Arduino's design goal of simplicity and
ease of learning.

Arduino tries to deduce build dependency by parsing for #include lines
and matching filenames against lists of known libraries. I know many
people who follow this list hate this idea, probably because it isn't
"standard" or "good practice", even though it's pretty similar to what
languages like Python and Perl do. I'm pretty sure any final decision
is going to keep this fundamental approach, so my main goal is how to
improve the parsing of #include lines, so Arduino discovers the correct
and complete list of libraries any sketch truly requires.
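
To make the current approach concrete, here is a minimal Python sketch of regex-based include scanning. It is not the regex or code the Arduino IDE actually uses; `INCLUDE_RE`, `find_includes`, `match_libraries` and the header-to-library table are all hypothetical names chosen for illustration.

```python
import re

# Matches lines such as `#include <Wire.h>` or `# include "SPI.h"`.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]', re.MULTILINE)

def find_includes(source):
    """Return the header name from every #include line, in order."""
    return INCLUDE_RE.findall(source)

def match_libraries(source, known_headers):
    """Map included headers to library names via a header -> library table."""
    libs = []
    for header in find_includes(source):
        lib = known_headers.get(header)
        if lib is not None and lib not in libs:
            libs.append(lib)
    return libs

# Hypothetical sketch source and library table, for illustration only.
SKETCH = '''
#include <Wire.h>
#if 0
#include <Ethernet.h>
#endif
#include "local.h"
'''

KNOWN = {"Wire.h": "Wire", "Ethernet.h": "Ethernet", "SPI.h": "SPI"}

print(match_libraries(SKETCH, KNOWN))
```

Because a plain regex knows nothing about `#if`, this sketch also reports the Ethernet library even though that include sits inside `#if 0`.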

I believe work so far falls into 3 broad categories:

1: improve regex parsing of the source code
2: replace regex parsing with coan
3: replace regex parsing with gcc -M (run only the preprocessor)

These latter 2 have the possibility to recognize #if and #ifdef
conditional compilation, to ignore #include lines that won't actually be
used.

All 3 of these involve some combination of applying the parsing to some
or all sketch files, and perhaps to the files within libraries, or maybe
only to libraries (leaving old regex parsing to the sketch files). When
applied to library files, a recursive algorithm is needed to discover
libraries which depend upon other libraries.
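
The recursive discovery mentioned above could look roughly like the following Python sketch, written as a worklist rather than literal recursion. It assumes each library's source text is already available in memory; `resolve_libraries` and both lookup tables are hypothetical and do not come from any of the pull requests.

```python
import re

INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]', re.MULTILINE)

def resolve_libraries(sketch_source, library_sources, header_to_library):
    """Follow includes from the sketch into libraries until nothing new appears."""
    found = []                    # libraries, in discovery order
    pending = [sketch_source]     # source texts still to be scanned
    while pending:
        source = pending.pop()
        for header in INCLUDE_RE.findall(source):
            lib = header_to_library.get(header)
            if lib is None or lib in found:
                continue          # unknown header, or already handled (breaks cycles)
            found.append(lib)
            pending.extend(library_sources.get(lib, []))
    return found

# Illustrative data: the sketch only names Ethernet, which in turn needs SPI.
HEADER_TO_LIBRARY = {"Ethernet.h": "Ethernet", "SPI.h": "SPI"}
LIBRARY_SOURCES = {
    "Ethernet": ['#include <SPI.h>\n#include "Ethernet.h"\n'],
    "SPI": ['#include "SPI.h"\n'],
}

print(resolve_libraries('#include <Ethernet.h>\n', LIBRARY_SOURCES, HEADER_TO_LIBRARY))
```

The `lib in found` check is what keeps the walk from looping forever when two libraries include each other.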

There are MANY small details, which have already been discussed on these
pull requests:

https://github.com/arduino/Arduino/pull/2792
https://github.com/arduino/Arduino/pull/2729
https://github.com/arduino/Arduino/pull/2174
https://github.com/arduino/Arduino/pull/1726
https://github.com/arduino/Arduino/pull/1250

There are even more, stretching back to the days before Arduino moved
from Google Code to GitHub!

My long-winded point is that a lot of very good work has already been
done on these 5 pull requests. Yet by every appearance, all have stalled
and seem to be languishing.

My hope is the "sixth time's a charm".

I really want to do this. I already put quite a lot of work into
#2792. I'm willing to do much more and see it through to completion.
Let's reach a decision, so that the next effort, assuming testing proves
it works well, can be merged into Arduino.


giuliano carlini

May 2, 2015, 4:41:04 PM
to devel...@arduino.cc
Can one of the existing efforts be used either “as-is” or as a starting point? You put a lot of work into #2792. Why has it languished? What is lacking, such that it has not been accepted by the community? What needs to be done to push it through to completion? I fear that opening up the discussion makes it more likely to go off track. Perhaps starting with a concrete proposal might help.

My 2 bits, I mostly just “lurk”, going back to my hole ;->

giuliano

Wayne Holder

May 2, 2015, 6:57:18 PM
to Arduino Developers
I mostly lurk in this forum as a way to keep up with new developments, but I wanted to voice my approval for what Paul is trying to do. Of the 3 options Paul mentions, I would prefer option 3 (gcc -M), because I think having two different ways for preprocessing to happen is inherently problematic and could result in some hard-to-diagnose issues. However, I also agree with Paul's more pragmatic stand on this, as I think any progress (even option 1 or 2) is better than no progress, especially now that Arduino is starting to sweep in so many different processor options (ARM, Intel, etc.). Not having this capability is going to make supporting these different processor options messier and potentially more difficult.

Wayne

Peter Feerick

May 2, 2015, 8:45:44 PM
to devel...@arduino.cc
Hi Paul,

Another lurker wishing to comment :) I have looked at the commentary for
your #2792, and Federico's work on #2729. I see that he said in March
that his commit was doomed to fail on anything more than a "Blink"
sketch, so that is off the table for now? And am I right in thinking
that all that is needed for #2792 to go ahead is some more testing for
weird behaviors and approval?

I like your third suggestion also - GCC already has a standardised
method for describing dependencies, so why reinvent the wheel? It is
also an approach that can be used in sketch preprocessing, so there
would be some consistency... but that is for another thread. I'm
downloading the current build for Windows and will let you know in the
issue tracker if I come across any issues.

And I fully agree... incremental improvements... otherwise it becomes
bigger than Ben Hur and goes nowhere.

Peter

Peter Olson

May 2, 2015, 8:53:44 PM
to developers
> On May 2, 2015 at 8:28 AM Paul Stoffregen <pa...@pjrc.com> wrote:
>
> Arduino has always used a simple but limited way to construct its list
> of libraries a sketch requires. In the last two years, at least 5
> efforts have been made to improve (links below). All have languished.
> My hope, within about a month, is to complete one of these to solve the
> long-standing issues. To do this, consensus or an authoritative
> decision is required, so the 6th attempt can avoid the same indecisive
> fate...

+1 gcc -M

This is not based on any thorough analysis of the problem (which others have
indeed made) but from a general mistrust of regex analysis of source code.

(For example the 1.6.x sketch preprocessing allegedly will be confused by my use
of vertical declaration of functions:

static
int
blah(int foo)
{
    return foo ? 123 : 7;
}

I haven't actually looked at this recently. I tried to figure out how to fix it
in ctags and/or an addition to the pipeline into the process, but got bogged
down. I am currently stuck on 1.5.8.1 due to having received a shipment of
parts from the other guys.)

Peter Olson

Roger Clark

May 2, 2015, 11:46:41 PM
to devel...@arduino.cc
I've noticed on Linux that the improved object file caching doesn't seem to work so well, especially for third-party cores (STM32).

The STM32 core generally has a bit of an issue because it's not compiled to a library (because of issues with weak references that need to be fixed); however, on Windows this isn't a big issue, as the object caching seems to work fairly well.

But on Linux (tested against 1.6.3) it always seems to compile all files :-(



Matthijs Kooijman

May 3, 2015, 9:08:02 AM
to devel...@arduino.cc
Hi Roger,

On Sun, May 03, 2015 at 01:46:35PM +1000, Roger Clark wrote:
> I've noticed on Linux that the improved object file caching doesn't seem to
> work so well, especially for third-party cores (STM32).
>
> The STM32 core generally has a bit of an issue because it's not compiled to
> a library (because of issues with weak references that need to be fixed);
> however, on Windows this isn't a big issue, as the object caching seems to
> work fairly well.
>
> But on Linux (tested against 1.6.3) it always seems to compile all files :-(
This is not related at all to the topic of this thread. Please start a
new thread, or report an issue on github providing details.

Thanks,

Matthijs

Matthijs Kooijman

May 3, 2015, 9:44:50 AM
to devel...@arduino.cc
Hey Paul,

> 1: improve regex parsing of the source code
As mentioned by others: I'm also hesitant to further rely on regex
parsing, since it's tricky to get right. It also does not solve the
"#include inside #if" problem, which the next two options do solve.

> 2: replace regex parsing with coan
> 3: replace regex parsing with gcc -M (run only the preprocessor)
I hadn't looked at coan before, but it seems useful. Essentially, I
think it's a sort of preprocessor where you can have a bit more control
over what it outputs. In particular, it allows outputting a list of
includes, which seems a good fit.

I think both of these options would operate in a similar way, as you
already implemented in #2792. They could process the source, find the
first missing include, add a library for that to the include path,
find the next missing include, etc.
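
The loop described here can be sketched in Python as follows. This is only an illustration of the idea, not the code in #2792: `discover_libraries` is a hypothetical name, and `fake_preprocessor` is a stand-in fixture for whatever would really run coan or the compiler's preprocessor and report the first header it cannot find.

```python
def discover_libraries(find_missing_header, header_to_library):
    """Repeat: preprocess, map the first missing header to a library, add its path."""
    include_paths = []
    while True:
        missing = find_missing_header(include_paths)
        if missing is None:
            return include_paths              # preprocessing succeeded
        lib_path = header_to_library.get(missing)
        if lib_path is None or lib_path in include_paths:
            # No known library helps, so stop instead of looping forever.
            raise LookupError("no library provides " + missing)
        include_paths.append(lib_path)

def fake_preprocessor(include_paths):
    """Stand-in for a real preprocessor run (hypothetical test fixture)."""
    provides = {"libraries/Ethernet": "Ethernet.h", "libraries/SPI": "SPI.h"}
    needed = ["Ethernet.h", "SPI.h"]          # headers the imaginary sketch pulls in
    available = {provides[path] for path in include_paths}
    for header in needed:
        if header not in available:
            return header                     # first header that cannot be found
    return None

HEADER_TO_LIBRARY = {"Ethernet.h": "libraries/Ethernet", "SPI.h": "libraries/SPI"}
print(discover_libraries(fake_preprocessor, HEADER_TO_LIBRARY))
```

Each pass resolves exactly one missing header, so the number of preprocessor runs grows with the number of libraries; that cost is part of what any real implementation would have to weigh.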

When looking at the gcc -M approach, I do have some doubts: where should
the gcc -M command be stored? Hardcoding it in the java source seems
counter to the recipe approach used for the normal compilation (plus
that the java doesn't actually know the name of the gcc binary?). The
current PR takes the regular compilation command and then modifies it by
removing and adding options, which looks like it will cause problems at
some point. What if there is a platform that uses clang or some other
non-gcc compiler (not sure if there are any of these)? OTOH, if such a
platform exists, it would need some alternative way of generating
dependencies and we might add the needed recipes for that once it
actually pops up... A similar argument applies to the hardcoding of the
gcc -M output parsing.
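
For reference, `gcc -M` prints a Make-style rule: a target, a colon, and the files it depends on, with long lines continued by a backslash. A hardcoded parser for that format could be as small as this Python sketch. `parse_make_deps` and the sample text are hypothetical, and the sketch assumes a single rule and no spaces inside paths.

```python
def parse_make_deps(text):
    """Parse one Make-style rule, as printed by `gcc -M`, into (target, deps)."""
    joined = text.replace("\\\n", " ")         # undo backslash-newline continuations
    target, _, rest = joined.partition(": ")   # split at the first "colon space"
    return target.strip(), rest.split()

# Hand-written sample in the shape gcc -M produces (paths are made up).
SAMPLE = "sketch.o: sketch.cpp /lib/Wire/Wire.h \\\n /lib/SPI/SPI.h\n"

target, deps = parse_make_deps(SAMPLE)
print(target, deps)
```

Splitting on "colon space" rather than the first colon is a small hedge against Windows paths such as `C:\...` appearing in the target.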

I think the above can also be solved by using coan. By using a separate,
platform-independent tool, the needed commands and output formats can be
hardcoded (or at least configured outside of platform.txt). This would
make it easier to implement non-gcc based platforms.

However, using coan does pose new challenges: the actual compiler used
contains some implicit defines and other knowledge (int sizes, builtin
functions, etc). Since we're only interested in the preprocessor here,
I think the problem is limited to any macros that the compiler
implicitly defines. In particular, avr-gcc defines some macro for the
current CPU depending on -mmcu, which is again used by e.g. avr/io.h to
select the right io header file. There's also a ton of other macros.

Gcc has a -dM option, which outputs all macros defined during execution
of the preprocessor (including predefined macros), so you can run that
with an empty file to get the implicit macros, which you can then feed
into coan. But now we're back to the same problem as the gcc -M
approach: Where to store this -dM commandline?
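
As an illustration of the "feed the implicit macros into coan" step, the Python sketch below parses `-dM` style output into a table and turns it into `-DNAME=VALUE` strings. The sample text is hand-written in the shape that something like `avr-gcc -mmcu=atmega328p -dM -E -x c /dev/null` prints, not captured output, and whether coan accepts every such flag unchanged is an assumption. `parse_defines` and `as_define_flags` are hypothetical names.

```python
def parse_defines(text):
    """Turn `#define NAME VALUE` lines into a dict; function-like macros are skipped."""
    macros = {}
    for line in text.splitlines():
        parts = line.split(None, 2)            # "#define", name, optional value
        if len(parts) < 2 or parts[0] != "#define" or "(" in parts[1]:
            continue
        macros[parts[1]] = parts[2] if len(parts) == 3 else ""
    return macros

def as_define_flags(macros):
    """Render the table as -DNAME=VALUE arguments, sorted by macro name."""
    return ["-D%s=%s" % (name, value) for name, value in sorted(macros.items())]

# Hand-written sample; real output contains a few hundred lines.
SAMPLE = (
    "#define __AVR__ 1\n"
    "#define __AVR_ATmega328P__ 1\n"
    "#define __SIZEOF_INT__ 2\n"
    "#define __INT8_C(c) c\n"
)

print(as_define_flags(parse_defines(SAMPLE)))
```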


I believe that the general approach of #2792 is good, but we should give
some thought to the above issues (even if we leave some for the future,
it's good to have an idea how they can be solved then). I'm not entirely
sure what the best approach is yet. Perhaps a good compromise would be
to use gcc -M, add a new recipe to platform.txt for it, and use Paul's
current implementation that munges the existing compiler command for
platforms that do not define this new recipe? The preprocessor output
parsing can probably be hardcoded; we can't really generalize that until
there is a real use case for it.

Doing this doesn't fully prepare for all possible future platforms, but
at least prevents hardcoding options in the source (except for backward
compatibility, which I think is acceptable).
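
A rough Python sketch of that compromise: prefer a dedicated recipe when platform.txt defines one, otherwise derive a command from the normal compile recipe. The key name `recipe.deps.pattern` is invented here for illustration; no such key is defined by this thread. The sample compile pattern is simplified, and the munging shown (drop `-c` and `-o <file>`, append `-M`) is a guess at the general idea, not what #2792 does.

```python
DEPS_KEY = "recipe.deps.pattern"       # hypothetical key name
COMPILE_KEY = "recipe.cpp.o.pattern"   # normal C++ compile recipe

def parse_properties(text):
    """Parse key=value lines, ignoring blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def munge_compile_recipe(pattern):
    """Derive a dependency command: drop -c and `-o <file>`, then append -M."""
    out, skip_next = [], False
    for token in pattern.split():
        if skip_next:
            skip_next = False                  # this token was the -o argument
        elif token == "-o":
            skip_next = True
        elif token != "-c":
            out.append(token)
    return " ".join(out + ["-M"])

def dependency_recipe(props):
    """Return (command pattern, origin), preferring the dedicated recipe."""
    if DEPS_KEY in props:
        return props[DEPS_KEY], "recipe"
    return munge_compile_recipe(props[COMPILE_KEY]), "fallback"

# Simplified platform.txt fragment, for illustration only.
PLATFORM_TXT = '''
# compile patterns
recipe.cpp.o.pattern="{compiler.path}{compiler.cpp.cmd}" -c {includes} "{source_file}" -o "{object_file}"
'''

print(dependency_recipe(parse_properties(PLATFORM_TXT)))
```

Returning the origin alongside the command makes it easy to warn a core developer when the fallback path was taken.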


Furthermore, I'd like to apply this approach to sketches as well, since
it will likely end up better than the regex-based approach and support
conditional includes. There is a small chance of compatibility problems,
but I think these would only be corner cases that aren't of practical value.
It seems fine to first implement this for libraries and then sketches,
though.

Gr.

Matthijs

Álvaro Lopes

May 3, 2015, 10:43:02 AM
to devel...@arduino.cc, matt...@stdin.nl, Paul Stoffregen
On 03/05/15 14:44, Matthijs Kooijman wrote:

>> 3: replace regex parsing with gcc -M (run only the preprocessor)
> I hadn't looked at coan before, but it seems useful. Essentially, I think it's a sort of preprocessor where you can have a bit more control over what it
> outputs. In particular, it allows outputting a list of includes, which seems a good fit.

And you're adding yet another dependency. Plus, it is not guaranteed that coan will give you the same results as the compiler/preprocessor itself: they are the
ones generating the code after all, so their output is 100% correct.

> When looking at the gcc -M approach, I do have some doubts: where should the gcc -M command be stored?

Add another target, perhaps, just to figure out dependencies. Fall back to "-M" if needed, in case the platform does not define it, and warn the user about it
(this will not be the end user, but the core developer). Or fall back to a script. Or fall back to the old behaviour.

> I think the above can also be solved by using coan. By using a separate, platform-independent tool, the needed commands and output formats can be
> hardcoded (or at least configured outside of platform.txt). This would make it easier to implement non-gcc based platforms.

Nothing that cannot be "fixed" by a new target in the platform. I also don't use gcc on one of my platforms (although it's similar: LLVM+clang).
Honestly, if your target compiler/preprocessor cannot generate the same output, you can always come up with some wrapper.

> Gcc has a -dM option, which outputs all macros defined during execution of the preprocessor (including predefined macros), so you can run that with an
> empty file to get the implicit macros, which you can then feed into coan. But now we're back to the same problem as the gcc -M approach: Where to store
> this -dM commandline?

Don't. Note also that these defines may change on a file-by-file basis. Honestly, this would be over-engineering, because most of the time your compiler can tell
you this info (and you will not have to rely on coan).

Honestly, if the compiler is the ultimate "user" of this information, I'd suggest:

a) use the compiler/preprocessor to report dependencies, by using a different target in platform.txt.
b) if it cannot report that information (i.e., not implemented), fall back to the "old" dependency search and warn the developer that he should implement the dependency check
target in platform.txt, or provide a wrapper to do so.

Paul,
I can eventually give you a hand on this. Please let me know.


Alvie



Paul Stoffregen

May 3, 2015, 12:16:00 PM
to devel...@arduino.cc
On 05/02/2015 01:40 PM, giuliano carlini wrote:
> Can one of the existing efforts be used either “as-is” or as a starting point? You put a lot of work into #2792. Why has it languished? What is lacking that it has not been accepted by the community.

Well, I believe this feels like a big change. Even though we all know
the regex approach has problems, perhaps its simplicity and very long
history are comforting? So far, every effort has exposed a number of
unresolved decisions to be made. The absence of clear consensus and
authoritative decision probably makes change feel riskier. All humans
have a very natural tendency to stick with the status quo in such situations.

Not long ago there was a thread about Arduino LLC's leadership. I
believe I said the challenge I intended to make was a friendly challenge
of major contributions that require much more thought and careful
decision making. Well, this is the sort of thing I had in mind.

There are indeed many small details and finer points, as well as
deciding between coan vs gcc -M, to be considered. Hopefully we can
come to some sort of consensus, with clear decisions from Federico &
Cristian. I'm hopeful, if we can achieve that, the 6th time really will
be a charm!

Cristian Maglie

May 12, 2015, 5:22:20 AM
to devel...@arduino.cc

On 02/05/2015 14:28, Paul Stoffregen wrote:
> Arduino has always used a simple but limited way to construct its list
> of libraries a sketch requires. In the last two years, at least 5
> efforts have been made to improve (links below). All have languished.

Hi Paul,

OK, let's move forward on this one.

I've tried your solution, https://github.com/arduino/Arduino/pull/2792.
It's working fine and I guess it's the best solution ATM, so I'd like to
work on that one.

Before merging it, I'd like to avoid the gcc command-line mangling that
happens inside the Compiler class. Matthijs K. already made a really
good proposal about that:

- use gcc -M, add a new recipe to platform.txt for it
- use Paul's current implementation that munges the existing compiler
command for platforms that do not define this new recipe
- the preprocessor output parsing can probably be hardcoded; we can't
really generalize that until there is a real use case for it.

Do you think you have the time to work on it?
Otherwise I can work on this one, just tell me what's best for you.

C

William Westfield

May 15, 2015, 7:53:33 PM
to devel...@arduino.cc

> Arduino tries to deduce build dependency by parsing for #include lines and matching filenames against lists of known libraries.

Is it totally out of the question to do something like putting Arduino-specific information in an easily parseable form, sort of like the way Emacs does “file-local variables” ( http://www.gnu.org/software/emacs/manual/html_node/emacs/Specifying-File-Variables.html#Specifying-File-Variables )?

I mean, parsing arbitrary C code is difficult. Parsing something like:

/*
 * #Arduino#Version#1.6.3
 * #Arduino#CoreLibrary#SoftwareSerial
 * #Arduino#UserLibrary#AdafruitHT1632
 */

would be easy. Maybe even something like SCCS Id strings

static char __Arduino_Libs__[] = "SoftwareSerial\n"
                                 "AdafruitHT1632\n";

(though I’m not sure what advantage that has over something buried in a comment.)
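
To show how little parsing the comment-marker idea would need, here is a minimal Python sketch. The `#Arduino#Key#Value` syntax is the one proposed above; the regex, the `parse_markers` name and the dict-of-lists result are assumptions made for illustration.

```python
import re

# One marker looks like `#Arduino#<Key>#<Value>`; the value runs to the next whitespace.
MARKER_RE = re.compile(r'#Arduino#(\w+)#(\S+)')

def parse_markers(source):
    """Collect `#Arduino#Key#Value` markers into a dict of lists."""
    result = {}
    for key, value in MARKER_RE.findall(source):
        result.setdefault(key, []).append(value)
    return result

SKETCH = '''/*
 * #Arduino#Version#1.6.3
 * #Arduino#CoreLibrary#SoftwareSerial
 * #Arduino#UserLibrary#AdafruitHT1632
 */
void setup() {}
void loop() {}
'''

print(parse_markers(SKETCH))
```

Keeping a list per key lets a sketch name several libraries of the same kind without any extra syntax.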

BillW
