I'd like to continue the discussion on this topic, assuming someone is interested.
Following the discussion above, I tried setting the build order so that targets with a .h extension are built before object files.
This didn't work well, because some of the .h files are generated by tools which are themselves built. Thus, I can't make every .o depend on every .h, as this creates a loop.
Then I tried to classify the .o files into objects which belong to 'tools' and all the rest, which can depend on all .h files. This didn't work either, because the objects of some tools depended on headers generated by other tools.
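
To illustrate the loop described above, here is a minimal sketch in Python. It is a toy dependency graph, not ninja code, and all file names (gen.h, gentool, main.o, etc.) are hypothetical. It shows that the real dependencies are acyclic, while the blanket "every .o depends on every .h" rule makes the tool's own object depend on the header the tool generates.

```python
def find_cycle(graph):
    """Return one dependency cycle as a list of targets, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}
    stack = []

    def visit(node):
        color[node] = GREY
        stack.append(node)
        for dep in graph.get(node, []):
            state = color.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path: we found a loop.
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        color[node] = BLACK
        return None

    for node in list(graph):
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical project: target -> targets it depends on.
headers = ["gen.h"]
real_deps = {
    "gen.h": ["gentool"],        # gen.h is produced by running gentool
    "gentool": ["gentool.o"],    # gentool is linked from its object file
    "gentool.o": ["gentool.c"],
    "main.o": ["main.c", "gen.h"],
}

# Blanket ordering rule: every .o depends on every .h.
blanket = {target: list(deps) for target, deps in real_deps.items()}
for target in blanket:
    if target.endswith(".o"):
        blanket[target] = sorted(set(blanket[target]) | set(headers))

real_cycle = find_cycle(real_deps)
blanket_cycle = find_cycle(blanket)
print(real_cycle)
print(blanket_cycle)
```

Splitting the objects into 'tools' and 'the rest' removes this particular loop, but, as noted above, it still misses the case where one tool's objects need a header generated by another tool.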
Given that, I realized I needed to change ninja to re-read dep files during the build, and that did help (making this change took about the same time as the attempts above and was actually more fun).
The patch is here:
https://github.com/maximuska/ninja/commit/2cbfe92ad2a1f61c83c791417ea20ec8b30b12bc.
It works like this:
Pick a ready edge, re-read its dep file, update its input dependencies, re-evaluate dirtiness, and (re)add all dependent edges to the Plan.
Then check AllInputsReady(), and if it fails, simply go pick another (ready) edge from the Plan.
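
The steps above can be sketched as a toy model in Python. This is not the code from the patch and not ninja's real data structures: run_plan, rules and depfiles are hypothetical names, the dep files are modelled as a fixed dictionary, "building" an edge just records its name, and the graph is assumed to be acyclic.

```python
from collections import deque


def run_plan(rules, depfiles, wanted):
    """Toy model of the re-read loop; assumes an acyclic graph.

    rules:    output -> explicit inputs listed in the build file
    depfiles: output -> extra inputs that its dep file currently lists
    Returns the order in which outputs get built.
    """
    inputs = {out: list(ins) for out, ins in rules.items()}
    done = []        # outputs already built, in build order
    plan = deque()   # edges waiting to be considered

    def add_to_plan(target):
        # Only targets with a rule are edges; plain source files are always ready.
        if target in rules and target not in done and target not in plan:
            plan.append(target)
            for dep in inputs[target]:
                add_to_plan(dep)

    def all_inputs_ready(target):
        return all(dep in done or dep not in rules for dep in inputs[target])

    add_to_plan(wanted)
    while plan:
        edge = plan.popleft()
        # Re-read the dep file and merge anything new into the edge's inputs.
        for dep in depfiles.get(edge, []):
            if dep not in inputs[edge]:
                inputs[edge].append(dep)
        # (Re)add the edges that produce those inputs.
        for dep in inputs[edge]:
            add_to_plan(dep)
        if all_inputs_ready(edge):
            done.append(edge)    # stand-in for actually running the command
        else:
            plan.append(edge)    # not ready yet: go pick another edge
    return done


# Hypothetical project: main.o's dep file reveals a generated header.
rules = {
    "app": ["main.o"],
    "main.o": ["main.c"],
    "gen.h": ["gentool"],
    "gentool": ["gentool.o"],
    "gentool.o": ["gentool.c"],
}
depfiles = {"main.o": ["gen.h"]}

order = run_plan(rules, depfiles, "app")
print(order)
```

In this model gen.h is not mentioned anywhere in the rule for main.o, yet it ends up built before main.o because the dependency is picked up when the dep file is re-read.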
Some of the operations in "RecomputeDirty()" may be redundant in this scenario, but I don't feel comfortable enough with the code yet to decide.
Overall, it works for me now, but I still have to keep some artificial ordering on building .h files:
With the patch, the dep file can be read multiple times (basically, at most 'number of dependency changes' times per target). This is good, as it allows automatically generated header files to be discovered layer by layer (when there are multiple layers), but there is a problem with _building_ the deps. What I did was create an artificial rule with a '$out.d.' target that has the same dependencies as '$out', and make the $out target depend on it. This rebuilds the deps just before the .o is built. The advantage is that the deps are rebuilt as a task, without blocking the main loop. The problem is that edges are only built once, by design, so this dep rule will discover only the 1st layer of included headers, which is almost good enough but still requires adding some manual ordering rules.
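
The "layer by layer" effect can be illustrated with another small Python sketch. It is only a model under stated assumptions: the include graph and file names are hypothetical, and the dependency scanner is assumed to list a missing header by name without being able to look inside it, so it cannot see what that header includes until the header has been generated.

```python
def scan(source, includes, existing):
    """Headers a scan of `source` can see, following includes only
    through files that already exist on disk."""
    found = []
    todo = [source]
    while todo:
        current = todo.pop()
        if current not in existing:
            continue  # cannot read a header that has not been generated yet
        for header in includes.get(current, []):
            if header not in found:
                found.append(header)
                todo.append(header)
    return found


def discover_in_layers(source, includes, existing):
    """Rescan after each round of generation; return the headers found per pass."""
    existing = set(existing)
    known = set()
    layers = []
    while True:
        new = [h for h in scan(source, includes, existing) if h not in known]
        if not new:
            return layers
        layers.append(new)
        known.update(new)
        existing.update(new)  # pretend the newly found headers get generated now


# Hypothetical chain of generated headers: main.c -> a.h -> b.h -> c.h
includes = {"main.c": ["a.h"], "a.h": ["b.h"], "b.h": ["c.h"]}

single_scan = scan("main.c", includes, {"main.c"})
layers = discover_in_layers("main.c", includes, {"main.c"})
print(single_scan)
print(layers)
```

A dep rule that runs only once corresponds to single_scan and sees just the first layer, whereas finding the whole chain takes one rescan per layer.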
It would be trivial to add a 'deps_evaluation_command' to be executed on the main loop before the rule's 'command', before re-reading dependencies, but this would block and is obviously not optimal.
Do you have any elegant suggestions to do this properly?