I am currently working on a game engine at my company, and I am investigating different build systems in order to escape our current "batch script + MSBuild" situation, which is a nightmare to maintain.
The engine build is quite complex: there are a lot of tools, multiple target platforms, code generators, and asset converters to generate runfiles (you can find more details in my previous post). The build includes some custom tools which all have to deal with dynamic dependencies (like header files in C++).
In my case, the issue can be split into three situations:

Situation 1: Code generators
code_generator.exe --output_header=my_service.h --output=my_service.cpp my_service.idl
In that situation my_service.idl could reference other .idl files defining some data types, meaning that my_service.h and my_service.cpp can dynamically depend on other .idl files.
Situation 2: Shader Compiler
ShaderCompiler.exe --target=Win64 --include_dir=. --include_dir=../shader_include --output=my_shader.bin my_service.glsl
The situation is very similar to C++: my_service.glsl could depend on header files available in the given include directories.
Situation 3: Asset Compiler
TextureCompiler.exe --target=Win64 -o my_texture.bin my_texture.tex
In that situation my_texture.tex is a file describing the texture properties; inside, it references an image file (e.g. my_texture.png), and it could reference other preset files.
I read some years ago that it was not yet possible to deal with dynamic dependencies in Starlark, and I have not read anything new about the issue since then. Is there anything new about it? Is it on the long-term roadmap?
In the meantime, what are the possible workarounds for the different scenarios above?
For Situation 1, I have been advised to add all .idl files as dependencies of any .idl target (Solution A). I think it is fine in that case because the code generator is a very fast build step: regenerating everything every time I make a change is acceptable. But it would not be acceptable in Situation 2, because compiling shaders is much slower. If changing one header file triggered the rebuild of all the shaders, that would be very painful. And regarding Situation 3, that solution would be impossible because we cannot predict in advance which image file could be used. Would it be .png? .bmp? .tga? What kind of name? A lot of different folders are possible, etc.
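To make Solution A concrete, here is a minimal sketch of what the over-declaration could look like for the code generator of Situation 1 (target and tool labels are hypothetical):

```starlark
# Solution A sketch: over-declare by depending on every .idl file in the
# package. Any change to any globbed file invalidates the action -- cheap
# for a fast code generator, but too expensive for shader compilation.
genrule(
    name = "my_service_gen",
    srcs = glob(["**/*.idl"]),
    outs = ["my_service.h", "my_service.cpp"],
    cmd = "$(location //tools:code_generator) " +
          "--output_header=$(location my_service.h) " +
          "--output=$(location my_service.cpp) " +
          "$(location my_service.idl)",
    tools = ["//tools:code_generator"],
)
```

The `glob()` is evaluated when the package is loaded, so newly added .idl files are picked up automatically, at the cost of coarse-grained invalidation.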
Another solution would be for those build steps to make the Bazel action run another build system behind the scenes, anything which could skip running the real action if not necessary (Solution B): for example MSBuild's FileTracker for .vcxproj, or calling ninja with its depfile management. But those actions would not be hermetic, because they would read or write dependency files unmanaged by Bazel. So I guess it would not work with Bazel? Or it would be dangerous/error-prone?
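For context on what Solution B's wrapped build systems actually track: tools like ninja consume Makefile-style depfiles emitted by the compiler. A minimal parser for that format (simplified sketch, assuming no escaped spaces or colons in paths) shows how little information is involved:

```python
def parse_depfile(text):
    """Parse a Makefile-style depfile ("out: dep1 dep2 \\<newline> dep3")
    and return the list of dependency paths.

    Simplified sketch: does not handle escaped spaces in paths or
    Windows drive letters containing ':'.
    """
    text = text.replace("\\\n", " ")  # join line continuations
    deps = []
    for line in text.splitlines():
        if ":" not in line:
            continue
        _, rhs = line.split(":", 1)  # drop the output name before ':'
        deps.extend(rhs.split())
    return deps

print(parse_depfile("my_shader.bin: my_shader.glsl \\\n ../shader_include/common.glsl\n"))
# prints ['my_shader.glsl', '../shader_include/common.glsl']
```

The depfile is produced only after the action has run, which is exactly the "dynamic" part that clashes with declaring all inputs up front.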
Otherwise I could generate code before the Bazel build, and compile shaders and assets after the Bazel build (Solution C).
It is not very satisfying, because I would be back to today's situation where my build system would be some custom scripts stitching other build systems together. It feels like I would lose the major benefit of Bazel.

Conclusion:
- I wish the Bazel community could provide some tips about how I can handle dynamic dependencies today. :)
- Also, I have a meta-question about the dynamic dependencies situation. I guess if the issue is not handled by Bazel today, it is because it does not appear to be an important feature for the majority of users. This is very surprising to me, because from my point of view dynamic dependencies are everywhere in the world of build systems (C/C++, shader compilers, protobuf, other code generators, data converters, etc.). Am I biased because it is actually just a problem for the game industry? What is your opinion?
In the end it all boils down to:
1. Find out what the exact inputs for each run are
2. Find out what the exact outputs for each run are
3. Code that into a rule
(4. Fine-tune using Aspects)
5. Repeat
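For step 3, a minimal custom rule sketch for the shader case might look like the following (tool label and attribute names are hypothetical, not an official API of any shader toolchain):

```starlark
# All inputs are declared up front via attributes -- that is what lets
# Bazel hash them, hit the cache, and skip the action when nothing changed.
def _shader_impl(ctx):
    out = ctx.actions.declare_file(ctx.label.name + ".bin")
    ctx.actions.run(
        executable = ctx.executable._compiler,
        arguments = ["--target=Win64", "-o", out.path, ctx.file.src.path],
        inputs = [ctx.file.src] + ctx.files.hdrs,
        outputs = [out],
        mnemonic = "ShaderCompile",
    )
    return [DefaultInfo(files = depset([out]))]

shader = rule(
    implementation = _shader_impl,
    attrs = {
        "src": attr.label(allow_single_file = True),
        "hdrs": attr.label_list(allow_files = True),
        "_compiler": attr.label(
            default = "//tools:ShaderCompiler",
            executable = True,
            cfg = "exec",
        ),
    },
)
```

The `hdrs` attribute is where the "find out the exact inputs" work surfaces: whoever instantiates the rule must list (or glob) every header the shader might include.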
Dynamic dependencies are only great if you don't have to scale.
If you want to scale you want to (correctly!) hit caches. This problem is very well understood inside of games. In order to hit caches you have to align and format your data correctly so that the underlying system can correctly reason about dependencies and only do a minimal set of work. This is basically not that different between CPU and Bazel caches.
@lcidfire, thank you very much for your answer. I am glad to get feedback about Bazel from other people in the game industry.
First, I would like to explain how shader compilation could work according to your suggestion; I want to make sure I have understood your solution.

Because it is not possible to have dynamic dependencies, whatever the action graph required to compile a shader, in the BUILD file a shader somehow depends on a list of all the header files which could be included.
The build can be split in two steps: for instance, a pre-processing step gathering all included files into one intermediate file, then a regular compilation step of that intermediate file.
Every time I change a header file from the list above, every shader will have to be recompiled: first, the pre-processing step will run for all the shaders; then, thanks to the Bazel cache, the compilation step will run only for shaders whose pre-processing step gave a result which is not in the cache. Other shaders will be fetched from the cache.
Because the pre-processing step is much faster than the compilation, it is kind of okay to pre-process all the shaders every time.
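The two-step split described above could be sketched like this (all target, file, and tool names are hypothetical):

```starlark
# Step 1: cheap pre-processing that depends on ALL headers. This action
# re-runs whenever any header changes, but it is fast.
genrule(
    name = "my_shader_pp",
    srcs = ["my_shader.glsl"] + glob(["shader_include/**/*.glsl"]),
    outs = ["my_shader.pp.glsl"],
    cmd = "$(location //tools:shader_pp) --include_dir=shader_include " +
          "-o $@ $(location my_shader.glsl)",
    tools = ["//tools:shader_pp"],
)

# Step 2: slow compilation that depends ONLY on the pre-processed blob.
# If a header change did not affect this shader's pre-processed output,
# the action's inputs are unchanged and the cached my_shader.bin is reused.
genrule(
    name = "my_shader_bin",
    srcs = [":my_shader_pp"],
    outs = ["my_shader.bin"],
    cmd = "$(location //tools:ShaderCompiler) --target=Win64 -o $@ $<",
    tools = ["//tools:ShaderCompiler"],
)
```

The key property is that the expensive action's cache key is the content hash of the pre-processed file, not the set of all headers.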
> (4. Fine-tune using Aspects) 5. Repeat

I am not very familiar with Aspects. I have read the documentation, but it never came to my mind how Aspects could help me better define the build graph. Do you have any example showing how Aspects could be useful in our context (game engine build, game assets build)?
> Dynamic dependencies are only great if you don't have to scale.

I feel I am misunderstanding something, because today I have the opposite opinion: a build system which does not support dynamic dependencies is bound to stop scaling at some point.
In the scenario where it is impossible to know all the inputs during build declaration (for instance building C++, or the shader compilation scenario above), you have to over-declare the list of your inputs. For instance, for C++ you can make your target depend on all the .h files existing in the include directory list. If you do that, all the .cpp files will be recompiled for any header which has been modified.
You can do the trick of pre-processing first and then only compiling what cannot be fetched from the cache. It is definitely better, but I would not say that method "scales", because if you increase the number of .cpp files to be compiled, at some point pre-processing all the files will not be negligible anymore.
We could consider a solution where "the scenario where it is impossible to know all the inputs during the build declaration" is forbidden and considered bad practice. But it feels to me that this is not sustainable, not only because legacy languages are forcing us to deal with that problem, but because behind the issue of dynamic dependencies lies the general concept of composability.
As an input data author (whether it is a C++ source file, a shader, an asset, ...), by compositing together data referenced in files other than the input, you are creating a dynamic dependency, i.e. a dependency which lies in the data and which cannot be known during build declaration.
To give a concrete example, let's consider some data which defines a texture material for a 3D model. This data could be a complex image-processing graph defining how to merge several textures together; building that data would output one texture. The input of that build step is the processing graph, but inside the graph lie the references to all the input textures. Those input textures cannot be declared in the BUILD file because the artist is free to select and add any texture from a large list of textures. You could ask the artist to also change the BUILD file every time he adds/removes a texture from the graph, but that would kill his user experience. You could re-generate the BUILD file according to the input data, but because you cannot predict which BUILD file needs to be re-generated, this generation step would need to be run before every build, for every similar asset.
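To illustrate what such a BUILD-file generator would have to do, here is a sketch of the scanning step: walking a material graph (the JSON schema here is purely hypothetical) and collecting every texture it references, so a generator script could emit them as srcs.

```python
import json

def referenced_textures(graph_json):
    """Collect texture paths referenced anywhere inside a (hypothetical)
    JSON material graph, so a BUILD generator can declare them as inputs."""
    refs = []

    def walk(node):
        if isinstance(node, dict):
            if node.get("type") == "texture":
                refs.append(node["path"])
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for value in node:
                walk(value)

    walk(json.loads(graph_json))
    return sorted(refs)

graph = (
    '{"type": "blend", "inputs": ['
    '{"type": "texture", "path": "rock.png"},'
    '{"type": "texture", "path": "moss.tga"}]}'
)
print(referenced_textures(graph))
# prints ['moss.tga', 'rock.png']
```

The scan itself is trivial; the pain point discussed above is that you cannot know *which* graphs changed without running it over every asset on every build.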
Forbidding dynamic dependencies means forbidding composability inside your data.

> If you want to scale you want to (correctly!) hit caches. This problem is very well understood inside of games. In order to hit caches you have to align and format your data correctly so that the underlying system can correctly reason about dependencies and only do a minimal set of work. This is basically not that different between CPU and Bazel caches.

I see a conceptual difference between the CPU cache and the Bazel cache.
In the context of a CPU, a trade-off has been made: your data has to be as cache-friendly as possible, and we give up on user friendliness. That data is not intended to be authored by humans; it comes from an upstream process (loaded or built in advance).

On the other hand, in the context of Bazel, because the very purpose of a build system is to convert from a user-friendly data format to another format, the trade-off cannot go all in toward cache friendliness. Otherwise people will start building a build system on top of Bazel to generate Bazel-optimized input data from user-friendly data, defeating the whole purpose.