This is probably not the right thread for a detailed discussion, but here is a quick summary of the issues I vaguely recall:
There are two kinds of information needed: the information that allows exact code generation, and the information that makes code generation simple. The latter could lead to bloat.
Also, the current schema uses a DAG in a few places, which makes efficient JSON translation impossible. That is unfortunate for languages that prefer to bootstrap from JSON, since a more complex binary schema could be heavy to support in each new language.
Different languages generate code differently, but one requirement I have is that each schema should generate the exact same set of target files regardless of whether it was included or used as a root schema.
Identifiers are now fully qualified. This is a big improvement.
Each identifier needs to specify which file it belongs to (I think), and there needs to be a list of files and a map to the symbols each file contains. There should be a topologically sorted list of structs and tables based on their interdependencies. There should probably be two ordered lists for a struct, one in definition order and one ordered by attribute criteria, or this could be skipped and decided before generating the binary schema. However, ideally the binary schema can print the text again without loss of information.
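To illustrate the topological ordering, here is a minimal Python sketch. The type names and the `deps` map are made up for the example, and it uses the standard library's `graphlib` rather than anything from flatcc or the binary schema itself. It only covers the acyclic case: `graphlib` raises `CycleError` when the dependency graph contains a cycle.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each fully qualified type name maps to the
# types its fields refer to. None of these names come from a real schema.
deps = {
    "MyGame.Vec3": [],
    "MyGame.Color": [],
    "MyGame.Weapon": ["MyGame.Color"],
    "MyGame.Monster": ["MyGame.Vec3", "MyGame.Weapon", "MyGame.Color"],
}


def emission_order(deps):
    """Return type names so that every type comes after the types it uses."""
    # TopologicalSorter treats the mapped values as predecessors, so
    # static_order() yields dependencies before their dependents.
    return list(TopologicalSorter(deps).static_order())


order = emission_order(deps)
print(order)
```

A code generator could walk `order` front to back and emit each definition knowing that everything it refers to has already been emitted.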
A special problem is the scenario where root schema file A includes files B and C. When C is used as root, it cannot see the content of B, but when it is included from A, C might accidentally see the content of B. Therefore each schema file has its own view of the visible symbol table. If code is always generated for only one specific root, the semantics can change so that C may see B, but then code generation becomes ambiguous.
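A rough Python sketch of the per-file view, using the A, B, C layout from above. The file names, the symbol names, and the rule that a file sees its own symbols plus those of everything it includes transitively are all assumptions made for the illustration, not a description of what flatcc actually does.

```python
# Hypothetical include graph and per-file definitions.
includes = {"A.fbs": ["B.fbs", "C.fbs"], "B.fbs": [], "C.fbs": []}
defined = {
    "A.fbs": {"ns.Root"},
    "B.fbs": {"ns.Helper"},
    "C.fbs": {"ns.Leaf"},
}


def visible_symbols(file, includes, defined):
    """Symbols visible from `file`: its own plus those of files it includes.

    The result depends only on `file`, never on which schema is the root.
    """
    seen = set()
    stack = [file]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(includes[current])
    result = set()
    for f in seen:
        result |= defined[f]
    return result


print(sorted(visible_symbols("A.fbs", includes, defined)))
print(sorted(visible_symbols("C.fbs", includes, defined)))
```

Because the view for `C.fbs` is computed from `C.fbs` alone, it never contains `ns.Helper`, whether C is compiled as a root or pulled in through A.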
Scope resolution (as flatcc defines it, at least) starts with the scope of the current table for an unqualified name, then resolves relative to the active namespace for the given table, and finally treats the name as a fully qualified scope. This requires that each symbol carries information about these scope contexts.
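One possible reading of that lookup order, as a small Python sketch. The function `resolve`, its parameters `table_scope` and `active_namespace`, and the symbol names are hypothetical; this is not flatcc's implementation, and it assumes both scope strings are non-empty.

```python
def resolve(name, table_scope, active_namespace, symbols):
    """Try the three lookup steps in order; return the first hit or None."""
    candidates = []
    if "." not in name:
        # Step 1: an unqualified name is tried in the current table's scope.
        candidates.append(f"{table_scope}.{name}")
    # Step 2: the name is tried relative to the active namespace.
    candidates.append(f"{active_namespace}.{name}")
    # Step 3: the name is tried as already fully qualified.
    candidates.append(name)
    for candidate in candidates:
        if candidate in symbols:
            return candidate
    return None


# Hypothetical set of fully qualified symbols.
symbols = {"MyGame.Sample.Color", "MyGame.Shared.Unit", "Top.Level"}

print(resolve("Color", "MyGame.Sample", "MyGame", symbols))
print(resolve("Shared.Unit", "MyGame.Sample", "MyGame", symbols))
print(resolve("Top.Level", "MyGame.Sample", "MyGame", symbols))
```

The sketch shows why every symbol needs its scope context attached: the same written name can resolve differently depending on `table_scope` and `active_namespace`.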
The flatcc code generator for the JSON parser actually emits such scope tables in order to handle enum name lookups correctly. It isn't as bad as it sounds, but it does prevent certain simplifications.
In addition, there may be a need for more detailed alignment hints to simplify code generation, but I am not sure; the current schema already has some information.