Hi Pluto,
The short answer is that if you have no memory constraints and just need to use something other than malloc, then use the macros. If you need detailed control over very small memory allocations, or over how a buffer fails when out of memory, then use a custom allocator. You may also want to consider a custom emitter. The custom allocator deals with temporary stacks. The emitter deals with collecting fragments of completed buffer bytes and placing them in contiguous memory one way or another. The default emitter stores data in pages, and finalizes by copying pages to a fresh memory block, then recycles the pages for the next buffer.
As to your second question on aligned allocation, I'll have to dig deeper - I'd be happy to help once I get a better understanding and you decide what you need. But I can say this: buffer finalization has _nothing_ to do with the custom allocator, because the finalization step deals with the emitter object's buffer allocation, while the custom allocator deals with temporary stacks before data is emitted from the builder to the emitter. If you provide a custom emitter that just writes to a fixed preallocated memory block that is adequately aligned, you can get away with a memmove to align the buffer (as an example), but you'd have to make sure the aligned_free calls in the builder understand this concept - which you should be able to do by overriding the ALIGNED macros unless something is missing in the API.
There are different kinds of ALLOC/FREE macros, but by default they all fall back to one central set of macros, which makes it easy to replace malloc, realloc, free, aligned_alloc and aligned_free. See the include directory for those default alloc macros.
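To make the fallback idea concrete, here is a minimal sketch of the pattern, not flatcc's actual header. The `DEMO_` macro names, `counting_malloc`, `counting_free` and `demo_macro_override` are all hypothetical names made up for this illustration; check the include directory for the real macro names before relying on any of this.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical replacement allocator: counts live allocations,
 * then defers to malloc/free. Stands in for an RTOS allocator. */
static int live_allocations = 0;

static void *counting_malloc(size_t n)
{
    void *p = malloc(n);
    if (p) ++live_allocations;
    return p;
}

static void counting_free(void *p)
{
    if (p) --live_allocations;
    free(p);
}

/* User side: the override must be visible before the defaults below. */
#define DEMO_ALLOC(n) counting_malloc(n)
#define DEMO_FREE(p)  counting_free(p)

/* Library side: central defaults, used only when nothing was overridden. */
#ifndef DEMO_ALLOC
#define DEMO_ALLOC(n) malloc(n)
#endif
#ifndef DEMO_FREE
#define DEMO_FREE(p) free(p)
#endif

/* More specific macros fall back to the central set by default. */
#ifndef DEMO_BUILDER_ALLOC
#define DEMO_BUILDER_ALLOC(n) DEMO_ALLOC(n)
#endif
#ifndef DEMO_BUILDER_FREE
#define DEMO_BUILDER_FREE(p) DEMO_FREE(p)
#endif

/* Returns the number of live allocations seen while one buffer is held. */
static int demo_macro_override(void)
{
    void *buf = DEMO_BUILDER_ALLOC(64);
    int seen = live_allocations;
    DEMO_BUILDER_FREE(buf);
    return seen;
}
```

Because the specific macros expand to the central pair, overriding the central pair once is enough to reroute every allocation and its matching free through the same allocator.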
Longer story:
As for the differences, some of the reasons are historical and some are real.
Originally the custom allocator was provided to accommodate really constrained environments where you might want to use fixed block allocations that cannot grow, or where you might want to raise some controlled error in case you cannot allocate memory without having to check for return values on all builder calls.
There is only one example of how to use it, and that is the default allocator implemented in the builder code.
The custom allocator takes a memory type as argument because flatcc uses around 7 or so different stack types, many of which are not used, or will only use a small number of bytes. You can profile your code to see how much you need and then do a fixed preallocation so the stacks never have to grow, and fail hard if the limit is exceeded, just as an example. The default implementation uses malloc to provide memory, and more recently FLATCC_BUILDER macros to abstract away the allocation.
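The fixed preallocation idea could look roughly like the sketch below. It is deliberately standalone: `demo_iovec`, `demo_fixed_alloc`, `demo_fixed_alloc_fun`, the callback signature, the count of 7 stack types and the 256 byte capacity are all assumptions for illustration, not flatcc's real types or numbers. The default allocator in the builder code is the authoritative reference for the actual interface.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes: one fixed block per stack type. */
enum { DEMO_STACK_TYPES = 7, DEMO_STACK_CAPACITY = 256 };

/* Hypothetical buffer descriptor handed to the allocator. */
typedef struct {
    void *base;
    size_t len;
} demo_iovec;

/* Preallocated storage; the union keeps each block suitably aligned. */
typedef struct {
    union {
        max_align_t align;
        unsigned char bytes[DEMO_STACK_CAPACITY];
    } pool[DEMO_STACK_TYPES];
} demo_fixed_alloc;

/* Returns 0 on success and -1 when the fixed limit would be exceeded.
 * A request of 0 releases the block. Blocks never grow or move. */
static int demo_fixed_alloc_fun(void *ctx, demo_iovec *b,
                                size_t request, int alloc_type)
{
    demo_fixed_alloc *a = ctx;

    if (alloc_type < 0 || alloc_type >= DEMO_STACK_TYPES) return -1;
    if (request == 0) {
        b->base = NULL;
        b->len = 0;
        return 0;
    }
    if (request > DEMO_STACK_CAPACITY) return -1; /* fail hard */
    if (b->base == NULL) {
        /* First use: hand out the whole zeroed block at once. */
        memset(a->pool[alloc_type].bytes, 0, DEMO_STACK_CAPACITY);
        b->base = a->pool[alloc_type].bytes;
        b->len = DEMO_STACK_CAPACITY;
    }
    return 0;
}

/* Exercises the allocator: 0 on expected behavior, negative otherwise. */
static int demo_fixed_alloc_run(void)
{
    static demo_fixed_alloc a;
    demo_iovec b = { NULL, 0 };

    if (demo_fixed_alloc_fun(&a, &b, 100, 2) != 0) return -1;
    if (b.len != DEMO_STACK_CAPACITY) return -2;
    /* Asking for more than the fixed capacity must be rejected. */
    if (demo_fixed_alloc_fun(&a, &b, DEMO_STACK_CAPACITY + 1, 2) == 0) return -3;
    return 0;
}
```

Handing out the full block on first use means the caller never needs to ask for growth until the hard limit is hit, which is the point where a constrained system would raise its controlled error.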
The emitter was, and is, designed to handle high performance buffer transmission where you can ship data before the buffer is complete, but it doesn't work too well with standard flatbuffers. A hypothetical modified flatbuffer version known as StreamBuffers would write buffer data front to back instead of back to front, which makes streaming much simpler. The emitter can also be used to store data on disk directly, but again, without StreamBuffers there is a final step to collect fragments due to the nature of how FlatBuffers are constructed.
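To illustrate why back to front construction needs a final collection step, here is a simplified sketch of an emitter that writes into a fixed preallocated block. Everything in it is hypothetical (`demo_emitter`, `demo_emit_front`, `demo_emit_data`, `demo_emit_finalize`, the 64 byte capacity), and it assumes each new fragment logically precedes everything emitted so far, which is a simplification of what a real emitter receives.

```c
#include <stddef.h>
#include <string.h>

enum { DEMO_EMIT_CAPACITY = 64 };

/* Hypothetical emitter state: fragments fill the block from the end. */
typedef struct {
    unsigned char block[DEMO_EMIT_CAPACITY];
    size_t used;
} demo_emitter;

/* Places a fragment directly in front of the data emitted so far.
 * Returns 0 on success, -1 if the fixed block is full. */
static int demo_emit_front(demo_emitter *e, const void *data, size_t len)
{
    if (len > DEMO_EMIT_CAPACITY - e->used) return -1;
    e->used += len;
    memcpy(e->block + DEMO_EMIT_CAPACITY - e->used, data, len);
    return 0;
}

/* Start of the contiguous data, somewhere inside the block. */
static const unsigned char *demo_emit_data(const demo_emitter *e)
{
    return e->block + DEMO_EMIT_CAPACITY - e->used;
}

/* Final step: move the data to the start of the block, so it inherits
 * the alignment of the block itself. Returns the buffer size. */
static size_t demo_emit_finalize(demo_emitter *e)
{
    memmove(e->block, demo_emit_data(e), e->used);
    return e->used;
}

/* Emits "world" first and "hello " second, as a back to front
 * builder would. Returns 0 when the result reads "hello world". */
static int demo_emit_run(void)
{
    demo_emitter e = { {0}, 0 };

    if (demo_emit_front(&e, "world", 5) != 0) return -1;
    if (demo_emit_front(&e, "hello ", 6) != 0) return -2;
    if (demo_emit_finalize(&e) != 11) return -3;
    return memcmp(e.block, "hello world", 11) == 0 ? 0 : -4;
}
```

The start of the data is not known until the last fragment has arrived, so nothing can be shipped in final order early. A front to back format would not have this problem.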
At some point, someone figured out that malloc did not exist on FreeRTOS and needed an alternative. In this case, the problem was not a need for more granular allocation control. The default allocator was fine, but it should call something other than malloc to obtain OS memory. By adding ALLOC macros, the system allocator could be renamed and the code would compile as desired.
Later, someone figured out the hard way that if you can override allocation, but rely on free to free memory from the finalize function, then you might mix up different allocators during linking. Therefore the builder also provides flatcc_builder_free calls.
It is also worth noting that the emitter is pluggable, similar to the custom allocator, but it is not the same thing even though it also allocates memory during build.
The aligned allocation has the extra complication that C11 defines aligned_alloc but says that memory should be deallocated with free. It is impossible to create an efficient backport to older systems without a custom aligned_free function on Windows and some other systems. Hence the non-standard aligned_free, which defaults to free on POSIX. Macros are then built on top of that to abstract further.
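The pairing problem can be sketched as follows. This is not flatcc's portability code: `demo_aligned_alloc`, `demo_aligned_free` and `demo_aligned_run` are hypothetical names, and the sketch assumes a C11 library providing `aligned_alloc` on non-Windows systems and the MSVC runtime's `_aligned_malloc` / `_aligned_free` on Windows.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#if defined(_WIN32)
#include <malloc.h>

/* Windows: aligned memory must be released with _aligned_free,
 * never plain free, hence the need for a separate free function. */
static void *demo_aligned_alloc(size_t alignment, size_t size)
{
    return _aligned_malloc(size, alignment);
}

static void demo_aligned_free(void *p)
{
    _aligned_free(p);
}
#else

/* C11: aligned_alloc memory is released with plain free.
 * Assumes alignment is a power of two supported by the platform. */
static void *demo_aligned_alloc(size_t alignment, size_t size)
{
    /* Round up: C11 originally required size to be a multiple of alignment. */
    size_t rounded = (size + alignment - 1) / alignment * alignment;
    return aligned_alloc(alignment, rounded);
}

static void demo_aligned_free(void *p)
{
    free(p);
}
#endif

/* Allocates 100 bytes at 64 byte alignment and verifies the address.
 * Returns 0 on success, negative on failure. */
static int demo_aligned_run(void)
{
    void *p = demo_aligned_alloc(64, 100);
    int ok;

    if (!p) return -1;
    ok = ((uintptr_t)p % 64) == 0;
    demo_aligned_free(p);
    return ok ? 0 : -2;
}
```

Callers always pair the two wrappers, so the code stays correct whichever branch is compiled. Macros layered on top can then redirect both functions together.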