Generally, flatbuffers currently need a copy step to retrieve data, the way the APIs are currently designed - I can't speak for all languages, though.
But there is no reason why a buffer cannot be generated directly into a huge virtual address range, back to front, using mmap or the Windows equivalent. However, since there is a 2GB limitation on the data, it is much simpler to just allocate a 2GB working buffer and build from back to front or front to back; the result is about the same on x64 platforms. The mmap approach has the downside that it page faults every 4KB or so unless memory is committed up front, which is exactly what a large malloc would do anyway.
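To make the back-to-front approach concrete, here is a minimal sketch of a builder that writes into a single preallocated working buffer, with the cursor starting at the end and moving toward the start. The names and layout are illustrative only, not FlatCC's actual internals:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch, not FlatCC's real API: build into one large
 * preallocated working buffer, writing back to front. */
typedef struct {
    uint8_t *buf;   /* start of the working buffer */
    size_t size;    /* total capacity */
    size_t front;   /* index of the lowest written byte; == size when empty */
} b2f_builder;

static int b2f_init(b2f_builder *b, size_t size)
{
    b->buf = (uint8_t *)malloc(size);
    if (!b->buf) return -1;
    b->size = size;
    b->front = size;  /* begin writing at the end */
    return 0;
}

/* Push `len` bytes in front of everything written so far. Returns the
 * position of the new data measured from the end of the buffer (the kind
 * of end-relative offset a back-to-front builder tracks), or 0 on
 * overflow. Later objects can only reference earlier (higher) ones. */
static size_t b2f_push(b2f_builder *b, const void *data, size_t len)
{
    if (b->front < len) return 0;
    b->front -= len;
    memcpy(b->buf + b->front, data, len);
    return b->size - b->front;
}

/* The finished region is buf[front .. size); handing it out is a pointer
 * adjustment, or a final copy if the caller needs a tight allocation. */
static const uint8_t *b2f_data(const b2f_builder *b, size_t *len_out)
{
    *len_out = b->size - b->front;
    return b->buf + b->front;
}
```

With a 2GB working buffer, `b2f_push` never needs to reallocate, which is the point of the large-allocation strategy described above.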
The above does not work well when dealing with a large number of concurrent flatbuffer jobs, because each job needs to commit significant memory.
When you are memory constrained due to many jobs (as in a web server), you need to reallocate as the buffer grows. Here it also doesn't matter much which way you build the buffer, except that back to front makes the internal bookkeeping more interesting because it works with negative offsets.
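The extra bookkeeping when growing a back-to-front buffer can be sketched as follows (hypothetical helper, not FlatCC's implementation): with front-to-back, a plain realloc suffices, but back to front the already-written tail must be moved to the end of the enlarged block so that end-relative positions stay valid:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hedged sketch: grow a back-to-front buffer under memory pressure.
 * `*front` is the index of the lowest written byte; everything in
 * buf[front .. size) is already-built data addressed relative to the
 * buffer end. */
static int b2f_grow(uint8_t **buf, size_t *size, size_t *front,
                    size_t newsize)
{
    uint8_t *nb;
    size_t used = *size - *front;

    if (newsize < *size) return -1;
    nb = (uint8_t *)realloc(*buf, newsize);
    if (!nb) return -1;
    /* Shift the written region to the end of the new allocation so that
     * offsets measured from the buffer end remain unchanged. A
     * front-to-back builder would skip this memmove entirely. */
    memmove(nb + newsize - used, nb + *front, used);
    *buf = nb;
    *front = newsize - used;
    *size = newsize;
    return 0;
}
```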
Where back-to-front really is at a disadvantage is on-the-fly streaming, where the full buffer is not constructed before transmission. Here the receiver needs to deal with negative offsets and reconstruct the data rather than just appending to a dynamically growing array. For this reason I'd like to see front-to-back support.
Another issue is that FlatCC, which maintains an internal stack, does extra copying in some cases due to the back-to-front ordering, but it is unlikely that this would be reimplemented more efficiently with front to back, given the complexity of it all.
Still, signed offsets would add a lot of freedom for optimizations. If I were to start a new implementation, I would really have appreciated signed offsets.
But mostly it is more a matter of perception than a real problem.
As to verification:
FlatCC for C makes the same assumption about unsigned offsets as FlatC for C++. It also has a max nesting level setting. The verification is limited because it cannot trivially protect against maliciously formed overlapping regions that do not exhibit detectable buffer overruns but still make in-place buffer modification unsafe. For this reason, I think the verifier might as well just check for:
max nesting level
out of bounds
no 0 offset (self reference)
This might make some range checks a bit more expensive, or cheaper, compared to the current algorithm.
A more advanced check could also protect against cycles and overlapping regions, at a significant cost.
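The minimal set of checks listed above might look roughly like this. This is a hedged sketch, not the actual FlatC or FlatCC verifier; `MAX_NESTING` and `verify_offset` are illustrative names, and a real verifier would dispatch on the referenced object's type when recursing:

```c
#include <stdint.h>
#include <string.h>

#define MAX_NESTING 64  /* illustrative limit, not FlatCC's default */

/* Sketch of the three cheap checks: max nesting level, out of bounds,
 * and no 0 offset (self reference). Reads a 32-bit unsigned offset at
 * `pos` in native byte order via memcpy (unaligned-safe). Returns 1 if
 * the offset passes, 0 otherwise. */
static int verify_offset(const uint8_t *buf, size_t size,
                         size_t pos, int depth)
{
    uint32_t off;

    if (depth > MAX_NESTING) return 0;       /* max nesting level */
    if (pos > size || size - pos < sizeof(off))
        return 0;                            /* offset field out of bounds */
    memcpy(&off, buf + pos, sizeof(off));
    if (off == 0) return 0;                  /* 0 offset: self reference */
    if (off > size - pos) return 0;          /* target out of bounds */
    /* A fuller verifier would recurse into the target at pos + off with
     * depth + 1; a more advanced one would additionally track visited
     * regions to reject cycles and overlaps, at a significant cost. */
    return 1;
}
```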