"(There’s an apparent inconsistency for the allocation size types shown here. In fact our actual
custom vector code also uses a tUnsigned32 typedef for it’s size type,
but I replaced this with ‘size_type’ in the example code to reduce
dependencies. Lets just consider size_type as equivalent to tUnsigned32
for the purpose of these examples!)" - Why not just use size_type in the earlier example as well, and update the later examples to match?
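For example, declaring the typedef once and using it everywhere would keep the allocator and vector snippets consistent. This is a minimal sketch, not PathEngine's code. It assumes std::uint32_t as a stand-in for tUnsigned32, uses hypothetical names (cMallocAllocator, cVectorSketch), and only handles trivially copyable elements:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <type_traits>

// Hypothetical stand-in for the article's tUnsigned32 typedef.
typedef std::uint32_t tUnsigned32;
// Declared once and used by every example, so there is nothing to reconcile.
typedef tUnsigned32 size_type;

// Hypothetical allocator: a thin wrapper over malloc/free.
class cMallocAllocator
{
public:
    void* allocate(size_type bytes) { return std::malloc(bytes); }
    void deallocate(void* p) { std::free(p); }
};

// Hypothetical vector sketch; restricted to trivially copyable T so that
// elements can be copied with plain assignment into raw storage.
template <class T>
class cVectorSketch
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "sketch only handles trivially copyable T");
public:
    explicit cVectorSketch(cMallocAllocator& allocator)
        : _allocator(allocator), _data(nullptr), _size(0), _capacity(0) {}
    cVectorSketch(const cVectorSketch&) = delete;
    cVectorSketch& operator=(const cVectorSketch&) = delete;
    ~cVectorSketch() { _allocator.deallocate(_data); }

    size_type size() const { return _size; }
    T& operator[](size_type i) { assert(i < _size); return _data[i]; }

    void push_back(const T& value)
    {
        T copy = value; // value may alias an element that reserve() frees
        if (_size == _capacity)
            reserve(_capacity == 0 ? 4 : _capacity * 2);
        _data[_size++] = copy;
    }

    void reserve(size_type newCapacity)
    {
        if (newCapacity <= _capacity) return;
        T* newData = static_cast<T*>(
            _allocator.allocate(static_cast<size_type>(newCapacity * sizeof(T))));
        assert(newData);
        for (size_type i = 0; i != _size; ++i) newData[i] = _data[i];
        _allocator.deallocate(_data);
        _data = newData;
        _capacity = newCapacity;
    }

private:
    cMallocAllocator& _allocator;
    T* _data;
    size_type _size;
    size_type _capacity;
};
```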
"Passing the allocator by reference helped us with controlling
constructor overloads, I think, but looking back this is perhaps a
mistake, and pointers might be a better choice." - Why would pointers be a better choice? Are you ever going to support switching allocators in a vector? Would you rather have the vector control the allocator's lifetime? I'd say either explain why you would prefer pointers or just remove this comment. Leaving it as-is casts some doubt on your approach, since you admit you aren't happy with it.
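To make the question concrete, here is a minimal sketch of the two options. All names (iAllocator, cCountingAllocator, cHolderByRef, cHolderByPtr) are hypothetical, not PathEngine's classes. One well-known difference is that a reference member makes a class non-assignable, while a pointer member keeps assignment and swapping possible but lets null in:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <type_traits>

// Hypothetical allocator interface, for illustration only.
class iAllocator
{
public:
    virtual ~iAllocator() {}
    virtual void* allocate(std::size_t bytes) = 0;
    virtual void deallocate(void* p) = 0;
};

// Counts outstanding allocations so the sketch can be checked.
class cCountingAllocator : public iAllocator
{
public:
    cCountingAllocator() : live(0) {}
    void* allocate(std::size_t bytes) override { ++live; return std::malloc(bytes); }
    void deallocate(void* p) override { if (p) { --live; std::free(p); } }
    int live;
};

// Option A: reference member. Never null and fixed at construction, but the
// implicit copy assignment operator is defined as deleted.
class cHolderByRef
{
public:
    explicit cHolderByRef(iAllocator& a) : _a(a) {}
    iAllocator& allocator() const { return _a; }
private:
    iAllocator& _a;
};

// Option B: pointer member. Assignable (so the allocator can be rebound),
// at the cost of the class having to reject or handle null.
class cHolderByPtr
{
public:
    explicit cHolderByPtr(iAllocator* a) : _a(a) { assert(a); }
    iAllocator* allocator() const { return _a; }
private:
    iAllocator* _a;
};

static_assert(!std::is_copy_assignable<cHolderByRef>::value,
              "reference member blocks assignment");
static_assert(std::is_copy_assignable<cHolderByPtr>::value,
              "pointer member keeps assignment");
```

If the vector never needs to be reassigned or to switch allocators, the reference version may be perfectly adequate, which is why it would help to state which of these properties actually matters.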
The code blocks slightly past the halfway mark run off the screen for me (viewing on a 1920x1080 display).
I don't quite understand why you discuss the BitSquid allocators, other than to show different ways of implementing custom allocators. I think this section could be removed and replaced with a link (e.g. "For examples of other ways to implement custom allocators, check out the BitSquid custom allocator scheme").
I'm also not a fan of the pattern where you post code and then immediately follow it with comments that effectively say, "Now the way we should have done it is..." I understand the thought process behind this, but it keeps me asking myself, "So why didn't you just do that?" It also makes the reader go back through the code a second time, taking into account your suggestions for how you might do it differently.
Typos: "Buut the speedup" (should be "But"), "sequencially" (should be "sequentially"). There may be other spelling errors that I missed.
"Nevertheless, if we’re asking whether custom allocation with realloc()
is ‘necessary or better’ in the specific case of PathEngine vector use
(and these specific benchmarks) the answer appears to be that no this
doesn’t really seem to make any concrete difference!" - Given that there isn't much final payoff for reading through all the realloc material, it doesn't feel worth it. I would strongly suggest trimming that section down significantly. You could move the benchmark code elsewhere and just link to it. You can certainly still mention that your benchmarks showed speed improvements (and the Windows vs Linux differences), but I would remove almost everything ancillary to that. As others (and you yourself) have noted, this post is quite long, and this seems like a good area to cut.