The array is the most fundamental and important data structure in computer science. If a problem can be efficiently solved with arrays
there is not likely to be a more optimal solution on modern hardware.
The C++ standard library provides only 2 array types. A static and fixed size std::array<T,N> and a dynamic and growable std::vector<T,N>.
These are a good foundation, but there are a lot of different array variations.
In performance critical code, I've found all of these to be very useful.
I'd really like to see these in the standard and am considering writing a paper. Many of them are already being discussed elsewhere or have active proposals. Here is the rough outline of the idea. A real paper would need a lot more detail and examples of course.
The different types of arrays you can create are split across a few dimensions. All of them have trade-offs.
One dimension is the storage policy:
static - The size is determined at compile time. When you can set compile time upper bounds on storage, you can construct tightly packed
data structures with no pointer indirections. That's about the best you could hope for in terms of cache efficiency. The limitation of course
is the fixed size. What do you do if the application wants to use more? Silently truncate? Throw an error?
dynamic - The size is determined at runtime and allocated on the heap (maybe stack via magic tricks like alloca()). This is of course more
flexible for all the obvious reasons, and can be safety created locally on the stack without worrying about using too much memory.
Dyanmic arrays can also be moved efficiently O(1) by swapping pointers, whereas with the static array you're forced to move all of
individual members O(N) in sequence. The downside are the costs of allocation and deallocation. Also if you have structures with lots
of different vectors, you'll be risking cache misses and as you do random access memory hops when accessing the different buffers.
sbo - The small buffer optimization where you have some small amount of static memory and fallback to dynamic allocation when the
capacity exceeds that amount. This is a great compromise, especially if you have a system where rare outliers in memory use need to
be handled and its acceptable to pay a penality for it. Its also not hard to add to some instrumentation and tune the static sizes after
the fact so that for the 80 part of the 80 / 20 rule, your application runs super fast and doesn't allocate memory.
Another is whether the storage is fixed or growable.
Growable Storage - This is std::vector. It is the most convenient and default option. It always works and the memory is laid out
linearly so normal access is super fast. Growable means in theory your application can possibly run out of memory and crash. This is not
such a bad thing if the limits are sufficiently high and you are not concerned about or are otherwise protected from malicious denial of
service attacks. It also leaves room for hardware memory upgrades to improve the capabilities of the
application. Growable arrays require that you have different concepts of "size" and "capacity", which means you need to store more
data in the array object to keep track of this information. The automatic resizing while amortized is still expensive when it happens, and
if you have a real-time application that can't handle random slowdowns like this then automatically growable arrays may be off the table.
Fixed Storage - Fixed size storage is less flexible but offers some performance benefits. Mostly in that you don't need to keep track of a
capacity, you can also choose to encode the size at compile time (std::array). Since the size is fixed, dynamic memory allocation can be
done once and done exactly for the space requested.
Fixed size arrays can also store non-copyable/non-movable elements, whereas growable arrays require the objects can be relocated or copied.
One big limitation fixed storage has right now is that as soon as you construct the array itself, all of the elements have to be constructed too.
If your T's are moveable, its easy to just default construct the array, and then loop through and carefully construct the real data on the side and then move it in.
If T's default constructor is complex and not inline, this default and then move approach can be less efficient. Its also impossible if T is not movable or copyable.
For non-movable types, fixed arrays make it difficult to pass different constructor arguments, or even call different constructors for each individual element.
I once rigged up a technique for doing this using lambdas and iterators, but it wasn't pretty.
Finally, the last dimension is ownership.
Owns the data - Pretty much all array types actually own and manage the underlying buffer.
Does not own the data - This is array_view, which I'll describe more later.
So given all of these dimensions, how can we combine them to create useful types. Here are the ones I've used:
Note that template arguments are:
T - the type of the data
N - a compile time size
A - an allocator
vector<T,A> - Dynamic and GrowableThe standard bread and butter of C++. Any implementation will require 3 data members for the data pointer, size, and capacity. It can be implemented using a pointer and 2 ints, or 3 pointers. On 64 bit platforms, std::vector will likely be 24 bytes. If you know you will never need to store more than 4 billion elements (an extremely reasonable assumption), you can shave it down to 16 bytes using a T* and 2 uint32_t's. Unfortunately, std::vector doesn't provide a way for us to switch on this optimization.
There's also the choice of the growth factor. 2 being the common naive one and 1.5 providing better underlying memory reuse.
Finally, theres also the discussion of realloc() that keeps popping up, and whether or not a standard version of realloc() would make the vector resize operation more efficient.
static_vector<T,N> - Static capacity, growableThis data structure has the exact same interface as std::vector. The difference is that the capacity is set at compile time and not stored within the object. If you try to insert an element once the capacity is full the container will throw an exception instead of allocating.
This one is less common. You get the cache benefits of a compile time capacity, which means you can embed this object directly in other data structures without pointer indirections. Its essentially a std::array<T,N>, with an extra counter embedded inside to keep track of how many objects are "active".
The resulting size of this data structure will be sizeof(T) * N + the size of a 64 bit or 32 bit size counter, with additional padding as needed.
One alternative to this is using an array<T,N> and having some state of T represent whether or not the object is active.
A very common example is an array<T*,N> where the index of the first null pointer indicates the "size". This is actually faster to iterate in a for loop than static_vector<T,N> since you can do only a single null check. You don't need to store a counter, but you may need to store a dummy sentinel at the end to terminate the loop. The downside is that the size() operation becomes O(N) instead of O(1).
For other cases, you may need to store an additional bool to say whether its "alive" or not. In terms of memory usage, you trade off the cost of a single counter vs
storing additional "alive" state in your T object. Because of padding,
storing an additional flag can actually be very expensive in terms of
memory use and cache efficiency. For non-pod complex types, it also requires
that all of them are "constructed" all the time, which may not be ideal.
small_vector<T,N,A> - Sbo capacity, growable.There have been a lot of other threads talking about small_vector and examples of its use in the wild. Examples include llvm, google, facebook, game engines, and more.. So I'm not going to go into huge detail here. In C++11, std::string already uses the sbo storage policy.
small_vector is a probabilistic bet. If most of the time your maximum size is <= N, you have the potential to gain large performance benefits. In particular a vector<small_vector<T,N>> will end up being a single linear buffer. If you end up exceeding N most of the time, that sbo space just becomes wasted memory and a simple vector would be faster.
Any small_vector implementation must allow for N to be configurable by the user. Unlike a string, vector has a nearly infinite set of use cases and performance critical applications will need to tune it carefully.
array<T,N> - Static and Fixed (also T[N])The fixed size statically allocated array. You get all of the benefits of the memory layout along with the limitations of fixed storage mentioned before.
The size is of course sizeof(T) * N.
Not much to say here, we've all been using this for decades.
dyn_array<T,A> - Dynamic and FixedNot to be confused with the earlier attempt at creating a magic array type with stack allocation.
Of all of the new types mentioned here, after small_vector, this is the most important one missing from the standard.
Essentially, this is like a std::vector without the insert, push_back, erase, methods. Not having these capabilities seems like a limitation, but its also a strength.
* The underlying memory store can be allocated exactly and only once during construction. vector implementations may speculatively allocate more, betting that you will call push_back() later. Unless you use reserve(), inserting a bunch of things into a vector will result in multiple memory allocations and deallocations. Slowing down your application and fragmenting your heap.
* Smaller, more cache friendly footprint. You only need a single pointer and a size, so on 64bit systems, sizeof(dyn_array<T,A>) == 16. On 32 bit systems, the size is only 8.
* Works with non-movable types, with the construction caveats I mentioned before.
* You can store const T, and still be able to move the array but not copy it.
* Application correctness: If the use case demands an array that cannot be resized, dyn_array provides exactly that. It becomes impossible for users to break this contract.
Compared to std::array, the advantages are:
* Determine the size at runtime
* O(1) move operations
* No worry about blowing out stack space with large allocations.
Finally, unlike array, dyn_array still has the ability to be resized at runtime. Its just that unlike vector, it has to be done manually be the user. In my implementations of this, I've just allowed copy and/or move operations to change the size. You could also argue that including the resize() method still makes sense.
auto a = dyn_array<int>(5, 0); //Initialize to {0, 0, 0, 0, 0}
a = dyn_array<int>(2, 1); //Change to {1, 1}
Finally, if you need some flexibility you can also create your data with a std::vector, and then move it into a dyn_array. If these types are using the same allocator, we should allow them to be movable to each other
//Do setup
std::vector<T> temp;
temp.emplace_back(stuff..);
temp.emplace_back(more complicated stuff...);
//All done, now do permanent storage.
auto permanent = dyn_array<T>(std::move(temp));
array_view<T> - Non-owning viewThere have already been several proposals for this. Personally I love array_view and use it all the time.
The best part is that it can be used with all of the above array types. This allows you to write generic functions which accept any kind of array. Like string_view, it allows generic code without the complexity and compile time overhead of using templates. Bjarne Stroustrup's GSL library has this and they call it "span<T>".
The standard 1-dimensional array_view requires only a begin and end pointer.
Theres also discussion of how to handle multiple dimensions. I can have a flat buffer of memory and construct a multi-dimension view over it.
While multi-dimensional views can be great for numeric code, I find that the single dimensional array_view is an absolutely essential tool. Again 80 / 20 principle.
For every dimension you add, you need to include additional state to store it, either in memory at runtime or at compile time. I'm leaving this subject alone for now as the various array_view proposals are trying to tackle it.
static_array_view<T,N> - Non-owning view with static size.This one seems a bit odd and obscure. Essentially this object stores only a single pointer and has a compile time configured size.
Why would you possibly want to use this thing? Personally I invented this one time because there was some generic code I wanted to optimize for single element case and multi-element case.
The generic code was written for array_view<T> and looped over it several times doing all kinds of nasty business logic. This code was absolutely performance critical. It was extremely common that we would pass in an array_view with size 1, with some scenarios never using the multi-element capability at all.
Writing 2 versions of the code was a maintenance nightmare I didn't want to deal with. Instead of writing all of this code twice, once with loops and again without them I simply templated it.
The multi-element case templated to array_view<T>. The single-element case templated to static_array_view<T,1>. In the static case, the compiler was smart enough to remove all of the loops and compile the code down as if I had written it in the single element.
All of these array types are not impossible to write on your own but its tedious and complicated. It takes a lot of time to write all of the access methods and then you must unit test it hard as its easy to have a bug in one of them that your application happens to not use often or ever.
Would you use any of these types in your code?