I was watching the cppcon 2016 talk “Building and Extending the Iterator Hierarchy in a Modern, Multicore World".
At the very beginning, the speaker asks the audience who likes and who loves iterators. Then he goes on to define iterator categories from scratch. While not much of this is exactly new, it really reminded me of the beauty of the standard algorithms and the iterator categories.
For those who haven't seen the talk, he basically defines iterators using a new syntax with concepts lite.
Iterator:
successor(i) -> returns next iterator
operator==() -> compare 2 iterators
Readable:
source(i) -> reads the value from the iterator.
Writable:
sink(i) -> writes the value to the iterator.
Bidirectional Iterator:
predecessor(i) -> returns the prev iterator
Random Access Iterator:
operator+(i,n) -> increment by N
operator-(i,n) -> decrement by N
operator-(i,j) -> difference between 2 iterators
He then goes on to further with contiguous iterators, segmented iterators, etc..
What I found most interesting is that instead of using the operator syntax we already have, he had to define his own in order to make the concepts he was explaining more clear. Finally I thought back to the first question "Who loves iterators?". When I heard this question I thought to myself that I certainly do not love iterators but as I watched the talk I came to realize why. Iterators themselves are actually great. The problem is that whenever I think about iterators, I think about the horrible confusing pain that comes with implementing them.
To get all the way up to random access iterator, we only need the above functions.
However in real C++ code, to write a compliant iterator you have to write all of this stupid boilerplate such as operator->(), operator++(int), operator--(int).
For random access, you get to write non-member operator+(i,j) along with member operator+=(j).
The big mistake is that in the beginning we tried to model the iterator API to match the API of pointers. A pointer is an iterator, but an iterator is not a pointer.
All of these extra functions such as operator +=() and ++(int) are useless. When reading the code, they only serve to obscure the definition of the iterator. For the developer, its more boilerplate functions that have to be unit tested. These functions are notoriously easy to write copy and paste errors. How many of you had a unit test catch a ++ inside of your operator--(int) implementation? Since they are required but not often used, such bugs can easily go unnoticed unless you are rigorous with unit testing.
Think back to your generic code, how useful is iterator post increment really? We even had to invent a rule about not using i++ post increment in for loops because i might be an iterator.
Operator [] for random access iterators is also problematic and pretty much useless. If I'm storing a proxy internally, its much easier to require the external user to increment the iterator (thus incrementing the proxy) and then simply return a reference to the new proxy state using operator*(). With operator[], I have to somehow update some state somewhere and then return a reference to it. Or a return a new proxy that somehow acts like a reference. The whole process is pointless and complicated, especially for a function like operator[] that nobody ever uses with iterators but still has to implement to be technically compliant.
I also find source(i) and sink(i) to be incredibly readable in terms of what happens when you try to read and write to the iterator. operator*() and operator*() const, along with paired operator->() and operator->() const not so much. The requirement of operator->() also greatly complicates using proxies because the result of operator->() needs to be dereferenced to the same object as what is returned by operator*().
Operator * and -> is also confusing when you have an iterator whose value type is a pointer, as you need to do multiple dereferences. source(i)->value is much cleaner than (*i)->value.
Finally, there is a huge teachability problem here. Generic algorithms are one of the key strengths of C++, but in order to teach them we have torture our students with non-sense about post increment and operator->(). All of this just serves to confuse the core idea behind iterators and generic algorithms.
So now, how can we make things better? I can think of a few solutions:
Solution 1: Add more defaulted operators.
This solution preserves the status quo but makes it less painful. It would also help in developing other mathematical and smart pointer types that use operator overloading. Some of this may already be provided by defaulted comparison operators proposals. Basically it would boil down to something like this:
* If T::operator++() is defined and T is copyable, autogenerate T::operator++(int).
* If T::operator--() is defined and T is copyable, autogenerate T::operator--(int).
* If T::operator*() is defined and returns an lvalue reference, autogenerate T::operator->().
* If T::operator*() const is defined and returns an lvalue reference, autogenerate T::operator->() const.
* If T::operator+=(U) is defined, autogenerate operator+(T, U). (or visa-versa)
* If T::operator-=(U) is defined, autogenerate operator-(T, U). (or visa-versa)
* If T::operator+(T,U) is defined and T::operator*() is defined, autogenerate T::operator[](U) (I could see this being problematic)
* If T::operator+(T,U) is defined and T::operator*() const is defined, autogenerate T::operator[](U) const (I could see this being problematic)
* If operator==(T,U) is defined, autogenerate operator!=(T,U). (or visa-versa)
* If operator<(T,U) is defined, autogenerate operator>=(T,U). (or visa-versa)
* If operator>(T,U) is defined, autogenerate operator<=(T,U). (or visa-versa)
* If operator<(T,U) and operator==(T,U) are defined, autogenerate operator>(T,U)
* If operator>(T,U) and operator==(T,U) are defined, autogenerate operator<(T,U)
Obviously, = delete can be used to suppress the generation of such functions and = default can be used to be explicit that you want the compiler generated defaults.
We still have this what I think are ugly and meaning obscuring operators. But at least we have removed a large implementation burden on the writer and and maintainability burden on the reader.
Solution 2: Use a wrapper template which essentially does solution 1 in the library.
I don't like this solution at all. Its less generic because it forces iterator implementations to inherit from some template. It doesn't help iterator types from other code bases which don't use the wrapper. We also don't gain the benefits of defaulted operators for other use cases such as writing mathematical types.
Solution 3: Completely redefine the iterator API, with implementation support backwards compatibility.
In this solution, we essentially redefine the iterator API based on named free functions, similar to the above and deprecate all of the operators. ADL lookup is used to resolve the function calls. The only operators that would be retained are the comparison operators, using the same autogeneration logic I proposed in solution 1.
In the best of all worlds, I would prefer something like this:
* source(i) - read from iterator i. (Readable concept)
* sink(i) - Return something that can be assigned to write to iterator i. (Writable concept)
* sucessor(i) - Increment the iterator by 1. (Iterator concept)
* operator==(i,j) - Compare equal (Iterator concept, auto generates operator!=() if not explicitly defined)
* predecessor(i) - Decrement the iterator by 1. (Bidir Iterator concept)
* sucessor(i, n) - Increment the iterator by n. (Random Access)
* predecessor(i, n) - Decrement the iterator by n. (Random Access)
* difference(i, j) - Difference between 2 iterators. (Random Access)
* operator<(i, j) - Less than comparison (Random Access Concept, autogenerates <=, >, >= if not explicitly defined).
Thats only 9 functions you need to define to get a random access iterator. Actually in many cases only 7, because I reused the names successor() and predecessor() so that many random access iterator implementations could just write one function with a default argument
RAIterator successor(RAIterator iter, int n = 1) { return RAIterator{ iter.impl + n; }; }
Unfortunately we already have some names reserved in the standard. So in order to reuse them, a more realistic approach might be something like this:
* source(i)
* sink(i)
* advance(i) - we already have std::advance()
* operator==(i,j) - auto generates operator!=() if not explicitly defined.
* retreat(i) - matches std::advance() name
* next(i, n) - we already have std::next() in C++11
* prev(i, n) - we already have std::prev() in C++11
* difference(i, j) - * Not std::distance()!! *
* operator<(i, j) - autogenerates <=, >, >= if not explicitly defined.
Note that difference() is very different from std::distance(). difference is a required operation of the random access iterator concept, and only defined for random access and above iterators. std::distance() is actually a generic algorithm, and I believe it belongs in the <algorithm> header, not the <iterator> header.
The big question is of course backwards compatibility. This one should actually not be hard and I believe it can be done without waiting for an STL 2.0.
First, all of the std:: versions of the above should include fallbacks to the old operators. Anyone writing a generic algorithm using the new paradigm can say using std::XYZ; (or using namespace std;) in their function body and automatically add support for legacy iterators. All of the standard library algorithm implementations would need to be migrated to the new paradigm.
In addition, all of the iterator tags in std::iterator_traits can be derived automatically by testing for the existence of these functions using meta programming. Iterator class authors should not need to write any typedefs. I expect iterator_traits eventually goes away once we have concepts.