I would like to propose the following API for: ICU 79
Please provide feedback by: next Wednesday, 2026-jun-10
Designated API reviewer: Robin
Ticket: https://unicode-org.atlassian.net/browse/ICU-23421
Pull request: https://github.com/unicode-org/icu/pull/4014
This is a small addition to the C++ code point iterator APIs: functions that are common in C++ standard iterators and ranges but that we had overlooked.
From the ticket:
C++ “ranges” or “containers” usually have a front() function for the first element and a back() function for the last one. These would be useful additions to [Unsafe]UTFStringCodePoints. See for example std::basic_string_view:
https://en.cppreference.com/cpp/string/basic_string_view/front
https://en.cppreference.com/cpp/string/basic_string_view/back
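For reference, this is how front() and back() are typically used on std::string_view. The helper stripQuotes() is hypothetical. Like the standard functions, it must not call front() or back() on an empty view:

```cpp
#include <string_view>

// Hypothetical helper: removes one pair of surrounding double quotes, if present.
std::string_view stripQuotes(std::string_view s) {
    // front() and back() require !s.empty(); size() >= 2 also keeps a lone '"' intact.
    if (s.size() >= 2 && s.front() == '"' && s.back() == '"') {
        s.remove_prefix(1);
        s.remove_suffix(1);
    }
    return s;
}
```

front() is shorthand for *begin() and back() for the element just before end(), which is exactly how the proposed UTFStringCodePoints functions are defined.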
A C++ iterator often wraps another, “more basic” iterator, and commonly provides a base() function to access it. [Unsafe]UTFIterator wraps a code unit iterator; base() would return a code unit iterator at the current logical position. This is like fetching the current CodeUnits and calling their begin(), except that CodeUnits cannot be fetched from an iterator at the exclusive end(). base() would also be much easier to use.
Real example: use std::advance() to move a code point iterator forward by n code points, then call its base() to find the corresponding position in the input string/string_view. That position will usually have moved by more than n code units.
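A minimal sketch of that pattern, assuming a hypothetical forward-only Utf8CodePointIter over well-formed UTF-8 and a hypothetical helper codeUnitOffsetAfter(). This is not ICU's class (for one thing, the real [Unsafe]UTFIterator is bidirectional); it only illustrates what base() returns, including at the end of the input:

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string_view>

// Hypothetical, simplified stand-in for a code point iterator:
// assumes well-formed UTF-8 and only moves forward.
class Utf8CodePointIter {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = char32_t;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = char32_t;

    explicit Utf8CodePointIter(std::string_view::const_iterator p) : p_(p) {}

    // Decodes the code point that starts at the current position.
    char32_t operator*() const {
        auto lead = static_cast<unsigned char>(*p_);
        int len = length(lead);
        char32_t c = (len == 1) ? lead : (lead & (0x7F >> len));
        for (int i = 1; i < len; ++i) {
            c = (c << 6) | (static_cast<unsigned char>(p_[i]) & 0x3F);
        }
        return c;
    }
    Utf8CodePointIter& operator++() {
        p_ += length(static_cast<unsigned char>(*p_));
        return *this;
    }
    Utf8CodePointIter operator++(int) {
        Utf8CodePointIter old = *this;
        ++*this;
        return old;
    }
    bool operator==(const Utf8CodePointIter& other) const { return p_ == other.p_; }
    bool operator!=(const Utf8CodePointIter& other) const { return p_ != other.p_; }

    // Analogous to the proposed base(): the wrapped code unit iterator at the
    // current position, also valid at the exclusive end.
    std::string_view::const_iterator base() const { return p_; }

private:
    // Number of UTF-8 code units, derived from the lead byte.
    static int length(unsigned char lead) {
        return lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
    }
    std::string_view::const_iterator p_;
};

// Hypothetical helper: moves forward n code points with std::advance()
// and reports how many code units were skipped.
std::ptrdiff_t codeUnitOffsetAfter(std::string_view s, std::ptrdiff_t n) {
    assert(n >= 0);  // this sketch's iterator can only move forward
    Utf8CodePointIter it(s.begin());
    std::advance(it, n);
    return it.base() - s.begin();
}
```

For the input "aé€😀" (1 + 2 + 3 + 4 UTF-8 code units), advancing 3 code points moves base() by 6 code units.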
Similarly, a reverse_iterator usually has a base() function to return an original-direction iterator at the same logical position. The reverse_iterator wrappers over [Unsafe]UTFIterator should have that, too.
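In the standard library, the reverse base() is commonly used after a backward search: std::find over reverse iterators locates the last match, and base() converts the result into a forward iterator at the boundary just after it. A sketch with std::string_view and a hypothetical helper afterLast(); per the proposal, the reverse_iterator specializations would offer the same kind of conversion back to a forward [Unsafe]UTFIterator:

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string_view>

// Hypothetical helper: returns the part of s after the last occurrence of sep,
// or all of s if sep does not occur.
std::string_view afterLast(std::string_view s, char sep) {
    // Search backward, starting from the last character.
    auto r = std::find(s.rbegin(), s.rend(), sep);
    // r.base() is a forward iterator at the boundary just after *r;
    // if nothing was found, r == s.rend() and r.base() == s.begin().
    auto pos = static_cast<std::size_t>(r.base() - s.begin());
    return s.substr(pos);
}
```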
New API signatures in common/unicode/utfiterator.h. Note that the reverse_iterator classes are template specializations of C++ reverse_iterator and do not have the usual API docs.
class UTFIterator {
    /**
     * Returns the current position as a code unit iterator.
     * Similar to iter->begin() but also works at the exclusive end().
     *
     * @return current position as a code unit iterator
     * @draft ICU 79
     */
    U_FORCE_INLINE UnitIter base() const;
};
class std::reverse_iterator<...UTFIterator...> {
    U_FORCE_INLINE U_HEADER_ONLY_NAMESPACE::UTFIterator<CP32, behavior, UnitIter> base() const;
};
class UTFStringCodePoints {
    /**
     * Returns the CodeUnits for the first character/code point.
     * Requires that the range is not empty.
     *
     * @return the CodeUnits for the first character/code point.
     * @draft ICU 79
     */
    auto front() const {
        return *begin();
    }
    /**
     * Returns the CodeUnits for the last character/code point.
     * Requires that the range is not empty.
     *
     * @return the CodeUnits for the last character/code point.
     * @draft ICU 79
     */
    auto back() const {
        return *(--end());
    }
};
Same additions in the “unsafe” classes, API docs omitted here:
class UnsafeUTFIterator {
    U_FORCE_INLINE UnitIter base() const;
};
class std::reverse_iterator<...UnsafeUTFIterator...> {
    U_FORCE_INLINE U_HEADER_ONLY_NAMESPACE::UnsafeUTFIterator<CP32, UnitIter> base() const;
};
class UnsafeUTFStringCodePoints {
    auto front() const;
    auto back() const;
};
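The proposed front()/back() bodies (*begin() and *(--end())) follow a pattern that any range with begin()/end() can share. A generic sketch as a CRTP mixin, where FrontBackMixin and IntSpan are hypothetical names for illustration only. It uses std::prev(end()) rather than --end(): the latter works when the iterator is a class type whose member operator-- accepts the temporary returned by end(), but does not compile when end() returns a raw pointer:

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical CRTP mixin: gives any range with begin()/end() a front() and back().
// Both return by value (auto) and require a non-empty range.
template<typename Derived>
class FrontBackMixin {
public:
    auto front() const {
        assert(self().begin() != self().end());  // precondition: not empty
        return *self().begin();
    }
    auto back() const {
        assert(self().begin() != self().end());  // precondition: not empty
        // std::prev also works for raw-pointer iterators, which cannot be
        // decremented as prvalues.
        return *std::prev(self().end());
    }

private:
    const Derived& self() const { return static_cast<const Derived&>(*this); }
};

// Hypothetical example range over a plain int array.
class IntSpan : public FrontBackMixin<IntSpan> {
public:
    IntSpan(const int* p, std::size_t n) : p_(p), n_(n) {}
    const int* begin() const { return p_; }
    const int* end() const { return p_ + n_; }

private:
    const int* p_;
    std::size_t n_;
};
```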