ICU4C API proposal: code point iterators & ranges: add base(), front(), back()

Markus Scherer

Jun 2, 2026, 3:05:46 PM
to icu-design, Robin Leroy
Dear ICU team & users,

I would like to propose the following API for: ICU 79

Please provide feedback by: next Wednesday, 2026-jun-10

Designated API reviewer: Robin

Ticket: https://unicode-org.atlassian.net/browse/ICU-23421

Pull request: https://github.com/unicode-org/icu/pull/4014


This is a small addition to the C++ code point iterator APIs. These are actually common functions in C++ standard iterators and ranges that we had overlooked.


From the ticket:

C++ “ranges” or “containers” usually have a front() function for the first element, and a back() function for the last one. This has turned out to be something that would be a useful addition to [Unsafe]UTFStringCodePoints. Simple example:
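
As an illustration only (this is a self-contained sketch, not code from the ticket or the pull request), the following shows how front() and back() read on a code point range. `Utf16CodePoints` and its nested `Iter` are hypothetical stand-ins, not ICU's UTFStringCodePoints: they assume well-formed UTF-16, do no validation, and return a plain char32_t where the proposed functions return CodeUnits.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical stand-in for a code point range; NOT ICU's UTFStringCodePoints.
// Assumes well-formed UTF-16 (no unpaired surrogates) and does no validation.
class Utf16CodePoints {
public:
    class Iter {
    public:
        explicit Iter(const char16_t* p) : p_(p) {}
        // Decodes the code point that starts at the current position.
        char32_t operator*() const {
            char16_t lead = p_[0];
            if (!isLead(lead)) { return lead; }
            return 0x10000 + ((static_cast<char32_t>(lead) - 0xD800) << 10) +
                   (static_cast<char32_t>(p_[1]) - 0xDC00);
        }
        Iter& operator++() { p_ += isLead(*p_) ? 2 : 1; return *this; }
        // Steps back one code point: one unit, or two for a surrogate pair.
        Iter& operator--() { --p_; if (isTrail(*p_)) { --p_; } return *this; }
        bool operator==(const Iter& o) const { return p_ == o.p_; }
        bool operator!=(const Iter& o) const { return p_ != o.p_; }
    private:
        static bool isLead(char16_t c) { return (c & 0xFC00) == 0xD800; }
        static bool isTrail(char16_t c) { return (c & 0xFC00) == 0xDC00; }
        const char16_t* p_;
    };

    explicit Utf16CodePoints(std::u16string_view s) : s_(s) {}
    Iter begin() const { return Iter(s_.data()); }
    Iter end() const { return Iter(s_.data() + s_.size()); }

    // Same shape as the proposed functions; both require a non-empty range.
    char32_t front() const { assert(!s_.empty()); return *begin(); }
    char32_t back() const { assert(!s_.empty()); return *(--end()); }

private:
    std::u16string_view s_;
};
```

With this sketch, `Utf16CodePoints(u"a\U0001F600").back()` yields the supplementary code point U+1F600 in a single call, without the caller having to step back over the surrogate pair by hand.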

A C++ iterator often wraps another, “more basic” iterator, and commonly provides a base() function to access that. [Unsafe]UTFIterator wraps a code unit iterator; base() would return a code unit iterator at the current logical position. It would be like fetching the current CodeUnits and calling their begin(), except that those cannot be fetched from an exclusive-end() iterator; base() would be much easier to use.

Real example: use std::advance() to move a code point iterator forward n code points, then call its base() to find the position in the input string/string_view, which will usually have moved by a larger number of code units.
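
The std::advance() plus base() pattern can be sketched without ICU as follows. `Utf8CodePointIter` and `codeUnitOffsetAfter` are hypothetical names invented for this illustration, not ICU's UTFIterator; the sketch assumes well-formed UTF-8, does no validation, and only supports forward iteration.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string_view>

// Hypothetical stand-in for a code point iterator; NOT ICU's UTFIterator.
// Assumes well-formed UTF-8 and does no validation.
class Utf8CodePointIter {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = char32_t;
    using difference_type = std::ptrdiff_t;
    using pointer = const char32_t*;
    using reference = char32_t;

    Utf8CodePointIter() = default;
    explicit Utf8CodePointIter(const char* p) : p_(p) {}

    // Code unit iterator at the current logical position.
    const char* base() const { return p_; }

    // Decodes the code point that starts at the current position.
    char32_t operator*() const {
        auto lead = static_cast<unsigned char>(*p_);
        int len = length(lead);
        char32_t c = len == 1 ? lead : lead & (0xFF >> (len + 1));
        for (int i = 1; i < len; ++i) {
            c = (c << 6) | (static_cast<unsigned char>(p_[i]) & 0x3F);
        }
        return c;
    }
    Utf8CodePointIter& operator++() {
        p_ += length(static_cast<unsigned char>(*p_));
        return *this;
    }
    Utf8CodePointIter operator++(int) { auto tmp = *this; ++*this; return tmp; }
    friend bool operator==(const Utf8CodePointIter& a, const Utf8CodePointIter& b) {
        return a.p_ == b.p_;
    }
    friend bool operator!=(const Utf8CodePointIter& a, const Utf8CodePointIter& b) {
        return a.p_ != b.p_;
    }

private:
    // Number of code units in the sequence that starts with this lead byte.
    static int length(unsigned char lead) {
        return lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
    }
    const char* p_ = nullptr;
};

// Skips n code points from the start of s and returns the code unit offset
// reached. Requires that s contains at least n code points.
std::size_t codeUnitOffsetAfter(std::string_view s, std::ptrdiff_t n) {
    assert(n >= 0);
    Utf8CodePointIter it(s.data());
    std::advance(it, n);  // moves by n code points
    return static_cast<std::size_t>(it.base() - s.data());
}
```

For the UTF-8 bytes of "aä€z", skipping 3 code points lands at code unit offset 6 (1 + 2 + 3 bytes), which is the kind of mismatch between code point count and code unit position that base() makes easy to resolve.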

Similarly, a reverse_iterator usually has a base() function to return an original-direction iterator at the same logical position. The reverse_iterator wrappers over [Unsafe]UTFIterator should have that, too.
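
For reference, this is how base() behaves on a standard std::reverse_iterator, shown on a plain std::u32string_view rather than on the ICU classes. The helper name `offsetFromBack` is made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string_view>

// Moves a reverse iterator n elements back from the end of s, then converts
// it to a forward iterator with base() and returns that iterator's offset.
// Requires 0 <= n <= s.size().
std::size_t offsetFromBack(std::u32string_view s, std::ptrdiff_t n) {
    assert(n >= 0 && static_cast<std::size_t>(n) <= s.size());
    auto rit = s.rbegin();  // logical position: the end of s
    std::advance(rit, n);   // logical position: n elements before the end
    // base() is a forward iterator at the same logical position. It points at
    // the element most recently passed over, one past the one *rit would read.
    return static_cast<std::size_t>(rit.base() - s.begin());
}
```

For U"abcde" and n = 2 the result is 3: the reverse iterator would read 'c' next, while its base() points at 'd'.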


New API signatures in common/unicode/utfiterator.h. Note that the reverse_iterator classes are template specializations of C++ reverse_iterator and do not have the usual API docs.


class UTFIterator {
    /**
     * Returns the current position as a code unit iterator.
     * Similar to iter->begin() but also works at the exclusive end().
     *
     * @return current position as a code unit iterator
     * @draft ICU 79
     */
    U_FORCE_INLINE UnitIter base() const


class std::reverse_iterator<...UTFIterator...> {
    U_FORCE_INLINE U_HEADER_ONLY_NAMESPACE::UTFIterator<CP32, behavior, UnitIter> base() const


class UTFStringCodePoints {
    /**
     * Returns the CodeUnits for the first character/code point.
     * Requires that the range is not empty.
     *
     * @return the CodeUnits for the first character/code point.
     * @draft ICU 79
     */
    auto front() const {
        return *begin();
    }

    /**
     * Returns the CodeUnits for the last character/code point.
     * Requires that the range is not empty.
     *
     * @return the CodeUnits for the last character/code point.
     * @draft ICU 79
     */
    auto back() const {
        return *(--end());
    }



Same additions in the “unsafe” classes, API docs omitted here:


class UnsafeUTFIterator {
    U_FORCE_INLINE UnitIter base() const

class std::reverse_iterator<...UnsafeUTFIterator...> {
    U_FORCE_INLINE U_HEADER_ONLY_NAMESPACE::UnsafeUTFIterator<CP32, UnitIter> base() const

class UnsafeUTFStringCodePoints {
    auto front() const
    auto back() const


Sincerely,
markus