potential security

143 views
Skip to first unread message

ossdev

unread,
May 4, 2026, 3:30:07 PMMay 4
to icu-design
Hello,
I wanted to discuss potential security improvements to icu library. Specifically I wanted to look into adopting the Safe Buffers Programming Model to migrate from C-style buffers to C++ containers, including the use of -Wunsafe-buffer-usage.

Would the maintainers be interested in adopting safe buffers, I recall warnings showing in several thousands and the library could benefit from using safe buffers.

Sent with Proton Mail secure email.

Fredrik Roubert

unread,
May 4, 2026, 4:14:37 PMMay 4
to devtes...@gmail.com, icu-design
On Mon, May 4, 2026 at 9:30 PM ossdev <devtes...@gmail.com> wrote:

> Would the maintainers be interested in adopting safe buffers,

In general, yes, that's certainly a goal we'd like to work towards.
See for example some previous work in that direction done here:

https://unicode-org.atlassian.net/browse/ICU-22901
https://unicode-org.atlassian.net/browse/ICU-22696
https://unicode-org.atlassian.net/browse/ICU-22520
https://unicode-org.atlassian.net/browse/ICU-21289
https://unicode-org.atlassian.net/browse/ICU-21035
https://unicode-org.atlassian.net/browse/ICU-20803
https://unicode-org.atlassian.net/browse/ICU-20445
https://unicode-org.atlassian.net/browse/ICU-20158
https://unicode-org.atlassian.net/browse/ICU-13417

https://github.com/search?q=repo%3Aunicode-org%2Ficu+%22ICU-22901%22&type=commits
https://github.com/search?q=repo%3Aunicode-org%2Ficu+%22ICU-22696%22&type=commits
https://github.com/search?q=repo%3Aunicode-org%2Ficu+%22ICU-22520%22&type=commits
https://github.com/search?q=repo%3Aunicode-org%2Ficu+%22ICU-21289%22&type=commits
https://github.com/search?q=repo%3Aunicode-org%2Ficu+%22ICU-21035%22&type=commits
https://github.com/search?q=repo%3Aunicode-org%2Ficu+%22ICU-20803%22&type=commits
https://github.com/search?q=repo%3Aunicode-org%2Ficu+%22ICU-20445%22&type=commits
https://github.com/search?q=repo%3Aunicode-org%2Ficu+%22ICU-20158%22&type=commits
https://github.com/search?q=repo%3Aunicode-org%2Ficu+%22ICU-13417%22&type=commits

But ICU is a very old library that has made many promises in the past
that makes this work considerably more difficult than it would have
been with a more modern codebase.

First and foremost, there's a promise that the library should be fully
usable also when built without C++ exception handling together and a
promise that the library should always handle memory allocation
failures gracefully (reporting U_MEMORY_ALLOCATION_ERROR back to the
caller, not ignoring failures or terminating the process or throwing
exceptions), which unfortunately is fundamentally incompatible with
STL containers (which by design require exceptions to report memory
allocation failures). So it isn't possible to just replace existing
buffers with STL data structures like it would be possible to do in
most codebases.

So it seems unlikely to me that it'd be possible to make any large and
sweeping changes here, but if you were to pick some specific set of
buffers in ICU and devise a plan for how to replace them with
something safer (without breaking backward compatibility for callers)
then we'd sure be interested.

--
Fredrik Roubert
rou...@google.com

ossdev

unread,
May 10, 2026, 5:22:16 AMMay 10
to icu-design, Fredrik Roubert, icu-design, devtes...@gmail.com
Glad to find interest in adopting safe buffer practices. I spent the previous few days getting familiar with the codebase and also going through the attached commits. The icu library is fairly large and making large sweeping changes would be unfeasible rather going in smaller sets of unsafe buffer usage would be manageable. For example going through C style headers file source/common/cstring.h. The following are categorized as unsafe functions 

#define uprv_strcpy(dst, src) U_STANDARD_CPP_NAMESPACE  strcpy(dst, src)
#define uprv_strlen(str) U_STANDARD_CPP_NAMESPACE strlen(str)
#define uprv_strcmp(s1, s2) U_STANDARD_CPP_NAMESPACE strcmp(s1, s2)
#define uprv_strcat(dst, src) U_STANDARD_CPP_NAMESPACE strcat(dst, src)
#define uprv_strchr(s, c) U_STANDARD_CPP_NAMESPACE strchr(s, c)
#define uprv_strstr(s, c) U_STANDARD_CPP_NAMESPACE strstr(s, c)
#define uprv_strrchr(s, c) U_STANDARD_CPP_NAMESPACE strrchr(s, c)
#define uprv_strncpy(dst, src, size) U_STANDARD_CPP_NAMESPACE strncpy(dst, src, size)
#define uprv_strncmp(s1, s2, n) U_STANDARD_CPP_NAMESPACE strncmp(s1, s2, n)
#define uprv_strncat(dst, src, n) U_STANDARD_CPP_NAMESPACE strncat(dst, src, n)

e.g in locid.cpp
bool
Locale::operator==( const   Locale& other) const
{
    return uprv_strcmp(other.getName(), getName()) == 0;
}

Function 'strcmp' is unsafeclang(-Wunsafe-buffer-usage-in-libc-call)
macro uprv_strcmp
provided by "cstring.h"

I wanted to go through each of the defined macro and provide safe implementation also while preserving backward compatibility keeping the macro version as function, e.g
// #define uprv_strcpy(dst, src) U_STANDARD_CPP_NAMESPACE  strcpy(dst, src)
static
inline char* uprv_strcpy(char* dst, const char* src)
{
  #pragma clang unsafe_buffer_usage begin
  return U_STANDARD_CPP_NAMESPACE strcpy(dst, src);
  #pragma clang unsafe_buffer_usage end
}
#ifdef __cplusplus
U_NAMESPACE_USE
static
inline void uprv_strcpy(CharString &dst, const char* src, UErrorCode &status) {
    if (U_FAILURE(status)) return;
    if (dst == nullptr) {
        status = U_ILLEGAL_ARGUMENT_ERROR;
        return;
    }
    dst.clear();
    dst.append(src, status);
}
#endif /* __cplusplus */

Scope of work:
The defined macro gets converted into function and two versions are provided, one for backward compatibility and the other providing safe usage. Going through each of the defined macros the reference count is also high, e.g. uprv_strcpy is used in 59 files. When using unsafe version as is, the developer would get warning and get reminded to use safe the version.

The above changes are safe to implement and do not break backward-compatibility. If the changes make sense please respond and I am able to begin work on PR. 

ossdev

unread,
May 11, 2026, 8:29:51 AMMay 11
to icu-design, ossdev, Fredrik Roubert, icu-design, marku...@gmail.com
Dear ICU team & users,
The above proposal is awaiting discussion and approval. Ticket number does not exist in yet pending approval.
Please provide feedback by: next Monday, 2026-05-18
Designated API reviewer: Markus

Thanks

ossdev

unread,
May 15, 2026, 5:24:12 AMMay 15
to icu-design, ossdev, Fredrik Roubert, icu-design, marku...@gmail.com
Hi Markus,
From the earlier post, I gathered there is interest in implementing safe buffers just needs more direction as to how to proceed. I wanted to check if there's anything that could strengthen the Safe Buffers proposal from May 10, or if there's a particular format or template you'd prefer for API proposals like this one? Thanks.

ossdev

unread,
May 20, 2026, 2:02:13 AMMay 20
to icu-design, ossdev, Fredrik Roubert, icu-design
Hello Fredrik & ICU team,
Any thoughts or questions on the proposal? I am able to open pull request in github. The proposal is suggesting to replace unsafe functions with safe versions using ICU classes and objects i.e ( CharString). 
Thanks

On Sunday, May 10, 2026 at 2:22:16 PM UTC+5 ossdev wrote:

Fredrik Roubert

unread,
May 21, 2026, 9:04:01 AMMay 21
to devtes...@gmail.com, icu-design
On Sun, May 10, 2026 at 11:22 AM ossdev <devtes...@gmail.com> wrote:

> inline void uprv_strcpy(CharString &dst, const char* src, UErrorCode &status) {

I don't understand: When would it be preferable to call this helper
function instead of directly calling CharString::copyFrom() or
CharString::append()?

--
Fredrik Roubert
rou...@google.com

ossdev

unread,
May 25, 2026, 7:59:43 AMMay 25
to icu-design, Fredrik Roubert, icu-design, devtes...@gmail.com
I did more research on the icu codebase. The copyFrom  requires instance of CharString() and calls uprv_memcpy which calls unsafe libc function memcpy ends up calling append(), so another safe overload of uprv_memcpy would need to be defined in cmemory.h

#include <algorithm>
#define uprv_memcpy(dst, src, size) UPRV_BLOCK_MACRO_BEGIN { \
    U_ASSERT(dst != NULL); \
    U_ASSERT(src != NULL); \
    std::copy_n(reinterpret_cast<const char*>(src), (size), reinterpret_cast<char*>(dst)); \
The safe overload uses std::copy_n instead of memcpy.

I am finding better to keep the existing signature of, I was not aware CharString already have copyFrom which also calls append(), 
inline char* uprv_strcpy(char* dst, const char* src) rather than pass CharString like
inline void uprv_strcpy(CharString &dst, const char* src, UErrorCode &status) 

Also passing CharString instead of char* would require modifying about 78 files and 230+ call sites which is impractical. The goal is to try and provide safe versions of libc functions strcat 
keeping the current signature passing char * dst and char * src, the safe version for uprv_strcpy becomes

static
inline char* uprv_strcpy(char* dst, const char* src) {
    std::string_view sv(src);
    auto len = sv.size();
    std::copy_n(src, len + 1, dst);
    return dst;
}


When to use the CharString helper function? When an instance of CharString is available and use the uprv_strcpy when copying char * src onto char * dst e.g the following is using the safe version
uprv_strcpy(correctedPOSIXLocale, "en_US_POSIX");

The general idea of the proposal is to try and refactor the unsafe libc functions mentioned in the macro defines and provide safe versions while keeping the existing signature to minimize call site refactors. Does the proposal sort of makes sense? We can try out the safe uprv_strcpy version and then remainder of the macro defines.

Fredrik Roubert

unread,
May 26, 2026, 8:48:44 AMMay 26
to devtes...@gmail.com, icu-design
On Mon, May 25, 2026 at 1:59 PM ossdev <devtes...@gmail.com> wrote:

> Does the proposal sort of makes sense?

No, I'm sorry, but I'm finding it very hard to understand what exactly
your proposal really is.

--
Fredrik Roubert
rou...@google.com

ossdev

unread,
May 26, 2026, 11:58:04 AMMay 26
to icu-design, Fredrik Roubert, icu-design, devtes...@gmail.com
Hello Fredrik,
I am trying to implement safe buffers practices as explained in the document https://clang.llvm.org/docs/SafeBuffers.html. How about instead of finding task which is proving to be difficult,  I can use some guidance as to which task is available in terms of implementing safe buffers. Basically compile icu using  '-Wunsafe-buffer-usage'  vscode should show unsafe buffer warnings.
The defines listed in cstring.h are heavily used in entire codebase and make suitable targets for safe versions.
#define uprv_strcpy(dst, src) U_STANDARD_CPP_NAMESPACE  strcpy(dst, src)
#define uprv_strlen(str) U_STANDARD_CPP_NAMESPACE strlen(str)
#define uprv_strcmp(s1, s2) U_STANDARD_CPP_NAMESPACE strcmp(s1, s2)
#define uprv_strcat(dst, src) U_STANDARD_CPP_NAMESPACE strcat(dst, src)
#define uprv_strchr(s, c) U_STANDARD_CPP_NAMESPACE strchr(s, c)
#define uprv_strstr(s, c) U_STANDARD_CPP_NAMESPACE strstr(s, c)
#define uprv_strrchr(s, c) U_STANDARD_CPP_NAMESPACE strrchr(s, c)
#define uprv_strncpy(dst, src, size) U_STANDARD_CPP_NAMESPACE strncpy(dst, src, size)
#define uprv_strncmp(s1, s2, n) U_STANDARD_CPP_NAMESPACE strncmp(s1, s2, n)
#define uprv_strncat(dst, src, n) U_STANDARD_CPP_NAMESPACE strncat(dst, src, n)

Are their other potential tasks which I should look into more? Like I said I can use some guidance in finding which could be potential suitable tasks for adopting safe buffers.

Thanks.

ossdev

unread,
May 28, 2026, 4:30:29 AMMay 28
to icu-design, Fredrik Roubert, icu-design, devtes...@gmail.com
> No, I'm sorry, but I'm finding it very hard to understand what exactly
your proposal really is.

Thanks for the response. I have re-summarized the proposal, I am finding the proposal adds value to the existing icu codebase.

Summary
ICU heavily relies on raw C-string macros wrapped around standard libc functions (e.g., strcpy, strcmp). Modern compilers and static analyzers emit critical warnings for these (e.g., clang(-Wunsafe-buffer-usage-in-libc-call)). Additionally, because these are raw macros, developers lack compile-time guardrails against buffer overflows when writing new code.

The Problem
The current raw macro #define uprv_strcpy(dst, src) strcpy(dst, src) forces pointer decay. This completely hides buffer sizes from the compiler, preventing safety validation for stack arrays (char[]) and dynamic buffers (malloc).

Proposed Solution
Replace the macro with a static inline function using Clang attributes(pass_dynamic_object_size and __builtin_dynamic_object_size). This catches sizes automatically at the call site requiring no modication at the call site. Using Clang attributes, the refactor becomes much easier because the defined macros uprv_strcpy are heavily used in several call sites 70+ files. In the example below, the compiler keeps track of dynamic object size for when the char* dst, char * src is fixed array or dynamically allocated usin malloc.

#if defined(__clang__)
static inline char* uprv_strcpy(
    char* dst __attribute__((pass_dynamic_object_size(0))),
    const char* src,
    size_t dst_capacity = __builtin_dynamic_object_size(dst, 0)
) {
    if (dst && dst_capacity != (size_t)-1 && dst_capacity != 0) {
        if (U_STANDARD_CPP_NAMESPACE strlen(src) >= dst_capacity) {
            if (dst_capacity > 0) dst[0] = '\0';
            return nullptr;

        }
    }
#pragma clang unsafe_buffer_usage begin
    return U_STANDARD_CPP_NAMESPACE strcpy(dst, src);
#pragma clang unsafe_buffer_usage end
}
#else

static inline char* uprv_strcpy(char* dst, const char* src) {
    return U_STANDARD_CPP_NAMESPACE strcpy(dst, src);
}
#endif



Reply all
Reply to author
Forward
0 new messages