According to the reference manual, strings are relationally compared using the current locale:
> Otherwise, if both arguments are strings, then their values are compared according to the current locale.
The implementation of this in l_strcmp shows that this means calling strcoll:
> int temp = strcoll(s1, s2);
The function calls strcoll repeatedly, once per zero-terminated segment, to handle strings that contain embedded zero bytes. Furthermore, there seems to be no check on the contents of the strings beforehand, so any string is accepted. However, according to the response to DR 484:
> ... the behavior of strcoll in the face of invalid input is already clearly undefined.
So the behavior is undefined if strcoll is given "invalid input", which in a UTF-8 locale I assume includes strings that are not valid UTF-8. I don't know of any platform that does anything bad on invalid input, but I think this undefined behavior is worth considering. I also don't know of any practical way to validate strings for strcoll. I would expect the following to work: set LC_CTYPE to the locale currently selected for LC_COLLATE, validate the strings with mbsrtowcs (requires C94), then restore LC_CTYPE. But that seems like an excessive amount of extra work just to make string comparison work.
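
For concreteness, here is a rough sketch of the validation workaround described above. It is not taken from the Lua sources: valid_for_collate and dup_name are hypothetical names, and the sketch assumes that a string mbsrtowcs accepts under the LC_COLLATE locale is also acceptable input for strcoll, which I have not seen guaranteed anywhere. It checks each zero-terminated segment separately, mirroring how l_strcmp feeds segments to strcoll, and it assumes the buffer is followed by a terminating zero byte, as Lua strings are.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: heap copy of a locale name (NULL on failure).
   setlocale may reuse its result buffer, so the name has to be copied
   before the next setlocale call. */
static char *dup_name(const char *name) {
  char *copy;
  if (name == NULL) return NULL;
  copy = malloc(strlen(name) + 1);
  if (copy != NULL) strcpy(copy, name);
  return copy;
}

/* Hypothetical helper: returns 1 if every '\0'-separated segment of
   s[0..len) is a valid multibyte string in the locale currently selected
   for LC_COLLATE, and 0 if a segment is invalid or the locale could not
   be switched. Requires s[len] == '\0'. */
static int valid_for_collate(const char *s, size_t len) {
  const char *end = s + len;
  char *saved_ctype = dup_name(setlocale(LC_CTYPE, NULL));
  char *collate = dup_name(setlocale(LC_COLLATE, NULL));
  /* Temporarily make LC_CTYPE match LC_COLLATE. */
  int ok = (saved_ctype != NULL && collate != NULL &&
            setlocale(LC_CTYPE, collate) != NULL);
  if (ok) {
    for (;;) {
      mbstate_t st;
      const char *p = s;
      memset(&st, 0, sizeof st);  /* start in the initial shift state */
      /* With a null destination, mbsrtowcs only counts and validates. */
      if (mbsrtowcs(NULL, &p, 0, &st) == (size_t)-1) {
        ok = 0;  /* invalid multibyte sequence in this segment */
        break;
      }
      s += strlen(s) + 1;  /* skip the segment and its '\0' */
      if (s > end) break;  /* that '\0' was the final terminator */
    }
    setlocale(LC_CTYPE, saved_ctype);  /* restore the original LC_CTYPE */
  }
  free(saved_ctype);
  free(collate);
  return ok;
}
```

A caller would run valid_for_collate on both operands before handing them to strcoll. Besides the two extra locale switches on every comparison, setlocale changes process-wide state and is not safe to call while other threads use the locale, which is part of why this seems excessive to me.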