
What is the "correct type" for printf("%hd", ...)?


Keith Thompson

Mar 3, 2014, 11:38:51 AM
N1570 7.21.6.1p9, discussing fprintf, says:

    If any argument is not the correct type for the corresponding
    conversion specification, the behavior is undefined.

p7 discusses length modifiers:

    h   Specifies that a following d, i, o, u, x, or X conversion
        specifier applies to a short int or unsigned short int
        argument (the argument will have been promoted according to
        the integer promotions, but its value shall be converted to
        short int or unsigned short int before printing); or that
        a following n conversion specifier applies to a pointer to
        a short int argument.

Does this mean that the "expected type" for a "%hd" format is short int,
and that passing an int argument has undefined behavior?

For example:

#include <stdio.h>
#include <limits.h>
int main(void) {
    printf("%hd\n", 0);
    if (UINT_MAX > USHRT_MAX) {
        printf("%hu\n", (unsigned)USHRT_MAX + 1);
    }

    printf("%hu\n", 0);
    if (INT_MAX > SHRT_MAX) {
        printf("%hd\n", (int)SHRT_MAX + 1);
    }
}

If the expected type is int, then this program has no undefined
behavior (except perhaps that the conversion of ((int)SHRT_MAX+1)
from short to int might raise an implementation-defined signal).
If the expected type is short or short int, then all 4 printf
calls have undefined behavior -- and the only effect of the "h"
length modifier is to introduce undefined behavior.

The same applies to "hh", with signed char or unsigned char rather than
short.

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

James Kuyper

Mar 3, 2014, 1:49:27 PM
You asked a very similar question almost exactly 7 years ago, in the
thread titled 'printf("%hd\n", 42);'. My response to that message
referred back to an even older discussion on the topic, subject unspecified.

I stand by my final positions on the issues raised in the 2007
discussion. I don't believe that anything changed in C2011 that would
have affected any of the arguments raised by any of the participants.
--
James Kuyper

Hans-Bernhard Bröker

Mar 3, 2014, 4:10:34 PM
On 03.03.2014 17:38, Keith Thompson wrote:
> Does this mean that the "expected type" for a "%hd" format is short int,
> and that passing an int argument has undefined behavior?

Of course not --- because you _cannot_ pass a short to a variadic
function (in the variable part of its argument list). Not if short
differs from int, anyway. I.e. the actual argument corresponding to
your "%hd" format specifier is always an int (or equivalent).

Or why did you think it says: "the argument will have been promoted...".
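
A minimal sketch of the promotion point, assuming a hypothetical helper named sum_promoted that is not part of the thread: every short passed through the "..." part of a call arrives as an int, so the callee has to fetch it with va_arg(ap, int).

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical helper: adds up `count` trailing arguments that the
   caller wrote as short values. */
long sum_promoted(int count, ...)
{
    va_list ap;
    long total = 0;

    assert(count >= 0);
    va_start(ap, count);
    for (int i = 0; i < count; i++) {
        /* The default argument promotions already turned each short
           into an int, so int is the type to ask va_arg for. */
        total += va_arg(ap, int);
    }
    va_end(ap);
    return total;
}
```

A call such as sum_promoted(2, (short)3, (short)-5) passes two int values, even though both argument expressions have type short.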

> If the expected type is int, then this program has no undefined
> behavior (except perhaps that the conversion of ((int)SHRT_MAX+1)
> from short to int

There is no such conversion. The only conversion short-->int here
happens to SHRT_MAX. The type of ((int)SHRT_MAX+1) already _is_ int.

> might raise an implementation-defined signal).

Even if the conversion existed, it would not be allowed to do that. int
is required to be able to hold all values of short int, so no
implementation-defined behaviour involved.

You might get some for the _opposite_ conversion, though, which the
above explicitly requires printf() to do here.

Keith Thompson

Mar 3, 2014, 5:59:19 PM
Hans-Bernhard Bröker <HBBr...@t-online.de> writes:
> On 03.03.2014 17:38, Keith Thompson wrote:
>> Does this mean that the "expected type" for a "%hd" format is short int,
>> and that passing an int argument has undefined behavior?
>
> Of course not --- because you _cannot_ pass a short to a variadic
> function (in the variable part of its argument list). Not if short
> differs from int, anyway. I.e. the actual argument corresponding to
> your "%hd" format specifier is always an int (or equivalent).
>
> Or why did you think it says: "the argument will have been promoted...".

James Kuyper points out that I raised this question 7 years ago, and
it led to a rather lengthy discussion (which I had forgotten about).
You can see the discussion at:

https://groups.google.com/forum/#!topic/comp.std.c/plYWnLWUHL8/overview

I don't want to spend too much time rehashing what was said back then,
but it's not entirely obvious what the intent was. Either it requires a
short int argument, which will have been promoted (which is what it
actually says), or it requires an int argument, regardless of how that
int value is computed (which IMHO is more sensible, or at least more
consistent with the behavior of variadic functions in general).

>> If the expected type is int, then this program has no undefined
>> behavior (except perhaps that the conversion of ((int)SHRT_MAX+1)
>> from short to int
>
> There is no such conversion. The only conversion short-->int here
> happens to SHRT_MAX. The type of ((int)SHRT_MAX+1) already _is_ int.

Sorry, I meant the conversion from int to short, which is done within
printf.

> > might raise an implementation-defined signal).

[...]

James Kuyper

Mar 3, 2014, 7:13:12 PM
On 03/03/2014 05:59 PM, Keith Thompson wrote:
> Hans-Bernhard Bröker <HBBr...@t-online.de> writes:
>> On 03.03.2014 17:38, Keith Thompson wrote:
>>> Does this mean that the "expected type" for a "%hd" format is short int,
>>> and that passing an int argument has undefined behavior?
>>
>> Of course not --- because you _cannot_ pass a short to a variadic
>> function (in the variable part of its argument list). Not if short
>> differs from int, anyway. I.e. the actual argument corresponding to
>> your "%hd" format specifier is always an int (or equivalent).
>>
>> Or why did you think it says: "the argument will have been promoted...".
>
> James Kuyper points out that I raised this question 7 years ago, and
> it led to a rather lengthy discussion (which I had forgotten about).
> You can see the discussion at:
>
> https://groups.google.com/forum/#!topic/comp.std.c/plYWnLWUHL8/overview
>
> I don't want to spend too much time rehashing what was said back then,
> but it's not entirely obvious what the intent was. ...

I've always felt that the committee's intent was obvious, but that what
is now 7.21.6.1p9 does not correctly express it. It should have said "If
the promoted type of any argument is not the correct type for the
corresponding conversion specification, the behavior is undefined.", and
the %h and %hh should have specified that the "correct" type is the
corresponding promoted type.

However, Robert Gamble managed to convince me that 6.3.1.1p2 covers the
issue:

> The following may be used in an expression wherever an int or unsigned int may
> be used:
> — An object or expression with an integer type (other than int or unsigned int)
> whose integer conversion rank is less than or equal to the rank of int and
> unsigned int.
> — A bit-field of type _Bool, int, signed int, or unsigned int.

I still think that modifying 7.21.6.1p9 would be the clearer way of
expressing this.
--
James Kuyper

Keith Thompson

Mar 3, 2014, 7:31:36 PM
I'm not sure it does -- and I've just noticed that 6.3.1.1p2 could use
some clearer wording.

As written, it could imply that, given:

short s;
int i;

the expression s can be used in place of i as the operand of the
unary "&" operator.

The phrase "an int or unsigned int" presumably means "an expression
of type int or unsigned int". The phrase "An object or expression"
is odd; objects exist only at run time, and if it's supposed to
be the *name* of an object, then that's already covered by "or
expression" -- unless it's in a context requiring an lvalue.

I think the intent is clear enough, but it would be nice if the
wording actually expressed that intent.

But in the case we're discussing:

printf("%hd", s);

that rule only lets us use s where i is permitted; we know that s
is permitted, but that doesn't imply that i is also permitted.

> I still think that modifying 7.21.6.1p9 would be the clearer way of
> expressing this.

Agreed (plus modifying p7 to say that the "correct type" is int or
unsigned int).

Tim Rentsch

Mar 8, 2014, 1:03:45 PM
Keith Thompson <ks...@mib.org> writes:

> N1570 7.21.6.1p9, discussing fprintf, says:
>
> If any argument is not the correct type for the corresponding
> conversion specification, the behavior is undefined.
>
> p7 discusses length modifiers:
>
> h Specifies that a following d, i, o, u, x, or X conversion
> specifier applies to a short int or unsigned short int
> argument (the argument will have been promoted according to
> the integer promotions, but its value shall be converted to
> short int or unsigned short int before printing); or that
> a following n conversion specifier applies to a pointer to
> a short int argument.
>
> Does this mean that the "expected type" for a "%hd" format is short
> int, and that passing an int argument has undefined behavior?

No. It means that %hd will collect an argument value using the
same semantics as a `va_arg(ap,int)` would, because a short int
argument always promotes to int, and then convert the collected int
value to a short int value before printing. Provided conversions
of out-of-range values are well-defined, using %hd with any int
argument value is also well-defined.
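
The collect-then-narrow reading described above can be sketched roughly as follows. The function collect_as_hd is a hypothetical stand-in for the "%hd" step inside a printf implementation, not code from any real library, and the format parameter is there only because a variadic function needs a named parameter.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical stand-in for how "%hd" might obtain its value. */
short collect_as_hd(const char *format, ...)
{
    va_list ap;
    int collected;

    assert(format != NULL);
    va_start(ap, format);
    /* A short argument has been promoted, so it is read back as int. */
    collected = va_arg(ap, int);
    va_end(ap);

    /* Narrowing step: implementation-defined (or raises an
       implementation-defined signal) when the value does not fit. */
    return (short)collected;
}
```

Under this reading, collect_as_hd("%hd", 42) and collect_as_hd("%hd", (short)42) are indistinguishable to the callee, which is the core of the argument that an int argument is acceptable.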

> For example:
>
> #include <stdio.h>
> #include <limits.h>
> int main(void) {
> printf("%hd\n", 0);
> if (UINT_MAX > USHRT_MAX) {
> printf("%hu\n", (unsigned)USHRT_MAX + 1);
> }
>
> printf("%hu\n", 0);
> if (INT_MAX > SHRT_MAX) {
> printf("%hd\n", (int)SHRT_MAX + 1);
> }
> }
>
> If the expected type is int, then this program has no undefined
> behavior (except perhaps that the conversion of ((int)SHRT_MAX+1)
> from short to int might raise an implementation-defined signal).
> If the expected type is short or short int, then all 4 printf
> calls have undefined behavior -- and the only effect of the "h"
> length modifier is to introduce undefined behavior.
> [and similarly %hhd, etc]

To simplify the discussion let's assume that the narrowing
conversions involved (ie, to short or signed char) are all
well-defined. (Of course conversions to unsigned short or
unsigned char are always well-defined.) Also let's assume that
the two short types are narrower than their respective regular
types, ie, SHRT_MAX < INT_MAX and USHRT_MAX < UINT_MAX, so
all four printf() statements will execute.

Under these assumptions, the behaviors of the four printf()
statements shown are all well-defined, except in the pathological
circumstance where USHRT_MAX == INT_MAX, in which case the second
printf() has undefined behavior. If USHRT_MAX == INT_MAX, an
expression of type 'unsigned short' promotes to 'int', so
printf() will expect an 'int' argument for a '%hu' format
specification. However the printf() shown supplies an unsigned
int value for its argument. This is fine as long as the value
supplied is no larger than INT_MAX, because it's okay to read an
int value using unsigned int, or vice versa, for values in the
common subset. But when USHRT_MAX == INT_MAX, then the
expression (unsigned)USHRT_MAX + 1 will not be in the common
subset (under the stated assumption that USHRT_MAX < UINT_MAX).
Because printf() expects an argument that would be promoted to
int, and the actual argument is an unsigned int, and the value
of that unsigned int is not in the range [ 0 .. INT_MAX ], the
behavior in that case would be undefined.

Incidentally, doing a google search (now that "google groups" is
essentially useless), turned up this link (apparently from the
2007 thread mentioned in another response) -

http://bytes.com/topic/c/answers/591625-usefulness-hd-printf

My comments here are consistent with those of P.J. Plauger in his
several responses shown on this page. P.J. Plauger has been
involved with C standardization efforts since the original
ANSI work, served as project editor for WG14 in the early 1990's,
and is widely recognized as a leading authority (if not the
leading authority) on the C standard library. So I am reasonably
confident that my analysis here is correct.

Keith Thompson

Mar 8, 2014, 4:30:22 PM
Tim Rentsch <t...@alumni.caltech.edu> writes:
> Keith Thompson <ks...@mib.org> writes:
>> N1570 7.21.6.1p9, discussing fprintf, says:
>>
>> If any argument is not the correct type for the corresponding
>> conversion specification, the behavior is undefined.
>>
>> p7 discusses length modifiers:
>>
>> h Specifies that a following d, i, o, u, x, or X conversion
>> specifier applies to a short int or unsigned short int
>> argument (the argument will have been promoted according to
>> the integer promotions, but its value shall be converted to
>> short int or unsigned short int before printing); or that
>> a following n conversion specifier applies to a pointer to
>> a short int argument.
>>
>> Does this mean that the "expected type" for a "%hd" format is short
>> int, and that passing an int argument has undefined behavior?
>
> No. It means that %hd will collect an argument value using the
> same semantics as a `va_arg(ap,int)` would, because a short int
> argument always promotes to int, and then convert the collected int
> value to a short int value before printing. Provided conversions
> of out-of-range values are well-defined, using %hd with any int
> argument value is also well-defined.

I *like* that interpretation, and I think it describes both the
way any reasonable implementation would behave and the intent of
the committee, but I'm still not convinced that it's supported by
the wording of the standard. I now think that this is just a bit
of sloppy wording in the standard.

The description of the "h" modifier says it:

Specifies that a following d, i, o, u, x, or X conversion
specifier applies to a short int or unsigned short int argument
(the argument will have been promoted according to the integer
promotions, but its value shall be converted to short int
or unsigned short int before printing); or that a following
n conversion specifier applies to a pointer to a short int
argument.

There's no problem with "%hn", since it takes a pointer. But it's
not completely crazy to read that as requiring that the argument
*before promotion* must be of type short int or unsigned short int,
and that the behavior is undefined otherwise. One implication of
such an interpretation is that a compiler could *reject* a call like

int n = 42;
printf("%hd\n", n);

since its behavior would be undefined. printf("%hd\n", 0) would also
have undefined behavior.

I don't claim that that's not absurd, merely that it's consistent with
the current wording in the standard.

If the intent is that passing an int argument is well defined (as long
as the conversion is well defined), that intent would have been better
expressed by saying that "h":

Specifies that a following d, i, o, u, x, or X conversion
specifier applies to an int or unsigned int argument whose
value shall be converted to short int or unsigned short int
before printing; or that a following n conversion specifier
applies to a pointer to a short int argument.

perhaps with a footnote suggesting that the argument is typically a
short int or unsigned short int argument which will be promoted to int
or unsigned int.

[snip]

> Incidentally, doing a google search (now that "google groups" is
> essentially useless), turned up this link (apparently from the
> 2007 thread mentioned in another response) -
>
> http://bytes.com/topic/c/answers/591625-usefulness-hd-printf

bytes.com appears to show content from Usenet while pretending that it
was posted in their own "community". There's no reference to Usenet or
to comp.lang.c on that page. Looks like plagiarism to me.

With a little searching, the same thread can be found on
groups.google.com:

https://groups.google.com/forum/#!msg/comp.lang.c/-MQtBj-5cYU/VtegsfBVobYJ

(I think at least some of the Google Groups "advanced search" features
are still available; they're just not documented or presented in any
sensible manner.)

> My comments here are consistent with those of P.J. Plauger in his
> several responses shown on this page. P.J. Plauger has been
> involved with C standardization efforts since the original
> ANSI work, served as project editor for WG14 in the early 1990's,
> and is widely recognized as a leading authority (if not the
> leading authority) on the C standard library. So I am reasonably
> confident that my analysis here is correct.

Plauger also pointed out a case where "h" is actually useful. This:

printf("%hhx %hx %x\n", -1, -1, -1);

prints:

ff ffff ffffffff

on a system with 8-bit char, 16-bit short, and 32-bit int.

"%hd" is less useful (unless you're willing to depend on the
implementation-defined semantics of int-to-short conversion),
but it would have been awkward to exclude it.
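
As a rough illustration of that use of "h" and "hh", the sketch below compares the length-modifier form against narrowing the value by hand and printing it with plain "%x". It assumes the reading under which an int argument is acceptable for "%hhx" and "%hx", which is the very point under dispute in this thread; the name h_matches_cast is made up for the example, and some compilers may warn about the format/argument pairing.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 when "%hhx %hx" applied to an int agrees
   with explicitly narrowing the value and printing it with "%x". */
int h_matches_cast(int value)
{
    char with_h[64];
    char with_cast[64];
    int n;

    /* Assumption: an int argument is acceptable for %hhx and %hx. */
    n = snprintf(with_h, sizeof with_h, "%hhx %hx", value, value);
    assert(n > 0 && (size_t)n < sizeof with_h);

    /* Conversions to unsigned types are always well-defined. */
    n = snprintf(with_cast, sizeof with_cast, "%x %x",
                 (unsigned)(unsigned char)value,
                 (unsigned)(unsigned short)value);
    assert(n > 0 && (size_t)n < sizeof with_cast);

    return strcmp(with_h, with_cast) == 0;
}
```

The cast form sidesteps the question entirely, at the cost of spelling out the narrowing at every call site.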

James Kuyper

Mar 8, 2014, 6:04:35 PM
On 03/08/2014 04:30 PM, Keith Thompson wrote:
...
> (I think at least some of the Google Groups "advanced search" features
> are still available; they're just not documented or presented in any
> sensible manner.)

If you do a group search (like

<https://groups.google.com/forum/#!forum/comp.lang.c>),

the search box near the top of the screen will contain a small
down-arrow on the right-hand side. Clicking on that arrow will bring up
a window that allows a somewhat more sophisticated search; though it's
not as sophisticated as "advanced search" used to be. One thing in
particular that annoys me is it does not support an all-groups search.
You can search all groups in the ordinary search box, or a specific
group with the advanced search window, but there's no way to do both.

I miss deja-news.
--
James Kuyper

Tim Rentsch

Mar 11, 2014, 1:12:53 PM
I agree that the wording could be improved. I will take up
the other question a little further on...

> The description of the "h" modifier says it:
>
> Specifies that a following d, i, o, u, x, or X conversion
> specifier applies to a short int or unsigned short int argument
> (the argument will have been promoted according to the integer
> promotions, but its value shall be converted to short int
> or unsigned short int before printing); or that a following
> n conversion specifier applies to a pointer to a short int
> argument.
>
> There's no problem with "%hn", since it takes a pointer. But it's
> not completely crazy to read that as requiring that the argument
> *before promotion* must be of type short int or unsigned short int,
> and that the behavior is undefined otherwise.

IMO a more natural reading of the first part of that sentence is
as saying [un]signed short is what type is _expected_ but not what
type is _required_. "Specifies that a ... specifier applies" is
meant to convey something about how the function behaves, and only
indirectly imposes a requirement on types supplied by the caller.
In the longer length modifiers, eg, "%ld", the correspondence is
more exact, but that's because of how integer promotions work -
and the Standard takes the trouble to point out this difference,
in normative text, for the h and hh length modifiers.

> One implication of
> such an interpretation is that a compiler could *reject* a call like
>
> int n = 42;
> printf("%hd\n", n);
>
> since its behavior would be undefined. printf("%hd\n", 0) would also
> have undefined behavior.

Right. And this nutty result is one reason I think this
interpretation can be safely ignored.

> I don't claim that that's not absurd, merely that it's consistent
> with the current wording in the standard.

I agree it's consistent. I just don't think it's the best fit
for other parts of the Standard that (might) relate to this
question.

> If the intent is that passing an int argument is well defined (as long
> as the conversion is well defined), that intent would have been better
> expressed by saying that "h":
>
> Specifies that a following d, i, o, u, x, or X conversion
> specifier applies to an int or unsigned int argument whose
> value shall be converted to short int or unsigned short int
> before printing; or that a following n conversion specifier
> applies to a pointer to a short int argument.
>
> perhaps with a footnote suggesting that the argument is typically a
> short int or unsigned short int argument which will be promoted to int
> or unsigned int.

There's a potential problem with this suggested change, not for
signed types like %hd but for the unsigned specifiers o,u,x,X.
Please see below.

>> My comments here are consistent with those of P.J. Plauger in his
>> several responses shown on this page. P.J. Plauger has been
>> involved with C standardization efforts since the original
>> ANSI work, served as project editor for WG14 in the early 1990's,
>> and is widely recognized as a leading authority (if not the
>> leading authority) on the C standard library. So I am reasonably
>> confident that my analysis here is correct.
>
> Plauger also pointed out a case where "h" is actually useful. This:
>
> printf("%hhx %hx %x\n", -1, -1, -1);
>
> prints:
>
> ff ffff ffffffff
>
> on a system with 8-bit char, 16-bit short, and 32-bit int.
>
> "%hd" is less useful (unless you're willing to depend on the
> implementation-defined semantics of int-to-short conversion),
> but it would have been awkward to exclude it.

In the original wording, the behavior of "%hhx" and "%hx" are
both defined when used with -1 arguments (ie, on 8/16/32-width
implementations). The reason is, both "%hhx" and "%hx" expect
arguments that would have been promoted to 'int', and so read
them as int's, before converting to unsigned {char/short}.
However, under the changed wording suggested above, both "%hhx"
and "%hx" would read their arguments as _unsigned_ int, which
leads to undefined behavior if -1 is given as an argument.

(Incidental note: giving a -1 argument for a %x specifier is
undefined behavior regardless.)

The difference in whether an argument value for, eg, %hx, is
read as 'int' or 'unsigned int' is part of the motivation for
having the h and hh length modifiers to begin with. If we have,
for example, an unsigned short value to supply, we can use %hx as
a specifier, and not care whether unsigned short promotes to int
or unsigned int. Of course we could cast an unsigned short
argument to unsigned int, and avoid the problem that way, but
using %hx obviates the need for casting.
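
Whether unsigned short promotes to int or to unsigned int on a given implementation can be checked with a small C11 sketch like the one below. The function name ushort_promotes_to_int is an assumption made for this example; the rule it checks is the one in 6.3.1.1p2.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: 1 if unsigned short promotes to int,
   0 if it promotes to unsigned int. */
int ushort_promotes_to_int(void)
{
    unsigned short us = 0;

    /* Unary + applies the integer promotions to its operand, so the
       type of +us is the promoted type. */
    int promoted_is_int = _Generic(+us, int: 1, unsigned int: 0);

    /* 6.3.1.1p2: the promoted type is int exactly when int can
       represent every value of unsigned short. */
    assert(promoted_is_int == (USHRT_MAX <= INT_MAX));
    return promoted_is_int;
}
```

On the common 16-bit short, 32-bit int arrangement the promoted type is int; an implementation where short and int have the same width would give unsigned int instead.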

Joe keane

Mar 11, 2014, 6:04:37 PM
In article <lnmwh0v...@nuthaus.mib.org>,
Keith Thompson <ks...@mib.org> wrote:
>I *like* that interpretation, and I think it describes both the
>way any reasonable implementation would behave and the intent of
>the committee, but I'm still not convinced that it's supported by
>the wording of the standard. I now think that this is just a bit
>of sloppy wording in the standard.

It says integer promotion will give:

the same value *and representation*

So you can interpret that as you like. A two-byte integer promoted to a
four-byte integer is really still a two-byte integer, it just -somehow-
takes up four bytes. Maybe the memory is a skip list.

It does have the same value though.

We can improve it to:

the same value *and probably not the same representation*,
unless it is the same representation, then it is

Keith Thompson

Mar 11, 2014, 6:51:49 PM
j...@panix.com (Joe keane) writes:
> In article <lnmwh0v...@nuthaus.mib.org>,
> Keith Thompson <ks...@mib.org> wrote:
>>I *like* that interpretation, and I think it describes both the
>>way any reasonable implementation would behave and the intent of
>>the committee, but I'm still not convinced that it's supported by
>>the wording of the standard. I now think that this is just a bit
>>of sloppy wording in the standard.
>
> It says integer promotion will give:
>
> the same value *and representation*

Where does it say that? Please cite the section and paragraph number.

> So you can interpret that as you like. A two-byte integer promoted to a
> four-byte integer is really still a two-byte integer, it just -somehow-
> takes up four bytes. Maybe the memory is a skip list.

I'll respond to that after I see the citation.

> It does have the same value though.
>
> We can improve it to:
>
> the same value *and probably not the same representation*,
> unless it is the same representation, then it is

Joe keane

Mar 14, 2014, 9:46:37 PM
In article <lniorks...@nuthaus.mib.org>,
Keith Thompson <ks...@mib.org> wrote:
>Where does it say that? Please cite the section and paragraph number.

Of course, i should have re-read it *before* posting.

I was thinking-- But you know, it doesn't matter.

I think people are applying 'as if' backward.

The translator is free to do what it feels like, provided that it gives
the same result -for programs that follow the rules-.

And the rules are quite clear. The argument type must be 'short' or
'unsigned short'. It is undefined behavior if the type is not right.
...If they wanted to say something different, they could have.

It can make a change like

printf("%hd", val);
->
__superprint$(__supershort$(val));

What happens if 'val' is not the right type? I have no idea, because i
just made that up. And it's not productive to speculate on why it may
or may not work. We do have a right to complain if it doesn't work
correctly when 'val' is a short type. Not otherwise.

Hans-Bernhard Bröker

Mar 15, 2014, 9:52:47 AM
On 15.03.2014 02:46, Joe keane wrote:
> And the rules are quite clear.

Actually, no. Which is the whole problem.

> The argument type must be 'short' or
> 'unsigned short'.

It isn't. It can't be, because we're talking variadic arguments here,
which undergo mandatory integer promotion before being passed. So the
actual argument type _cannot_ be 'short'. Not when it actually makes
any difference, that is, i.e. if 'short' is not the same as 'int'.

> It is undefined behavior if the type is not right.
> ...If they wanted to say something different, they could have.

The problem with that idea is that they didn't say what you think they
said, either.

The question is not what the compiler might do with a program once it's
been determined to lie outside the standard's limits. The problem is
that the standard, as-is, appears to fail at defining whether a program
like this

printf("%hu", -1);

actually _does_ violate the standard enough to open that rabbithole for
the compiler.

Bruce Evans

Mar 15, 2014, 7:10:51 PM
In article <boj49j...@mid.dfncis.de>,
And it does this to fill a much-needed gap. The correct way to print
-1 converted to an unsigned short is to use plain %u format and cast
the arg to unsigned short yourself.

I used to think that the behaviour is defined in all cases (to be that
of taking the (promoted) int or unsigned arg and casting it, with an
implementation-defined result in some cases). Now I am unsure what
the standard requires if there is a type mismatch even if the result
is representable.

I used to think that the behaviour is clearer for scanf() -- that
scanf() fills a much-needed gap for most arg types, because its
behaviour is undefined if the result is not representable so it
is unusable on input that hasn't been verified by other means.
However, its wording specifies representability of _converted_
results, and converted results are always representable (by the
definition of result). From n869.txt (n1570.pdf is no different):

% [#10] Except in the case of a % specifier, the input item
% (or, in the case of a %n directive, the count of input
% characters) is converted to a type appropriate to the
% conversion specifier. If the input item is not a matching

"appropriate" seems to have its non-technical meaning. Therefore, it
is the C type literally matching the conversion specifier.

% sequence, the execution of the directive fails: this
% condition is a matching failure. Unless assignment
% suppression was indicated by a *, the result of the
% conversion is placed in the object pointed to by the first
% argument following the format argument that has not already
% received a conversion result. If this object does not have
% an appropriate type, or if the result of the conversion
% cannot be represented in the object, the behavior is
% undefined.

This wording makes no sense. "Conversion" can only be read as being
to the "appropriate" type, not some infinite-precision non-string
type capable of representing any string. Some conversions to the
"appropriate" type (mainly ones involving floating point) have no result
since conversion gives undefined behaviour on overflow. But if there
is a result, then it is representable.

Floating point types give the additional problem that the
infinite-precision result might be unrepresentable because it is not
exactly representable, so the specification cannot be changed to say that
the conversion is to an intermediate infinite-precision type.

%
% [#11] The length modifiers and their meanings are:
%
% hh Specifies that a following d, i, o, u, x, X, or
% n conversion specifier applies to an argument
% with type pointer to signed char or unsigned
% char.

This seems to say that scanf("%hhu", &u_char_var) on input -1 _is_
defined, since the result of the conversion of -1 to unsigned char
is just UCHAR_MAX and that is surely representable.
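
A small sketch of that case, using a hypothetical wrapper named scan_hhu_value. Whether the standard actually defines the result for the input "-1" is the open question here; the comments describe what common implementations are expected to do, not a guarantee.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical wrapper: returns the value stored by "%hhu",
   or -1 if no item was converted. */
int scan_hhu_value(const char *text)
{
    unsigned char stored = 0;

    assert(text != NULL);
    if (sscanf(text, "%hhu", &stored) != 1)
        return -1;

    /* For the input "-1", common implementations negate in an unsigned
       type and store the low-order bits, leaving UCHAR_MAX here. */
    return stored;
}
```

The return value of sscanf only says whether an item matched; it gives no indication that the stored value differs from what the input text literally denoted.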

Related case for floating point:
- ("%f", &float_var) on input 0.1 is defined, since the result of
conversion of 0.1 to float is 0.1F. Decimal 0.1 might be
unrepresentable as a binary float, but the behaviour is defined
since the converted value is surely representable in the converted
type.
- ("%f", &float_var) on input 1e6666666 is defined, at least in
implementations that support infinities, since the result of conversion
of 1e6666666 to float is INFINITY. 1e6666666 is unrepresentable as a
float, but the behaviour is defined, as above.

This is clearly wrong. The behaviour should be undefined on overflow,
but defined as the result of conversion when the infinite-precision
value is not exactly representable, except possibly when the conversion
underflows.

I used to think that the behaviour is clearer for the atol() family --
that it fills a much-needed gap by giving undefined behaviour like
sscanf(). The specification of these functions doesn't mention
conversion before representability:

% [#1] The functions atof, atoi, atol, and atoll need not
% affect the value of the integer expression errno on an
% error. If the value of the result cannot be represented,
% the behavior is undefined.

However, it is unclear what "the value of the result" means. What is
the difference between this and "the result"? I think "the result"
literally means just the result of conversion to the function return
type and the point about representability makes no sense, as above.
If it doesn't mean that, then it must mean an infinite-precision
intermediate result, but that makes no sense since it gives undefined
behaviour for results that are not exactly representable.

These bugs are all missing for the strtol() family. This family doesn't
fill gaps with undefined behaviour, and the wording of its specification
makes sense. For example, for the floating point subset:

% [#10] The functions return the converted value, if any. If
% no conversion could be performed, zero is returned. If the
% correct value is outside the range of representable values,

By saying "correct value" instead of "converted value", it doesn't
require conversion before considering representability. "correct"
is underspecified, but everyone knows what it means.

% plus or minus HUGE_VAL, HUGE_VALF, or HUGE_VALL is returned
% (according to the return type and sign of the value), and
% the value of the macro ERANGE is stored in errno. If the
% result underflows (7.12.1), the functions return a value
% whose magnitude is no greater than the smallest normalized
% positive number in the return type; whether errno acquires
% the value ERANGE is implementation-defined.

Even the behaviour on underflow is specified (in more detail than
rounding for non-underflowing cases).
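
A small sketch of the overflow case quoted above, assuming the default
rounding mode; the helper name strtod_overflows is hypothetical and is
only there to make the check easy to call:

```c
#include <errno.h>
#include <math.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if strtod() reports that the correct
 * value of s is outside the range of double, i.e. it returned plus or
 * minus HUGE_VAL and stored ERANGE in errno; returns 0 otherwise. */
int strtod_overflows(const char *s)
{
    double v;

    errno = 0;
    v = strtod(s, NULL);
    return errno == ERANGE && (v == HUGE_VAL || v == -HUGE_VAL);
}
```

With this, strtod_overflows("1e6666666") is expected to report overflow,
while an ordinary input such as "1.5" is not.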

Bruce

Tim Rentsch

Mar 30, 2014, 2:05:28 AM

j...@panix.com (Joe keane) writes:

> In article <lniorks...@nuthaus.mib.org>,
> Keith Thompson <ks...@mib.org> wrote:
>>Where does it say that? Please cite the section and paragraph number.
>
> Of course, i should have re-read it *before* posting.
>
> I was thinking-- But you know, it doesn't matter.
>
> I think people are applying 'as if' backward.
>
> The translator is free to do what it feels like, provided that it gives
> the same result -for programs that follow the rules-.
>
> And the rules are quite clear. The argument type must be 'short'
> or 'unsigned short'. It is undefined behavior if the type is not
> right. [snip elaboration]

These conclusions are at odds with comments made by P.J. Plauger
in postings shown on the webpage I cited upthread. So you may
want to reconsider your position here, at least as regards the
question of how clear the rules are.

Phil Carmody

Mar 31, 2014, 12:15:05 PM

If we're having the same discussion in the same places between
the same people for about a decade, it's clear that there's
plenty of sloppy wording in the standard.
...

> If the intent is that passing an int argument is well defined (as long
> as the conversion is well defined), that intent would have been better
> expressed by saying that "h":
>
> Specifies that a following d, i, o, u, x, or X conversion
> specifier applies to an int or unsigned int argument whose
> value shall be converted to short int or unsigned short int
> before printing; or that a following n conversion specifier
> applies to a pointer to a short int argument.
>
> perhaps with a footnote suggesting that the argument is typically a
> short int or unsigned short int argument which will be promoted to int
> or unsigned int.

I much prefer that wording (and footnote),

Does anyone *not* prefer that wording (and footnote)?

> [snip]
>
> > Incidentally, doing a google search (now that "google groups" is
> > essentially useless), turned up this link (apparently from the
> > 2007 thread mentioned in another response) -
> >
> > http://bytes.com/topic/c/answers/591625-usefulness-hd-printf
...
> With a little searching, the same thread can be found on
> groups.google.com:
>
> https://groups.google.com/forum/#!msg/comp.lang.c/-MQtBj-5cYU/VtegsfBVobYJ

And let's see how well that works:
"""
$ w3m -dump 'https://groups.google.com/forum/#!msg/comp.lang.c/-MQtBj-5cYU/VtegsfBVobYJ'
Google'i gruppide arutelude kasutamiseks peate lubama oma brauseri seadetest
JavaScripti ning seejärel seda lehte värskendama.

<xmp>.</xmp>
<div style="display:none" id="__top_header"><div id=gbar><nobr><a class=gb1 href="https://www.google.ee/webhp?tab=gw">Otsing</a> <a class=gb1 href="http://www.google.ee/imghp?hl=et&tab=gi">Pildid</a> <a class=gb1 href="https://maps.google.ee/maps?hl=et&tab=gl">Maps</a> <a class=gb1 href="https://www.youtube.com/?tab=g1">YouTube</a> <a class=gb1 href="https://mail.google.com/mail/?tab=gm">Gmail</a> <a class=gb1 href="https://drive.google.com/?tab=go">Drive</a> <a class=gb1 href="https://translate.google.ee/?hl=et&tab=gT">T?lge</a> <a class=gb1 href="https://www.blogger.com/?tab=gj">Blogger</a> </nobr></div><div id=guser width=100%><nobr><span id=gbn class=gbi></span><span id=gbf class=gbf></span><span id=gbe><a target="_blank" href="http://support.google.com/groups/bin/answer.py?answer=46601" class=gb4>Abi</a> | <a href="javascript:fb();" class=gb4>Teata Google'i gruppidega seotud probleemist</a> | <a href="javascript:showKeyboardShortcutPopup();" class=gb4>Klaviatuuri otseteed</a> | </span><a target=_top id=gb_70 href="https://www.google.com/a/UniversalLogin?continue=https://groups.google.com/forum/&hl=et&service=groups2&hd=default" class=gb4>Logi sisse</a></nobr></div><div class=gbh style=left:0></div><div class=gbh style=right:0></div>
<div style="clear:both"></div></div></body></html>
"""

I think we can call that pretty much useless. The bytes.com page
renders perfectly readably under similar test conditions.

> (I think at least some of the Google Groups "advanced search" features
> are still available; they're just not documented or presented in any
> sensible manner.)
>
> > My comments here are consistent with those of P.J. Plauger in his
> > several responses shown on this page. P.J. Plauger has been
> > involved with C standardization efforts since the original
> > ANSI work, served as project editor for WG14 in the early 1990's,
> > and is widely recognized as a leading authority (if not the
> > leading authority) on the C standard library. So I am reasonably
> > confident that my analysis here is correct.
>
> Plauger also pointed out a case where "h" is actually useful. This:
>
> printf("%hhx %hx %x\n", -1, -1, -1);
>
> prints:
>
> ff ffff ffffffff
>
> on a system with 8-bit char, 16-bit short, and 32-bit int.
>
> "%hd" is less useful (unless you're willing to depend on the
> implementation-defined semantics of int-to-short conversion),
> but it would have been awkward to exclude it.

Plauger pointed that out in a newsgroup, where it can be
referred back to anecdotally in the future. However, it should
say something which explicitly and unambiguously leads to that
conclusion in the *standard* itself, where it can be referred
back to definitively in the future.

That's what standards are for.

Phil
--
Religion is too important a matter to its devotees to be a subject of
ridicule. If they indulge in absurdities, they are to be pitied rather
than ridiculed. -- Immanuel Kant (1724-1804), lecture at Konigsberg, 1775

Phil Carmody

Mar 31, 2014, 12:36:45 PM

Tim Rentsch <t...@alumni.caltech.edu> writes:
> Keith Thompson <ks...@mib.org> writes:
> > Tim Rentsch <t...@alumni.caltech.edu> writes:
> >> My comments here are consistent with those of P.J. Plauger in his
> >> several responses shown on this page. P.J. Plauger has been
> >> involved with C standardization efforts since the original
> >> ANSI work, served as project editor for WG14 in the early 1990's,
> >> and is widely recognized as a leading authority (if not the
> >> leading authority) on the C standard library. So I am reasonably
> >> confident that my analysis here is correct.
> >
> > Plauger also pointed out a case where "h" is actually useful. This:
> >
> > printf("%hhx %hx %x\n", -1, -1, -1);
> >
> > prints:
> >
> > ff ffff ffffffff
> >
> > on a system with 8-bit char, 16-bit short, and 32-bit int.
> >
> > "%hd" is less useful (unless you're willing to depend on the
> > implementation-defined semantics of int-to-short conversion),
> > but it would have been awkward to exclude it.
>
> In the original wording, the behavior of "%hhx" and "%hx" are
> both defined when used with -1 arguments (ie, on 8/16/32-width
> implementations). The reason is, both "%hhx" and "%hx" expect
> arguments that would have been promoted to 'int', and so read
> them as int's, before converting to unsigned {char/short}.

That's not my interpretation of the original wording.
Or should I say there's enough ambiguity in the original
that it's certainly not guaranteed by that wording, and
I would have presumed the alternative interpretation should
be the more reasonable one:

"h -- Specifies that a following d, i, o, u, x, or X conversion
specifier applies to a short int or unsigned short int argument..."

I interpreted that to mean that d and i would expect a short
int argument expression, and that o, u, x, and X would expect
an unsigned short int argument expression. (Which would of
course be promoted to non-short actual arguments.)

> However, under the changed wording suggested above, both "%hhx"
> and "%hx" would read their arguments as _unsigned_ int, which
> leads to undefined behavior if -1 is given as an argument.
>
> (Incidental note: giving a -1 argument for a %x specifier is
> undefined behavior regardless.)

That directly contradicts the "prints:" assertion which
Keith makes above. And adds an element of surprise that
I would hope was unwanted by the C standard committee.
-1 is not more usable where an unsigned short is desired
than it is where an unsigned int is desired in any other
context that I can think of. What's so special about printf()?

> The difference in whether an argument value for, eg, %hx, is
> read as 'int' or 'unsigned int' is part of the motivation for
> having the h and hh length modifiers to begin with. If we have,
> for example, an unsigned short value to supply, we can use %hx as
> a specifier, and not care whether unsigned short promotes to int
> or unsigned int. Of course we could cast an unsigned short
> argument to unsigned int, and avoid the problem that way, but
> using %hx obviates the need for casting.

Or the standard could have been stricter, and defined the
promotion rules less flexibly.

Keith Thompson

Mar 31, 2014, 2:13:48 PM

Phil Carmody <thefatphi...@yahoo.co.uk> writes:
> Keith Thompson <ks...@mib.org> writes:
>> Tim Rentsch <t...@alumni.caltech.edu> writes:
[...]
>> > Incidentally, doing a google search (now that "google groups" is
>> > essentially useless), turned up this link (apparently from the
>> > 2007 thread mentioned in another response) -
>> >
>> > http://bytes.com/topic/c/answers/591625-usefulness-hd-printf
> ...
>> With a little searching, the same thread can be found on
>> groups.google.com:
>>
>> https://groups.google.com/forum/#!msg/comp.lang.c/-MQtBj-5cYU/VtegsfBVobYJ
>
> And lets see how well that works:
> """
> $ w3m -dump 'https://groups.google.com/forum/#!msg/comp.lang.c/-MQtBj-5cYU/VtegsfBVobYJ'
> Google'i gruppide arutelude kasutamiseks peate lubama oma brauseri seadetest
> JavaScripti ning seejärel seda lehte värskendama.
>
> <xmp>.</xmp>
[...]
> <div style="clear:both"></div></div></body></html>
> """
>
> I think we can call that pretty much useless. The bytes.com page
> renders perfectly readably under similar test conditions.

Yes, it's pretty much useless to view it using w3m.

The groups.google.com link works correctly in browsers that support
JavaScript. lynx, another text-based browser, displays an error
message that suggests enabling JavaScript (which as far as I know
is possible only by using a different browser).

I'm not arguing that that's a reasonable restriction. But bytes.com
appears to have appropriated my writing and that of others who post
to this newsgroup (which isn't necessarily bad, since Usenet posts
are public) and presented it with no acknowledgement of where
it originated, implying that it was posted to their own forums
(which I object to). That's reason enough for me to avoid using
or referencing bytes.com.

[...]

> Plauger pointed that out in a newsgroup, where it can be
> referred back to anecdotally in the future. However, it should
> say something which explicitly and unambiguously leads to that
> conclusion in the *standard* itself, where it can be referred
> back to definitively in the future.
>
> That's what standards are for.

Agreed.

Tim Rentsch

Apr 10, 2014, 8:20:17 PM

I agree the current (aka original) wording is not completely
clear about what is meant. In fact I think we are mostly in
agreement, with the confusion arising from how things are
expressed.

> and I would have presumed the alternative interpretation

We have some crossed wires here. I was comparing two wordings,
not two interpretations of the same wording. I didn't say,
because I thought it was obvious from context, that the original
wording was meant (ie, should be interpreted) the way P.J. Plauger
explained it (which also coincides with my own view, not
entirely coincidentally). The alternate wording (proposed by
Keith Thompson) was quoted in my posting but somehow got taken
out of your response. When you say "alternative interpretation"
I don't know whether you mean another interpretation of the
original wording, or some (other?) interpretation of the wording
proposed by Keith Thompson.

> should be the more reasonable one:
>
> "h -- Specifies that a following d, i, o, u, x, or X conversion
> specifier applies to a short int or unsigned short int argument..."
>
> I interpreted that to mean that d and i would expect a short
> int argument expression, and that o, u, x, and X would expect
> an unsigned short int argument expression. (Which would of
> course be promoted to non-short actual arguments.)

I take it much the same way, but with a notable difference:
that d and i expect--but do not require--that a short int argument was
given, and behave accordingly; and similarly o, u, x, and X with
unsigned short int. Part of "behaving accordingly" includes
converting the value received, which must be either an int or an
unsigned int, to short or unsigned short, respectively, in case
the original argument expression was not of the type expected
(ie, before being promoted via the standard promotion rules).
Note that the Standard states, in normative text, that this
conversion will take place; there is no reason for the Standard
to say that unless the possibility of a non-short type argument
expression being given has to be accommodated and is meant to be
defined behavior.


>> However, under the changed wording suggested above, both "%hhx"
>> and "%hx" would read their arguments as _unsigned_ int, which
>> leads to undefined behavior if -1 is given as an argument.
>>
>> (Incidental note: giving a -1 argument for a %x specifier is
>> undefined behavior regardless.)
>
> That directly contradicts the "prints:" assertion which
> Keith makes above.

The "prints:" assertion was made under (Plauger's interpretation
of) the original wording. My statement about undefined behavior
was made under Keith's proposed revised wording (which I believe
I interpreted as Keith intended, although there too there may be
some ambiguity). So my statement doesn't contradict the stated
assertion, as they are made under different sets of assumptions.

> And adds an element of surprise that
> I would hope was unwanted by the C standard committee.
> -1 is not more usable where an unsigned short is desired
> than it is where an unsigned int is desired in any other
> context that I can think of. What's so special about printf()?

What I think you're saying here, at least partly, is that the
rules for argument passing to printf() are (or should be) just the
same as any other function that uses <stdarg.h>, va_arg(), etc.
I agree. More specifically, I agree that they should be, and
also believe the authors of the C standard intend them to be, the
same rules in both cases.


>> The difference in whether an argument value for, eg, %hx, is
>> read as 'int' or 'unsigned int' is part of the motivation for
>> having the h and hh length modifiers to begin with. If we have,
>> for example, an unsigned short value to supply, we can use %hx as
>> a specifier, and not care whether unsigned short promotes to int
>> or unsigned int. Of course we could cast an unsigned short
>> argument to unsigned int, and avoid the problem that way, but
>> using %hx obviates the need for casting.
>
> Or the standard could have been stricter, and defined the
> promotion rules less flexibly.

I'm not sure what you're trying to say here. The rules for
promotion of printf() arguments are just the same as for any
other variadic function. The point of %hx is (among other
things) to make it easy to match a conversion specifier to a
type, while observing the regular promotion rules, but
without having to worry about possible implementation
dependencies. Are you saying something about the regular
argument promotion rules, or are you saying something
specifically about the rules for printf() arguments (which I
believe are meant to be the same as any other variadic
function)? Or am I completely misunderstanding you and you
are talking about something else altogether? Of course the
Standard could have required %hx be supplied with an unsigned
short typed expression for that argument, and any other type
would be undefined behavior -- but doing that would put the
rules for printf() distinctly at odds with those for regular
variadic functions, which IMO would be a very poor choice,
and also I believe contrary to what the Standard's authors
intended for how printf() et al should work (as evidenced by
the rule for converting the argument value to a type other
than the type used to pass it, among other things).
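
A short sketch of the convenience I mean, with hypothetical helper names
(format_hx and format_x_cast are mine, not anything from the Standard):
the first passes an unsigned short straight to %hx, the second needs a
cast to be safe with plain %x.

```c
#include <stdio.h>

/* Hypothetical helper: %hx with an unsigned short argument and no
 * cast; it does not matter whether the argument promotes to int or
 * to unsigned int. */
int format_hx(char *buf, size_t n, unsigned short v)
{
    return snprintf(buf, n, "%hx", v);
}

/* Hypothetical helper: the alternative with plain %x, where the cast
 * guarantees the argument really is an unsigned int. */
int format_x_cast(char *buf, size_t n, unsigned short v)
{
    return snprintf(buf, n, "%x", (unsigned int)v);
}
```

Both are expected to produce the same text for any unsigned short
value; the difference is only in what the caller has to write.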

Tim Rentsch

Apr 10, 2014, 8:44:28 PM

I don't, because it has (as I read it) different semantics
than what I believe are the originally intended semantics,
as explained in my earlier posting.

>> > My comments here are consistent with those of P.J. Plauger in his
>> > several responses shown on this page. P.J. Plauger has been
>> > involved with C standardization efforts since the original
>> > ANSI work, served as project editor for WG14 in the early 1990's,
>> > and is widely recognized as a leading authority (if not the
>> > leading authority) on the C standard library. So I am reasonably
>> > confident that my analysis here is correct.
>>
>> Plauger also pointed out a case where "h" is actually useful. This:
>>
>> printf("%hhx %hx %x\n", -1, -1, -1);
>>
>> prints:
>>
>> ff ffff ffffffff
>>
>> on a system with 8-bit char, 16-bit short, and 32-bit int.
>>
>> "%hd" is less useful (unless you're willing to depend on the
>> implementation-defined semantics of int-to-short conversion),
>> but it would have been awkward to exclude it.
>
> Plauger pointed that out in a newsgroup, where it can be
> referred back to anecdotally in the future. However, it should
> say something which explicitly and unambiguously leads to that
> conclusion in the *standard* itself, where it can be referred
> back to definitively in the future.

I don't disagree, but members of the WG14 committee may feel
they have already done that. For better or worse, they are
the ones who will decide whether or what action might be
indicated in this matter.