
Re: Getting started with AVR and C


Ivan Shmakov

Nov 24, 2012, 4:18:32 PM
>>>>> Robert Roland <fa...@ddress.no> writes:

[Cross-posting to news:comp.lang.c, in the hope that someone
could provide more suggestions, or correct me on C.]

[...]

> 2. Where do I start learning C? Is there a good online tutorial
> somewhere? I'd also be willing to buy a book. Is there one that
> stands out as the best?

Frankly, I don't quite understand how I learned C myself.
FWIW, hardly any of the books on it that I've read were good.

Two aspects of C are probably to be highlighted:

* first of all, unlike some other, and higher-level, languages
(like Pascal, BASIC, etc.), C has a very concise set of
syntactic constructs; most of the power lies in libraries, and
should you end up using AVR Libc, be sure to check its
reference manual [1] (it isn't as good as the GNU C Library
manual, which I use most of the time when I need information
on C, but it's still useful);

* the C constructs tend to be translated into assembly in a
rather straightforward manner (unless advanced optimization is
involved, that is); consider, e. g.:

int8_t i;           /* not necessarily translated; may force
                       the compiler to "allocate" a register
                       (say, r7), or a memory cell */
i = 3;              /* ldi r7, 3 */
i += 2;             /* adi r7, 2 */
i++;                /* inc r7 */
if (i < 9) {        /* cpi r7, 9 ; brge else_a */
  int8_t j = 5;     /* ldi r8, 5 */
  while (j >= 3) {  /* while_a:
                       cpi r8, 3
                       brlt end_while_a */
    PORTB ^= 1;     /* in r9, PORTB
                       eoi r9, 1
                       out PORTB, r9 */
  }                 /* jmp while_a
                       end_while_a: */
}                   /* else_a: */

Not to undermine its value, but as could be seen (I hope) from
this example, for the most part, C only manages registers and
memory (including function calling conventions) while the rest
of its /syntax/ is comparable to that of a library of assembly
language macros of some sort.

A cheat sheet for most of the C operators would probably be
something like the following (where a, b, c, ... are either
numeric literals, variable identifiers, or expressions.) Note
that whenever all the operands are integer, an integer operation
is performed. (So, 7 / 3 is 2.) The result is as wide as the
widest of the operands. (So, i + j is 0 if i is 255, j is 1,
and both are declared to be of the 8-bit unsigned integer
uint8_t type.)

Operation   Value                                   Side-effect

Operations free of side-effects

+ a         a
- a         - a
* a         the value of the memory cell at
            address a
& a         the address of a (must be an "l-value")
~ a         bitwise negated a
! a         1 if a is 0,
            0 otherwise
a, b        b
            NB: a is evaluated first, its result discarded.
a + b       a + b
a - b       a - b
a * b       a times b
a / b       a / b (quotient of)
a % b       remainder of a / b
a & b       a (bitwise and) b
a | b       a (bitwise or) b
a ^ b       a (bitwise exclusive or) b
a << b      a times 2^b (shift left)
a >> b      a times 2^(-b) (shift right)
a < b       1 if a is less than b,
            0 otherwise
a > b       1 if a is greater than b,
            0 otherwise
a == b      1 if a is equal to b,
            0 otherwise
a <= b      1 if a is less than or equal to b,
            0 otherwise
a >= b      1 if a is greater than or equal to b,
            0 otherwise

Conditional operations

a && b      a is evaluated;
            if a is non-zero, b is evaluated and the
            value is 1 if b is non-zero, 0 otherwise;
            otherwise, the value is 0, while b is
            not evaluated at all
a || b      a is evaluated;
            if a is zero, b is evaluated and the
            value is 1 if b is non-zero, 0 otherwise;
            otherwise, the value is 1, while b is
            not evaluated at all
a ? b : c   a is evaluated first;
            if a is non-zero, the value is b;
            otherwise, the value is c;
            the other ("unused") expression is not
            evaluated

Operations with side-effects

NB: a must be an "l-value"
a++         a                                       a set to a + 1
a--         a                                       a set to a - 1
++a         a + 1                                   a set to value
--a         a - 1                                   a set to value
a = b       b                                       a set to value
a += b      a + b                                   a set to value
a -= b      a - b                                   a set to value
a *= b      a * b                                   a set to value
a /= b      a / b                                   a set to value
a &= b      a (bitwise and) b                       a set to value
a |= b      a (bitwise or) b                        a set to value
a ^= b      a (bitwise xor) b                       a set to value
a <<= b     a times 2^b                             a set to value
a >>= b     a times 2^(-b)                          a set to value
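
A few rows of the table can be checked with a small sketch. The helper names (bump, and_value, or_value, rhs_evaluations, post_then_pre) are made up for this illustration and do not come from any library. Note that in standard C the && and || operators always yield 0 or 1, and that the right-hand operand is evaluated only when it is needed.

```c
#include <assert.h>

/* Hypothetical helper: counts how many times it has been called. */
static int calls = 0;

static int bump(int value)
{
    calls++;
    return value;
}

/* Value of (a && bump(b)); bump() runs only if a is non-zero. */
int and_value(int a, int b)
{
    return a && bump(b);
}

/* Value of (a || bump(b)); bump() runs only if a is zero. */
int or_value(int a, int b)
{
    return a || bump(b);
}

/* How many times a right-hand operand has been evaluated so far. */
int rhs_evaluations(void)
{
    return calls;
}

/* Post-increment yields the old value, pre-increment the new one. */
int post_then_pre(int a)
{
    int old = a++;      /* old is the original a; a is now original + 1 */
    int new_ = ++a;     /* a is now original + 2, and so is new_ */
    assert(new_ == old + 2);
    return new_;
}
```

Calling and_value (0, 5) leaves the evaluation counter untouched, which is the "not evaluated at all" case of the table.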

Naturally, both "=" and "," can be "nested", thus:

for (a = b = 0, c = 5; c > 0; a++, b++, c--) {
/* ... */
}

Note that the for () <statement> form is just a short-hand
for a specific while () loop. For instance, the for ()
statement above can be rewritten as follows:

a = b = 0, c = 5;
while (c > 0) {
/* ... */
a++, b++, c--;
}

Thus, the only convenience of for () is that it allows for the
"at-the-end-of-the-loop" part to be written above the loop body
itself (i. e., together with the loop condition.)

One more thing to note is that there're two basic contexts: the
statement context, and the expression context. The switch from
the former to the latter usually takes place in obvious places,
while it isn't possible (in standard C; AFAIK) to switch from
the latter to the former. E. g.:

/* statement context */
while (a < 5 /* expression context */) {
/* statement context */
b = 4 /* expression context */ ;
/* NB: cannot switch back to the statement context, like: */
/* c = while (b > 0) { /* ... */ } ; */
}

As one may need a conditional operator in either context, C has
both the ?:-operator (see above), and (perhaps a more
conventional) if ():

if (a) {
/* the code here will be executed iff a is non-zero */
} else {
/* the code here will be executed otherwise */
}

The { }-grouping is only necessary if more than one statement is
needed as the body; otherwise, it may be elided, like:

if (a) b = c;

This allows for convenient nesting, like:

if (a) {
/* ... */
} else if (b) {
/* ... */
} else {
/* ... */
}

A similar idiom is possible for the ?:-operator just as well.
Consider, e. g.:

a = (b ? c
: d ? e
: f);

which is not dissimilar to more verbose (and error-prone):

if (b) {
a = c;
} else if (d) {
a = e;
} else {
a = f;
}

An example program for an AVR could be as follows.

#include <avr/io.h> /* for PORTB, DDRB, etc. */
#include <util/delay.h> /* for _delay_ms () */

/* global variable declarations; not necessary in this example */

static void
blink_led (void)
{
  PORTB ^= (1 << PB0); /* toggle PB0 */
  _delay_ms (500); /* wait for 0.5 s */
  PORTB ^= (1 << PB0); /* toggle PB0 again */
  _delay_ms (500); /* wait for 0.5 s more */
}

/* the main () function is the conventional program's entry point */
int
main ()
{
  /* set up PB0 for output, all the others tri-stated */
  DDRB = (1 << DDB0);

  /* enter infinite loop */
  while (1) {
    /* call our function */
    blink_led ();
  }

  /* never reached */
  return 0;
}

Certainly, there are more than a dozen further syntactic
constructs (and a handful or so of preprocessor #-directives,
too), but I hope that with the above, reading the sources
becomes a bit easier.

[1] http://www.nongnu.org/avr-libc/user-manual/

--
FSF associate member #7257

Richard Damon

Nov 24, 2012, 5:09:11 PM
On 11/24/12 4:18 PM, Ivan Shmakov wrote:
> for (a = b = 0, c = 5; c > 0; a++, b++, c--) {
> /* ... */
> }
>
> To note is that the for () <statement> form is just a short-hand
> for a specific while () loop. For instance, the for ()
> statement above can be rewritten as follows:
>
> a = b = 0, c = 5;
> while (c > 0) {
> /* ... */
> a++, b++, c--;
> }

This is not completely correct in general. For instance if you replace
the /* ... */ with continue; then the for loop jumps from the continue
statement to the increment clause of the loop, but the while loop jumps
to the beginning of the loop body (causing an infinite loop).
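
The difference can be sketched as follows. The function names are invented for this illustration; the point is only that continue inside for () still runs the increment clause, while the while () rewrite has to repeat the increment by hand before each continue.

```c
#include <assert.h>

/* Sums the odd numbers below n using for (); continue jumps to i++. */
int sum_odd_for(int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++) {
        if (i % 2 == 0)
            continue;           /* i++ still runs before the next test */
        sum += i;
    }
    return sum;
}

/* The while () rewrite must repeat the increment before every continue;
   without the marked line the loop would never advance past i == 0. */
int sum_odd_while(int n)
{
    int sum = 0;
    int i = 0;
    while (i < n) {
        if (i % 2 == 0) {
            i++;                /* the step a naive rewrite would skip */
            continue;
        }
        sum += i;
        i++;
    }
    return sum;
}

/* Both forms agree once the increment is duplicated. */
void check_equivalence(void)
{
    assert(sum_odd_for(10) == sum_odd_while(10));
}
```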

Ben Bacarisse

Nov 24, 2012, 5:12:04 PM
Ivan Shmakov <onei...@gmail.com> writes:

>>>>>> Robert Roland <fa...@ddress.no> writes:
>
> [Cross-posting to news:comp.lang.c, in the hope that someone
> could provide more suggestions, or correct me on C.]

I'll have a look...

<snip>
> [...] Note
> that whenever all the operands are integer, an integer operation
> is performed. (So, 7 / 3 is 2.) The result is as wide as the
> widest of the operands. (So, i + j is 0 if i is 255, j is 1,
> and both are declared to be of the 8-bit unsigned integer
> uint8_t type.)

No, the return will be of type int in that case. The rules are rather
involved, but the gist of it is that everything "smaller" than an int
gets converted to an int. When the types involved are int or larger,
both get converted to the larger type. Mixed signed and unsigned types
generally result in the signed operand being converted to the type of
the unsigned operand (after promotion).

The standard takes pages to describe these rules, so there is no way I
can summarise them here with 100% accuracy. Given that your summary is
not short, it might be worth including them. You'd then need to say
to which operators they apply (for example they don't apply to the
shift operators).
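
A minimal sketch of the two rules mentioned above, assuming nothing beyond <stdint.h>; the function names are hypothetical. The first two functions show the promotion of uint8_t operands to int, the third shows a signed operand being converted to unsigned.

```c
#include <assert.h>
#include <stdint.h>

/* i + j is computed in int after promotion, so 255 + 1 is 256, not 0. */
int sum_promoted(uint8_t i, uint8_t j)
{
    return i + j;
}

/* Size of the type of the sum of two uint8_t operands: that of int. */
unsigned sum_size(void)
{
    uint8_t i = 0, j = 0;
    return (unsigned) sizeof (i + j);
}

/* Mixed signedness: the int operand is converted to unsigned int, so
   -1 becomes UINT_MAX and the comparison yields 0. */
int minus_one_less_than_one_unsigned(void)
{
    int s = -1;
    unsigned u = 1;
    return s < u;
}
```

A compiler may warn about the signed/unsigned comparison in the last function; that warning is exactly about this conversion.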

> Operation Value Side-effect
>
> Operations free of side-effects
>
> + a a
> - a - a
> * a the value of the memory cell at address
> a

This rather reinforces a view of C as lower-level than it really is.
The result might not square with how people think of a memory cell (for
example, if a is a function pointer, or when it is pointer to a struct
type).

> & a the address of a (must be an "l-value")
> ~ a bitwise negated a
> ! a 1 if a is 0,
> 0 otherwise

You don't talk about the array indexing operator, [], nor the function
call operator, (). There are others, too, like sizeof and cast
operators. In a tabular summary, I don't think it hurts to be complete.

> a, b b
> NB: a is evaluated first, its result discarded.
> a + b a + b
> a - b a - b
> a * b a b
> a / b a / b (quotient of)
> a % b remainder of a / b
> a & b a (bitwise and) b
> a | b a (bitwise or) b
> a ^ b a (bitwise exclusive or) b
> a << b a times 2 ^b (shift left)
> a >> b a times 2 ^(-b) (shift right)

These ones are tricky because of all the corner cases (a shift equal or
greater than the operand size, a shift of a bit into the sign position,
a right shift of a negative quantity). In a summary like this, maybe just
a footnote saying "beware".

> NB: a must be an "l-value"

Yes, and it might help to say which of the expressions yield an lvalue.
For example, you can write ++*a but not ++!a. It might well be
obvious, but you could have a column for "is an lvalue".
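
As a small illustration of the ++*a case (the function name is made up for this sketch):

```c
#include <assert.h>

/* *p denotes the object p points to, so it is an lvalue and ++ may be
   applied to it; the function returns the incremented value. */
int increment_through_pointer(int *p)
{
    assert(p != 0);
    /* By contrast, !x is only a value, so ++!x would not compile. */
    return ++*p;
}
```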

> a++ a a set to a + 1
> a-- a a set to a - 1
> ++a a + 1 a set to value
> --a a - 1 a set to value
> a = b b a set to value
> a += b a + b a set to value
> a -= b a - b a set to value
> a *= b a b a set to value
> a /= b a / b a set to value

a %= b is missing. But maybe it's better to generalise: a op= b and say what
op can be?

> a &= b a (bitwise and) b a set to value
> a |= b a (bitwise or) b a set to value
> a ^= b a (bitwise xor) b a set to value
> a <<= b a times 2 ^b a set to value
> a >>= b a times 2 ^(-b) a set to value
>
> Naturally, both "=" and "," can be "nested", thus:

In most tables of operators, you see both priority and associativity.
Assignment does not yield an lvalue, so a = b = 0 only works because =
associates to the right a = (b = 0). Most C binary operators associate
to the left (i.e. a - b - c means (a - b) - c).
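
Both groupings can be seen in a short sketch (hypothetical function names):

```c
#include <assert.h>

/* = associates to the right: a = b = v parses as a = (b = v). */
int assign_chain(int v)
{
    int a, b;
    a = b = v;
    assert(a == b);
    return a;
}

/* - associates to the left: a - b - c parses as (a - b) - c. */
int subtract_chain(int a, int b, int c)
{
    return a - b - c;
}
```

With the arguments 10, 4 and 3, subtract_chain gives 3, whereas the right-associated reading 10 - (4 - 3) would give 9.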

You could make a really rich summary table that shows priority,
associativity, whether the expression denotes an lvalue and what happens
to the operands (are they just promoted as for the shift operands or are
the "usual arithmetic conversions" applied as for + and *). I can see
why you would want to avoid too much detail in a simple explanation like
this, but it does seem like a useful thing to do.

> for (a = b = 0, c = 5; c > 0; a++, b++, c--) {
> /* ... */
> }
>
> To note is that the for () <statement> form is just a short-hand
> for a specific while () loop. For instance, the for ()
> statement above can be rewritten as follows:
>
> a = b = 0, c = 5;
> while (c > 0) {
> /* ... */
> a++, b++, c--;
> }

Provided that /* ... */ contains no continue statements (except as part
of a nested statement of course).

> Thus, the only convenience of for () is that it allows for the
> "at-the-end-of-the-loop" part to be written above the loop body
> itself (i. e., together with the loop condition.)
>
> One more thing to note is that there're two basic contexts: the
> statement context, and the expression context. The switch from
> the former to the latter usually takes place in obvious places,
> while it isn't possible (in standard C; AFAIK) to switch from
> the latter to the former. E. g.:
>
> /* statement context */
> while (a < 5 /* expression context */) {
> /* statement context */
> b = 4 /* expression context */ ;
> /* NB: cannot switch back to the statement context, like: */
> /* c = while (b > 0) { /* ... */ } ; */
> }

A sad omission for fans of BCPL!

> As one may need a conditional operator in either context, C has
> both the ?:-operator (see above), and (perhaps a more
> conventional) if ():
>
> if (a) {
> /* the code here will be executed iff a is non-zero */
> } else {
> /* the code here will be executed otherwise */
> }
>
> The { }-grouping is only necessary if more than one statement is
> needed as the body; otherwise, it may be elided, like:
>
> if (a) b = c;
>
> This allows for convenient nesting, like:
>
> if (a) {
> /* ... */
> } else if (b) {
> /* ... */
> } else {
> /* ... */
> }
>
> A similar idiom is possible for the ?:-operator just as well.
> Consider, e. g.:
>
> a = (b ? c
> : d ? e
> : f);
>
> which is not dissimilar to more verbose (and error-prone):

You will find disagreement about that parenthetical remark in comp.lang.c.

> if (b) {
> a = c;
> } else if (d) {
> a = e;
> } else {
> a = f;
> }

<snip example>
--
Ben.

Nick Keighley

Nov 25, 2012, 8:41:42 AM
On Nov 24, 9:18 pm, Ivan Shmakov <oneing...@gmail.com> wrote:
> >>>>> Robert Roland <f...@ddress.no> writes:
>
>         [Cross-posting to news:comp.lang.c, in the hope that someone
>         could provide more suggestions, or correct me on C.]
>
> [...]
>
>  > 2. Where do I start learning C?  Is there a good online tutorial
>  > somewhere?  I'd also be willing to buy a book.  Is there one that
>  > stands out as the best?
>
>         Frankly, I don't quite understand how did I learn C myself.
>         FWIW, there were hardly any good book on that that I've read.

K&R?

Rosario1903

Nov 27, 2012, 4:21:21 AM
On Sun, 25 Nov 2012 04:18:32 +0700, Ivan Shmakov <onei...@gmail.com>
wrote:


> int8_t i; /* not necessarily translated; may force
> the compiler to "allocate" a register
> (say, r7), or a memory cell */
> i = 3; /* ldi r7, 3 */
> i += 2; /* adi r7, 2 */
> i++; /* inc r7 */
> if (i < 9) { /* cpi r7, 9 ; brge else_a */
> int8_t j = 5; /* ldi r8, 5 */
> while (j >= 3) { /* while_a:
> cpi r8, 3
> brlt end_while_a */
> PORTB ^= 1; /* in r9, PORTB
> eoi r9, 1
> out PORTB, r9 */
> } /* jmp while_a
> end_while_a: */
> } /* else_a: */

for me

> i = 3; /* ldi r7, 3 */
> i += 2; /* adi r7, 2 */
> i++; /* inc r7 */

are the same
one calls "r7" "i", and they are the same

for the rest I would possibly prefer the macroized
part on the right side

since on the right side one can control the stack better...

>[1] http://www.nongnu.org/avr-libc/user-manual/

Ivan Shmakov

Nov 28, 2012, 12:54:45 AM
>>>>> Dave Nadler <d...@nadler.com> writes:
>>>>> On Saturday, November 24, 2012 5:12:09 PM UTC-5, Ben Bacarisse wrote:

[Cross-posting to news:comp.lang.c, and dropping
news:comp.arch.embedded from Followup-To:.]

[...]

>>> /* statement context */
>>> while (a < 5 /* expression context */) {
>>> /* statement context */
>>> b = 4 /* expression context */ ;
>>> /* NB: cannot switch back to the statement context, like: */
>>> /* c = while (b > 0) { /* ... */ } ; */
>>> }

>> A sad omission for fans of BCPL!

> Wow ! I last programmed in BCPL in late 1977. And I still miss this
> construct !

Well, it's available as a GCC extension [1] at the least.
Consider, e. g.:

$ cat < g4u68ss8m8usbkqkx33wtfwws8.c
/*** g4u68ss8m8usbkqkx33wtfwws8.c -*- C -*- */

#include <stdio.h> /* for printf () */

int
main ()
{
  int a = ({
      int x = 21, y;
      if (1) {
        y = 2 * x;
      } else {
        y = 13;
      }
      /* . */
      y;
    });

  printf ("a = %d\n", a);

  /* . */
  return 0;
}

/*** g4u68ss8m8usbkqkx33wtfwws8.c ends here */
$ make g4u68ss8m8usbkqkx33wtfwws8
cc g4u68ss8m8usbkqkx33wtfwws8.c -o g4u68ss8m8usbkqkx33wtfwws8
$ ./g4u68ss8m8usbkqkx33wtfwws8
a = 42
$

[1] http://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/Statement-Exprs.html

> Though I don't miss the Lvalue/Rvalue persnickity business (though
> BCPL wasn't as silly as BLISS).

> Thanks for the memories,

Ivan Shmakov

Nov 28, 2012, 1:37:49 AM
>>>>> Ben Bacarisse <ben.u...@bsb.me.uk> writes:
>>>>> Ivan Shmakov <onei...@gmail.com> writes:

[...]

>> Note that whenever all the operands are integer, an integer
>> operation is performed. (So, 7 / 3 is 2.) The result is as wide as
>> the widest of the operands. (So, i + j is 0 if i is 255, j is 1,
>> and both are declared to be of the 8-bit unsigned integer uint8_t
>> type.)

> No, the return will be of type int in that case. The rules are
> rather involved, but the gist of it is that everything "smaller" than
> an int gets converted to an int. When the types involved are int or
> larger, both get converted to the larger type.

Thus:

Generally, the result is as wide as the widest of the operands, or
"int", if no operand is wider than "int". The result then may be
truncated on function application or assignment. (For instance,
i += j is 0 if i is 1, j is 255, and both are declared to be of the
8-bit unsigned integer uint8_t type.)
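
The truncation step can be sketched like this, assuming only <stdint.h>; the function names are made up for the example:

```c
#include <assert.h>
#include <stdint.h>

/* i + j is evaluated as int (1 + 255 == 256); storing it back into the
   uint8_t object reduces it modulo 256. */
uint8_t add_assign_u8(uint8_t i, uint8_t j)
{
    i += j;
    return i;
}

/* The same reduction happens when an int argument is converted to a
   uint8_t parameter at a call to a prototyped function. */
uint8_t pass_as_u8(uint8_t x)
{
    return x;
}
```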

> Mixed signed and unsigned types generally result in the signed
> operand being converted to the type of the unsigned operand (after
> promotion).

> The standard takes pages to describe these rules, so there is no way
> I can summarise them here with 100% accuracy. Given that you summary
> is not short, it might be worth including them. You'd then need to
> say to which operators they apply (for example they don't apply to
> the shift operators).

Perhaps. Though this seems to open yet another can of worms.

[...]

>> * a the value of the memory cell at address
>> a

> This rather reinforces a view of C are lower-level than it really is.
> The result might not square with how people think of a memory cell
> (for example, if a is a function pointer, or when it is pointer to a
> struct type).

Yes, but nowhere in the summary do I talk about structs (even
should they be regarded as one of the essential conveniences of
the language), and function pointers seem far too advanced a
concept for those just starting to use C. And yes, pointers in
C are typed, which made me hesitate to mention them at all in
this summary.

>> & a the address of a (must be an "l-value")
>> ~ a bitwise negated a
>> ! a 1 if a is 0,
>> 0 otherwise

> You don't talk about array the indexing operator, [], not the
> function call operator, (). there are others, too, like sizeof and
> cast operators.

... And also . and ->. But then, I've omitted both arrays and
structs altogether, and show function calls and variable
declarations only in examples.

> In tabular summary, I don't think it hurts to be complete.

Frankly, I've tried to focus on "consistency", not completeness.
That is, this summary was intended to provide an example of
"basic", "self-contained" C programming, even if unsuitable for
the majority of practical tasks.

[...]

>> a << b a times 2 ^b (shift left)
>> a >> b a times 2 ^(-b) (shift right)

> These ones are tricky because of all the corner cases (a shift equal
> or greater than the operand size, a shift of a bit into the sign
> position,

Shouldn't these issues be already familiar to those coming from
the embedded programming background?

> a right shift of a negative quantity).

Seems like an interesting case, indeed. (I fail to recall if I
ever needed to do that.)

> In a summary like this may just a footnote to "beware".

ACK, thanks.

[...]

>> Operations with side-effects

>> NB: a must be an "l-value"

> Yes and it might help to say which of the expression yield and
> lvalue. For example, you can write ++*a but not ++!a. It's might
> well be obvious, but you could have a column for "is an lvalue".

Indeed, thanks.

[...]

>> a += b a + b a set to value
>> a -= b a - b a set to value
>> a *= b a b a set to value
>> a /= b a / b a set to value

> a %= b is missing.

Indeed. Seems like quite a rarely used operator, though.

> But maybe it's better to generalise: a op= b and say what op can be?

It may have its merits, but as long as simple text search is
considered, it makes sense to mention the exact form of all the
operators. (Even if they share the description.)

[...]

>> Naturally, both "=" and "," can be "nested", thus:

> In most tables of operators, you see both priority and associativity.
> Assign ment does not yield an lvalue, so a = b = 0 only works because
> = associates to the right a = (b = 0). Most C binary operators
> associate to the left (i. e. a - b - c means (a - b) - c).

> You could make a really rich summary table that shows priority,
> associativity, whether the expression denotes an lvalue and what
> happens to the operands (are they just promoted as for the shift
> operands or are the "usual arithmetic conversions" applied as for +
> and *). I can see why you would want to avoid too much detail in a
> simple explanation like this, but it does seem like a useful thing to
> do.

Well, it seems like there already is such a table at [1].

My point is that, more often than not, one doesn't bother about
precedence: arithmetic follows the usual rules (a + b * c =
a + (b * c)), and when the other operators are involved, it does
no harm to parenthesize the subexpressions to make the order
explicit.

[1] http://en.wikibooks.org/wiki/C_Programming/Reference_Tables#Table_of_Operators
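
One case where the parentheses do matter is mixing bitwise and comparison operators. A short sketch, with hypothetical function names and an arbitrary mask:

```c
#include <assert.h>

/* Intended test: are the low four bits of flags equal to 0x02? */
int low_nibble_is_two(unsigned flags)
{
    return (flags & 0x0Fu) == 0x02u;
}

/* Without parentheses == binds tighter than &, so this is
   flags & (0x0F == 0x02), i.e. flags & 0, which is always 0. */
int low_nibble_is_two_unparenthesized(unsigned flags)
{
    return flags & 0x0Fu == 0x02u;
}
```

Compilers such as GCC typically warn about the second form, which is a good reason to keep the explicit parentheses.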

>> for (a = b = 0, c = 5; c > 0; a++, b++, c--) {
>> /* ... */
>> }

>> To note is that the for () <statement> form is just a short-hand for
>> a specific while () loop. For instance, the for () statement above
>> can be rewritten as follows:

>> a = b = 0, c = 5;
>> while (c > 0) {
>> /* ... */
>> a++, b++, c--;
>> }

> Provided that /* ... */ contains no continue statements (except as
> part of a nested statement of course).

Indeed, I've missed this case. Thanks!

[...]

>> A similar idiom is possible for the ?:-operator just as well.
>> Consider, e. g.:

>> a = (b ? c
>> : d ? e
>> : f);

>> which is not dissimilar to more verbose (and error-prone):

>> if (b) {
>> a = c;
>> } else if (d) {
>> a = e;
>> } else {
>> a = f;
>> }

> You will find disagreement about that parenthetical remark in
> comp.lang.c.

My point here is that when one (for whatever reason) has to
rename "a", it's easier to forget to update all the "if"
branches in the former example than the single reference in the
latter.

The same logic dictates the preference for a += b; over more
"Fortran-friendly" a = a + b;.

Ben Bacarisse

Nov 28, 2012, 9:49:41 AM
Ivan Shmakov <onei...@gmail.com> writes:

>>>>>> Ben Bacarisse <ben.u...@bsb.me.uk> writes:
>>>>>> Ivan Shmakov <onei...@gmail.com> writes:
<snip>
> >> a << b a times 2 ^b (shift left)
> >> a >> b a times 2 ^(-b) (shift right)
>
> > These ones are tricky because of all the corner cases (a shift equal
> > or greater than the operand size, a shift of a bit into the sign
> > position,
>
> Shouldn't these issues be already familiar to those coming from
> the embedded programming background?

The trouble is that people often think that what happened on the CPUs
they've used before is what will happen on the next one. In other words
there's a tendency to extrapolate from "what happens" to "what is
defined to happen". But maybe the world of embedded programming is so
diverse that people rarely make these assumptions.
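
One way to stay within "what is defined to happen" is to shift only unsigned values by a checked count and to handle negative operands explicitly. The following is a sketch under the assumption that int is no wider than 32 bits; shl32 and asr32 are names invented here, not library functions.

```c
#include <assert.h>
#include <stdint.h>

/* Left shift that stays defined: the operand is unsigned and the count is
   checked against the width of the type (32 bits for uint32_t). */
uint32_t shl32(uint32_t value, unsigned count)
{
    assert(count < 32);         /* count >= width is undefined behaviour */
    return value << count;
}

/* Computes floor(value / 2^count) without relying on the
   implementation-defined result of >> on a negative operand. */
int32_t asr32(int32_t value, unsigned count)
{
    assert(count < 32);
    if (value >= 0)
        return value >> count;
    /* -(value + 1) is non-negative, so it can be shifted as unsigned */
    return -(int32_t) (((uint32_t) -(value + 1)) >> count) - 1;
}
```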

<snip>
> > You could make a really rich summary table that shows priority,
> > associativity, whether the expression denotes an lvalue and what
> > happens to the operands (are they just promoted as for the shift
> > operands or are the "usual arithmetic conversions" applied as for +
> > and *). I can see why you would want to avoid too much detail in a
> > simple explanation like this, but it does seem like a useful thing to
> > do.
>
> Well, it seems like there already is such a table at [1].

Well I meant something more than that, but I understand your desire to
keep things simple.

The interesting things about C operators are the result type and value,
the conversions that are done to the operands (simple promotion or "the
usual arithmetic conversions"), the precedence and associativity,
whether the result denotes an lvalue, and any side effects. Maybe
that's too much for a single table, but I might have a go though.

<snip>
--
Ben.

Frank Miles

Nov 28, 2012, 12:53:06 PM

> Generally, the result is as wide as the widest of the operands, or
> "int", if no operand is wider than "int". The result then may be
> truncated on function application or assignment. (For instance, i
> += j is 0 if i is 1, j is 255, and both are declared to be of the
> 8-bit unsigned integer uint8_t type.)
>

And you have to be careful about how/when any expansions occur. For
example with gcc-avr, if you want

int32_t = int16_t * int16_t

(the full 32 bit result of a 16x16 bit multiply), you have to cast each
of the 16-bit operands to 32bits.

Grant Edwards

Nov 28, 2012, 1:05:06 PM
Shouldn't casting just one of the 16 bit values work the same as
casting both of them?

--
Grant Edwards grant.b.edwards Yow! It's NO USE ... I've
at gone to "CLUB MED"!!
gmail.com

James Kuyper

Nov 28, 2012, 1:15:39 PM
On 11/28/2012 12:53 PM, Frank Miles wrote:
...
> And you have to be careful about how/when any expansions occur. For
> example with gcc-avr, if you want
>
> int32_t = int16_t * int16_t
>
> (the full 32 bit result of a 16x16 bit multiply), you have to cast each
> of the 16-bit operands to 32bits.

I'm not familiar with gcc-avr. That constitutes a significant deviation
from standard C, where casting either operand would be sufficient to
guarantee implicit conversion of the other operand, in accordance with
the "usual arithmetic conversions". What is the reason for this difference?

Arlet Ottens

Nov 28, 2012, 1:17:18 PM
There's no difference. For gcc-avr it also suffices to cast just one
operand.

Tim Wescott

Nov 28, 2012, 1:56:37 PM
On Wed, 28 Nov 2012 18:05:06 +0000, Grant Edwards wrote:

> On 2012-11-28, Frank Miles <f...@u.washington.edu> wrote:
>>
>>> Generally, the result is as wide as the widest of the operands, or
>>> "int", if no operand is wider than "int". The result then may be
>>> truncated on function application or assignment. (For instance, i
>>> += j is 0 if i is 1, j is 255, and both are declared to be of the
>>> 8-bit unsigned integer uint8_t type.)
>>>
>>>
>> And you have to be careful about how/when any expansions occur. For
>> example with gcc-avr, if you want
>>
>> int32_t = int16_t * int16_t
>>
>> (the full 32 bit result of a 16x16 bit multiply), you have to cast each
>> of the 16-bit operands to 32bits.
>
> Shouldn't casting just one of the 16 bit values work the same as casting
> both of them?

Yes. But that's if you take "should" as indicating a moral direction,
rather than as an indication of what you can reasonably expect from every
tool chain.

I would expect that gcc would be ANSI compliant, and would therefore
promote both 16-bit integers to 32-bit before doing the multiply. But
I've worked with compilers in the past that didn't do this, so when
writing code that may be used in multiple places, I up-cast the same way
one votes in Chicago: early and often.

--
My liberal friends think I'm a conservative kook.
My conservative friends think I'm a liberal kook.
Why am I not happy that they have found common ground?

Tim Wescott, Communications, Control, Circuits & Software
http://www.wescottdesign.com

glen herrmannsfeldt

Nov 28, 2012, 4:45:51 PM
In comp.lang.c Tim Wescott <t...@seemywebsite.com> wrote:
> On Wed, 28 Nov 2012 18:05:06 +0000, Grant Edwards wrote:

>> On 2012-11-28, Frank Miles <f...@u.washington.edu> wrote:

>>> And you have to be careful about how/when any expansions occur. For
>>> example with gcc-avr, if you want

>>> int32_t = int16_t * int16_t

>>> (the full 32 bit result of a 16x16 bit multiply), you have to cast each
>>> of the 16-bit operands to 32bits.

>> Shouldn't casting just one of the 16 bit values work the same as casting
>> both of them?

> Yes. But that's if you take "should" as indicating a moral direction,
> rather than as an indication of what you can reasonably expect from every
> tool chain.

> I would expect that gcc would be ANSI compliant, and would therefore
> promote both 16-bit integers to 32-bit before doing the multiply.

Maybe I am missing something here, but are there versions of gcc for 16
bit processors, with 16 bit int? If so, then promotion to int won't
promote to 32 bits without a cast.

-- glen

Tim Wescott

Nov 28, 2012, 4:55:41 PM
Yes. But a compliant compiler will take

int16_t a, b;
int32_t c;

c = (int32_t)a * b;

and convert both a and b to 32-bit.

I have worked with compilers that would _demote_ a 32-bit to a 16 bit and
do the math, unless _both_ operands were cast to 32-bit, i.e.

c = (int32_t)a * (int32_t)b;

Wrong? Yes. But if that's what the compiler does, that's what you work
with...

Grant Edwards

Nov 28, 2012, 5:26:20 PM
On 2012-11-28, Tim Wescott <t...@seemywebsite.com> wrote:
> On Wed, 28 Nov 2012 18:05:06 +0000, Grant Edwards wrote:

>>> And you have to be careful about how/when any expansions occur. For
>>> example with gcc-avr, if you want
>>>
>>> int32_t = int16_t * int16_t
>>>
>>> (the full 32 bit result of a 16x16 bit multiply), you have to cast each
>>> of the 16-bit operands to 32bits.
>>
>> Shouldn't casting just one of the 16 bit values work the same as casting
>> both of them?
>
> Yes. But that's if you take "should" as indicating a moral direction,
> rather than as an indication of what you can reasonably expect from every
> tool chain.
>
> I would expect that gcc would be ANSI compliant, and would therefore
> promote both 16-bit integers to 32-bit before doing the multiply.

Nope. On the target in question (AVR), an "int" is 16 bits (at least
by default). Same for msp430 (and maybe for some of the H8 targets).
I think there is a command-line option for some 16-bit targets to tell
gcc to use 32-bit representations for "int" instead of the default 16
bits.

> But I've worked with compilers in the past that didn't do this, so
> when writing code that may be used in multiple places, I up-cast the
> same way one votes in Chicago: early and often.

If, like AVR and msp430, an "int" is 16 bits, then you must cast at
least one of the two operands to a 32 bit integer type if you want a
16x16=>32 multiply.
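
A minimal sketch of the single-cast form, plus the same effect one size up so that it can be observed on a desktop host. The function names are hypothetical, and the last function assumes that int is no wider than 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Full 32-bit product of two 16-bit values.  Casting one operand is enough:
   the usual arithmetic conversions bring the other one to int32_t too. */
int32_t mul16x16(int16_t a, int16_t b)
{
    return (int32_t) a * b;
}

/* Same idea one size up: the cast makes the multiply a 64-bit one. */
uint64_t mul32x32(uint32_t a, uint32_t b)
{
    return (uint64_t) a * b;
}

/* The uncast version: computed in uint32_t (assuming int is at most
   32 bits), so the product is reduced modulo 2^32 before being widened. */
uint64_t mul32x32_truncated(uint32_t a, uint32_t b)
{
    return a * b;
}
```

On a target with 16-bit int, leaving the cast out of mul16x16 would put it in the same position as mul32x32_truncated, except that signed overflow is undefined rather than wrapping.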

--
Grant Edwards grant.b.edwards Yow! Mr and Mrs PED, can I
at borrow 26.7% of the RAYON
gmail.com TEXTILE production of the
INDONESIAN archipelago?

Tim Wescott

Nov 28, 2012, 6:32:02 PM
On Wed, 28 Nov 2012 22:26:20 +0000, Grant Edwards wrote:

> On 2012-11-28, Tim Wescott <t...@seemywebsite.com> wrote:
>> On Wed, 28 Nov 2012 18:05:06 +0000, Grant Edwards wrote:
>
>>>> And you have to be careful about how/when any expansions occur. For
>>>> example with gcc-avr, if you want
>>>>
>>>> int32_t = int16_t * int16_t
>>>>
>>>> (the full 32 bit result of a 16x16 bit multiply), you have to cast
>>>> each of the 16-bit operands to 32bits.
>>>
>>> Shouldn't casting just one of the 16 bit values work the same as
>>> casting both of them?
>>
>> Yes. But that's if you take "should" as indicating a moral direction,
>> rather than as an indication of what you can reasonably expect from
>> every tool chain.
>>
>> I would expect that gcc would be ANSI compliant, and would therefore
>> promote both 16-bit integers to 32-bit before doing the multiply.
>
> Nope. On the target in question (AVR), an "int" is 16 bits (at least by
> default). Same for msp430 (and maybe for some of the H8 targets). I
> think there is a command-line option for some 16-bit targets to tell gcc
> to use 32-bit representations for "int" instead of the default 16 bits.

You mean "Absolutely", not "Nope". At least you do if you're referring
to a 16-bit int as being conformant to ANSI C.

Per ANSI C99 (http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf),
page 34, the minimum allowable value of INT_MAX is 32767. That fits
nicely inside a 16-bit signed number.
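
For anyone who wants to check what their own tool chain picked, a minimal
sketch along these lines should do. The helper name int_width_bits is made
up for illustration; only <limits.h> and the 32767 lower bound come from
the standard, and _Static_assert assumes a C11-capable compiler.

```c
#include <assert.h>
#include <limits.h>

/* Compile-time check of the standard's lower bound on INT_MAX. */
_Static_assert(INT_MAX >= 32767, "INT_MAX must be at least 32767");

/* Illustrative helper (not from the thread): size of int in bits,
   counting any padding bits the implementation may have. */
int int_width_bits(void)
{
    int bits = (int)(sizeof(int) * CHAR_BIT);
    assert(bits >= 16); /* implied by INT_MAX >= 32767 */
    return bits;
}
```

With a 16-bit int, as described for the AVR above, this would be expected
to return 16; most desktop compilers would return 32.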

>> But I've worked with compilers in the past that didn't do this, so when
>> writing code that may be used in multiple places, I up-cast the same
>> way one votes in Chicago: early and often.
>
> If, like AVR and msp430, an "int" is 16 bits, then you must cast at
> least one of the two operands to a 32 bit integer type if you want a
> 16x16=>32 multiply.

Yes. And if you're using a broken, not-quite-compliant compiler that
needs to see _both_ numbers as 32-bit before it'll do a 32-bit operation,
then you need to cast _both_. (I'm pretty sure it was Intel's C compiler
for the '196).

James Kuyper

Nov 28, 2012, 7:02:32 PM
On 11/28/2012 06:32 PM, Tim Wescott wrote:
> On Wed, 28 Nov 2012 22:26:20 +0000, Grant Edwards wrote:
>
>> On 2012-11-28, Tim Wescott <t...@seemywebsite.com> wrote:
...
>>> I would expect that gcc would be ANSI compliant, and would therefore
>>> promote both 16-bit integers to 32-bit before doing the multiply.
>>
>> Nope. On the target in question (AVR), an "int" is 16 bits (at least by
>> default). Same for msp430 (and maybe for some of the H8 targets). I
>> think there is a command-line option for some 16-bit targets to tell gcc
>> to use 32-bit representations for "int" instead of the default 16 bits.
>
> You mean "Absolutely", not "Nope". At least you do if you're referring
> to a 16-bit int as being conformant to ANSI C.
>
> Per ANSI C99 (http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf),
> page 34, the minimum allowable value of INT_MAX is 32767. That fits
> nicely inside a 16-bit signed number.

I'm not sure what point you're making here. The type resulting from
integer promotions is always either an 'int' or an 'unsigned int', and
they occur only for values of other integer types whose entire range can
be represented in the promoted type. Therefore, if int is 16 bits,
int16_t operands will not be promoted at all, much less promoted to a
32-bit int, in conflict with what you said you expected of an ANSI
compliant compiler. That's what his "Nope" was referring to.

On such a platform, the usual arithmetic conversions will cause one of
the operands to be converted implicitly to a 32-bit int if the other one
is explicitly converted to a 32-bit int. However, section 6.3.1.1p2
defines what an "integer promotion" is, and that definition doesn't
include those conversions.
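
The same distinction can be reproduced on a desktop machine by moving
everything up one size. This is only a sketch, and it assumes int is no
wider than 32 bits (true of most hosted compilers, but an assumption):
uint32_t then plays the role that int16_t plays on the AVR. Unsigned types
are used so that the wrap-around is well defined. The function names are
made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* No cast: if int is no wider than 32 bits, neither operand is promoted,
   the multiply is done in uint32_t and wraps modulo 2^32 before the
   result is converted to uint64_t. */
uint64_t mul32_no_cast(uint32_t a, uint32_t b)
{
    return a * b;
}

/* One cast: the usual arithmetic conversions convert b to uint64_t as
   well, so the full product is kept. This is a conversion, not an
   integer promotion. */
uint64_t mul32_one_cast(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}
```

With a = b = 100000, the second function returns 10000000000, while the
first returns that value reduced modulo 2^32 on a 32-bit-int platform.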

Tim Wescott

Nov 28, 2012, 7:29:04 PM
Grant did not include all of the context, so you need to read back a bit.

The original statement was that (a) int16_t * int16_t coughs up a 16-bit
result, unless (b) one of the int16_t numbers is cast to 32 bit.

Then I pointed out that (c) there are some older, non-compliant compilers
where you have to cast _both_ 16-bit operands to 32 bits to get a 32 bit
result, and (d) that I trusted that the gcc compiler was ANSI C
compliant. Statement (c) is important for the embedded space (which is
the group that I am replying from -- you must be from comp.lang.c)
because one does not always have the luxury of using a compliant tool
chain in embedded.

Then Grant came in, and if I'm correctly reading what he said, stated
that (e) the gnu-avr compiler is not ANSI-C compliant because it has 16
bit integers.

So I disagreed with (e), and pointed out where in the ANSI specification
type 'int' is, indeed, allowed to be 16 bit (and 1's complement or
sign-magnitude, if you've got a perverse processor).

So you are correcting statement -- uh, (0), because no one made it (the
first quote from me refers to statement (b), and appears in its native
habitat two or three posts up in the thread). You are correct that
statement (0) is not true, however you are not correct in thinking that
it was said. I assume that you inferred it because you did not pick up
the context that Grant trimmed out.

David Brown

Nov 28, 2012, 7:35:07 PM
The correct behaviour for C standards compliance is that when you
multiply two operands of different int size, the smaller one is promoted
to the size of the larger one. Then the multiply is carried out modulo
the size of the larger one. Then the result is truncated or extended as
needed to fit the target variable.

So the bit-size of the processor, and the bit-size of "int" on that
particular target, is irrelevant. And the size of the result variable
is also irrelevant (this catches out some newbies).

Given:

int16_t a, b;
int32_t c;

c = (int32_t)a * b;

Then b is cast to int32_t, the 32-bit multiplication is carried out, and
the result assigned to c.

If you write just "c = a * b", then the multiplication is carried out at
16-bit, then promoted to 32-bit. This applies regardless of the
bit-size of the target - you will get the same effect on a 64-bit cpu as
on the 8-bit AVR.


If your compiler does 16-bit multiplications when you have "c =
(int32_t) a * b", and requires two "int32_t" casts to do 32-bit
multiplication, then your compiler is very badly broken. As Tim says,
badly broken compilers /do/ exist, so if you have to use them, then you
need to use two casts. But I personally don't think you need to write
your code to work with broken toolchains unless you actually have to.
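
As a sketch of the two spellings under discussion (function names made up
for illustration), a conforming compiler should give identical results for
both, whatever the width of int:

```c
#include <assert.h>
#include <stdint.h>

/* One explicit conversion; b follows via the usual arithmetic conversions. */
int32_t mul16_one_cast(int16_t a, int16_t b)
{
    return (int32_t)a * b;
}

/* Belt and braces: both operands converted explicitly, for tool chains
   that get the single-cast form wrong. */
int32_t mul16_two_casts(int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}
```

For example, both should return 90000 for 300 * 300, a value that does not
fit in 16 bits.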



Hans-Bernhard Bröker

Nov 28, 2012, 7:47:53 PM
On 29.11.2012 01:35, David Brown wrote:

> The correct behaviour for C standards compliance is that when you
> multiply two operands of different int size, the smaller one is promoted
> to the size of the larger one.

Close, but no cigar. You forgot about types smaller than the
platform's "int". Those will be converted up to either signed or
unsigned int anyway, i.e. even if both operands are of the same size.

> So the bit-size of the processor, and the bit-size of "int" on that
> particular target, is irrelevant.

Incorrect. It is very relevant as soon as either of the operands' types
is smaller than "int" on the particular target.

The rule to remember is that C never does arithmetic on anything smaller
than an 'int'.

> Given:
>
> int16_t a, b;
> int32_t c;
>
> If you write just "c = a * b", then the multiplication is carried out at
> 16-bit, then promoted to 32-bit.

Not if you're on a 32-bit target it isn't. Default conversion to 32-bit
int takes place first, so both operands are first converted to 32-bit,
then a 32 x 32 --> 32 bit multiply is carried out. At least in
principle (that is: modulo the "as-if rule").
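
A small sketch of that rule, with illustrative function names that are not
from the thread: the operands below are narrower than int on any
conforming implementation, so the multiply itself has type int.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Both operands are promoted to int before the multiply, so the
   expression has the size of int, not of unsigned char. sizeof does
   not evaluate its operand; only the type matters. */
size_t product_size(void)
{
    unsigned char a = 1, b = 1;
    return sizeof(a * b);
}

/* 200 * 2 is computed as int, giving 400 rather than 400 % 256.
   Note that with a 16-bit int, products above 32767 would overflow. */
int mul_u8(uint8_t a, uint8_t b)
{
    return a * b;
}
```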

Ben Bacarisse

Nov 28, 2012, 8:18:42 PM
David Brown <david...@removethis.hesbynett.no> writes:
<snip>
> The correct behaviour for C standards compliance is that when you
> multiply two operands of different int size, the smaller one is
> promoted to the size of the larger one.

Not exactly, no, though there is some confusion because you talk of
different int sizes. int is one C type so there is only one int size,
but I'm assuming you meant "integer types of different size".

If that's what you meant, it's not quite right because multiplying a
short by a char (for example) will involve promoting both operands to
int. Other more outlandish examples include multiplying a char by a
_Bool and many cases involving bit fields.

> Then the multiply is carried
> out modulo the size of the larger one.

That's one commonly observed behaviour but it is not "the correct
behaviour". If the common type arrived at by the arithmetic conversions
is a signed type, the multiplication may overflow and anything at all
can happen (i.e. what happens is undefined by the C standard). Unsigned
integer arithmetic does not overflow.

> Then the result is truncated
> or extended as needed to fit the target variable.

Again, not quite. The result is converted to type of the object it is
being assigned to, and a great deal of leeway is given to
implementations when the target type is a signed int. If the result
can't be represented in the target type, either the result is
implementation defined or an implementation defined signal is raised.

For unsigned types, the behaviour is entirely defined by the C standard
(conversion modulo 2^width which is, as you say, truncation).
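
The well-defined unsigned case can be sketched as follows (function names
are made up for illustration; the second one is the i += j case with
uint8_t operands quoted elsewhere in this thread):

```c
#include <assert.h>
#include <stdint.h>

/* Conversion to an unsigned type is fully defined by the standard:
   the value is reduced modulo 2^8 here. */
uint8_t narrow_to_u8(unsigned int value)
{
    return (uint8_t)value;
}

/* The addition is done in int (1 + 255 == 256), and the result is then
   converted back to uint8_t on assignment. */
uint8_t add_u8(uint8_t i, uint8_t j)
{
    i += j;
    return i;
}
```

So 400 becomes 144, and 1 + 255 stored back into a uint8_t becomes 0. The
signed equivalents are where the implementation-defined behaviour comes in.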

> So the bit-size of the processor, and the bit-size of "int" on that
> particular target, is irrelevant. And the size of the result variable
> is also irrelevant (this catches out some newbies).
>
> Given:
>
> int16_t a, b;
> int32_t c;
>
> c = (int32_t)a * b;
>
> Then b is cast to int32_t, the 32-bit multiplication is carried out,
> and the result assigned to c.

(unless int happens to be wider than 32 bits)

> If you write just "c = a * b", then the multiplication is carried out
> at 16-bit, then promoted to 32-bit. This applies regardless of the
> bit-size of the target - you will get the same effect on a 64-bit cpu
> as on the 8-bit AVR.

The machine bit-size is not really the thing that matters. What matters
is the sizes assigned to the various types by the C implementation.
What you say is roughly correct for an implementation with a 16 bit int
type ("roughly" because of the possibility of overflow).

The size given to int is often the natural one (or one of the natural
ones) for the machine in question. When this is the case, the bit size
of the target does matter, but only because of the differing int sizes.

> If your compiler does 16-bit multiplications when you have "c =
> (int32_t) a * b", and requires two "int32_t" casts to do 32-bit
> multiplication, then your compiler is very badly broken. As Tim says,
> badly broken compilers /do/ exist, so if you have to use them, then
> you need to use two casts. But I personally don't think you need to
> write your code to work with broken toolchains unless you actually
> have to.

It leads to a special kind of hell! When you can't ever shake off the
idea that x, y or z once went wrong on compiler p, q or r, you end up
having to fold every trick you ever used to get your code past bad
compilers into every program.

--
Ben.

Tim Wescott

Nov 28, 2012, 8:19:47 PM
On Wed, 28 Nov 2012 12:56:37 -0600, Tim Wescott wrote:

> On Wed, 28 Nov 2012 18:05:06 +0000, Grant Edwards wrote:
>
>> On 2012-11-28, Frank Miles <f...@u.washington.edu> wrote:
>>>
>>>> Generally, the result is as wide as the widest of the operands,
>>>> or "int", if no operand is wider than "int". The result then may
>>>> be truncated on function application or assignment. (For
>>>> instance, i += j is 0 if i is 1, j is 255, and both are declared
>>>> to be of the 8-bit unsigned integer uint8_t type.)
>>>>
>>>>
>>> And you have to be careful about how/when any expansions occur. For
>>> example with gcc-avr, if you want
>>>
>>> int32_t = int16_t * int16_t
>>>
>>> (the full 32 bit result of a 16x16 bit multiply), you have to cast
>>> each of the 16-bit operands to 32bits.
>>
>> Shouldn't casting just one of the 16 bit values work the same as
>> casting both of them?
>
> Yes. But that's if you take "should" as indicating a moral direction,
> rather than as an indication of what you can reasonably expect from
> every tool chain.
>
> I would expect that gcc would be ANSI compliant, and would therefore
> promote both 16-bit integers to 32-bit before doing the multiply. But
> I've worked with compilers in the past that didn't do this, so when
> writing code that may be used in multiple places, I up-cast the same way
> one votes in Chicago: early and often.

Just for clarification, since what I said above seems to be easy to
misread unless you pay close attention to context: take "promote both 16-
bit integers to 32-bit" and add in the context (about casting one or more
of the 16 bit values); the result reads "promote both 16-bit integers to
32-bit _if you cast just one 16-bit integer to 32 bit_".

The primary intent of my comment above was to point out that while an
ANSI C compliant compiler will convert both operands and the result to
32 bits if you cast just one operand to 32 bits, there are compilers out
there that won't.

I did not mean to say -- and indeed it is not the case -- that a compiler
with 16-bit integers will automatically promote them to 32 bits just
because you want them to, or even just because the result is getting
stuck into a 32-bit number. For a nice ANSI-C compliant compiler you
only have to tell it once (by casting one of the operands). For at least
one compiler that I have used in the past, you had to tell it so over and
over again (by casting everything to 32 bits, oh joy).

Tim Wescott

Nov 28, 2012, 8:30:32 PM
It's certainly what I would expect from gcc-avr. There's no reason you
can't make a beautifully compliant, reasonably efficient compiler that
works well on the AVR.

Tim Wescott

Nov 28, 2012, 9:07:30 PM
Me, I just try to remember that x, y or z went wrong on some compiler
some time, so that if I see symptoms again those problems are on my short
list, and maybe even a fix or two.

Like Texas Instruments' Code Composter for the TMS320F2812, which has a
32-bit "double". #$%@.

glen herrmannsfeldt

Nov 28, 2012, 9:31:36 PM
In comp.lang.c David Brown <david...@removethis.hesbynett.no> wrote:

(snip, someone wrote)
>>> I would expect that gcc would be ANSI compliant, and would therefore
>>> promote both 16-bit integers to 32-bit before doing the multiply.

(snip, then I wrote)
>> Maybe I am missing something here, but are there versions of gcc for 16
>> bit processors, with 16 bit int? If so, then promotion to int won't
>> promote to 32 bits without a cast.

> The correct behaviour for C standards compliance is that when you
> multiply two operands of different int size, the smaller one is promoted
> to the size of the larger one. Then the multiply is carried out modulo
> the size of the larger one. Then the result is truncated or extended as
> needed to fit the target variable.

I haven't read the standard so recently, but I thought that was only
after the default promotions. Values smaller than int would be promoted
to int, then the size of the multiply (and product) determined.

If you multiply two 8 bit unsigned char values, is the product
modulo 256? I don't think so.

> So the bit-size of the processor, and the bit-size of "int" on that
> particular target, is irrelevant. And the size of the result variable
> is also irrelevant (this catches out some newbies).

> Given:

> int16_t a, b;
> int32_t c;

> c = (int32_t)a * b;

> Then b is cast to int32_t, the 32-bit multiplication is carried out, and
> the result assigned to c.

Maybe not completely irrelevant; consider a system with a 64 bit int.

> If you write just "c = a * b", then the multiplication is carried out at
> 16-bit, then promoted to 32-bit. This applies regardless of the
> bit-size of the target - you will get the same effect on a 64-bit cpu as
> on the 8-bit AVR.

Regardless of the target size, but not of int size.

> If your compiler does 16-bit multiplications when you have "c =
> (int32_t) a * b", and requires two "int32_t" casts to do 32-bit
> multiplication, then your compiler is very badly broken. As Tim says,
> badly broken compilers /do/ exist, so if you have to use them, then you
> need to use two casts. But I personally don't think you need to write
> your code to work with broken toolchains unless you actually have to.

Now it gets interesting. When were the int_32_t and int_16_t added to C?

Seems to me that compilers only claiming a version of the standard
before they were added wouldn't have to use the same rules.

Consider that a compiler might have a int_128_t that it could add and
subtract, but not multiply or divide. Maybe it can generate a 128 bit
product from two 64 bit operands. Does the standard prohibit a
compiler from offering those operations?

-- glen

Ben Bacarisse

Nov 28, 2012, 10:36:12 PM
glen herrmannsfeldt <g...@ugcs.caltech.edu> writes:

> In comp.lang.c David Brown <david...@removethis.hesbynett.no> wrote:
>
> (snip, someone wrote)
>>>> I would expect that gcc would be ANSI compliant, and would therefore
>>>> promote both 16-bit integers to 32-bit before doing the multiply.
>
> (snip, then I wrote)
>>> Maybe I am missing something here, but are there versions of gcc for 16
>>> bit processors, with 16 bit int? If so, then promotion to int won't
>>> promote to 32 bits without a cast.
>
>> The correct behaviour for C standards compliance is that when you
>> multiply two operands of different int size, the smaller one is promoted
>> to the size of the larger one. Then the multiply is carried out modulo
>> the size of the larger one. Then the result is truncated or extended as
>> needed to fit the target variable.
>
> I haven't read the standard so recently, but I thought that was only
> after the default promotions. Values smaller than int would be promoted
> to int, then the size of the multiply (and product) determined.

Yes, it's a two-stage process.

In case it helps, here is the terminology as used by the C standard:

"integer promotions" These are the conversions that often occur prior
to performing an arithmetic operation. They form part of the:

"usual arithmetic conversions" which is how a common type is arrived at
for arithmetic operations that require it. Unless complex or floating
types are involved, the integer promotions are performed on both
operands and then a set of rules is used to determine the common type.
For integer types it is usually simply the widest of the two, even if
that is an unsigned type and the other is a signed type.

"default argument promotions" apply to function calls in the absence of
a prototype. These are the integer promotions augmented by a conversion
of float to double.

The standard never refers to a conversion as a cast (a cast is an
operator that performs an explicit conversion) and it uses the term
"promotion" only in the context of the implicit conversions described
above. A conversion of one type to a wider one in some other context is
not called a promotion.
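
The first two terms can be watched in action with C11's _Generic, which
selects on the type of an expression. This is only a sketch: TYPE_CODE and
the function names are invented for illustration, and it needs a C11
compiler (newer than the C99 rules being described, which are unchanged).

```c
#include <assert.h>

/* Illustrative macro (not a standard facility): maps the type of an
   expression to a small code using C11 _Generic. */
#define TYPE_CODE(expr) _Generic((expr), \
    int: 1,                              \
    unsigned int: 2,                     \
    long: 3,                             \
    unsigned long: 4,                    \
    default: 0)

int code_short_plus_short(void)
{
    short a = 1, b = 2;
    return TYPE_CODE(a + b);   /* integer promotions: both become int */
}

int code_int_plus_long(void)
{
    int a = 1;
    long b = 2;
    return TYPE_CODE(a + b);   /* common type is long */
}

int code_int_plus_unsigned(void)
{
    int a = -1;
    unsigned int b = 1;
    return TYPE_CODE(a + b);   /* same rank: the unsigned type wins */
}
```

The three functions should return 1, 3 and 2 respectively, whatever the
actual widths of the types involved.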

> If you multiply two 8 bit unsigned char values, is the product
> modulo 256? I don't think so.

No, it's not.

<snip>
> Now it gets interesting. When were the int_32_t and int_16_t added to
> C?

1999. The types are intN_t and uintN_t (no extra _) for various N.
They are optional, but must be defined if the implementation has
suitable types (basically 2's complement and no padding bits). Other,
similar, types like int_leastN_t and int_fastN_t are required in all
implementations. For example, int_least32_t is the smallest type that
has at least 32 (value) bits.
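
A minimal sketch of how code can cope with the optional types (the
function names are made up; the INT32_MAX test relies on <stdint.h>
defining that macro only when int32_t itself is provided):

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* int_least32_t must exist on every C99 implementation, so this
   compiles everywhere and reports a width of at least 32 bits. */
size_t least32_bits(void)
{
    return sizeof(int_least32_t) * CHAR_BIT;
}

/* The exact-width types are optional; the limit macro doubles as a
   feature test for the corresponding typedef. */
int has_int32(void)
{
#ifdef INT32_MAX
    return 1;
#else
    return 0;
#endif
}
```

On mainstream targets has_int32() would be expected to return 1; the
fallback branch only matters on unusual hardware.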

> Seems to me that compilers only claiming a version of the standard
> before they were added wouldn't have to use the same rules.

True.

> Consider that a compiler might have a int_128_t that it could add and
> subtract, but not multiply or divide. Maybe it can generate a 128 bit
> product from two 64 bit operands. Does the standard prohibit a
> compiler from offering those operations?

I don't think it could define int128_t unless it could multiply them.

--
Ben.

glen herrmannsfeldt

Nov 28, 2012, 10:55:43 PM
In comp.lang.c Ben Bacarisse <ben.u...@bsb.me.uk> wrote:

(snip, I wrote)

>> If you multiply two 8 bit unsigned char values, is the product
>> modulo 256? I don't think so.

> No, it's not.

For comparison purposes, I believe that Fortran does not have this rule.

If you add, subtract, multiply, or (I believe) divide 8 bit integers
the result is, generally, eight bits. One should, at least, not be
surprised if the result is computed modulo some small value.

I like the C rules better.

PL/I generally tries to keep the bits until it reaches the
implementation maximum. That is complicated when scaled fixed
point (non-integer) values are used, where it keeps the appropriate
digits (binary or decimal) to the right of the radix point,
possibly truncating on the left.

-- glen

James Kuyper

Nov 28, 2012, 11:39:52 PM
On 11/28/2012 07:29 PM, Tim Wescott wrote:
...
> Grant did not include all of the context, so you need to read back a bit.
>
> The original statement was that (a) int16_t * int16_t coughs up a 16-bit
> result, unless (b) one of the int16_t numbers is cast to 32 bit.
>
> Then I pointed out that (c) there are some older, non-compliant compilers
> where you have to cast _both_ 16-bit operands to 32 bits to get a 32 bit
> result, and (d) that I trusted that the gcc compiler was ANSI C
> compliant. Statement (c) is important for the embedded space (which is
> the group that I am replying from -- you must be from comp.lang.c)
> because one does not always have the luxury of using a compliant tool
> chain in embedded.
>
> Then Grant came in, and if I'm correctly reading what he said, stated
> that (e) the gnu-avr compiler is not ANSI-C compliant because it has 16
> bit integers.

Sort-of, but not quite. When he said "Nope", he wasn't referring to your
expectation that gcc-avr was ANSI compliant. He was referring to your
expectation that it would promote 16-bit integers to 32 bits. On a
conforming implementation of C with 16-bit ints, promotion of integer
types halts at 16 bits, and goes no further.

> So you are correcting statement -- uh, (0), because no one made it (the
> first quote from me refers to statement (b), and appears in its native
> habitat two or three posts up in the thread).

It was that first quote from you that I'm correcting. Not any other
statement. Specifically:

> I would expect that gcc would be ANSI compliant, and would therefore
> promote both 16-bit integers to 32-bit before doing the multiply.
--
James Kuyper

James Kuyper

Nov 28, 2012, 11:44:00 PM
On 11/28/2012 08:19 PM, Tim Wescott wrote:
...
> Just for clarification, since what I said above seems to be easy to
> misread unless you pay close attention to context: take "promote both 16-
> bit integers to 32-bit" and add in the context (about casting one or more
> of the 16 bit values); the result reads "promote both 16-bit integers to
> 32-bit _if you cast just one 16-bit integer to 32 bit_".

There is a conversion to 32-bits, but it is NOT a promotion. See
6.3.1.1p2 for a definition of the integer promotions.
--
James Kuyper

Tim Wescott

Nov 29, 2012, 1:59:44 AM
On Wed, 28 Nov 2012 23:39:52 -0500, James Kuyper wrote:

> On 11/28/2012 07:29 PM, Tim Wescott wrote:
> ...
>> Grant did not include all of the context, so you need to read back a
>> bit.
>>
>> The original statement was that (a) int16_t * int16_t coughs up a
>> 16-bit result, unless (b) one of the int16_t numbers is cast to 32 bit.
>>
>> Then I pointed out that (c) there are some older, non-compliant
>> compilers where you have to cast _both_ 16-bit operands to 32 bits to
>> get a 32 bit result, and (d) that I trusted that the gcc compiler was
>> ANSI C compliant. Statement (c) is important for the embedded space
>> (which is the group that I am replying from -- you must be from
>> comp.lang.c) because one does not always have the luxury of using a
>> compliant tool chain in embedded.
>>
>> Then Grant came in, and if I'm correctly reading what he said, stated
>> that (e) the gnu-avr compiler is not ANSI-C compliant because it has 16
>> bit integers.
>
> Sort-of, but not quite. When he said "Nope", he wasn't referring to your
> expectation that gcc-avr was ANSI compliant. He was referring to your
> expectation that it would promote 16-bit integers to 32 bits. On a
> conforming implementation of C with 16-bit ints, promotion of integer
> types halts at 16 bits, and goes no further.

How can you possibly know? Do you read his mind? Have an uncited
conversation with him? Is he your sock-puppet?

>> So you are correcting statement -- uh, (0), because no one made it (the
>> first quote from me refers to statement (b), and appears in its native
>> habitat two or three posts up in the thread).
>
> It was that first quote from you that I'm correcting. Not any other
> statement. Specifically:
>
>> I would expect that gcc would be ANSI compliant, and would therefore
>> promote both 16-bit integers to 32-bit before doing the multiply.

Oh Christ. READ THE CONTEXT. That statement was made in reply to a
question asking about what would happen if you cast one of the operands
to 32 bit! And you're replying to a post that told you that it was
misleading without its context, and again taking it out of context.

Missing the context the first time is understandable -- that statement
came about after two previous postings, and you do have to follow the
conversation.

But I have just told you to READ THE CONTEXT. So where do you get off
with repeating a statement of mine, out of context, which YOU'VE DAMNED
WELL BEEN TOLD is misleading when taken out of context, then criticizing
that false meaning of it?

That's shooting straight through "rude" and getting right into
"dishonest".

--
Tim Wescott
Control system and signal processing consulting
www.wescottdesign.com

David Brown

Nov 29, 2012, 3:24:18 AM
Of course you are correct here. I should not be posting so late - or I
should have drunk more coffee first, because I know this stuff (as long
as my brain is functioning correctly!).

Apologies if I've added to the confusion here, and thanks for the
correction.

David Brown

Nov 29, 2012, 3:37:57 AM
On 29/11/2012 03:31, glen herrmannsfeldt wrote:
> In comp.lang.c David Brown <david...@removethis.hesbynett.no> wrote:
>
> (snip, someone wrote)
>>>> I would expect that gcc would be ANSI compliant, and would therefore
>>>> promote both 16-bit integers to 32-bit before doing the multiply.
>
> (snip, then I wrote)
>>> Maybe I am missing something here, but are there versions of gcc for 16
>>> bit processors, with 16 bit int? If so, then promotion to int won't
>>> promote to 32 bits without a cast.
>
>> The correct behaviour for C standards compliance is that when you
>> multiply two operands of different int size, the smaller one is promoted
>> to the size of the larger one. Then the multiply is carried out modulo
>> the size of the larger one. Then the result is truncated or extended as
>> needed to fit the target variable.
>
> I haven't read the standard so recently, but I thought that was only
> after the default promotions. Values smaller than int would be promoted
> to int, then the size of the multiply (and product) determined.

As pointed out by Hans-Bernhard, you are correct here. I'm sorry for
causing confusion by posting while half asleep.

Default "int" promotions are done first for each operand. They are
promoted to "signed int", "unsigned int", "signed long int" or "unsigned
long int" (and "long long" for newer C standards), stopping at the first
type that covers the entire range. In practice, this means anything
smaller than an "int" will get promoted to a "signed int".

>
> If you multiply two 8 bit unsigned char values, is the product
> modulo 256? I don't think so.
>
>> So the bit-size of the processor, and the bit-size of "int" on that
>> particular target, is irrelevant. And the size of the result variable
>> is also irrelevant (this catches out some newbies).
>
>> Given:
>
>> int16_t a, b;
>> int32_t c;
>
>> c = (int32_t)a * b;
>
>> Then b is cast to int32_t, the 32-bit multiplication is carried out, and
>> the result assigned to c.
>
> Maybe not completely irrelevant; consider a system with a 64 bit int.
>
>> If you write just "c = a * b", then the multiplication is carried out at
>> 16-bit, then promoted to 32-bit. This applies regardless of the
>> bit-size of the target - you will get the same effect on a 64-bit cpu as
>> on the 8-bit AVR.
>
> Regardless of the target size, but not of int size.
>
>> If your compiler does 16-bit multiplications when you have "c =
>> (int32_t) a * b", and requires two "int32_t" casts to do 32-bit
>> multiplication, then your compiler is very badly broken. As Tim says,
>> badly broken compilers /do/ exist, so if you have to use them, then you
>> need to use two casts. But I personally don't think you need to write
>> your code to work with broken toolchains unless you actually have to.
>
> Now it gets interesting. When were the int_32_t and int_16_t added to C?
>

The types were officially added with C99, but they existed in practice
before that as "long int" and "short int" on most compilers (some
targets don't support 16-bit types, and thus have "short int" as 32-bit
and no int16_t. And the standards allow compilers with a "short" of
64-bit or more, in which case neither "int32_t" nor "int16_t" would
exist - but I have never heard of such a beast).

> Seems to me that compilers only claiming a version of the standard
> before they were added wouldn't have to use the same rules.

The rules haven't changed (again, sorry for my mistaken post). Types
such as "int32_t" are just typedefs for "normal" C types.

>
> Consider that a compiler might have a int_128_t that it could add and
> subtract, but not multiply or divide. Maybe it can generate a 128 bit
> product from two 64 bit operands. Does the standard prohibit a
> compiler from offering those operations?
>

The "int128_t" here is either a typedef for an existing C type (which
could include "long long int" in C99), in which case it would have to
support all integral operations, or it is purely a compiler extension,
in which case it is non-standard. But I believe the standards say that
/if/ a type of this form "int128_t" is defined in standard headers for
the compiler, then it must act as a full integral type of that size.


Phil Carmody

Nov 29, 2012, 5:03:45 AM
James Kuyper <james...@verizon.net> writes:
> On 11/28/2012 07:29 PM, Tim Wescott wrote:
> ...
> > Grant did not include all of the context, so you need to read back a bit.
> >
> > The original statement was that (a) int16_t * int16_t coughs up a 16-bit
> > result, unless (b) one of the int16_t numbers is cast to 32 bit.
> >
> > Then I pointed out that (c) there are some older, non-compliant compilers
> > where you have to cast _both_ 16-bit operands to 32 bits to get a 32 bit
> > result, and (d) that I trusted that the gcc compiler was ANSI C
> > compliant. Statement (c) is important for the embedded space (which is
> > the group that I am replying from -- you must be from comp.lang.c)
> > because one does not always have the luxury of using a compliant tool
> > chain in embedded.
> >
> > Then Grant came in, and if I'm correctly reading what he said, stated
> > that (e) the gnu-avr compiler is not ANSI-C compliant because it has 16
> > bit integers.
>
> Sort-of, but not quite. When he said "Nope", he wasn't referring to your
> expectation that gcc-avr was ANSI compliant. He was referring to your
> expectation that it would promote 16-bit integers to 32 bits. On a
> conforming implementation of C with 16-bit ints, promotion of integer
> types halts at 16 bits, and goes no further.

The context was

>>> Shouldn't casting just one of the 16 bit values work the same as
>>> casting both of them?

i.e. that there was one cast to 32 bits already. Therefore ANSI C
says that there would be a second conversion to 32 bits of the other operand.
Or at least the resulting code should behave *as if* that had happened.
(I'm pretty sure I've seen an architecture with shorter*longer->longer
opcodes.)

Initially I thought what Tim wrote was in error, but upon unravelling
the thread, I worked out that he had gone forward from the premises
correctly, and others hadn't. Without those premises - confusion ensues.

Phil
--
Regarding TSA regulations:
How are four small bottles of liquid different from one large bottle?
Because four bottles can hold the components of a binary liquid explosive,
whereas one big bottle can't. -- camperdave responding to MacAndrew on /.

glen herrmannsfeldt

unread,
Nov 29, 2012, 6:31:37 AM11/29/12
to
In comp.lang.c David Brown <da...@westcontrol.removethisbit.com> wrote:

(snip)
>> Now it gets interesting. When were the int32_t and int16_t added to C?

> The types were officially added with C99, but they existed in practice
> before that as "long int" and "short int" on most compilers (some
> targets don't support 16-bit types, and thus have "short int" as 32-bit
> and no int16_t. And the standards allow compilers with a "short" of
> 64-bit or more, in which case neither "int32_t" nor "int16_t" would
> exist - but I have never heard of such a beast).

>> Seems to me that compilers only claiming a version of the standard
>> before they were added wouldn't have to use the same rules.

> The rules haven't changed (again, sorry for my mistaken post). Types
> such as "int32_t" are just typedef's for "normal" C types.

Most likely, but as they aren't in the C89 standard, unless the
user typedef's them, seems to me the compiler is free to implement
them in any way desired.

-- glen

glen herrmannsfeldt

unread,
Nov 29, 2012, 6:39:16 AM11/29/12
to
In comp.lang.c Phil Carmody <thefatphi...@yahoo.co.uk> wrote:

(snip, someone wrote)

>>>> Shouldn't casting just one of the 16 bit values work the same as
>>>> casting both of them?

> i.e. that there was one cast to 32 bits already. Therefore ANSI C
> says that there would be a second conversion to 32 bits of the other operand.
> Or at least the resulting code should behave *as if* that had happened.
> (I'm pretty sure I've seen an architecture with shorter*longer->longer
> opcodes.)

Maybe, but many have N*N-->2N multiply. Some compilers figure out
that if you cast one (or both) from a shorter length that they can use
such a multiply on the shorter length. This is especially important when
the size is large enough that the hardware doesn't support it.

Many 32 bit machines have a 32*32 --> 64 multiply, and a 64 bit
(long long) type. If you cast one (or both) 32 bit int to 64 bit
(long long), the compiler knows to use the 32 bit multiply.
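
A minimal sketch of the widening multiply described above, assuming the implementation provides the optional exact-width types int16_t, int32_t and int64_t (the function names are made up for illustration). Whether the compiler actually selects an N*N-->2N instruction depends on the target and on optimization; the C semantics only guarantee that the product is computed in the wider type.

```c
#include <stdint.h>

/* Converting one operand is enough: the usual arithmetic conversions
   bring the other operand to the same (wider) type before the multiply. */
int64_t mul_32x32_to_64(int32_t a, int32_t b)
{
    return (int64_t)a * b;   /* b is converted to int64_t as well */
}

/* The 16-bit analogue, which matters where int is 16 bits (e.g. AVR). */
int32_t mul_16x16_to_32(int16_t a, int16_t b)
{
    return (int32_t)a * b;   /* without the cast, the product would be an int */
}
```

Without the cast, a 16-bit int target would compute the second product as an int, and a result such as 32768 * 32768 would overflow.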

> Initially I thought what Tim wrote was in error, but upon unravelling
> the thread, I worked out that he had gone forward from the premises
> correctly, and others hadn't. Without those premises - confusion ensues.

-- glen

Hans-Bernhard Bröker

unread,
Nov 29, 2012, 7:22:48 AM11/29/12
to
Since we were nit-picking anyway: not quite. As of C99 the standard
explicitly foresees the possible need to have more than the usual 10
different integer types ({signed|unsigned} {char|short|int|long|long
long}) in a target. That's why they included a provision for "extended
integer types". These types don't have standardized names (because they
can't), but their behaviour is still covered by the standard.

So the type behind int128_t need not be an "existing C type" (as in:
something that was already defined before), nor is it allowed to be a
pure compiler extension (which the standard would have no say over at
all). If it's an extension, it has to be a standard extension, so its
behaviour is ruled by the standard.

> But I believe the standards say that
> /if/ a type of this form "int128_t" is defined in standard headers for
> the compiler, then it must act as a full integral type of that size.

Yes.

David Brown

unread,
Nov 29, 2012, 7:44:38 AM11/29/12
to
That is correct - but I would be very surprised to see a compiler that
did have a type with a name like that, and did not implement it the
obvious way. It might be /legal/ under C89 rules for the compiler to
have a type called "int32_t" with different behaviour, but I can't
imagine it actually being the case. I'm sure Tim Wescott can think of
an exception, however!


James Kuyper

unread,
Nov 29, 2012, 7:57:48 AM11/29/12
to
On 11/29/2012 01:59 AM, Tim Wescott wrote:
> On Wed, 28 Nov 2012 23:39:52 -0500, James Kuyper wrote:
>
>> On 11/28/2012 07:29 PM, Tim Wescott wrote:
...
>> Sort-of, but not quite. When he said "Nope", he wasn't referring to your
>> expectation that gcc-avr was ANSI compliant. He was referring to your
>> expectation that it would promote 16-bit integers to 32 bits. On a
>> conforming implementation of C with 16-bit ints, promotion of integer
>> types halts at 16 bits, and goes no further.
>
> How can you possibly know? Do you read his mind? Have an uncited
> conversation with him? Is he your sock-puppet?

I can read and understand English, and in particular, the specialized
dialect of it which is sometimes called "standardese". I understood
precisely what he was talking about. In particular, I understand what
"promotion" means in the context of the C standard, and know that you
used the term incorrectly, something which you still do not seem to have
understood - nothing in your comments indicates any awareness that this
is the issue we're both talking about.

>> It was that first quote from you that I'm correcting. Not any other
>> statement. Specifically:
>>
>>> I would expect that gcc would be ANSI compliant, and would therefore
>>> promote both 16-bit integers to 32-bit before doing the multiply.
>
> Oh Christ. READ THE CONTEXT. That statement was made in reply to a
> question asking about what would happen if you cast one of the operands
> to 32 bit! And you're replying to a post that told you that it was
> misleading without its context, and again taking it out of context.

A conforming implementation of C will promote integer values to 32 bits
only if 'int' is exactly 32-bits. Do you believe that the context I've
missed changed 'int' to a 32-bit type? If not, your use of "promote" to
describe that conversion is incorrect, though your expectation that
there would be such a conversion is accurate.
--
James Kuyper

James Kuyper

unread,
Nov 29, 2012, 8:04:51 AM11/29/12
to
On 11/29/2012 05:03 AM, Phil Carmody wrote:
> James Kuyper <james...@verizon.net> writes:
...
>> Sort-of, but not quite. When he said "Nope", he wasn't referring to your
>> expectation that gcc-avr was ANSI compliant. He was referring to your
>> expectation that it would promote 16-bit integers to 32 bits. On a
>> conforming implementation of C with 16-bit ints, promotion of integer
>> types halts at 16 bits, and goes no further.
>
> The context was
>
>>>> Shouldn't casting just one of the 16 bit values work the same as
>>>> casting both of them?
>
> i.e. that there was one cast to 32 bits already. Therefore ANSI C
> says that there would be a second conversion to 32 bits of the other operand.

Of course. I knew that context, and knew that the conclusion you
describe was the correct result. However, that's not the conclusion Tim
Wescott reached - the conversion you describe is part of the usual
arithmetic conversions (6.3.1.8p1) but is NOT an integer promotion
(6.3.1.1p2).
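
One way to see the difference between the two steps is to look at the size of the resulting expression type. This is only a sketch, assuming int16_t exists; the function names are invented for the example. Each function returns 1 when the expression has the size of the type named in the comment.

```c
#include <stdint.h>

/* Integer promotions only: both operands become int, so the product is
   an int, whatever the width of int happens to be. */
int product_is_int_sized(void)
{
    int16_t a = 0, b = 0;
    return sizeof(a * b) == sizeof(int);        /* operand is not evaluated */
}

/* One cast to long: b is first promoted to int, then the usual
   arithmetic conversions convert it to long, so the product is a long. */
int product_with_one_cast_is_long_sized(void)
{
    int16_t a = 0, b = 0;
    return sizeof((long)a * b) == sizeof(long);
}
```

On avr-gcc with its default 16-bit int, the first expression would be 2 bytes wide and the second 4; on a typical desktop compiler the first is already 4.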

> Initially I thought what Tim wrote was in error, but upon unravelling
> the thread, I worked out that he had gone forward from the premises
> correctly, and others hadn't. Without those premises - confusion ensues.

Those premises had nothing to do with the confusion, which is about the
meaning of the word "promote" in the context of the C standard.
--
James Kuyper

Ben Bacarisse

unread,
Nov 29, 2012, 8:20:09 AM11/29/12
to
That's not quite right, though I'm not exactly sure what you mean. As
you say, the integer promotions are done first. That can produce either
an int or an unsigned int, or it may have no effect at all if the type
is already "bigger" than an int. Then one operand, but sometimes both
operands, are further converted (not promoted) to get a common type.
You can read the rules at

www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf (sec. 6.3.1.8)

but the conversion does not stop at the first type that covers the
entire range (at least according to how I interpret that phrase). For
example, in (unsigned)1 * (signed)-1 the signed operand is converted to
the type of the unsigned one even though that type can't cover the
entire range of the operand or operands.
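
The mixed-sign case can be checked directly. A small sketch (function names are hypothetical); some compilers will warn about the signed/unsigned comparison, which is rather the point.

```c
#include <limits.h>

/* -1 is converted to unsigned int before the multiply, giving UINT_MAX. */
unsigned mixed_product(void)
{
    return (unsigned)1 * (signed)-1;
}

/* The same conversion applies to comparisons: s becomes UINT_MAX,
   so "s < u" is false even though -1 is less than 1 mathematically. */
int minus_one_is_less_than_one(void)
{
    int s = -1;
    unsigned u = 1u;
    return s < u;
}
```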

Almost any summary of the rules is going to be wrong; if an accurate
summary can be written it should go into the language standard -- it
would be a great boon -- but I don't think that's possible. For example
I had to put "bigger" in quotes because the rule is based on a technical
term called the conversion rank of the type and not on its size. (The
integer promotions have no effect on a long int even on systems where a
long int is no bigger than an int).

<snip>
--
Ben.

Ben Bacarisse

unread,
Nov 29, 2012, 8:32:10 AM11/29/12
to
Hans-Bernhard Bröker <HBBr...@t-online.de> writes:
<snip>
> Since we were nit-picking anyway: not quite. As of C99 the standard
> explicitly foresees the possible need to have more than the usual 10
> different integer types ({signed|unsigned} {char|short|int|long|long
> long}) in a target.

These may be the usual ones, but there are 11 "standard integer types"
because they include _Bool. (Well, you did say we are nit-picking!)

--
Ben.

James Kuyper

unread,
Nov 29, 2012, 9:07:29 AM11/29/12
to
On 11/29/2012 03:37 AM, David Brown wrote:
...
> Default "int" promotions are done first for each operand. They are
> promoted to "signed int", "unsigned int", "signed long int" or "unsigned
> long int" (and "long long" for newer C standards), stopping at the first
> type that covers the entire range. In practice, this means anything
> smaller than an "int" will get promoted to a "signed int".

Not quite. The integer promotions never change anything to any type
other than 'int' or 'unsigned int'.

"If an int can represent all values of the original type (as restricted
by the width, for a bit-field), the value is converted to an int;
otherwise, it is converted to an unsigned int. These are called the
integer promotions.58) All other types are unchanged by the integer
promotions." (6.3.1.1p2) The first use of "integer promotions" in that
clause is italicized, which is an ISO convention indicating that the
sentence containing that phrase serves as the definition of the phrase.
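
As an illustration of that definition (not part of the standard's text), here is a sketch with made-up function names. It assumes int can represent every unsigned char value, which holds on AVR and on ordinary desktop targets but would not hold on a machine where char and int have the same width.

```c
#include <limits.h>

/* c is promoted to int before the addition, so the sum does not wrap. */
int promoted_sum(unsigned char c)
{
    return c + 1;
}

/* Converting the int result back to unsigned char is what wraps. */
unsigned char stored_sum(unsigned char c)
{
    return (unsigned char)(c + 1);
}
```

With 8-bit char, promoted_sum(255) gives 256 while stored_sum(255) gives 0.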

..
>> Now it gets interesting. When were the int32_t and int16_t added to C?
>>
>
> The types were officially added with C99, but they existed in practice
> before that as "long int" and "short int" on most compilers (some
> targets don't support 16-bit types, and thus have "short int" as 32-bit
> and no int16_t. And the standards allow compilers with a "short" of
> 64-bit or more, in which case neither "int32_t" nor "int16_t" would
> exist - but I have never heard of such a beast).

I've heard of machines with 32-bit short, but not 64-bit. Note that
while int32_t and int16_t could not be provided by <stdint.h> for such a
compiler, int_least32_t and int_fast32_t (and similarly for 16) must be.

>> Seems to me that compilers only claiming a version of the standard
>> before they were added wouldn't have to use the same rules.
>
> The rules haven't changed (again, sorry for my mistaken post). ...

Yes they have. In C99 and later, <stdint.h> and <inttypes.h> are
standard headers, and if #included, the identifiers they define must
meet certain well-specified requirements. In C90, there were no such
standard headers, no guarantees on what a header file with that name
would contain if you successfully #included it, and no corresponding
restrictions on how user code could use those identifiers after
#including those header files.

> ... Types
> such as "int32_t" are just typedef's for "normal" C types.

That depends upon what you mean by 'normal'. The C99 standard
distinguishes between standard and extended integer types. The standard
integer types have names specified by the C standard; extended types are
implementation-defined, and may have other names. There are many
standard typedefs that are required to have either arithmetic or integer
type; but there are none that are restricted to standard integer types.
Would you consider __extended_integer_type to be a "normal" C type?

...
> The "int128_t" here is either a typedef for an existing C type (which
> could include "long long int" in C99), in which case it would have to
> support all integral operations, or it is purely a compiler extension,
> in which case it is non-standard.

No, supporting int128_t would not be a non-standard extension, it's just
providing an optional feature of standard C. The key difference is that
if an implementation chooses to support an optional feature, it must
support it in precisely the manner specified by the standard for that
feature; extensions give an implementation a lot more freedom. In C2011,
there are a lot of optional features.

The only size-named types that a conforming implementation of <stdint.h>
must provide are [u]int_leastN_t, and [u]int_fastN_t for N = 8, 16, 32,
and 64. For all other values of N, and for [u]intN_t for all values of
N, the typedefs are optional. You can determine precisely which of the
optional <stdint.h> types are supported by #ifdef of the corresponding
*_MAX macro. If that macro is #defined, you can use the corresponding
type in full confidence that it behaves precisely as specified by the
standard.
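
A sketch of that #ifdef test. The names sample_t, SAMPLE_IS_EXACT and sample_bits are invented for the example; only the <stdint.h> and <limits.h> identifiers come from the standard.

```c
#include <limits.h>
#include <stdint.h>

/* INT32_MAX is defined by <stdint.h> only when int32_t is provided. */
#ifdef INT32_MAX
typedef int32_t sample_t;          /* exact-width type is available */
#define SAMPLE_IS_EXACT 1
#else
typedef int_least32_t sample_t;    /* required by C99, at least 32 bits */
#define SAMPLE_IS_EXACT 0
#endif

/* Storage size of sample_t in bits (may include padding bits). */
int sample_bits(void)
{
    return (int)(sizeof(sample_t) * CHAR_BIT);
}
```

The same pattern should work for any optional type, e.g. testing INT128_MAX before using int128_t.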
--
James Kuyper

James Kuyper

unread,
Nov 29, 2012, 9:17:18 AM11/29/12
to
On 11/29/2012 09:07 AM, James Kuyper wrote:
...
> "If an int can represent all values of the original type (as restricted
> by the width, for a bit-field), the value is converted to an int;
> otherwise, it is converted to an unsigned int. These are called the
> integer promotions.58) All other types are unchanged by the integer
> promotions." (6.3.1.1p2) The first use of "integer promotions" in that
> clause is italicized, which is an ISO convention indicating that the
> sentence containing that phrase serves as the definition of the phrase.

I just realized that the meaning of the phrase "All other types" is not
clear without the preceding part of that clause which I snipped:

> The following may be used in an expression wherever an int or unsigned int may
> be used:
> — An object or expression with an integer type (other than int or unsigned int)
> whose integer conversion rank is less than or equal to the rank of int and
> unsigned int.
> — A bit-field of type _Bool, int, signed int, or unsigned int.
--
James Kuyper

Grant Edwards

unread,
Nov 29, 2012, 10:17:58 AM11/29/12
to
On 2012-11-29, Tim Wescott <t...@seemywebsite.com> wrote:

> Grant did not include all of the context, so you need to read back a bit.
>
> The original statement was that (a) int16_t * int16_t coughs up a 16-bit
> result, unless (b) one of the int16_t numbers is cast to 32 bit.
>
> Then I pointed out that (c) there are some older, non-compliant compilers
> where you have to cast _both_ 16-bit operands to 32 bits to get a 32 bit
> result, and (d) that I trusted that the gcc compiler was ANSI C
> compliant. Statement (c) is important for the embedded space (which is
> the group that I am replying from -- you must be from comp.lang.c)
> because one does not always have the luxury of using a compliant tool
> chain in embedded.

I misread part of your post as claiming that even without casts a
compliant compiler would promote both 16-bit operands to 32-bits. I
was attempting to point out that isn't true if an "int" is 16-bits
(it's a not-uncommon misapprehension that gcc is only available with
32 or 64 bit ints).


> Then Grant came in, and if I'm correctly reading what he said, stated
> that (e) the gnu-avr compiler is not ANSI-C compliant because it has 16
> bit integers.

Ah, that's not what I intended to write. AFAICT, everybody agrees
with everybody else, we just aren't managing to express that clearly.

--
Grant Edwards grant.b.edwards Yow! People humiliating
at a salami!
gmail.com

Grant Edwards

unread,
Nov 29, 2012, 10:19:13 AM11/29/12
to
That is indeed what I meant.

--
Grant Edwards grant.b.edwards Yow! Now KEN and BARBIE
at are PERMANENTLY ADDICTED to
gmail.com MIND-ALTERING DRUGS ...

Grant Edwards

unread,
Nov 29, 2012, 10:23:27 AM11/29/12
to
On 2012-11-29, Tim Wescott <t...@seemywebsite.com> wrote:
Or on the TMS320C40, where char, int, long, long long, float and
double are all 32 bits and all have a sizeof() of 1. Trying to implement
any sort of communications protocol with that was fun.

--
Grant Edwards grant.b.edwards Yow! Is it NOUVELLE
at CUISINE when 3 olives are
gmail.com struggling with a scallop
in a plate of SAUCE MORNAY?

Grant Edwards

unread,
Nov 29, 2012, 10:29:43 AM11/29/12
to
On 2012-11-29, Tim Wescott <t...@seemywebsite.com> wrote:

> It's certainly what I would expect from gcc-avr. There's no reason you
> can't make a beautifully compliant, reasonably efficient compiler that
> works well on the AVR.

avr-gcc does indeed work very nicely as long as you don't look at the
code generated when you use pointers. You'll go blind -- especially
if you're used to something like the msp430. It's easy to forget that
the AVR is an 8-bit CPU not a 16-bit CPU like the '430, and use of
16-bit pointers on the AVR requires a lot of overhead.

--
Grant Edwards grant.b.edwards Yow! I wonder if I could
at ever get started in the
gmail.com credit world?

James Kuyper

unread,
Nov 29, 2012, 11:01:34 AM11/29/12
to
On 11/29/2012 10:23 AM, Grant Edwards wrote:
> On 2012-11-29, Tim Wescott <t...@seemywebsite.com> wrote:
>> On Thu, 29 Nov 2012 01:18:42 +0000, Ben Bacarisse wrote:
>
>> Me, I just try to remember that x, y or z went wrong on some compiler
>> some time, so that if I see symptoms again those problems are on my short
>> list, and maybe even a fix or two.
>>
>> Like Texas Instrument's Code Composter for the TMS320F2812, which has a
>> 32-bit "double". #$%@.
>
> Or on the TMS320C40, where char, int, long, long long, float and
> double are all 32 bits and all have a sizeof() of 1. ...

There's a key difference there. The implementation you describe could be
fully conforming. The one he described could not; you can't meet the
standard's precision and range requirements for double with a 32-bit
data type.
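
For reference, the minimums in question can be checked through <float.h>. This is a sketch with an invented function name; the limits in the comment are the ones the standard gives for double. IEEE 754 single precision offers only 6 decimal digits (FLT_DIG), which is why a double stored in that format falls short.

```c
#include <float.h>

/* The standard requires, for double:
     DBL_DIG >= 10, DBL_EPSILON <= 1E-9, DBL_MIN <= 1E-37, DBL_MAX >= 1E+37.
   Returns 1 if this implementation's double meets all four. */
int double_meets_c_minimums(void)
{
    return DBL_DIG >= 10
        && DBL_EPSILON <= 1e-9
        && DBL_MIN <= 1e-37
        && DBL_MAX >= 1e37;
}
```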

> ... Trying to implement
> any sort of communications protocol with that was fun.

Thanks for that information. Claims have frequently been made on
comp.lang.c that, while the C standard allows CHAR_BIT != 8, the
existence of such implementations is a myth. I'm glad to have a specific
counter example to cite.
There's something I've wondered about such machines: when data from
other machines containing data types smaller than 32 bits (for instance,
ASCII text files) is transferred to the TMS320C40, how is this usually
handled? I could imagine three main possibilities:

a) Four 8-bit bytes of data are packed into each 32-bit byte
b) Each field value stored in a data type smaller than 32 bits is
converted to a 32-bit type, and stored as such. For instance, a file
containing 45,678 8-bit bytes of text gets converted into a file
containing 45,678 32-bit bytes of text.
c) Different methods are used in different contexts, leading to constant
headaches. This strikes me as the most likely possibility.
--
James Kuyper

John Devereux

unread,
Nov 29, 2012, 11:36:34 AM11/29/12
to
Grant Edwards <inv...@invalid.invalid> writes:

> On 2012-11-29, Tim Wescott <t...@seemywebsite.com> wrote:
>
>> It's certainly what I would expect from gcc-avr. There's no reason you
>> can't make a beautifully compliant, reasonably efficient compiler that
>> works well on the AVR.
>
> avr-gcc does indeed work very nicely as long as you don't look at the
> code generated when you use pointers. You'll go blind -- especially
> if you're used to something like the msp430. It's easy to forget that
> the AVR is an 8-bit CPU not a 16-bit CPU like the '430, and use of
> 16-bit pointers on the AVR requires a lot of overhead.

The other problem with it is the separate program and data memory
spaces. Fine for small deeply embedded things but it started to show strain
when I wanted a LCD display, menus etc. I would not use it for a new
project unless there was a very good reason, ultra-low power
perhaps. Cortex M3 is much nicer but the chips are much more complicated
of course.


--

John Devereux

Tim Wescott

unread,
Nov 29, 2012, 2:02:49 PM11/29/12
to
On Thu, 29 Nov 2012 07:57:48 -0500, James Kuyper wrote:

> On 11/29/2012 01:59 AM, Tim Wescott wrote:
>> On Wed, 28 Nov 2012 23:39:52 -0500, James Kuyper wrote:
>>
>>> On 11/28/2012 07:29 PM, Tim Wescott wrote:
> ...
>>> Sort-of, but not quite. When he said "Nope", he wasn't referring to
>>> your expectation that gcc-avr was ANSI compliant. He was referring to
>>> your expectation that it would promote 16-bit integers to 32 bits. On
>>> a conforming implementation of C with 16-bit ints, promotion of
>>> integer types halts at 16 bits, and goes no further.
>>
>> How can you possibly know? Do you read his mind? Have an uncited
>> conversation with him? Is he your sock-puppet?
>
> I can read and understand English, and in particular, the specialized
> dialect of it which is sometimes called "standardese". I understood
> precisely what he was talking about. In particular, I understand what
> "promotion" means in the context of the C standard, and know that you
> used the term incorrectly, something which you still do not seem to have
> understood - nothing in your comments indicates any awareness that this
> is the issue we're both talking about.

Well, I thought we were talking about useful people writing good code using C.
Useful people who want to write good code using C don't always use
correct "standardese", but they do know how to read all the pertinent
documentation.

Had you gone back and read said pertinent documentation you would have
done a "useful people" sort of thing, and realized what _my_ point was
about, regardless of any misuse of terminology on my part.

But hey -- I'm just a guy who designs embedded systems that actually make
money for my customers. One of the skills that requires is listening to
people and not trying to slam them for violating some trivial rule of
terminology when they're getting the gist of their statements right.

>>> It was that first quote from you that I'm correcting. Not any other
>>> statement. Specifically:
>>>
>>>> I would expect that gcc would be ANSI compliant, and would therefore
>>>> promote both 16-bit integers to 32-bit before doing the multiply.
>>
>> Oh Christ. READ THE CONTEXT. That statement was made in reply to a
>> question asking about what would happen if you cast one of the operands
>> to 32 bit! And you're replying to a post that told you that it was
>> misleading without its context, and again taking it out of context.
>
> A conforming implementation of C will promote integer values to 32 bits
> only if 'int' is exactly 32-bits. Do you believe that the context I've
> missed changed 'int' to a 32-bit type? If not, your use of "promote" to
> describe that conversion is incorrect, though your expectation that
> there would be such a conversion is accurate.

I believe that you need to spend some time with engineers who actually
design product.

--
My liberal friends think I'm a conservative kook.
My conservative friends think I'm a liberal kook.
Why am I not happy that they have found common ground?

Tim Wescott, Communications, Control, Circuits & Software
http://www.wescottdesign.com

Grant Edwards

unread,
Nov 29, 2012, 2:19:25 PM11/29/12
to
On 2012-11-29, James Kuyper <james...@verizon.net> wrote:
> On 11/29/2012 10:23 AM, Grant Edwards wrote:
>> On 2012-11-29, Tim Wescott <t...@seemywebsite.com> wrote:
>>> On Thu, 29 Nov 2012 01:18:42 +0000, Ben Bacarisse wrote:
>>
>>> Me, I just try to remember that x, y or z went wrong on some compiler
>>> some time, so that if I see symptoms again those problems are on my short
>>> list, and maybe even a fix or two.
>>>
>>> Like Texas Instrument's Code Composter for the TMS320F2812, which has a
>>> 32-bit "double". #$%@.
>>
>> Or on the TMS320C40, where char, int, long, long long, float and
>> double are all 32 bits and all have a sizeof() of 1. ...
>
> There's a key difference there. The implementation you describe could
> be fully conforming. The one he described could not; you can't meet
> the standard's precision and range requirements for double with a
> 32-bit data type.

Well, the case I cited had 32-bit doubles. Other than that, I think it
was conforming. When trying to reuse source modules, it's shocking
how many assumptions I make that aren't implied by the C standard.

>> ... Trying to implement any sort of communications protocol with
>> that was fun.
>
> Thanks for that information. Claims have frequently been made on
> comp.lang.c that, while the C standard allows CHAR_BIT != 8, the
> existence of such implementations is a myth. I'm glad to have a
> specific counter example to cite.

If you look at some other DSPs, I think you'll find similar examples
where everything is 16 bits (they tend not to support FP at all).

> There's something I've wondered about such machines: when data from
> other machines containing data types smaller than 32 bits (for
> instance, ASCII text files) is transferred to the TMS320C40, how is
> this usually handled? I could imagine three main possibilities:

A TMS320 is a DSP, and I doubt there are any that have actual
filesystems. OTOH, high-speed serial interfaces are common, and
filling in things like protocol headers involves a lot of
shifting/masking/anding/oring.

> a) Four 8-bit bytes of data are packed into each 32-bit byte

That's pretty common for data being sent to/from "normal" CPUs.

> b) Each field value stored in a data type smaller than 32 bits is
> converted to a 32-bit type, and stored as such. For instance, a
> file containing 45,678 8-bit bytes of text gets converted into a
> file containing 45,678 32-bit bytes of text.

If you need to do any sort of string manipulation (which you try to
avoid like the plague), that's what you end up doing.

> c) Different methods are used in different contexts, leading to
> constant headaches. This strikes me as the most likely possibility.

Exactly.

And the icing on the cake is that the 32-bit FP representation isn't
IEEE-754, so you also get to convert between external and internal FP
representations. Fun!

--
Grant Edwards grant.b.edwards Yow! My NOSE is NUMB!
at
gmail.com

Keith Thompson

unread,
Nov 29, 2012, 2:28:12 PM11/29/12
to
And plain char *isn't* one of the "standard integer types", even
though it's a standard type, and it's an integer type, and its
characteristics (range, representation, and behavior) are identical
either to those of signed char or to those of unsigned char, both
of which are "standard integer types".

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Will write code for food.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

James Kuyper

unread,
Nov 29, 2012, 2:32:34 PM11/29/12
to
On 11/29/2012 02:19 PM, Grant Edwards wrote:
> On 2012-11-29, James Kuyper <james...@verizon.net> wrote:
>> On 11/29/2012 10:23 AM, Grant Edwards wrote:
>>> On 2012-11-29, Tim Wescott <t...@seemywebsite.com> wrote:
...
>>>> Like Texas Instrument's Code Composter for the TMS320F2812, which has a
>>>> 32-bit "double". #$%@.
>>>
>>> Or on the TMS320C40, where char, int, long, long long, float and
>>> double are all 32 bits and all have a sizeof() of 1. ...
>>
>> There's a key difference there. The implementation you describe could
>> be fully conforming. The one he described could not; you can't meet
>> the standard's precision and range requirements for double with a
>> 32-bit data type.
>
> Well, the case I cited had 32-bit doubles.

Oops - I missed that. OK - a more accurate statement would have been
that your example had no additional conformance issues that weren't
present in his example.
--
James Kuyper

Keith Thompson

unread,
Nov 29, 2012, 2:38:03 PM11/29/12
to
James Kuyper <james...@verizon.net> writes:
[...]
> I've heard of machines with 32-bit short, but not 64-bit. Note that
> while int32_t and int16_t could not be provided by <stdint.h> for such a
> compiler, int_least32_t and int_fast32_t (and similarly for 16) must be.

Trivia: I've used a machine (Cray T90) with 8-bit char and 64-bit short,
int, long, and long long. It had no 16-bit or 32-bit integer types.

[...]

> That depends upon what you mean by 'normal'. The C99 standard
> distinguishes between standard and extended integer types. The standard
> integer types have names specified by the C standard; extended types are
> implementation-defined, and may have other names.

They *must* have other names.

Normally such names would be identifiers reserved to the
implementation, starting with an underscore and either another
underscore or an uppercase letter. (Though I suppose an
implementation that supports other forms of identifiers as a
language extension could use them; for example some compilers permit
identifiers with '$' characters.)

Jon Kirwan

unread,
Nov 29, 2012, 2:44:18 PM11/29/12
to
On Thu, 29 Nov 2012 11:01:34 -0500, James Kuyper
<james...@verizon.net> wrote:

><snip>
>Claims have frequently been made on
>comp.lang.c that, while the C standard allows CHAR_BIT != 8, the
>existence of such implementations is a myth. I'm glad to have a specific
>counter example to cite.
><snip>

I believe that C was implemented on the PDP-10. I didn't use
it when I was programming the PDP-10 (I used assembly, then,
and some other languages... but not C, until I worked on Unix
v6 in '78.) But that was a 36-bit machine. And ASCII was
packed into 7 bits so that 5 chars fit in a word. No one used
8, so far as I recall. That was the standard method. So I'm
curious now what the C implementation did.

Of course, all that is prior to any standard. But it might be
another case to discuss, anyway.

Jon

upsid...@downunder.com

unread,
Nov 29, 2012, 3:17:23 PM11/29/12
to
On Thu, 29 Nov 2012 11:01:34 -0500, James Kuyper
<james...@verizon.net> wrote:

>On 11/29/2012 10:23 AM, Grant Edwards wrote:
>> On 2012-11-29, Tim Wescott <t...@seemywebsite.com> wrote:

>> ... Trying to implement
>> any sort of communications protocol with that was fun.

Using left/right shifts and AND and OR operations works just fine.
Works OK with different CHAR_BIT and different endianness platforms.
Do not try to use structs etc.
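
A minimal sketch of that approach, with invented function names: a 32-bit value is written as four octets, most significant first, one octet per unsigned char. Because only shifts and masks are used, the result does not depend on the host's byte order, and the & 0xFF keeps each element to 8 bits even where CHAR_BIT is larger.

```c
#include <stdint.h>

/* Write v as four octets in network (big-endian) order. */
void put_u32_be(unsigned char *buf, uint_least32_t v)
{
    buf[0] = (unsigned char)((v >> 24) & 0xFFu);
    buf[1] = (unsigned char)((v >> 16) & 0xFFu);
    buf[2] = (unsigned char)((v >> 8) & 0xFFu);
    buf[3] = (unsigned char)(v & 0xFFu);
}

/* Read four octets back; each element is masked in case char is wider. */
uint_least32_t get_u32_be(const unsigned char *buf)
{
    return ((uint_least32_t)(buf[0] & 0xFFu) << 24)
         | ((uint_least32_t)(buf[1] & 0xFFu) << 16)
         | ((uint_least32_t)(buf[2] & 0xFFu) << 8)
         |  (uint_least32_t)(buf[3] & 0xFFu);
}
```

uint_least32_t is used rather than uint32_t since the latter need not exist on such targets.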

>Thanks for that information. Claims have frequently been made on
>comp.lang.c that, while the C standard allows CHAR_BIT != 8, the
>existence of such implementations is a myth. I'm glad to have a specific
>counter example to cite.

IMHO CHAR_BIT = 21 is the correct way to handle the Unicode range.

On the Unicode list, I even suggested packing three 21-bit characters into
a single 64 bit data word as UTF-64 :-)
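
Purely as an illustration of the packing arithmetic (there is no such encoding, and the field order below is an assumption of this sketch), three 21-bit code points occupy 63 bits of a 64-bit word. The names CP_BITS, CP_MASK, pack3 and unpack are invented, and uint64_t is assumed to exist.

```c
#include <stdint.h>

#define CP_BITS 21
#define CP_MASK ((uint64_t)0x1FFFFF)   /* low 21 bits set */

/* Pack three code points, first one in the most significant field. */
uint64_t pack3(uint32_t a, uint32_t b, uint32_t c)
{
    return ((uint64_t)(a & CP_MASK) << (2 * CP_BITS))
         | ((uint64_t)(b & CP_MASK) << CP_BITS)
         |  (uint64_t)(c & CP_MASK);
}

/* Extract code point number index (0, 1 or 2) from a packed word. */
uint32_t unpack(uint64_t word, int index)
{
    return (uint32_t)((word >> ((2 - index) * CP_BITS)) & CP_MASK);
}
```

The top bit of the word stays unused, and anything above U+1FFFFF is silently truncated by the mask.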


James Kuyper

unread,
Nov 29, 2012, 3:21:05 PM11/29/12
to
On 11/29/2012 02:38 PM, Keith Thompson wrote:
> James Kuyper <james...@verizon.net> writes:
...
>> That depends upon what you mean by 'normal'. The C99 standard
>> distinguishes between standard and extended integer types. The standard
>> integer types have names specified by the C standard; extended types are
>> implementation-defined, and may have other names.
>
> They *must* have other names.

Not if there aren't any. :-) I should have worded that differently.
They're not required to exist, but if they do, you're right - they must
have other names.
--
James Kuyper

Keith Thompson

unread,
Nov 29, 2012, 3:36:53 PM11/29/12
to
I like it -- but it breaks as soon as they add U+200000 or higher, and
I'm not aware of any guarantee that they won't.

I've thought of UTF-24, encoding each character in 3 octets; that's
good for up to 16,777,216 distinct code points.

upsid...@downunder.com

unread,
Nov 29, 2012, 3:40:41 PM11/29/12
to
Except for self modifying code, why would one want data (program)
access into program space (unless you are writing a linker or
debugger) ??

While working with PDP-11s in the 1970s, the ability to use separate
I/D (Instruction/Data) space helped a lot to keep code/data in private
64 KiB address spaces.


glen herrmannsfeldt

Nov 29, 2012, 4:23:42 PM
In comp.lang.c Jon Kirwan <jo...@infinitefactors.org> wrote:
> On Thu, 29 Nov 2012 11:01:34 -0500, James Kuyper
> <james...@verizon.net> wrote:

>><snip>
>>Claims have frequently been made on
>>comp.lang.c that, while the C standard allows CHAR_BIT != 8, the

As I remember the stories, the CRAY-1 had 64 bit char.

>>existence of such implementations is a myth. I'm glad to have a specific
>>counter example to cite.
>><snip>

> I believe that C was implemented on the PDP-10. I didn't use
> it when I was programming the PDP-10 (I used assembly, then,
> and some other languages... but not C, until I worked on Unix
> v6 in '78.) But that was a 36-bit machine. And ASCII was
> packed into 7 bits so that 5 chars fit in a word. No one used
> 8, so far as I recall. That was the standard method. So I'm
> curious now what the C implementation did.

Yes, the TOPS-10 file format stores ASCII as 5 characters
to the word, but C can't do that. In a discussion some time
ago about actual implementations, 9 and 18 bit char were
discussed. The PDP-10 has instructions for operating on
halfwords, which could be used, possibly with execute (XCT)
to select the appropriate instruction or, in loops, with an
unrolled loop. Or 9 bit chars using the byte instructions.

> Of course, all that is prior to any standard. But it might be
> another case to discuss, anyway.

I keep wondering about a C compiler for the 7090, one of the
last sign magnitude machines, and also 36 bits. It was usual
to store six 6 bit BCDIC (more often called just BCD) characters
per word. Also, 6 bit characters on 7 track magnetic tape.

The card reader on the 704, and I will guess also the 7090,
reads one card row into two 36 bit words, ignoring the last
eight columns. Software had to convert that into 12
characters in two words.

-- glen

Jon Kirwan

Nov 29, 2012, 4:55:09 PM
On Thu, 29 Nov 2012 22:40:41 +0200, upsid...@downunder.com
wrote:
There are good reasons for self-modifying code space. The
first and most obvious would be an operating system loading a
program into memory. While the O/S does this, the memory is
treated as data. A more meaningful example for small embedded
applications, perhaps, is the ability to modify interrupt
vectors pointing at code. If the processor refers only to I space
for interrupt vectors, that may not be possible. And there
are times when you have code stored externally: for example,
a large, low-pin-count, serial-access memory used to hold
infrequently needed code blocks when the internally supplied
flash just isn't big enough.

In embedded, there are reasons. For operating systems, there
are also reasons. And I am tapping only what is on the tip of
my tongue and using no imagination, right now.

Jon

Grant Edwards

Nov 29, 2012, 4:59:55 PM
On 2012-11-29, upsid...@downunder.com <upsid...@downunder.com> wrote:
> On Thu, 29 Nov 2012 16:36:34 +0000, John Devereux
><jo...@devereux.me.uk> wrote:
>
>>Grant Edwards <inv...@invalid.invalid> writes:
>>
>>> On 2012-11-29, Tim Wescott <t...@seemywebsite.com> wrote:
>>>
>>>> It's certainly what I would expect from gcc-avr. There's no reason you
>>>> can't make a beautifully compliant, reasonably efficient compiler that
>>>> works well on the AVR.
>>>
>>> avr-gcc does indeed work very nicely as long as you don't look at the
>>> code generated when you use pointers. You'll go blind -- especially
>>> if you're used to something like the msp430. It's easy to forget that
>>> the AVR is an 8-bit CPU not a 16-bit CPU like the '430, and use of
>>> 16-bit pointers on the AVR requires a lot of overhead.
>>
>>The other problem with it is the separate program and data memory
>>spaces. Fine for small deeply embedded things, but it started to show strain
>>when I wanted an LCD display, menus etc. I would not use it for a new
>>project unless there was a very good reason, ultra-low power
>>perhaps. Cortex M3 is much nicer but the chips are much more complicated
>>of course.
>
> Except for self modifying code, why would one want data (program)
> access into program space (unless you are writing a linker or
> debugger) ??

The "program" space was flash (non-volatile). The "data" space was
registers and RAM (volatile). All non-volatile data (strings, screen
templates, lookup tables, menu structures, and so on) has to be in flash
memory (IOW "program space"). It makes a _lot_ of sense to just use it
directly from flash instead of copying it all to RAM when RAM is so
scarce.

> While working with PDP-11s in the 1970s, the ability to use separate
> I/D (Instruction/Data) space helped a lot to keep code/data in private
> 64 KiB address spaces.

But in a PDP11, Data space was plentiful, and constant data didn't
also have to reside in Instruction space (because that's the only
non-volatile storage you have).

On some parts there is some erasable non-volatile storage in data
space. But, it's always scarce, and putting stuff there that is never
to be altered is both wasteful and dangerous.

--
Grant Edwards (grant.b.edwards at gmail.com)
Yow! I didn't order any WOO-WOO ... Maybe a YUBBA ... But no WOO-WOO!

Grant Edwards

Nov 29, 2012, 5:06:08 PM
Nobody said anything about modifying code space.

The "data" that's put in code space is never modified (at least not
in any project I've ever seen).

It's not _modifying_ the program space that's the issue (that is
generally only done for firmware updates, where the entire flash is
erased and reprogrammed).

Simply _reading_ program space _as_data_ is problematic. If you've
got a lot of string constants or constant tables, you want to just
leave them in flash (program space) rather than copy them all to
(scarce) RAM on startup.

Now you need three-byte pointers/addresses to differentiate between
data at 0xABCD in data space and the data at 0xABCD in program space.
Three-byte pointers are how some compilers solve that problem -- but I
don't think avr-gcc does that.

--
Grant Edwards (grant.b.edwards at gmail.com)
Yow! I think my career is ruined!

Stephen Sprunk

Nov 29, 2012, 5:22:02 PM
On 29-Nov-12 14:36, Keith Thompson wrote:
> upsid...@downunder.com writes:
>> IMHO CHAR_BIT = 21 is the correct way to handle the Unicode range.
>>
>> On the Unicode list, I even suggested packing three 21-bit characters into
>> a single 64-bit data word as UTF-64 :-)
>
> I like it -- but it breaks as soon as they add U+200000 or higher, and
> I'm not aware of any guarantee that they won't.

I thought they had guaranteed they would never go above U+10FFFF, which
would break UTF-16.

> I've thought of UTF-24, encoding each character in 3 octets; that's
> good for up to 16,777,216 distinct code points.

AIUI, there are some DSPs with CHAR_BIT==24 (or was that 12?).

S

--
Stephen Sprunk, CCIE #3723, K5SSS
"God does not play dice." --Albert Einstein
"God is an inveterate gambler, and He throws the dice at every possible opportunity." --Stephen Hawking

Jon Kirwan

Nov 29, 2012, 5:22:25 PM
Sorry I didn't interpret things well.

>The "data" that's put in code space is never modified (at least not
>in any project I've ever seen).

I've needed writable code space. Thunking is one such
example.

>It's not _modifying_ the program space that's the issue (that is
>generally only done for firmware updates, where the entire flash is
>erased and reprogrammed).

While I agree with the "generally" I don't agree that this
translates into 100%.

>Simply _reading_ program space _as_data_ is problematic. If you've
>got a lot of string constants or constant tables, you want to just
>leave them in flash (program space) rather than copy them all to
>(scarce) RAM on startup.

Indeed. Completely agreed.

Jon

pete

Nov 29, 2012, 6:25:44 PM
I recall reading some posts in this newsgroup a long time ago,
which claimed that, under certain circumstances,
it was possible in C99
for unsigned int to promote to type signed int.

But that was never the case.

In C99,
6.3.1.1 paragraph 2 read "less than"
instead of "less than or equal" as you have above,
and the unsigned int type was covered by "All other types"
in the last sentence.


--
pete

Keith Thompson

Nov 29, 2012, 9:15:14 PM
glen herrmannsfeldt <g...@ugcs.caltech.edu> writes:
> In comp.lang.c Jon Kirwan <jo...@infinitefactors.org> wrote:
>> On Thu, 29 Nov 2012 11:01:34 -0500, James Kuyper
>> <james...@verizon.net> wrote:
>
>>><snip>
>>>Claims have frequently been made on
>>>comp.lang.c that, while the C standard allows CHAR_BIT != 8, the
>
> As I remember the stories, the CRAY-1 had 64 bit char.
[...]

That may well be true; I never used a Cray-1. (And there was more
emphasis on Fortran, or should I say FORTRAN, than on C.)

By the time I started using Crays, they were running Unicos, Cray's
version of Unix, so they pretty much had to have CHAR_BIT==8.

Keith Thompson

Nov 29, 2012, 9:36:39 PM
Stephen Sprunk <ste...@sprunk.org> writes:
> On 29-Nov-12 14:36, Keith Thompson wrote:
>> upsid...@downunder.com writes:
>>> IMHO CHAR_BIT = 21 is the correct way to handle the Unicode range.
>>>
>>> On the Unicode list, I even suggested packing three 21-bit characters into
>>> a single 64-bit data word as UTF-64 :-)
>>
>> I like it -- but it breaks as soon as they add U+200000 or higher, and
>> I'm not aware of any guarantee that they won't.
>
> I thought they had guaranteed they would never go above U+10FFFF, which
> would break UTF-16.

You're right. <http://www.unicode.org/faq/utf_bom.html> says:

Both Unicode and ISO 10646 have policies in place that formally
limit future code assignment to the integer range that can be
expressed with current UTF-16 (0 to 1,114,111).

>> I've thought of UTF-24, encoding each character in 3 octets; that's
>> good for up to 16,777,216 distinct code points.
>
> AIUI, there are some DSPs with CHAR_BIT==24 (or was that 12?).

James Kuyper

Nov 29, 2012, 10:46:54 PM
An unsigned type whose entire range can be represented by an int will
promote to signed int, as can easily be confirmed by checking the above
text, and that point has been raised in this group - there were several
threads that touched on that subject in just this past summer. However,
anyone who claimed that it could happen to "unsigned int" was mistaken.
That clause explicitly applies only to types "other than int or unsigned
int".
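
The promotion rule can be observed with a small C sketch. The function names are hypothetical. On common implementations, where int is wider than unsigned char, the first function returns 1 because the operand is promoted to signed int before the subtraction; on an implementation where UCHAR_MAX exceeds INT_MAX it would return 0 instead.

```c
#include <limits.h>

/* 1 if int can represent every unsigned char value on this implementation. */
int uchar_fits_in_int(void)
{
    return UCHAR_MAX <= INT_MAX;
}

/* c is promoted before the subtraction. If it promotes to int,
   1 - 2 is -1 and the comparison is true; if it promotes to
   unsigned int, the result wraps and the comparison is false. */
int uchar_promotes_to_int(void)
{
    unsigned char c = 1;
    return (c - 2) < 0;
}

/* unsigned int is never promoted to int: 1u - 2 wraps to UINT_MAX. */
int uint_stays_unsigned(void)
{
    unsigned int u = 1;
    return (u - 2) > 0;
}
```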

> But that was never the case.
>
> In C99,
> 6.3.1.1 paragraph 2 read "less than"
> instead of "less than or equal" as you have above,
> and the unsigned int type was covered by "All other types"
> in the last sentence.

n1256.pdf (which is C99 with all three TCs applied, making it MORE
useful than C99 itself) and n1570.pdf (which is essentially identical to
C2011) both have "less than or equal to". The line is marked as being
changed from C99 in n1256.pdf, implying that one of the TCs is the
reason. My copy of C99 itself is inaccessible right now, so I can't
confirm the nature of the change.
--
James Kuyper

John Devereux

Nov 30, 2012, 3:29:29 AM
Yes, that is precisely it. The AVRs especially tended to have lots of
flash but little RAM. Access to program memory is possible on the AVR,
but you have to use special attribute modifiers everywhere and the
resulting objects become incompatible with the standard libraries, so
you have to write special versions of these...
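
As a rough sketch of what that looks like with avr-libc: PROGMEM and pgm_read_byte() come from <avr/pgmspace.h>, while the names greeting, copy_from_flash and load_greeting are hypothetical. The #else branch is only a host stand-in, assumed here so the sketch can be compiled off-target. avr-libc already ships _P variants such as strcpy_P for this job; the loop is written out only to show the explicit flash read.

```c
#include <stddef.h>

#ifdef __AVR__
#include <avr/pgmspace.h>   /* avr-libc: PROGMEM, pgm_read_byte() */
#else
/* Host stand-ins (assumption): flash and RAM share one address space. */
#define PROGMEM
#define pgm_read_byte(addr) (*(const unsigned char *)(addr))
#endif

/* On AVR this string stays in flash and is not copied to RAM at startup. */
static const char greeting[] PROGMEM = "Hello";

/* Copy a flash-resident string into a RAM buffer, always terminating it.
   Returns the number of characters copied, excluding the terminator. */
size_t copy_from_flash(char *dst, const char *src_flash, size_t dst_size)
{
    size_t i = 0;
    if (dst_size == 0)
        return 0;
    while (i + 1 < dst_size) {
        char ch = (char)pgm_read_byte(src_flash + i);
        if (ch == '\0')
            break;
        dst[i++] = ch;
    }
    dst[i] = '\0';
    return i;
}

size_t load_greeting(char *dst, size_t dst_size)
{
    return copy_from_flash(dst, greeting, dst_size);
}
```

Passing greeting directly to an ordinary function such as strlen would read the wrong address space on AVR, which is the library incompatibility described above.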

Another thing is that, being an 8 bit machine, int and short operations
are not atomic. So you have to be very careful about protecting
variables shared with interrupt handlers (or other tasks in a preemptive
system). Good practice anyway of course but a modern CPU like Cortex M3
is a lot more forgiving since even 32 bit load/store operations are
atomic.
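
A sketch of the usual protection on AVR, assuming avr-libc: ATOMIC_BLOCK and ATOMIC_RESTORESTATE come from <util/atomic.h>. The names tick_count, tick_isr_body and read_ticks are hypothetical, and the #else branch is a host stand-in that does not actually mask interrupts.

```c
#include <stdint.h>

#ifdef __AVR__
#include <util/atomic.h>    /* avr-libc: ATOMIC_BLOCK, ATOMIC_RESTORESTATE */
#else
/* Host stand-ins (assumption): run the block once, no interrupt masking. */
#define ATOMIC_RESTORESTATE 0
#define ATOMIC_BLOCK(mode) for (int once_ = ((void)(mode), 1); once_; once_ = 0)
#endif

/* 16-bit counter shared with an interrupt handler. */
static volatile uint16_t tick_count;

/* What a timer ISR would do on each tick. */
void tick_isr_body(void)
{
    tick_count++;
}

/* An 8-bit CPU reads the two bytes separately, so an interrupt between
   the reads could yield a torn value. Interrupts are disabled for the
   duration of the copy and restored afterwards. */
uint16_t read_ticks(void)
{
    uint16_t snapshot = 0;
    ATOMIC_BLOCK(ATOMIC_RESTORESTATE) {
        snapshot = tick_count;
    }
    return snapshot;
}
```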

[...]


--

John Devereux

lawrenc...@siemens.com

Nov 30, 2012, 11:47:52 AM
James Kuyper <james...@verizon.net> wrote:
>
> n1256.pdf (which is C99 with all three TCs applied, making it MORE
> useful than C99 itself) and n1570.pdf (which is essentially identical to
> C2011) both have "less than or equal to". The line is marked as being
> changed from C99 in n1256.pdf, implying that one of the TCs is the
> reason. My copy of C99 itself is inaccessible right now, so I can't
> confirm the nature of the change.

It was TC2 and the change came from DR 230. It was to handle the case of
enumerated types with the same rank as int; it didn't have anything
to do with unsigned int.
--
Larry Jones

I'm a genius. -- Calvin

Boudewijn Dijkstra

Dec 11, 2012, 4:53:48 AM
On Thu, 29 Nov 2012 21:36:53 +0100, Keith Thompson <ks...@mib.org> wrote:
> upsid...@downunder.com writes:
>> On Thu, 29 Nov 2012 11:01:34 -0500, James Kuyper
>> <james...@verizon.net> wrote:
>>
>>> On 11/29/2012 10:23 AM, Grant Edwards wrote:
>>>> On 2012-11-29, Tim Wescott <t...@seemywebsite.com> wrote:
>>
>>>> ... Trying to implement
>>>> any sort of communications protocol with that was fun.
>>
>> Using left/right shifts together with AND and OR operations works just fine.
>> It works OK across platforms with different CHAR_BIT and different endianness.
>> Do not try to use structs etc.
>>
>>> Thanks for that information. Claims have frequently been made on
>>> comp.lang.c that, while the C standard allows CHAR_BIT != 8, the
>>> existence of such implementations is a myth. I'm glad to have a
>>> specific counter example to cite.
>>
>> IMHO CHAR_BIT = 21 is the correct way to handle the Unicode range.
>>
>> On the Unicode list, I even suggested packing three 21-bit characters into
>> a single 64-bit data word as UTF-64 :-)
>
> I like it -- but it breaks as soon as they add U+200000 or higher

Not really. You can use the spare bit to indicate a different packing.

> , and I'm not aware of any guarantee that they won't.
>
> I've thought of UTF-24, encoding each character in 3 octets; that's
> good for up to 16,777,216 distinct code points.

I hope that, when the galactic discovery is underway that would make this
amount of code points necessary, software engineering will have evolved
beyond the point of humans worrying about bit widths and encodings.


--
Made with Opera's revolutionary e-mail program:
http://www.opera.com/mail/

John Devereux

Dec 11, 2012, 9:53:10 AM
UTF-8 is the way forward isn't it?

--

John Devereux

Ivan Shmakov

Dec 14, 2012, 11:25:56 AM
>>>>> John Devereux <jo...@devereux.me.uk> writes:
>>>>> "Boudewijn Dijkstra" <sp4mtr4p....@indes.com> writes:
>>>>> On Thu, 29 Nov 2012 21:36:53 +0100, Keith Thompson wrote:

[...]

>>> I've thought of UTF-24, encoding each character in 3 octets; that's
>>> good for up to 16,777,216 distinct code points.

>> I hope that, when the galactic discovery is underway that would make
>> this amount of code points necessary, software engineering will have
>> evolved beyond the point of humans worrying about bit widths and
>> encodings.

> UTF-8 is the way forward isn't it?

I doubt it is. FWIW, it requires three octets for Cyrillic,
while UTF-16 requires only two. Personally, I'd try to use the
latter whenever possible (which means: anywhere, unless OS
interaction issues are deeply involved in the matter.)

--
FSF associate member #7257

Richard Damon

Dec 15, 2012, 2:30:44 PM
On 12/11/12 9:53 AM, John Devereux wrote:
>
> UTF-8 is the way forward isn't it?
>

As with most compression systems it depends on what the usage pattern of
characters is. If the text base is mostly the 7 bit ASCII character set,
with some of the other lower valued characters and only a few bigger
valued characters, UTF-8 makes sense. If most of the characters are in
the larger values (like using a non-Latin based character set) then
UTF-16 may make much more sense.

Keith Thompson

Dec 15, 2012, 3:24:51 PM
UTF-8 has a couple of other advantages. It's equivalent to ASCII
as long as all the characters are <= 127, which means you can
(mostly) deal with UTF-8 using old tools that aren't Unicode-aware.
And it has no byte ordering issues, so it doesn't need a BOM (Byte
Order Mark).

As for compression, you can always use another compression tool
if necessary; gzipped UTF-8 should be about as compact as gzipped
UTF-16.

Richard Damon

Dec 15, 2012, 6:02:49 PM
On 12/15/12 3:24 PM, Keith Thompson wrote:
> Richard Damon <news.x.ri...@xoxy.net> writes:
>> On 12/11/12 9:53 AM, John Devereux wrote:
>>> UTF-8 is the way forward isn't it?
>>
>> As with most compression systems it depends on what the usage pattern of
>> characters is. If the text base is mostly the 7 bit ASCII character set,
>> with some of the other lower valued characters and only a few bigger
>> valued characters, UTF-8 makes sense. If most of the characters are in
>> the larger values (like using a non-Latin based character set) then
>> UTF-16 may make much more sense.
>
> UTF-8 has a couple of other advantages. It's equivalent to ASCII
> as long as all the characters are <= 127, which means you can
> (mostly) deal with UTF-8 using old tools that aren't Unicode-aware.
> And it has no byte ordering issues, so it doesn't need a BOM (Byte
> Order Mark).
>
> As for compression, you can always use another compression tool
> if necessary; gzipped UTF-8 should be about as compact as gzipped
> UTF-16.
>

UTF-8 and UTF-16 *ARE* compression methods. Uncompressed Unicode would
be UTF-32 or UCS-4, using 32 bits per character. For most uses, if you
don't need code points above U+FFFF, then you might consider UCS-2 the
uncompressed format. Then UTF-16 isn't really compression, but a method
to mark the very rare character above U+FFFF. UTF-8 is really just a
compression format to try and remove some of the extra space, and will
do so to the extent that characters U+0000-U+007F are more common than U+0800 and
higher, the former saving you a byte, and the latter costing you one.
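
The size trade-off can be made concrete with a short C sketch. The function names utf8_len and utf16_len are hypothetical; the range boundaries are those of the UTF-8 and UTF-16 encoding forms. The sketch does not reject the surrogate range U+D800-U+DFFF, which a real encoder would have to.

```c
#include <stdint.h>

/* Octets needed to encode one code point in UTF-8 (0 if out of range). */
int utf8_len(uint32_t cp)
{
    if (cp <= 0x7F)     return 1;  /* ASCII */
    if (cp <= 0x7FF)    return 2;  /* e.g. Latin supplements, Greek, Cyrillic */
    if (cp <= 0xFFFF)   return 3;  /* rest of the BMP, e.g. CJK */
    if (cp <= 0x10FFFF) return 4;  /* supplementary planes */
    return 0;
}

/* Octets needed to encode one code point in UTF-16 (0 if out of range). */
int utf16_len(uint32_t cp)
{
    if (cp <= 0xFFFF)   return 2;  /* one 16-bit code unit */
    if (cp <= 0x10FFFF) return 4;  /* surrogate pair */
    return 0;
}
```

Relative to a two-octet baseline, UTF-8 saves one octet below U+0080, breaks even from U+0080 to U+07FF, and costs one octet from U+0800 to U+FFFF.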

UTF-8 does have the other advantage that you mention, looking like ASCII
for those characters allowing many Unicode unaware programs to mostly
function with UTF-8 data.

Keith Thompson

Dec 15, 2012, 6:45:31 PM
Richard Damon <news.x.ri...@xoxy.net> writes:
> On 12/15/12 3:24 PM, Keith Thompson wrote:
[...]
>> As for compression, you can always use another compression tool
>> if necessary; gzipped UTF-8 should be about as compact as gzipped
>> UTF-16.
>>
>
> UTF-8 and UTF-16 *ARE* compression methods.
[...]

I don't recall saying they aren't.

But they're (relatively) simplistic compression methods that don't
adapt to the content being compressed, which is why applying another
compression tool (I *did* say "another") can be useful.

Nobody

Dec 15, 2012, 10:28:57 PM
Size isn't the only issue; the fact that UTF-16 may (and usually does)
contain null bytes ('\0') rules it out for many applications.

Similarly, anything which expects specific bytes (e.g. '\x0a', '\x0d',
etc) to have their "usual" meanings regardless of context will work fine
with UTF-8 but not with UTF-16 or UTF-32.
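
A small C sketch of the null-byte problem, assuming 8-bit bytes. The array and function names are hypothetical. The text "Hi" is stored once as UTF-8 and once as UTF-16LE without a BOM, and a byte-oriented function such as strlen is applied to both.

```c
#include <string.h>

/* "Hi" as UTF-8: one octet per character, then a terminator. */
static const char hi_utf8[] = { 0x48, 0x69, 0x00 };

/* "Hi" as UTF-16LE: every ASCII character is followed by a zero octet. */
static const char hi_utf16le[] = { 0x48, 0x00, 0x69, 0x00, 0x00, 0x00 };

/* strlen sees the whole UTF-8 string. */
size_t naive_len_utf8(void)
{
    return strlen(hi_utf8);
}

/* strlen stops at the zero octet inside the first UTF-16 code unit. */
size_t naive_len_utf16le(void)
{
    return strlen(hi_utf16le);
}
```

Here naive_len_utf8() yields 2, while naive_len_utf16le() yields 1 because the scan ends inside the first character.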

upsid...@downunder.com

Dec 16, 2012, 3:40:59 AM
On Sat, 15 Dec 2012 14:30:44 -0500, Richard Damon
For any given non-Latin based language, there are only a few possible
bit combinations in the first byte(s) of the UTF-8 sequence, thus it
should compress quite well.

For use inside a program, UTF-32 would be the natural choice with 1
array element/character.

Compressing a UTF-32 file using some form of Huffman coding should
not take more space than compressed UTF-8/UTF-16 files, since the
actually used (and stored) symbol table would reflect the actual usage
of sequences in the whole file. Doing the compression on the fly in a
communication link would be less effective, since only a part of the
data would be available at a time, in order to keep the latencies
acceptable.

Richard Damon

Dec 16, 2012, 10:55:56 PM
On 12/15/12 6:45 PM, Keith Thompson wrote:
> Richard Damon <news.x.ri...@xoxy.net> writes:
>> On 12/15/12 3:24 PM, Keith Thompson wrote:
> [...]
>>> As for compression, you can always use another compression tool
>>> if necessary; gzipped UTF-8 should be about as compact as gzipped
>>> UTF-16.
>>>
>>
>> UTF-8 and UTF-16 *ARE* compression methods.
> [...]
>
> I don't recall saying they aren't.
>
> But they're (relatively) simplistic compression methods that don't
> adapt to the content being compressed, which is why applying another
> compression tool (I *did* say "another") can be useful.
>

But they are fundamentally different from other compression schemes.
Multi-byte/symbol encodings are generally designed so that it is
possible to process the data in that encoding. It isn't that much harder
to process the data than if it were kept fully expanded. Some operations,
like computing the length of a string, require doing a pass over the
data instead of just taking the difference in the addresses, but nothing
becomes particularly hard.
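
The length computation mentioned here can be sketched in C as a single pass that counts every octet that is not a continuation octet (continuation octets have the bit pattern 10xxxxxx). The name utf8_count is hypothetical, and the input is assumed to be valid, null-terminated UTF-8.

```c
#include <stddef.h>

/* Count code points in a null-terminated UTF-8 string by skipping
   continuation octets. No validation is performed. */
size_t utf8_count(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

The data is processed in place, with no decoding step and no extra buffer.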

On the other hand, it is very unusual for any program to actually
process "zipped" data as such, it is almost always uncompressed to be
worked on and then re-compressed, and any changes tend to require
reprocessing the entire rest of the file (or at least the current
compression block).

Keith Thompson

Dec 17, 2012, 2:20:43 AM
I'd say that's a difference of degree, not anything fundamental.

Computing the length of a string requires doing a pass over it, whether
it's UTF-8 encoded or gzipped. And it's certainly possible to process
UTF-8 data by internally converting it to UTF-32.

And copying a file doesn't require uncompressing it, regardless of the
format.

Phil Carmody

Dec 17, 2012, 5:27:55 AM
Richard Damon <news.x.ri...@xoxy.net> writes:
> On 12/15/12 3:24 PM, Keith Thompson wrote:
> > Richard Damon <news.x.ri...@xoxy.net> writes:
> >> On 12/11/12 9:53 AM, John Devereux wrote:
> >>> UTF-8 is the way forward isn't it?
> >>
> >> As with most compression systems it depends on what the usage pattern of
> >> characters is. If the text base is mostly the 7 bit ASCII character set,
> >> with some of the other lower valued characters and only a few bigger
> >> valued characters, UTF-8 makes sense. If most of the characters are in
> >> the larger values (like using a non-Latin based character set) then
> >> UTF-16 may make much more sense.
> >
> > UTF-8 has a couple of other advantages. It's equivalent to ASCII
> > as long as all the characters are <= 127, which means you can
> > (mostly) deal with UTF-8 using old tools that aren't Unicode-aware.
> > And it has no byte ordering issues, so it doesn't need a BOM (Byte
> > Order Mark).
> >
> > As for compression, you can always use another compression tool
> > if necessary; gzipped UTF-8 should be about as compact as gzipped
> > UTF-16.
> >
>
> UTF-8 and UTF-16 *ARE* compression methods.

Hmmm, those who work in compression tend to prefer the term
"encodings", for such fixed 1-1 mappings of input to output
tokens. UTF-8, and the others you consider to be "compressed",
simply have output tokens of different lengths.

Phil
--
I'm not saying that google groups censors my posts, but there's a strong link
between me saying "google groups sucks" in articles, and them disappearing.

Oh - I guess I might be saying that google groups censors my posts.

John Devereux

Dec 17, 2012, 6:17:06 AM
Keith Thompson <ks...@mib.org> writes:

> Richard Damon <news.x.ri...@xoxy.net> writes:
>> On 12/11/12 9:53 AM, John Devereux wrote:
>>> UTF-8 is the way forward isn't it?
>>
>> As with most compression systems it depends on what the usage pattern of
>> characters is. If the text base is mostly the 7 bit ASCII character set,
>> with some of the other lower valued characters and only a few bigger
>> valued characters, UTF-8 makes sense. If most of the characters are in
>> the larger values (like using a non-Latin based character set) then
>> UTF-16 may make much more sense.
>
> UTF-8 has a couple of other advantages. It's equivalent to ASCII
> as long as all the characters are <= 127, which means you can
> (mostly) deal with UTF-8 using old tools that aren't Unicode-aware.
> And it has no byte ordering issues, so it doesn't need a BOM (Byte
> Order Mark).

Yes precisely. I had to update an embedded system with a simple
home-made GUI, so that it could do Chinese. I was pleasantly surprised
how painless it was using UTF-8. Strings are still null-terminated char
arrays, most everything just worked as before. You can't predict the
number of characters just from the string size, but I was already using
proportional fonts so this was not an issue. I could even abuse the C
standard - sorry c.l.c - and embed UTF-8 in the C source code and that
worked too. (I moved these out into resource files in the end though).

> As for compression, you can always use another compression tool
> if necessary; gzipped UTF-8 should be about as compact as gzipped
> UTF-16.

--

John Devereux

Tim Rentsch

Dec 17, 2012, 6:53:33 AM
Actually it was.

> In C99,
> 6.3.1.1 paragraph 2 read "less than"
> instead of "less than or equal" as you have above,
> and the unsigned int type was covered by "All other types"
> in the last sentence.

Yes but it was changed by a TC (still counts as part of C99
even though the TC didn't issue until later). Look at N1256.

In fairness I should add that it was an unintended consequence.
Still, the official text was changed so that unsigned int
could 'promote' to int under some circumstances, and that
was true for much or most of the time C99 was in force.

Ben Bacarisse

Dec 17, 2012, 9:07:45 AM
John Devereux <jo...@devereux.me.uk> writes:
<snip>
> [...] I could even abuse the C
> standard - sorry c.l.c - and embed utf8 in the C source code and that
> worked too. (I moved these out into resource files in the end though).

It's not much of an abuse. Multibyte character sequences are permitted
in string literals and may even be converted to wide character strings
as if by the use of the mbstowcs function when appropriate. In C99, the
only trouble is that the encoding assumed, and the characters
permitted, are implementation-defined. Whilst that's also true in the
latest standard, C11 does add the u8 prefix to produce UTF-8 encoded
strings.

It also adds the U (and u) prefix to make Unicode character arrays from
a multibyte character string, but the encoding is still implementation
defined and dependent on the locale (as it should be, I think).
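
A short sketch of the u8 prefix, assuming a C11 (or later) compiler. The macro and function names are hypothetical. Universal character names are used so that the result does not depend on the encoding of the source file, and the literal is cast to unsigned char because the element type of u8 literals differs between C11 and later revisions.

```c
#include <stddef.h>

/* U+4E2D U+6587 written as universal character names. */
#define ZHONGWEN u8"\u4e2d\u6587"

/* Size of the literal in octets, including the terminating null. */
size_t zhongwen_size(void)
{
    return sizeof ZHONGWEN;
}

/* The UTF-8 octets of the literal. */
const unsigned char *zhongwen_bytes(void)
{
    return (const unsigned char *)ZHONGWEN;
}
```

Each of the two characters takes three octets in UTF-8, so the literal occupies seven octets with its terminator: E4 B8 AD E6 96 87 00.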

--
Ben.