I have been wondering whether a char can be processed as a negative
character. I mean, if I do something like...
char c= '-2';
printf("%c",c);
The output should be -2 instead of 2.
If the answer is no, then why does signed char, ranging from -128 to 127,
exist at all?
What is that part meant for? Yes, I am talking about -128 to 0, because
there is no way we are going to get a char representation as a positive value.
Why do most of our library functions treat char values as unsigned char
if by default all characters are unsigned?!
I hope I am able to make the situation clear...
Please guide me.
Cheers!!
That can be negative, though what you mean is c = -2; without quotes.
> printf("%c",c);
That prints c as a character, not as a character code.
Anyway, negative char is quite normal outside 7-bit ASCII land.
#include <stdio.h>
int main() { printf("%d\n", 'å'); return 0; }
prints -27 on the host where I'm writing this.
That's why e.g. <ctype.h> says its functions take values in the range of
unsigned char, or EOF. Generally when you want the character code of a
char c, you should use (unsigned char) c.
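Something like this little sketch shows the difference (the 0xE5 / Latin-1
assumption is just mine, picked to give a value like the one above):

#include <ctype.h>
#include <stdio.h>

int main(void)
{
    char c = (char)0xE5;   /* 0xE5 is 229 in Latin-1, i.e. -27 where plain char is signed */
    printf("as plain char: %d   as character code: %d\n", c, (unsigned char)c);
    /* Passing c itself could hand isalpha() a negative value other than EOF,
       which is undefined behaviour; convert to unsigned char first. */
    if (isalpha((unsigned char)c))
        printf("alphabetic in this locale\n");
    return 0;
}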
--
Hallvard
"For portability specify signed or unsigned if non-character data is
to be stored in char variables"
Cheers!!
I really don't know what you're asking.
If you want to store character values, use char. If you want very
small signed numbers, use signed char. If you want very small
unsigned numbers or raw bytes, use unsigned char.
If that doesn't answer your question, you'll have to ask more clearly.
--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
The passage is very likely talking about using char variables
as "tiny integers" rather than as codes designating glyphs. If
you're doing this, the advice is to avoid plain `char' because
it will be a signed type on some systems and an unsigned type on
others, meaning that arithmetic and comparisons and so on could
give different results. Instead, use `signed char' if you want a
tiny integer that can store tiny positive and negative values, or
use `unsigned char' if you intend to store only tiny non-negative
values.
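In other words, roughly (a sketch; the variable names are invented for
illustration):

#include <stdio.h>

int main(void)
{
    signed char   step   = -3;    /* tiny integer that may go negative       */
    unsigned char raw    = 0xFE;  /* tiny non-negative integer or raw byte   */
    char          letter = 'A';   /* an actual character: plain char is fine */

    printf("%d %d %c\n", step, raw, letter);
    return 0;
}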
My question is simple..
1. Is there any point in defining a char variable as unsigned when I am
dealing with pure ASCII characters only?
2.and in what scenarios should I define a character as unsigned?
3. Does this issue have anything to do with portability? I mean, will defining
something as signed char fail on a machine which doesn't
support signed bytes... is that the reason we keep our char definitions
as unsigned (a portability reason)...
like if we do...
on x86 (which supports signed bytes)
char c=-2; in memory 0xfe
unsigned char c=-2 in memory 0xfe ...but any expression which
references this will evaluate it as 254
on a machine which doesn't support signed bytes
char c=-2; in memory 0x02 no issue of making it two's
complement... so store the numeric value ignoring sign
unsigned char c=-2 in memory 0xfe it will be rounded up to
a positive value, i.e. 254
I don't know any machine other than x86, this is my assumption
only... but K&R2 says that if char c=-2; (which is 0xfe and looks
negative) "arbitrary bit patterns stored in character variables may
appear to be negative on some machines, yet positive on others. For
portability, specify signed or unsigned if non-character data is to be
stored in char variables"
If someone would be able to explain this at the architecture level, I would
really appreciate it.
Hope this time I am able to articulate what I am confused about.
Cheers!!
If you are storing characters (ASCII, EBCDIC, or whatever),
use plain `char'. That's what it's for.
> 2.and in what scenarios should I define a character as unsigned?
When you are treating it not as a "character," but as a small
unsigned integer. One common situation is when you're looking at
the individual bytes of the representation of some other kind of
datum: You aim an `unsigned char*' at the thing and use it to get
at the individual bytes. (By the way, the fact that this practice
is common doesn't mean it's a good thing; it's just overused.)
> 3. Does this issue have anything to do with portability? I mean, will defining
> something as signed char fail on a machine which doesn't
> support signed bytes... is that the reason we keep our char definitions
> as unsigned (a portability reason)...
All C implementations support the `char', `unsigned char', and
`signed char' types. It is the compiler's business to get them all
to work properly on the underlying hardware, whatever its "ordinary"
treatment of byte-sized quantities might be.
`char' behaves the same as `unsigned char' on some systems, and
the same as `signed char' on others.
> I don't know any machine other than x86, this is my assumption
> only... but K&R2 says that if char c=-2; (which is 0xfe and looks
> negative) "arbitrary bit patterns stored in character variables may
> appear to be negative on some machines, yet positive on others. For
> portability, specify signed or unsigned if non-character data is to be
> stored in char variables"
0xfe doesn't look negative to me: it looks like a way of writing
the value 254, which could also be written 0376. No matter how you
write it, it's greater than zero, a positive value.
... by which I'm trying to encourage you to stop thinking about
representations and start thinking about values. In the cited
fragment `char c = -2;' there are two possibilities:
1) On a system where `char' is signed, `c' is initialized with
the value "minus two." This value is negative, less than zero.
2) On a system where `char' is unsigned, `c' cannot hold a negative
value. Under C's rules for converting from signed to unsigned
integers, `c' receives the value `UCHAR_MAX+1-2', which will
be 255+1-2=254 on many machines. This value is positive (on
all machines), greater than zero.
If you were to inspect the bit pattern that is stored in `c', you
might very well find that it's 11111110. But bit patterns are not
values, not all by themselves. Quick: What's the value of the bit
pattern 01000010001010000000000000000000? It might be the `int'
value 1109917696, or the `float' value 42.0f, or a pointer to a
block of freshly-allocated memory, or a struct containing the
`short' value 16936 followed by a three-bit bit-field with value
zero followed by padding, or the string "B(" in a zero-filled four-
byte `char' array, or any of a number of other things. You cannot
discern the value of a "naked" bunch of bits; you must use the type
to learn what the bits mean. The exact same batch of bits might
signify a positive value or a negative value, depending on the type
with which you interpret them. When you say "0xfe looks like negative,"
you are implying a type -- and you are forgetting that the type you
imply is not necessarily the type your C code calls for.
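To make that concrete, a sketch (it assumes 32-bit int and IEEE-754
float, which the Standard does not promise):

#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void)
{
    uint32_t bits = 0x42280000;   /* 01000010 00101000 00000000 00000000 */
    int32_t  i;
    float    f;

    memcpy(&i, &bits, sizeof i);  /* same bits, read as an integer */
    memcpy(&f, &bits, sizeof f);  /* same bits, read as a float    */
    printf("as int:   %ld\n", (long)i);   /* 1109917696 */
    printf("as float: %f\n",  f);         /* 42.000000  */
    return 0;
}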
No. As Keith said above: "If you want to store character values, use char."
> 2.and in what scenarios should I define a character as unsigned?
You shouldn't. If you know that it is a character, use char. Only use
unsigned char if you're storing numbers rather than characters.
> 3. Does this issue have anything to do with portability? I mean, will defining
Yes. The plain 'char' type is signed on some implementations, and an
unsigned type on others. You need to keep that possibility in mind when
you write your code. You should convert to 'unsigned char' before
passing a char value to one of the <ctype.h> macros; otherwise it's not
very difficult to avoid problems.
It's very easy to distinguish the two cases: if char is signed, then
CHAR_MIN will be negative, otherwise it will be 0.
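A trivial sketch of the check:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    if (CHAR_MIN < 0)
        printf("plain char is signed here (CHAR_MIN = %d)\n", CHAR_MIN);
    else
        printf("plain char is unsigned here (CHAR_MAX = %ld)\n", (long)CHAR_MAX);
    return 0;
}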
> char c=-2; in memory 0xfe
As Keith said above, if you're storing a number, you shouldn't use char.
Use signed char or unsigned char. Obviously, you need signed char if you
want to store a value of -2.
Well, mostly. The is*() and to*() functions in <ctype.h> expect
arguments representable as an unsigned char (or the value EOF).
I doubt whether the output is going to be -2; it's going to be a single
character at most, and probably something weird (a small square block on my
machine, corresponding to code 254 which has the same bit pattern as -2).
'char' is a misnomer for this type, which is really just a very short
integer (typically 8 bits) that can be signed or unsigned.
For storing actual character data, there are apparently machines where some
character sets use negative codes, but I don't know of any. I only know
ASCII which uses +0 to +127, and various supersets which still have positive
values.
--
Bartc
simple answer:
char is normally signed (granted, not all C compilers agree to this, as a
few older/oddball compilers have made it default to unsigned).
so 'char'=='character' is a misnomer (historical accident?...) since for
most practical uses, ASCII and UTF-8 chars are better treated as unsigned
(we just use 'char' as a matter of tradition, and cast to unsigned char
wherever it matters), and for most other uses (where we want a signed byte),
thinking of 'char' as 'character' is misleading (note that there are many
cases where a signed 8-bit value actually makes some sense).
many other (newer) languages reinterpret things, typically assigning 'char'
to a larger size (most often 16, or sometimes 32 bits) and adding
byte/sbyte/ubyte/... for the 8-bit types (there is some inconsistency as to
whether 'byte' is signed or unsigned for a given language, so it depends
some on the particular language designer).
in my own uses, I typically use typedef to define 'byte' as 'unsigned char'
and 'sbyte' as 'signed char'. I also use 'u8' and 's8' sometimes.
or such...
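i.e. something along these lines (a sketch; with C99, the u8/s8 family
can simply sit on <stdint.h>):

#include <stdint.h>

typedef unsigned char byte;    /* raw 8-bit data              */
typedef signed char   sbyte;   /* tiny signed integer         */
typedef uint8_t  u8;           /* exact-width names, via C99  */
typedef int8_t   s8;
typedef uint16_t u16;
typedef int16_t  s16;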
Most of the compilers I've used have char signed, but I've used
several where it's unsigned (several Cray systems, SGI Irix, IBM AIX).
And char is almost certainly signed on any EBCDIC-based system.
It's safest not to think of either signed or unsigned as "normal".
Use plain char only if you don't *care* whether it's signed or
unsigned.
> "BGB / cr88192" <cr8...@hotmail.com> writes:
> [...]
>> simple answer:
>> char is normally signed (granted, not all C compilers agree to this, as a
>> few older/oddball compilers have made it default to unsigned).
> [...]
>
> Most of the compilers I've used have char signed, but I've used
> several where it's unsigned (several Cray systems, SGI Irix, IBM AIX).
> And char is almost certainly signed on any EBCDIC-based system.
I'm afraid not. Although I now lack an IBM mainframe to check on, I believe
that char is (by necessity) unsigned on EBCDIC systems.
You see, the characters '0' through '9' are represented by the octets 0xF0
through 0xF9. Given that CHAR_BIT == 8 on EBCDIC systems, and the C standard
(1990, although it should be the same in all versions) states (in section
5.2.1.3) that
"Both the basic source and basic execution character sets shall have the
following members: the 26 uppercase letters of the Latin alphabet
A B C D E F G H I J K L M
N O P Q R S T U V W X Y Z
the 26 lowercase letters of the Latin alphabet
a b c d e f g h i j k l m
n o p q r s t u v w x y z
the 10 decimal digits
0 1 2 3 4 5 6 7 8 9
the following 29 graphic characters
! " # % & ' ( ) * + , - . / :
; < = > ? [ \ ] ^ _ { | } ~
the space character, and control characters representing horizontal tab,
vertical tab, and form feed. The representation of each member of the
source and execution basic character sets shall fit in a byte."
then the octets 0xF0 through 0xF9 are considered to be part of the "basic
source" and/or "basic execution" charactersets.
Knowing this, we then consider the effect of section 6.2.5.3, in that
"An object declared as type char is large enough to store any member of
the basic execution character set. If a member of the basic execution
character set is stored in a char object, its value is guaranteed to be
positive."
So, 0xF0 through 0xF9 are guaranteed to be positive.
Since these systems use twos-complement math, and (for octets) values over
0x7f are considered negative in that math, the octets 0xF0 through 0xF9
would be considered to be negative values if char were signed.
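A tiny sketch of that guarantee (the code itself is character-set
agnostic):

#include <stdio.h>

int main(void)
{
    /* 6.2.5: a basic-character-set member stored in a char must be positive.
       On an EBCDIC machine, where '0' is 0xF0, that forces plain char to be
       unsigned (or wider than 8 bits). */
    char digit = '0';
    printf("'0' stored in a char has the value %d\n", digit);
    return 0;
}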
> It's safest not to think of either signed or unsigned as "normal".
> Use plain char only if you don't *care* whether it's signed or
> unsigned.
Agreed.
--
Lew Pitcher
Master Codewright & JOAT-in-training | Registered Linux User #112576
http://pitcher.digitalfreehold.ca/ | GPG public key available by request
---------- Slackware - Because I know what I'm doing. ------
Whoops! Yes, I knew that; I thought "unsigned" and typed "signed".
Wishy-washy-signedness of chars is full of traps - your list isn't exhaustive.
Phil
--
Any true emperor never needs to wear clothes. -- Devany on r.a.s.f1
who says it's normally signed? I've seen compilers that made it
optional.
Since it hardly ever matters I don't understand why you care.
> so 'char'=='character' is a misnomer (historical accident?...)
it's an historical fact. It's hardly a misnomer, an accident or even an
error.
char is a C type for holding characters. I agree it might have been
a good idea to have a byte type as well.
> since for
> most practical uses, ASCII and UTF-8 chars are better treated as unsigned
why should ASCII be unsigned? ASCII fits in 7 bits. Even extended
ASCIIs
still manage fine as signed values.
> (we just use 'char' as a matter of tradition, and cast to unsigned char
> wherever it matters),
it never matters with character data. I use unsigned char when I'm
manipulating external representations (bytes or octets)
> and for most other uses (where we want a signed byte),
that is, hardly ever. I'm tempted to say "never" as I don't think
I've ever needed tiny little integers. But I can imagine uses
for TLIs.
> thinking of 'char' as 'character' is misleading
I disagree
> (note that there are many
> cases where a signed 8-bit value actually makes some sense).
such as?
> many other (newer) languages reinterpret things,
but that doesn't matter
<snip>
> in my own uses, I typically use typedef to define 'byte' as 'unsigned char'
I commonly do this
> and 'sbyte' as 'signed char'.
I never do this
> I also use 'u8' and 's8' sometimes.
I dislike these. Seems to be letting the metal show through
>>> I have been wondering whether a char can be processed as a negative
>>> character. I mean, if I do something like...
>>
>>> char c= '-2';
>>> printf("%c",c);
>> simple answer:
>> char is normally signed (granted, not all C compilers agree to this,
>> as a few older/oddball compilers have made it default to unsigned).
>
> who says it's normally signed? I've seen compilers that made it
> optional.
> Since it hardly ever matters I don't understand why you care.
>> since for
>> most practical uses, ASCII and UTF-8 chars are better treated as
>> unsigned
>
> why should ASCII be unsigned? ASCII fits in 7 bits. Even extended
> ASCIIs
> still manage fine as signed values.
It would be perverse. Anyone who had to make the decision, wouldn't
deliberately choose signed format for character data. It's just asking for
trouble. (Try creating a histogram of the 256 character codes used in some
text, you will need the character code to index into an array. It's a lot
easier with 0 to 255 rather than -128 to 127)
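For what it's worth, a sketch of that histogram; the cast (or an offset,
if you prefer) is the whole of the extra work:

#include <stdio.h>
#include <string.h>
#include <limits.h>

int main(void)
{
    unsigned long hist[UCHAR_MAX + 1] = {0};
    const char *text = "some sample text";
    size_t i;

    for (i = 0; i < strlen(text); i++)
        hist[(unsigned char)text[i]]++;   /* index stays in 0..UCHAR_MAX even
                                             where plain char is signed       */

    for (i = 0; i <= UCHAR_MAX; i++)
        if (hist[i])
            printf("%3lu: %lu\n", (unsigned long)i, hist[i]);
    return 0;
}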
There should have been a char type (unsigned, but that doesn't even need
mentioning), and separate signed/unsigned ultra-short integers, ie.
byte-sized. (All easily added with typedefs, but in practice, no-one
bothers.)
>> (we just use 'char' as a matter of tradition, and cast to unsigned
>> char wherever it matters),
>
> it never matters with character data. I use unsigned char when I'm
> manipulating external representations (bytes or octets)
>
>> and for most other uses (where we want a signed byte),
>
> that is, hardly ever. I'm tempted to say "never" as I don't think
> I've ever needed tiny little integers. But I can imagine uses
> for TLIs.
There's a few, such as ensuring your data just fitting into the memory of
your computer, rather than needing double the memory.
I haven't done any research but I'm guessing that a big chunk of the 'int'
variables in my code do only contain values representable in one byte. The
waste can usually be ignored on PCs, but in arrays/arrays of structs and
so on, it can be significant.
--
Bartc
If omitting `-CHAR_MIN' from an array index makes things "a
lot easier" rather than just "a trifle easier," you must use a
different difficulty scale than I'm accustomed to.
> There should have been a char type (unsigned, but that doesn't even need
> mentioning), and separate signed/unsigned ultra-short integers, ie.
> byte-sized. (All easily added with typedefs, but in practice,
> no-one bothers.)
There was a time when I shared this opinion, but I think that
if DMR had specified unsignedness for `char' in original C, the
language would not have become popular. Machines that do most
operations in CPU registers need to fetch that `char' from memory,
and the receiving register is usually wider than the `char' is.
So what happens to the register's extra bits when the `char' is
loaded? I've seen three styles: The extra bits are zeroed, or
are copied from the `char's high-order bit, or are unchanged.
(All these are real behaviors of real machines, by the way: I'm
not talking about the DeathStation product line.)
Had DMR insisted on unsigned `char', machines of the first
type would have been happy but those of the second type would
have incurred the penalty of a full-word AND (or equivalent)
after every `char' fetch. In the days of limited memory and
slow cycles this would have put C at a disadvantage on those
machines, a disadvantage that might well have been crippling.
Remember, too, that the compiler had to run in limited memory
and with slow cycles, and would have been hard-pressed to figure
out when the AND might be avoidable. Instead of being the cradle
of C, the PDP-11 might have been its grave. By leaving the
signedness of `char' unspecified, DMR allowed the PDP-11 and
similar machines to "do what comes naturally" and use code that
was efficient for the architecture.
Machines of the third type -- well, there's a limit to how
far you can allow the language to bend. A language that said
"The value of a `char' variable is unspecified and may change
unpredictably without anything being stored to it, but at least
the low-order bits will remain intact" would not gather much of
a following ... Fortunately, on the only machine of this type
that I've personally used, the operation of zeroing a register
is cheap and can be inserted just before each `char' load without
a huge time or space penalty.
--
Eric Sosman
eso...@ieee-dot-org.invalid
Avoiding having to mess about with offsets (and doing a double-take on
whether it's +CHAR_MIN or -CHAR_MIN, knowing the latter is negative), and
just not having to keep possible negativeness of your char values always in
mind, makes it a little more than a trifle easier.
>
>> There should have been a char type (unsigned, but that doesn't even need
>> mentioning), and separate signed/unsigned ultra-short integers, ie.
>> byte-sized. (All easily added added with typedefs, but in practice,
>> no-one bothers.)
>
> There was a time when I shared this opinion, but I think that
> if DMR had specified unsignedness for `char' in original C, the
> language would not have become popular. Machines that do most
> operations in CPU registers need to fetch that `char' from memory,
> and the receiving register is usually wider than the `char' is.
> So what happens to the register's extra bits when the `char' is
> loaded? I've seen three styles: The extra bits are zeroed, or
> are copied from the `char's high-order bit, or are unchanged.
> (All these are real behaviors of real machines, by the way: I'm
> not talking about the DeathStation product line.)
You're obviously familiar with a lot more machine types than I am.
I've only ever programmed (to machine level), PDP10, Z80, 6800, 8051(?), and
x86 series architectures. Most of these have registers that are the same
width as a character.
I'm not so familiar with PDP11, but I think byte values used the lower half
of each register, with no auto-extend from 8 to 16 bits, and anyway can work
with that lower half independently, effectively giving it byte-wide
registers.
So having a separate, permanently unsigned char type I don't think would
have been an issue, *unless* the C language insists on char expressions
being evaluated as ints.
This would require unnecessary widening, and the default signedness of chars
might well depend on whether sign- or zero-extend was fastest. In that case,
*that* becomes the issue.
> of C, the PDP-11 might have been its grave. By leaving the
> signedness of `char' unspecified, DMR allowed the PDP-11 and
> similar machines to "do what comes naturally" and use code that
> was efficient for the architecture.
> a following ... Fortunately, on the only machine of this type
> that I've personally used, the operation of zeroing a register
> is cheap and can be inserted just before each `char' load without
> a huge time or space penalty.
OK, so which category does PDP11 come into? And what operation allows it to
load a char value into a register that will also sign-extend or clear the
top half?
--
Bartc
Let's just say that the bulk of this sentence demonstrates the
truth of its opening clause ...
> OK, so which category does PDP11 come into? And what operation allows it
> to load a char value into a register that will also sign-extend or clear
> the top half?
PDP-11 sign-extends (8-bit) bytes when loading them into (16-bit)
registers. The opcode is MOVB with a register destination (any of
R0..R5; it is unwise to target R6 or R7, aka SP and PC, with MOVB).
Let's just say it demonstrates the paucity of the instruction set details I
peeked at before posting...
Nothing was said about any sort of widening when a destination was a
register, only that most operations were either 8 or 16 bits.
>
>> OK, so which category does PDP11 come into? And what operation allows it
>> to load a char value into a register that will also sign-extend or clear
>> the top half?
>
> PDP-11 sign-extends (8-bit) bytes when loading them into (16-bit)
> registers. The opcode is MOVB with a register destination (any of
> R0..R5; it is unwise to target R6 or R7, aka SP and PC, with MOVB).
(I think I would take issue with DEC for having an instruction that does not
do what it says. So MOVB is 8 bits one end and 16 bits at the other? What
about MOVB R0,R1? INC R0? Or is the -B suffix only relevant for memory?)
If sign-extension was really something you couldn't get away from, then
perhaps it explains a couple of things about C, that no-one was bothered
with at the time because characters fit into 7 bits and it didn't matter.
--
bartc
The details of PDP-11 design and operation are off-topic here,
just as are those of x86 and Power, so I'm not going to describe
them at great length. I'll just say that the MOV and MOVB instructions
support all the machine's addressing modes, MOV operating on two-byte
words and MOVB on single bytes. When the target of MOVB is a register
(addressing mode zero, IIRC), the high-order half of the register gets
copies of the loaded byte's sign bit. When the source of MOVB is a
register, the high-order half is ignored. So to your questions:
- MOVB R0,R1 fetches the low-order eight bits of R0, places them
in the low-order eight bits of R1, and fills the high-order half
of R1 with copies of bit 7.
- INC R0 increments the sixteen-bit quantity in R0, and sets
assorted condition flags depending on the result.
And that's all I'll say about PDP-11, except to mention that I have
never encountered another machine with such a programmer-friendly and
"regular" instruction set. Oh, and to point out a curiosity: Playing
around, I found three different instructions that could be used to
return from an ordinary subroutine:
RTS PC
MOV (SP)+,PC
JMP @(SP)+
The first was clearly the "intended" return instruction, but on the
hardware model we used the second was actually faster! (The third
was so slow that it ran longer than a pointless Usenet thread.)
>> What about MOVB R0,R1? INC R0? Or is the -B suffix only relevant for
>> memory?)
>
> - MOVB R0,R1 fetches the low-order eight bits of R0, places them
> in the low-order eight bits of R1, and fills the high-order half
> of R1 with copies of bit 7.
OK, thanks.
> - INC R0 increments the sixteen-bit quantity in R0, and sets
> assorted condition flags depending on the result.
I actually meant INCB, but forget it. I've already seen the instruction set
is not quite as orthogonal as I thought.
> around, I found three different instructions that could be used to
> return from an ordinary subroutine:
> (The third was so slow that it ran longer than a pointless Usenet
> thread.)
If you're referring to this one, I don't think investigating the origins of
C's quirky signed char type is such a waste of time.
--
Bartc
<--
who says it's normally signed? I've seen compilers that made it
optional.
Since it hardly ever matters I don't understand why you care.
-->
it is normally signed, since this is what a majority of the compilers on a
majority of the common architectures do.
granted, it is not safe to rely on this, and hence I often use an explicit
signed type if it really matters.
> so 'char'=='character' is a misnomer (historical accident?...)
<--
its an historical fact. It's hardly a misnomer, an accident or even an
error.
char is a C type for holding characters. I agree it might have been
a good idea to have a byte type as well.
-->
but, to have it hold characters, be of a fixed size, and signed?...
I would have rather had said separate byte type, and have left "char" to be
a machine-dependent type, similar to short or int.
> since for
> most practical uses, ASCII and UTF-8 chars are better treated as unsigned
<--
why should ASCII be unsigned? ASCII fits in 7 bits. Even extended
ASCIIs
still manage fine as signed values.
-->
errm, not really.
in practice, extended ASCII sets are generally defined as, and assumed to
be, within the 128-255 range...
likewise, signedness will generally not mix well with things like
encoding/decoding UTF-8 chars, ...
so, it is common practice in my case to cast to "unsigned char" when doing
things involving UTF-8, ... but otherwise leave strings as the more
traditional "char *" type.
> (we just use 'char' as a matter of tradition, and cast to unsigned char
> wherever it matters),
<--
it never matters with character data. I use unsigned char when I'm
manipulating external representations (bytes or octets)
-->
it matters with character data if it happens to be UTF-8.
many simple strategies for working with text may mess up fairly hard if the
text is UTF-8 and things are treated as signed.
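e.g. a quick sketch of the sort of byte test that goes wrong with a
signed char (the helper function and test string are just mine):

#include <stddef.h>
#include <stdio.h>

/* Count UTF-8 continuation bytes (of the form 10xxxxxx). Such bytes compare
   as negative where plain char is signed, so go through unsigned char before
   testing the bits. */
static size_t count_continuation_bytes(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        unsigned char b = (unsigned char)*s;
        if ((b & 0xC0) == 0x80)
            n++;
    }
    return n;
}

int main(void)
{
    const char *s = "h\xC3\xA9llo";   /* e-acute encoded as the UTF-8 bytes C3 A9 */
    printf("%u continuation byte(s)\n", (unsigned)count_continuation_bytes(s));
    return 0;
}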
> and for most other uses (where we want a signed byte),
<--
that is, hardly ever. I'm tempted to say "never" as I don't think
I've ever needed tiny little integers. But I can imagine uses
for TLIs.
-->
there are many cases, especially if one does things involving image
processing or signal processing...
one needs them much like one needs 16-bit floats, although, granted, there
are other, typically more convenient, ways of shoving floating-point values
into 8 or 16 bit quantities, in the absence of a floating point type
(typically revolving around log or sqrt...).
'fixed point' is also sometimes appropriate, but in these cases it really
depends on the data.
memory is not free, hence it matters that it not all be wasted
frivolously...
> thinking of 'char' as 'character' is misleading
<--
I disagree
-->
it is misleading if your string happens to be UTF-16...
then, suddenly, char is unable to represent said characters...
even with UTF-8, 'char' is not able to represent a character, only a single
byte which could be part of a multi-byte character.
hence, the issue...
> (note that there are many
> cases where a signed 8-bit value actually makes some sense).
<--
such as?
-->
signal-processing related numeric functions, small geometric data, ...
you "could" store everything as floats, and then discover that one is eating
up 100s of MB of memory on data which could easily be stored in much less
space (say, 1/4 the space).
> many other (newer) languages reinterpret things,
but that doesn't matter
<snip>
<--
> in my own uses, I typically use typedef to define 'byte' as 'unsigned
> char'
I commonly do this
> and 'sbyte' as 'signed char'.
I never do this
> I also use 'u8' and 's8' sometimes.
I dislike these. Seems to be letting the metal show through
-->
s8/u8, s16/u16, s32/u32, ...
these are good for defining structures where values are expected to be
specific sizes...
my (newer) x86 interpreter sub-project uses these sorts of types
extensively, mostly as, with x86 machine code, things matter down to the
single bits...
many other tasks may involve similar levels of bit-centric twiddling, and so
the naming may also hint at the possible use of bit-centric logic code...
however, for most more general tasks, I use byte and sbyte instead...
Then you're using the word "normally" in a manner that's inconsistent
with the way I and, I believe, most other people use it.
Something is not "normal" just because it's in the majority, and it
certainly isn't abnormal just because it's in the minority.
(Your quoting convention, on the other hand, is abnormal, and your
article would be easier to read if you used the normal convention.)
normal == "common as to the point of not typically being considered".
for example, "in the US people normally speak English".
this does not, for example, deny the existence of Mexicans, only that people
can "normally" disregard the possibility of them using Spanish when engaging
in conversations with people.
or, maybe, people normally speak Japanese in Japan.
...
> (Your quoting convention, on the other hand, is abnormal, and your
> article would be easier to read if you used the normal convention.)
>
mostly that particular bit of funkiness is this:
laziness + OE + Google Groups;
as in:
Google Groups makes posts which mess up OE's quoting mechanism;
I am too lazy to go through every post in which this happens and put '>'
everywhere...
hence:
<--
...
-->
because this is much easier to type...
Plain char being signed is not as common as you seem to think it is.
I recently posted several examples of systems where plain char is
unsigned.
[...]
>> (Your quoting convention, on the other hand, is abnormal, and your
>> article would be easier to read if you used the normal convention.)
>>
>
> mostly that particular bit of funkiness is this:
> laziness + OE + Google Groups;
[snip]
OE-QuoteFix. Either that, or take the time to fix your quoting
manually, for the sake of your readers (who are likely to spend more
cumulative time reading your words than you spent writing them).
<snip>
> normal == "common to the point of not typically being
> considered".
IBM mainframes are so rare as to be not typically considered? That
will come as a huge surprise to the very many C programmers for whom
they are a daily reality.
<snip>
--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
"Usenet is a strange place" - dmr 29 July 1999
Sig line vacant - apply within
<snip>
> >> char is normally signed (granted, not all C compilers agree to this,
> >> as a few older/oddball compilers have made it default to unsigned).
>
> > who says it's normally signed? I've seen compilers that made it
> > optional. Since it hardly ever matters I don't understand why you care.
>
> >> since for most practical uses, ASCII and UTF-8 chars are better treated as
> >> unsigned
>
> > why should ASCII be unsigned? ASCII fits in 7 bits. Even extended
> > ASCIIs still manage fine as signed values.
>
> It would be perverse.
?
Just repeating this doesn't make it true.
> Anyone who had to make the decision, wouldn't
> deliberately choose signed format for character data.
Where compilers give me the option I choose signed chars. It's more
likely
to help me find bugs in my char handling code.
> It's just asking for trouble.
what you call "trouble" I call "help diagnosing a bug"
> (Try creating a histogram of the 256 character codes used in some
> text, you will need the character code to index into an array. It's a lot
> easier with 0 to 255 rather than -128 to 127)
you add an offset on. It's hardly rocket science. chars are not small
ints.
> There should have been a char type
yes
> (unsigned, but that doesn't even need mentioning),
since we are arguing about it - yes it does.
> and separate signed/unsigned ultra-short integers, ie.
> byte-sized.
as long as you understand byte is not necessarily 8 bits.
> (All easily added with typedefs, but in practice, no-one
> bothers.)
I do...
> >> (we just use 'char' as a matter of tradition, and cast to unsigned
> >> char wherever it matters),
>
> > it never matters with character data. I use unsigned char when I'm
> > manipulating external representations (bytes or octets)
>
> >> and for most other uses (where we want a signed byte),
>
> > that is, hardly ever. I'm tempted to say "never" as I don't think
> > I've ever needed tiny little integers. But I can imagine uses
> > for TLIs.
>
> There's a few, such as ensuring your data just fitting into the memory of
> your computer, rather than needing double the memory.
these days I don't normally worry about that
> I haven't done any research but I'm guessing that a big chunk of the 'int'
> variables in my code do only contain values representable in one byte. The
> waste can usually be ignored on PCs, but in arrays/arrays of structs and
> so on, it can be significant.
remember that most C integer operators work on ints not chars
so there could be a lot of converting back and forth. It *might*
even make the program larger to compress the data like this!
I don't believe I know a single person who I can be sure is part of
the 'most' you refer to. My girlfriend might be a counter-example,
but she refused to stick her flag firmly in either camp.
> Something is not "normal" just because it's in the majority, and it
> certainly isn't abnormal just because it's in the minority.
I've heard that before, likewise from an American. I think 'normal'
means more to yanks than to brits/eslers/eflers etc. To the majority
of English speakers, normal doesn't need to imply anything more than
just being an unexceptional thing, not even majority is required;
there's no reference to 'norms' or anything any more. One might say
it's perfectly standard English nowadays to use it in this diluted
sense. Despite the lack of a standard defining the language.
"Normal" really means at 90 degrees, of course!
We'd better warn the cardiac unit at the local hospital then!
> >> [char] is normally signed, since this is what a majority of the compilers on
> >> a majority of the common architectures do.
>
> >> granted, it is not safe to rely on this, and hence I often use an
> >> explicit signed type if it really matters.
>
> > Then you're using the word "normally" in a manner that's inconsistent
> > with the way I and, I believe, most other people use it.
>
> > Something is not "normal" just because it's in the majority, and it
> > certainly isn't abnormal just because it's in the minority.
>
> normal == "common to the point of not typically being considered".
>
> for example, "in the US people normally speak English".
> this does not, for example, deny the existence of Mexicans, only that people
> can "normally" disregard the possibility of them using Spanish when engaging
> in conversations with people.
not a very good example given the number of Spanish speakers.
Not everyone who speaks Spanish is Mexican or of Mexican origin.
> or, maybe, people normally speak Japanese in Japan.
a better example
<snip>
<snip>
> To the
> majority of English speakers, normal doesn't need to imply anything
> more than just being an unexceptional thing, not even majority is
> required;
In the UK, blue eyes are normal. A trifle unusual, perhaps, but still
normal. Brown eyes are normal, too. So are greenish-grey eyes, and
various other mucky and hard-to-describe colours. But pink eyes are
definitely abnormal. AFAIK I have never met a person with pink eyes,
but they do happen.
> there's no reference to 'norms' or anything any more. One
> might say it's perfectly standard English nowadays to use it in this
> diluted sense.
In fact, it's quite normal to use "normal" in this way...
> Despite the lack of a standard defining the language.
...despite the lack of a norm.
> "Normal" really means at 90 degrees, of course!
Right, but that's an abnormal normal, normally used only by abnormal
people (mathematicians, engineers, programmers...).
Good examples. I was going to use "in the 21st century, in civilised
western society (NTS fallacy covered), it's perfectly normal to be
homosexual". Eyes works even better. Richard, meet Anna sitting next
to me, she has pink eyes. (And says 'grrr!'.)
>> there's no reference to 'norms' or anything any more. One
>> might say it's perfectly standard English nowadays to use it in this
>> diluted sense.
>
> In fact, it's quite normal to use "normal" in this way...
>
>> Despite the lack of a standard defining the language.
>
> ...despite the lack of a norm.
>
>> "Normal" really means at 90 degrees, of course!
>
> Right, but that's an abnormal normal, normally used only by abnormal
> people (mathematicians, engineers, programmers...).
I think the thread's headed off at a tangent now.
>>> why should ASCII be unsigned? ASCII fits in 7 bits. Even extended
>>> ASCIIs still manage fine as signed values.
>>
>> It would be perverse.
>
> ?
> Just repeating this doesn't make it true.
(If you entered a skyscraper and found floor 0 was half-way up the building,
that wouldn't seem odd?)
>> There's a few, such as ensuring your data just fitting into the
>> memory of your computer, rather than needing double the memory.
>
> these days I don't normally worry about that
Because your data happens to fit.
>> I haven't done any research but I'm guessing that a big chunk of the
>> 'int' variables in my code do only contain values representable in
> one byte. The waste can usually be ignored on PCs, but in
>> arrays/arrays of structs and so on, it can be significant.
>
> remember that most C integer operators work on ints not chars
> so there could be a lot of converting back and forth. It *might*
> even make the program larger to compress the data like this!
That would be programmer choice. If the program is likely to use a lot of
memory, it's going to be due to data not code.
It might make the program a little slower, but again that is a choice (I'm
assuming C compilers avoid widening where it is not strictly necessary). But
usually less, and more localised, memory use is better.
--
Bartc
It would seem quite usual, aside from the odd numbering.
I haven't checked to be absolutely sure, but I don't think
I've ever been in a skyscraper whose ground floor was its
bottommost floor, and I doubt that you have, either.
--
Eric Sosman
eso...@ieee-dot-org.invalid
I see I'm not going to change anyone's point of view here, but let's try
this:
Nick Keighley wrote:
> you add an offset on. It's hardly rocket science.
What about if arrays in C were INT_MIN-based instead of 0-based? Why would
this be a problem?
After all, anyone who wants 0-based arrays simply has to add an INT_MIN (or
is it -INT_MIN) offset to each index operation. It isn't 'rocket science'!
--
Bartc
Whether you change anyone's opinion about the merits of the
way C treats `char' is irrelevant. The question is "Can a
character be negative," the answer is "Yes," and no amount of
"I'm unhappy with that" is going to change matters. You may
think it would be nicer if the Sun rose in the South, and you
might even convince others you're right, but it will keep on
rising in the East anyhow, despite your excellent arguments.
> but let's try
> this:
>
> Nick Keighley wrote:
>> you add an offset on. It's hardly rocket science.
>
> What about if arrays in C were INT_MIN-based instead of 0-based? Why would
> this be a problem?
>
> After all, anyone who wants 0-based arrays simply has to add an INT_MIN (or
> is it -INT_MIN) offset to each index operation. It isn't 'rocket science'!
I've seen people implement Heapsort in C by sticking an
unused [0] element at the start of the array, so maybe you're
right that some people find array indexing difficult to fathom.
Personally, I think such people are in the wrong line of work.
--
Eric Sosman
eso...@ieee-dot-org.invalid
> >>> why should ASCII be unsigned? ASCII fits in 7 bits. Even extended
> >>> ASCIIs still manage fine as signed values.
>
> >> It would be perverse.
>
> > ?
> > Just repeating this doesn't make it true.
>
> (If you entered a skyscraper and found floor 0 was half-way up the building,
> that wouldn't seem odd?)
US buildings don't even have a floor zero. And in the town where I grew up
it was not uncommon for there to be floors below the entry point (steep
hills).
I'm dubious that analogies about buildings mean anything when talking
about representations of characters!
> >> There's a few, such as ensuring your data just fitting into the
> >> memory of your computer, rather than needing double the memory.
>
> > these days I don't normally worry about that
>
> Because your data happens to fit.
oh true. I'm sure there are applications where this really matters
but I'm not in such a domain.
> >> I haven't done any research but I'm guessing that a big chunk of the
> >> 'int' variables in my code do only contain values representable in
> one byte. The waste can usually be ignored on PCs, but in
> >> arrays/arrays of structs and so on, it can be significant.
>
> > remember that most C integer operators work on ints not chars
> > so there could be a lot of converting back and forth. It *might*
> > even make the program larger to compress the data like this!
>
> That would be programmer choice.
quite true
> If the program is likely to use a lot of
> memory, it's going to be due to data not code.
if you're talking toasters (are there really toasters with software
in them?) there might be tight constraints on both. I know someone who
knows someone who wrote software to close the door on a car.
> It might make the program a little slower, but again that is a choice (I'm
> assuming C compilers avoid widening where it is not strictly necessary). But
> usually less, and more localised, memory use is better.
I suspect it's pretty rare. I'm just saying there's a cost to
compression as well.
> >> [... concerning the "perversity" of CHAR_MIN != 0 ...]
>
> >> (If you entered a skyscraper and found floor 0 was half-way up the
> >> building, that wouldn't seem odd?)
>
> > It would seem quite usual, aside from the odd numbering.
> > I haven't checked to be absolutely sure, but I don't think
> > I've ever been in a skyscraper whose ground floor was its
> > bottommost floor, and I doubt that you have, either.
>
> I see I'm not going to change anyone's point of view here, but let's try
> this:
It was your certainty that set me off
:-)
> Nick Keighley wrote:
> > you add an offset on. It's hardly rocket science.
>
> What about if arrays in C were INT_MIN-based instead of 0-based? Why would
> this be a problem?
>
> After all, anyone who wants 0-based arrays simply has to add an INT_MIN (or
> is it -INT_MIN) offset to each index operation. It isn't 'rocket science'!
I grew up with languages where the lower bound was anything you wanted.
I've long ago got used to the fact that I sometimes have to bash to fit the
real world into C's rather odd limitations. *I* think all arrays should
start from one. So there.
There are: http://www.embeddedarm.com/software/arm-netbsd-toaster.php
<snip>
Another classic example of "do as I say, not as I do" from two of the
self-appointed topicality police.
Pair of hypocrites.
It simplifies the code greatly, and even works as a perfect place to
build a new entry before inserting it, for example. I think your
ability to comprehend others' code and designs must be lacking.
It is perfectly clear that topicality is for other people.
>Pair of hypocrites.
Of course.
But to be fair, Kiki's position has always been just ever so slightly
kinder and gentler than Dicky's. Kiki has maintained that the
topicality shit is actually for the benefit of the newbie. That he (the
newbie) will get better answers elsewhere.
This is, in fact, true, since CLC dispenses nothing but BS, bullying, and
character assassination. But that doesn't mean that it *should* be true.
Or that the fact that it is is anything for anyone to be proud of.
It does not simplify the code greatly to use zero-based indexing
in Heapsort, not for any definition of "greatly" that stands scrutiny.
I stand by my assertion that someone who has "great" difficulty with
a dead-easy change of variable in an array index should find a career
that doesn't require him to index arrays.
As for my comprehension skills -- well, you could be right; it
is impossible by definition for me to argue the point.
> Eric Sosman <eso...@ieee-dot-org.invalid> writes:
>> I've seen people implement Heapsort in C by sticking an
>> unused [0] element at the start of the array, so maybe you're
>> right that some people find array indexing difficult to fathom.
>> Personally, I think such people are in the wrong line of work.
>
> It simplifies the code greatly, and even works as a perfect place to
> build a new entry before inserting it, for example. I think your
> ability to comprehend others' code and designs must be lacking.
If it's used for building a new entry before inserting, then it
isn't "unused." I don't think you're disagreeing as much as you
think you are.
--
Lowell Gilbert, embedded/networking software engineer
http://be-well.ilk.org/~lowell/
Getting back to the topic from the subject line, I think
Kenny McCormack and Antoninus Twink are existence proofs
that a character *can* be negative.
Har har. Good one. LOL.
Why did you choose to single out Kenny and me rather than any of the
scuzzbuckets engaged in a nasty, personal, long-running bullying
campaign against Jacob? - Heathfield, Mackintyre, Rosenau, "Teapot",
Carmody to name just a few.
Probably because I've seen nothing to indicate any kind of "bullying
campaign", just comments and/or criticism he doesn't always deal with
from an engineering mindset. :)
-s
--
Copyright 2009, all wrongs reversed. Peter Seebach / usenet...@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
It's unused in the heap. The heap is defined by the heap property,
and [0] doesn't have that property.
> On 2009-10-01, Antoninus Twink <nos...@nospam.invalid> wrote:
>> On 1 Oct 2009 at 19:44, Lowell Gilbert wrote:
>>> Getting back to the topic from the subject line, I think Kenny
>>> McCormack and Antoninus Twink are existence proofs that a
>>> character *can* be negative.
>>
>> Why did you choose to single out Kenny and me rather than any of
>> the scuzzbuckets engaged in a nasty, personal, long-running
>> bullying campaign against Jacob? - Heathfield, Mackintyre, Rosenau,
>> "Teapot", Carmody to name just a few.
>
> Probably because I've seen nothing to indicate any kind of "bullying
> campaign", just comments and/or criticism he doesn't always deal
> with
> from an engineering mindset. :)
And possibly also because half the people named by the troll (i.e.
myself, Phil Carmody, and Mark McIntyre) make a significant positive
contribution to this newsgroup, unlike either Antoninus Twink or
Kenny McCormack, who seem to be content with trying (and, if I recall
correctly, inevitably failing) to bite the ankles of those who
actually post useful C stuff.
So - back to C. Good joke, though, Lowell...
The heap property is that each node's key is >= those of
its children, if it has any. (Or <=, if you're building a heap
the other way around.) Another important feature of a heap is
that there is a simple way to calculate the indices of the
children from the index of the parent, and to calculate the
index of a parent from the index of a child. The calculation
is simple, no matter where the heap indices start.
But I sense you are unconvinced. All right, let's go through
the transformation; maybe it will be instructive.
In the traditional description of a heap, we number the nodes
from 1 to N and adopt the convention that the children of node
number j are the nodes with numbers 2*j and 2*j+1. If we store
these in a 1-based array, so the indices match the node numbers,
we say that the node at index [j] has children at [2*j] and [2*j+1].
The parent of the node numbered j is the node numbered floor(j/2);
with 1-based indexing we navigate from a child at [j] to a parent
at [floor(j/2)]. Agreed?
Okay: Let's put this heap into a zero-based array. We can
begin by simply storing the node whose number is j at the array
index [j-1], which we'll call index [k] for expository purposes:
The node whose number is j gets stored at index [k], with k==j-1.
The node at index [k] has node number j, with j=k+1. With me?
Now to the parent->child calculations. If we've got a parent
at index [k], its node number is j=k+1. The node numbers of its
children are 2*j==2*(k+1)==2*k+2 and 2*j+1==2*(k+1)+1==2*k+3. We
then subtract 1 to get the indices where those node numbers are
stored; they are at [2*k+1] and [2*k+2]. A parent at index [k]
has children at [2*k+1] and [2*k+2]. Clear?
Finally, the child->parent calculation. If we've got a child
at index [k], its node number is j=k+1. The node number of its
parent is floor(j/2)==floor((k+1)/2). The node with that number
is stored at index [floor(j/2)-1], which is [floor(j/2-1)], which
is [floor((k+1)/2-1)] which is [floor((k-1)/2)]. A child at index
[k] has a parent at index [floor((k-1)/2)]. Still clear?
So what's the outcome? In the original 1-based heap we had
(assuming integer division for array indices)
parent [j] -> child [2*j]
parent [j] -> child [2*j+1]
child [j] -> parent [j/2]
In the 0-based heap we have
parent [k] -> child [2*k+1]
parent [k] -> child [2*k+2]
child [k] -> parent [(k-1)/2]
Is the code for the 1-based heap simpler? Yes, by one + operator
and one - operator. Is the code "greatly" simpler? Yes, for
anybody who thinks one + and one - amount to a "great" complexity.
A person who boggles at that trivial level of complication will very
likely have a hard time as a computer programmer.
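Written out, the whole difference looks something like this (a sketch;
the macro names are mine):

/* 1-based heap: node numbers and indices coincide */
#define PARENT_1(j)  ((j) / 2)
#define LEFT_1(j)    (2*(j))
#define RIGHT_1(j)   (2*(j) + 1)

/* 0-based heap: after the change of variable k = j-1 */
#define PARENT_0(k)  (((k) - 1) / 2)
#define LEFT_0(k)    (2*(k) + 1)
#define RIGHT_0(k)   (2*(k) + 2)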
(Another, and possibly simpler way to figure out the navigation
is to imagine using the illegal ptr=array-1 hack and then formulate
the heap in terms of 1-based indexing with ptr. Return to legality
by substituting array-1 for each appearance of ptr in the solution,
then simplify the indices. I didn't use that route, because people's
alarms might have gone off as soon as they saw the hack, and they might
have thought the solution itself was somehow contaminated by it.
Hence the somewhat longer "change of variable" construction shown.)
Extra credit: Imagine yourself using a language like Pascal,
where arrays can have any index ranges you like. Figure out the
parent->child and child->parent navigation for a heap in an array
whose indices start at [-42]. Comment on the complexity of your
solution.
> So what's the outcome? In the original 1-based heap we had
> (assuming integer division for array indices)
>
> parent [j] -> child [2*j]
> parent [j] -> child [2*j+1]
> child [j] -> parent [j/2]
>
> In the 0-based heap we have
>
> parent [k] -> child [2*k+1]
> parent [k] -> child [2*k+2]
> child [k] -> parent [(k-1)/2]
>
> Is the code for the 1-based heap simpler? Yes, by one + operator
> and one - operator. Is the code "greatly" simpler? Yes, for
> anybody who thinks one + and one - amount to a "great" complexity.
> A person who boggles at that trivial level of complication will very
> likely have a hard time as a computer programmer.
It's not a question of one scheme using a extra symbol or two more than
another. Some people think better 1-based, and others 0-based; if that extra
symbol is in the wrong place then it's more than trivial.
And if an algorithm is being adapted from elsewhere, it's best to stick to
its original array base, if you want it to still work.
--
Bartc
Okay. Given a C array, zero-based by its nature, how would
you apply the Heapsort algorithm to it if you insist on the 1-based
formulation? The only route I can see is to malloc() (or in C99,
VLA) an array one slot larger, copy the original data into the new
array with a plus-one offset, Heapsort the tail of the new array,
copy the sorted data back, and (if not VLA-ed) free() the scratch
array.
Well, no, that's not the "only" way. You might Heapsort all
the elements except the [0], then do binary search to find where
the [0] element actually belongs, then slide everything over and
stuff it into place. (You can do this without additional space
to store that [0] element if you use the triple-reverse trick.)
Of course, this hybrid is not precisely "Heapsort" any more, but
"Mostly Heapsort, with an admixture of other stuff."
Please comment on the relative complexity of these solutions
in comparison to the index transformation I've shown. For the
first, be sure to specify how to proceed if malloc() returns NULL
(or if the array is too big for a VLA).
--
Eric Sosman
eso...@ieee-dot-org.invalid
Agreed.
> Okay: Let's put this heap into a zero-based array. We can
> begin by simply storing the node whose number is j at the array
> index [j-1], which we'll call index [k] for expository purposes:
> The node whose number is j gets stored at index [k], with k==j-1.
> The node at index [k] has node number j, with j=k+1. With me?
With you.
> Now to the parent->child calculations. If we've got a parent
> at index [k], its node number is j=k+1. The node numbers of its
> children are 2*j==2*(k+1)==2*k+2 and 2*j+1==2*(k+1)+1==2*k+3. We
> then subtract 1 to get the indices where those node numbers are
> stored; they are at [2*k+1] and [2*k+2]. A parent at index [k]
> has children at [2*k+1] and [2*k+2]. Clear?
Clear.
> Finally, the child->parent calculation. If we've got a child
> at index [k], its node number is j=k+1. The node number of its
> parent is floor(j/2)==floor((k+1)/2). The node with that number
> is stored at index [floor(j/2)-1], which is [floor(j/2-1)], which
> is [floor((k+1)/2-1)] which is [floor((k-1)/2)]. A child at index
> [k] has a parent at index [floor((k-1)/2)]. Still clear?
Still clear.
Having said that, you may have sneaked some deliberate mistakes into
the above in order to trick me, in which case haha, it worked, as I
didn't actually read them, as I already understand perfectly well how
a heap works. Quite what gave you the impression otherwise, I know not.
> So what's the outcome? In the original 1-based heap we had
> (assuming integer division for array indices)
>
> parent [j] -> child [2*j]
> parent [j] -> child [2*j+1]
> child [j] -> parent [j/2]
>
> In the 0-based heap we have
>
> parent [k] -> child [2*k+1]
> parent [k] -> child [2*k+2]
> child [k] -> parent [(k-1)/2]
>
> Is the code for the 1-based heap simpler? Yes
Right, we're in perfect agreement up to here.
>, by one + operator
> and one - operator. Is the code "greatly" simpler? Yes, for
> anybody who thinks one + and one - amount to a "great" complexity.
You've doubled the length of two thirds of the expressions from
being a single operator to being two operators. So most of the time
it's twice as verbose.
> A person who boggles at that trivial level of complication will very
> likely have a hard time as a computer programmer.
Fortunately I don't boggle at the complication, I boggle at the
programmer who would chose that lack of simplicity.
I likewise boggle at why people do the following, which I saw in the
linux kernel, for example, only yesterday:
t1*p=getp();
foo(p->q->r1);
bar(p->q->r2);
baz(p->q->r3);
//...
quux(p->q->rN);
rather than
t1*p=getp();
t2*q=p->q;
foo(q->r1);
bar(q->r2);
baz(q->r3);
//...
quux(q->rN);
Yet I have no problem understanding the complexity of pointers to
pointers, even when N exceeds the number of fingers I possess.
Similarly, I boggle at why people do the following, which I again saw in
some never-gonna-be-accepted-into-the-mainline-ever module for the linux
kernel only yesterday:
void*p=getp();
((t1*)p)->q1=r1;
((t1*)p)->q2=r2;
((t1*)p)->q3=r3;
//...
((t1*)p)->qN=rN;
rather than the following:
void*p=getp();
t1*pt=(t1*)p;
pt->q1=r1;
pt->q2=r2;
pt->q3=r3;
//...
pt->qN=rN;
Despite my ability to understand the concept of casting pointers, even
when N reaches the heady heights of, oooh, I think I saw about 40 in one
function.
In each of those, I do genuinely prefer the expression with 1 operator
to the expression with 2 operators. Really. I believe it's tangibly
less complicated at the source code level. Yes, it's only one operator,
maybe I'm just sensitive to wasted characters.
In summary - p>>1 is a thing of beauty, (p-1)/2 is gross.
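To put the two navigations side by side in code, a minimal sketch (the
macro names are invented here, not taken from any code discussed above):

/* 0-based navigation, as derived above */
#define LCHILD0(k)  (2*(k) + 1)
#define RCHILD0(k)  (2*(k) + 2)
#define PARENT0(k)  (((k) - 1) / 2)

/* 1-based navigation, the textbook form */
#define LCHILD1(j)  (2*(j))
#define RCHILD1(j)  (2*(j) + 1)
#define PARENT1(j)  ((j) / 2)

The 0-based forms carry exactly the one extra + and one extra - discussed
above; nothing more.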
> (Another, and possibly simpler way to figure out the navigation
> is to imagine using the illegal ptr=array-1 hack and then formulate
> the heap in terms of 1-based indexing with ptr. Return to legality
> by substituting array-1 for each appearance of ptr in the solution,
> then simplify the indices. I didn't use that route, because people's
> alarms might have gone off as soon as they saw the hack, and they might
> have thought the solution itself was somehow contaminated by it.
> Hence the somewhat longer "change of variable" construction shown.)
>
> Extra credit: Imagine yourself using a language like Pascal,
You sadist!
> where arrays can have any index ranges you like. Figure out the
> parent->child and child->parent navigation for a heap in an array
> whose indices start at [-42]. Comment on the complexity of your
> solution.
Computational complexity identical, but entirely lacking elegance.
Funnily enough, I'm going to venture into user-space today or next
week and have to use [2]-based addressing for something, gack.
For reference, I think better 0-based.
Floor-wise, the 1st floor is 1 floor above the ground floor.
And I'm with Bourbaki regarding the natural numbers.
One can reduce to:
#define xxx_ctype_invoke(f, a) f((unsigned int)a)
#define xislower(a) xxx_ctype_invoke(islower, a)
#define xisupper(a) xxx_ctype_invoke(isupper, a)
/* ... */
?
>> another. Some people think better 1-based, and others 0-based; if that
>> extra symbol is in the wrong place then it's more than trivial.
>>
>> And if an algorithm is being adapted from elsewhere, it's best to stick
>> to its original array base, if you want it to still work.
>
> Okay. Given a C array, zero-based by its nature, how would
> you apply the Heapsort algorithm to it if you insist on the 1-based
> formulation? The only route I can see is to malloc() (or in C99,
> VLA) an array one slot larger, copy the original data into the new
> array with a plus-one offset, Heapsort the tail of the new array,
> copy the sorted data back, and (if not VLA-ed) free() the scratch
> array.
You mean, given an array which is already indexed from [0], to apply a
1-based algorithm to it in an efficient manner?
What about creating a pointer to the array, pointing at the [-1] element? Or
if it's already a pointer, to temporarily subtract 1 from it:
int array[]={1,2,3,4,5,6,7,8,9,10};
int *onearray;
int i;
onearray = array-1;
for (i=1; i<=10; ++i) printf("%2d: %d\n",i,onearray[i]);
In practice you'd just subtract 1 from any array passed to a function
implementing such an algorithm.
--
Bartc
(My analogy was not quite right. A more apt one is a building with 256
floors, numbered normally from 0 (European ground floor) up to 127. Then the
next floor is at -128! Counting then continues up towards floor -1, which is the 256th floor.
This is workable (although a bit awkward, for example, when you want to know
whether one floor is above or below another), but undeniably "perverse".)
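To see that ordering directly, a tiny sketch, assuming 8-bit char and a
two's complement machine where the out-of-range conversion simply wraps
(the conversion is implementation-defined in general):

#include <stdio.h>

int main(void)
{
    unsigned int b;
    /* "Floors" 0..255: read each byte pattern as a signed char and
       watch the sequence run 0..127, then -128 up to -1. */
    for (b = 0; b < 256; b++)
        printf("floor %3u reads as %4d\n", b, (int)(signed char)b);
    return 0;
}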
--
Bartc
> You mean, given an array which is already indexed from [0], to apply a
> 1-based algorithm to it in an efficient manner?
>
> What about creating a pointer to the array, pointing at the [-1] element?
That would provoke undefined behaviour.
> Or if it's already a pointer, to temporarily subtract 1 from it:
That might provoke undefined behaviour, depending on where the
pointer pointed.
> int array[]={1,2,3,4,5,6,7,8,9,10};
> int *onearray;
> int i;
>
> onearray = array-1;
Undefined behaviour. BOOM. Or, more worryingly, no boom /today/.
Attempting to form a pointer to a place that does not exist before
an array is undefined, just as forming a pointer more than 1 past
the upper end is undefined, just as dereferencing a pointer one
past the upper end of an array is undefined.
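A conforming way to get the same convenience is to leave the pointer
alone and fold the offset into each subscript instead. A minimal sketch
(the AT1 name is invented for illustration):

#include <stdio.h>

/* 1-based access without ever forming array-1: the subtraction lives
   in the index expression, so no out-of-range pointer is created. */
#define AT1(a, i) ((a)[(i) - 1])

int main(void)
{
    int array[] = {1,2,3,4,5,6,7,8,9,10};
    int i;

    for (i = 1; i <= 10; ++i)
        printf("%2d: %d\n", i, AT1(array, i));
    return 0;
}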
--
"My name is Hannelore Ellicott-Chatham. I *end messes*." Hannelore,
/Questionable Content/
Hewlett-Packard Limited registered office: Cain Road, Bracknell,
registered no: 690597 England Berks RG12 1HN
How does that help? It may just be insufficient coffee, but I can't
see how converting a signed char to a large positive number meets the
requirements imposed by the ctype functions. In fact, it seems to add
another problem if the large unsigned value can't be represented as an
int. That's not UB, but since the conversion can do anything
including raising a signal it is almost as troublesome as UB.
--
Ben.
Yes, exactly. Can you use Heapsort on a C array, or must
you simply write off the algorithm as un-C-worthy because
"some people think better 1-based?"
> What about creating a pointer to the array, pointing at the [-1]
> element? Or if it's already a pointer, to temporarily subtract 1 from it:
That's "the illegal ptr=array-1 hack" I mentioned in an
earlier message. It is the subject of Question 6.17 in the
comp.lang.c Frequently Asked Questions (FAQ) list, found at
<http://www.c-faq.com/>. Perhaps it's time you renewed your
acquaintance with the FAQ.
--
Eric Sosman
eso...@ieee-dot-org.invalid
OK. You have a deadline to meet, and you are 100% certain your architecture
has no problem with manipulating such pointers, provided they are dereferenced
with the correct offset or index.
What would you do? Make a quick workaround such as my suggestion (assuming
it worked), start reallocating and copying large arrays (and hoping no
pointers existed to part of the original array), or start rewriting someone
else's complicated algorithm?
--
Bartc
I am less duplicitous than you suppose, and perhaps than
you deserve.
> [...] I already understand perfectly well how
> a heap works. Quite what gave you the impression otherwise, I know not.
Your own words gave me that impression: "The heap is defined
by the heap property, and [0] doesn't have that property," which
suggests to me that you have at best a muddled idea of how a
heap works.
> [... concerning the "great" complication of inserting one
> addition and one subtraction ...]
> Fortunately I don't boggle at the complication, I boggle at the
> programmer who would choose that lack of simplicity.
I boggle at the programmer who would reject an algorithm
simply because the numeric values of array indices are not the
same as the numeric values of the subscripts in a textbook.
See also "The Fox and the Grapes."
>> Extra credit: Imagine yourself using a language like Pascal,
>
> You sadist!
>
>> where arrays can have any index ranges you like. Figure out the
>> parent->child and child->parent navigation for a heap in an array
>> whose indices start at [-42]. Comment on the complexity of your
>> solution.
>
> Computational complexity identical, but entirely lacking elegance.
... so if somebody handed you a minus-42-based array your
response would be "I refuse to Heapsort it because it's not
elegant enough for me?"
Algorithms are algorithms. Implementations are realizations
of them. There is no compelling reason to transliterate notation
between the two domains when the notation creates inconveniences.
If the algorithm is described in terms of a_sub_j and you are
implementing it with array references jimjam[k], there is no
fundamental reason to insist on k==j, just as there is no reason
to rename the array from jimjam[] to a[]. Notation is not Fate.
--
Eric Sosman
eso...@ieee-dot-org.invalid
> Richard Heathfield <r...@see.sig.invalid> writes:
>> In <87vdizm...@kilospaz.fatphil.org>, Phil Carmody wrote:
>>
>> <snip>
>>
>>> To the
>>> majority of English speakers, normal doesn't need to imply anything
>>> more than just being an unexceptional thing, not even majority is
>>> required;
>>
>> In the UK, blue eyes are normal. A trifle unusual, perhaps, but still
>> normal. Brown eyes are normal, too. So are greenish-grey eyes, and
>> various other mucky and hard-to-describe colours. But pink eyes are
>> definitely abnormal. AFAIK I have never met a person with pink eyes,
>> but they do happen.
>
> Good examples. I was going to use "in the 21st century, in civilised
> western society (NTS fallacy covered), it's perfectly normal to be
> homosexual". Eyes works even better. Richard, meet Anna sitting next
> to me, she has pink eyes. (And says 'grrr!'.)
>
It is if you're a complete idiot. "Normal" in the context normally used
FREQUENTLY means the "majority".
e.g It is normal for people to look left and right when crossing the
road.
I find it disturbing that you two are rubbing up against each other and
gloating about another show of outstanding pedantic dickness.
> On 2009-10-01, Antoninus Twink <nos...@nospam.invalid> wrote:
>> On 1 Oct 2009 at 19:44, Lowell Gilbert wrote:
>>> Getting back to the topic from the subject line, I think Kenny
>>> McCormack and Antoninus Twink are existence proofs that a character
>>> *can* be negative.
>>
>> Why did you choose to single out Kenny and me rather than any of the
>> scuzzbuckets engaged in a nasty, personal, long-running bullying
>> campaign against Jacob? - Heathfield, Mackintyre, Rosenau, "Teapot",
>> Carmody to name just a few.
>
> Probably because I've seen nothing to indicate any kind of "bullying
> campaign", just comments and/or criticism he doesn't always deal with
> from an engineering mindset. :)
Then you need to learn to read or actually look at the threads. This
newsgroup is quite unique in technical circles: nasty, pedantic god
botherer types who like nothing more than to bully and belittle others
who challenge for their self perceived dominant role in the group.
--
"Avoid hyperbole at all costs, its the most destructive argument on
the planet" - Mark McIntyre in comp.lang.c
> Richard Heathfield <r...@see.sig.invalid> writes:
>> In <ha1imu$j51$1...@news.albasani.net>, BGB / cr88192 wrote:
>> <snip>
>>>> "BGB / cr88192" <cr8...@hotmail.com> writes:
>>>>>
>>>>> [char] is normally signed, since this is what a majority of the
>>>>> compilers on a majority of the common architectures do.
>>
>> <snip>
>>
>>> normal == "common as to the point of not typically being
>>> considered".
>>
>> IBM mainframes are so rare as to be not typically considered? That
>> will come as a huge surprise to the very many C programmers for whom
>> they are a daily reality.
>
> We'd better warn the cardiac unit at the local hospital then!
>
> Phil
What's with the "we"? Have we a new tag team in Carmody and Heathfield?
Kind of creepy.
>> You mean, given an array which is already indexed from [0], to apply a
>> 1-based algorithm to it in an efficient manner?
>
> Yes, exactly. Can you use Heapsort on a C array, or must
> you simply write off the algorithm as un-C-worthy because
> "some people think better 1-based?"
I'm not familiar with heapsort (for that matter, I rarely use any sort of
sort, if that makes sense).
Ideally these algorithms need to be carefully converted to 0-based, if one
has the time. But I've found these base-conversions are error-prone.
>> What about creating a pointer to the array, pointing at the [-1] element?
>> Or if it's already a pointer, to temporarily subtract 1 from it:
>
> That's "the illegal ptr=array-1 hack" I mentioned in an
> earlier message.
Oh.
> It is the subject of Question 6.17 in the
> comp.lang.c Frequently Asked Questions (FAQ) list, found at
> <http://www.c-faq.com/>. Perhaps it's time you renewed your
> acquaintance with the FAQ.
OK. It's illegal because the Standard can't guarantee it will work on every
machine. So it was not a good idea to suggest it on Usenet, but presumably an
individual programmer can choose to ignore the restriction for his machine,
if that's balanced by some obvious benefit (e.g. less work).
(BTW I agree it's a bit of a hack. I once had to implement a language
feature for allocated arrays where the lower-bound could be anything. The
choice was for the array pointer to always point to a phantom element 0, or
to the first element but at a cost of an extra offset when indexing. I chose
the latter; I thought having pointers point into the middle of other variables
was an untidy way of doing things.)
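For illustration, the second of those choices might look something like
this (the struct and names below are invented for the sketch, not from
the actual implementation being described):

#include <stdio.h>

/* An "array" whose lower bound can be anything: keep a pointer to the
   real first element plus the lower bound, and apply the offset when
   indexing. */
struct lbarray {
    int *first;   /* points at the element with index 'low' */
    int  low;     /* lower bound chosen by the user */
    int  len;
};

static int *lb_at(struct lbarray *a, int i)
{
    return &a->first[i - a->low];   /* extra subtraction on every access */
}

int main(void)
{
    int storage[5] = {10, 20, 30, 40, 50};
    struct lbarray a = { storage, -2, 5 };   /* indices -2 .. 2 */

    printf("%d\n", *lb_at(&a, -2));          /* prints 10 */
    printf("%d\n", *lb_at(&a, 2));           /* prints 50 */
    return 0;
}

The extra subtraction per access is the price of never holding a pointer
that points outside any object.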
--
Bartc
I would adopt the subscript-to-index mapping described in
excruciating detail up-thread. I'd encourage anyone who finds
the mapping "complicated" to find another line of work. Seriously.
If you can't handle this stuff, and handle it pretty easily,
you're not cut out to be a programmer.
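For anyone who wants that mapping in running code, a minimal max-heap
sift-down for a zero-based int array (a sketch, not taken from any
particular textbook):

#include <stddef.h>

/* Restore the max-heap property for the subtree rooted at [k], using
   the 0-based navigation: children of [k] live at [2*k+1] and [2*k+2],
   the parent of [k] at [(k-1)/2]. */
static void sift_down(int *a, size_t n, size_t k)
{
    for (;;) {
        size_t child = 2*k + 1;               /* left child */
        if (child >= n)
            break;                            /* [k] is a leaf */
        if (child + 1 < n && a[child + 1] > a[child])
            child++;                          /* take the larger child */
        if (a[k] >= a[child])
            break;                            /* heap property holds */
        {
            int tmp = a[k];
            a[k] = a[child];
            a[child] = tmp;
        }
        k = child;
    }
}

Building the whole heap is then just a loop calling sift_down(a, n, k)
for k from n/2 - 1 down to 0.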
Thinks: Didn't this sub-thread get started because you, bartc,
insisted that `char' should be unsigned so a `char'-indexed array
could begin at [0]? And now you're insisting that heaps should
only be built in arrays that start at [1]? All to avoid a simple
plus or minus in an index expression? What other tasks do you
imagine "force" you to use one index origin or another?
No, when you've reached the 127th floor your only way to go
higher is to use a balloon. If you're on the 127th and you want
to get to the -128th, go down.
Yes, and sometimes it doesn't. Making the term ambiguous, and leaving the
reader a free choice of how to interpret it.
> e.g It is normal for people to look left and right when crossing the
> road.
I wouldn't consider that to be a "majority" usage. But then, I live in
a small town.
> I find it disturbing that you two are rubbing up against each other and
> gloating about another show of outstanding pedantic dickness.
Well, here you are gloating about your interpretation of words, too. :)
I read quite well, thank you. I do admit to not being very responsive
to some kinds of "status" cues -- but that's caused me to study how people
perceive them, and learn that many people appear to misinterpret some
kinds of tone-free communication as though it were arrogant.
> This
> newsgroup is quite unique in technical circles : nasty, pedantic god
> botherer types who like nothing more than to bully and belittle others
> who challenge for their self perceived dominant role in the group.
This language reminds me a great deal of language I've seen from other people
who are so extremely status-aware that they aren't aware that their status
awareness is not an external reality, but an internally-generated
interpretation of the world.
In short: I doubt that Richard Heathfield is thinking in terms of a "dominant
role". He comes across as a bit pedantic and focused on technical matters
rather than persons. That may be callous at times, and I certainly know that
I sometimes hurt people by not thinking about them when I'm busy thinking
about interesting technical questions. However, it also pretty much
excludes any concept of a "self-perceived dominant role".
Your theory here is based very much on speculation as to motive, and I would
suggest that you check out your evidence for that speculation carefully. Not
everyone you meet on Usenet has the same emotional or social responses you
do.
Yes - I think there's no longer any doubt that Carmody is the latest
occupant of the spot on the branch behind Heathfield where he can pick
the fleas from the alpha male's coat.
This group is truly a case study in sociology. Heathfield always says
that the way relationships involving him develop is nothing to do with
his personality, but we see exactly the same things time and again with
him!
Eventually there's no way you can believe it's just coincidence. He's
simply a magnet both for fan-boys and for anti-fan-boys.
On the submissive side, there have been a whole string of lackeys
wanting to gain Heathfield's favor by attacking those who can see
through him and obsequiously embracing his eccentric opinions, e.g. that
C99 is evil.
And on the other side, there have now been so many people over so many
years who've had a personality clash with Heathfield that it's just not
believable that he's really Mr Nice Guy, and all these other people are
psychopaths.
It's clear that Heathfield is a damaged individual, and his personal
issues have been the cause of deep problems in this group.
in the US those who speak Spanish are normally Mexicans...
not that there are not other people who speak Spanish (such as Cubans,
people from Spain, ...), but their presence in the US can be largely ignored
given the strong majority held on this front by the Mexicans.
either way, people in the US, Mexican or not, normally speak English (except
in the minority of cases where they are not, as one can note via Univision
and Telemundo and similar, or in grocery stores, ...).
a strong majority rules de-facto.
granted, in this usage, normal is essentially a synonym for common, although
it is stronger as there is a difference in terms of the level of majority
implied in each case.
for example, people are commonly right handed (for example, I am outside
this group in being left-handed), however, computers are normally x86
(32/64) and run Windows.
it is not an axiom, simply a probability...
>
>> or, maybe, people normally speak Japanese in Japan.
>
> a better example
>
maybe.
>
> <snip>
It's not at all obvious to me that it's always "stronger".
Examples:
* Many people would argue that a particular human experience of grief is
"normal", although particular experiences or feelings may occur in only
a few percent of the population.
* There is fierce debate over whether or not it is reasonable to call
homosexuality "normal", but it seems to occur in somewhere between 3 and
10 percent of the population.
* Depending on context, people might describe left-handedness as either
"abnormal" or "perfectly normal".
The real problem, which these cases highlight, is the confusion between
statistical norms and normative norms. Even among the statistical
norms, there is ambiguity about whether you mean "this is the most common
case" or "this case is common enough that it is not cause for special
remark". For instance, left-handedness is not so uncommon that you would
be likely to wonder why someone is left-handed, but blindness is rare
enough that you might well be curious about it -- it's sufficiently atypical
to merit some kind of search for an explanation.
> for example, people are commonly right handed (for example, I am outside
> this group in being left-handed), however, computers are normally x86
> (32/64) and run Windows.
> it is not an axiom, simply a probability...
The vast majority of devices which run code written in C are not x86 devices
running Windows.
>> for example, people are commonly right handed (for example, I am outside
>> this group in being left-handed), however, computers are normally x86
>> (32/64) and run Windows.
>
>> it is not an axiom, simply a probability...
>
> The vast majority of devices which run code written in C are not x86
> devices
> running Windows.
By computers he probably meant desktop and laptop PCs, which in the consumer
world still seem to mostly use Windows+x86.
(Ten years ago (when I stopped working), probably 99.9% of my clients used
Windows+x86. The very few with Macs could emulate Windows, but the ones with
Linux were out of luck! You can take it that portability wasn't a high
priority for me.)
--
Bartc
Perhaps!
> (Ten years ago (when I stopped working), probably 99.9% of my clients used
> Windows+x86. The very few with Macs could emulate Windows, but the ones with
> Linux were out of luck! You can take it that portability wasn't a high
> priority for me.)
Yeah.
These days, though... I think Macs are about as common as left-handers, gays,
and other instances of "perfectly normal" statistical abnormalities. :)
What language was that post in? Did it in some way address the
points raised in my post?
Yours curiously,
Phil
--
Your interpretation of the English language is perverse to say the
least.
"I'll have what he's having <slump>" might me closer.
>> [... concerning the "great" complication of inserting one
>> addition and one subtraction ...]
>> Fortunately I don't boggle at the complication, I boggle at the
>> programmer who would choose that lack of simplicity.
>
> I boggle at the programmer who would reject an algorithm
Bzzzt.
Your failure to understand has been expressed painfully clearly;
no point continuing.
Have you ever noticed that Macs are _precisely_ as common as
left-handers and gays... . ;-)
(Or, as I like to call them, the sinister side of society...)
Phil
--
Yes, although oddly, I know gay mac users, left-handed mac users, and
left-handed gays, but no left-handed gay mac users.
There is clearly something wrong.
Maybe there's a parity check?
It guards against accidentally forgetting the calling
requirements of the isXXX functions, which specify
taking a character argument that's been converted to (unsigned char)
(and perhaps thence to (int), but that's done automatically
by the calling conversions). Any (char) is representable as
an (unsigned char).
A disadvantage of the above is that it might do the wrong
thing with an EOF value. But there's no absolutely
foolproof way out of that dilemma.
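One common way to avoid the dilemma in the first place is to keep the
value in an int from the moment it is read, so EOF never meets the
cast; a sketch:

#include <ctype.h>
#include <stdio.h>

int main(void)
{
    int ch;                         /* int, not char, so EOF stays distinct */

    while ((ch = getchar()) != EOF) {
        if (islower(ch))            /* ch is already a valid argument:   */
            putchar(toupper(ch));   /* an unsigned char value, never EOF */
        else
            putchar(ch);
    }
    return 0;
}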
I know the /intent/ of the code, it was the /effect/ I was having
trouble with.
Given char c; when char is signed and c holds a negative value,
islower(c) is an error. I think we both agree up to this point. I
think we also agree that islower((unsigned char)c) is safe and
correct.
I am having trouble with islower((unsigned int)c). Surely the
unsigned int that results from this cast might not even be
representable as an int (the type given in the prototype for islower).
Even if this conversion from unsigned int to int passes your
portability requirements, the value is very unlikely to be "an int, the
value of which shall be representable as an unsigned char" as required
by the ctype functions.
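To put numbers on it, assuming an 8-bit signed char, 32-bit int and
two's complement:

#include <stdio.h>

int main(void)
{
    char c = -2;
    /* The cast the ctype functions want: */
    printf("%u\n", (unsigned)(unsigned char)c);  /* prints 254 */
    /* The cast as posted: */
    printf("%u\n", (unsigned int)c);             /* prints 4294967294 */
    return 0;
}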
--
Ben.
Sorry, I completely missed the point of your earlier posting.
Using ((unsigned int)a) rather than ((unsigned char)a), which is
of course what I hallucinated that it said, doesn't lead to a
good result in general. In fact, the most likely result
(considering the number of two's complement systems out there)
will be no different than if the functions were called directly
with no conversion of their arguments (except the conversion
implied by the parameter's type).
Presumably Chris simply made a mistake and meant
to use ((f)( (unsigned char)(a) )) instead.
Yes. It was a typo. Sorry about that.
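For the record, the sketch with the intended cast (same hypothetical
xxx_ prefix as before):

#define xxx_ctype_invoke(f, a) ((f)((unsigned char)(a)))
#define xislower(a) xxx_ctype_invoke(islower, a)
#define xisupper(a) xxx_ctype_invoke(isupper, a)
/* ... */

(Still subject to the EOF caveat mentioned up-thread, of course.)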
Should be unsigned. (EBCDIC has standard characters with the high bit
set.)
--
dik t. winter, cwi, science park 123, 1098 xg amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
Yes, that was a typo, noticed and acknowledged several days ago.
--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Not so very long ago the vast majority of desktops at my institute were
running some form of Unix (Irix, Solaris, Linux and whatever). It is only
in the last few years that the proportion of Windows machines has been increasing.
> In article <LOsxm.101046$OO7....@text.news.virginmedia.com> "bartc" <ba...@freeuk.com> writes:
> ...
> > (Ten years ago (when I stopped working), probably 99.9% of my clients used
> > Windows+x86. The very few with Macs could emulate Windows, but the ones with
> > Linux were out of luck! You can take it that portability wasn't a high
> > priority for me.)
>
> Not so very long ago the vast majority of desktops at my institute were
> running some form of Unix (Irix, Solaris, Linux and whatever).
Yeah, but that's the CWI. That's hardly your average business, or even
your average institution. You're talking about people who _know_
computers, here, not about people who merely use them a lot.
Richard