
sorting 5 million minute data: choose std::list or std::vector


MM
Feb 2, 2011, 9:53:07 AM

I have about 5 million rows like these (day, minute, and 4 doubles):

YYYYMMDD, HHMM, double, double, double, double

which I parse from a file and store into

struct one_line {
boost::uint32_t day;
boost::uint32_t minute;
double d1, d2, d3, d4;
};

I may need to sort the container once the data is parsed, based on
day and minute.

Would sorting a std::list<one_line> be faster than std::vector<one_line>?
Or is it best to use std::map<Key, Value> where Key is
std::pair<boost::uint32_t, boost::uint32_t> (day and minute)?

regards,

Leigh Johnston
Feb 2, 2011, 10:09:31 AM

std::list and std::map are both node based containers which involve a
separate allocation for each element whilst std::vector involves a
single allocation for all the elements it contains; this may be a
consideration when you have a large number of elements.

As far as sorting speeds are concerned it can depend on the size of the
element in question (whether swapping std::list node pointers is faster
than value swapping vector elements); for example if the element type
was "int" rather than your struct sorting std::vector could be an order
of magnitude faster than sorting std::list. You should run a test
benchmark to determine what is best.

std::set may be more appropriate than std::map; std::multiset may be
more appropriate than std::set if more than one row can have the same
"day" and "minute" values.

/Leigh

Leigh Johnston
Feb 2, 2011, 10:31:07 AM

On VC++ I find sorting a std::vector of your struct to be twice as fast
as sorting a std::list of your struct.

/Leigh

MM
Feb 2, 2011, 10:51:04 AM

"Leigh Johnston" <le...@i42.co.uk> wrote in message
news:t9qdnZxybYck5dTQ...@giganews.com...
Strange; I guess it depends on how unordered the list was.
Both sorts are N log(N), aren't they?
I'd think moving pointers is faster than moving 4 doubles and 2 ints by value?

thanks,


Leigh Johnston
Feb 2, 2011, 10:57:02 AM

Yes they both have the same complexity but constant factors can come
into play. :)

My test was of 5000000 records of random data (very unordered).

If you keep increasing the size of the struct eventually std::list::sort
should overtake std::sort of std::vector.

/Leigh

Pete Becker
Feb 2, 2011, 11:38:22 AM

The structure copies can be done with a single call to memcpy; the
pointer copies have to be done separately.

--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
Standard C++ Library Extensions: a Tutorial and Reference"
(www.petebecker.com/tr1book)

Juha Nieminen
Feb 2, 2011, 12:18:25 PM

MM <finju...@gmail.com> wrote:
> Would sorting a std::list<one_line> be faster than std::vector<one_line>?

Is it really that hard to test it?

Juha Nieminen
Feb 2, 2011, 12:21:06 PM

MM <finju...@gmail.com> wrote:
> strange, I guess it depends on how unordered the list was.
> both sorts are Nlog(N) , aren't they?
> I'd think moving pointers is faster than value moving 4 doubles and 2 ints?

A doubly-linked list takes significantly more memory than a vector
(especially for such a relatively small element type), and the elements
of the list will usually end up being significantly more dispersed in
RAM while the elements in a vector will be at contiguous memory locations.
This means that sorting (or even just traversing) the list will usually
cause significantly more cache misses than a vector.

Bo Persson
Feb 2, 2011, 2:35:44 PM

Note that you have to update 6 pointers to move a node in a doubly
linked list. :-)

Often std::vector has an edge by being more cache friendly. The
elements are adjacent. And smaller than list nodes.

Bo Persson


Jorgen Grahn
Feb 3, 2011, 3:24:15 PM

AOL. Test it and report your results here. We don't really know what
will work best, or if the difference is big enough to matter.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Jorgen Grahn
Feb 3, 2011, 3:26:13 PM

On Wed, 2011-02-02, MM wrote:
> I have about 5million rows like these (day, minute, and 4 doubles)
>
> YYYYMMDD, HHMM, double, double, double, double
>
> which I parse from a file and store into
>
> struct one_line {
> boost::uint32_t day;
> boost::uint32_t minute;
> double d1, d2, d3, d4;
> };

Nitpick: don't call HHMM 'minute', or you'll be surprised one day when
you forget that 102 doesn't mean 102 minutes but 62 minutes.

Yannick Tremblay
Feb 4, 2011, 9:39:22 AM

In article <slrnikm3r3.1...@frailea.sa.invalid>,

Jorgen Grahn <grahn...@snipabacken.se> wrote:
>On Wed, 2011-02-02, MM wrote:
>> I have about 5million rows like these (day, minute, and 4 doubles)
>>
>> YYYYMMDD, HHMM, double, double, double, double
>>
>> which I parse from a file and store into
>>
>> struct one_line {
>> boost::uint32_t day;
>> boost::uint32_t minute;
>> double d1, d2, d3, d4;
>> };
>
>Nitpick: don't call HHMM 'minute', or you'll be surprised one day when
>you forget that 102 doesn't mean 102 minutes but 62 minutes.

Additional nitpick: don't use a bit-width-specified integer unless you
*need* to know the size of the integer in bits. A normal "unsigned int"
would have been perfectly fine here.

James Kanze
Feb 4, 2011, 12:08:41 PM

On Feb 4, 2:39 pm, ytrem...@nyx.net (Yannick Tremblay) wrote:
> In article <slrnikm3r3.15e.grahn+n...@frailea.sa.invalid>,
> Jorgen Grahn <grahn+n...@snipabacken.se> wrote:

Normal int would have been perfectly fine here. On the other
hand, if he's got 5 million of them, short or even signed char
might be more appropriate (although in practice, alignment
requirements will probably mean that he won't gain anything by
using a type smaller than int).

--
James Kanze

Leigh Johnston
Feb 4, 2011, 12:19:58 PM

Eschewing unsigned integral types is sometimes irrational. *If* it
makes no sense to have negative days or minutes then why use a signed
integral type?

/Leigh

Juha Nieminen
Feb 5, 2011, 4:24:12 AM

Leigh Johnston <le...@i42.co.uk> wrote:
> Eschewing unsigned integral types is sometimes irrational. *If* it
> makes no sense to have negative days or minutes then why use a signed
> integral type?

Because if you are calculating the difference of two dates, you will
have to deal with the wrap-around problem if you are using unsigned
types, usually by explicitly casting the integrals being compared to
signed (which kind of defeats the whole purpose of them being unsigned
in the first place).

There might not be negative days in dates, but there may well be when
calculating differences between dates.

Leigh Johnston
Feb 5, 2011, 7:57:15 AM

Which is why I said "*If*" in my reply. On the implementation I use,
static_cast<int> has no overhead, and unlike some people I don't mind the
verbosity of the C++ style casts.

From my code:

typedef int pixel_pos;
typedef std::size_t pixel_size;

struct dimensions
{
    dimensions() : cx(0), cy(0) {}
    dimensions(pixel_size aCx, pixel_size aCy) : cx(aCx), cy(aCy) {}
    pixel_size cx;
    pixel_size cy;
    friend dimensions operator*(const dimensions& lhs, std::size_t amount)
    { return dimensions(lhs.cx * amount, lhs.cy * amount); }
};

struct coordinate
{
    coordinate() : x(0), y(0) {}
    coordinate(pixel_pos aX, pixel_pos aY) : x(aX), y(aY) {}
    pixel_pos x;
    pixel_pos y;
};

struct rect : coordinate, dimensions
{
    rect() {}
    rect(const coordinate& aCoordinate, const dimensions& aDimensions) :
        coordinate(aCoordinate), dimensions(aDimensions) {}
    operator RECT() const
    {
        RECT ret = { x, y, x + static_cast<pixel_pos>(cx),
                     y + static_cast<pixel_pos>(cy) };
        return ret;
    }
};

...
...

bool underMouse = aCursorPosition.x >= iconPos.x &&
    aCursorPosition.y >= iconPos.y &&
    aCursorPosition.x < iconPos.x + static_cast<pixel_pos>(iconSize.cx) &&
    aCursorPosition.y < iconPos.y + static_cast<pixel_pos>(iconSize.cy);


/Leigh

gwowen
Feb 7, 2011, 7:05:51 AM

On Feb 4, 5:19 pm, Leigh Johnston <le...@i42.co.uk> wrote:

> Eschewing unsigned integral types is sometimes irrational.  *If* it
> makes no sense to have negative days or minutes then why use a signed
> integral type?

There are sometimes good reasons. Unsigned overflow has defined
semantics; signed does not. If the bare silicon of the target
platform's registers does not follow the defined unsigned semantics,
using unsigned can result in a lot of unnecessary code generation
(checking for overflow and implementing correct behaviour)

I've written code for compilers/platforms for which

int find(char foo, const char arr[], size_t sz){
    for(int q = 0; q < sz; ++sz){
        if(arr[q] == foo) return q;
    }
    return sz;
}

produces *considerably* smaller code than

int find(char foo, const char arr[], size_t sz){
    for(unsigned q = 0; q < sz; ++sz){
        if(arr[q] == foo) return q;
    }
    return sz;
}

Of course, the latter has a larger domain of defined behaviour, but
in reality you may well know that you're never going to hit either
limit.

Leigh Johnston
Feb 7, 2011, 8:06:57 AM

Presumably you meant "++q" not "++sz"? In which case on VC++ I get:

00000 33 c0 xor eax, eax
00002 38 14 08 cmp BYTE PTR [eax+ecx], dl
00005 74 0b je SHORT $LN5@find1
00007 40 inc eax
00008 83 f8 06 cmp eax, 6
0000b 72 f5 jb SHORT $LL4@find1
0000d b8 06 00 00 00 mov eax, 6
00012 c3 ret 0

versus

00000 33 c0 xor eax, eax
00002 38 14 08 cmp BYTE PTR [eax+ecx], dl
00005 74 0b je SHORT $LN5@find2
00007 40 inc eax
00008 83 f8 06 cmp eax, 6
0000b 72 f5 jb SHORT $LL4@find2
0000d b8 06 00 00 00 mov eax, 6
00012 c3 ret 0

i.e. no difference between signed and unsigned versions.

Your use of an "int" return type in the second version looks dubious.

/Leigh

gwowen
Feb 7, 2011, 8:18:55 AM

On Feb 7, 1:06 pm, Leigh Johnston <le...@i42.co.uk> wrote:

>> I've written code for compilers/platforms for which [snip]
> [Your correct code fix snipped]


> In which case on VC++ I get:

Then I think we can safely say that VC++ was not the platform to which
I was referring, can't we? Sheesh, some people are so keen to be
thought of as smarter than everyone else in the whole world that they
end up looking like idiots.

Stick to arguing with Paul, at least you *are* probably smarter than
him.

Leigh Johnston
Feb 7, 2011, 8:42:22 AM

On 07/02/2011 13:18, gwowen wrote:
> On Feb 7, 1:06 pm, Leigh Johnston<le...@i42.co.uk> wrote:
>
>>> I've written code for compilers/platforms for which [snip]
>> [Your correct code fix snipped]
>> In which case on VC++ I get:
>
> Then I think we can safely say that VC++ was not the platform to which
> I was referring can't we? Sheesh, some people are so keen to be
> thought of as smarter than everyone else in the whole world, they end
> up looking like idiots.
>

No, we can safely assume that there is no difference, because your
examples were naive (q being promoted to unsigned in the comparison
q<sz in your "int" version).

If instead we compare the more realistic:

typedef int rubbish_size_type;

rubbish_size_type rubbish_find(char foo, const char arr[],
                               rubbish_size_type sz){
    for(rubbish_size_type q = 0; q < sz; ++q){
        if(arr[q] == foo) return q;
    }
    return sz;
}

versus

typedef std::size_t sensible_size_type;

sensible_size_type sensible_find(char foo, const char arr[],
                                 sensible_size_type sz){
    for(sensible_size_type q = 0; q < sz; ++q){
        if(arr[q] == foo) return q;
    }
    return sz;
}

we get:

00000 33 c0            xor  eax, eax
00002 38 14 08         cmp  BYTE PTR [eax+ecx], dl
00005 74 0b            je   SHORT $LN5@rubbish_fi
00007 40               inc  eax
00008 83 f8 06         cmp  eax, 6
0000b 7c f5            jl   SHORT $LL4@rubbish_fi
0000d b8 06 00 00 00   mov  eax, 6
00012 c3               ret  0

versus

00000 33 c0            xor  eax, eax
00002 38 14 08         cmp  BYTE PTR [eax+ecx], dl
00005 74 0b            je   SHORT $LN5@sensible_f
00007 40               inc  eax
00008 83 f8 06         cmp  eax, 6
0000b 72 f5            jb   SHORT $LL4@sensible_f
0000d b8 06 00 00 00   mov  eax, 6
00012 c3               ret  0

Here the only difference is Intel opcode jl versus Intel opcode jb,
which IIRC have the same performance characteristics.

[troll snipped]

/Leigh

gwowen
Feb 7, 2011, 8:47:50 AM

On Feb 7, 1:42 pm, Leigh Johnston <le...@i42.co.uk> wrote:

> Here the only difference is intel opcode jl versus intel opcode jb which
> IICR have the same performance characteristics.

I'm not talking about an Intel processor, so the efficacy of Intel
opcodes is utterly irrelevant.

Leigh Johnston
Feb 7, 2011, 8:58:00 AM

It was simply a balanced, pertinent counter-example to your claim of
multiple platforms you have worked on producing *considerably* more code
for the unsigned version. Like it or not the Intel platform is quite
pervasive (not some esoteric platform that nobody cares much about).

Can you produce some assembler backing up your claim (using my version
of the functions)?

/Leigh

gwowen
Feb 7, 2011, 9:20:42 AM

On Feb 7, 1:58 pm, Leigh Johnston <le...@i42.co.uk> wrote:
> It was simply a balanced, pertinent counter-example to your claim of
> multiple platforms

If you think "one platform agrees with me" provides a counterexample...

> Like it or not the Intel platform is quite pervasive (not some esoteric platform that nobody cares much about).

I know that. In fact, that was entirely my point. I was talking
about an esoteric platform - specifically a custom DSP ASIC whose
default signed integer arithmetic saturates (because on embedded DSPs,
that is almost always what you want signed arithmetic to do - clip
your signal rather than wrap it around).

So every time you do an unsigned addition on that processor, the C
standard insists that you check for overflow and correct the results
if overflow is about to occur. If you use signed arithmetic, the
compiler just loads the registers and calls the appropriate
instruction. (In fact, if memory serves, unsigned arithmetic was done
using a subroutine, while signed arithmetic was a single opcode,
although I may have misremembered that.)

As to specific cases where the compiler can use the "overflow is
undefined" to optimize, with assembler generated, your attention is
drawn to http://www.airs.com/blog/archives/120

Leigh Johnston
Feb 7, 2011, 9:42:42 AM

The implementation specific behaviour of some esoteric platform (or any
specific platform for that matter) should not necessarily direct us as
to what constitutes good C++ coding practice.

Unsigned integral types exist and eschewing them is *usually* irrational
but *not always* (as is the case with your esoteric platform). For
example std::size_t is an unsigned integral type and is quite pervasive
in the C++ Standard Library (see std::allocator).

/Leigh

gwowen
Feb 7, 2011, 10:09:22 AM

On Feb 7, 2:42 pm, Leigh Johnston <le...@i42.co.uk> wrote:

> The implementation specific behaviour of some esoteric platform (or any
> specific platform for that matter) should not necessarily direct us as
> to what constitutes good C++ coding practice.
>

Given that all I originally said was "There are sometimes good
reasons" -- the thing which you claimed to have disproved with a
counterexample [not sure how you disprove a "sometimes" with a
counterexample, but hey ho] -- it's good to see you've come around.

Unsigned integral types - I use them a lot, they're handy, especially
when I'm writing code for Intel and other general purpose processors
[though I bet there are some GPUs that work like DSPs for similar
reasons]. Eschewing (good word, by the way) them without a good reason
is pretty silly, but as I originally -- and correctly, as you now
agree -- said *THERE ARE SOMETIMES GOOD REASONS*.

Leigh Johnston
Feb 7, 2011, 10:47:54 AM

On 07/02/2011 15:09, gwowen wrote:

My original reply stated more or less what you claim I *now* agree with:

"Eschewing unsigned integral types is sometimes irrational."

The only difference being that you are using the word "silly" and I am
using the stronger word "irrational". :)

/Leigh

gwowen
Feb 7, 2011, 11:50:27 AM

On Feb 7, 3:47 pm, Leigh Johnston <le...@i42.co.uk> wrote:

> My original reply stated more or less what you claim I *now* agree with:

So what was your "counterexample" meant to demonstrate?

Leigh Johnston
Feb 7, 2011, 11:57:01 AM

It was simply to add some balance to your initial reply, in which you
mention compilers/platforms (plural) producing *considerably* smaller
code when not using unsigned, which I initially viewed as an argument.

/Leigh

Juha Nieminen
Feb 7, 2011, 12:01:11 PM

Leigh Johnston <le...@i42.co.uk> wrote:
> Which is why I said "*If*" in my reply. On the implementation I use
> static_cast<int> has no overhead and unlike some people I don't mind the
> verbosity of the C++ style casts.

It's not about verbosity. It's about clarity and the meaning of your
code. If you say that a variable is unsigned and then later you cast it
to signed, you *lied* in your variable declaration. That can only cause
confusion.

Leigh Johnston
Feb 7, 2011, 12:06:59 PM

Wrong. The cast is sometimes required when using the unsigned variable
in an expression involving signed variables (as clearly indicated in the
example code I posted); a cast is not required in other contexts. It
seems to me that the only confusion is your understanding of the issue.

/Leigh

Juha Nieminen
Feb 7, 2011, 12:19:33 PM

But that's exactly why you need to make the original variable signed:
because it can be used in signed expressions. It makes no sense to make
it unsigned when it will be used as a signed value anyway.

Whenever you need to cast an unsigned variable to a signed one in an
expression, that's a sign of bad design. You are using the variable in
a way that contradicts its original declaration.

Bo Persson
Feb 7, 2011, 12:21:17 PM

And now we know one platform (singular) where the code is not
considerably smaller?


Bo Persson


Leigh Johnston
Feb 7, 2011, 12:33:22 PM

Wrong; it is not a sign of bad design at all. If a variable is used as
an unsigned value more often than as a signed value and it logically
makes sense for it to be unsigned (e.g. it is never negative) then
making the variable unsigned makes perfect sense.

1) Unsigned integral types exist for a reason. Types are used to build
abstractions and having a richer set of building blocks is beneficial.
2) Casts exist for a reason. Type conversion is unavoidable (if
avoiding insanity).

You might be in the "use int everywhere" camp but I am not. Perhaps
Java is a more suitable language for you rather than C++.

/Leigh

Kaba
Feb 7, 2011, 4:58:35 PM

Leigh Johnston wrote:
> Wrong; it is not a sign of bad design at all. If a variable is used as
> an unsigned value more often than as a signed value and it logically
> makes sense for it to be unsigned (e.g. it is never negative) then
> making the variable unsigned makes perfect sense.

--8x--

> You might be in the "use int everywhere" camp but I am not. Perhaps
> Java is a more suitable language for you rather than C++.

For what it's worth, here's my take on this. It is a piece of text I
wrote earlier, which explains its essay form. I am in the 'use int
everywhere' camp :) Ideas can't and shouldn't be forced, but they can be
shared; the following text reflects how I reason my position, not that
the other camps are wrong.

In this article I present two problems in using unsigned
integers in a program. These are the _zero boundary problem_,
and the _extended positive range problem_.

Zero boundary problem
---------------------

Most of the time we are assuming that, away from boundaries caused
by the finite representation, an integer object works like an element
of the integer ring ZZ. In addition, it seems plausible that most
integer calculations are concentrated around a neighborhood of zero.

The _zero-boundary problem_ (of unsigned integers) is that the zero
is on the boundary beyond which the assumption of working with integers
falls apart. Thus, the probability of introducing errors in computations
increases greatly. Furthermore, those errors cannot be caught, since
every value is legal.

### Looping with unsigned integers

Looping over values from n to zero backwards demonstrates the zero
boundary problem with unsigned integers:

for (unsigned int i = n;i >= 0;--i)
{
// Do something.
}

Since there are no negative values to fail the loop test,
this loop never finishes. In contrast, with signed integers the problem
does not exist, since the boundaries are located far away from zero:

for (int i = n;i >= 0;--i)
{
// Do something.
}

Extended positive range problem
-------------------------------

Conversion between an unsigned integer and a signed integer
is an information destroying process in either direction.
The only way to avoid this is to use one or the other
consistently.

If the normal arithmetic is the intention, then a signed integer
represents a more general concept than an unsigned integer: the former
covers both the negative and positive integers, whereas the latter
only covers non-negative integers. In programming terms, it is
possible to create a program using signed integers alone, however,
the same can't be said about the unsigned integers. Therefore, if
only one type is to be used consistently, the choice should be a
signed integer. However, let us do some more analysis.

Since any non-trivial program must use signed integers, the use of
unsigned integers eventually leads to an unsigned-signed
conversion. In particular, because `std::size_t` in the Standard
Library is an unsigned integer, there are few programs that can
escape the unsigned-signed conversions.

Despite these conversions, programs still seem to work. The reason
for this, I reflect, is that the unsigned integers are normally
not taken into the extended positive range they allow.
The _extended positive range problem_ is that if the unsigned integers
are taken to their extended positive range, then the signed-unsigned
conversions become a reality. Ensuring correctness under such a threat
is hard, if not practically impossible.

Conclusion
----------

Using unsigned integers to model integers decreases the probability
that a program is correct.

--
http://kaba.hilvi.org

Leigh Johnston
Feb 7, 2011, 5:39:55 PM

It is debatable if the probability of introducing errors increases
greatly when using unsigned integral types; this is certainly an
assertion on your part, do you have any evidence to back it up?

>
> ### Looping with unsigned integers
>
> Looping over values from n to zero backwards demonstrates the zero
> boundary problem with unsigned integers:
>
> for (unsigned int i = n;i>= 0;--i)
> {
> // Do something.
> }

for (unsigned int i = n + 1;i-- > 0;)
{
// Do something.
}

Yes the horse named std::size_t has indeed already bolted. Stable door.

>
> Despite these conversions, programs still seem to work. The reason
> for this, I reflect, is that the unsigned integers are normally
> not taken into the extended positive range they allow.
> The _extended positive range problem_ is that if the unsigned integers
> are taken to their extended positive range, then the signed-unsigned
> conversions become a reality. Ensuring correctness under such a threat
> is hard, if not practically impossible.

You ensure correctness as you would ensure correctness using any other
language feature.

>
> Conclusion
> ----------
>
> Using unsigned integers to model integers decreases the probability
> that a program is correct.
>

Bugs can be created using any language feature therefore it is, IMO,
wrong to eschew a particular language feature based on the possibility
of bug creation unless there is clear evidence that the language feature
in question has a high probability of bug creation (for some definition
of "high").

What you have presented is little more than conjecture IMO.

/Leigh

Kaba
Feb 7, 2011, 6:47:09 PM

Leigh Johnston wrote:
> On 07/02/2011 21:58, Kaba wrote:
> > The _zero-boundary problem_ (of unsigned integers) is that the zero
> > is on the boundary beyond which the assumption of working with integers
> > falls apart. Thus, the probability of introducing errors in computations
> > increases greatly. Furthermore, those errors cannot be caught, since
> > every value is legal.
>
> It is debatable if the probability of introducing errors increases
> greatly when using unsigned integral types; this is certainly an
> assertion on your part, do you have any evidence to back it up?

Consider any function taking in an unsigned integer (e.g.
std::vector::resize()). It is possible that the computation of a new
size ends up, through a bug, being a negative number, which then wraps
to a large positive number. This bug can't be caught, since every value
is legal.

Something similar can happen with signed integers: a large negative
number wraps to large positive number. But this is much less probable:
the action is concentrated around zero.

Correctness is the most important property of a program. In the
described case, using a signed integer makes an often-occurring bug
visible.

The problem of unchecked bugs gets worse with increasing levels of
abstraction. In the worst case you go through a deep layer of functions,
with an unexplained crash, although the actual problem was that an
argument of the top-most function did not fulfill its precondition (e.g.
size < 0). It's quite comparable to the thousand-lines template errors
when using Boost.Spirit.

> > ### Looping with unsigned integers
> >
> > Looping over values from n to zero backwards demonstrates the zero
> > boundary problem with unsigned integers:
> >
> > for (unsigned int i = n;i>= 0;--i)
> > {
> > // Do something.
> > }
>
> for (unsigned int i = n + 1;i-- > 0;)
> {
> // Do something.
> }

Nice:) However, the point is that to carry out a task you have to be
careful to avoid a bug. That's something that should not be left to
humans (because by Murphy's law...).

> > Despite these conversions, programs still seem to work. The reason
> > for this, I reflect, is that the unsigned integers are normally
> > not taken into the extended positive range they allow.
> > The _extended positive range problem_ is that if the unsigned integers
> > are taken to their extended positive range, then the signed-unsigned
> > conversions become a reality. Ensuring correctness under such a threat
> > is hard, if not practically impossible.
>
> You ensure correctness as you would ensure correctness using any other
> language feature.

I don't think that effort can be afforded, because no one can put in
the work to check each unsigned-signed conversion for trouble. The
practical way, which I also use, is to close one's eyes and avoid
exercising a program at its extremes (e.g. 2^32 - 1 objects).

Interestingly, the 64-bit integers help with the extended range problem
_in practice_: the extreme values are then often avoided naturally (e.g.
no one creates an std::vector with 2^64 elements). But the zero-boundary
problem remains.

> > Conclusion
> > ----------
> >
> > Using unsigned integers to model integers decreases the probability
> > that a program is correct.
> >
>
> Bugs can be created using any language feature therefore it is, IMO,
> wrong to eschew a particular language feature based on the possibility
> of bug creation unless there is clear evidence that the language feature
> in question has a high probability of bug creation (for some definition
> of "high").

The possibility of bug creation is not a problem. The problem is that
using unsigned integers prevents me from catching bugs, and that way
decreases my confidence in program correctness.

> What you have presented is little more than conjecture IMO.

Critical thinking appreciated.

--
http://kaba.hilvi.org

James Kanze
Feb 7, 2011, 8:16:01 PM

Two reasons, really. The first is simple: the *normal* integral
type in C++ is plain int. You don't use anything else unless
plain int won't do the job. You don't use unsigned for the same
reason you don't use long, or short, or whatever. The second is
more fundamental: C++ doesn't have an unsigned type with
appropriate semantics which would be appropriate for days or
minutes, at least in the general case (where you might have to
take the difference between days or minutes). That's not to say
that you should never use unsigned---there are cases where you
want its special semantics. And there are doubtlessly cases
where it really doesn't matter---things like serial numbers, for
example. (But there again, unless there is some reason why int
isn't appropriate, then it should be int. Anything other than
int tells the reader that you need its special
features---smaller size, bigger range, etc.)

--
James Kanze

James Kanze
Feb 7, 2011, 8:20:10 PM

On Feb 7, 1:18 pm, gwowen <gwo...@gmail.com> wrote:
> On Feb 7, 1:06 pm, Leigh Johnston <le...@i42.co.uk> wrote:

[...]


> Stick to arguing with Paul, at least you *are* probably smarter than
> him.

Ouch. Now that is hitting low.

(Seriously, there is one machine currently being sold today
where the compiler has an option to turn off standard compliant
behavior of unsigned integers, because of the runtime overhead.
But it's not common enough for it to be an issue for most
people. And from other postings in the past, I think Leigh's in
the camp that only Wintel matters.)

--
James Kanze

Leigh Johnston
Feb 7, 2011, 8:26:12 PM

Nonsense; see posts else-thread.

/Leigh

Juha Nieminen
Feb 8, 2011, 1:10:21 AM

Leigh Johnston <le...@i42.co.uk> wrote:
> Wrong; it is not a sign of bad design at all. If a variable is used as
> an unsigned value more often than as a signed value and it logically
> makes sense for it to be unsigned (e.g. it is never negative) then
> making the variable unsigned makes perfect sense.

What is "more often"? 50%? Do you count the individual instances where
it's being used in the program and change the type accordingly? What if
the percentage changes later?

The major problem with using unsigned types in variables which are being
used in signed expressions is that it's easy to forget the cast, which
easily introduces bugs which might not be immediately obvious.

The most typical case where this introduces bugs is where the unsigned
type is mixed with signed types in an expression that ends up being of a
floating point type. If you forget the cast, a result which should be in
the order of tens might wrongly end up in the order of billions.
The problem is that this might not be apparent immediately, only with
certain values (e.g. if you are calculating coordinates based on some
input data or whatever).

You might think, for example, "the width and height of an image are
never negative, hence it makes sense to make them unsigned". However,
if the width and height are used to calculate screen coordinates of
objects (which is rather typical in many situations), you will end up
with useless casts and a high risk of forgetting a cast, potentially
putting your coordinates 2 billion pixels to the right and down.

In these cases it actually makes more sense to have the width and height
as signed integers.

And yes, I have experience on this precise problem.

> Perhaps
> Java is a more suitable language for you rather than C++.

Where did that come from?

Paul
Feb 8, 2011, 1:47:59 AM

"James Kanze" <james...@gmail.com> wrote in message
news:9aaad3a2-2b6a-461d...@p16g2000vbo.googlegroups.com...

> On Feb 7, 1:18 pm, gwowen <gwo...@gmail.com> wrote:
>> On Feb 7, 1:06 pm, Leigh Johnston <le...@i42.co.uk> wrote:
>
> [...]
>> Stick to arguing with Paul, at least you *are* probably smarter than
>> him.
>
> Ouch. Now that is hitting low.
>

Look 3 idiots together! ^_^

Juha Nieminen

unread,
Feb 8, 2011, 2:00:41 AM2/8/11
to
Paul <pchr...@yahoo.co.uk> wrote:
> Look 3 idiots together! ^_^

Honestly, what do you think you are going to achieve with that kind of
attitude?

Paul

unread,
Feb 8, 2011, 2:16:14 AM2/8/11
to

"Juha Nieminen" <nos...@thanks.invalid> wrote in message
news:4d50ea19$0$2852$7b1e...@news.nbl.fi...

Or is it 4? °_°

Richard Kettlewell

unread,
Feb 8, 2011, 6:54:55 AM2/8/11
to
Kaba <no...@here.com> writes:

> ### Looping with unsigned integers
>
> Looping over values from n to zero backwards demonstrates the zero
> boundary problem with unsigned integers:
>
> for (unsigned int i = n; i >= 0; --i)
> {
> // Do something.
> }
>
> Since there are no negative values to fail the loop test,
> this loop never finishes.

While agreeing that unsigned can be booby-trapped in some cases, I don't
think this is one of them: a decent compiler will spot the trivial
comparison and generate a warning.
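For what it's worth, the usual idiom that avoids the trap altogether is to
decrement inside the condition; a minimal sketch (my own illustration, not
from the original post):

```cpp
#include <vector>

// Visits i = n-1, n-2, ..., 0 and terminates cleanly: "i-- > 0" tests
// the value before decrementing, so the body never sees a wrapped value.
std::vector<unsigned> indices_descending(unsigned n) {
    std::vector<unsigned> out;
    for (unsigned i = n; i-- > 0; )
        out.push_back(i);
    return out;
}
```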

--
http://www.greenend.org.uk/rjk/

Leigh Johnston

unread,
Feb 8, 2011, 9:21:52 AM2/8/11
to
On 08/02/2011 06:10, Juha Nieminen wrote:
> Leigh Johnston<le...@i42.co.uk> wrote:
>> Wrong; it is not a sign of bad design at all. If a variable is used as
>> an unsigned value more often than as a signed value and it logically
>> makes sense for it to be unsigned (e.g. it is never negative) then
>> making the variable unsigned makes perfect sense.
>
> What is "more often"? 50%? Do you count the individual instances where
> it's being used in the program and change the type accordingly? What if
> the percentage changes later?

Don't be a retard. By more often I simply meant that if you used an
incorrect abstraction then you would probably end up with more
"instances" of it being used in a context which requires a cast.
However, the actual number is an irrelevant metric, yes.

>
> The major problem with using unsigned types in variables which are being
> used in signed expressions is that it's easy to forget the cast, which
> easily introduces bugs which might not be immediately obvious.

You can create a bug using any language feature. Fix and recompile.

>
> The most typical case where this introduces bugs is where the unsigned
> type is mixed with signed types in an expression that ends up being of a
> floating point type. If you forget the cast, a result which should be in
> the order of tens might end up to be wrongly in the order of billions.
> The problem is that this might not be apparent immediately, only with
> certain values (eg. if you are calculating coordinates based on some
> input data or whatever).

You can create a bug using any language feature. Fix and recompile.

>
> You might think, for example, "the width and height of an image are
> never negative, hence it makes sense to make them unsigned". However,
> if the width and height are used to calculate screen coordinates of
> objects (which is rather typical in many situations), you will end up
> with the useless casts and a high risk of forgetting a cast and potentially
> have your coordinates 2 billion pixels to the right and down.

You can create a bug using any language feature. Fix and recompile.

>
> In these cases it actually makes more sense to have the width and height
> as signed integers.

In your opinion.

>
> And yes, I have experience on this precise problem.

So do I. I don't have any problem with casts; they are a perfectly
acceptable part of a statically typed language such as C++.

>
>> Perhaps
>> Java is a more suitable language for you rather than C++.
>
> Where did that come from?

As you seem to be in the "use int everywhere" camp Java may be a more
appropriate language for you.

/Leigh

Kaba

unread,
Feb 8, 2011, 10:24:19 AM2/8/11
to

Yep. So instead:

unsigned int a = f();

for (unsigned int i = n; i >= a; --i)
{
// Do something.
}

The loop works except when a = 0.
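For the record, the test-before-decrement variant handles a == 0 as well
(assuming n < UINT_MAX, since it starts from n + 1; the helper name is my
own illustration):

```cpp
#include <vector>

// Visits i = n, n-1, ..., a inclusive, and works even when a == 0:
// the comparison happens before the decrement, so no value wraps.
std::vector<unsigned> count_down_to(unsigned n, unsigned a) {
    std::vector<unsigned> out;
    for (unsigned i = n + 1; i-- > a; )  // requires n < UINT_MAX
        out.push_back(i);
    return out;
}
```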

--
http://kaba.hilvi.org

Richard Kettlewell

unread,
Feb 8, 2011, 11:03:49 AM2/8/11
to
Kaba <no...@here.com> writes:
> Richard Kettlewell wrote:

>> While agreeing that unsigned can be booby-trapped in some cases, I don't
>> think this is one of them: a decent compiler will spot the trivial
>> comparison and generate a warning.
>
> Yep. So instead:
>
> unsigned int a = f();
>
> for (unsigned int i = n; i >= a; --i)
> {
> // Do something.
> }
>
> The loop works except when a = 0.

Much better l-)

--
http://www.greenend.org.uk/rjk/

Martin

unread,
Feb 9, 2011, 8:36:05 PM2/9/11
to
On Feb 7, 3:58 pm, Kaba <n...@here.com> wrote:
> Leigh Johnston wrote:
[snip]

>
> Zero boundary problem
> ---------------------
>
> Most of the time we are assuming that, away from boundaries caused
> by the finite representation, an integer object works like an element
> of the integer ring ZZ. In addition, it seems plausible that most
> integer calculations are concentrated around a neighborhood of zero.

Actually, I assume an unsigned integer object behaves more like N away
from the boundaries caused by the finite representation. And it's not
an assumption that it behaves like Z_(UINT_MAX+1).

>
> The _zero-boundary problem_ (of unsigned integers) is that the zero
> is on the boundary beyond which the assumption of working with integers
> falls apart. Thus, the probability of introducing errors in computations
> increases greatly. Furthermore, those errors cannot be caught, since
> every value is legal.

If you don't start with that assumption, its failure won't cause any
errors at all. And the method you advocate for catching errors with
signed integers can also be transformed to work with unsigned
integers, unless the unsigned integer's maximum is equal to the
signed integer's maximum. (E.g. UINT_MAX == INT_MAX would prevent
the transformation from working. Otherwise, reserve the range
(INT_MAX, UINT_MAX] as invalid. If those need to be valid values,
then signed int isn't an option in the first place.)

>
[example elided]


>
> Extended positive range problem
> -------------------------------
>
> Conversion between an unsigned integer and a signed integer
> is an information destroying process in either direction.
> The only way to avoid this is to use one or the other
> consistently.

Not necessarily. There's at least one common implementation where
that conversion is lossless.

And one can, of course, only convert in situations where the
conversion is lossless (e.g. [0, INT_MAX]). Using only one or the
other is hardly the *only* way.

>
> If the normal arithmetic is the intention, then a signed integer
> represents a more general concept than an unsigned integer: the former
> covers both the negative and positive integers, whereas the latter
> only covers non-negative integers.

Which normal arithmetic? On reals? On rationals? On Integers? On
Naturals? On Complex? On Quaternions? The most obvious of those
options is the one on the reals. Why aren't you advocating the use of
double everywhere unless otherwise indicated?

> In programming terms, it is
> possible to create a program using signed integers alone, however,
> the same can't be said about the unsigned integers. Therefore, if
> only one type is to be used consistently, the choice should be a
> signed integer. However, let us do some more analysis.

This is not the previous argument in programming terms. That said,
the *only* reason it's not possible to create a program using only
unsigned integers is that main requires int. In fact, any
functionality that can be coded using only signed integers can be
coded using only unsigned integers.

>
> Since any non-trivial program must use signed integers, the use of
> unsigned integers eventually leads to an unsigned-signed
> conversion. In particular, because `std::size_t` in the Standard
> Library is an unsigned integer, there are few programs that can
> escape the unsigned-signed conversions.

Yep, just about every non-trivial program that is sanely written will
have both signed and unsigned integers.

>
> Despite these conversions, programs still seem to work. The reason
> for this, I reflect, is that the unsigned integers are normally
> not taken into the extended positive range they allow.
> The _extended positive range problem_ is that if the unsigned integers
> are taken to their extended positive range, then the signed-unsigned
> conversions become a reality. Ensuring correctness under such a threat
> is hard, if not practically impossible.

What problem? Let's assume that you have an unsigned integer with a
value in (INT_MAX, UINT_MAX]. Either this value is valid, in which
case the correctness is ensured by not converting. Or it's an invalid
value, in which case you've already failed to ensure correctness.
Your distinction makes no difference.

>
> Conclusion
> ----------
>
> Using unsigned integers to model integers decreases the probability
> that a program is correct.

True enough, but I don't see anyone advocating such a use.

James Kanze

unread,
Feb 10, 2011, 7:59:45 AM2/10/11
to
On Feb 7, 5:33 pm, Leigh Johnston <le...@i42.co.uk> wrote:

> On 07/02/2011 17:19, Juha Nieminen wrote:

> You might be in the "use int everywhere" camp but I am not.
> Perhaps Java is a more suitable language for you rather than
> C++.

Or perhaps he's just been using the language longer than you
have. Other people I know in the "use int everywhere" camp
include such people as Stroustrup, and in C, Kernighan and
Ritchie. (The fact that they use int everywhere isn't an
absolute argument for that policy, of course, but it does
signify that using int everywhere is very much in the C/C++
tradition.)

--
James Kanze

Leigh Johnston

unread,
Feb 10, 2011, 8:22:16 AM2/10/11
to

I have been using C++ for 18 years; I used to be (IMO) lazy and also
used int everywhere but I have since stopped this practice mainly due to
the advent of the C++ Standard Library and the size_type typedef which
by default is std::size_t which is an unsigned integral type. I am
completely comfortable with using "unsigned int" where appropriate these
days. We *can* argue about what is appropriate but I would rather not
as we would just be repeating the same old arguments.

/Leigh

gwowen

unread,
Feb 10, 2011, 8:32:17 AM2/10/11
to
On Feb 10, 1:22 pm, Leigh Johnston <le...@i42.co.uk> wrote:
> On 10/02/2011 12:59, James Kanze wrote:
>
> I am completely comfortable with using "unsigned int" where appropriate these days.  

unsigned is often fine, as long as you don't want to do subtraction.

Leigh Johnston

unread,
Feb 10, 2011, 11:41:07 AM2/10/11
to

Subtraction of unsigned values is fine if the minuend (lhs) is greater
than the subtrahend (rhs).

/Leigh

Juha Nieminen

unread,
Feb 10, 2011, 12:36:17 PM2/10/11
to
Leigh Johnston <le...@i42.co.uk> wrote:
> You can create a bug using any language feature. Fix and recompile.

Do you know how much does it cost to fix bugs vs. not having them in
the first place?

The problem is that the bug might not be found during development nor
even testing. And even if it's found in testing, it might be laborious to
track down the reason for the bug (because it often causes very erratic
behavior).

Besides, your argument can be used to defend sloppy programming. There's
nothing wrong in defensive programming that takes care of potential bugs
from the get go.

>> In these cases it actually makes more sense to have the width and height
>> as signed integers.
>
> In your opinion.

Based on actual experience in actual projects where using unsigned integers
to denote width and height of images has bitten me hard.

> So do I. I don't have any problem with casts; they are a perfectly
> acceptable part of a statically typed language such as C++.

As long as you *remember* to cast everything that needs to be cast.

Again, explicit casting is often a sign of bad design. There are
situations where explicit casting is ok and to be expected, but this
isn't one.

>>> Perhaps
>>> Java is a more suitable language for you rather than C++.
>>
>> Where did that come from?
>
> As you seem to be in the "use int everywhere" camp Java may be a more
> appropriate language for you.

No. I'm of the camp "use signed integers when signed types are being used".

I still don't understand what Java has to do with this.

Juha Nieminen

unread,
Feb 10, 2011, 12:41:26 PM2/10/11
to
James Kanze <james...@gmail.com> wrote:
> Or perhaps he's just been using the language longer than you
> have. Other people I know in the "use int everywhere" camp
> include such people as Stroustrup, and in C, Kernighan and
> Ritchie. (The fact that they use int everywhere isn't an
> absolute argument for that policy, of course, but it does
> signify that using int everywhere is very much in the C/C++
> tradition.)

Usually the only situation where using unsigned integers (of any size)
is legitimate is when you are playing with bits, rather than with integers
(in other words, when your variable is more or less used as a bit container).
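A small sketch of that "bit container" use, where unsigned types are
clearly the right tool (the helper names are mine, purely illustrative):

```cpp
#include <cstdint>

// Treating a 32-bit unsigned value purely as a container of flags:
// shifts and masks on unsigned types have fully defined behavior.
std::uint32_t set_bit(std::uint32_t flags, unsigned bit)   { return flags |  (std::uint32_t{1} << bit); }
std::uint32_t clear_bit(std::uint32_t flags, unsigned bit) { return flags & ~(std::uint32_t{1} << bit); }
bool test_bit(std::uint32_t flags, unsigned bit)           { return (flags >> bit) & 1u; }
```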

There might also be some situations where you need the full range of
unsigned integers, including the wrap-around behavior (many random number
generators come to mind as examples).

With things like array indices the question starts being much fuzzier,
without any clear definitive answer. (Well, unless you really, really need
to address a char array with the full range of an unsigned integer. It's
not inconceivable that this could happen in some apps.)

Juha Nieminen

unread,
Feb 10, 2011, 12:42:09 PM2/10/11
to
> Or is it 4? °_°

You didn't answer my question.

Bo Persson

unread,
Feb 10, 2011, 12:48:30 PM2/10/11
to

If you are in the "use int everywhere", the unsigned size_type of the
standard containers is arguably a mistake (or non-optimal design, at
best). You only need the extra range if you have things like a char
array larger than half the addressable space. And when you do, *that*
is probably a mistake too.

Having size unsigned is a problem if you try to compute it by
subtracting two pointers, because that result is signed. You then
quickly end up mixing signed and unsigned arithmetic, which is
definitely *not* good.


Bo Persson


Paul

unread,
Feb 10, 2011, 12:51:41 PM2/10/11
to

"Juha Nieminen" <nos...@thanks.invalid> wrote in message
news:4d542211$0$2853$7b1e...@news.nbl.fi...

> Leigh Johnston <le...@i42.co.uk> wrote:
>> You can create a bug using any language feature. Fix and recompile.
>
> Do you know how much does it cost to fix bugs vs. not having them in
> the first place?
>
> The problem is that the bug might not be found during development nor
> even testing. And even if it's found in testing, it might be laborious to
> track down the reason for the bug (because it often causes very erratic
> behavior).
>
> Besides, your argument can be used to defend sloppy programming. There's
> nothing wrong in defensive programming that takes care of potential bugs
> from the get go.
>

This seems to be an argument of styles; nobody is right or wrong.
As for bug-free code, that is the *only* way to code, is it not?

>>> In these cases it actually makes more sense to have the width and
>>> height
>>> as signed integers.
>>
>> In your opinion.
>
> Based on actual experience in actual projects where using unsigned
> integers
> to denote width and height of images has bitten me hard.
>

This is arguable, as a rectangle on Cartesian co-ords can be negative, but as
a 3D object it's not going to be negative unless it is some kind of
anti-matter.

>> So do I. I don't have any problem with casts; they are a perfectly
>> acceptable part of a statically typed language such as C++.
>
> As long as you *remember* to cast everything that needs to be cast.
>
> Again, explicit casting is often a sign of bad design. There are
> situations where explicit casting is ok and to be expected, but this
> isn't one.
>

You should not need to cast anything; a variable declared as unsigned should
be used for positive values only, except in very exceptional circumstances
IMO.
If you are often casting signed to unsigned or vice versa, it seems like bad
programming to me.

>>>> Perhaps
>>>> Java is a more suitable language for you rather than C++.
>>>
>>> Where did that come from?
>>
>> As you seem to be in the "use int everywhere" camp Java may be a more
>> appropriate language for you.
>
> No. I'm of the camp "use signed integers when signed types are being
> used".
>

Saying 'of the camp use signed for everything' is just a saying, and it does
not imply that these people are totally ignorant of the benefits of using
unsigned when appropriate.
I think it's more a case of 'if in doubt, use signed'.

> I still don't understand what Java has to do with this.
>

All Java integer types are signed.

gwowen

unread,
Feb 10, 2011, 12:55:28 PM2/10/11
to
On Feb 10, 4:41 pm, Leigh Johnston <le...@i42.co.uk> wrote:
> Subtraction of unsigned values is fine if the minuend (lhs) is greater
> than the subtrahend (rhs).

If you're attempting to use the archaic (and almost entirely unused)
Latin to demonstrate how educated and/or intelligent you are, it helps
to be right. Your usage of minuend and subtrahend are fine (to the
extent that such self-indulgent archaisms-for-effect are ever fine),
but you need to learn what the LHS and RHS of an equation are.
Clue: They're relative to an '=', not to an operator.

For example in the subtraction: x = a - b

The minuend is 'a'. The subtrahend is 'b'. The LHS is 'x', the RHS
is 'a-b'.
Here, you go:
http://en.wikipedia.org/wiki/Left-hand_side_and_right-hand_side_of_an_equation

Leigh Johnston

unread,
Feb 10, 2011, 1:02:30 PM2/10/11
to
On 10/02/2011 17:36, Juha Nieminen wrote:
> Leigh Johnston<le...@i42.co.uk> wrote:
>> You can create a bug using any language feature. Fix and recompile.
>
> Do you know how much does it cost to fix bugs vs. not having them in
> the first place?

Again, you can create a bug in the first place using any language feature.

>
> The problem is that the bug might not be found during development nor
> even testing. And even if it's found in testing, it might be laborious to
> track down the reason for the bug (because it often causes very erratic
> behavior).

This is simply stating the obvious; again you can create such bugs using
any language feature.

>
> Besides, your argument can be used to defend sloppy programming. There's
> nothing wrong in defensive programming that takes care of potential bugs
> from the get go.
>
>>> In these cases it actually makes more sense to have the width and height
>>> as signed integers.
>>
>> In your opinion.
>
> Based on actual experience in actual projects where using unsigned integers
> to denote width and height of images has bitten me hard.

It has not bitten me hard; I have only ever had one such bug, if memory
serves, and the effect of the bug was obvious and I quickly fixed it. You
can create bugs using any language feature.

>
>> So do I. I don't have any problem with casts; they are a perfectly
>> acceptable part of a statically typed language such as C++.
>
> As long as you *remember* to cast everything that needs to be cast.

Again you are simply stating the obvious; it is possible to create a bug
using any language feature and by forgetting to do something.

>
> Again, explicit casting is often a sign of bad design. There are
> situations where explicit casting is ok and to be expected, but this
> isn't one.
>
>>>> Perhaps
>>>> Java is a more suitable language for you rather than C++.
>>>
>>> Where did that come from?
>>
>> As you seem to be in the "use int everywhere" camp Java may be a more
>> appropriate language for you.
>
> No. I'm of the camp "use signed integers when signed types are being used".
>
> I still don't understand what Java has to do with this.

Because all Java's integral types are signed ergo Java may suit you
better due to your irrational fear of a perfectly acceptable C++
language feature.

/Leigh

Leigh Johnston

unread,
Feb 10, 2011, 1:10:53 PM2/10/11
to

Your reaction to having your reply criticised is a pathetic troll. Grow up.

Obviously I am referring to the lhs and rhs of the subtraction
*operator* not the lhs and rhs of an *equation*.

If you care to look at the C++ Standard you will find many instances of
the terms lhs and rhs being used with respect to operators, for example:

"template<class charT, class traits, class Allocator>
basic_string<charT,traits,Allocator>
operator+(const basic_string<charT,traits,Allocator>& lhs,
const basic_string<charT,traits,Allocator>& rhs);"

Perhaps you should engage your brain before posting in future.

/Leigh

Juha Nieminen

unread,
Feb 10, 2011, 1:15:44 PM2/10/11
to
Leigh Johnston <le...@i42.co.uk> wrote:
> Again, you can create a bug in the first place using any language feature.

This whole "bugs will happen no matter what, hence it doesn't matter
what you do to avoid them, just fix them when they happen" mentality is
something that I just must disagree with.

There is no reason nor advantage in using unsigned integers to represent
image sizes (at least not in any of the practical applications I have been
involved with, where image sizes never get even close to 2 billion x 2
billion pixels). On the contrary, it can only potentially cause bugs which
can be hard to trace and which might not manifest themselves immediately
nor obviously.

>>>>> Perhaps
>>>>> Java is a more suitable language for you rather than C++.
>>>>
>>>> Where did that come from?
>>>
>>> As you seem to be in the "use int everywhere" camp Java may be a more
>>> appropriate language for you.
>>
>> No. I'm of the camp "use signed integers when signed types are being used".
>>
>> I still don't understand what Java has to do with this.
>
> Because all Java's integral types are signed ergo Java may suit you
> better due to your irrational fear of a perfectly acceptable C++
> language features.

So if you want to, for example, manipulate the individual bits of a
32-bit register, you do what exactly in Java? Or how do you write an
efficient linear congruential generator in Java?

Where have I ever said that unsigned types must never be used?

Leigh Johnston

unread,
Feb 10, 2011, 1:17:04 PM2/10/11
to
On 10/02/2011 17:41, Juha Nieminen wrote:
> James Kanze<james...@gmail.com> wrote:
>> Or perhaps he's just been using the language longer than you
>> have. Other people I know in the "use int everywhere" camp
>> include such people as Stroustrup, and in C, Kernighan and
>> Ritchie. (The fact that they use int everywhere isn't an
>> absolute argument for that policy, of course, but it does
>> signify that using int everywhere is very much in the C/C++
>> tradition.)
>
> Usually the only situation where using unsigned integers (of any size)
> is legitimate is when you are playing with bits, rather than with integers
> (in other words, when your variable is more or less used as a bit container).

What utter nonsense; when dealing with just *positive* integers the
unsigned integral types are fine.

>
> There might also be some situations where you need the full range of
> unsigned integers, including the wrap-around behavior (many random number
> generators come to mind as examples).
>
> With things like array indices the question starts being much fuzzier,
> without any clear definitive answer. (Well, unless you really, really need
> to address a char array with the full range of an unsigned integer. It's
> not inconceivable that this could happen in some apps.)

More nonsense; it is not fuzzy at all; the C++ draft Standard provides a
clear and definitive answer: std::tr1::array::size_type is std::size_t
(array indices).

/Leigh

Juha Nieminen

unread,
Feb 10, 2011, 1:33:07 PM2/10/11
to
Leigh Johnston <le...@i42.co.uk> wrote:
>> Usually the only situation where using unsigned integers (of any size)
>> is legitimate is when you are playing with bits, rather than with integers
>> (in other words, when your variable is more or less used as a bit container).
>
> What utter nonsense; when dealing with just *positive* integers the
> unsigned integral types are fine.

If there is no clear advantage or need for unsigned types, choosing
a signed type is usually safer. This is especially so if the variable
will be used in signed expressions.

Of course this doesn't mean there aren't situations where unsigned types
are more advantageous if not outright mandatory (for example when handling
raw bytes you usually want to use unsigned chars rather than signed ones),
but those situations are, in fact, relatively rare.

> More nonsense; it is not fuzzy at all; the C++ draft Standard provides a
> clear and definitive answer: std::tr1::array::size_type is std::size_t
> (array indices).

The standardization committee made the decision of defining the size_type
of containers as an unsigned integer. Not everybody agrees with this
decision, and there are good arguments on both sides. The issue *is* fuzzy
and a question of opinion.

James Kanze

unread,
Feb 10, 2011, 2:13:15 PM2/10/11
to

Many people think so.

> You only need the extra range if you have things like a char
> array larger than half the addressable space. And when you do, *that*
> is probably a mistake too.

Practically speaking, you never need the extra range. Today at
least---I think the choice of unsigned was made when development
was on a 16 bit MS-DOS machine, and I've personally written code
which used arrays of over 32K char on such machines.

> Having size unsigned is a problem if you try to compute it by
> subtracting two pointers, because that result is signed. You then
> quickly end up mixing signed and unsigned artihmetic, which is
> definitely *not* good.

Historically, this was a very difficult problem to solve.
size_t must be big enough to contain the size of any possible
object, which usually means you either make it unsigned, or make
it have more bits than a pointer has. Given its role, the choice
is obvious, but it does have the problem you mention: if
pointers are 32 bits, and you can use most of the memory (more
than half) in a single object, then you need 33 bits to
represent the difference between two pointers. But doing this
only affects a very small minority of programs, and comes at
a not insignificant runtime cost (or did, on older 32 bit
machines). The choice made by the C committee (sometime before
1989) represents a valid compromise, even if it has some serious
drawbacks.

Using size_t as the default size_type in the C++ standard
library is a different issue, and it's definitely not a good
choice.

--
James Kanze

James Kanze

unread,
Feb 10, 2011, 2:15:49 PM2/10/11
to
On Feb 10, 6:17 pm, Leigh Johnston <le...@i42.co.uk> wrote:
> On 10/02/2011 17:41, Juha Nieminen wrote:

> > James Kanze<james.ka...@gmail.com> wrote:
> >> Or perhaps he's just been using the language longer than you
> >> have. Other people I know in the "use int everywhere" camp
> >> include such people as Stroustrup, and in C, Kernigan and
> >> Richie. (The fact that they use int everywhere isn't an
> >> absolute argument for that policy, of course, but it does
> >> signify that using int everywhere is very much in the C/C++
> >> tradition.)

> > Usually the only situation where using unsigned integers (of
> > any size) is legitimate is when you are playing with bits,
> > rather than with integers (in other words, when your
> > variable is more or less used as a bit container).

> What utter nonsense; when dealing with just *positive*
> integers the unsigned integral types are fine.

No they're not. Unsigned integers are a very poor abstraction
of a cardinal type.

> > There might also be some situations where you need the full
> > range of unsigned integers, including the wrap-around
> > behavior (many random number generators come to mind as
> > examples).

> > With things like array indices the question starts being
> > much fuzzier, without any clear definitive answer. (Well,
> > unless you really, really need to address a char array with
> > the full range of an unsigned integer. It's not
> > inconceivable that this could happen in some apps.)

> More nonsense; it is not fuzzy at all; the C++ draft Standard
> provides a clear and definitive answer:
> std::tr1::array::size_type is std::size_t (array indices).

std::array::size_type is size_t because that is the default
size_type for all of the earlier containers. It's a bad
decision we have to live with.

--
James Kanze

Paul

unread,
Feb 10, 2011, 2:36:25 PM2/10/11
to

"Juha Nieminen" <nos...@thanks.invalid> wrote in message
news:4d542371$0$2853$7b1e...@news.nbl.fi...

3 people slagging me off on this thread in which I wasn't even
participating, and you suggest I have some attitude problem. I think my
reply was sufficient.

Leigh Johnston

unread,
Feb 10, 2011, 4:33:04 PM2/10/11
to
On 10/02/2011 19:15, James Kanze wrote:
> On Feb 10, 6:17 pm, Leigh Johnston<le...@i42.co.uk> wrote:
>> On 10/02/2011 17:41, Juha Nieminen wrote:
>
>>> James Kanze<james.ka...@gmail.com> wrote:
>>>> Or perhaps he's just been using the language longer than you
>>>> have. Other people I know in the "use int everywhere" camp
>>>> include such people as Stroustrup, and in C, Kernighan and
>>>> Ritchie. (The fact that they use int everywhere isn't an
>>>> absolute argument for that policy, of course, but it does
>>>> signify that using int everywhere is very much in the C/C++
>>>> tradition.)
>
>>> Usually the only situation where using unsigned integers (of
>>> any size) is legitimate is when you are playing with bits,
>>> rather than with integers (in other words, when your
>>> variable is more or less used as a bit container).
>
>> What utter nonsense; when dealing with just *positive*
>> integers the unsigned integral types are fine.
>
> No they're not. Unsigned integers are a very poor abstraction
> of a cardinal type.

Nonsense.

>
>>> There might also be some situations where you need the full
>>> range of unsigned integers, including the wrap-around
>>> behavior (many random number generators come to mind as
>>> examples).
>
>>> With things like array indices the question starts being
>>> much fuzzier, without any clear definitive answer. (Well,
>>> unless you really, really need to address a char array with
>>> the full range of an unsigned integer. It's not
>>> inconceivable that this could happen in some apps.)
>
>> More nonsense; it is not fuzzy at all; the C++ draft Standard
>> provides a clear and definitive answer:
>> std::tr1::array::size_type is std::size_t (array indices).
>
> std::array::size_type is size_t because that is the default
> size_type for all of the earlier containers. It's a bad
> decision we have to live with.
>

Live with the fact that C++ provides unsigned integral types.

/Leigh

James Kanze

unread,
Feb 11, 2011, 4:49:00 AM2/11/11
to
On Feb 10, 9:33 pm, Leigh Johnston <le...@i42.co.uk> wrote:
> On 10/02/2011 19:15, James Kanze wrote:

> Live with the fact that C++ provides unsigned integral types.

Live with the fact that they were designed for very specific
uses, and not as a generalized cardinal type. (At least
according to Kernighan and Ritchie.)

--
James Kanze

Paul

unread,
Feb 11, 2011, 6:08:58 AM2/11/11
to

"James Kanze" <james...@gmail.com> wrote in message
news:387d6208-5a69-45fb...@j11g2000yqh.googlegroups.com...
But this is appealing to authority and Leigh will call you a troll now.

Juha Nieminen

unread,
Feb 11, 2011, 6:51:32 AM2/11/11
to

I suppose that signed and unsigned ints exist in C (and hence in C++)
because the most common CPU architectures of the time could handle the
CPU registers as either signed or unsigned integers (usually using the
2's complement format, which was the easiest and most logical way of
handling binary registers from the point of view of the hardware).
This idea of the same registers being handled as either signed or
unsigned became so ubiquitous that basically all CPU designs today do
so (I'm not aware of any exceptions; if there are, they are probably
extremely exotic and probably obsolete nowadays).

From a program design point of view whether to use signed or unsigned
integers is an interesting and very ambiguous question. Usually there is
no speed difference between the two, so the question is purely one of
program design and/or need.

The two opposing arguments could perhaps be summarized as:

1) The signedness of the integral type should be chosen according to
the meaning of the variable. If negative values are nonsensical, then
an unsigned type should be used to reflect this.

2) When dealing with integer math, signed values are the natural choice.
Unsigned integers should generally be used for more low-level, more
"hardware-level" programming, such as direct bit manipulation or
situations where the modulo behavior of hardware registers is actually
intended and explicitly desired (one good example of this being the
implementation of a linear congruential generator of the size of the
natural register size of the architecture).

In my personal experience argument 1 presents practical problems
because such unsigned integrals are often used (sometimes inadvertently)
in expressions which mix them with signed types, causing implicit
promotions with their inherent problems, requiring explicit casting
to avoid them.

I have already mentioned this example, but I think it's rather telling,
and it has happened to me in practice in several occasions: Image
dimensions is something that one could argue can never be negative and
hence unsigned integrals should be used. The problem is that these
dimensions are then very easily used in expressions where they get mixed
with signed types (such as screen coordinates), the result of which then
gets promoted to floating point. You easily end up in a situation where
your coordinates end up accidentally in the billions, and this is
something that might not be evident immediately, but only after
extensive testing, and not even after that.

The problem is easily solved by having all the variables as signed.
There's little reason to have image dimensions as unsigned, as there's
seldom any benefit (an image with dimensions so large that it cannot be
represented with a signed integer would usually be so large that it
would not fit in RAM).

Leigh Johnston
Feb 11, 2011, 7:46:01 AM

std::size_t

/Leigh

gwowen
Feb 11, 2011, 8:25:53 AM
On Feb 11, 11:51 am, Juha Nieminen <nos...@thanks.invalid> wrote:
>   The problem is easily solved by having all the variables as signed.
> There's little reason to have image dimensions as unsigned, as there's
> seldom any benefit (an image with dimensions so large that it cannot be
> represented with a signed integer would usually be so large that it
> would not fit in RAM).

I wonder if anyone has formulated a CPU-version of Zipf's law, along
the lines of "90% of all integers stored in computers are (in
absolute) between 0 and +9, of the remainder 90% are between 10 and
100, of the remainder 90% are between 100 and 1000, etc...". I'm not counting pointers,
bit masks etc ... I've no evidence to back that up, but it seems
plausible.

Robert Hairgrove
Feb 11, 2011, 11:29:32 AM
On 02/08/2011 02:26 AM, Leigh Johnston wrote:
> On 08/02/2011 01:16, James Kanze wrote:
>> On Feb 4, 5:19 pm, Leigh Johnston<le...@i42.co.uk> wrote:
>>> On 04/02/2011 17:08, James Kanze wrote:

[huge snip]

> Nonsense; see posts else-thread.
>
> /Leigh

Now what does this say about the poster when he quotes dozens of lines
from previously posted messages, only to add "nonsense" at the end?

Troll ... *plonk*

James Kanze
Feb 11, 2011, 2:43:45 PM

> std::size_t

Didn't come along until much later, and was designed for a very
specific purpose.

--
James Kanze

Leigh Johnston
Feb 11, 2011, 3:59:19 PM

Who cares when it came along? I live in the present not the past.

Array indices are never negative. You cannot allocate nor have a
negative number of container elements. Join the dots.

std::size_t

std::tr1::array::size_type

std::allocator::size_type
std::vector::size_type

/Leigh

Paul
Feb 11, 2011, 5:14:15 PM

"Leigh Johnston" <le...@i42.co.uk> wrote in message
news:1qGdnTiF0q64PsjQ...@giganews.com...
This problem stems from the introduction of the Standard library which
created different types of C++ programmers:
a) The old style programmers who never needed the standard library.
b) The programmers who choose to learn the standard library as part of the
C++ language.

I consider the std lib as an extension to the C++ language, some people
consider it as part of the C++ language. To me the C++ language is/was a
low/mid level programming language. The introduction of the std lib raises
the level to mid/high level programming IMHO.

The C++ language seems a bit confused nowadays, without a definite
direction. I liked the old C++ where you had the power to do almost anything
you like within the realms of computer programming, for example you could
define a std::size_t if you wanted to but this was not forced upon you as
part of the language.

Leigh appears as a newbie because he knows all the ins and outs of the std
lib; he reckons he has 18 years programming experience but the programs he
has created (ref: his website) suggest he is quite noobish.

I think Leigh is probably *generally* right regarding the use of unsigned
for array indexing and counters but this may not always be the case on all
implementations. I do not pretend to understand the exact details of all
microchips and how they handle integers but I think this is the key to
understanding which is more efficient.

Juha Nieminen
Feb 12, 2011, 2:27:51 PM
Leigh Johnston <le...@i42.co.uk> wrote:
> Array indices are never negative.

Do you know what value std::string::npos usually is? That's right,
a value which would otherwise be a completely valid index, but which
is just agreed to mean "not any valid index". Usually this value is
size_t(-1). In other words, the natural value for npos would be -1,
but since the index type is unsigned, it cannot be used.

Even if we conceded that array indices are never negative, the result
of subtracting two array indices can be, in which case you have to do the
manual casting to avoid problems. (Why do you think ptrdiff_t is signed?)

Leigh Johnston
Feb 12, 2011, 4:09:33 PM

Array indices are never negative; the result of subtracting array
indices cannot be used as an array index if the result would otherwise
be negative (if signed).

/Leigh

Paul
Feb 12, 2011, 4:29:04 PM

"Leigh Johnston" <le...@i42.co.uk> wrote in message
news:8q2dnYQBytKUasvQ...@giganews.com...

char str[24];
char* p_str = str+12;
p_str[-12] ='A';

Miles Bader
Feb 12, 2011, 6:59:35 PM
James Kanze <james...@gmail.com> writes:
>> Live with the fact that C++ provides unsigned integral types.
>
> Live with the fact that ...blah blah blah...

How many times a month does this particular religious war pop up in new
threads anyway?

-miles

--
Quidquid latine dictum sit, altum videtur.

Sjouke Burry
Feb 12, 2011, 9:29:43 PM
Juha Nieminen wrote:
> Leigh Johnston <le...@i42.co.uk> wrote:
>> Array indices are never negative.

In Fortran there is no problem with those indices.
If you want arr(-5:20)? No problem.

SG
Feb 13, 2011, 5:27:27 AM
On 2 Feb., 15:53, MM wrote:
> I have about 5million rows like these (day, minute, and 4 doubles)
>
> YYYYMMDD, HHMM, double, double, double, double
>
> which I parse from a file and store into
>
> struct one_line {
>   boost::uint32_t day;
>   boost::uint32_t minute;
>   double d1, d2, d3, d4;
> };

I have to agree with the others about the way you store a date and a
time. This looks error-prone.

About vector<one_line> versus list<one_line> w.r.t. sorting speed:
Test it. I would expect the vector version to be faster. Here's why:
one_line is "trivially copyable/assignable" (in C++0x terms) which
means that a good compiler will copy the raw bytes as fast as
possible. The difference in time between copying a raw block of bytes
of this size and adjusting a couple of pointers is probably
insignificant in comparison to the differences in the memory access
patterns and the resulting number of cache misses. Consider the memory
access patterns of quicksort given a linear memory layout versus a
mergesort on a doubly linked list whose elements are more or less
arbitrarily distributed in memory. Keep in mind that linear memory
accesses (forwards and backwards) are typically very fast whereas
random access is slow.

Cheers!
SG

James Kanze
Feb 13, 2011, 7:47:25 AM

Touché.

You've actually raised a significant point with regards to
Leigh's argument: it's fine to say that "array indices are never
negative", except that in C and in C++, there aren't really any
such things as array indices, since there is no array
indexation; only pointer arithmetic.

Of course, user defined classes may (and generally do) override
[] to signify indexation. But there's certainly no requirement
there that array indices cannot be negative---it's purely a
question of how the author defines his class. (There are a few
algorithms where allowing arbitrary upper and lower bounds can
be quite useful.)

--
James Kanze

Leigh Johnston
Feb 13, 2011, 8:46:49 AM
On 13/02/2011 12:47, James Kanze wrote:
> On Feb 12, 9:29 pm, "Paul"<pchris...@yahoo.co.uk> wrote:
>> "Leigh Johnston"<le...@i42.co.uk> wrote in message
>
>> news:8q2dnYQBytKUasvQ...@giganews.com...
>>> On 12/02/2011 19:27, Juha Nieminen wrote:
>>>> Leigh Johnston<le...@i42.co.uk> wrote:
>>>>> Array indices are never negative.
>
>>>> Do you know what value std::string::npos usually is? That's right,
>>>> a value which would otherwise be a completely valid index, but which
>>>> is just agreed to mean "not any valid index". Usually this value is
>>>> size_t(-1). In other words, the natural value for npos would be -1,
>>>> but since the index type is unsigned, it cannot be used.
>
>>>> Even if we conceded that array indices are never negative, the result
>>>> of subtracting to array indices can be, in which case you have to do the
>>>> manual casting to avoid problems. (Why do you think ptrdiff_t is signed?)
>
>>> Array indices are never negative; the result of subtracting array indices
>>> cannot be used as an array index if the result would otherwise be negative
>>> (if signed).
>
>> char str[24];
>> char* p_str = str+12;
>> p_str[-12] ='A';
>
> Touché.

Touché? I don't think so. Pointers are not arrays. In C and C++ array
indexes are never negative. An array is a distinct concept in C/C++ and
has a distinct type; "p_str" above could subsequently be made to point
to something else (other than an array).

>
> You've actually raised a significant point with regards to
> Leigh's argument: it's fine to say that "array indices are never
> negative", except that in C and in C++, there aren't really any
> such things as array indices, since there is no array
> indexation; only pointer arithmetic.

Not true; an array is a distinct concept, a pointer is a distinct concept:

template <typename T>
void is_same_i_dont_think_so(const T&, const T&) {}
...
is_same_i_dont_think_so(str, p_str);

Just because an array can decay to a pointer does not mean that arrays
are the same as pointers.

>
> Of course, user defined classes may (and generally do) override
> [] to signify indexation. But there's certainly no requirement
> there that array indices cannot be negative---it's purely a
> question of how the author defines his class. (There are a few
> algorithms where allowing arbitrary upper and lower bounds can
> be quite useful.)

A user defined class is not a C/C++ array and overriding operator[] in a
user defined class is mostly irrelevant as I am talking about C/C++ arrays.

std::tr1::array is the standard array container class in C++0x and its
index type is std::size_t as array indices are never negative.

The fact that Kanze chooses to act as a troll and chime in with Paul The
Troll is rather sad and pathetic. Kanze is now precariously close to my
killfile.

/Leigh

Paul
Feb 13, 2011, 10:19:49 AM

"Leigh Johnston" <le...@i42.co.uk> wrote in message
news:5YWdnYQDroVYfcrQ...@giganews.com...
>> Touché.
>
> Touché? I don't think so. Pointers are not arrays. In C and C++ array
> indexes are never negative. An array is a distinct concept in C/C++ and
> has a distinct type; "p_str" above could subsequently be made to point to
> something else (other than an array).
>
Most arrays are stored on the heap. So how else can we have a dynamic array
if it is not a pointer?

int* p_arr = new int[10];
p_arr+=5;

p_arr[-5];
The above is a negative array index, there is no other reasonable
description for it.

>>
>> You've actually raised a significant point with regards to
>> Leigh's argument: it's find to say that "array indices are never
>> negative", except that in C and in C++, there aren't really any
>> such things as array indices, since there is no array
>> indexation; only pointer arithmetic.
>
> Not true; an array is a distinct concept, a pointer is a distinct concept:
>
> template <typename T>
> void is_same_i_dont_think_so(const T&, const T&) {}
> ...
> is_same_i_dont_think_so(str, p_str);
>
> Just because an array can decay to a pointer does not mean that arrays are
> the same as pointers.
>

The above argument is complete nonsense because even two *local* arrays can
fail this test if they are different types/sizes.
And IIRC is there not a rule in C++ about not referencing arrays?
There is no other way to create a dynamic array, except by using a pointer.
It doesn't *decay* into a pointer; it begins life as a pointer.

>>
>> Of course, user defined classes may (and generally do) override
>> [] to signify indexation. But there's certainly no requirement
>> there that array indices cannot be negative---it's purely a
>> question of how the author defines his class. (There are a few
>> algorithms where allowing arbitrary upper and lower bounds can
>> be quite useful.)
>
> A user defined class is not a C/C++ array and overriding operator[] in a
> user defined class is mostly irrelevant as I am talking about C/C++
> arrays.
>
> std::tr1::array is the standard array container class in C++0x and its
> index type is std::size_t as array indices are never negative.
>

Um, so it's ok to use class types, such as std containers, when they support
your argument but not when they are against you?


> The fact that Kanze chooses to act as a troll and chime in with Paul The
> Troll is rather sad and pathetic. Kanze is now precariously close to my
> killfile.
>

This is totally predictable Leigh behaviour: when he is losing an argument
he always resorts to this. He is like a baby throwing the dummy-tit out of
the pram.

Leigh Johnston
Feb 13, 2011, 10:52:44 AM
>> Touché.
>
> Touché? I don't think so. Pointers are not arrays. In C and C++ array

> indexes are never negative. An array is a distinct concept in C/C++ and
> has a distinct type; "p_str" above could subsequently be made to point
> to something else (other than an array).
>

Even if one considers the dynamic array case (and dynamic arrays are old
hat in C++ land, prefer std::vector instead) the *array* index can still
not be negative:

int* pa = new int[10](); // (1)
pa += 5;
pa[-5] = 42; // (2)
pa -= 5;
delete[] pa;

In (1) above the pointer pa *points to* an array. In (2) above the pa
pointer no longer refers to an array; it just refers to an array
element. In (2) above the *array* index is actually 0 not -5 (5 + -5 = 0).

I repeat: arrays and pointers are distinct entities in C and C++.
Arrays can decay into pointers but that does not mean that arrays are
pointers. A pointer can point to an array element but that does not
mean that a pointer is an array.

/Leigh

Paul
Feb 13, 2011, 12:01:18 PM

"Leigh Johnston" <le...@i42.co.uk> wrote in message
news:afqdndj3e8faY8rQ...@giganews.com...
So now he implies that a dynamic array is obsolete; this is obviously a
joke.

>
> int* pa = new int[10](); // (1)
> pa += 5;
> pa[-5] = 42; // (2)
> pa -= 5;
> delete[] pa;
>
> In (1) above the pointer pa *points to* an array. In (2) above the pa
> pointer no longer refers to an array; it just refers to an array element.
> In (2) above the *array* index is actually 0 not -5 (5 + -5 = 0).

No such arithmetic takes place re: (5 + (-5) = 0); this is complete nonsense.
If it was calculated as 0, it would refer to some out of the bounds memory
and be an error.

>
> I repeat: arrays and pointers are distinct entities in C and C++.

An array on the heap, which most arrays are, is a pointer to memory.

> Arrays can decay into pointers but that does not mean that arrays are
> pointers.

It doesn't *decay* at all, it is born as a pointer.

> A pointer can point to an array element but that does not mean that a
> pointer is an array.
>

Nobody said a pointer is an array and that isn't the argument. The argument
is that an array cannot have a negative index. This displays his attempt to
shift the focus of the argument.
He said 'an array index cannot be negative', I think we have, once again,
proven him to be incorrect.


Leigh Johnston
Feb 13, 2011, 2:39:17 PM
>>> Touché.
>>
>> Touché? I don't think so. Pointers are not arrays. In C and C++ array

>> indexes are never negative. An array is a distinct concept in C/C++ and
>> has a distinct type; "p_str" above could subsequently be made to point
>> to something else (other than an array).
>>
>
> Even if one considers the dynamic array case (and dynamic arrays are old
> hat in C++ land, prefer std::vector instead) the *array* index can still
> not be negative:
>
> int* pa = new int[10](); // (1)
> pa += 5;
> pa[-5] = 42; // (2)
> pa -= 5;
> delete[] pa;
>
> In (1) above the pointer pa *points to* an array. In (2) above the pa
> pointer no longer refers to an array; it just refers to an array
> element. In (2) above the *array* index is actually 0 not -5 (5 + -5 = 0).
>

For the confused trolls (Paul), the stupid trolls (Paul) and the trolls
with nothing better to do (Kanze):

I am not implying that 5 + -5 is actually emitted by the compiler but
that the *underlying array index* of the array element that pa[-5]
refers to is 0 and a compiler might even optimize accordingly reflecting
this.

I never claimed that *pointer* arithmetic can not involve negative
values (std::ptrdiff_t is a signed integral type) instead I claimed that
you cannot have negative *array* indices.

In (2) above the pointer pa no longer refers to an array as it no longer
points to the start of an array instead it just refers to an array
element hence the -5 in pa[-5] is not an array index but a pointer
offset; the array index is 0.

I repeat for the third and final time: arrays and pointers are distinct
C/C++ concepts.

As far as dynamic arrays vs std::vector is concerned std::vector is
superior to dynamic arrays making dynamic arrays more or less obsolete
in C++. Even a container allocator will likely be calling the scalar
form of operator new (which will likely call malloc) rather than using
the array form of operator new to allocate an array of bytes into which
the container elements can be constructed.

/Leigh

P.S. std::tr1::array::size_type

Paul
Feb 13, 2011, 3:06:52 PM

"Leigh Johnston" <le...@i42.co.uk> wrote in message
news:wuudndhMGZ_8rsXQ...@giganews.com...
>>>> Touché.
>>>
>>> Touché? I don't think so. Pointers are not arrays. In C and C++ array
An array index is the value inside the square brackets that you use to index
an array.

He can repeat as many times as he likes, that an array is not a pointer but
this doesn't change the fact he is wrong to state that an array index cannot
be negative.


> As far as dynamic arrays vs std::vector is concerned std::vector is
> superior to dynamic arrays making dynamic arrays more or less obsolete in
> C++. Even a container allocator will likely be calling the scalar form of
> operator new (which will likely call malloc) rather then using the array
> form of operator new to allocate an array of bytes into which the
> container elements can be constructed.
>

Vector is not superior to a dynamic array. A vector is an STL container and
is used in completely different situations. The fact that Leigh seems to
compare the two as the same thing just shows his noobness.

Bjarke Hammersholt Roune
Feb 13, 2011, 8:51:10 PM
On Feb 12, 2:27 pm, Juha Nieminen <nos...@thanks.invalid> wrote:
> Leigh Johnston <le...@i42.co.uk> wrote:
> > Array indices are never negative.
>
>   Do you know what value std::string::npos usually is? That's right,
> a value which would otherwise be a completely valid index, but which
> is just agreed to mean "not any valid index". Usually this value is
> size_t(-1). In other words, the natural value for npos would be -1,
> but since the index type is unsigned, it cannot be used.
>
In this case -1 is merely a shorter way of saying
std::numeric_limits<size_t>::max(), and once you've typed that a few
times you come to appreciate (size_t)-1. Once you realize that you
also realize that in fact npos could never be a valid index since
valid indexes are strictly less than the size of the array (or in this
case string) you are indexing into, and the maximum possible number of
addressable bytes is precisely std::numeric_limits<size_t>::max(), or
npos. So (size_t)-1 is a reasonable value for npos to have - it is the
one and only unsigned value that could not be a valid index in any
imaginable system. Using signed indexes would, on the other hand, make
50% of otherwise perfectly good bit patterns unusable as indexes. I
don't know that that's a big problem, but your argument for preserving
an "otherwise completely valid index" comes down on the opposite side
than you intended.

Paavo Helde
Feb 14, 2011, 1:48:43 AM
Bjarke Hammersholt Roune <bjarke...@gmail.com> wrote in news:a7e3c7d2-
8b96-4067-85a...@f30g2000yqa.googlegroups.com:

> On Feb 12, 2:27 pm, Juha Nieminen <nos...@thanks.invalid> wrote:
>> Leigh Johnston <le...@i42.co.uk> wrote:
>> > Array indices are never negative.
>>
>>   Do you know what value std::string::npos usually is? That's right,
>> a value which would otherwise be a completely valid index, but which
>> is just agreed to mean "not any valid index". Usually this value is
>> size_t(-1). In other words, the natural value for npos would be -1,
>> but since the index type is unsigned, it cannot be used.
>>
> In this case -1 is merely a shorter way of saying
> std::numeric_limits<size_t>::max(), and once you've typed that a few
> times you come to appreciate (size_t)-1.

It is true that std::string uses the default allocator where size_type is
the same as std::size_t. However, basic_string class template can be
specialized to use other allocators where the size_type may be different
from std::size_t. Thus the npos value can in general be expressed as
my_string_type::size_type(-1).

There is no need to mess with numeric_limits or to cast anything; one
should use the npos constant directly. Example:

std::string s;
if (s.find('x')==s.npos) { //...


> Once you realize that you
> also realize that in fact npos could never be a valid index since
> valid indexes are strictly less than the size of the array (or in this
> case string) you are indexing into, and the maximum possible number of
> addressable bytes is precisely std::numeric_limits<size_t>::max, or
> npos.

Nope, the maximum number of addressable bytes is std::numeric_limits
<size_t>::max()+1. Are you trying to claim that if I use an 8-bit unsigned
char for indexing I cannot address the full 256-byte array? Or are you
trying to say that an operating system cannot access the last byte in the
address space?

> So (size_t)-1 is a reasonable value for npos to have - it is the
> one and only unsigned value that could not be a valid index in any
> imaginable system.

Does not follow as the assumptions are wrong.

>Using signed indexes would, on the other hand, make
> 50% of otherwise perfectly good bit patterns unusable as indexes. I
> don't know that that's a big problem, but your argument for preserving
> an "otherwise completely valid index" comes down on the opposite side
> than you intended.

Does not follow either.

Cheers
Paavo

Marc
Feb 14, 2011, 2:00:19 AM
Paavo Helde wrote:

> Bjarke Hammersholt Roune <bjarke...@gmail.com> wrote in news:a7e3c7d2-
> 8b96-4067-85a...@f30g2000yqa.googlegroups.com:

>> In this case -1 is merely a shorter way of saying
>> std::numeric_limits<size_t>::max(), and once you've typed that a few
>> times you come to appreciate (size_t)-1.

[...]


>> Once you realize that you
>> also realize that in fact npos could never be a valid index since
>> valid indexes are strictly less than the size of the array (or in this
>> case string) you are indexing into, and the maximum possible number of
>> addressable bytes is precisely std::numeric_limits<size_t>::max, or
>> npos.
>
> Nope, the maximum number of addressable bytes is std::numeric_limits
> <size_t>::max()+1.

What should s.length() return in that case?

Paavo Helde
Feb 14, 2011, 2:45:45 AM
Marc <marc....@gmail.com> wrote in
news:ijaju3$e6b$1...@news-rocq.inria.fr:

Implementation of string class is a different topic from the addressable
space. The string class is required to return the size of the largest
container via the max_size() member function. The return value is of type
size_type, which cannot hold value std::numeric_limits<size_type>::max()+
1, so effectively such strings cannot be supported and the question is
moot.

This does not mean that max_size() must return exactly
std::numeric_limits<size_type>::max(). This value may be much smaller,
leaving potentially a lot of freedom in choosing a sentinel value for
npos.

Cheers
Paavo


Bjarke Hammersholt Roune
Feb 14, 2011, 3:29:55 AM
> > Once you realize that you
> > also realize that in fact npos could never be a valid index since
> > valid indexes are strictly less than the size of the array (or in this
> > case string) you are indexing into, and the maximum possible number of
> > addressable bytes is precisely std::numeric_limits<size_t>::max, or
> > npos.
>
> Nope, the maximum number of addressable bytes is std::numeric_limits
> <size_t>::max()+1. Are you trying to claim that if I use a 8-bit unsigned
> char for indexing I cannot address the full 256-byte array? Or are you
> trying to say that an operating system cannot access the last byte in the
> address space?
>
Correct, an OS can address all its memory because it does not address
that memory as indexes into std::string. However, strings are not like
raw arrays in that strings store their size. They store their size in
a size_t so the size cannot exceed npos. Hence the maximum possible
valid index is npos - 1.

> >Using signed indexes would, on the other hand, make
> > 50% of otherwise perfectly good bit patterns unusable as indexes. I
> > don't know that that's a big problem, but your argument for preserving
> > an "otherwise completely valid index" comes down on the opposite side
> > than you intended.
>
> Does not follow either.
>

You want std::string to support negative indexes?

Cheers
Bjarke Hammersholt Roune

Michael Doubez
Feb 14, 2011, 3:54:37 AM


Then lets take another example:
int array1[2][5];
int (&array2) [5] = array1[1];

array2[-1] = 42;

You will admit that array2 is a reference to an 'array of 5 int
element'.
And I can rightfully use a -1 indexing.

>
> > You've actually raised a significant point with regards to
> > Leigh's argument: it's find to say that "array indices are never
> > negative", except that in C and in C++, there aren't really any
> > such things as array indices, since there is no array
> > indexation; only pointer arithmetic.
>
> Not true; an array is a distinct concept, a pointer is a distinct concept:
>
> template <typename T>
> void is_same_i_dont_think_so(const T&, const T&) {}
> ...
> is_same_i_dont_think_so(str, p_str);

int array3[5];
is_same_i_think_so(array2,array3);


> Just because an array can decay to a pointer does not mean that arrays
> are the same as pointers.

But the subscript operator on an array has exactly the same effect as
on a pointer: E1[E2] <=> *(E1+E2)

Now just to get the misunderstanding out, I think you are right for a
given definition of index: 'index' meaning the i-th element - where it
therefore cannot be negative.

But we commonly use 'index' to name the integral part of the subscript
operator and in this case, it can be negative.

--
Michael

Leigh Johnston
Feb 14, 2011, 9:30:28 AM
On 14/02/2011 08:54, Michael Doubez wrote:

Indeed this is a case I did not consider when posting, however...

[snip]

>
>> Just because an array can decay to a pointer does not mean that arrays
>> are the same as pointers.
>
> But the subscript operator on an array has exactly the same effect as
> on a pointer: E1[E2]<=> *(E1+E2)

n3225 says:

"Because of the conversion rules that apply to +, if E1 is an
array and E2 an integer, then E1[E2] refers to the E2-th member of E1."

>
> Now just to get the misunderstanding out, I think you are right for a
> given definition of index: 'index' meaning the i-th element - where it
> therefore cannot be negative.

I agree.

>
> But we commonly use 'index' to name the integral part of the subscript
> operator and in this case, it can be negative.

Given what the standard says I am not convinced that -1 is an allowable
array index as an array does not have an element at the i-th position if
i is negative.

Similarly I am not convinced that the following is allowed by the
standard for a similar reason:

int array1[2][5];
int (&array2) [5] = array1[0];

array2[5] = 42;

I think the standard could perhaps make it clear that for operations on
an array reference to a "sub-array" of a larger array the allowable
index can be out of bounds of the "sub-array".

The fact that these array operations are probably fine on most if not
all sane implementations is a different issue.

/Leigh

James Kanze
Feb 14, 2011, 9:40:47 AM
On Feb 13, 1:46 pm, Leigh Johnston <le...@i42.co.uk> wrote:
> On 13/02/2011 12:47, James Kanze wrote:
> > On Feb 12, 9:29 pm, "Paul"<pchris...@yahoo.co.uk> wrote:
> >> "Leigh Johnston"<le...@i42.co.uk> wrote in message

> >>news:8q2dnYQBytKUasvQ...@giganews.com...
> >>> On 12/02/2011 19:27, Juha Nieminen wrote:
> >>>> Leigh Johnston<le...@i42.co.uk> wrote:
> >>>>> Array indices are never negative.

> >>>> Do you know what value std::string::npos usually is? That's right,
> >>>> a value which would otherwise be a completely valid index, but which
> >>>> is just agreed to mean "not any valid index". Usually this value is
> >>>> size_t(-1). In other words, the natural value for npos would be -1,
> >>>> but since the index type is unsigned, it cannot be used.

> >>>> Even if we conceded that array indices are never negative, the result
> >>>> of subtracting to array indices can be, in which case you have to do the
> >>>> manual casting to avoid problems. (Why do you think ptrdiff_t is signed?)

> >>> Array indices are never negative; the result of subtracting array indices
> >>> cannot be used as an array index if the result would otherwise be negative
> >>> (if signed).

> >> char str[24];
> >> char* p_str = str+12;
> >> p_str[-12] ='A';

> > Touché.

> Touché? I don't think so. Pointers are not arrays.

Red herring. Nobody said they were.

> In C and C++ array
> indexes are never negative. An array is a distinct concept in C/C++ and
> has a distinct type; "p_str" above could subsequently be made to point
> to something else (other than an array).

Arrays and pointers are two very distinct things, but in an
expression, there is no way to index into an array. You have to
convert the array to a pointer first.

> > You've actually raised a significant point with regards to
> > Leigh's argument: it's fine to say that "array indices are never
> > negative", except that in C and in C++, there aren't really any
> > such things as array indices, since there is no array
> > indexation; only pointer arithmetic.

> Not true; an array is a distinct concept, a pointer is a distinct concept:

> template <typename T>
> void is_same_i_dont_think_so(const T&, const T&) {}
> ...
> is_same_i_dont_think_so(str, p_str);

> Just because an array can decay to a pointer does not mean that arrays
> are the same as pointers.

I know that, and you know that I know that. On the other hand,
there is no "array index operator" which operates on an array.
To index into an array, you have to convert the array to
a pointer, and use pointer arithmetic.

--
James Kanze

Paul

unread,
Feb 14, 2011, 10:02:50 AM2/14/11
to

"James Kanze" <james...@gmail.com> wrote in message
news:5706b922-26a7-4883...@o20g2000yqk.googlegroups.com...

Boost::multi_array also provides support for negative array bases.
Also note that it claims to be more efficient than vector, which seems to
be Leigh's preferred std::container.
"Boost MultiArray is a more efficient and convenient way to express
N-dimensional arrays than existing alternatives (especially the
std::vector<std::vector<...>> formulation of N-dimensional arrays). "

Leigh Johnston

unread,
Feb 14, 2011, 10:28:55 AM2/14/11
to

A compiler can optimize indexing into a local array as a direct memory
read of the stack (i.e. offsetting the stack pointer) without performing
any pointer arithmetic at all.

Please cite the part of the Standard which says that indexing into an
array *requires* converting the array into a pointer. Although the
array/pointer *equivalence* exists it is not a *requirement* that such
*equivalence* always be utilized by a compiler (implementation) as far
as I can tell (I could be wrong; I am not familiar with the entire
Standard).

/Leigh

gwowen

unread,
Feb 14, 2011, 11:06:02 AM2/14/11
to
On Feb 14, 3:28 pm, Leigh Johnston <le...@i42.co.uk> wrote:

> Please cite the part of the Standard which says that indexing into an
> array *requires* converting the array into a pointer.  Although the
> array/pointer *equivalence* exists it is not a *requirement* that such
> *equivalence* always be utilized by a compiler (implementation) as far
> as I can tell (I could be wrong; I am not familiar with the entire
> Standard).

Well, the "as if" rule applies as normal, so there is no requirement
on the compiler to implement array subscript as *((E1)+(E2)). All
that is required is that array subscripting E1[E2] behaves in a way
which is functionally identical to *((E1)+(E2)). If you don't want to
describe that as "utilized by a compiler", that's your prerogative, but
it is literally a distinction without a difference.
