I can't decide how to handle standard types vs compiler-specific
types.
I've thus far always opted to make a best effort to provide interfaces
that use standard types. However, I am now running into problems doing
that.
All my code that returned sizes is now returning 64 bit values, since
MS has made size_t 64 bits. My code used to treat these as 32 bit
unsigned integers. Heck, even some of the Windows APIs expect 32
bit unsigned integers when passing parameters, but I get 64 bit
unsigned integers.
Anyway, is it bad practice for me to just give in and start using the
defined types MS provides instead of standard types? I am beginning to
believe it might make my job a lot easier, even if my code is no longer
portable.
Also, should I start using 64 bit types whenever possible vs 32 bit
types, even if I don't think the values will grow that large? Does it
make a difference?
For instance I used to have a class something like this:
class MySpecialContainer
{
public:
const unsigned GetNumElements() const
{
return m_udts.size();
}
private:
unsigned m_numElements;
typedef std::vector<UDT> UDTVec;
UDTVec m_udts;
};
Now the return value of size() is 64 bits and does not convert to an
unsigned.
I could go and change everything to return size_t, but then I have to
change everywhere it is passed as a parameter, and half the APIs I
call are asking for a UINT, even the Windows APIs. I guess because
they expect the value to never grow that big. So, then I have to
static_cast<unsigned>( mySize_T) back again. I don't know what kind of
rules to adopt for these situations.
The 'const' for the result type is superfluous and should be omitted.
The 'Get' in the name is superfluous and should be omitted (it may have some
purpose in Java but in C++ it's only noise, i.e. it has some negative impact for
readability etc. and adds nothing positive, so it's irrational to use it).
And even though the standard library has size methods with unsigned result type,
also that is an abomination. It's just sh*tty design, very much so, to use a C++
built-in type, where there's no guaranteed overflow checking, to indicate some
limited value range (technically: (1) it has no advantages, and (2) it has lots
of problems, including very serious ones, so (3) it's just sh*tty). Instead use
a signed type such as 'int' or 'long' or, if you absolutely must, 'ptrdiff_t'.
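Applied to the quoted code, that means something like this (a sketch, assuming
a 32-bit count really does suffice here):

int NumElements() const
{
    return static_cast<int>( m_udts.size() );
}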
> {
> return m_udts.size();
> }
>
> private:
> unsigned m_numElements;
>
> typedef std::vector<UDT> UDTVec;
> UDTVec m_udts;
> }
>
>
> Now the return value of size() is 64 bits and does not convert to an
> unsigned.
It should convert.
If not then your compiler is very non-standard.
However, you should not use 'unsigned' here (see above).
Cheers & hth.,
- Alf
--
Due to hosting requirements I need visits to <url: http://alfps.izfree.com/>.
No ads, and there is some C++ stuff! :-) Just going there is good. Linking
to it is even better! Thanks in advance!
I am using VC 2008 SP1
Well, it does convert in the sense that I assign a 64 bit value to a
32 bit value with a compiler warning. However, I don't think having
9000 warnings is a good idea. I think the warning is legitimate too,
because who is to say I might not have a number of elements exceeding
the bounds of a 32 bit integer in the far off future?
I just don't know how to handle it. An int is 32 bits, a long is 32
bits, an unsigned int is 32 bits, but the size is 64 bits.
This is what I have to go on: http://msdn.microsoft.com/en-us/library/94z15h2c(VS.80).aspx
So, I am still unsure whether I should be using their INT64 type
instead of the built in int type.
I am not even sure if __int64 is standard or also one of their types.
-----------------------------------------------
I hate Google Groups!!!
I hate Time Warner for killing usenet access!!!
I really don't want to pay $15/mo to read newsgroups :(
That's where you use a static_cast.
> I think the warning is legitimate too,
> because who is to say I might not have a number of elements exceeding
> the bounds of a 32 bit integer in the far off future?
You.
The trick is to balance guaranteed work now against marginally possible work in
the future sometime.
Consider that in the future, if that issue ever comes to the front, you might
just use LargeSize for 64 bit sizes. Or whatever. Done.
> I just don't know how to handle it. An int is 32 bits, a long is 32
> bits, an unsigned int is 32 bits, but the size is 64 bits.
In your original code you used 'unsigned', which is only guaranteed to have 16 bits.
So it's not a portability question.
It's only a question of what you design for in your current environment. Then
design for what's most practical.
> This is what I have to go on: http://msdn.microsoft.com/en-us/library/94z15h2c(VS.80).aspx
I refuse to look up Microsoft documentation for a C++ issue.
> So, I am still unsure whether I should be using their INT64 type
> instead of the built in int type.
No, you should not. Just the fact that it's all uppercase tells you that it was
designed by an incompetent (that is, Microsoft). Define your own signed Size type.
> I am not even sure if __int64 is standard or also one of their types.
It is a vendor defined type, as you can see from the double underscore.
I thought this newsgroup was about the C++ language, not about
programming guidelines. Suggesting to avoid using "get" in a getter
method is as superfluous here as suggesting eg. that one should use
reverse polish notation or how many spaces should be used for
indentation. It's a matter of style, not a matter of whether it's a
standard C++ feature.
> And even though the standard library has size methods with unsigned
> result type, also that is an abomination.
How do you suggest a program being able to use the entire address
space of the target architecture with a signed indexing type? Even if
you can address the entire address space with a signed type (as the CPU
will probably internally use the value as an unsigned type), you still
get problems when comparing indices with < when the indices go over the
half-way mark.
Or are you going to say "nobody will ever need more than half of the
address space of any computer architecture"?
It's about use of the C++ programming language. I'm always ready to revise my
judgement of what's topical or not. But, deciding on topicality every day for
clc++m, I think I have a pretty good grasp of that, thus, good arguments needed
to affect that judgment. ;-)
I have recently posted some other articles in this ng that were technically off
topic (although most of them of interest to the community).
But the above was not one of those OT articles, and it's just a silly argument.
> Suggesting to avoid using "get" in a getter
> method is as superfluous here as suggesting eg. that one should use
> reverse polish notation or how many spaces should be used for
> indentation. It's a matter of style, not a matter of whether it's a
> standard C++ feature.
It is bad style in C++ precisely because C++ doesn't have any language feature
to make use of it (Java does have such a feature).
Reverse polish notation is also bad, but for another reason: C++ does have
language features that make that notation redundant and error prone, and since
it lowers readability and can easily become misleading, it's just dumb.
Both bad habits share the property of once having been meaningful and work
saving devices in some original environment (respectively Java for "Get" prefix
and Microsoft Programmers' Workbench for reverse polish notation) and then just
being silly muddifying stuff, causing extra work and lowering readability and
maintainability, when used in modern C++.
It's like hand-cranking the engine of your modern car instead of using the starter key.
A frozen habit from ages ago, once meaningful, now the opposite.
>> And even though the standard library has size methods with unsigned
>> result type, also that is an abomination.
>
> How do you suggest a program being able to use the entire address
> space of the target architecture with a signed indexing type? Even if
> you can address the entire address space with a signed type (as the CPU
> will probably internally use the value as an unsigned type), you still
> get problems when comparing indices with < when the indices go over the
> half-way mark.
That's a misunderstanding, sort of urban legend, unfortunately still bandied
about as if it were meaningful.
It isn't meaningful.
For in order to make use of the extra range of unsigned you need a /character/
(byte) array larger than one half of the address space. For 32-bit programs you
don't even have that address space available in Windows. For 64-bit programs,
when was the last time you needed a 2^63 bytes character array? And can you cite
any common computer with that much memory? I guess there might be a Cray...
Then, you're talking about /always/ using unsigned indexing in order to support
the case of using a Cray to address a larger than 2^63 character array. Hello.
Thus, when reduced to a concrete issue rather than hand waving, it's not
meaningful at all, just technically bullshit. :-)
It's best forgotten!
> Or are you going to say "nobody will ever need more than half of the
> address space of any computer architecture"?
That's a fallacy. Using signed indexing doesn't mean you can't use that much. It
means that if you need that much memory and can't reach it via indexing, then it
is necessarily for a character array that large. The OP's code will never be
used for a character array that large. Nor will my code or yours. I think. :-)
So? You have never used byte arrays?
> For
> 32-bit programs you don't even have that address space available in
> Windows.
So C++ is now a Windows-exclusive programming language? Since when?
Besides, you are wrong: In 32-bit Windows the limit is 3 GB. (Granted,
you have to turn this feature on, as it's by default 2 GB, but it's
perfectly possible for programs to use up to 3 GB of memory. If you
don't know how, then maybe you should study the subject more.)
> For 64-bit programs, when was the last time you needed a 2^63
> bytes character array?
"640 kilobytes ought to be enough for anybody."
Aren't we already past these?
Sorry, that's all grossly irrelevant.
Cheers, & hth.,
Maybe add an assert before casting to make sure not to overflow.
How would changing 'unsigned' to 'UINT' help you in the example below?
Most likely, 'UINT' is a typedef for 'unsigned int' anyway.
>
> Also, should I start using 64 bit types whenever possible vs 32 bit
> types, even if I don't think the values will grow that large? Does it
> make a difference?
>
> For instance I used to have a class something like this:
>
> class MySpecialContainer
> {
> public:
>
> const unsigned GetNumElements() const
> {
> return m_udts.size();
> }
>
> private:
> unsigned m_numElements;
>
> typedef std::vector<UDT> UDTVec;
> UDTVec m_udts;
>
> }
>
> Now the return value of size() is 64 bits and does not convert to an
> unsigned.
> I could go and change everything to return size_t, but then I have to
> change everywhere it is passed as a parameter, and half the APIs I
> call are asking for a UINT, even the Windows APIs. I guess because
> they expect the value to never grow that big. So, then I have to
> static_cast<unsigned>( mySize_T) back again. I don't know what kind of
> rules to adopt for these situations.
You should look at the application logic.
If UINT_MAX has always been a few orders of magnitude larger than the
values returned from GetNumElements(), what makes you think that on a
64-bit platform your container will be storing so many more elements
that you actually need to count them in 64 bits?
If your problem is that the compiler has started to complain about the
conversion from size_t to unsigned, then you should use a cast in
GetNumElements itself (after you verified that 32-bit is indeed more
than enough). For added safety, you could add an
'assert(m_udts.size() < UINT_MAX)'.
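For example (a sketch, reusing the names from the original code; needs
<cassert> and <climits>):

unsigned GetNumElements() const
{
    assert( m_udts.size() < UINT_MAX );   // debug-build check that it fits
    return static_cast<unsigned>( m_udts.size() );
}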
Bart v Ingen Schenau
And what if the local coding rules state that "The name of a function
must start with an action verb", which is not an uncommon rule.
What verb do you propose to use instead of Get?
>
> And even though the standard library has size methods with unsigned result type,
> also that is an abomination. It's just sh*tty design, very much so, to use a C++
> built-in type, where there's no guaranteed overflow checking, to indicate some
> limited value range (technically: (1) it has no advantages, and (2) it has lots
> of problems, including very serious ones, so (3) it's just sh*tty). Instead use
> a signed type such as 'int' or 'long' or, if you absolutely must, 'ptrdiff_t'.
Do you have a document number for your formal proposal to deprecate
the unsigned types?
>
> Cheers & hth.,
>
> - Alf
Bart v Ingen Schenau
Of course. Anything that makes your point weaker is "grossly
irrelevant". Anything that supports your point is, however, relevant.
I understand now.
I'm sorry, that's just innuendo.
> I understand now.
It seems you don't.
To have any argument at all there must be some connection with the issue
discussed. Hand-waving doesn't cut it. It must be concrete.
Cheers & hth.,
Just take it in stride and comply with the coding guidelines.
It's easy.
A coding guideline including a rule that adds needless work instead of saving
work just means you're working in a less than perfect environment. But nothing's
perfect. The question is whether the totality of rules is such that frustration
level is more than you can tolerate (at least for some persons there is a limit
to how much idiocy one can abide with) -- but that's a personal decision.
>> And even though the standard library has size methods with unsigned result type,
>> also that is an abomination. It's just sh*tty design, very much so, to use a C++
>> built-in type, where there's no guaranteed overflow checking, to indicate some
>> limited value range (technically: (1) it has no advantages, and (2) it has lots
>> of problems, including very serious ones, so (3) it's just sh*tty). Instead use
>> a signed type such as 'int' or 'long' or, if you absolutely must, 'ptrdiff_t'.
>
> Do you have a document number for your formal proposal to deprecate
> the unsigned types?
It's not a practical proposition to change the language or standard library in
that regard, nor is your particular proposed solution one that anyone competent
would want (unsigned types do have their uses, especially for bitlevel
manipulation). So I'm sorry but that question contains an invalid assumption of
the "have you stopped beating your wife" sort. It's purely silly-rhetorical.
Cheers & hth.,
- Alf
--
... and standard people are right and you are wrong ;-).
Name advantages and problems. You didn't do it, you just showed that
you gain nothing by using unsigned. That's neither advantage nor a
problem, and that was your only argument.
Here, some advantages of unsigned:
1. size is a __count__. Conceptually, it's a natural number. Same for
indexing. There's no element at position -1. Again, conceptually, it's
a natural number. The data type used (unsigned) reflects that. (IMHO,
this reason alone removes any need for a discussion).
2. in practice, an underflow with unsigned on raw arrays and some
(past?) implementations of STL leads to an earlier crash than going in
with e.g. -1.
Goran.
It is a Very Bad Idea to choose a C++ unsigned type to indicate numerical range,
because it doesn't enforce it. In Pascal, yes, there is a benefit; in Ada, yes,
there is a benefit; in C++, no, there is no benefit. On the contrary, in C++, by
doing that, you get all the problems and no benefits, hence it's plain stupid.
And I write that in spite of having mindlessly done the stupid thing for many
years (coming from Pascal/Modula-2, and seeing some experts doing the same in
C++), and having argued vociferously for it also in this group.
But it's never too late to learn, and when you start thinking about why the heck
you need an additional /hint/ that a size is non-negative, isn't "size" hint
enough, how dumb can a programmer be, really?, you realize that there's
something really fishy. Then thinking about actual benefits you find none, only
hand-waving. Then, thinking about problems, you find an abundance.
> 2. in practice, an underflow with unsigned on raw arrays and some
> (past?) implementations of STL leads to an earlier crash than going in
> with e.g. -1.
This is pretty unclear, but unsigned opens the door for more bugs, so this
argument about probability of detecting those bugs is pretty lame. :)
The problems with unsigned types are well known.
Your compiler, if it's any good, will warn you about comparisons
unsigned/signed. Those warnings are serious. Where you have such type mismatch
(which results from unsigned) you often have a bug. For example, if size is
unsigned and n is negative signed of small absolute value (which is typical),
then 'size < n' will often yield true because n is promoted to unsigned.
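A minimal demonstration (assuming 32-bit unsigned int):

#include <cstddef>
#include <iostream>

int main()
{
    std::size_t size = 10;
    int n = -1;
    // n is converted to the unsigned type, becoming the largest
    // representable value, so this compares 10 < 4294967295: true.
    std::cout << ( size < n ) << "\n";    // prints 1
}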
Your compiler cannot, however, warn you about arithmetic problems.
There's a host of bug vectors in that, including the main example of loop
counting down (incorrectly expressed).
One main problem is that you have to add casts and/or unnaturally choose types
and/or unnaturally express loops to avoid warnings and/or bugs (like infinite
loops) where you just know that those values are all non-negative. There would
have been no such problems with signed size types. So there's a lot of work
added just to avoid the problems one wouldn't have with signed sizes.
And a natural solution to that is to define one's own signed Size type.
And then define some generic function like
template< typename T > Size size( T const& );
Along with generic 'begin' and 'end' (that also work for raw arrays), and so on.
Hip hurray, all that **** removed.
Oh well, OK, not all, not by far. For such solutions cannot easily be
retrofitted onto existing code. But I argue for adopting the more sane way of
doing things for new code, including maintenance of old. :-)
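For concreteness, a minimal sketch of that (the names 'Size' and 'size' are of
course one's own choice, nothing standard):

#include <cstddef>
#include <vector>

typedef std::ptrdiff_t Size;

// Generic signed size for standard containers.
template< typename T >
Size size( T const& c ) { return static_cast<Size>( c.size() ); }

// And for raw arrays.
template< typename T, std::size_t N >
Size size( T (&)[N] ) { return static_cast<Size>( N ); }

// Usage:
//   std::vector<int> v;
//   for( Size i = 0; i < size( v ); ++i ) { ... }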
Cheers, & hth.,
Viva64 is a static code analyzer made to simplify porting programs to 64-bit
systems, and by doing so to reduce the necessary expense. Viva64 works with C
and C++ code intended for the 64-bit Windows operating system, and integrates
into the Visual Studio 2005/2008 development environment.
Using the Viva64 static code analyzer allows a 64-bit application to be
released 3 or 4 times faster, because the process of code analysis and
testing, which must take the specifics of the 64-bit architecture into
account, is sped up considerably. The time devoted to testing the developed
64-bit software product is shortened owing to its higher quality. The
methodology of static code analysis used in Viva64 has essential advantages
over other kinds of analysis because it covers the whole program code. The
verification procedure cannot damage the code itself: the analysis is entirely
controlled by a person, who decides whether to modify the code. Viva64 also
has a large knowledge base on 64-bit code development (help system, articles,
examples) to give developers a boost; following the recommendations in these
documents will raise both the quality of the code and its performance.
Viva64: What Is It, and Who Is It for?
http://www.viva64.com/art-1-2-903037923.html
64 bits, Wp64, Visual Studio 2008, Viva64 and all the rest...
http://www.viva64.com/art-1-2-621693540.html
20 issues of porting C++ code on the 64-bit platform
http://www.viva64.com/art-1-2-599168895.html
private:
/*I do not know the reason for the following but anyway*/
size_type m_numElements;
UDTVec m_udts;
};
Yes. If we cannot talk about such things here, where should we do it?
I am here to become a better C++ programmer and having more fun while
doing it, by hearing what other people do, what they don't do, what
they like and so on.
Put differently, to me the culture and conventions *around* C++ is on
topic here. Not just the things you can quote from the standard.
That's also largely how this group works in practice. But sometimes
people play the "offtopic" card a bit too often, refuse to hypothesize
around things not covered by the standard, and so on.
/Jorgen
--
// Jorgen Grahn <grahn@ Ph'nglui mglw'nafh Cthulhu
\X/ snipabacken.se> R'lyeh wgah'nagl fhtagn!
> So, I am still unsure whether I should be using their INT64 type
> instead of the built in int type.
> I am not even sure if __int64 is standard or also one of their types.
I wouldn't use a type with a fixed size for no reason, and I hope the
Windows APIs do not have them in their signatures except where really
needed.
Your problem code upthread started with the size of a std::vector,
which is std::vector::size_type. Call me sloppy, but I'd handle it as
a size_t. I have no reason to claim to know that this type is 64 bits
(it isn't on half of my targets). Windows shouldn't have to know,
either.
I also wonder if you're not exaggerating the problem. Why would you
pass the size of a std::vector (or an index into a vector) into a
Windows API? I can't see that this would be very common.
Then the issue becomes one of design. If the natural verb for an action
performed by a method is "get", that is indicative of a design smell.
If it were "calculate" then the method does something useful.
--
Ian Collins
Why use a non-portable type when we have size_t?
Keep the platform specific details of your size type out of the user
code and in the platform specific headers.
If range (and to keep Alf happy, range checking) is important, give the
container its own range type.
class MySpecialContainer
{
public:
typedef <your choice> size_type;
Then use this type for all size related returns.
This could be expanded to something like:
class MySpecialContainer
{
public:
#if DEBUG_MODE
typedef SomeBoundedType size_type;
#else
typedef size_t size_type;
#endif
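where 'SomeBoundedType' might be a rough sketch along these lines (my
illustration, with assert-based checking; pick your own policy):

#include <cassert>
#include <climits>
#include <cstddef>

// A size type that checks, in debug builds, that the value fits in 32 bits.
class SomeBoundedType
{
public:
    SomeBoundedType( std::size_t n ) : m_value( n )
    {
        assert( n <= UINT_MAX );
    }
    operator std::size_t() const { return m_value; }
private:
    std::size_t m_value;
};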
--
Ian Collins
It seems Alf is claiming there are inherent problems with indexing
using an unsigned type. However, I am unclear on what those specific
problems are. I am not arguing it, but I haven't seen anything
mentioned that points them out. I did see mention of comparing signed
and unsigned, however I fail to see how that is relevant. I mean, I'd
just cast the unsigned to a signed type before the comparison,
right? Alf assumes they are obvious. They are not obvious to me, but
then Alf has more experience than I.
I also read someone questioning my statement that the Windows APIs are
going to expect the size back as a UINT, which they made 32 bits.
Well, not directly. They ask for number of elements as a parameter to
a number of functions. Naturally the number of elements is going to
come from my std container since that is what I am using to store
elements. So, inevitably I will have to take my 64 bit
value representing the number of elements in my stl container and
change it to a 32 bit value somewhere along the line. I can move the
problem wherever I wish, but the problem is still there.
If I use size_t then, that is fine until I have to convert it to a 32
bit value in order to pass it to an API call. See above.
I also saw "use assert", which sounds good to me. I could assert and
then cast. My only worry is that assert only shows up in debug, right?
Perhaps, I should just write a utility method that casts a 64 bit
value to a 32 bit value, which checks the bounds, and throws an
exception when the numbers don't fit.
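Something like this, maybe (a rough sketch; the name is mine):

#include <climits>
#include <cstddef>
#include <stdexcept>

// Narrows a size to 32 bits, throwing if the value doesn't fit.
inline unsigned ToUnsigned32( std::size_t n )
{
    if ( n > UINT_MAX )
        throw std::range_error( "value exceeds 32-bit range" );
    return static_cast<unsigned>( n );
}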
I am sure it will be a rare case, if any case at all. However, I want
my code to be very good. I plan to use it for demonstration while job
hunting. I don't want anyone to look at it and think, "look at this
guy, clearly ignored these cases, not very thorough"
This group is about programming using C++. comp.std.c++ is about the C++
standard/language itself.
> Suggesting to avoid using "get" in a getter
> method is as superfluous here as suggesting eg. that one should use
> reverse polish notation or how many spaces should be used for
> indentation. It's a matter of style, not a matter of whether it's a
> standard C++ feature.
It's actually relevant, because naming conventions when programming in C++
might be specific to the language, for example to fit in with the naming
conventions used in the C++ standard library. In this case for example,
the library uses things like size() to get the size, resize() to change
it, thus one might want to avoid calling something get_size() in C++.
> > And even though the standard library has size methods with unsigned
> > result type, also that is an abomination.
>
> How do you suggest a program being able to use the entire address
> space of the target architecture with a signed indexing type? Even if
> you can address the entire address space with a signed type (as the CPU
> will probably internally use the value as an unsigned type), you still
> get problems when comparing indices with < when the indices go over the
> half-way mark.
>
> Or are you going to say "nobody will ever need more than half of the
> address space of any computer architecture"?
On most machines, an unsigned size_t is only necessary for indexing arrays
of char. In other cases, ptrdiff_t can hold all valid indices. One should
prefer a signed type in most cases, since arithmetic using it behaves more
normally.
You won't become any better of a C++ programmer if you discuss whether
you should use the word "get" in getter method names or not any more
than you will if you discuss eg. whether 2 or 4 spaces of indentation is
better or whether you should use camel-case in variable names.
Those types of discussion just aren't useful nor relevant. They are
completely a matter of taste, and your program will not become any
better or worse depending on it (as long as you use a clear style and
you use it consistently).
Things which are not really part of the C++ standard but which *can*
concretely improve your C++ programming are things like "you should
usually avoid allocating things with 'new' if an STL container suffices
equally well for the same task" and such. There are certain programming
practices which will help you write better C++ programs. However,
whether or not you should use "get" in a getter is not such a practice.
It seems you choose to have a very limited view of software engineering.
The idea of referential transparency for getters was and is one fundamental idea
of the Eiffel language. In Eiffel you can freely (by design of the language)
change the representation of a data member from member variable to accessor
function and back, without affecting client code. Now think how easy or not that
is if you have to keep renaming the thing all the time to comply with a silly
requirement to have an utterly redundant prefix or suffix on one form, and think
of whether a main influence of a language like Eiffel can be irrelevant.
Such prefixes that indicate type or implementation aspect are the fundamental
idea of Hungarian notation. It will increase your and others' efficiency if you
stay away from Hungarian notation (which you probably already do). Think about
whether that is irrelevant to C++ programming. Then, but think about it first!,
apply that insight not only to Hungarian notation but also to other
manifestations of the same idea, counter-productive in modern C++, like, for
example, "Get" prefixes. Then, but think about it first!, think also about why
the situation is different in e.g. Java, i.e., why this ties in specifically to
C++ programming.
Cheers & hth.,
- Alf (unfortunately, can only show rough map of terrain, you have to walk it)
> I thought this newsgroup was about the C++ language, not about
> programming guidelines.
Programming guidelines for C++ are relevant. The problem with
Alf's comments concerning Get is not that he's necessarily
wrong; it's more or less an open issue, and he has a right to
his opinion. (Also, I agree with him to a point, so he can't be
all wrong.) The problem is that when reading his posts, he
seems to be putting the use of "get" in the name on the same
level as, say, dereferencing a null pointer. The difference
between "I find using `get' to be a bad idea", "It's generally
accepted that names in all caps should be reserved for macros",
and "This isn't legal C++" isn't coming accross.
> Suggesting to avoid using "get" in a getter method is as
> superfluous here as suggesting eg. that one should use reverse
> polish notation or how many spaces should be used for
> indentation. It's a matter of style, not a matter of whether
> it's a standard C++ feature.
Which doesn't mean it can't be discussed. It does mean,
however, that he should make it clear that 1) it's not a problem
on the same level as e.g. dereferencing a null pointer, or even
using all caps for a variable name, and 2) in this case, it
happens to be his particular opinion---not all experts agree.
> > And even though the standard library has size methods with
> > unsigned result type, also that is an abomination.
> How do you suggest a program being able to use the entire
> address space of the target architecture with a signed
> indexing type? Even if you can address the entire address
> space with a signed type (as the CPU will probably internally
> use the value as an unsigned type), you still get problems
> when comparing indices with < when the indices go over the
> half-way mark.
> Or are you going to say "nobody will ever need more than half
> of the address space of any computer architecture"?
No. He's saying that most programs won't need one single byte
array that is larger than half the address space. And that
using "unsigned" to gain address space is a losing proposition;
if you need between 0.5 and 1 times the address space today (for
a single object), you'll need more than 1 times tomorrow, so you
might as well prepare for it.
--
James Kanze (GABI Software) email:james...@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
On 3 Apr, 06:08, "Alf P. Steinbach" <al...@start.no> wrote:
> * Christopher:
> > const unsigned GetNumElements() const
(snip)
> And even though the standard library has size methods with unsigned result type,
> also that is an abomination. It's just sh*tty design, very much so, to use a C++
> built-in type,
size_t seems the ideal type for returning a value which indicates how
many of something you've got, as this is more or less what it was
designed for. (Or do you not count size_t as "built-in"?)
Why is it so dreadful for the standard library to return built-in
types, or are you not actually objecting to this per se?
> where there's no guaranteed overflow checking,
Unsigned types don't check for overflow, but they do produce defined
results when they do overflow.
Signed types generally don't check for overflow either, and it is
undefined behaviour if they do overflow. Why is this an improvement?
> to indicate some
> limited value range (technically: (1) it has no advantages, and (2) it has lots
> of problems, including very serious ones,
Agreed. Try googling for "{23,34,12,17,204,99,16};" I take it this is
the sort of thing you had in mind?
> so (3) it's just sh*tty). Instead use
> a signed type such as 'int' or 'long' or, if you absolutely must, 'ptrdiff_t'.
Why do you want a signed type to indicate a quantity, which can't be
negative? Aren't you wasting half its potential values?
> It's actually relevant, because naming conventions when
> programming in C++ might be specific to the language, for
> example to fit in with the naming conventions used in the C++
> standard library. In this case for example, the library uses
> things like size() to get the size, resize() to change it,
> thus one might want to avoid calling something get_size() in
> C++.
This would be a stronger argument if the library were even
half-way consistent. Following Alf's reasoning elsethread
(which I basically agree with, by the way, although I think he's
overstating the issue), the argument is fundamentally between:
int size() const ;
void size( int newSize ) ;
and
int getSize() const ;
void setSize( int newSize ) ;
I definitely prefer the former, but...
The code will work just as well, and in fact be just as readable
with the latter. And if all of the existing code uses the
latter, or if the majority of your collegues prefer the latter,
it's better to be consistent, rather than have some use one, and
some another. (And even if I think they're wrong to prefer the
latter, there are a lot more important things to convince them.)
> > > And even though the standard library has size methods with
> > > unsigned result type, also that is an abomination.
> > How do you suggest a program being able to use the entire
> > address space of the target architecture with a signed
> > indexing type? Even if you can address the entire address
> > space with a signed type (as the CPU will probably
> > internally use the value as an unsigned type), you still get
> > problems when comparing indices with < when the indices go
> > over the half-way mark.
> > Or are you going to say "nobody will ever need more than
> > half of the address space of any computer architecture"?
> On most machines, an unsigned size_t is only necessary for
> indexing arrays of char. In other cases, ptrdiff_t can hold
> all valid indices. One should prefer a signed type in most
> cases, since arithmetic using it behaves more normally.
Exactly. In this case, there is a reasonably strong technical
argument for using int as an index, instead of unsigned. On the
other hand, there are even stronger technical reasons for not
mixing signedness, and since the standard library got it wrong,
you're often stuck with size_t, when you shouldn't be.
(Logically, of course: an index is the difference between two
addresses. And the difference between two addresses is a
ptrdiff_t, not a size_t.)
Why? It's a classic application of "fail fast" at work: going into an
array with -x __happens__. E.g. bad decrement somewhere gives you -1,
or a bad difference gives a (typically small!) -x. Now, that typically
ends in reading/writing bad memory, which is with small negatives
detected quickly only if you're lucky. If, however, that decrement/
subtraction is done unsigned, you typically explode immediately,
because there's a very big chance that memory close to 0xFFFF... ain't
yours.
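A sketch of the mechanism (deliberately buggy, of course):

#include <cstddef>
#include <vector>

int main()
{
    std::vector<int> v( 10 );
    std::size_t i = 0;
    --i;          // wraps around to the largest size_t value
    v[i] = 42;    // index far outside the mapping: typically an immediate crash
}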
> The problems with unsigned types are well known.
>
> Your compiler, if it's any good, will warn you about comparisons
> unsigned/signed. Those warnings are serious. Where you have such type mismatch
> (which results from unsigned) you often have a bug.
True, but why are signed and unsigned mixed in the first place? I say,
because of the poor design! IOW, in a poor design, it's bad. So how
about clearing that up first?
> Your compiler cannot, however, warn you about arithmetic problems.
True, but they exist for signed types, too. Only additional problem
with unsigned is that subtraction is more tricky (must know that a>b
before doing a-b). But then, I question the frequency at which e.g.
sizes are subtracted. And even then (get this!), it's fine. Result is
__signed__ and it all works. (Hey, look! Basic math at work: subtract
two natural numbers and you don't get a natural number!) Well, it
works unless you actually work on an array of bytes, but that example
is contrived and irrelevant, I mighty agree with you there.
I also question the relevance of signed for subtraction of indices,
because going into an array with a-b where a<b is just as much of a
bug as with unsigned. So with signed, there has to be a check (if (a-b >= 0)),
with unsigned, there has to be a check (if (a > b)). So I see no
gain with signed, only different forms.
> There's a host of bug vectors in that, including the main example of loop
> counting down (incorrectly expressed).
Hmmm... But I see only one vector: can't decrement before checking for
0.
So the two dangers above can take many forms, but honestly, how
difficult is it for someone to grasp the concept? I say, not very.
You claim that these potential bugs are important. I claim that they
are not, because I see very little subtraction of indices in code I
work with, and very little backwards-going loops. That may be
different for you, but I'll still wager that these are overall in low
percentiles.
You also conveniently chose to overlook (or worse yet, call it
hand-waving) the true nature of a count and an index (they are natural
numbers). I can't see how designing closer to reality can be
pointless.
And so I have to tell you what somebody already told you here: you
seem to adhere to "anything that makes your point weaker is "grossly
irrelevant". Anything that supports your point is, however, relevant."
Goran.
[...]
> > Suggesting to avoid using "get" in a getter method is as
> > superfluous here as suggesting eg. that one should use
> > reverse polish notation or how many spaces should be used
> > for indentation. It's a matter of style, not a matter of
> > whether it's a standard C++ feature.
> It is bad style in C++ precisely because C++ doesn't have any
> language feature to make use of it (Java does have such a
> feature).
I'm not sure what you mean by "feature" here. As I see it,
there are two alternatives:
int size() const ;
void size( int newSize ) ;
and
int getSize() const ;
void setSize( int newSize ) ;
Both work, and both result in readable code. I prefer the
first, but the difference isn't enormous.
And both work equally well in Java as in C++. (And in
beginning, the Java library was sometimes inconsistent in its
choice as well.)
> > How do you suggest a program being able to use the entire
> > address space of the target architecture with a signed
> > indexing type? Even if you can address the entire address
> > space with a signed type (as the CPU will probably
> > internally use the value as an unsigned type), you still get
> > problems when comparing indices with < when the indices go
> > over the half-way mark.
> That's a misunderstanding, sort of urban legend, unfortunately
> still bandied about as if it were meaningful.
> It isn't meaningful.
> For in order to make use of the extra range of unsigned you
> need a /character/ (byte) array larger than one half of the
> address space.
A single byte array. You generally can't have two of them.
Unless you're on a system which has a segmented address
architecture (e.g. an Intel processor), under a system which
actually uses it (not Windows or Linux). On a 16 bit Intel,
under MS-DOS, it did occasionally happen that people needed byte
arrays of e.g. 50000 bytes. And you could have several of them
(in different segments), even though a single object couldn't be
more than 2^16 bytes in size. The same thing would be true
today, on a 32 bit Intel, if you had a decent OS for it (instead
of Windows or Linux), although in a very real sense, there's a
much greater difference between 2^15 and 2^16 than between 2^31
and 2^32.
> For 32-bit programs you don't even have that address space
> available in Windows. For 64-bit programs, when was the last
> time you needed a 2^63 bytes character array? And can you cite
> any common computer with that much memory? I guess there might
> be a Cray... Then, you're talking about /always/ using
> unsigned indexing in order to support the case of using a Cray
> to address a larger than 2^63 character array. Hello.
Historically, C (and early C++) ran on 16 bit machines. Some of
which could effectively address more than 2^16 bytes, just not
in the same object. Historically---today, I don't think that
it's really relevant.
Even historically, however: if p and q are pointers into the
same array, and p-q doesn't give you the number of elements
between the two (and isn't negative if p points to an element
after q), then a lot of things (at least in C and C++) break.
Using an unsigned type as an index is a serious design flaw in C
and C++. About the only thing worse is mixing signed and
unsigned types in the same role and/or expression.
> Thus, when reduced to a concrete issue rather than hand
> waiving, it's not meaningful at all, just technically
> bullshit. :-)
> It's best forgotten!
> > Or are you going to say "nobody will ever need more than
> > half of the address space of any computer architecture"?
> That's a fallacy. Using signed indexing doesn't mean you can't
> use that much. It means that if you need that much memory and
> can't reach it via indexing, then it is necessarily for a
> character array that large. The OP's code will never be used
> for a character array that large. Nor will my code or yours. I
> think. :-)
Having argued with you up until now:-).
I can think of one exception: mmap'ing a very large text file in
a relatively simple program. (But although I'd treat the file
as a single, large array, I doubt that I'd use indexes into it.)
Introspection. Which makes it possible to create tools that depend on a certain
naming convention, tools that let you treat a "component" class very generally,
including e.g. design time manipulation. With support from the class!
And the original convention for that in Java was called "Java beans".
Quoting Wikipedia on beans: "The class properties must be accessible using get,
set, and other methods (so-called accessor methods and mutator methods),
following a standard naming convention. This allows easy automated inspection
and updating of bean state within frameworks, many of which include custom
editors for various types of properties.".
> As I see it,
> there are two alternatives:
> int size() const ;
> void size( int newSize ) ;
> and
> int getSize() const ;
> void setSize( int newSize ) ;
> Both work, and both result in readable code. I prefer the
> first, but the difference isn't enormous.
>
> And both work equally well in Java as in C++. (And in
> beginning, the Java library was sometimes inconsistent in its
> choice as well.)
Actually you're right that I did put things to a point. I personally prefer a
mixture, with the "set" prefix. That's because of a preference for readability
and my very subjective opinion of what constitutes readability, he he. :)
And that ties in with the fact that the one practical and very C++ specific
benefit of avoiding the prefixes has only to do with "get", not with "set".
Namely, supporting letting the client code choose to manually optimize
(awkward notation) or not (especially when the compiler does it), by doing
void getPopulationData( Container& c )
{
Container result;
...
result.swap( c );
}
Container populationData()
{
Container c;
getPopulationData( c );
return c;
}
Here client code will preferentially use "populationData", relying on RVO for
the cases where efficiency matters.
If it turns out that the compiler isn't up to the task and measurements show
that efficiency of these calls do matter a lot, then client code can fall back
to using getPopulationData, in the place or places where it affects performance.
Cheers,
- Alf
Uhm, I didn't comment on that because it wasn't necessary given that the
argument was based on detecting bugs caused by signed/unsigned problems.
But consider with signed index that is negative, corresponding to large value
unsigned,
a[i]
If (1) the C++ implementation is based on unchecked two's complement (which is
the usual), then the address computation yields the same as with unsigned index.
So, no advantage for unsigned.
If the C++ implementation isn't based on unchecked two's complement, then either
(2) you get the same as with unsigned index (no advantage for unsigned), or (3)
you get a trap on the /arithmetic/.
So in all three possible cases unsigned lacks any advantage over signed.
This, not from data -- for I haven't any experience that I can recall with
code that supplies negative index (or corresponding with unsigned) -- but from
pure logic, which is a stronger argument, I do question your statement about
unsigned "leads to an earlier crash". The logic seems to dictate that that
simply cannot be true, unless the compiler is perverse. So I'd need to see some
pretty strong evidence to accept that it isn't totally wishful thinking.
>> The problems with unsigned types are well known.
>>
>> Your compiler, if it's any good, will warn you about comparisons
>> unsigned/signed. Those warnings are serious. Where you have such type mismatch
>> (which results from unsigned) you often have a bug.
>
> True, but why are signed and unsigned mixed in the first place? I say,
> because of the poor design! IOW, in a poor design, it's bad. So how
> about clearing that up first?
Yes, that's one thing that signed sizes can help with (the main other thing
being cleaning up redundant and unnaturally structured code, like removing casts).
However, as remarked else-thread, since the standard library unfortunately uses
unsigned, "can help" isn't necessarily the same as "will help".
If applied mindlessly it may exacerbate the problem instead of fix it. But then,
so it is with all things. Needs to be done with understanding. :-)
>> Your compiler cannot, however, warn you about arithmetic problems.
>
> True, but they exist for signed types, too. Only additional problem
> with unsigned is that subtraction is more tricky (must know that a>b
> before doing a-b).
Yes, that's major problem, because the 0 limit is well within the most often
occurring set of values.
As opposed to limits of signed, which are normally way outside that set.
Thus, the 0 limit of unsigned is one often encountered (problematic), while the
limits of signed are not so often encountered (much less problematic).
> But then, I question the frequency at which e.g.
> sizes are subtracted. And even then (get this!), it's fine. Result is
> __signed__ and it all works. (Hey, look! Basic math at work: subtract
> two natural numbers and you don't get a natural number!)
ITYM, "Result is __unsigned__". And yes that works as long as keeping within
unsigned. The problem is that most everything else is signed, so keeping within
unsigned is in practice a real problem, and that's where the nub is.
> Well, it
> works unless you actually work on an array of bytes, but that example
> is contrived and irrelevant, I mighty agree with you there.
Ah. :-)
> I also question the relevance of signed for subtraction of indices,
> because going into an array with a-b where a<b is just as much of a
> bug as with unsigned. So with signed, there has to be a check (if (a-
> b>=0)), with unsigned, there has to be a check (if (a>b)). So I see no
> gain with signed, only different forms.
It's not so much about that particular bug. I haven't ever encountered it,
unless I did in my student days. It's much more about loops and stuff.
But regarding that bug, if for the sake of argument it's assumed to be a real
problem, then see above: it seems signed has the advantage also there... ;-)
>> There's a host of bug vectors in [arithmetic], including the main example of loop
>> counting down (incorrectly expressed).
>
> Hmmm... But I see only one vector: can't decrement before checking for
> 0.
Well, above you talked about using unsigned-only arithmetic and how that works
out nicely when keeping to unsigned. And yes it does work out well using only
unsigned arithmetic. But now you're talking about /checking/ for 0, which
implies that somehow, the result will be mixed with signed -- which is often
the case, it often will be -- which defeats the earlier argument.
The loop example (well known, well-known solutions also, except that I seem to
recall that Andrew Koenig had a very elegant one that baffled me at the time,
like how could I not have thought of that, and now I can't remember it!):
for( size_t i = v.size()-1; i >= 0; --i )
This is the natural expression of the loop, so any fix -- which is easy --
adds work, both in writing it and in grokking it later for maintenance.
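One of the well-known fixes, for reference (keeping the unsigned type):

for( size_t i = v.size(); i > 0; --i )
{
    // use v[i - 1]
}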
Another arithmetic example (I'm sorry my example generator is sort of out of
commission, so this is not a main example, just one that I remember):
for( size_t i = 0; i < v.size()*step; i += step )
Uh huh, if 'step' is signed and negative then it's promoted to unsigned in the
arithmetic expression, and then for non-zero v.size() the loop iterates at least
once.
Again, solutions are well known.
But they have to be applied (and just as importantly, it has to be recognized in
each and every case that a solution needs to be applied), which is more work,
both originally and for maintenance, and makes for less correct software.
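E.g., for the loop above, one solution in line with signed sizes is (a sketch,
with 'step' a signed integer as before):

ptrdiff_t const n = static_cast<ptrdiff_t>( v.size() )*step;
for( ptrdiff_t i = 0; i < n; i += step )
{
    // ...
}
// With a negative 'step', n is negative and the body never executes.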
And so on.
> So the two dangers above can take many forms, but honestly, how
> difficult is it for someone to grasp the concept? I say, not very.
Judging from experience and discussions here, it /is/ difficult for many to
grasp the concepts of unsigned modulo 2^n arithmetic.
But that's not the primary problem.
The primary problem is the ease of introducing pitfalls and the added work. But
could one perhaps rely on humans catching mistakes and doing everything right?
Well, think about how often you catch an error by /compiling/.
> You claim that these potential bugs are important. I claim that they
> are not, because I see very little subtraction of indices in code I
> work with, and very little backwards-going loops. That may be
> different for you, but I'll still wager that these are overall in low
> percentiles.
I'm sorry but the notion that all mixing of signed and unsigned happens in
indexing and count-down loops is simply wrong. Above is one counter-example.
Happily modern compilers warn about some other examples such as signed/unsigned
comparisons, but e.g. Pete Becker has argued earlier in this group that trying
to achieve warning-free compilation is futile in the context of developing
portable code, and should not be a concern, and so I gather many think that.
> You also conveniently chose to overlook (or worse yet, call it
> hand-waving) the true nature of a count and an index (they are natural
> numbers). I can't see how designing closer to reality can be
> pointless.
The correspondence you point out, but misunderstand, is /worse/ than pointless
in C++ (although not in some other languages).
In C++ the correspondence is
endpoints of basic value range [correspond to] endpoints of restricted range
The value range on the left is one of modulo 2^n arithmetic. Its endpoints are
not barriers, they are not values that shouldn't be exceeded. On the contrary,
in your arguments above you make use of the fact that exceeding those values is
well defined in C++, a feature to be exploited, "don't care" arithmetic (of
course with the catch that this implies no mixing with signed values).
The value range on the right is, on the other hand, one whose endpoints
constitute barriers.
Exceeding those barriers is an error.
So the correspondence, such as it is, is one of comparing, to the right, the
/numerical value/ of a barrier (exceeding which is an error) to, on the left,
the /numerical value/ of a wrap-around point (exceeding which is a feature to be
exploited), and disregarding the nature of the points so compared.
One can't have both, both error and feature to be exploited. So it's not
identity, it's not "closer". It's just a coincidence of numerical values, and
when you confuse the kinds of ranges they stem from you introduce bugs.
> And so I have to tell you what somebody already told you here: you
> seem to adhere to "anything that makes your point weaker is "grossly
> irrelevant". Anything that supports your point is, however, relevant."
I'm sorry but that's just innuendo.
Cheers & hth.,
Heh. Well it's a good thing about a non-moderated group that we can have more
colorful discussions, to say what we Really Mean, offensive or not! :-) And
even sometimes we can stray into off-topic land. It broadens the scope, and I
believe diversity is good in and of itself. The moderated group provides other
benefits such as the absence of spam and pure off-topic discussion and other
"noise", as well as a higher frequency of particpation of Real Experts -- but
regarding colorful language it is unfortunately only apparently an advantage,
because, for example, it's much /harder/ to defend oneself against insinuations
hidden in deep implications and a very intelligent person's non-offensive
language so that it passes moderation, than when the response is more direct.
> On 3 Apr, 06:08, "Alf P. Steinbach" <al...@start.no> wrote:
>
>> * Christopher:
>>> const unsigned GetNumElements() const
>
> (snip)
>
>> And even though the standard library has size methods with unsigned result type,
>> also that is an abomination. It's just sh*tty design, very much so, to use a C++
>> built-in type,
>
> size_t seems the ideal type for returning a value which indicates how
> many of something you've got, as this is more or less what it was
> designed for. (Or do you not count size_t as "built-in"?)
>
> Why is it so dreadful for the standard library to return built-in
> types, or are you not actually objecting to this per se?
It's not dreadful to have size_t as a built-in type.
It's dreadful to have it as an unsigned type.
That's because mixing signed and unsigned in C++ leads to a lot of problems,
which is added work, which added work may not even catch all the errors.
>> where there's no guaranteed overflow checking,
>
> Unsigned types don't check for overflow, but they do produce defined
> results when they do overflow.
>
> Signed types generally don't check for overflow either, and it is
> undefined behaviour if they do overflow. Why is this an improvement?
Because you avoid many/most of the problems of mixing signed and unsigned.
>> to indicate some
>> limited value range (technically: (1) it has no advantages, and (2) it has lots
>> of problems, including very serious ones,
>
> Agreed. Try googling for "{23,34,12,17,204,99,16};" I take it this is
> the sort of thing you had in mind?
Yes. :)
>> so (3) it's just sh*tty). Instead use
>> a signed type such as 'int' or 'long' or, if you absolutely must, 'ptrdiff_t'.
>
> Why do you want a signed type to indicate a quantity, which can't be
> negative? Aren't you wasting half its potential values?
No, there's no waste except for the case of a single byte array that's more than
half the size of addressable memory, which on a modern system you simply will
not ever have. There's no waste because that extra range isn't used, and cannot
be used (except for the single now purely hypothetical case mentioned).
Why not?
--
Ian Collins
Just try to come up with a /concrete/ example... :-)
32bit Solaris?
Any 32 bit OS with memory mapped files and large file support?
--
Ian Collins
And is the concrete example then of mapping a 2 GB file to memory under Solaris
and using indexing instead of pointer arithmetic to access it?
Well, I grant that it's possible, and so "will not ever have" was too strong.
But we have to really search to come up with such special cases, and they're not
problematic for the convention of using signed sizes. For when you're doing an
over-half-memory file mapping you're on the edge of things and have to deal with
much more serious problems. By comparison, applying the right types for the
special case shouldn't be hard, and anyway shouldn't influence the choice of
types for more normal code: the marginal gain for the special case doesn't
outweigh the serious problems elsewhere (unless Marketing gets into it :-) ).
Cheers, & hth.,
- Alf
PS-
I don't have Solaris (James has) but I think the following C code, just grabbed
from the net, would do something like that:
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
int main(void)
{
    int fd;
    unsigned long i;
    char *mmap_space;
    long pagesize = sysconf(_SC_PAGESIZE);
    unsigned long mmap_size = 3200000000UL;   /* a bit over 3 GB */
    if ((fd = open("/dev/zero", O_RDWR)) == -1)
        perror("open"), exit(-1);
    mmap_space = (char *)mmap((caddr_t) 0,
                              mmap_size,
                              (PROT_READ | PROT_WRITE),
                              MAP_PRIVATE,
                              fd,
                              (off_t)0);
    if (mmap_space == MAP_FAILED)
        perror("mmap"), exit(-1);
    (void)close(fd);
    (void)fprintf(stderr, "mmap'd %lu bytes\n", mmap_size);
    /*
     * Just to be thorough, test every page
     */
    (void)fprintf(stderr, "Testing the %lu mmap'd bytes ...\n", mmap_size);
    for (i = 0; i < mmap_size; i += pagesize)
        mmap_space[i] = (char)i;
    (void)fprintf(stderr, "done\n");
    return 0;
}
Have you tried it?
-DS
Or simply calling malloc( 3UL*1024*1024*1024 )!
Which would be impossible on a 32 bit system with a signed size_t.
> Well, I grant that it's possible, and so "will not ever have" was too
> strong.
It generally is!
>
> PS-
> I don't have Solaris (James has) but I think the following C code, just
> grabbed from the net, would do something like that:
>
<snip>
>
> Have you tried it?
My last 32 bit system went to the happy recycling ground last year....
--
Ian Collins
Have you tried that?
Not that it has anything to do with the discussion of signed sizes, but the code
I provided was from an article showing how to overcome a reportedly common 2 GB
limit in Solaris.
I guess it depends much on the version.
> Which would be impossible on a 32 bit system with a signed size_t.
>
>> Well, I grant that it's possible, and so "will not ever have" was too
>> strong.
>
> It generally is!
Yeah, as James Bond reportedly remarked, never say never... ;-)
>> PS-
>> I don't have Solaris (James has) but I think the following C code,
>> just grabbed from the net, would do something like that:
>>
> <snip>
>>
>> Have you tried it?
>
> My last 32 bit system went to the happy recycling ground last year....
So, for you and some others it's already not a practical proposition or even
possible at all to map a file into more than half the available address range
(not even mentioning the matter of processing it at the byte level). :-)
Then if I were inclined to word-weaseling I could claim that by "modern system"
of course I meant a 64-bit one.
He he.
But really, I don't think that the argument about "wasting" some address space
holds water at all.
And as I understand it you agree with that and are just playing Devil's Advocate
here (which is good).
Cheers & hth.,
- Alf
Not on a 32 bit system.
> Not that it has anything to do with the discussion of signed sizes, but
> the code I provided was from an article showing how to overcome a
> reportedly common 2 GB limit in Solaris.
>
> I guess it depends much on the version.
It probably does; older versions did have a 2GB limit.
>>> PS-
>>> I don't have Solaris (James has) but I think the following C code,
>>> just grabbed from the net, would do something like that:
>>>
>> <snip>
>>>
>>> Have you tried it?
>>
>> My last 32 bit system went to the happy recycling ground last year....
>
> So, for you and some others it's already not a practical proposition or
> even possible at all to map a file into more than half the available
> address range (not even mentioning the matter of processing it at the
> byte level). :-)
Well I do have some 8 and 16 bit embedded development boards I could
power up....
> Then if I were inclined to word-weaseling I could claim that by "modern
> system" of course I meant a 64-bit one.
If you exclude modern cell phones, engine management units, toasters.....
> But really, I don't think that the argument about "wasting" some address
> space holds water at all.
>
> And as I understand it you agree with that and just playing Devil's
> Advocate here (which is good).
Well it is Sunday evening :)
--
Ian Collins
Hm, EC++ is AFAIK dead, and I'm curious: are there *any* C++ compilers extant
for 16-bit addresses (not 16-bit data, but 16-bit addresses)?
Cheers,
- Alf (wondering)
Yes, but the "convention" existed and was documented by Sun
before Java supported introspection. And all introspection
really requires is a convention, not any specific convention
(but it would be somewhat difficult to implement if there were
no specific prefix).
> > As I see it,
> > there are two alternatives:
> > int size() const ;
> > void size( int newSize ) ;
> > and
> > int getSize() const ;
> > void setSize( int newSize ) ;
> > Both work, and both result in readable code. I prefer the
> > first, but the difference isn't enormous.
> > And both work equally well in Java as in C++. (And in
> > beginning, the Java library was sometimes inconsistent in its
> > choice as well.)
> Actually you're right that I did put things to a point. I
> personally prefer a mixture, with the "set" prefix.
In other words, the worst of both worlds:-).
Seriously, I mentioned the two (and not more) because those are
the only two I've seen in any actual programming guidelines, or
in real code. It's probably a reasonable argument to say that
the two functions do different things, so deserve different
names. In that case, however, it's just as reasonable to insist
that the names reflect what they do, i.e. get and set. And I
find it just as reasonable (if not more) to consider that these
aren't really "functions", despite the syntax; they "expose" (in
a controlled way) a data member, and should thus have the public
name of the data member.
A third solution, of course, would be to have:
int size() const ;
IntProxy size() ;
so you could write:
int a = x.size() ;
x.size() = a ;
In many ways, this is the most elegant. But it just seems more
work than necessary (to me anyway) to implement all of those
proxies, and C++ programmers don't really expect it.
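For concreteness, here is a minimal sketch of such a proxy -- not from any post
above; the class name Widget and its member are purely illustrative:

class Widget
{
public:
    class IntProxy
    {
    public:
        explicit IntProxy( int& value ) : m_value( value ) {}
        operator int() const { return m_value; }    // read:  int a = x.size() ;
        IntProxy& operator=( int newValue )         // write: x.size() = a ;
        {
            m_value = newValue;     // a real proxy could validate here
            return *this;
        }
    private:
        int& m_value;
    };

    Widget() : m_size( 0 ) {}
    int size() const { return m_size; }             // const access: plain int
    IntProxy size() { return IntProxy( m_size ); }  // non-const: read/write proxy

private:
    int m_size;
};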
> That's because of a preference for readability and my very
> subjective opinion of what constitutes readability, he he. :)
> And that ties in with the fact that the one practical and very
> C++ specific benefit of avoiding the prefixes has only to do
> with "get", not with "set".
> Namely, supporting letting the client code choose to manually
> optimize (awkward notation) or not (especially when the
> compiler does it), by doing
> void getPopulationData( Container& c )
> {
> Container result;
> ...
> result.swap( c );
> }
> Container populationData()
> {
> Container c;
> getPopulationData( c );
> return c;
> }
> Here client code will preferentially use "populationData",
> relying on RVO for the cases where efficiency matters.
> If it turns out that the compiler isn't up to the task and
> measurements show that efficiency of these calls do matter a
> lot, then client code can fall back to using
> getPopulationData, in the place or places where it affects
> performance.
How does this change anything with regards to the choice above?
If you use get/set prefixes, overload resolution will come into
play for the selection of the get function. If you use no
prefixes, then the get name is still available for use as above.
It's readability, of /the calling code/.
Calling code that says
populationData( o );
doesn't really say anything about what it does. Is it perhaps an assertion that
'o' is population data? Is it perhaps an extraction of population data from 'o'?
What's going to happen here -- or not?
On the other hand, code that says
getPopulationData( o );
says what it does, because there are not many rôles that o can play here and
still have a reasonable programmer's-english sentence construct.
And also code that says
Container const o = populationData();
says what it does.
Of course, also with a "get" prefix there it says what it does because the
reader recognizes the prefix as a common redundant prefix. But, being redundant,
it is redundant. IMHO just visual clutter and more to read and write.
> If you use get/set prefixes, overload resolution will come into
> play for the selection of the get function. If you use no
> prefixes, then the get name is still available for use as above.
Overload resolution is fine with respect to the goal of having the correct
implementation invoked.
It's not fine with respect to e.g. searching in an editor.
And it's not fine with respect to readability, and other human cognitive
activities such as discussing the code -- then distinct names are bestest. :)
> > This is pretty unclear, but unsigned opens the door for more
> > bugs, so this argument about probability of detecting those
> > bugs is pretty lame. :)
> Why? It's a classic application of "fail fast" at work: going
> into an array with -x __happens__. E.g. bad decrement
> somewhere gives you -1, or, bad difference gives (typically
> small!) -x. Now, that typically ends in reading/writing bad
> memory, which, with small negatives, is detected quickly only if
> you're lucky. If, however, that decrement/subtraction is done
> unsigned, you typically explode immediately, because there's a
> very big chance that memory close to 0xFFFF... ain't yours.
Sorry, but the array class will certainly catch a negative index
(provided it uses a signed type for indexes).
Conceptually, there is an argument in favor of using a cardinal,
rather than an integer, as the index type, given that the
language (and the library) forces indexes to start at 0. (My
pre-standard array classes didn't, but that's another issue.)
But C++ doesn't have a type which emulates cardinal, so we're
stuck here. The fact remains that the "natural" type for all
integral values is int---it's what you get from an integral
literal by default, for example, it's what short, char, etc.
(and their unsigned equivalents, if they fit in an int, which
they usually do) promote to. And mixing signed and unsigned
types in arithmetic expressions is something to be avoided. So
you want to avoid an unsigned type in this context.
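As a minimal sketch of the earlier point that an array class with a signed
index type catches a negative index directly (illustrative names, not from any
post):

#include <cassert>
#include <vector>

template< typename T >
class CheckedArray
{
public:
    explicit CheckedArray( int size ) : m_elems( size ) {}

    T& operator[]( int i )
    {
        // A bad decrement that produces -1 fails right here,
        // instead of first wrapping to a huge unsigned value.
        assert( i >= 0 && i < static_cast< int >( m_elems.size() ) );
        return m_elems[ i ];
    }

private:
    std::vector< T > m_elems;
};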
> > The problems with unsigned types are well known.
> > Your compiler, if it's any good, will warn you about
> > comparisons unsigned/signed. Those warnings are serious.
> > Where you have such type mismatch (which results from
> > unsigned) you often have a bug.
> True, but why are signed and unsigned mixed in the first
> place? I say, because of the poor design! IOW, in a poor
> design, it's bad. So how about clearing that up first?
That's what we're trying to do. Since integral literals have
signed type and, in contexts where the usual arithmetic
conversions apply, unsigned char and unsigned short promote to a
signed type, you're pretty much stuck.
I might add that a compiler is allowed to check for arithmetic
overflow in the case of signed arithmetic, and not in the case
of unsigned arithmetic. Realistically, I've only heard of one
that did, however, so this is more a theoretical argument than a
practical one.
> > Your compiler cannot, however, warn you about arithmetic
> > problems.
> True, but they exist for signed types, too. Only additional
> problem with unsigned is that subtraction is more tricky (must
> know that a>b before doing a-b). But then, I question the
> frequency at which e.g. sizes are subtracted.
Indexes are often subtracted. And there's no point in
supporting a size larger than that you can index.
> And even then (get this!), it's fine. Result is __signed__ and
> it all works.
Since when? And with what compiler? The standard states
clearly that for *all* binary operators applied to operands of
the same type, the result has that type.
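A small self-contained demonstration of that rule (an illustration, not from
the posts):

#include <iostream>

int main()
{
    unsigned a = 3;
    unsigned b = 5;
    std::cout << a - b << '\n';   // unsigned result, wraps:
                                  // 4294967294 with 32-bit unsigned
    std::cout << static_cast< int >( a - b ) << '\n';
                                  // -2 on the usual 2's complement
                                  // implementations, but only via an
                                  // explicit cast
    return 0;
}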
> (Hey, look! Basic math at work: subtract two natural numbers
> and you don't get a natural number!)
C++ arithmetic doesn't quite conform to the rules of basic
arithmetic. To a certain degree, it can't, since basic
arithmetic deals with infinite sets---you can't get overflow.
Unsigned arithmetic in C++ explicitly follows completely
different rules. (In passing: if you do happen to port to a
machine not using 2's complement, unsigned arithmetic is likely
to be significantly slower than signed. The C++ compiler for
the Unisys 2200 even has an option to turn off conformance here,
because of the performance penalty it exacts.)
> Well, it works unless you actually work on an array of bytes,
> but that example is contrived and irrelevant, I mighty agree
> with you there.
> I also question the relevance of signed for subtraction of
> indices, because going into an array with a-b where a<b is
> just as much of a bug as with unsigned. So with signed, there
> has to be a check (if (a- b>=0)), with unsigned, there has to
> be a check (if (a>b)). So I see no gain with signed, only
> different forms.
There's a fundamental problem with signed. Suppose I have an
index into an array, and a function which, given that index,
returns how many elements forward or back I should move. With
unsigned indexes, the function must return some sort of struct,
with a flag indicating whether the offset is positive or
negative, and the calling code needs an if. With signed
indexes, no problem---the function just returns a negative value
to go backwards.
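A minimal sketch of that situation (illustrative names; the step rule is made
up purely for the example):

#include <cstdlib>

// With a signed offset, "backwards" is just a negative return value;
// the caller adds it, with no direction flag and no if.
int stepFrom( int index )
{
    return index % 2 == 0 ? -1 : +2;  // arbitrary illustrative rule
}

int main()
{
    int index = 10;
    index += stepFrom( index );       // moves backwards here, forwards elsewhere
    return index == 9 ? EXIT_SUCCESS : EXIT_FAILURE;
}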
[...]
> You claim that these potential bugs are important. I claim
> that they are not, because I see very little subtraction of
> indices in code I work with, and very little backwards-going
> loops.
So we work with different types of code.
Note that if you subtract pointers, you also get a signed value
(possibly undefined, if you allow arrays to have a size greater
than std::numeric_limits<ptrdiff_t>::max()).
> That may be different for you, but I'll still wager that these
> are overall in low percentiles.
> You also conveniently chose to overlook (or worse yet, call it
> hand-waving) the true nature of a count and an index (they
> are natural numbers). I can't see how designing closer to
> reality can be pointless.
They are subsets of the natural numbers (cardinals), and the
natural numbers are a subset of the integers. C++ has a type which
sort of approximates the integers; it doesn't have a type which
approximates the cardinals. The special characteristics of unsigned
types mean that they are best limited to raw memory (no
calculations), bit maps and such (only bitwise operations) and
cases where you need those special characteristics (modulo
arithmetic). Generally speaking, when I see code which uses
arithmetic operators on unsigned types, and doesn't actually
need modulo arithmetic, I suppose that the author didn't really
understand unsigned in C++.
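For contrast, a sketch of one of the legitimate uses named above -- modulo
arithmetic -- where unsigned wrap-around is exactly what is wanted; the
constants are the classic Numerical Recipes LCG values:

#include <iostream>

// Unsigned overflow is defined to wrap, so a linear congruential
// generator can rely on it: the multiply-add is computed mod 2^32
// when unsigned is 32 bits.
unsigned lcgNext( unsigned state )
{
    return 1664525u * state + 1013904223u;
}

int main()
{
    unsigned seed = 1;
    for ( int i = 0; i != 5; ++i ) {
        seed = lcgNext( seed );
        std::cout << seed << '\n';
    }
    return 0;
}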
> But consider with signed index that is negative, corresponding
> to large value unsigned,
> a[i]
> If (1) the C++ implementation is based on unchecked two's
> complement (which is the usual), then the address computation
> yields the same as with unsigned index. So, no advantage for
> unsigned.
> If the C++ implementation isn't based on unchecked two's
> complement, then either (2) you get the same as with unsigned
> index (no advantage for unsigned), or (3) you get a trap on
> the /arithmetic/.
> So in all three possible cases unsigned lacks any advantage
> over signed.
If the implementation isn't based on 2's complement, unsigned
arithmetic is likely to be considerably slower than signed,
since the compiler has to generate the code to implement the
modulo behavior of unsigned correctly.
> This, not from data -- for I haven't any experience that I
> can recall with code that supplies negative index (or
> corresponding with unsigned)
But you're certainly familiar with code which uses negative
offsets to an index. Binary search, for example.
The argument was given that indexes are natural numbers. That's
not totally true, since we expect to be able to add negative
values to them. (In an unchecked 2's complement machine, of
course, we'll probably land on our feet with the correct value
anyway. But it's hardly what I would call "clean". And if for
some reason, the offset passes through a smaller unsigned type,
e.g. unsigned int, on most 32 bit machines, we are screwed.)
> >> * Christopher:
> >>> const unsigned GetNumElements() const
> > (snip)
> >> And even though the standard library has size methods with
> >> unsigned result type, also that is an abomination. It's
> >> just sh*tty design, very much so, to use a C++ built-in
> >> type,
> > size_t seems the ideal type for returning a value which
> > indicates how many of something you've got, as this is more
> > or less what it was designed for. (Or do you not count
> > size_t as "built-in"?)
> > Why is it so dreadful for the standard library to return
> > built-in types, or are you not actually objecting to this
> > per se?
> It's not dreadful to have size_t as a built-in type.
Technically, it's not a built-in type, but a typedef to a
built-in type.
> It's dreadful to have it as an unsigned type.
Yes and no. There's some reasonable argument for it being
unsigned, *but* given that it's unsigned, it's being used (in
the standard and elsewhere) in a lot of places where ptrdiff_t
would be more appropriate. Basically, anytime you have
something that could reasonably be, in some code, calculated by
a difference between pointers (or iterators), then you should be
using ptrdiff_t. About the only time size_t is appropriate is
as an argument to malloc.
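A small illustration of that guideline (not from the post): a position that
could be computed as an iterator difference is naturally a ptrdiff_t:

#include <algorithm>
#include <cstddef>
#include <vector>

// Index of the first occurrence of value (or v.size() if absent),
// computed as the difference of two iterators -- a signed
// std::ptrdiff_t, not a size_t.
std::ptrdiff_t positionOf( std::vector< int > const& v, int value )
{
    return std::find( v.begin(), v.end(), value ) - v.begin();
}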
> That's because mixing signed and unsigned in C++ leads to a
> lot of problems, which is added work, which added work may not
> even catch all the errors.
Interestingly enough, part of the problem, at least, is that
unsigned has a larger range: when signed and unsigned meet in an
expression, it's the signed value that gets converted to
unsigned, and not the reverse.
> Well I do have some 8 and 16 bit embedded development boards I
> could power up....
Historically (and this does go back some), some 16 bit systems
used a segmented architecture, in which a user process could
have up to 640KB memory, but the maximum size of a single object
(or array) was 64KB, and size_t was 16 bits. In such systems,
there is an argument concerning the addressability; making
size_t signed effectively does divide the largest size of a byte
array by 2, and it isn't that unreasonable to imagine an
application which deals with byte arrays larger than 32KB, even
on such a system. Whether supporting the additional range is
worth the hassles it causes (due to mixing of signed and
unsigned types) is very debatable, but the fact that Stepanov
originally developed the STL on such a system is probably not
foreign to his choice of size_t for indexes.
Today, of course, you won't find such things other than in
embedded systems, and I'm not sure whether such issues are
relevant in them.
Technically that depends on the definition of "built-in", in particular whether
the type is provided by standard C++ or only by the implementation.
But regarding the typedef, that's the same as I wrote, so it's just quibbling.
Below it seems you have no problems parsing a very similar sentence:
>> It's dreadful to have it as an unsigned type.
>
> Yes and no.
I respectfully disagree. It's a good thing if I use a clear,
consistent style *which I share with other C++ programmers*. This
group is a good place to pick up such style issues. For the word
"get", I already /know/ people (not just Alf) have strong feelings
about it -- not to mention the presence of get/set methods in C++
class design in general.
The code I'm working on now uses ALLUPPERCASE for class names. That is
both clear and consistent -- but whoever invented that style obviously
lived in a cave, isolated from other C++ programmers. I don't want to
be that guy. I don't want every programming project I enter to be its
own tiny C++ subculture.
/Jorgen
--
// Jorgen Grahn <grahn@ Ph'nglui mglw'nafh Cthulhu
\X/ snipabacken.se> R'lyeh wgah'nagl fhtagn!
> There's a fundamental problem with signed. Suppose I have an
> index into an array, and a function which, given that index,
> returns how many elements forward or back I should move.
You mean like this, e.g., (Example A):
int64 GetRelativePositionToMoveTo(uint32 index);
void MoveRelative(int64 relative_position);
> With
> unsigned indexes, the function must return some sort of struct,
> with a flag indicating whether the offset is positive or
> negative, and the calling code needs an if.
???
(That's not a bad idea BTW: see aside note below).
> With signed
> indexes, no problem---the function just returns a negative value
> to go backwards.
The "only" thing using a signed index gets you (design-wise, i.e.) is ...
not much (Example B):
int32 GetRelativePositionToMoveTo(int32 index);
void MoveRelative(int32 relative_position);
Now, instead of the "impedance mismatch" between the index width/range and
the movement width/range of Example A, you have an "impedance mismatch"
between the signed index argument and the common-sense notion of "index",
which is unsigned.
(Aside: A relative movement has magnitude AND direction and is therefore
fundamentally different from an index. A class representing this may not be
a bad idea indeed, and then the impedance mismatches go away entirely and the
design is clean/clear.)
Tony
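A sketch of the class suggested in the aside above (all names illustrative):
the movement carries magnitude and direction, and the index stays unsigned:

class Movement
{
public:
    enum Direction { Forward, Backward };

    Movement( unsigned magnitude, Direction dir )
        : m_magnitude( magnitude ), m_dir( dir ) {}

    unsigned magnitude() const { return m_magnitude; }
    Direction direction() const { return m_dir; }

    // Apply to an unsigned index; the caller guarantees a Backward
    // movement never exceeds the current index (no underflow).
    unsigned appliedTo( unsigned index ) const
    {
        return m_dir == Forward ? index + m_magnitude
                                : index - m_magnitude;
    }

private:
    unsigned m_magnitude;
    Direction m_dir;
};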
> They are subsets of the natural numbers (cardinals), and the
> natural numbers are a subset of the integers. C++ has a type which
> sort of approximates the integers; it doesn't have a type which
> approximates the cardinals. The special characteristics of unsigned
> types mean that they are best limited to raw memory (no
> calculations), bit maps and such (only bitwise operations) and
> cases where you need those special characteristics (modulo
> arithmetic). Generally speaking, when I see code which uses
> arithmetic operators on unsigned types, and doesn't actually
> need modulo arithmetic, I suppose that the author didn't really
> understand unsigned in C++.
How could the issues with unsigned be fixed in the C++ language (or in any
language for that matter)?
Tony
>> size_t seems the ideal type for returning a value which indicates how
>> many of something you've got, as this is more or less what it was
>> designed for. (Or do you not count size_t as "built-in"?)
>>
>> Why is it so dreadful for the standard library to return built-in
>> types, or are you not actually objecting to this per se?
>
> It's not dreadful to have size_t as a built-in type.
>
> It's dreadful to have it as an unsigned type.
>
> That's because mixing signed and unsigned in C++ leads to a lot of
> problems, which is added work, which added work may not even catch
> all the errors.
Or is it just being lazy or unknowing in the design and trying to shoehorn
abstractions into being represented by built-in types when they should
really be classes of their own? If C++ had real typedefs (instead of just
aliases), that would be acceptable in a lot of cases, but since it doesn't,
it's a questionable practice.
Tony
In other words, instead of a practical problem one has a clash with some ideology.
For me (and I guess also for James) choosing between the two is a no-brainer.
>> Why do you want a signed type to indicate a quantity, which can't be
>> negative? Aren't you wasting half its potential values?
>
> No, there's no waste except for the case of a single byte array
> that's more than half the size of addressable memory, which on a
> modern system you simply will not ever have. There's no waste because
> that extra range isn't used, and cannot be used (except for the
> single now purely hypothetical case mentioned).
Would you ever use a signed integer to represent a memory address?
I don't know whether there is a general solution.
But as noted in the general debate here, spread over many threads, the cases
where unsigned is relevant for indexing are special systems where the size of a
maximum available chunk of memory is very limited, and on those systems it is
fundamentally a speed trade-off: that of not using a too-large size_t.
So it seems to me that any really cross-platform solution would have to
differentiate between two kinds of systems, for some aspects. That may sound
abhorrent, but is already to a large extent the situation with C++ (it's not for
nothing that the standard differentiates between hosted and non-hosted systems).
Currently the differentiation is of the sort where on a non-hosted system you
just may have /less/ of the standard functionality available; I gather that a
solution to the unsigned indexing issue would mean somehow also having some
more, dedicated functionality available, i.e. two different sets of
functionality. But then perhaps we're really talking about two different
languages. It may be that a "one size fits all" language (pun intended :) ) is
not the most practical approach...
Cheers & hth.,
- Alf (speculative)
What if you're allocating from the top of the virtual memory space down and
doing something with those addresses? The idea of using signed integers to
avoid language idiosyncrasies, which results in limiting the range that can
be represented by the platform to half, seems suspect.
> Technically, [size_t] is not a built-in type, but a typedef to a
> built-in type.
"Technically" because C++ typedefs are just aliases rather than "real
typedefs".
> Hm, EC++ is AFAIK dead,
Or sleeping.
> The code analyzer Viva64 will simplify migration process!
> http://www.viva64.com/viva64-tool/
Expensive.
No, but perhaps a pointer. :-)
Bo Persson
Would you ever use an integral type to represent a memory address?
Modula-2 and Pascal handled it fairly well. Subrange types. An
array is indexed by a subrange type.
(Note that my pre-standard array classes followed the Pascal
model: the client specified both a lower and an upper bound.
And the lower bound---both actually---could be negative. While
I've never found much use for a lower bound greater than zero,
there are a couple of cases where it's useful for the lower
bound to be the complement of the upper bound, with 0 indexing
directly into the middle. In fact, I found I had a number of
cases where the array was indexed by a char, with lower bound
CHAR_MIN, and upper bound CHAR_MAX.)
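A minimal sketch of such a Pascal-style array class in C++ (illustrative, not
the actual pre-standard classes mentioned):

#include <cassert>
#include <climits>
#include <vector>

template< typename T >
class BoundedArray
{
public:
    // The client gives both bounds; either may be negative.
    BoundedArray( int lower, int upper )
        : m_lower( lower ), m_elems( upper - lower + 1 ) {}

    T& operator[]( int i )
    {
        assert( i >= m_lower );
        assert( i - m_lower < static_cast< int >( m_elems.size() ) );
        return m_elems[ i - m_lower ];
    }

private:
    int m_lower;
    std::vector< T > m_elems;
};

// e.g. BoundedArray< int > counts( CHAR_MIN, CHAR_MAX ) ;
// -- indexed directly by char values, with 0 in the middle.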
I use an unsigned integer type, via preprocessor define, that is the
appropriate width for the target platform.
Bad design is bad design. If the language instigates bad design, then it's a
bad language. :)
But the question remains: Why is there such a wide demand for storing
addresses in integral type variables? What is wrong with pointers and
references?
Bo Persson
I am not sure that there is such a wide demand, but there are
legitimate reasons that one might want to look at pointers as ints
that might creep up in many types of programming. One frequent
case could be hashing; another could be related to special memory
allocators.
/Peter
Agreed, but there is not much use for a general preprocessor define
here, as hashing would also need to know things like how many bits are
actually used (48 out of 64?). The same for a memory allocator.
You could just use int, or unsigned long long, or perhaps ptrdiff_t,
as appropriate for a specific system. You must know the target system
anyway.
Bo Persson
This is slightly useful in C-style interfaces which call a user function
and allow the user to pass some data along unchanged. Rather than using a
union:
union Data {
    void* p;
    int i;
};

void library( void (*user_func)( Data ), Data d )
{
    // ...
    user_func( d );
}
which involves some inconvenience on the part of the user, they instead
accept an intptr_t (or equivalent if you don't have stdint.h):
void library( void (*user_func)( intptr_t ), intptr_t d )
{
    // ...
    user_func( d );
}
This allows the user to pass a plain integer without any work, or pass a
pointer via a cast at the call site and a cast inside user_func.
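A usage sketch for the intptr_t variant; my_func is a made-up callback name,
and library is the function just above. Passing a pointer instead would take
one reinterpret_cast at the call site and the inverse cast inside the callback:

#include <cstdio>
#include <stdint.h>  // C99 header; most current C++ compilers also provide it

void library( void (*user_func)( intptr_t ), intptr_t d );  // as defined above

void my_func( intptr_t d )
{
    std::printf( "%d\n", static_cast< int >( d ) );  // recover the plain integer
}

int main()
{
    library( my_func, 42 );  // an integer: no cast needed anywhere
    return 0;
}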
And when such a type doesn't exist? (Admittedly unlikely today,
with long long, but I've encountered it in the past, with 48 bit
pointers and 32 bit longs, and no larger integral type.)
More to the point, of course, is why? An integer (signed or
otherwise) isn't a pointer, and a pointer isn't an integer. And
there's really not much point in trying to represent one in the
other. (There are a very few cases where I would represent a
small integral value as a pointer, but I can't think of any
where I'd represent a pointer as an integer.)
> > >>>> No, there's no waste except for the case of a single
> > >>>> byte array that's more than half the size of
> > >>>> addressable memory, which on a modern system you simply
> > >>>> will not ever have. There's no waste because that
> > >>>> extra range isn't used, and cannot be used (except for
> > >>>> the single now purely hypothetical case mentioned).
> > >>> Would you ever use a signed integer to represent a
> > >>> memory address?
> > >> Would you ever use an integral type to represent a memory
> > >> address?
> > > I use an unsigned integer type, via preprocessor define,
> > > that is the appropriate width for the target platform.
> > But the question remains: Why is there such a wide demand
> > for storing addresses in integral type variables? What is
> > wrong with pointers and references?
> I am not sure that there is such a wide demand, but there are
> legitimate reasons that one might want to look at pointers as
> ints that might creep up in many types of programming.
> One frequent case could be hashing; another could be
> related to special memory allocators.
I don't quite see the use with regards to memory allocators, but
the case of hashing is different: you need to extract an
integral value from a pointer, but you're not representing the
pointer in an integral type---the integral value is something
else, which can't necessarily be reconverted to a pointer.
(Portably hashing pointers is tricky; on many modern machines,
the simplest and best solution is just to consider it as an
array of unsigned char, and hash the array, but I've used
machines where this wouldn't work.) Similarly, it may be
interesting to view a pointer as an array of unsigned char, in
order to "dump" it.
[...]
> > I am not sure that there is such a wide demand, but there
> > are legitimate reasons that one might want to look at
> > pointers as ints that might creep up in many types of
> > programming. One frequent case could be hashing; another
> > could be related to special memory allocators.
> Agreed, but there is not much use for a general preprocessor
> define here, as hashing would also need to know things like how
> many bits are actually used (48 out of 64?). The same for a
> memory allocator.
Not only how many bits, but how they are used. On segmented
architectures (IBM mainframes, Intel 16 and 32 bit processors),
you also run the risk that converting to an integer may yield
different values, even when the pointers compare equal.
For most RISC architectures (and many others: AMD/Intel 64 bits,
and Intel when compiled in small model, where only the offset is
relevant, and in some modes on a modern IBM mainframe), the
obvious solution is just to treat the pointer as an array of
unsigned char, e.g.:
unsigned
hashPtr( void* ptr )
{
    unsigned result = 2166136261U ;
    unsigned char* current
        = reinterpret_cast< unsigned char* >( &ptr ) ;
    unsigned char* end = current + sizeof( void* ) ;
    while ( current != end ) {
        result = 127U * result + *current ;
        ++ current ;
    }
    return result ;
}
I don't think I'd consider that "looking at a pointer as an
int".
> You could just use int, or unsigned long long, or perhaps
> ptrdiff_t, as appropriate for a specific system. You must know
> the target system anyway.
On some of the systems I've worked on, the "appropriate" type
would be a struct.
If you're talking about things like the argument to
pthread_create, this isn't so much a question of storing a
pointer as an int, as of storing an int as a pointer; the system
passes a void*, with the idea that this can point to anything.
If all you need is a small integral value, of course, converting
it to a pointer to be passed, then converting it back in the
callback, avoids many of the object-lifetime issues involved
if you pass the address of an int, and dereference. Of course,
it also involves a lot of undefined behavior, but in practice,
it will probably work on most Unix machines (and if you're
invoking pthread_create, you're under Unix); the portability
risk may be deemed less a problem than the added complexity of
managing the lifetime of the int designated by the pointer if
you use a real pointer.
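A sketch of that technique -- formally undefined, as said, but typical in
practice on Unix systems:

#include <pthread.h>
#include <cstdio>

// The value travels inside the pointer itself, so there is no int
// object whose lifetime must be managed across the thread start.
extern "C" void* worker( void* arg )
{
    long value = reinterpret_cast< long >( arg );  // convert back in the callback
    std::printf( "got %ld\n", value );
    return 0;
}

int main()
{
    pthread_t tid;
    pthread_create( &tid, 0, worker, reinterpret_cast< void* >( 42L ) );
    pthread_join( tid, 0 );
    return 0;
}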
I'm really only concerned about 32-bit and 64-bit platforms in my
application development (and I wish I was born a bit later so that 32-bit
would not be a concern).
> (Admittedly unlikely today,
> with long long, but I've encountered it in the past, with 48 bit
> pointers and 32 bit longs, and no larger integral type.)
Note above.
>
> More to the point, of course, is why? An integer (signed or
> otherwise) isn't a pointer, and a pointer isn't an integer. And
> there's really not much point in trying to represent one in the
> other. (There are a very few cases where I would represent a
> small integral value as a pointer, but I can't think of any
> where I'd represent a pointer as an integer.)
I distinguish "address" from "pointer". But I admit to thinking that
conversion to integral type is OK (I think? I'd have to go look at my
codebase... I'm old and drink a lot ;) ). It has to be, else ptr + offset
and printf of address wouldn't work. I'm not too concerned about the actual
pointer representation as long as the overloading works.
I should have curbed the discussion of pointer representations by using
OFFSET as the example instead of ADDRESS:
maddr whats_my_address; // my define thing called maddr that
// is 4 bytes wide on a 32-bit platform and 8 bytes wide on a
// 64-bit platform.
whats_my_address = (maddr)(some_ptr + some_32bit_offset);
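For what it's worth, a sketch of how such an maddr define might look; the
platform-test macros here are illustrative, and the real one would test
whatever the actual toolchain defines:

#if defined( _WIN64 ) || defined( __LP64__ )
typedef unsigned long long maddr;  /* 8 bytes wide on 64-bit platforms */
#else
typedef unsigned long maddr;       /* 4 bytes wide on 32-bit platforms */
#endif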