
Re: Using std::ofstream::write() to save data


Carl Barron

May 17, 2004, 10:00:42 AM
In article <c88n6f$66j$1...@news1.nefonline.de>, Matthias Hofmann
<hof...@anvil-soft.com> wrote:

> This did not seem to be a problem until I tried to write data to a file like
>
> void SaveData( std::ofstream& s )
> {
> int i( 0 );
>
> s.write( &i, sizeof i ); // Error. Cannot convert int* to char*.
> }

s.write(static_cast<char *>(static_cast<void *>(&i)),sizeof i);

or more generally, where OStream is any basic_ostream<C,T> and Item is a
POD, copyable via

template <class OStream, class Item>
OStream & SaveData(OStream &os, Item *x)
{
    return os.write(static_cast<char *>(static_cast<void *>(x)), sizeof(Item));
}


Ulrich Eckhardt

May 17, 2004, 3:12:59 PM
Matthias Hofmann wrote:
> I have just decided that it would be better to use streams for file I/O
> instead of the corresponding functions in C. However, I have stumbled into
> a slight problem. Contrary to what I had expected, std::ofstream::write(),
> as well as std::ifstream.read(), accept a pointer to char instead of a
> pointer to void as the first argument.

>
> void SaveData( std::ofstream& s )
> {
> int i( 0 );
> s.write( &i, sizeof i ); // Error. Cannot convert int* to char*.
> }

Don't do that. C++ IOStreams are designed to do formatted, textual IO. You
possibly want to look at streambufs instead, which just do the 'raw, binary
IO'.
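
For illustration only, a minimal sketch of going through the streambuf (the
helper name is mine, not standard; the stream should be opened in binary mode,
and the usual caveats about unportable binary data apply):

#include <fstream>

// write the raw bytes of an int through the stream's buffer
void save_raw( std::ofstream& s, int i )
{
    s.rdbuf()->sputn( reinterpret_cast<const char*>( &i ), sizeof i );
}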

> I first tried to use a static_cast from int* to char*, but the compiler
> (VC++ 6.0) said that an interpret_cast was required. An interpet_cast
> sounds like having portability problems sooner or later, so I do not want
> to use it.

There is a 're' in front of reinterpret_cast, and your files are already
non-portable, btw.

> My next guess was that I should not use std::ofstream::write(), but rather
> pass the data to std::ofstream::operator<<(). The function above would
> then look like


> void SaveData( std::ofstream& s )
> {
> int i( 0 );

> s << i; // Fine.
> }
>
> However, what if I actually have an array of ints? Do I have to iterate
> through each member in a loop and save each element individually? That
> would be a severe disadvantage towards C.

// write the whole array to the stream, separated by a space
std::copy( array, array+size, std::ostream_iterator<type>( out, " "));
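
Reading the values back is symmetric; a sketch, assuming an input stream 'in'
and that <algorithm>, <iterator> and <vector> are included:

std::vector<type> v;
std::copy( std::istream_iterator<type>( in ), std::istream_iterator<type>(),
           std::back_inserter( v ));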

> Why didn't the Standards Comittee decide to have std::ofstream::write()
> take a pointer to void? Why can I cast any pointer to void*, but not to
> char*?

void* are evil and IOStreams are not made for doing evil things (i.e.
'binary IO'). ;)

> If all pointer types possibly differ in implementation, how can the C
> functions for file I/O (which use void pointers) work at all?

If a pointer were just the index of an element in memory, you would have
real address = index * sizeof element
In that case, conversion between an int* and a void* would require a real
computation. Similar things apply if the architecture uses a native
atomic data type that is twice as large as a char. It might then store a
flag (upper char/lower char) in an otherwise unused bit of the address and
then do some computation to find the right one upon dereferencing.
Note: all this is the same for C and C++, just that C++ has a slightly
different type system than C (see below).

> Imagine the
> following situation:
> int _write( HFILE file, void* pv, size_t size )
> {
> char* pc = ( char* ) pv;
> // Save data.
> }

Is that C or C++? If C, why the cast? Why not const?

One last note: in C, void* is a generic pointer which can implicitly be
converted to anything, which is very insecure (no type safety). In C++ it is a
typeless pointer to which you must explicitly assign a type before you can use
what it points to.

Uli

--
FAQ: http://parashift.com/c++-faq-lite/

/* bittersweet C++ */
default: break;

Maciej Sobczak

May 17, 2004, 3:22:20 PM
Hi,

Matthias Hofmann wrote:

> I have just decided that it would be better to use streams for file I/O
> instead of the corresponding functions in C. However, I have stumbled into a
> slight problem. Contrary to what I had expected, std::ofstream::write(), as
> well as std::ifstream.read(), accept a pointer to char instead of a pointer
> to void as the first argument.

[...]


> I first tried to use a static_cast from int* to char*, but the compiler
> (VC++ 6.0) said that an interpret_cast was required. An interpet_cast sounds
> like having portability problems sooner or later, so I do not want to use
> it.

reinterpret_cast does not impose more portability problems than using
binary data representation itself, so you should not bother.
In other words, by deciding to use binary files you just dive into the
world constrained to what is defined by your hardware and by your
compiler. The reinterpret_cast is a way to tell the compiler that you
are aware of the consequences. If you are *really* sure that you want to
write the data exactly as it is in memory (and that you know how to read
it back to get meaningful results) then reinterpret_cast is your best
friend.

> I read in the FAQ that a pointer to a char can be implemented much
> differently from a pointer to an int, or any other type. Maybe this is the
> reason why the static_cast does not work.

static_cast does not work, because casting between pointers to unrelated
types is indeed meaningful only when you want to *reinterpret* the
contents of memory using another type's representation rules. So
reinterpret_cast expresses your intent more clearly.

> My next guess was that I should not use std::ofstream::write(), but rather
> pass the data to std::ofstream::operator<<().

Not exactly. operator<< performs text formatting.

If you want portability, then you should use text-based files (for
example by using operator<< for writing and operator>> for reading) or
standardized binary formats (like ASN.1), but then I/O cannot be
performed "en bloc", but carefully composed using specialized functions.
If you "just" want to write the memory block to the file, forget the
portability. Relying only on what the standard says, such a file can
only be written and read in the same session of the same program. That's
the extreme interpretation, but in reality you may even expect problems
when reading files with the same program compiled with a different
compiler (the most obvious cause is that fundamental types may have
different sizes on different compilers or even with different switches
of the same compiler).

> However, what if I actually have an array of ints?

Then, if you are aware of the consequences (non-portability of the
resulting file), I see no problem with this:

int a[100];
// ...
f.write(reinterpret_cast<char*>(a), sizeof(a));

> Why didn't the Standards Comittee decide to have std::ofstream::write() take
> a pointer to void?

That's an interesting question.
I would also choose char*, because then the size parameter expressed as
a number of "chars" (that's in fact what sizeof(...) gives) would be
more meaningful, at least conceptually. The number of "voids" is not
meaningful, because void has no size. So, by using void* to pass a
pointer and a number of chars to pass the size of the memory block, you
get a conceptual inconsistency.
But that's only my "terminological" opinion and I would be glad to know
the actual motivations of the committee.

> Why can I cast any pointer to void*, but not to char*?

void* is something you can keep your pointer in just until you get it
back to the form it had before:

int *pi;
void *pv = pi;
int *pi2 = static_cast<int*>(pv);

That's about the only thing you can do with void*.

char* is a pointer to the object that has a concrete type, so that you
can read its value. Either it is indeed a character (for example in the
middle of the string) or it was (part of) something else, in which case
you "reinterpret" that object's binary representation. Again,
reinterpret_cast allows you to express your intent.

--
Maciej Sobczak : http://www.msobczak.com/
Programming : http://www.msobczak.com/prog/

ka...@gabi-soft.fr

May 17, 2004, 3:25:49 PM
"Matthias Hofmann" <hof...@anvil-soft.com> wrote in message
news:<c88n6f$66j$1...@news1.nefonline.de>...

> I have just decided that it would be better to use streams for file
> I/O instead of the corresponding functions in C. However, I have
> stumbled into a slight problem. Contrary to what I had expected,
> std::ofstream::write(), as well as std::ifstream.read(), accept a
> pointer to char instead of a pointer to void as the first argument.

Because all output eventually takes place through char's in C++.

A more reasonable question would be why write in a wostream takes a
wchar_t pointer, instead of a char or unsigned char pointer. On the
other hand, since character code translation will necessarily take place
in the filebuf...

> This did not seem to be a problem until I tried to write data to a

> file like

> void SaveData( std::ofstream& s )
> {
> int i( 0 );

> s.write( &i, sizeof i ); // Error. Cannot convert int* to char*.
> }

> I first tried to use a static_cast from int* to char*, but the


> compiler (VC++ 6.0) said that an interpret_cast was required. An
> interpet_cast sounds like having portability problems sooner or later,
> so I do not want to use it.

Well, if you dump a byte image of an int to a file, you will have
portability problems sooner or later. The write function is only useful
for outputting preformatted buffers, at least in portable code.

> I read in the FAQ that a pointer to a char can be implemented much
> differently from a pointer to an int, or any other type. Maybe this is
> the reason why the static_cast does not work.

Not really. static_cast doesn't work because it doesn't make sense
here.

> My next guess was that I should not use std::ofstream::write(), but
> rather pass the data to std::ofstream::operator<<().

Maybe. The semantics are considerably different: operator<< formats,
write doesn't.

> The function above would then look like

> void SaveData( std::ofstream& s )
> {
> int i( 0 );

> s << i; // Fine.
> }

> However, what if I actually have an array of ints? Do I have to
> iterate through each member in a loop and save each element
> individually?

Yes.

> That would be a severe disadvantage towards C.

That's what you had to do with FILE* in C as well. At least if you
wanted your code to be robust. If you don't care about robustness, then
reinterpret_cast will do the job.

> Why didn't the Standards Comittee decide to have

> std::ofstream::write() take a pointer to void? Why can I cast any


> pointer to void*, but not to char*?

Why don't we have a real raw memory type, so we can say what we mean?

What you pass certainly isn't an array of characters. It does have a
type, however: raw (preformatted) bytes. Traditionally, C has used
unsigned char for this, but on most machines, char works just as well.
In fact, given the amount of code which would break if char didn't work,
I'm sure that on any exotic hardware where a signed char would cause
problems, plain char would be unsigned, just to avoid these problems.

> If all pointer types possibly differ in implementation, how can the C
> functions for file I/O (which use void pointers) work at all?

There is a requirement that void* subsume all other types. That you can
convert a pointer to void* and back to its original type without loss of
information. Back to its original type only, however.

Or to unsigned char* or char* -- it is also guaranteed that you can
access the underlying raw memory as an unsigned char or char.
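
As an illustration (a hypothetical helper, not part of the thread), accessing
an object's underlying bytes through unsigned char looks like this:

#include <ostream>
#include <cstddef>

// print the bytes of any object as hex values; the output is, of course,
// entirely implementation dependent
void dump_bytes( std::ostream& os, void const* p, std::size_t n )
{
    unsigned char const* bytes = static_cast<unsigned char const*>( p );
    for ( std::size_t k = 0 ; k < n ; ++ k )
        os << std::hex << static_cast<unsigned>( bytes[ k ] ) << ' ';
}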

> Imagine the following situation:

> int _write( HFILE file, void* pv, size_t size )

What's HFILE?

> {
> char* pc = ( char* ) pv;

Void* and char* are required to be compatible. All pointers to class
are required to be compatible. You do have those guarantees.

> // Save data.
> }

> void f( HFILE hFile, int i )
> {
> _write( hFile, &i, sizeof i ); // Cast *int to *void, then from void* to
> char* !!!

Which is guaranteed to work. For a specific definition of "work": the
resulting char* is guaranteed to point to the first byte of the
underlying representation of the original object, and it is guaranteed
that you can access the underlying bytes as raw memory.

It is also pretty much guaranteed that if you write raw data bytes to
disk, sooner or later, you won't be able to reread them. (Not by the
standard, of course.:-))

> }

> How can this work at all, and what's the difference for streams?

To answer, you'll have to explain exactly what you mean by work. This
works, and there is no difference with streams (except that you have to
do the cast to char* yourself, and you have to ensure that the stream is
imbued with the "C" locale, since this is not the default) if AND ONLY
IF you reread the data using the same binary image -- e.g. for temporary
files which will be deleted when the program has finished only. For all
other uses, you should define the format you want, and write and read
it.
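
As a sketch of what defining a format might mean in practice (one possible
choice among many): write a value as four bytes, most significant byte first,
independently of how the int happens to be laid out in memory. This assumes
the value fits in 32 bits.

#include <ostream>

void writeUInt32( std::ostream& dest, unsigned long value )
{
    char buf[ 4 ];
    buf[ 0 ] = static_cast<char>( (value >> 24) & 0xFF );
    buf[ 1 ] = static_cast<char>( (value >> 16) & 0xFF );
    buf[ 2 ] = static_cast<char>( (value >>  8) & 0xFF );
    buf[ 3 ] = static_cast<char>(  value        & 0xFF );
    dest.write( buf, 4 );
}

The reader reassembles the value from the four bytes in the same order, so the
file means the same thing regardless of compiler or platform.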

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Dave Moore

May 17, 2004, 4:46:32 PM
"Matthias Hofmann" <hof...@anvil-soft.com> wrote in message news:<c88n6f$66j$1...@news1.nefonline.de>...
> Hello!

>
> I have just decided that it would be better to use streams for file I/O
> instead of the corresponding functions in C. However, I have stumbled into a
> slight problem. Contrary to what I had expected, std::ofstream::write(), as
> well as std::ifstream.read(), accept a pointer to char instead of a pointer
> to void as the first argument.
>
> This did not seem to be a problem until I tried to write data to a file like
>
> void SaveData( std::ofstream& s )
> {
> int i( 0 );
>
> s.write( &i, sizeof i ); // Error. Cannot convert int* to char*.
> }
>
> I first tried to use a static_cast from int* to char*, but the compiler
> (VC++ 6.0) said that an interpret_cast was required. An interpet_cast sounds
> like having portability problems sooner or later, so I do not want to use
> it.

The compiler is right concerning the direct cast, but you can
static_cast int* to void*, and then void* to int*. But why would you
want to? From your post it seems that you are expecting I/O in C++ to
work just like in C, and it doesn't. In this case, the output would
either be binary, if you opened the stream for binary output, or
garbage, since you would get the char-array equivalent of your data in
ASCII.

> I read in the FAQ that a pointer to a char can be implemented much
> differently from a pointer to an int, or any other type. Maybe this is the
> reason why the static_cast does not work.

No, it doesn't work because pointers are not generally
interconvertible in C++ ... they may only be cast to and from void (or
between accessible classes in a hierarchy), unless you use
reinterpret_cast. This is a 'good thing' .. for example in your
case, it probably saved you some puzzlement and debugging in the case
that your attempt to pass the int* to write had silently "worked".
The write member function of std::basic_ostream takes a
pointer-to-char because it is for doing char-by-char (or byte-by-byte)
output.

> My next guess was that I should not use std::ofstream::write(), but rather

> pass the data to std::ofstream::operator<<(). The function above would then


> look like
>
> void SaveData( std::ofstream& s )
> {
> int i( 0 );
>
> s << i; // Fine.
> }
>
> However, what if I actually have an array of ints? Do I have to iterate

> through each member in a loop and save each element individually? That would


> be a severe disadvantage towards C.

You could indeed iterate through a hand-written loop ... or you could
use the STL (preferred) and pass a custom output functor to the
for_each algorithm ...
for example:

#include <ostream>
#include <iostream>
#include <fstream>
#include <algorithm>
#include <vector>
#include <cmath>
#include <cstddef>

// Generalized functor for basic output of container elements
template <typename T>
class Print_element {
    std::ostream& _strm;
    char _term; // to allow a custom terminator (separator semantics are harder)
public:
    explicit Print_element(std::ostream &s, char t=' ') : _strm(s), _term(t) {}
    void operator()(const T& e) {
        _strm << e << _term;
    }
};

int main() {
    int i[] = {1, 2, 3, 4, 5};
    std::size_t n = sizeof(i) / sizeof(int);
    std::for_each(i, i+n, Print_element<int>(std::cout));
    std::cout << std::endl;

    // now do file output
    std::ofstream ofile("foo.txt");
    std::vector<double> d;
    d.push_back(1.0);
    d.push_back(std::sqrt(2.0));
    d.push_back(4*std::atan(1.0));
    d.push_back(std::exp(1.0));
    std::for_each(d.begin(), d.end(), Print_element<double>(ofile, '\n'));
}

The templated Print_element functor above is quite simple, but
hopefully you can see the advantages of the general approach,
especially since you can overload operator<< for user defined types.
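
For instance (a hypothetical type, purely to illustrate the point), once a
user defined type has its own operator<<, the same functor works unchanged:

struct Point { double x, y; };

std::ostream& operator<<( std::ostream& os, const Point& p )
{
    return os << p.x << ' ' << p.y;
}

// std::vector<Point> pts;
// std::for_each( pts.begin(), pts.end(), Print_element<Point>( ofile, '\n' ));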

Finally, for binary output, you could convert to void* and then to
char* and use write, as you tried to do in your example above. This
might be necessary for you, but it requires quite a lot of extra care,
as I recently learned myself on this NG ... see my posts and James
Kanze's replies in the following thread. He eventually got me
straightened out.

http://www.google.com/groups?hl=en&lr=&ie=UTF-8&selm=306d400f.0404190207.3c4ec8f6%40posting.google.com

> If all pointer types possibly differ in implementation, how can the C

> functions for file I/O (which use void pointers) work at all? Imagine the


> following situation:
>
> int _write( HFILE file, void* pv, size_t size )

> {
> char* pc = ( char* ) pv;
>

> // Save data.
> }
>
> void f( HFILE hFile, int i )
> {
> _write( hFile, &i, sizeof i ); // Cast *int to *void, then from void* to
> char* !!!
> }
>

> How can this work at all, and what's the difference for streams?
>

Well, I don't really remember much about the way to do file output in
C, because I have been using the C++ way for about 10 years ..
however, AFAICS your above code should work ... except the C-style
cast in _write is ugly .. better to use static_cast to clarify your
intent. Hopefully my example above helps to illustrate why I prefer
the C++ way.

The "difference for streams" to which you are likely referring is a
detail that you probably won't need to know about .. but it is
documented somewhere in the Standard if you want to dig through it.
In C++, formatted, stream-based I/O occurs at a higher level than the
equivalent operations in C, so you typically don't need to
worry about pointer-casts and stuff like that ... the STL takes care
of it for you. Just use the << and >> operators and get on with your
work 8*).

HTH, Dave Moore

Matthias Hofmann

May 18, 2004, 10:58:37 AM
Carl Barron <cbarr...@adelphia.net> wrote in message:
160520042058024975%cbarr...@adelphia.net...

> In article <c88n6f$66j$1...@news1.nefonline.de>, Matthias Hofmann
> <hof...@anvil-soft.com> wrote:
>
> > This did not seem to be a problem until I tried to write data to a file
like
> >
> > void SaveData( std::ofstream& s )
> > {
> > int i( 0 );
> >
> > s.write( &i, sizeof i ); // Error. Cannot convert int* to char*.
> > }
>
> s.write(static_cast<char *>(static_cast<void *>(&i)),sizeof i);
>
> or more generally where OStream is any basic_ostream<C,T> and Item is a
> POD, copyable via
> template <class OStream,class Item>
> OStream & SaveData(OStream &os,Item *x)
> {
> > return os.write(static_cast<char *>(static_cast<void *>(x)), sizeof(Item));
> }
>

That looks like it does compile, but it doesn't seem to be any more portable
than using a reinterpret_cast. I thought that performing a static_cast on a
void pointer is undefined unless it is cast back to the type that it used to
be?

Best regards,

Matthias Hofmann

Roger Orr

May 18, 2004, 11:29:09 AM

"Matthias Hofmann" <hof...@anvil-soft.com> wrote in message
news:c88n6f$66j$1...@news1.nefonline.de...
> Hello!
>
> I have just decided that it would be better to use streams for file I/O
> instead of the corresponding functions in C. However, I have stumbled into
a
> slight problem. Contrary to what I had expected, std::ofstream::write(),
as
> well as std::ifstream.read(), accept a pointer to char instead of a
pointer
> to void as the first argument.
>
> This did not seem to be a problem until I tried to write data to a file
like
>
> void SaveData( std::ofstream& s )
> {
> int i( 0 );
>
> s.write( &i, sizeof i ); // Error. Cannot convert int* to char*.
> }
>
> I first tried to use a static_cast from int* to char*, but the compiler
> (VC++ 6.0) said that an interpret_cast was required. An interpet_cast
sounds
> like having portability problems sooner or later, so I do not want to use
> it.

If you are writing raw ints to a file you have portability problems
already - what does one more matter :-)
Since you are using file streams you can be more portable by writing text:-

s << i;

will write a string representation of 'i' to the file.
You will need to think about field delimiters to enable you to read data
back in.
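
A sketch of what that might look like, assuming an array 'a' of 'n' ints and
streams 's' (for writing) and 'in' (for reading back):

// writing: the space acts as the delimiter
for ( int k = 0; k < n; ++k )
    s << a[ k ] << ' ';

// reading: operator>> skips the whitespace between values
for ( int k = 0; k < n; ++k )
    in >> a[ k ];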

Roger Orr
--
MVP in C++ at www.brainbench.com

ka...@gabi-soft.fr

May 18, 2004, 5:48:24 PM
dtm...@rijnh.nl (Dave Moore) wrote in message
news:<306d400f.04051...@posting.google.com>...

[...]


> From your post it seems that you are expecting I/O in C++ to
> work just like in C, and it doesn't.

Just a nit (since I agree with what you are saying), but I/O in C++ does
work just like in C, at least if you take care to imbue the "C" locale.
The semantics of C++ I/O are defined in terms of those of C I/O.

I'm just not sure that the original poster understood the issues in C
either.

[...]


> > If all pointer types possibly differ in implementation, how can the
> > C functions for file I/O (which use void pointers) work at all?
> > Imagine the following situation:

> > int _write( HFILE file, void* pv, size_t size )
> > {
> > char* pc = ( char* ) pv;
> >
> > // Save data.
> > }

> > void f( HFILE hFile, int i )
> > {
> > _write( hFile, &i, sizeof i ); // Cast *int to *void, then from void* to
> > char* !!!
> > }

> > How can this work at all, and what's the difference for streams?

> Well, I don't really remember much about the way to do file output in
> C, because I have been using the C++ way for about 10 years ..

What he wrote isn't C either. There is no function _write in C, nor can
a user define one at file scope, and there is no type HFILE.

The function you would use for this in C is:

size_t fwrite( void const* restrict buffer,
size_t element_size,
size_t element_count,
FILE* restrict dest ) ;

It does use a void*, rather than a char*. On the other hand, the
concerns you mentioned about opening the file in binary mode are
also relevant here.

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34


Matthias Hofmann

May 19, 2004, 5:35:58 AM
<ka...@gabi-soft.fr> wrote in message:
d6652001.04051...@posting.google.com...

> "Matthias Hofmann" <hof...@anvil-soft.com> wrote in message
> news:<c88n6f$66j$1...@news1.nefonline.de>...
>
> What you pass certainly isn't an array of characters. It does have a
> type, however: raw (preformatted) bytes. Traditionally, C has used
> unsigned char for this, but on most machines, char works just as well.
> In fact, given the amount of code which would break if char didn't work,
> I'm sure that on any exotic hardware where a signed char would cause
> problems, plain char would be unsigned, just to avoid these problems.

This is another thing that has kept confusing me: what is the problem with
the signed char type? And why are there 3 types (signed, unsigned and plain)
of char at all? It does make sense for code like

void f( signed char c )
{
if ( c < 0 ) { /* whatever */ }
}

but why does it make any difference when walking through memory?

> There is a requirement that void* subsume all other types. That you can
> convert a pointer to void* and back to its original type without loss of
> information. Back to its original type only, however.
>
> Or to unsigned char* or char* -- it is also guaranteed that you can
> access the underlying raw memory as an unsigned char or char.

Why not to signed char?

> Which is guaranteed to work. For a specific definition of "work": the
> resulting char* is guaranteed to point to the first byte of the
> underlying representation of the original object, and it is guaranteed
> that you can access the underlying bytes as raw memory.

Then a char pointer is actually the best choice if I want to point to raw
memory, e.g. in order to write it to a file?

Best regrads,

Matthias

Matthias Hofmann

May 19, 2004, 5:36:21 AM
Dave Moore <dtm...@rijnh.nl> wrote in message:
306d400f.04051...@posting.google.com...

>
> The compiler is right concerning the direct cast, but you can
> static_cast int* to void*, and then void* to int*. But why would you
> want to? From your post it seems that you are expecting I/O in C++ to
> work just like in C, and it doesn't. In this case, the output would
> either be binary, if you opened the stream for binary output, or
> garbage, since you would get the char-array equivalent of your data in
> ASCII.

If it is so much more difficult (and apparently impossible) to simply write
some memory to a file in binary format, then I should maybe use my C
functions...

Please tell me what the generally preferred method for writing binary data
to a file is in C++, as I can't believe that it doesn't work! What if you
read a picture file or audio data from a disk? You are not saying that you
can't write code for that in C++, are you?

Regards,

Matthias

Matthias Hofmann

May 19, 2004, 5:40:14 AM
Maciej Sobczak <no....@no.spam.com> wrote in message:
c89rhq$85t$1...@atlantis.news.tpi.pl...

>
> reinterpret_cast does not impose more portability problems than using
> binary data representation itself, so you should not bother.
> In other words, by deciding to use binary files you just dive into the
> world constrained to what is defined by your hardware and by your
> compiler. The reinterpret_cast is a way to tell the compiler that you
> are aware of the consequences. If you are *really* sure that you want to
> write the data exactly as it is in memory (and that you know how to read
> it back to get meangingful results) then reinterpret_cast is your best
> friend.

According to 5.2.10 / 7, the result of using a reinterpret_cast to convert
from int* to char* is unspecified. That means I can merely guess what's
happening, and that makes me nervous. According to my understanding of
5.2.10, the main purpose of reinterpret_cast is to cast pointers forth and
back, as this always yields the original pointer (provided that alignment
restrictions are taken care of). One exception seems to be casting from
pointer to integer, where the result is implementation defined, but this
does not help me in this case.

Reading the other answers in this thread, I got the impression that the best
thing I can do is a static_cast to void*, and then to char*. What do you
think about the following template:

template <class T> T mighty_cast( void* p ) { return static_cast<T>( p ); }

int main()
{
int* pi = 0;

char* pc = mighty_cast<char*>( pi );

return 0;
}

This is actually very similar to reinterpret_cast, the difference being that
the result in the example above is well defined.

Best regards,

Matthias

"Philipp Bachmann" <"reverse email address"

May 19, 2004, 10:32:36 PM

> > The compiler is right concerning the direct cast, but you can
> > static_cast int* to void*, and then void* to int*. But why would you
> > want to? From your post it seems that you are expecting I/O in C++ to
> > work just like in C, and it doesn't. In this case, the output would
> > either be binary, if you opened the stream for binary output, or
> > garbage, since you would get the char-array equivalent of your data in
> > ASCII.
>
> If it is so much more difficult (and apparently impossible) to simply
write
> some memory to a file in binary format, then I should maybe use my C
> functions...
>
> Please tell me what the generally preferred method for writing binary data
> to a file is in C++, as I can't believe that it doesn't work! What if you
> read a picture file or audio data from a disk? You are not saying that you
> can't write code for that in C++, are you?

You can establish a second stream hierarchy for binary output, if you want to
keep the usual C++ idiom for I/O. There are some of them already available;
take a look e.g. at Dietmar Kuehl's "XDRStream", now part of the "Berlin"
project.
As far as I know RogueWave has similar streams within their "Tools.h++"
product, too.

Cheers,
Philipp.

Dietmar Kuehl

May 20, 2004, 6:51:41 AM
"Matthias Hofmann" <hof...@anvil-soft.com> wrote:
> If it is so much more difficult (and apparently impossible) to simply write
> some memory to a file in binary format, then I should maybe use my C
> functions...

Well, this will just provide a simple to use interface for reading and
writing data in an unknown format: any change of the compiler, the library,
the underlying platform, etc. will break your binary data. IMO your C code
writing binary data with 'fwrite()' and reading it back in with 'fread()'
is already broken. Even worse, it is silently broken and you will only realize
that your old data is broken when it is too late (as I have seen in real
life in a project: people had much fun repeating the work of the last three
months because the backups were broken...).

> Please tell me what the generally preferred method for writing binary data
> to a file is in C++, as I can't believe that it doesn't work!

You can read and write binary files in C++, of course. You just have to
understand that it is just another form of formatting data into a well
known format which is very different from reading or writing structures.
For example, in the project I'm working for we use a binary formatting
system which is based, although not in code but in design and intent, on
a binary stream system I wrote a long time ago: see
<http://www.dietmar-kuehl.de/cxxrt/binio.tar.gz>. This provides formatting
functions for built-in types and you would create formatting functions for
user defined types using these very similar to the text formatted stream
operators.

> What if you
> read a picture file or audio data from a disk? You are not saying that you
> can't write code for that in C++, are you?

If you look closer at the picture formats, you will notice that these fall
clearly into the category of formatted binary data. In fact, the various
picture *formats* differ in, well, their formats :-) That is, you would
read and write pictures the same way as you would write any other binary
data: you format it into a sequence of bytes (aka 'char's) and send them
to an appropriate write function. Later you read bytes and convert them
back into the data in your programs.
--
<mailto:dietma...@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.contendix.com> - Software Development & Consulting

Matthias Hofmann

May 20, 2004, 6:54:29 AM
<ka...@gabi-soft.fr> wrote in message:
d6652001.0405...@posting.google.com...

> dtm...@rijnh.nl (Dave Moore) wrote in message
> news:<306d400f.04051...@posting.google.com>...
>
> [...]
> > From your post it seems that you are expecting I/O in C++ to
> > work just like in C, and it doesn't.
>
> Just a nit (since I agree with what you are saying), but I/O in C++ does
> work just like in C, at least if you take care to imbue the "C" locale.
> The semantics of C++ I/O are defined in terms of those of C I/O.
>
> I'm just not sure that the original poster understood the issues in C
> either.

Well, I thought I did, but now it looks like I don't know anything. I
thought I could just pass a pointer to a function and write it to the file.
I also don't know what the "C" locale is good for, I wonder how my programs
could ever work at all.

>
> What he wrote isn't C either. There is no function _write in C, nor can
> a user define one at file scope, and there is no type HFILE.

You are right, the function and HFILE type are Windows specific. I found
them in the MSDN Library and I liked them better than the ones from C. But
why can't a user define such a function at file scope, is it because its
name starts with an underscore?

Best regards,

Matthias

Dave Moore

May 20, 2004, 6:57:35 AM
"Matthias Hofmann" <hof...@anvil-soft.com> wrote in message news:<c8dj2c$c6k$1...@news1.nefonline.de>...

No it ain't ... assuming that you have multi-byte int's and
single-byte char's, what you are doing is picking off the first byte
of the int and using it as a char. This is exactly what would be done
by reinterpret_cast. A reinterpret_cast is only well-defined when the
object being cast-to has exactly the same size and bit representation
as the object being cast from. Even then, the results will be
platform-dependent, due to the different binary representations of
data-types. In one of my other posts in this thread, I referred you
to a discussion in this NG between myself and James Kanze about the
proper way to do binary output in C++ ... I suggest you read it .. it
really helped me a lot with my understanding.

HTH, Dave Moore

Dave Moore

May 20, 2004, 6:58:06 AM
"Matthias Hofmann" <hof...@anvil-soft.com> wrote in message news:<c8df70$987$1...@news1.nefonline.de>...

> Dave Moore <dtm...@rijnh.nl> wrote in message:
> 306d400f.04051...@posting.google.com...
> >
> > The compiler is right concerning the direct cast, but you can
> > static_cast int* to void*, and then void* to int*. But why would you
> > want to? From your post it seems that you are expecting I/O in C++ to
> > work just like in C, and it doesn't. In this case, the output would
> > either be binary, if you opened the stream for binary output, or
> > garbage, since you would get the char-array equivalent of your data in
> > ASCII.
>
> If it is so much more difficult (and apparently impossible) to simply write
> some memory to a file in binary format, then I should maybe use my C
> functions...

Did you really read my post carefully? I wasn't saying it was
particularly difficult .. only that you were going about it the wrong
way 8*). It is somewhat tricky to do it properly ... especially if
you care about portability, but I guess that is also true in C.

[As an aside, you never said specifically that you were trying to do
binary output .. I suspected you might be (which is why I commented on
it) but it wasn't clear, especially when you started using
operator<<, which is for formatted-text output only.]

> Please tell me what the generally preferred method for writing binary data
> to a file is in C++, as I can't believe that it doesn't work! What if you
> read a picture file or audio data from a disk? You are not saying that you
> can't write code for that in C++, are you?

Of course I am not saying that ... I already gave you a link to an
extensive discussion in this Newsgroup between myself and James Kanze
about how to properly handle binary data in C++ .. I can't really do
any better than that. If you have specific questions, I will try to
help you with them.

HTH, Dave Moore

Maciej Sobczak

May 20, 2004, 7:44:26 AM
Hi,

Matthias Hofmann wrote:

> According to 5.2.10 / 7, the result of using a reinterpret_cast to convert
> from int* to char* is unspecified.

Yes. To some extent, it relies on your understanding of the anatomy of
your data types. That's the whole problem (and fun) with binary I/O.

> That means I can merely guess what's
> happening, and that makes me nervous. According to my understanding of
> 5.2.10, the main purpose of reinterpret_cast is to cast pointers forth and
> back, as this always yields the original pointer (provided that alignment
> restrictions are taken care of).

> Reading the other answers in this thread, I got the expression that the best


> thing I can do is a static_cast to void*, and then to char*.

As far as I understand the prose in 5.2.9/10, the pair:

static_cast<char*>(static_cast<void*>(p))

does not make the game any more defined than the direct reinterpret_cast.
There is a statement (in 5.2.9/10) that:

"A value of type pointer to object converted to "pointer to cv void" and
back to the ORIGINAL POINTER TYPE will have its original value."

The emphasis is mine and means that the pair of casts above is not
covered, because it ends up with different pointer type than the
original. The same level of mystery as with reinterpret_cast.

I am pretty close to the conclusion that:

1. a pair of static_casts from int* to void* to char* is not well-defined

2. reinterpret_cast from int* to char* is not well-defined

3. Compilers are free to set up things so that (1) and (2) give
different results, but from the lack of any reason to do so, (1) and (2)
are pretty equivalent; they both give unspecified results, but I expect
them to be *the same* unspecified results.

That's why I prefer this:

int *pi;
char *pc;

pc = reinterpret_cast<char*>(pi);

to this:

pc = static_cast<char*>(static_cast<void*>(pi));

(Of course, doing binary I/O makes sense only on machines where we know
enough about the data types to expect reasonable results. On machines
where this is not the case - different sizes of pointer types, etc - I
would just not dare to try this kind of trick.)

But I'm learning and I will welcome some more light from those who
really know the intents expressed in the standard.

Samuel Krempp

May 20, 2004, 7:46:08 AM
On Thursday 20 May 2004 04:32, "reverse email address"
<ed....@nnamhcab.ppilihp wrote:

>> If it is so much more difficult (and apparently impossible) to simply
> write
>> some memory to a file in binary format, then I should maybe use my C
>> functions...
>>
>> Please tell me what the generally preferred method for writing binary
>> data to a file is in C++, as I can't believe that it doesn't work! What
>> if you read a picture file or audio data from a disk? You are not saying
>> that you can't write code for that in C++, are you?
>
> You can establish a second stream hierarchy for binary output, if you want
> to
> keep the usual C++ idiom for I/O. There are some of them already
> available, Take a look e.g. to Dietmar Kuehl's "XDRStream", now part of
> the "Berlin" project.
> As far as I know RogueWave has similar streams within their "Tools.h++"
> product, too.

That's an easy way to do portable binary I/O.
But to make a clear answer to matthias's question and sum up other messages,
C++ streams can do binary I/O just as well as C : to write or read chunks
of bytes, there are the unformatted I/O functions write / read of streams
(or at streambuf level the sputn / sgetn functions).
What seemed to confuse you is just that those chunks of bytes are to be
passed -quite logically- by pointer to char, not void*.

--
Samuel.Krempp
cout << "@" << "crans." << (is_spam ? "trucs.en.trop." : "" )
<< "ens-cachan.fr" << endl;

James Kanze

May 20, 2004, 9:45:14 AM
"Matthias Hofmann" <hof...@anvil-soft.com> writes:

|> That looks like is does compile, but it doesn't seem to be any more
|> portable than using a reinterpret_cast. I thought that performing a
|> static_cast on a void pointer is undefined unless it is cast back to
|> the type that it used to be?

There's a special exception for character types. Otherwise, how would
you implement things like memcpy?

--
James Kanze


Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung

9 place Sémard, 78210 St.-Cyr-l'École, France +33 (0)1 30 23 00 34

James Kanze

May 20, 2004, 9:47:09 AM
"Matthias Hofmann" <hof...@anvil-soft.com> writes:

|> <ka...@gabi-soft.fr> wrote in message:
|> d6652001.04051...@posting.google.com...
|> > "Matthias Hofmann" <hof...@anvil-soft.com> wrote in message
|> > news:<c88n6f$66j$1...@news1.nefonline.de>...

|> > What you pass certainly isn't an array of characters. It does
|> > have a type, however: raw (preformatted) bytes. Traditionally, C
|> > has used unsigned char for this, but on most machines, char works
|> > just as well. In fact, given the amount of code which would break
|> > if char didn't work, I'm sure that on any exotic hardware where a
|> > signed char would cause problems, plain char would be unsigned,
|> > just to avoid these problems.

|> This is another thing that has kept confusing me: what is the
|> problem with the signed char type?

It could have two representations of 0.

|> And why are there 3 types (signed, unsigned and plain) of char at
|> all?

Because historically, char was signed on some implementations, and
unsigned on others. So signed char is guaranteed signed, unsigned char
unsigned, and char whatever.

|> It does make sense for code like

|> void f( signed char c )
|> {
|> if ( c < 0 ) { /* whatever */ }
|> }

|> but why does it make any difference when walking through memory?

Because copying a signed char could normalize the representation -- e.g.
convert all 0's (positive or negative) to positive 0's.

This is not allowed in an unsigned char. And I'm not sure, but I don't
think that C++ allows a change in the bit pattern when copying a char.

|> > There is a requirement that void* subsume all other types. That
|> > you can convert a pointer to void* and back to its original type
|> > without loss of information. Back to its original type only,
|> > however.

|> > Or to unsigned char* or char* -- it is also guaranteed that you
|> > can access the underlying raw memory as an unsigned char or char.

|> Why not to signed char?

See above. On an 8 bit ones complement machine, how would you
distinguish between the bit patterns 0xFF and 0x00?

|> > Which is guaranteed to work. For a specific definition of "work":
|> > the resulting char* is guaranteed to point to the first byte of
|> > the underlying representation of the original object, and it is
|> > guaranteed that you can access the underlying bytes as raw
|> > memory.

|> Then a char pointer is actually the best choice if I want to point
|> to raw memory, e.g. in order to write it to a file?

I tend to favor unsigned char, but mainly because of tradition, I think.

In practice, you never really want to write raw memory to a file, of
course. You want to format a buffer, and write it to the file. Given
the C++ interface in iostream, if I were writing it through iostream, I
would probably use char for the buffer. (In practice, every time I've
wanted to do something like this, I've ended up using a lower level
protocol -- the writes had to be atomic, or synchronized, or something
else that the iostream/filebuf pair doesn't support.)

--
James Kanze


Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung

9 place Sémard, 78210 St.-Cyr-l'École, France +33 (0)1 30 23 00 34

James Kanze

May 20, 2004, 9:48:11 AM
"Matthias Hofmann" <hof...@anvil-soft.com> writes:

|> Maciej Sobczak <no....@no.spam.com> wrote in message:
|> c89rhq$85t$1...@atlantis.news.tpi.pl...

|> > reinterpret_cast does not impose more portability problems than
|> > using binary data representation itself, so you should not
|> > bother. In other words, by deciding to use binary files you just
|> > dive into the world constrained to what is defined by your
|> > hardware and by your compiler. The reinterpret_cast is a way to
|> > tell the compiler that you are aware of the consequences. If you
|> > are *really* sure that you want to write the data exactly as it
|> > is in memory (and that you know how to read it back to get
|> > meangingful results) then reinterpret_cast is your best friend.

|> According to 5.2.10 / 7, the result of using a reinterpret_cast to
|> convert from int* to char* is unspecified.

The intent is specified.

|> That means I can merely guess what's happening, and that makes me
|> nervous.

It's implementation defined. The implementation documentation must
specify what is happening. Reading the implementation documentation is
not the same thing as guessing.

|> According to my understanding of 5.2.10, the main purpose of
|> reinterpret_cast is to cast pointers forth and back, as this always
|> yields the original pointer (provided that alignment restrictions
|> are taken care of).

The main purpose of reinterpret_cast is implementation defined type
punning. Exactly what you are doing.

|> One exception seems to be casting from pointer to integer, where the
|> result is implementation defined, but this does not help me in this
|> case.

|> Reading the other answers in this thread, I got the expression that
|> the best thing I can do is a static_cast to void*, and then to
|> char*.

If the implementation gives different results for a direct
reinterpret_cast, change compilers. The standard may not require the
same results, but quality of implementation issues do. And since what
you are doing is so highly implementation specific anyway, you want to
call attention to this fact -- and that is the message reinterpret_cast
passes to the reader.

--
James Kanze
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France +33 (0)1 30 23 00 34


Maciej Sobczak

May 20, 2004, 12:38:28 PM
Hi,

As a continuation to my previous post, it may be a good thing to point
out that probably the only portable way to do binary output of objects
of non-char POD types (portable in the sense that the code is kosher C++
on every implementation, not in the sense that the resulting file will
be portable) is to use memcpy and the helper char buffer:

int i;
char buf[sizeof(int)];
memcpy(buf, &i, sizeof(int));

stream.write(buf, sizeof(int));

The example with memcpy is even included in the standard (3.9/2).
I add the write to stream in the context of this thread.
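
Reading the value back would go through the same kind of buffer in reverse
(a sketch, assuming the file was written by the same implementation):

char buf[sizeof(int)];
stream.read(buf, sizeof(int));

int i;
memcpy(&i, buf, sizeof(int));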

Maciej Sobczak

May 20, 2004, 12:39:55 PM
Hi,

James Kanze wrote:

> "Matthias Hofmann" <hof...@anvil-soft.com> writes:
>
> |> That looks like is does compile, but it doesn't seem to be any more
> |> portable than using a reinterpret_cast. I thought that performing a
> |> static_cast on a void pointer is undefined unless it is cast back to
> |> the type that it used to be?
>
> There's a special exception for character types.

Would you please provide a specific paragraph in the standard that makes
this exception?
There is something that gives a similar licence, but it is stated in the
context of memory that outlives the object that is going to live in it
(or that was in it): 3.8/5. Does it apply also to "regular" objects? Why?

Interestingly, even within that paragraph the result of such conversion
is only not-undefined. That's still far from being well-defined as I
would like it to be.

> Otherwise, how would
> you implement things like memcpy?

Are *we* (the language users) supposed to implement them?

llewelly

May 20, 2004, 4:08:27 PM
"Matthias Hofmann" <hof...@anvil-soft.com> writes:

> <ka...@gabi-soft.fr> wrote in message:
> d6652001.0405...@posting.google.com...
> > dtm...@rijnh.nl (Dave Moore) wrote in message
> > news:<306d400f.04051...@posting.google.com>...
> >
> > [...]
> > > From your post it seems that you are expecting I/O in C++ to
> > > work just like in C, and it doesn't.
> >
> > Just a nit (since I agree with what you are saying), but I/O in C++ does
> > work just like in C, at least if you take care to imbue the "C" locale.
> > The semantics of C++ I/O are defined in terms of those of C I/O.
> >
> > I'm just not sure that the original poster understood the issues in C
> > either.
>
> Well, I thought I did, but now it looks like I don't know anything. I
> thought I could just pass a pointer to a function and write it to
> the file.

The key here is knowing what format your data is in. If you just cast
pointers to objects to char* and write the appropriately-sized
memory block to the file, you don't know what the format
is. Since you don't know the format, you can't reliably read it
back. Practically every compiler has flags that subtly alter the
memory layout of some objects, and many compilers change memory
layouts from one version to the next.

> I also don't know what the "C" locale is good for, I wonder how my programs
> could ever work at all.
>
> >
> > What he wrote isn't C either. There is no function _write in C, nor can
> > a user define one at file scope, and there is no type HFILE.
>
> You are right, the function and HFILE type are Windows specific. I found
> them in the MSDN Library and I liked them better than the ones from C. But
> why can't a user define such a function at file scope, is it because its
> name starts with an underscore?

All global scope names that begin with an underscore are reserved for
the implementation. This doesn't apply to names inside a class,
function, or namespace. See 17.4.3.1.2 2nd bullet .

llewelly

May 20, 2004, 4:08:50 PM
Maciej Sobczak <no....@no.spam.com> writes:
[snip]

> (Of course, doing binary I/O makes sense only on machines where we know
> enough about the data types to expect reasonable results.
[snip]

What you need is a rigorously documented data format. It can be
binary or text so long as it is well-documented. 'we know enough
about the data types' is a subtly dangerous assumption; change
your compiler flags, or compiler version, and what you 'know'
becomes subtly wrong. Hire somebody new, and maybe they don't
'know'.

But if you have rigorously documented the meaning of each byte in the
format, you can always write (new) code to read the existing data, and
new people can always read the documentation.

> On machines
> where this is not the case - different sizes of pointer types, etc - I
> would just not dare to try this kind tricks.)

Every compiler I've ever used has at least a few flags which change
the layout of some objects.

Matthias Hofmann

May 20, 2004, 9:50:31 PM
Maciej Sobczak <no....@no.spam.com> wrote in message:
c8hus5$alr$1...@atlantis.news.tpi.pl...

>
> As far as I understand the prose in 5.2.9/10, the pair:
>
> static_cast<char*>(static_cast<void*>(p))
>
> does not make the game any more defined than the direct reinterpret_cast.
> There is a statement (in 5.2.9/10) that:
>
> "A value of type pointer to object converted to "pointer to cv void" and
> back to the ORIGINAL POINTER TYPE will have its original value."
>
> The emphasis is mine and means that the pair of casts above is not
> covered, because it ends up with different pointer type than the
> original. The same level of mystery as with reinterpret_cast.
>
> I am pretty close to the conclusion that:
>
> 1. a pair of static_casts from int* to void* to char* is not well-defined
>
> 2. reinterpret_cast from int* to char* is not well-defined
>
> 3. Compilers are free to set up things so that (1) and (2) give
> different results, but from the lack of any reason to do so, (1) and (2)
> are pretty equivalent; they both give unspecified results, but I expect
> them to be *the same* unspecified results.

According to 3.9.2/4, a void pointer shall be able to hold any object
pointer, and it shall have the same representation and alignment
requirements as a pointer to char. In addition, 4.10/2 says that a pointer
to any object type can be converted to a void pointer and that the result
is a pointer to the start of the storage location where the object resides.

The conversion from pointer to object to pointer to void can be done
implicitly or using a static_cast (5.2.9/4), the conversion to char can be
done by a static_cast (5.2.9/10).

I guess that is the exception for char pointers that James Kanze mentioned
in another post of this thread. I have spent a lot of time looking for the
corresponding evidence from the standard, and this is all I found.

Best regards,

Matthias Hofmann

Matthias Hofmann

May 21, 2004, 5:47:58 AM
James Kanze <ka...@gabi-soft.fr> wrote in message:
86zn83c...@lns-vlq-21-82-255-58-67.adsl.proxad.net...

> "Matthias Hofmann" <hof...@anvil-soft.com> writes:
>
> |> According to 5.2.10 / 7, the result of using a reinterpret_cast to
> |> convert from int* to char* is unspecified.
>
> The intent is specified.
>
> |> That means I can merely guess what's happening, and that makes me
> |> nervous.
>
> It's implementation defined. The implementation documentation must
> specify what is happening. Reading the implementation documentation is
> not the same thing as guessing.

The intent is only specified for casting from pointer to integral type, in
5.2.10/4. 5.2.10/3 does say that the mapping performed by reinterpret_cast
is implementation defined, but why does 5.2.10/7 say it is unspecified?
These two passages contradict each other.

Best regards,

Matthias Hofmann

Matthias Hofmann

May 21, 2004, 5:49:05 AM
----- Original Message -----
From: James Kanze <ka...@gabi-soft.fr>
Newsgroups: comp.lang.c++.moderated
Sent: Thursday, May 20, 2004 3:47 PM
Subject: Re: Using std::ofstream::write() to save data
>
>
> Because copying a signed char could normalize the representation -- e.g.
> convert all 0's (positive or negative) to positive 0's.
>
> This is not allowed in an unsigned char. And I'm not sure, but I don't
> think that C++ allows a change in the bit pattern when copying a char.

You mean, doing a memcpy() with signed chars as the source buffer could lead
to a different bit pattern in the destination buffer? Or are you talking
about assignment?

Section 3.9/2 says that an array of chars or unsigned chars can be used in
order to copy an object forth and back without changing its value, but I
don't know if that's what you mean.

>
> |> > There is a requirement that void* subsume all other types. That
> |> > you can convert a pointer to void* and back to its original type
> |> > without loss of information. Back to its original type only,
> |> > however.
>
> |> > Or to unsigned char* or char* -- it is also guaranteed that you
> |> > can access the underlying raw memory as an unsigned char or char.

Please point me to the relevant section in the standard. I have only found
3.9.2/4, which only mentions char*, but not unsigned char*.

>
> |> Why not to signed char?
>
> See above. On an 8 bit ones complement machine, how would you
> distinguish between the bit patterns 0xFF and 0x00?
>

I can see the problem in interpreting the bit pattern, but I still do not
understand why I should not be able to get a pointer to it. Could you give a
brief code example that demonstrates the potential problem on certain CPUs?

>
> |> Then a char pointer is actually the best choice if I want to point
> |> to raw memory, e.g. in order to write it to a file?
>
> I tend to favor unsigned char, but mainly because of tradition, I think.

3.9.2/4 says that a pointer to void shall have the same representation and
alignment requirements as a pointer to char. However, it does not give such
guarantee for unsigned (or even signed char) pointers. Therefore, I'd favor
plain char.

> In practice, you never really want to write raw memory to a file, of
> course. You want to format a buffer, and write it to the file. Given
> the C++ interface in iostream, if I were writing it through iostream, I
> would probably use char for the buffer. (In practice, every time I've
> wanted to do something like this, I've ended up using a lower level
> protocol -- the writes had to be atomic, or synchronized, or something
> else that the iostream/filebuf pair doesn't support.)

Please forgive my asking such stupid questions, but what kind of buffer
formatting can be necessary in order to write some integers to a file?
I got binary data in memory, so I want binary data in my file. Using an
ofstream object in binary mode should do the job, shouldn't it?

Best regards,

Matthias Hofmann

Matthias Hofmann

May 21, 2004, 5:55:07 AM
Maciej Sobczak <no....@no.spam.com> wrote in message
c8ieai$7l2$1...@nemesis.news.tpi.pl...

> Hi,
>
> As a continuation to my previous post, it may be a good thing to point
> out that probably the only portable way to do binary output of objects
> of non-char POD types (portable in the sense that the code is kosher C++
> on every implementation, not in the sense that the resulting file will
> be portable) is to use memcpy and the helper char buffer:
>
> int i;
> char buf[sizeof(int)];
> memcpy(buf, &i, sizeof(int));
>
> stream.write(buf, sizeof(int));
>
> The example with memcpy is even included in the standard (3.9/2).
> I add the write to stream in the context of this thread.

This seems to be a good idea. And you are actually pointing out my
intention, which is to produce portable code, not portable files. However,
according to my interpretation of the standard, I might as well use the
following functions, which should be equally portable:

void write( std::ofstream& ofs, const void* buffer, int count )
{
    ofs.write( static_cast<const char*>( buffer ), count );
}

void read( std::ifstream& ifs, void* buffer, int count )
{
    ifs.read( static_cast<char*>( buffer ), count );
}
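
Used, for example, like this (only a sketch; the file name is made up and
the streams are assumed to be opened in binary mode):

#include <fstream>

void example()
{
    int values[4] = { 1, 2, 3, 4 };

    std::ofstream out( "data.bin", std::ios::binary );
    write( out, values, sizeof values );    // dump the raw bytes

    int restored[4];
    std::ifstream in( "data.bin", std::ios::binary );
    read( in, restored, sizeof restored );  // read them back unchanged
}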

Best regards,

Matthias

Matthias Hofmann

unread,
May 21, 2004, 5:57:35 AM5/21/04
to
Dave Moore <dtm...@rijnh.nl> wrote in message
306d400f.04051...@posting.google.com...
>
> [As an aside, you never said specifically that you were trying to do
> binary output .. I suspected you might be (which is why I commented on
> it) but it wasn't clear, especially when you started using operator
> >>, which is for formatted-text output only.]

If I open the file in binary mode, shouldn't operator<< write unformatted
data?

Regards,

Matthias

Matthias Hofmann

unread,
May 21, 2004, 5:58:31 AM5/21/04
to

Dietmar Kuehl <dietma...@yahoo.com> wrote in message
5b15f8fd.04051...@posting.google.com...

> "Matthias Hofmann" <hof...@anvil-soft.com> wrote:
> > If it is so much more difficult (and apparently impossible) to simply
> > write some memory to a file in binary format, then I should maybe use
> > my C functions...
>
> Well, this will just provide a simple-to-use interface for reading and
> writing data in an unknown format: any change of the compiler, the
> library, the underlying platform, etc. will break your binary data. IMO
> your C code writing binary data with 'fwrite()' and reading it back in
> with 'fread()' is already broken. Even worse, it is silently broken and
> you will only realize that your old data is broken when it is too late
> (as I have seen in real life in a project: people had much fun repeating
> the work of the last three months because the backups were broken...).

I think it is important to distinguish between broken/unportable code and
broken/unportable files. Maybe I should have made clear from the beginning
that I am interested in portable code, but not so much in portable files.
Therefore, I took pains to find out whether all these pointer conversions
have the same result on all standard compliant compilers (which is a pointer
to the first byte/char of whatever object I want to write to a file).

> You can read and write binary files in C++, of course. You just have to
> understand that it is just another form of formatting data into a well
> known format which is very different from reading or writing structures.
> For example, in the project I'm working for we use a binary formatting
> system which is based, although not in code but in design and intent, on
> a binary stream system I wrote a long time ago: see
> <http://www.dietmar-kuehl.de/cxxrt/binio.tar.gz>. This provides formatting
> functions for built-in types and you would create formatting functions for
> user defined types using these very similar to the text formatted stream
> operators.

I guess the trick about producing portable files is to write an integer not
in its binary representation, but as ASCII text, actually converting it
to a string, where each value is separated by a space or return character.
However, this is not what I am interested in, although I will remember this
just in case I need to create portable files one day.
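
Just to sketch that idea (my own illustration, nothing more): writing the
values as decimal text with a separator, and parsing them back with
operator>>, avoids the representation issues entirely:

#include <fstream>

void save_as_text( std::ofstream& s, const long* data, int count )
{
    // Each value becomes a decimal string followed by a space, so the
    // file no longer depends on the machine's internal integer layout.
    for ( int i = 0; i < count; ++i )
        s << data[i] << ' ';
}

void load_from_text( std::ifstream& s, long* data, int count )
{
    for ( int i = 0; i < count; ++i )
        s >> data[i];   // operator>> parses the decimal text again
}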

> If you look closer at the picture formats, you will notice that these
> fall clearly into the category of formatted binary data. In fact, the
> various picture *formats* differ in, well, their formats :-) That is, you
> would read and write pictures the same way as you would write any other
> binary data: you format it into a sequence of bytes (aka 'char's) and
> send them to an appropriate write function. Later you read bytes and
> convert them back into the data in your programs.

I think I understand what you mean. For example, a 32 bit integer could be
stored in big endian or little endian format. If you write a big endian
integer to a file and read it in on a little endian machine, you are in
trouble, unless you know that you must reverse the byte order.

So by formatting the binary data you mean that each byte in the raw memory
must be in the order required by the file format. This is not a problem in
my case, as the project I am working on is intended to run on x86 and
compatible machines only, and files will rarely be interchanged with each
other.

Best regards,

Matthias

[ See http://www.gotw.ca/resources/clcm.htm for info about ]

Maciej Sobczak

unread,
May 21, 2004, 3:13:08 PM5/21/04
to
Hi,

Matthias Hofmann wrote:

> According to 4.10/2, a pointer to any object type can be converted to a void
> pointer and that the result is a pointer to the start of the storage
> location where the object resides. This is confirmed by 3.9.2/4, which
> furthermore says that a pointer to void shall have the same representation
> and alignment requirements as a pointer to char.

I think that this is the most important paragraph here.
However, there is still no requirement that the pair of conversions T*
-> void* -> char* gives predictable and well-defined results.
We can try to deduce this requirement, but the loose collection of
paragraphs still misses something. There are statements scattered around
saying that the results are well-defined if the conversions are to void*
and back to the *original* type. Why such disclaimers? To indicate that
other conversions aren't well-defined? Such disclaimers would not be
needed if the rule were applicable to other conversions as well.
That's what I understand.

I'm aware of the fact that most of this discussion is just
hair-splitting, because I simply cannot imagine a platform where a pair
of static_casts or a direct reinterpret_cast does not give what the
programmer expects, but I'm still not satisfied with the specification
in the standard.

I would like to see a specific statement like:

"The result of converting a pointer to cv-void to a pointer to cv-char
points to the beginning of the storage location and can be used to read
the consecutive bytes of this storage."

Or even better:

"The result of converting a pointer to cv T to a pointer to cv void and
then to a pointer to cv char points to the beginning of the storage
location occupied by the T object, as if the original object and an
imaginary array of chars of appropriate size were declared in the same
union:

union
{
    T obj;
    char buf[sizeof(T)];
};

(the cv-qualification should be applied respectively)

The converted pointer can be used to read the consecutive bytes of
memory occupied by the T object.
"

This is the missing link I would like to see explicitly stated in the
standard.

ka...@gabi-soft.fr

unread,
May 21, 2004, 6:50:03 PM5/21/04
to
Maciej Sobczak <no....@no.spam.com> wrote in message
news:<c8idiq$oqb$1...@atlantis.news.tpi.pl>...

> James Kanze wrote:

> > "Matthias Hofmann" <hof...@anvil-soft.com> writes:

> > |> That looks like is does compile, but it doesn't seem to be any
> > |> more portable than using a reinterpret_cast. I thought that
> > |> performing a static_cast on a void pointer is undefined unless
> > |> it is cast back to the type that it used to be?

> > There's a special exception for character types.

> Would you please provide a specific paragraph in the standard that makes
> this exception?

It's spread out all over the place:-):

§1.8/5:
Unless it is a bit-field, a most derived object shall have a
non-zero size and shall occupy one or more bytes of storage.

[Raw memory can be considered an array of "bytes".]

§4.10/2:
An rvalue of type "pointer to cv T," where T is an object type, can
be converted to an rvalue of type "pointer to cv void." The result
of converting a "pointer to cv T" to a "pointer to cv void" points
to the start of the storage location where the object of type T
resides, as if the object is a most derived object of type T (that
is, not a base class subobject).

[Casting a pointer to T to a void* results in a pointer to the first
byte of this array of bytes.]

§5.3.3/1:
The sizeof operator yields the number of bytes in the object
representation of its operand. [...] sizeof(char), sizeof(signed
char) and sizeof(unsigned char) are 1; [...].

[Character types are bytes.]

§3.9.1/1:
For character types, all bits of the object representation
participate in the value representation. For unsigned character
types, all possible bit patterns of the value representation
represent numbers.

[Really -- they cannot contain padding, and in the case of unsigned
char, they must behave exactly as raw memory.]

Finally, a bit of deduction: the first two points mean that the result
of casting to void* must be a pointer to the same address as the address
of the array of bytes containing the object. An unsigned char* (or any
other character type) pointing to this array would result in the same
address. So casting it back to unsigned char* is legal, and has defined
behavior.
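
As a quick sketch of what this buys in practice (my example, nothing more):

#include <cstddef>
#include <cstdio>

void dump_bytes( const void* p, std::size_t n )
{
    // The void* designates the first byte of the object's storage;
    // unsigned char* lets us walk that storage byte by byte.
    const unsigned char* bytes = static_cast<const unsigned char*>( p );
    for ( std::size_t i = 0; i < n; ++i )
        std::printf( "%02x ", static_cast<unsigned>( bytes[i] ) );
    std::printf( "\n" );
}

// e.g.: int i = 42; dump_bytes( &i, sizeof i );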

> There is something that gives a similar licence, but it is stated in
> the context of memory that outlives the object that is going to live
> in it (or that was in it): 3.8/5. Does it apply also to "regular"
> objects? Why?

That's not the key. The general guarantee is inherited from C (where it
is also spread out over a number of different sections.) The guarantee
in §3.8/5 is actually an explanation of what the more general guarantee
means with regards to objects with non-trivial constructors and
destructors.

> Interestingly, even within that paragraph the result of such
> conversion is only not-undefined. That's still far from being
> well-defined as I would like it to be.

> > Otherwise, how would you implement things like memcpy?

> Are *we* (the language users) supposed to implement them?

You're not supposed to implement memcpy, but you are supposed to be able
to.

--
James Kanze GABI Software

Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung

9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

ka...@gabi-soft.fr

unread,
May 21, 2004, 6:53:39 PM5/21/04
to
"Matthias Hofmann" <hof...@anvil-soft.com> wrote in message
news:<c8iom7$e0h$1...@news1.nefonline.de>...

That's pretty much it. You also need the fact that sizeof(char) is
guaranteed to be one byte, and that all bits in a char participate in
the value representation. The sum of it is that the "raw memory" of an
object is an array of bytes, that a void* obtained from converting a
pointer to the object cannot in any way be distinguished from a void*
that would be obtained by converting a pointer to this raw memory, and
that you can effectively use unsigned char* (and in some contexts,
possibly char* or signed char*) as the type of this raw memory.

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

[ See http://www.gotw.ca/resources/clcm.htm for info about ]

Matthias Hofmann

unread,
May 21, 2004, 8:32:47 PM5/21/04
to
llewelly <llewe...@xmission.dot.com> wrote in message
861xlff...@Zorthluthik.local.bar...

> "Matthias Hofmann" <hof...@anvil-soft.com> writes:
>
> All global scope names that begin with an underscore are reserved for
> the implementation. This doesn't apply to names inside a class,
> function, or namespace. See 17.4.3.1.2 2nd bullet .

Does this also apply to macro names? I have so far used the following kinds
of inclusion guards:

#ifndef _MYFILE_H_
#define _MYFILE_H_

// Declare what I need.

#endif // _MYFILE_H_

Now I wonder, is this a violation of the standard?

Best regards,

Matthias

Ulrich Eckhardt

unread,
May 21, 2004, 8:40:05 PM5/21/04
to
Matthias Hofmann wrote:
> If I open the file in binary mode, shouldn't operator<< write unformatted
> data?

That's answered in the documentation/textbook. You might already have
guessed it, the answer is no. It only suppresses conversion of line endings.

Uli

--
FAQ: http://parashift.com/c++-faq-lite/

/* bittersweet C++ */
default: break;

llewelly

unread,
May 22, 2004, 5:39:44 AM5/22/04
to
"Matthias Hofmann" <hof...@anvil-soft.com> writes:

> llewelly <llewe...@xmission.dot.com> schrieb in im Newsbeitrag:
> 861xlff...@Zorthluthik.local.bar...
>> "Matthias Hofmann" <hof...@anvil-soft.com> writes:
>>
>> All global scope names that begin with an underscore are reserved for
>> the implementation. This doesn't apply to names inside a class,
>> function, or namespace. See 17.4.3.1.2 2nd bullet .
>
> Does this also apply to macro names? I have so far used the following kinds
> of inclusion guards:
>
> #ifndef _MYFILE_H_
> #define _MYFILE_H_
>
> // Declare what I need.
>
> #endif // _MYFILE_H_
>
> No I wonder, is this a violation of the standard?

I believe it is. Someone once presented a counter argument, but I have
forgotten what it was.

James Kanze

unread,
May 22, 2004, 4:03:29 PM5/22/04
to
"Matthias Hofmann" <hof...@anvil-soft.com> writes:

|> Dietmar Kuehl <dietma...@yahoo.com> schrieb in im Newsbeitrag:
|> 5b15f8fd.04051...@posting.google.com...
|> > "Matthias Hofmann" <hof...@anvil-soft.com> wrote:
|> > > If it is so much more difficult (and apparently impossible) to
|> > > simply write some memory to a file in binary format, then I
|> > > should maybe use my C functions...

|> > Well, this will just provide a simple to use interface for
|> > reading and writing data in an unknown format: any change of the
|> > compiler, the library, the underlying platform, etc. will break
|> > your binary data. IMO your C code writing binary data with
|> > 'fwrite()' and reading it back in with 'fread()' is already
|> > broken. Even worse, it silently broken and you will only realize
|> > that your old data is broken when it is too late (as I have seen
|> > in real live in a project: people had much fun repeating the work
|> > of the last three month because the backups were broken...).

|> I think it is important to distinguish between broken/unportable
|> code and broken/unportable files. Maybe I should have made clear
|> from the beginning that I am interested in portable code, but not so
|> much in portable files. Therefore, I took pains to find out wether
|> all these pointer conversions have the same result on all standard
|> compliant compilers (which is a pointer to the first byte/char of
|> whatever object I want to write to a file).

I think we understood your question, but it is so common for people not to
understand the limitations of just dumping a memory image to disk that the
point seemed worth making anyway. As others have already said, you cannot
even be sure of being able to read it if you recompile the same program
with the same compiler, but different options.

As far as I am concerned, memory dumps are acceptable for temporary
files, which will be deleted on program termination, but not for much
else.

|> > You can read and write binary files in C++, of course. You just
|> > have to understand that it is just another form of formatting
|> > data into a well known format which is very different from
|> > reading or writing structures. For example, in the project I'm
|> > working for we use a binary formatting system which is based,
|> > although not in code but in design and intent, on a binary stream
|> > system I wrote a long time ago: see
|> > <http://www.dietmar-kuehl.de/cxxrt/binio.tar.gz>. This provides
|> > formatting functions for built-in types and you would create
|> > formatting functions for user defined types using these very
|> > similar to the text formatted stream operators.

|> I guess the trick about producing portable files is to write an
|> integer not in its binary representation, but as an ASCII text,
|> actually converting it to a string, where each value is separated by
|> a space or return character. However, this is not what I am
|> interested in, although I will remember this just in case I need to
|> create portable files one day.

The trick about producing portable files is to write something well
defined, so you can write code to read it at some future date. Whenever
reasonable, I prefer text, because it makes debugging easier -- a simple
glance at the file will show me if I have written what I thought I did,
I can create files to test input with the editor, etc. But it is quite
possible to write portable files in binary format as well.

|> > If you look closer at the picture formats, you will noticed that
|> > these fall clearly into the category of formatted binary data. In
|> > fact, the various picture *formats* differ in, well, their
|> > formats :-) That is, you would read and write pictures the same
|> > way as you would write any other binary data: you format it into
|> > a sequence of bytes (aka 'char's) and sent them to an appropriate
|> > write function. Later you read bytes and convert them back into
|> > the data in your programs.

|> I think I understand what you mean. For example, a 32 bit integer
|> could be stored in big endian or little endian format.

Or 2's complement, or 1's complement or signed magnitude. For a long
time, I favored signed magnitude, because it is the easiest to program
portably. The Internet uses 2's complement, and given that most
machines today, and all new architectures, use 2's complement, I now use
a 2's complement representation. And code which will only work on
machines using 2's complement representation.

|> If you write a big endian integer to a file and read it in on a
|> little endian machine, you are in trouble, unless you know that you
|> must reverse the byte order.

Or unless you read and write it correctly. There is no need to know the
byte order on any machine to do this.

|> So by formatting the binary data you mean, that each byte in the raw
|> memory must be in the order required by the file format.

Order and representation.

|> This is not a problem in my case, as the project I am working at is
|> intented to be run on x86 and compatible machines only, and files
|> will rarely be interchanged with each other.

Rarely doesn't necessarily mean never. And I've seen at least two
different formats for longs on x86 -- from different versions of the
same compiler.

Note that as soon as you try to use something larger than an int, it
becomes even more complicated. Different compilers, or even different
options with the same compiler, introduce more or less padding at
different places in structs. The result is that you cannot necessarily
reread the data with a program compiled using different compiler flags.

--
James Kanze


Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung

9 place Sémard, 78210 St.-Cyr-l'École, France +33 (0)1 30 23 00 34

James Kanze

unread,
May 22, 2004, 4:04:03 PM5/22/04
to
"Matthias Hofmann" <hof...@anvil-soft.com> writes:

|> llewelly <llewe...@xmission.dot.com> schrieb in im Newsbeitrag:
|> 861xlff...@Zorthluthik.local.bar...
|> > "Matthias Hofmann" <hof...@anvil-soft.com> writes:

|> > All global scope names that begin with an underscore are reserved
|> > for the implementation. This doesn't apply to names inside a
|> > class, function, or namespace. See 17.4.3.1.2 2nd bullet .

|> Does this also apply to macro names? I have so far used the
|> following kinds of inclusion guards:

|> #ifndef _MYFILE_H_
|> #define _MYFILE_H_

All names beginning with an _ followed by a capital letter are reserved
for the implementation.

|> // Declare what I need.

|> #endif // _MYFILE_H_

|> No I wonder, is this a violation of the standard?

Definitely.
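
A guard which stays outside the reserved names could look like this, for
example (the macro name itself is arbitrary):

#ifndef MYFILE_H_INCLUDED
#define MYFILE_H_INCLUDED

// Declarations...

#endif // MYFILE_H_INCLUDED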

--
James Kanze
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France +33 (0)1 30 23 00 34

[ See http://www.gotw.ca/resources/clcm.htm for info about ]

James Kanze

unread,
May 22, 2004, 4:05:07 PM5/22/04
to
"Matthias Hofmann" <hof...@anvil-soft.com> writes:

|> Dave Moore <dtm...@rijnh.nl> schrieb in im Newsbeitrag:
|> 306d400f.04051...@posting.google.com...

|> > [As an aside, you never said specifically that you were trying to
|> > do binary output .. I suspected you might be (which is why I
|> > commented on it) but it wasn't clear, especially when you started
|> > using operator >>, which is for formatted-text output only.]

|> If I open the file in binary mode, shouldn't operator<< write
|> unformatted data?

No. Binary mode is simply passed on to the filebuf, and controls the
way the data is written to or read from the disk. It has nothing to do
with what those data are. If you use >>, you tell ostream to format
whatever into text format, and write the resulting text to the
streambuf.
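
A small sketch of the difference (the file name is invented): even with
std::ios::binary, operator<< still produces text; only write() passes the
bytes through unchanged:

#include <fstream>

void demo()
{
    std::ofstream s( "demo.bin", std::ios::binary );
    int i = 258;

    s << i;   // writes the characters '2', '5', '8'

    // writes the sizeof(int) raw bytes of i's internal representation
    s.write( static_cast<const char*>( static_cast<const void*>( &i ) ),
             sizeof i );
}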

Strangely enough, binary mode does not suppress code translation by the
imbued locale either. All it does is affect how the system represents
record separators (new lines in text format, nonexistent in binary) and
end of file.

--
James Kanze
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France +33 (0)1 30 23 00 34

[ See http://www.gotw.ca/resources/clcm.htm for info about ]

James Kanze

unread,
May 22, 2004, 4:07:15 PM5/22/04
to
Samuel Krempp <kre...@crans.ens-cachan.fr> writes:

|> le Thursday 20 May 2004 04:32, \"reverse email address\""
|> <ed....@nnamhcab.ppilihp écrivit :

|> >> If it is so much more difficult (and apparently impossible) to
|> >> simply write some memory to a file in binary format, then I
|> >> should maybe use my C functions...

|> >> Please tell me what the generally preferred method for writing
|> >> binary data to a file is in C++, as I can't believe that it
|> >> doesn't work! What if you read a picture file or audio data from
|> >> a disk? You are not saying that you can't write code for that in
|> >> C++, are you?

|> > You can establish a second stream hierarchy for binary output, if
|> > you want to keep the usual C++ idiom for I/O. There are some of
|> > them already available, Take a look e.g. to Dietmar Kuehl's
|> > "XDRStream", now part of the "Berlin" project. As far as I know
|> > RogueWave has similar streams within their "Tools.h++" product,
|> > too.

|> That's an easy way to do portable binary I/O.

|> But to make a clear answer to matthias's question and sum up other
|> messages, C++ streams can do binary I/O just as well as C : to write
|> or read chunks of bytes, there are the unformatted I/O functions
|> write / read of streams (or at streambuf level the sputn / sgetn
|> functions).

Just to be clear: there is no such thing as "unformatted data". All
data has a format. The only question is whether you know the format or
not.

Output routines may or may not format. In standard C++, the <<
operators format, using a text format. ostream::write and company do
not format. The fact that the output routine doesn't format doesn't
mean that the data isn't formatted; a more accurate characterization
would be that ostream::write and company do not change the format they
are given.

Since an ostream writes char's, it is normal that it take a buffer of
char as a parameter. It is also normal that the user "format" his data
into a buffer of char.

In this context, reinterpret_cast< char* > can be thought of as a
somewhat degenerate formatting routine. Using this as a formatting
routine, you output data in the internal format of the machine.
Whatever that happens to be, according to compiler options, etc.

--
James Kanze
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France +33 (0)1 30 23 00 34

[ See http://www.gotw.ca/resources/clcm.htm for info about ]

Matthias Hofmann

unread,
May 23, 2004, 7:48:35 AM5/23/04
to
<ka...@gabi-soft.fr> wrote in message
d6652001.04052...@posting.google.com...

>
> That's pretty much it. You also need the fact that sizeof(char) is
> guaranteed to be one byte, and that all bits in a char participate in
> the value representation. The sum of it is that the "raw memory" of an
> object is an array of bytes, that a void* obtained from converting a
> pointer to the object cannot in any way be distinguished from a void*
> that would be obtained by converting a pointer to this raw memory, and
> that you can effectively use unsigned char* (and in some contexts,
> possibly char* or signed char*) as the type of this raw memory.

I am still confused about the distinction between a pointer to char and a
pointer to unsigned char, especially as 3.9.2/4 only guarantees a char* to
have the same representation as a void*.

In another post of this thread, you gave an explanation of 3.9.1/1 with
respect to the difference between char* and unsigned char*. However, I don't
understand why, in the context of our discussion, it is important whether
all possible bit patterns of an unsigned char represent numbers, as long as
all char types share the same object representation (which is guaranteed by
3.9.1/1 also).

So if I use a char*, I might have problems with bit patterns; if I use an
unsigned char*, the pointer representation might differ from a void*, and a
signed char* seems to be an unpopular thing anyway - now what's a poor
programmer to do?

Best regards,

Matthias Hofmann

Matthias Hofmann

unread,
May 23, 2004, 7:49:20 AM5/23/04
to
llewelly <llewe...@xmission.dot.com> wrote in message
86lljl4...@Zorthluthik.local.bar...

> "Matthias Hofmann" <hof...@anvil-soft.com> writes:
>
> > llewelly <llewe...@xmission.dot.com> schrieb in im Newsbeitrag:
> > 861xlff...@Zorthluthik.local.bar...
> >> "Matthias Hofmann" <hof...@anvil-soft.com> writes:
> >>
> >> All global scope names that begin with an underscore are reserved for
> >> the implementation. This doesn't apply to names inside a class,
> >> function, or namespace. See 17.4.3.1.2 2nd bullet .
> >
> > Does this also apply to macro names? I have so far used the following
> > kinds of inclusion guards:
> >
> > #ifndef _MYFILE_H_
> > #define _MYFILE_H_
> >
> > // Declare what I need.
> >
> > #endif // _MYFILE_H_
> >
> > No I wonder, is this a violation of the standard?
>
> I believe it is. Someone once presented a counter argument, but I
> forget the counter argument.

A counter argument might be that there is a special section for macro names
(17.4.3.1.1), which makes no mention of underscores. Section 17.4.3.1.2, on
the other hand, only talks about "names", so it depends on whether a macro
definition is a name.

Regards,

Matthias

P.J. Plauger

unread,
May 23, 2004, 7:58:17 AM5/23/04
to
"James Kanze" <ka...@gabi-soft.fr> wrote in message
news:8665ao4...@lns-vlq-35-82-254-142-237.adsl.proxad.net...

> Strangely enough, binary mode does not suppress code translation by the
> imbued locale either. All it does is affect how the system represents
> record separators (new lines in text format, inexistant in binary) and
> end of file.

Not strange at all. Some encodings *must* be read and written in binary
mode or suffer potential corruption. UTF32BE springs to mind. It can
have embedded bytes that look like newline, nul, ctl-Z, and all sorts
of other things that don't pass transparently to/from a text stream.
Other encodings, such as UTF-8, are designed to survive transmission
as a text file *or* a binary file quite nicely.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com

Matthias Hofmann

unread,
May 23, 2004, 9:23:09 PM5/23/04
to
James Kanze <ka...@gabi-soft.fr> wrote in message
86ekpc4...@lns-vlq-35-82-254-142-237.adsl.proxad.net...

>
> |> If you write a big endian integer to a file and read it in on a
> |> little endian machine, you are in trouble, unless you know that you
> |> must reverse the byte order.
>
> Or unless you read and write it correctly. There is no need to know the
> byte order on any machine to do this.

I guess this boils down to using bit shifts and bit masks to get each byte
of a 32 bit integer (as an example) and write it individually. Reading
occurs in an analogous fashion. If not only the byte order but also the
representation plays a role, you probably even have to shift individual bits
around.

That sounds as if a library for portably writing binary data would look
somewhat like

WriteAs_32BitBigEndian2sComplement( std::ofstream& s, long x );
WriteAs_16BitBigEndian2sComplement( std::ofstream& s, short x );

Inside these functions, the bits are set up in a buffer which is then
written to the stream. I guess you also have to consult std::numeric_limits
in order to verify that a long has at least 32 bits and so on.
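
Roughly along these lines, I suppose (only a sketch; it assumes the value
fits in 32 bits, and it uses the conversion to unsigned long to obtain the
two's complement bit pattern without ever looking at the host's byte order):

#include <fstream>

void WriteAs_32BitBigEndian2sComplement( std::ofstream& s, long x )
{
    // Conversion to unsigned long yields the value modulo 2^N, i.e. the
    // two's complement bit pattern, whatever the internal representation.
    unsigned long u = static_cast<unsigned long>( x );
    unsigned char buf[4];

    buf[0] = static_cast<unsigned char>( ( u >> 24 ) & 0xFF );
    buf[1] = static_cast<unsigned char>( ( u >> 16 ) & 0xFF );
    buf[2] = static_cast<unsigned char>( ( u >> 8 ) & 0xFF );
    buf[3] = static_cast<unsigned char>( u & 0xFF );

    s.write( static_cast<const char*>( static_cast<const void*>( buf ) ), 4 );
}

long ReadAs_32BitBigEndian2sComplement( std::ifstream& s )
{
    unsigned char buf[4];
    s.read( static_cast<char*>( static_cast<void*>( buf ) ), 4 );

    unsigned long u = ( static_cast<unsigned long>( buf[0] ) << 24 )
                    | ( static_cast<unsigned long>( buf[1] ) << 16 )
                    | ( static_cast<unsigned long>( buf[2] ) << 8 )
                    |   static_cast<unsigned long>( buf[3] );

    // On a two's complement machine this reproduces the original value;
    // elsewhere the conversion back to long is implementation defined.
    return static_cast<long>( u );
}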

> Note that as soon as you try to use something larger than an int, it
> becomes even more complicated. Different compilers, or even different
> options with the same compiler, introduce more or less padding at
> different places in structs. The result is that you cannot necessarily
> reread the data with a program compiled using different compiler flags.

I followed some advice I once got from a textbook, which is to save each
member of a struct (or class) separately. This should avoid the problem.
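
For example, with the hypothetical writers sketched above, saving a struct
member by member might look like this (the struct itself is just made up):

struct Record
{
    long id;
    long offset;
    long length;
};

void SaveRecord( std::ofstream& s, const Record& r )
{
    // Each member is written individually, so any padding the compiler
    // inserts between the members never reaches the file.
    WriteAs_32BitBigEndian2sComplement( s, r.id );
    WriteAs_32BitBigEndian2sComplement( s, r.offset );
    WriteAs_32BitBigEndian2sComplement( s, r.length );
}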

Best regards,

Matthias Hofmann

Matthias Hofmann

unread,
May 23, 2004, 9:23:42 PM5/23/04
to
James Kanze <ka...@gabi-soft.fr> wrote in message
861xlc4...@lns-vlq-35-82-254-142-237.adsl.proxad.net...

>
> Since an ostream writes char's, it is normal that it take a buffer of
> char as a parameter. It is also normal that the user "format" his data
> into a buffer of char.

I wonder if ostreams were ever intended to write binary data. It seems like
operator<< was intended to convert into text, while write() is supposed to
accept a number that has already been converted to text by the user.
However, the interface does not seem to offer a method for binary data. Such
a function should, in my opinion, accept a void pointer.

> In this context, reinterpret_cast< char* > can be thought of as a
> somewhat degenerate formatting routine. Using this as a formatting
> routine, you output data in the internal format of the machine.
> Whatever that happens to be, according to compiler options, etc.

I think I have found an adequate solution for my char*/void* problem, which
is the following pair of convenience functions:

// Version for non-constant pointers.
inline char* char_ptr( void* p )
{ return static_cast<char*>( p ); }

// Version for constant pointers.
inline const char* char_ptr( const void* p )
{ return static_cast<const char*>( p ); }

They allow me to use them like a casting operator, as in this example:

void f( std::ofstream& s )
{
    int i;
    ...
    s.write( char_ptr( &i ), sizeof i );
}

Best regards,

Matthias Hofmann

Matthias Hofmann

unread,
May 23, 2004, 9:24:13 PM5/23/04
to
James Kanze <ka...@gabi-soft.fr> wrote in message
86ad004...@lns-vlq-35-82-254-142-237.adsl.proxad.net...

> "Matthias Hofmann" <hof...@anvil-soft.com> writes:
>
> |> llewelly <llewe...@xmission.dot.com> schrieb in im Newsbeitrag:
> |> 861xlff...@Zorthluthik.local.bar...
> |> > "Matthias Hofmann" <hof...@anvil-soft.com> writes:
>
> |> > All global scope names that begin with an underscore are reserved
> |> > for the implementation. This doesn't apply to names inside a
> |> > class, function, or namespace. See 17.4.3.1.2 2nd bullet .
>
> |> Does this also apply to macro names? I have so far used the
> |> following kinds of inclusion guards:
>
> |> #ifndef _MYFILE_H_
> |> #define _MYFILE_H_
>
> All names beginning with an _ followed by a capital letter are reserved
> for the implementation.
>
> |> // Declare what I need.
>
> |> #endif // _MYFILE_H_
>
> |> No I wonder, is this a violation of the standard?
>
> Definitely.

The way I understand 17.4.3.1.2, it is not even allowed to have member
functions or data members start with an underscore. For example, I saw many
people write code like this:

class X
{
    int _i;

public:
    X( int i ) : _i( i ) {}
};

or

class X
{
public:
    X* Clone() { return _Clone(); }

private:
    virtual X* _Clone() = 0;
};

The caption above the relevant section is titled "Global Names", but the
text otherwise does not specify any exceptions. Now is my observation
correct that there are no exceptions for class scope? Or can you point me to
a section of the standard that defines any exceptions?

Best regards,

Matthias Hofmann

James Kanze

unread,
May 23, 2004, 9:40:11 PM5/23/04
to
"Matthias Hofmann" <hof...@anvil-soft.com> writes:

|> > Because copying a signed char could normalize the representation
|> > -- e.g. convert all 0's (positive or negative) to positive 0's.

|> > This is not allowed in an unsigned char. And I'm not sure, but I
|> > don't think that C++ allows a change in the bit pattern when
|> > copying a char.

|> You mean, doing a memcpy() with signed chars as the source buffer
|> could lead to a different bit pattern in the destination buffer? Or
|> are you talking about assignment?

No. I mean that if you implement memcpy by casting the void* to signed
char*, you might get unexpected results.

Traditionally, I've always understood that the only legal type
guaranteed to give total bitwise equivalence when copying was unsigned
char.
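
For what it's worth, an implementation that sticks to unsigned char looks
roughly like this (a sketch, not any particular library's code):

#include <cstddef>

void* my_memcpy( void* dest, const void* src, std::size_t n )
{
    // unsigned char is the one type guaranteed to reproduce every bit
    // pattern faithfully when copied.
    unsigned char* d = static_cast<unsigned char*>( dest );
    const unsigned char* s = static_cast<const unsigned char*>( src );
    while ( n-- )
        *d++ = *s++;
    return dest;
}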

|> Section 3.9/2 says that an array of chars or unsigned chars can be
|> used in order to copy an object forth and back without changing its
|> value, but I don't know if that's what you mean.

That would be it. That basically means that if signed char does do some
adjustment, then plain char must be unsigned.

In practice, I suspect that even on platforms where signed char has two
distinct bit patterns for 0 (+0 and -0), there are basic instructions
for just copying, and that the compiler will use these when just
copying, even if the type is signed. I'm not sure, though. I tried to
find information on the Burroughs 5000 and its successors (including the
Univac A series, which only went out of production a couple of years
ago) on the net Friday, but without avail. From what little I know of
the architecture, it has no direct support for unsigned. (I was
actually looking with regards to the thread as to whether
UINT_MAX==INT_MAX is legal. From what little I know of the
architecture, it is quite possible that the only reasonable
implementation of unsigned int on it would be to use a signed int, and
mask off the sign bit.)

|> > |> > There is a requirement that void* subsume all other types.
|> > |> > That you can convert a pointer to void* and back to its
|> > |> > original type without loss of information. Back to its
|> > |> > original type only, however.

|> > |> > Or to unsigned char* or char* -- it is also guaranteed
|> > |> > that you can access the underlying raw memory as an
|> > |> > unsigned char or char.

|> Please point me to the relevant section in the standard. I have only
|> found 3.9.2/4, which only mentions char*, but not unsigned char*.

I suspect that that is an oversight. I also suspect that the general
possibility that signed and unsigned types might have incompatible
pointers didn't occur to the authors. There is a footnote in C90 (to
where it says that signed and unsigned types must use the same amount of
storage and have the same alignment requirements): "The same
representation and alignment requirements are meant to imply
interchangeability as arguments to functions, return values from
functions, and members of unions." As statements go, you can't get much
vaguer or more ambiguous, and of course, footnotes aren't normative
anyway. But I think it gives an idea as to what the authors of the
original C standard had in mind. (And there is a footnote in the C++
standard which implies that the authors of the C++ standard intended to
be compatible with C.)

All I can really say is that for over 15 years now, I've religiously
used unsigned char as the type for raw memory, and that this choice was
based on discussions held with regards to the standardization of C. But
that it has been a long time, and I don't remember all of the details.
Just the conclusion. And that from a practical standpoint, I think that
there are enough restrictions (same alignment, same total size in
memory) to ensure that no reasonable implementation will violate the
expectation that pointers to signed and unsigned types are compatible.

|> > |> Why not to signed char?

|> > See above. On an 8 bit ones complement machine, how would you
|> > distinguish between the bit patterns 0xFF and 0x00?

|> I can see the problem in interpreting the bit pattern, but I still
|> do not understand why I should not be able to get a pointer to
|> it. Could you give a brief code example that demonstrates the
|> potential problem on certain CPUs?

I'm not sure what the question is any more:-). I am pretty sure that
you can get a signed char*. What I'm not sure of is that copying data
through a signed char* is guaranteed to result in a faithful bit image
of the original data.

|> > |> Then a char pointer is actually the best choice if I want to
|> > |> point to raw memory, e.g. in order to write it to a file?

|> > I tend to favor unsigned char, but mainly because of tradition, I
|> > think.

|> 3.9.2/4 says that a pointer to void shall have the same
|> representation and alignment requirements as a pointer to char.
|> However, it does not give such guarantee for unsigned (or even
|> signed char) pointers. Therefore, I'd favor plain char.

The C90 standard (and possibly the C++ standard), and long tradition,
says that compatible signed and unsigned types have the same alignment
requirements and occupy the same space in memory. It's hard to imagine
how char*, unsigned char* and signed char* could be any different. It
is easy to image cases where copying a signed char do not result in the
same bit pattern. Plain char could be either signed or unsigned;
apparently, there is text ensuring that IF signed char does not bitwise
copy exactly, then plain char must be unsigned, but I wasn't sure of
this.

|> > In practice, you never really want to write raw memory to a file,
|> > of course. You want to format a buffer, and write it to the
|> > file. Given the C++ interface in iostream, if I were writing it
|> > through iostream, I would probably use char for the buffer. (In
|> > practice, every time I've wanted to do something like this, I've
|> > ended up using a lower level protocol -- the writes had to be
|> > atomic, or synchronized, or something else that the
|> > iostream/filebuf pair doesn't support.)

|> Please forgive my asking so stupid questions, but what kind of
|> buffer formatting can there be necessary in order to write some
|> integers to a file? I got binary data in memory, so I want binary
|> data in my file. Using an ofstream object in binary mode should do
|> the job, shouldn't it?

The binary data in memory has a format. Copying it to disk results in
this format on the disk.

The problem is that this format was not designed for persistence, and
can easily change depending on the compiler, or even simply the version
or the options.

--
James Kanze
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France +33 (0)1 30 23 00 34

[ See http://www.gotw.ca/resources/clcm.htm for info about ]

Samuel Krempp

unread,
May 23, 2004, 9:46:37 PM5/23/04
to
On Sunday 23 May 2004 13:48, hof...@anvil-soft.com wrote:

> I am still confused about the distinction between a pointer to char* and a
> pointer to unsigned char*, especially as 3.9.2/4 only guarantees a char*
> to have the same respresentation as a void*.
>
> In another post of this threat, you gave an explanation of 3.9.1/1 with
> respect to the difference of char* and unsigned char*. However, I don't
> understand why, in the context of our discussion, it is important wether
> all possible bit patterns of an unsigned char represent numbers, as long
> as all char types share the same object representation

I agree with you, I think it has no importance. As long as you don't do
arithmetic on the pointed values, the way they are associated with numbers
has no impact, you can copy the char values without any loss (because "all
bits of the object representation participate in the value
representation"). Several values might map to the same number, but as long
as you use the values and don't rely on the numbers they represent, you're
fine.

--
Samuel.Krempp
cout << "@" << "crans." << (is_spam ? "trucs.en.trop." : "" )
<< "ens-cachan.fr" << endl;

ka...@gabi-soft.fr

unread,
May 24, 2004, 12:12:56 PM5/24/04
to
"Matthias Hofmann" <hof...@anvil-soft.com> wrote in message
news:<c8qn59$g7a$1...@news1.nefonline.de>...

> James Kanze <ka...@gabi-soft.fr> schrieb in im Newsbeitrag:
> 86ad004...@lns-vlq-35-82-254-142-237.adsl.proxad.net...
> > "Matthias Hofmann" <hof...@anvil-soft.com> writes:

> > |> llewelly <llewe...@xmission.dot.com> schrieb in im
> > |> Newsbeitrag: 861xlff...@Zorthluthik.local.bar...
> > |> > "Matthias Hofmann" <hof...@anvil-soft.com> writes:

> > |> > All global scope names that begin with an underscore are
> > |> > reserved for the implementation. This doesn't apply to names
> > |> > inside a class, function, or namespace. See 17.4.3.1.2 2nd
> > |> > bullet .

> > |> Does this also apply to macro names? I have so far used the
> > |> following kinds of inclusion guards:

> > |> #ifndef _MYFILE_H_
> > |> #define _MYFILE_H_

> > All names beginning with an _ followed by a capital letter are
> > reserved for the implementation.

> > |> // Declare what I need.

> > |> #endif // _MYFILE_H_

> > |> No I wonder, is this a violation of the standard?

> > Definitely.

> The way I understand 17.4.3.1.2, it is not even allowed to have member
> functions or data members start with an underscore.

That's not what it says. It says that names containing a double
underscore, or beginning with an underscore followed by an uppercase
letter are reserved for the implementation for any use. ANY, of course,
includes such things as macros, new keywords (__far, __builtin_memcpy,
etc.), global symbols... Literally anything. The second bullet says
that names beginning with an underscore are reserved to the
implementation for use as a name IN THE GLOBAL NAMESPACE. So if I
write:
int _i ;
I have undefined behavior, but:
namespace { int _i ; }
namespace MySpace { int _i ; }
and
struct MyClass { int _i ; } ;
are all valid.

At this point, of course, we should distinguish between the standard and
reality. In reality, a quick grep through /usr/include on my system
turns up quite a few definitions of the form _[a-z][a-zA-Z0-9_]*,
including one or two macros. This has been the case for every system
I've ever encountered. So regardless of what the standard says, why
take the risk?

> For example, I saw many people write code loke this:

> class X
> {
> int _i;

> public:
> X( int i ) : _i( i );
> };

> or

> class X
> {
> public:
> X* Clone() { return _Clone(); }

> private:
> virtual X* _Clone() = 0;
> };

> The caption above the relevant section is titled "Global Names", but
> the text otherwise does not specify any exceptions. Now is my
> observation correct that there are no exceptions for class scope? Or
> can you point me to a section of the standard that defines any
> exceptions?

Forget the caption. The standard says, way back in §2.10, what you can
and cannot put in a symbol. There, after giving total liberty (all
alphanumerics, + underscore, first character non numeric), it introduces
a certain number of restrictions: you can't use keywords, and you can't
use things reserved for the implementation in §17.4.3.1.2. As far as
this latter section is concerned, names beginning with an underscore
followed by a lower case letter are only reserved in the global
namespace. Not in a class or a user defined namespace. Nor in a
function, where they don't have linkage. (Note that in your example,
_Clone IS illegal. A conforming implementation could interpret this as
a compiler builtin instructing the compiler to reformat your hard disk,
for example. Or to send an email to your boss, suggesting that he
should hire someone competent. Or to start a game of nethack; at least
one compiler in the past actually did this in certain similar cases of
undefined behavior.)

--
James Kanze GABI Software

Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung

9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

ka...@gabi-soft.fr

unread,
May 24, 2004, 12:14:50 PM5/24/04
to
Samuel Krempp <kre...@crans.ens-cachan.fr> wrote in message
news:<40b0a161$0$7697$636a...@news.free.fr>...

> le Sunday 23 May 2004 13:48, hof...@anvil-soft.com écrivit :

> > I am still confused about the distinction between a pointer to char*
> > and a pointer to unsigned char*, especially as 3.9.2/4 only
> > guarantees a char* to have the same respresentation as a void*.

> > In another post of this threat, you gave an explanation of 3.9.1/1
> > with respect to the difference of char* and unsigned char*. However,
> > I don't understand why, in the context of our discussion, it is
> > important wether all possible bit patterns of an unsigned char
> > represent numbers, as long as all char types share the same object
> > representation

> I agree with you, I think it has no importance. As long as you don't
> do arithmetic on the pointed values, the way they are associated with
> numbers has no impact, you can copy the char values without any loss
> (because "all bits of the object representation participate in the
> value representation"). several values might map to the same number,
> but as long as you use the values and don't rely on the numbers they
> represent, you're fine.

Where does it say this?

The current version of the C standard (C99) explicitly says that in the
case of signed magnitude or one's complement, the representation of a
negative zero is allowed to trap. (See §6.2.6.2, paragraphs 2 and 3 of
C99.) If, as Francis says (and it agrees with what little I know), the
intent of the changes in C99 is just to clarify what it was intended
that C90 say, and if, as a footnote suggests, the intent of C++98 to be
compatible with C (which I certainly hope is the case), then if trapping
representations for signed char are not allowed, it is an error in the
standard.

Personally, given the history of C and the expectations of users, if I
had to implement a compiler for such a machine, I would definitely make
plain char unsigned. But I don't think that the standard requires it.

All in all, as a user, I think I will trust C99 on this one. Simply
because it is the only standard of the three (C90, C++98 and C99) that
is in anyway clear. And while officially C99 has no influence on C++, I
cannot believe that the intent was not to be compatible with C, nor that
C99 meant to make any radical changes or break code with respect to C90.

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

[ See http://www.gotw.ca/resources/clcm.htm for info about ]

ka...@gabi-soft.fr

unread,
May 24, 2004, 12:18:36 PM5/24/04
to
"P.J. Plauger" <p...@dinkumware.com> wrote in message
news:<LbQrc.6708$No1....@nwrddc02.gnilink.net>...

> "James Kanze" <ka...@gabi-soft.fr> wrote in message
> news:8665ao4...@lns-vlq-35-82-254-142-237.adsl.proxad.net...

> > Strangely enough, binary mode does not suppress code translation by
> > the imbued locale either. All it does is affect how the system
> > represents record separators (new lines in text format, inexistant
> > in binary) and end of file.

> Not strange at all. Some encodings *must* be read and written in
> binary mode or suffer potential corruption. UTF32BE springs to
> mind. It can have embedded bytes that look like newline, nul, ctl-Z,
> and all sorts of other things that don't pass transparently to/from a
> text stream. Other encodings, such as UTF-8, are designed to survive
> transmission as a text file *or* a binary file quite nicely.

Interesting. I hadn't thought of that. So what happens when I write a
'\n' to a file with a UTF32BE encoding under Windows? Does it remain
an LF, or become a CRLF? And I presume, of course, that there is
nothing which enforces this; that you can very well imbue a UTF32BE
locale in a stream opened in text mode. There is nothing the locale can
do to know that it is being misused, and I don't know how the filebuf
could detect that the locale requires binary mode.

Although, come to think of it, it doesn't matter, since the only
programs likely to read it that are also on the same machine will use
the same mechanism to read it. Or at least: if you are using anything
other than the default locale, it is probably to respect some externally
imposed format, so you want binary anyway, since whether you want LF or
CRLF, or something completely different, depends on the specification of
the format you are writing or reading, and NOT what is usual for the
platform. Given that, it might even make sense to offer a UTF32BE Unix
and a UTF32BE Windows (which converts a simple '\n' to CRLF).

The fact remains, however, that if I am really reading or writing binary
formatted data, a non-degenerate codecvt is the last thing I want. To
date, I've been recommending creating your own locale, by mixing
whatever the stream has with the codecvt from "C". It occurs to me,
however, that an alternative solution would be to imbue a different
locale in the filebuf, e.g.:

std::ifstream source( "foreign.data", std::ios::binary ) ;
source.rdbuf()->imbue( std::locale::classic() ) ;

This has the advantage of not modifying the locale that the stream
itself sees. It has the disadvantage, obviously, that if anyone
modifies the locale of the stream, you lose the customization of the
locale of the filebuf. But of course, I suspect that most programs that
modify the locale of a stream simply imbue a new locale anyway, rather
than creating a new locale from the previous locale by replacing only
the facets they are concerned with. So you'd lose your specialization
anyway.

I hate to say it, but the whole thing seems awfully fragile. How much
real experience did people have with this before standardizing it? (I
ask because every time I try to do what I think should be everyday
things, I hit on problems. When you need work-arounds just to read
simple binary, something's wrong.)

--
James Kanze GABI Software

Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung

9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Ulrich Eckhardt

unread,
May 24, 2004, 6:10:55 PM5/24/04
to
ka...@gabi-soft.fr wrote:
> "P.J. Plauger" <p...@dinkumware.com> wrote:

>> "James Kanze" <ka...@gabi-soft.fr> wrote:
>> Not strange at all. Some encodings *must* be read and written in
>> binary mode or suffer potential corruption. UTF32BE springs to
>> mind. It can have embedded bytes that look like newline, nul, ctl-Z,
>> and all sorts of other things that don't pass transparently to/from a
>> text stream. Other encodings, such as UTF-8, are designed to survive
>> transmission as a text file *or* a binary file quite nicely.
>
> Interesting. I hadn't thought of that. So what happens when I write a
> '\n' to a file with a UTF32BE encoding under Windows? Does it remain a
> an LF, or become an CRLF?

It becomes '\0\0\0\r\n', which is readable when opened in textmode on a
platform supporting that particular textmode but garbage (and surely not
UTF32BE) otherwise.

> And I presume, of course, that there is
> nothing which enforces this; that you can very well imbue a UTF32BE
> locale in a stream opened in text mode. There is nothing the locale can
> do to know that it is being misused, and I don't know how the filebuf
> could detect that the locale requires binary mode.

Right. Been there, done that...

> Although, come to think of it, it doesn't matter, since the only
> programs likely to read it that are also on the same machine will use
> the same mechanism to read it.

IMHO the above program, believing/claiming to write UTF32BE, is broken, as
it doesn't. It will never properly interact with non-broken programs.

> Or at least: if you are using anything
> other than the default locale, it is probably to respect some externally
> imposed format, so you want binary anyway, since whether you want LF or
> CRLF, or something completely different, depends on the specification of
> the format you are writing or reading, and NOT what is usual for the
> platform.

Right, always use wstreams opened in binary mode with a proper codecvt
facet, IMHO preferably for external UTF-8. You can still write ASCII or
8859-x with the proper codecvt facet, and even get feedback if some
conversion failed, provided you flush the stream before testing the state.

> Given that, it might even make sense to offer a UTF32BE Unix
> and a UTF32BE Windows (which converts a simple '\n' to CRLF).

What should that do with a '\n', make it a '\0\0\0\r\n' or a
'\0\0\0\r\0\0\0\n'? FTP will do the former in text-mode, but that's surely
not UTF32.
Anyhow, textmode was a big mistake and programmers have already understood
that: In my experience, programs that are capable of handling anything
other than external ASCII also work regardless of line-ending style
(notepad being a sorry exception to that rule), so when using a
Unicode-capable format (UTF-*), you usually don't need any special
line-endings.

Fun begins when customers complain they can't read their files and you
finally find that some bastard-mailclient-from-hell chopped off all bit7.
Or a codecvt facet that tries to merge external '\r\n' into a single '\n'
but fails if those two bytes cross a page boundary because the libc can't
ungetc() both of them. But I'm drifting off-topic....

And yes, I too think that there are some dark sides of IOStreams and Locales
that you always have to work around.

Uli

--
FAQ: http://parashift.com/c++-faq-lite/

/* bittersweet C++ */
default: break;

[ See http://www.gotw.ca/resources/clcm.htm for info about ]

Matthias Hofmann

unread,
May 24, 2004, 6:13:07 PM5/24/04
to
James Kanze <ka...@gabi-soft.fr> wrote in message
86wu335...@alex.gabi-soft.fr...

>
> I'm not sure what the question is any more:-). I am pretty sure that
> you can get a signed char*. What I'm not sure of is that copying data
> through a signed char* is guaranteed to result in a faithful bit image
> of the original data.

The way I understand 3.9/2, there is no such guarantee. You have at least
convinced me that a pointer to signed char is not a good thing for my
purposes.

Best regards,

Matthias

Matthias Hofmann

unread,
May 24, 2004, 6:16:28 PM5/24/04
to
<ka...@gabi-soft.fr> wrote in message
d6652001.04052...@posting.google.com...
>
> > I agree with you, I think it has no importance. As long as you don't
> > do arithmetic on the pointed values, the way they are associated with
> > numbers has no impact, you can copy the char values without any loss
> > (because "all bits of the object representation participate in the
> > value representation"). several values might map to the same number,
> > but as long as you use the values and don't rely on the numbers they
> > represent, you're fine.
>
> Where does it say this?

Section 3.9/2 (the emphasis is mine):

"For any complete POD object type T, wether or not the object holds a valid
value of type T, the underlying bytes (1.7) making up the object can be
copied into an array of CHAR OR UNSIGNED CHAR. If the content of the array
of CHAR OR UNSIGNGED CHAR is copied back into the object, the object shall
subsequently hold its original value."

So if copying has no influence on the bits, then any other read access
shouldn't change anything either. I'd be careful with signed char, but for
both plain and unsigned char there seems to be no problem. And as 3.9.2/4
guarantees the same representation and alignment requirements for void* and
char* (without mentioning unsigned char*), I'd favour a pointer to plain
char.

Best regards,

Matthias Hofmann

Walter Tross

unread,
May 24, 2004, 6:17:49 PM5/24/04
to
ka...@gabi-soft.fr 2004-05-24 :

> That's not what it says. It says that names containing a double
> underscore, or beginning with an underscore followed by an uppercase
> letter are reserved for the implementation for any use. ANY, of course,
> includes such things as macros, new keywords (__far, __builtin_memcpy,
> etc.), globale symbols... Literally anything. The second bullet says
> that names beginning with an underscore are reserved to the
> implementation for use as a name IN THE GLOBAL NAMESPACE. So if I
> write:
> int _i ;
> I have undefined behavior, but:
> namespace { int _i ; }
> namespace MySpace { int _i ; }
> and
> struct MyClass { int _i ; } ;
> are all valid.
>
> At this point, of course, we should distinguish between the standard and
> reality. In reality, a quick grep through /usr/include on my system
> turns up quite a few definitions of the form _[a-z][a-zA-Z0-9_]*,
> including one or two macros. This has been the case for every system
> I've ever encountered. So regardless of what the standard says, why
> take the risk?

I personally *do* take the risk you are talking of (and also the risk of
being unpopular) by using _[a-z][a-zA-Z0-9]* for data members.
I think that since the standard guarantees that I can do it, I prefer my
code to point out those compilers which are not standard compliant
(yes, I do, at least as long as my job allows me this privilege). And it
will almost certainly, because it's very unlikely that a macro colliding
with a data member will leave behind some compiling code.
Anyhow, I deem the leading underscore to be much more readable than the
trailing one, which of course is the main reason why I use it.

Walter Tross

llewelly

unread,
May 24, 2004, 6:21:20 PM5/24/04
to
ka...@gabi-soft.fr writes:

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


> letter are reserved for the implementation for any use. ANY, of course,

^^^^^^


> includes such things as macros, new keywords (__far, __builtin_memcpy,
> etc.), globale symbols... Literally anything. The second bullet says
> that names beginning with an underscore are reserved to the
> implementation for use as a name IN THE GLOBAL NAMESPACE. So if I
> write:
> int _i ;
> I have undefined behavior, but:
> namespace { int _i ; }
> namespace MySpace { int _i ; }
> and
> struct MyClass { int _i ; } ;
> are all valid.
>
> At this point, of course, we should distinguish between the standard and
> reality. In reality, a quick grep through /usr/include on my system
> turns up quite a few definitions of the form _[a-z][a-zA-Z0-9_]*,

Whoa. Try _[a-z0-9]* instead, as _[A-Z]* are still reserved at class,
namespace, or function scope.

> including one or two macros. This has been the case for every system
> I've ever encountered. So regardless of what the standard says, why
> take the risk?

IMO, underscores are for separating words in an identifier, and as
such belong between words, and not at the beginning or the
end. (Though I admit I've broken this rule a time or two, and of
course underscores at the end are safe.)

Samuel Krempp

unread,
May 25, 2004, 8:39:46 AM5/25/04
to
le Monday 24 May 2004 18:14, ka...@gabi-soft.fr écrivit :

> Samuel Krempp <kre...@crans.ens-cachan.fr> wrote in message
> news:<40b0a161$0$7697$636a...@news.free.fr>...
>
>> le Sunday 23 May 2004 13:48, hof...@anvil-soft.com écrivit :
>
>> > I am still confused about the distinction between a pointer to char*
>> > and a pointer to unsigned char*, especially as 3.9.2/4 only
>> > guarantees a char* to have the same respresentation as a void*.
>
>> > In another post of this threat, you gave an explanation of 3.9.1/1
>> > with respect to the difference of char* and unsigned char*. However,
>> > I don't understand why, in the context of our discussion, it is
>> > important wether all possible bit patterns of an unsigned char
>> > represent numbers, as long as all char types share the same object
>> > representation
>
>> I agree with you, I think it has no importance. As long as you don't
>> do arithmetic on the pointed values, the way they are associated with
>> numbers has no impact, you can copy the char values without any loss
>> (because "all bits of the object representation participate in the
>> value representation"). several values might map to the same number,
>> but as long as you use the values and don't rely on the numbers they
>> represent, you're fine.
>
> Where does it say this?

you're right, I was confused. I thought all possible bit patterns for char
had to be valid 'values'.

I've looked at the standard more closely: in fact there's nothing more
precise than the information in §3.9 ("For any complete POD object type T,
whether or not the object holds a valid value of type T, the underlying
bytes (1.7) making up the object can be copied into an array of char or
unsigned char.").

Now how should we interpret this copying of bytes into an array of char?
Either the standard implicitly says that copying the bytes as chars produces
the same result as memcpy, or it seems the standard doesn't provide any
guaranteed way for the user to re-implement memcpy. (Because from void*, it
seems we can make a char* but not an unsigned char*. And then, while copying
unsigned chars does copy all possible bit patterns, it seems we don't have
the same guarantee with chars...)
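
For concreteness, the kind of user-level memcpy I have in mind is this sketch
(mine, not from the standard); whether it is guaranteed to preserve every bit
pattern is exactly the open question:

#include <cstddef>

void* my_memcpy( void* dest, const void* src, std::size_t n )
{
    unsigned char* d = static_cast<unsigned char*>( dest );
    const unsigned char* s = static_cast<const unsigned char*>( src );
    while ( n-- > 0 )
        *d++ = *s++;
    return dest;
}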

it's strange because I was under the impression that handling bytes with
chars was the normal thing to do in C++.

--
Samuel.Krempp
cout << "@" << "crans." << (is_spam ? "trucs.en.trop." : "" )
<< "ens-cachan.fr" << endl;

P.J. Plauger

unread,
May 25, 2004, 8:41:46 AM5/25/04
to
<ka...@gabi-soft.fr> wrote in message
news:d6652001.04052...@posting.google.com...

> "P.J. Plauger" <p...@dinkumware.com> wrote in message
> news:<LbQrc.6708$No1....@nwrddc02.gnilink.net>...
> > "James Kanze" <ka...@gabi-soft.fr> wrote in message
> > news:8665ao4...@lns-vlq-35-82-254-142-237.adsl.proxad.net...
>
> > > Strangely enough, binary mode does not suppress code translation by
> > > the imbued locale either. All it does is affect how the system
> > > represents record separators (new lines in text format, inexistant
> > > in binary) and end of file.
>
> > Not strange at all. Some encodings *must* be read and written in
> > binary mode or suffer potential corruption. UTF32BE springs to
> > mind. It can have embedded bytes that look like newline, nul, ctl-Z,
> > and all sorts of other things that don't pass transparently to/from a
> > text stream. Other encodings, such as UTF-8, are designed to survive
> > transmission as a text file *or* a binary file quite nicely.
>
> Interesting. I hadn't thought of that. So what happens when I write a
> '\n' to a file with a UTF32BE encoding under Windows? Does it remain
> an LF, or become a CRLF? And I presume, of course, that there is
> nothing which enforces this; that you can very well imbue a UTF32BE
> locale in a stream opened in text mode. There is nothing the locale can
> do to know that it is being misused, and I don't know how the filebuf
> could detect that the locale requires binary mode.

Ulrich Eckhardt answered this for you. Simply put, UTF32BE is one
of those encodings that is easily corrupted when written to a text
file.

> .....


> I hate to say it, but the whole thing seems awfully fragile. How much
> real experience did people have with this before standardizing it?

Absolutely none. It is worth observing that the first couple of
locale proposals *accepted* by the committee included

a) a function whose name was a keyword

b) a constructor that was impossible to invoke

You figure it out.

> ask because every time I try to do what I think should be everyday
> things, I hit on problems. When you need work-arounds just to read
> simple binary, something's wrong.)

They're not that bad, once you develop the idioms. In our CoreX
library, for example, we identify all the codecvt facets that
must be used only with binary files. The biggest problem we
encountered was the broad range of bugs across various implementations
of the Standard C++ library. The specifications for locales in general,
and codecvt in particular, were so vague that differences of opinion,
and genuine lacunae, are still widespread. We fixed that problem
by adding wbuffer, a filtering streambuf that handles all interactions
with codecvt facets and stresses existing streambufs minimally.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com

ka...@gabi-soft.fr

unread,
May 25, 2004, 5:36:25 PM5/25/04
to
Ulrich Eckhardt <doom...@knuut.de> wrote in message
news:<2heqfsF...@uni-berlin.de>...

> ka...@gabi-soft.fr wrote:
> > "P.J. Plauger" <p...@dinkumware.com> wrote:
> >> "James Kanze" <ka...@gabi-soft.fr> wrote:
> >> Not strange at all. Some encodings *must* be read and written in
> >> binary mode or suffer potential corruption. UTF32BE springs to
> >> mind. It can have embedded bytes that look like newline, nul,
> >> ctl-Z, and all sorts of other things that don't pass transparently
> >> to/from a text stream. Other encodings, such as UTF-8, are
> >> designed to survive transmission as a text file *or* a binary file
> >> quite nicely.

> > Interesting. I hadn't thought of that. So what happens when I
> > write a '\n' to a file with a UTF32BE encoding under Windows? Does
> > it remain an LF, or become a CRLF?

> It becomes '\0\0\0\r\n', which is readable when opened in textmode on
> a platform supporting that particular textmode but garbage (and surely
> not UTF32BE) otherwise.

Logically, it should become the sequence 0x00, 0x00, 0x00, 0x0D, 0x00,
0x00, 0x00, 0x0A. I just don't quite see how an implementation could do
this.

> > And I presume, of course, that there is nothing which enforces this;
> > that you can very well imbue a UTF32BE locale in a stream opened in
> > text mode. There is nothing the locale can do to know that it is
> > being misused, and I don't know how the filebuf could detect that
> > the locale requires binary mode.

> Right. Been there, done that...

> > Although, come to think of it, it doesn't matter, since the only
> > programs likely to read it that are also on the same machine will
> > use the same mechanism to read it.

> IMHO above program believing/claiming to write UTF32BE is broken, as
> it doesn't. It will never properly interact with non-broken programs.

What I meant was that it doesn't matter that you have to use binary
mode. Any program reading your file will have to use binary mode as well, and
will end up seeing what you wrote. And presumably, not have problems
because the end of line marker is only a single '\n'.

> > Or at least: if you are using anything other than the default
> > locale, it is probably to respect some externally imposed format, so
> > you want binary anyway, since whether you want LF or CRLF, or
> > something completely different, depends on the specification of the
> > format you are writing or reading, and NOT what is usual for the
> > platform.

> Right, always use wstreams opened in binary mode with a proper codecvt
> facet, IMHO preferably for external UTF-8.

Well, I would prefer UTF-8, too, as does the Internet. On the other
hand, if the client for whom I'm writing the files prefers something
else...

The problem here is that there is no sense writing files unless someone
can read them, so the prefered codeset should be something the expected
client can handle.

> You can still write ASCII or 8859-x with the proper codecvt facet, and
> even get feedback if some conversion failed, provided you flush the
> stream before testing the state.

> > Given that, it might even make sense to offer a UTF32BE Unix and a
> > UTF32BE Windows (which converts a simple '\n' to CRLF).

> What should that do with a '\n', make it a '\0\0\0\r\n' or a
> '\0\0\0\r\0\0\0\n'?

The latter, obviously. The reason for doing it in the codecvt facet,
obviously, is that it cannot be done correctly later.

> FTP will do the former in text-mode, but that's surely not UTF32.

FTP will also convert ASCII into EBCDIC in text-mode, if you're doing a
get from a client whose native code is EBCDIC:-). If you want to get
exactly what you have at the other end in FTP, you use binary mode. If
you want to transfer text, and end up with text in the native encoding
on the target machine, you use the (misnamed) ascii mode. But that has
a lot of limitations, and I wouldn't count on it if the text contains
characters outside of US ASCII.

All in all, FTP works a lot like C and C++ streams in this regard.

> Anyhow, textmode was a big mistake and programmers have already
> understood that: In my experience, programs that are capable of
> handling anything other than external ASCII also work regardless of
> line-ending style (notepad being a sorry exception to that rule), so
> when using a Unicode-capable format(UTF-*), you usually don't need any
> special line-endings.

That's not been my experience. Most Windows programs (except Notepad)
don't seem to have problems with a simple LF, but I've had problems with
CRLF a number of times under Unix.

The basic idea behind the text mode is good. The problem is that it was
originally specified by people who had no experience with accented
characters and other such things, and who thought that 128 characters
(including control characters) were enough.

> Fun begins when customers complain they can't read their files and you
> finally find that some bastard-mailclient-from-hell chopped off all
> bit7.

Or the modem interpreted 0x84 as EOT, and dropped the connection:-).

> Or a codecvt facet that tries to merge external '\r\n' into a single
> '\n' but fails if those two bytes cross a page boundary because the
> libc can't ungetc() both of them. But I'm drifting off-topic....

> And yes, I too think that there are some dark sides of IOStreams and
> Locales that you always have to work around.

Part of the problem, I suspect, is that we really don't know all of the
problem yet. Which makes it a bit early to try to specify a final, all
encompassing solution. IMHO, at least part of the problem with
internationalization in C++ is that it tried to do too much, without any
underlying experience. In C, there is a lot less direct support, but
what there is seems to work correctly.

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34


ka...@gabi-soft.fr

unread,
May 25, 2004, 5:37:58 PM5/25/04
to
llewelly <llewe...@xmission.dot.com> wrote in message
news:<861xl9v...@Zorthluthik.local.bar>...
> ka...@gabi-soft.fr writes:

> > At this point, of course, we should distinguish between the standard
> > and reality. In reality, a quick grep through /usr/include on my
> > system turns up quite a few definitions of the form
> > _[a-z][a-zA-Z0-9_]*,

> Whoa. Try _[a-z0-9]* instead, as _[A-Z]* are still reserved at class,
> namespace, or function scope.

Whoa, yourself. "_[a-z0-9]*" will match any _ in the file:-). Look
again at the regular expression I used:-).

In fact, things seem to be improving. I just tried:

find /usr/include -type f | xargs egrep '# *define *_[a-z]'

under Solaris 2.8, and came up with absolutely nothing. This definitely
wasn't the case on some earlier versions. Similarly -- the same thing
over the includes for g++ 3.3.1 turns up nothing, where as those for
2.95.2 still reveal a few.

> > including one or two macros. This has been the case for every
> > system I've ever encountered. So regardless of what the standard
> > says, why take the risk?

> IMO, underscores are for separating words in an identifier, and as
> such belong between words, and not at the beginning or the
> end. (Though I admit I've broken this rule a time or two, and of
> course underscores at the end are safe.)

IMO, underscores have no place in code, since I've occasionally had
fonts in which they weren't visible:-). Seriously, it is a question of
coding convention. There are practical reasons for avoiding them at the
start of a word, and they are an esthetic abomination at the end, but
other than that, as long as you don't double them...

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34


ka...@gabi-soft.fr

unread,
May 25, 2004, 5:39:29 PM5/25/04
to
Walter Tross <wal...@waltertross.com> wrote in message
news:<37ihzqqn0tog.1fs2vtfm48p0n$.d...@40tude.net>...

I hope you rigorously export all of your templates, too. After all, the
standard guarantees it will work.

Personally, my employers pay me to produce programs that work. My first
goal is always the compiler I'm using. A larger goal is to write the
code as portably as possible; you never know when someone will want to
port it. Standard's conformance is only a goal in so far as it is a
means to this second end.

In the context in which I work, of course, the compiler vendors don't
always have a choice. The incriminating macros are often in files like
<pthread.h> or <socket.h>, for example, over which the compiler vendor
has no influence. And the fact that they are outside of the C++
standard doesn't mean that I can ignore them.

> And it will almost certainly, because it's very unlikely that a macro
> colliding with a data member will leave behind some compiling code.

True. It will be an error that you will have to find; the error message
won't have much relationship with what you see in the file. More work
for you.

> Anyhow, I deem the leading underscore to be much more readable than
> the trailing one, which of course is the main reason why I use it.

The trailing one is an abomination. I agree there. I don't use it, and
I don't know anyone who does. The most frequent conventions seem to be
either m_name or myName, with s_name or ourName for static member
variables. And no prefix or suffix for anything else (except for
ex-Windows programmers, who prefix all class names with a C).

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34


ka...@gabi-soft.fr

unread,
May 25, 2004, 5:40:00 PM5/25/04
to
"Matthias Hofmann" <hof...@anvil-soft.com> wrote in message
news:<c8tba4$dl0$1...@news1.nefonline.de>...

> James Kanze <ka...@gabi-soft.fr> schrieb in im Newsbeitrag:
> 86wu335...@alex.gabi-soft.fr...

> > I'm not sure what the question is any more:-). I am pretty sure
> > that you can get a signed char*. What I'm not sure of is that
> > copying data through a signed char* is guaranteed to result in a
> > faithful bit image of the original data.

> The way I understand 3.9/2, there is no such guarantee. You have at
> least convinced me that a pointer to signed char is not a good thing
> for my purposes.

And your quote of the standard in another posting has convinced me that
using char* is correct in C++, if not in C.

But discussion at the level we've reached really belongs in
comp.std.c++. From a practical point of view, both char* and unsigned
char* will work. Portably. Most of the people I've worked with when
I've done this sort of thing come from a C background, where unsigned
char* was the correct type (and the only correct type) for accessing the
underlying bytes of an object. So I will continue using unsigned char*;
it gives the message I want to the readers I can see. On the other
hand, if I happen to get some of your code, and it uses char*, I won't
bother to change it. (Unless, of course, I have to port it to C:-).)

--
James Kanze GABI Software

Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung

9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

ka...@gabi-soft.fr

unread,
May 25, 2004, 5:40:55 PM5/25/04
to
"Matthias Hofmann" <hof...@anvil-soft.com> wrote in message
news:<c8taob$dd3$1...@news1.nefonline.de>...

> <ka...@gabi-soft.fr> schrieb in im Newsbeitrag:
> d6652001.04052...@posting.google.com...

> > > I agree with you, I think it has no importance. As long as you
> > > don't do arithmetic on the pointed values, the way they are
> > > associated with numbers has no impact, you can copy the char
> > > values without any loss (because "all bits of the object
> > > representation participate in the value representation"). several
> > > values might map to the same number, but as long as you use the
> > > values and don't rely on the numbers they represent, you're fine.

> > Where does it say this?

> Section 3.9/2 (the emphasis is mine):

> "For any complete POD object type T, wether or not the object holds a
> valid value of type T, the underlying bytes (1.7) making up the object
> can be copied into an array of CHAR OR UNSIGNED CHAR. If the content
> of the array of CHAR OR UNSIGNGED CHAR is copied back into the object,
> the object shall subsequently hold its original value."

This is interesting. It basically means that C++ makes a guarantee
concerning the representation of a char that C explicitly doesn't.
Which could theoretically mean that there could be platforms where a C++
implementation could not both be conforming and use a representation
compatible with C.

In practice, of course, it will never be a problem. Regardless of what
the C standard says, C programmers do use char for this, and breaking
their code would not be in the interest of the C compiler vendor. On
platforms where copying a signed char does not guarantee the same bit
pattern, the implementation will make plain char unsigned.

I am curious as to whether this difference with regards to C is
intentional.

> So if copying has no influence on the bits, then any other read access
> shouldn't change anything either. I'd be careful with signed char, but
> for both plain and unsigned char there seems to be no problem. And as
> 3.9.2/4 guarantees the same representation and alignment requirements
> for void* and char* (without mentioning unsigned char*), I'd favour a
> pointer to plain char.

In practice, char*, unsigned char* and void* should be equally good for
pointers to raw memory. I tend to favor void* at the interface level,
since in my mind, it signals that the original type information is lost.
In the implementation (where I need to do things like p++), I generally
use unsigned char*, at least when accessing, because that is what is
unambiguously required in C. (If all I need is, say, to calculate the
number of bytes between two pointers, I'll cast my void* to char*,
because char* is a lot shorter to write than unsigned char*.)
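
As an illustration of that split (my own sketch, not code from any library),
the interface takes a void*, and the implementation walks the bytes as
unsigned char:

#include <cstddef>

// Hypothetical helper: count the zero bytes in an object's representation.
std::size_t countZeroBytes( const void* p, std::size_t n )
{
    const unsigned char* bytes = static_cast<const unsigned char*>( p );
    std::size_t zeros = 0;
    for ( std::size_t i = 0; i < n; ++i )
        if ( bytes[ i ] == 0 )
            ++zeros;
    return zeros;
}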

I guess my habits as an old C programmer are just too ingrained to
break. (And I still think it would be nicer if we had a set of "raw
memory" types.)

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34


Alan McKenney

unread,
May 25, 2004, 6:16:43 PM5/25/04
to
Maciej Sobczak <no....@no.spam.com> wrote in message news:<c89rhq$85t$1...@atlantis.news.tpi.pl>...
> Hi,
>
> Matthias Hofmann wrote:
>

> > I read in the FAQ that a pointer to a char can be implemented much
> > differently from a pointer to an int, or any other type. Maybe this is the
> > reason why the static_cast does not work.
>
> static_cast does not work, because casting between pointer to unrelated
> types is indeed meaningful only when you want to *reinterpret* the
> contents of memory using other type's representation rules. So -
> reinterpret_cast expresses your intents more clearly.

This leaves unanswered the question: does "reinterpret_cast<>"
of a pointer result in something that is a valid pointer to
anything. E.g., in:

int i;
char *cp = reinterpret_cast<char *>(&i);

are we guaranteed that "cp" points to anything near
"i"?

This becomes an issue on a machine like a CRAY-XMP, where
the smallest directly addressable memory is a 64-bit word,
and characters are packed 8 to a word.

"int *" is implemented as a memory address, while
"char *" is implemented as

( memory_address << 3 ) | ( byte_no_within_word )


The naive implementation of reinterpret_cast<char *>(&i)
(use the same bit representation) would result in a
pointer to a completely different word.

If the address of i is 0x10000, then cp would point
to byte 0 in word 0x2000.
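
A tiny arithmetic sketch (mine, just simulating the hypothetical encoding
described above) shows where the naive reuse of the bits ends up:

#include <iostream>

int main()
{
    // int* is just the word address; char* is (word_address << 3) | byte_in_word.
    unsigned long int_ptr_bits = 0x10000;            // the bits of &i on such a machine

    // Naive reinterpret_cast: reuse the same bits as a char*.
    unsigned long naive_char_ptr_bits = int_ptr_bits;

    unsigned long word = naive_char_ptr_bits >> 3;   // 0x2000 -- a completely different word
    unsigned long byte = naive_char_ptr_bits & 0x7;  // byte 0

    std::cout << std::hex << "word 0x" << word << ", byte " << byte << '\n';
    return 0;
}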


Is the naive implementation legal? Or does the
standard require that the bit representation of
a pointer be changed to point to (more or less)
the same memory, at least when casting to
char *?

Matthias Hofmann

unread,
May 25, 2004, 6:21:28 PM5/25/04
to
Samuel Krempp <kre...@crans.ens-cachan.fr> schrieb in im Newsbeitrag:
40b27d41$0$29909$626a...@news.free.fr...

>
> Now how should we interpret this copying of bytes into an array of char?
> Either the standard implicitly says that copying the bytes as chars produces
> the same result as memcpy, or it seems the standard doesn't provide any
> guaranteed way for the user to re-implement memcpy. (Because from void*, it
> seems we can make a char* but not an unsigned char*. And then, while copying
> unsigned chars does copy all possible bit patterns, it seems we don't have
> the same guarantee with chars...)
>
> it's strange because I was under the impression that handling bytes with
> chars was the normal thing to do in C++.

Meanwhile I feel insecure about plain chars, too, so I have changed my
conversion function to use unsigned char pointers:

inline unsigned char* byte_ptr( void* p )
{ return static_cast<unsigned char*>( p ); }

However, I was in for a surprise when I used this function to pass a pointer
to std::basic_ostream::write(), as I got an error for being unable to
convert an unsigned char* to a const char*. I thought that there was an
overload for each of the three char types, but this only goes for the ">>"
operators!

This is one more hint that write() was never meant to write binary data. In
another post, James Kanze said that "What you pass certainly isn't an array
of characters. It does have a type, however: raw (preformatted) bytes", but
27.6.2.6/5 describes the effect of this member as follows: "Obtains
CHARACTERS to insert from successive locations of an array [...]" and a
footnote confirms that there is no overload on unsigned char or signed char.

Thus, it looks like this method was not designed for passing "raw
(preformatted) bytes". By the way, how can a string be implemented in terms
of plain chars (which is normal for C-style strings anyway) if some CPU is
allowed to change their value at will?

Best regards,

Matthias

Walter Tross

unread,
May 26, 2004, 5:32:19 AM5/26/04
to
ka...@gabi-soft.fr 2004-05-25 :

[about C++ standard allowing _[a-z][a-zA-Z0-9_]* members]

>>> At this point, of course, we should distinguish between the standard
>>> and reality. In reality, a quick grep through /usr/include on my
>>> system turns up quite a few definitions of the form
>>> _[a-z][a-zA-Z0-9_]*, including one or two macros. This has been the
>>> case for every system I've ever encountered. So regardless of what
>>> the standard says, why take the risk?
>
>> I personally *do* take the risk you are talking of (and also the risk
>> of being unpopular) by using _[a-z][a-zA-Z0-9]* for data members. I
>> think that since the standard guarantees that I can do it, I prefer my
>> code to point out those compilers which are not standard compliant
>> (yes, I do, at least as long as my job allows me this privilege).
>
> I hope you rigorously export all of your templates, too. After all, the
> standard guarantees it will work.

Mmmm - maybe I should :-)

>
> Personally, my employers pay me to produce programs that work. My first
> goal is always the compiler I'm using. A larger goal is to write the
> code as portably as possible; you never know when someone will want to
> port it. Standard's conformance is only a goal in so far as it is a
> means to this second end.

Absolutely agreed - but as with all things in life, it's a matter of
weighing benefits and drawbacks (with their probabilities). To me, the
benefit of clearly and nicely marking members in a not annoying way
outweighs the remote probability of colliding with a nonconforming
environment, even more so when writing long-lived code, which I think
should be as clear and readable as possible.

>
> In the context in which I work, of course, the compiler vendors don't
> always have a choice. The incriminating macros are often in files like
> <pthread.h> or <socket.h>, for example, over which the compiler vendor
> has no influence. And the fact that they are outside of the C++
> standard doesn't mean that I can ignore them.

I looked into pthread.h and socket.h: no _[a-z][a-zA-Z0-9_]* macro,
nor anything otherwise dangerous of the same kind.

>
>> And it will almost certainly, because it's very unlikely that a macro
>> colliding with a data member will leave behind some compiling code.
>
> True. It will be an error that you will have to find; the error message
> won't have much relationship with what you see in the file. More work
> for you.

See above "benefits and drawbacks". But I'm sure that after 10 minutes of
looking at the class definition without understanding the error message, I
would generate the .i file, just to make sure it's not a macro. And I hope
anyone would do the same, if ever (!) that should happen.

>
>> Anyhow, I deem the leading underscore to be much more readable than
>> the trailing one, which of course is the main reason why I use it.
>
> The trailing one is an abomination. I agree there. I don't use it, and
> I don't know anyone who does. The most frequent conventions seem to be
> either m_name or myName, with s_name or ourName for static member
> variables. And no prefix or suffix for anything else (except for
> ex-Windows programmers, you prefix all class names with a C).

Yes, and a big company has fostered Hungarian notation for a long time...
The m_ convention is not yet that old, but nearly that ugly :-)
(IMO, of course)
(BTW, it's because we are talking about not-so-interesting opinions here
that I don't start another thread. I would even suggest exiting this fork)

Walter Tross

Samuel Krempp

unread,
May 26, 2004, 5:45:14 AM5/26/04
to
le Wednesday 26 May 2004 00:21, hof...@anvil-soft.com écrivit :

> Meanwhile I feel unsecure about plain chars, too, so I have changed my
> conversion function to use unsigned char pointers:
>
> inline unsigned char* byte_ptr( void* p )
> { return static_cast<unsigned char*>( p ); }
>
> However, I was in for a surprise when I used this function to pass a
> pointer to std::basic_ostream::write(), as I got an error for being unable
> to convert an unsigned char* to a const char*. I thought that there was an
> overload for each of the three char types, but this only goes for the ">>"
> operators!
>
> This is one more hint that write() was never meant to write binary data.

or that chars really are meant to be able to convey raw memory bytes :)
The use of chars for ostream::write() is directly related to
basic_streambuf<char_type>'s sputc(char_type ch), and that's the kind of thing
that always made me think C++ was saying "chars" for "raw bytes".

Indeed, the other possibility is that you can't be sure of doing full binary
output to ostreams (including the standard outputs).

> In another post, James Kanze said that "What you pass certainly isn't an
> array of characters. It does have a type, however: raw (preformatted)
> bytes", but 27.6.2.6/5 describes the effect of this member as follows:
> "Obtains CHARACTERS to insert from successive locations of an array [...]"
> and a footnote confirms that there is no overload on unsigned char or
> signed char.

well, if you choose a basic_ostream<unsigned char> you'll get to pass it
unsigned chars. (but I'm not sure what is guaranteed about such streams).

> Thus, it looks like this method was not designed for passing "raw
> (preformatted) bytes". By the way, how can a string be implemented in
> terms of plain chars (which is normal for C-style strings anyway) if some
> CPU is allowed to change their value at will?

by reducing the set of valid values of chars inside a string to less than
2^CHAR_BIT. I think it's not as paradoxical for strings as it
is for ostreams...

--
Samuel.Krempp
cout << "@" << "crans." << (is_spam ? "trucs.en.trop." : "" )
<< "ens-cachan.fr" << endl;

ka...@gabi-soft.fr

unread,
May 26, 2004, 5:07:02 PM5/26/04
to
"Matthias Hofmann" <hof...@anvil-soft.com> wrote in message
news:<c8vo3c$j0q$1...@news1.nefonline.de>...

[...]


> This is one more hint that write() was never meant to write binary
> data. In another post, James Kanze said that "What you pass certainly
> isn't an array of characters. It does have a type, however: raw
> (preformatted) bytes", but 27.6.2.6/5 describes the effect of this
> member as follows: "Obtains CHARACTERS to insert from successive
> locations of an array [...]" and a footnote confirms that there is no
> overload on unsigned char or signed char.

> Thus, it looks like this method was not designed for passing "raw
> (preformatted) bytes".

Sort of. What we have here is a failure to communicate between the
library and the language. All IO in C++ is defined in terms of char.
Even a wofstream writes char's to the file (and not wchar_t). So for
the iostream chapter in the library, "raw memory" (or preformatted bytes
or strings) has type char.

> By the way, how can a string be implemented in terms of plain chars
> (which is normal for C-style strings anyway) if some CPU is allowed to
> change their value at will?

There are actually a number of interesting issues at stake. Going back
to C, in fact. For example, is a negative 0 also a nul character string
terminator? (If you test *p == '\0', and p points to a negative 0, the
comparison is true.)

From a practical point of view:
- machines on which signed char will actually cause a problem are
exceedingly rare, so you can probably forget them, and
- given all the ambiguities and uncertainties, any implementation of
C++ for such a machine will certainly make plain char unsigned (to
start with, you probably can't implement filebuf otherwise).

If I'm formatting a buffer for IO, I will use plain char, even if it is
a binary format. I'm dealing with bytes for output, so I use the IO
subsystem's definition. If I'm fiddling with raw memory internally (say
to extract the exponent field of a float), I will use unsigned char,
because I'm concerned at the language level. But if you want to use
plain char for the latter, go ahead -- enough people do that no
implementation will dare break it, and you probably cannot implement the
iostream section of the standard if you do break it. And if you want to
use unsigned char for the IO buffer, that's almost surely OK too.
You'll need a reinterpret_cast (or two static_cast) when you pass it to
ostream::write, but again, regardless of the formal guarantees, an
implementation WILL make this work; anything else is just too outrageous
and too unexpected to be practical.
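
To make the cast concrete, here is a small sketch of my own (just spelling
out the two static_casts mentioned above, not code from any particular
library) of handing an unsigned char buffer to ostream::write():

#include <ostream>

void writeBytes( std::ostream& os, const unsigned char* buf, std::streamsize n )
{
    // Two static_casts through void*; a reinterpret_cast would also work in practice.
    os.write( static_cast<const char*>( static_cast<const void*>( buf ) ), n );
}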

--
James Kanze GABI Software

Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

ka...@gabi-soft.fr

unread,
May 26, 2004, 7:30:18 PM5/26/04
to
Samuel Krempp <kre...@crans.ens-cachan.fr> wrote in message
news:<40b27d41$0$29909$626a...@news.free.fr>...

> le Monday 24 May 2004 18:14, ka...@gabi-soft.fr écrivit :

> > Samuel Krempp <kre...@crans.ens-cachan.fr> wrote in message
> > news:<40b0a161$0$7697$636a...@news.free.fr>...

> >> le Sunday 23 May 2004 13:48, hof...@anvil-soft.com écrivit :

> >> > I am still confused about the distinction between a pointer to
> >> > char* and a pointer to unsigned char*, especially as 3.9.2/4
> >> > only guarantees a char* to have the same representation as a
> >> > void*.

> >> > In another post of this thread, you gave an explanation of
> >> > 3.9.1/1 with respect to the difference of char* and unsigned
> >> > char*. However, I don't understand why, in the context of our
> >> > discussion, it is important whether all possible bit patterns of
> >> > an unsigned char represent numbers, as long as all char types
> >> > share the same object representation

> >> I agree with you, I think it has no importance. As long as you
> >> don't do arithmetic on the pointed values, the way they are
> >> associated with numbers has no impact, you can copy the char
> >> values without any loss (because "all bits of the object
> >> representation participate in the value representation"). several
> >> values might map to the same number, but as long as you use the
> >> values and don't rely on the numbers they represent, you're fine.

> > Where does it say this?

> you're right, I was confused. I thought all possible bit patterns for
> char had to be valid 'values'.

I think they do. But several different bit patterns can have the same
value (probably). And copying can change one of these bit patterns into
the other (probably not in C++, but certainly in C).

> I've looked at the norm with more attention : in fact there's nothing more
> precise than the information is §3.9 ("For any complete POD object type T,
> whether or not the object holds a valid value of type T, the underlying
> bytes (1.7) making up the object can be copied into an array of char or
> unsigned char.").

> now how should we interpret this copying of bytes into an array of
> char? Either the standard implicitly says copying the bytes as chars
> produces the same result as memcpy, or it seems like the standard
> doesn't provide any guaranteed way for the user to re-implement
> memcpy. (because from void*, it seems we can make a char* but not an
> unsigned char*. And then while copying unsigned chars does copy all
> possible bit patterns, it seems we don't have the same guarantee with
> chars..)

I think that lack of a guarantee concerning void* to unsigned char* is
an oversight.

There are several functions (inherited from C) where this could be an
issue. In memchr, the C standard says "The memchr function locates the
first occurrence of c (converted to an unsigned char) in the initial n
characters (each interpreted as unsigned char) of the object pointed to
be s", which seems to imply that the void* will be treated as an
unsigned char* in it. Curiously enough, none of the other
mem... functions seem to mention this; I can sort of accept it for
memcpy, but the results of memcmp very definitely depend on whether you
access the memory as char or as unsigned char, at least on the machines
I use. Yet all I see is "The memcmp function compares the first n
characters of the object pointed to by s1 with the first n characters of
the object pointed to by s2."

The official definition of character (in the C standard) doesn't help
much either. First, because there are several: "<abstract> member of a
set of elements used for the organization, control, or representation of
data", "single-byte character" (which seems circular to me), "<C> bit
representation that fits in a byte." And secondly because none of them
seem to help in the definition of the above functions -- memcpy copies,
that's all we know, and memcmp compares, and returns a value according
to whether one object is greater than, equal to or less than the other,
without giving the slightest indication as to what greater than, equal
to or less than mean in this context. (A footnote indicating
unspecified results when comparing padding a struct strongly suggests
that bitwise equality is what is meant. But that still doesn't help for
greater than or less than.)

> it's strange because I was under the impression that handling bytes
> with chars was the normal thing to do in C++.

You're just too young:-).

In the earliest days, using char for raw memory was the usual practice.
Some time in the middle or late 80's, however, the use shifted to
unsigned char, at least amongst the C programmers I know (or knew at the
time); I'm pretty sure that this is still the usual practice in C, and
this is backed up by the standard. The C standard guarantees that "when
a pointer to an object is converted to a pointer to a character type,
the result points to the lowest addressed byte of the object." On the
other hand, as I've pointed out, it explicitly allows signed types
(including char) to have negative zeros, and says that "It is
unspecified [...] whether a negative zero becomes a normal zero when
stored in an object."

C++ apparently weakened the guarantee concerning pointers to just char
pointers, but strengthened the guarantee concerning accessing memory to
include char. I don't know whether either change was intentional; in
practice, I can't imagine a C++ implementation doing anything here that
would not be legal in C (a C++ implementation must provide memchr, for
example), but maybe there was an intent to allow char to be used as well
as unsigned char.

In C, it is fairly clear that the only guaranteed way to access the
underlying raw memory and get the actual bit pattern is through an
unsigned char*. For the moment, the various citations of the C++
standard in this thread have me thoroughly confused. Is it:

- char* is correct, and unsigned char* is not guaranteed,
- unsigned char* is correct, and char* is not guaranteed,
- both are correct and guaranteed, or
- neither is actually guaranteed, and you can't do it in "standard"
C++?

Personally, I would have expected C compatiblity to reign here.

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34


Matthias Hofmann

unread,
May 26, 2004, 7:40:11 PM5/26/04
to
Samuel Krempp <kre...@crans.ens-cachan.fr> schrieb in im Newsbeitrag:
40b444fa$0$19635$626a...@news.free.fr...

>
> > Thus, it looks like this method was not designed for passing "raw
> > (preformatted) bytes". By the way, how can a string be implemented in
> > terms of plain chars (which is normal for C-style strings anyway) if
> > some CPU is allowed to change their value at will?
>
> by reducing the set of valid values of chars inside a string to less than
> 2^CHAR_BIT. I think it's not as paradoxical for strings as it
> is for ostreams..

So what if I pass 0xFF (assuming 8 bit chars) to ostream::write()? Does it
get converted to 0x00 on some machines, in order to normalize the 1's
complement representation?

I still feel insecure: we have found out that an array of chars or unsigned
chars can be used to copy an object forth and back without changing any of
the bits - but where's the limit, when would the bit representation
unexpectedly change? For example, can I assign a plain char to an unsigned
char and back, so that the bit pattern stays the same?

Best regards,

Matthias Hofmann

Matthias Hofmann

unread,
May 26, 2004, 7:43:57 PM5/26/04
to
Alan McKenney <alan_mc...@yahoo.com> schrieb in im Newsbeitrag:
16a885f9.04052...@posting.google.com...

Section 5.2.10 / 7 says that the result of using a reinterpret_cast to
convert from int* to char* is unspecified. However, you can reinterpret_cast
it back to int* and get the same pointer. The way I see things, this means
that the new bit pattern contains all the necessary information to restore
the original one, but only the compiler knows what it looks like. So you
should not use the resulting char* except for casting it back to int*.

This is why I refuse to believe that using reinterpret_cast for converting a
T* to a char* is a good idea. The only thing that reinterpret_cast seems to
be good for is casting a pointer to an integral type, as in that case the
result is implementation defined, which means you can look it up in the
documentation of your compiler. This might be useful if you want to keep
track of the memory addresses in use while your program runs. Storing them
in an integer is always safe, while, e.g., a pointer whose memory has been
deleted, may not be used for anything else but assigning a valid value to
it:

int* p = new int;
...
int i = reinterpret_cast<int>( p );
delete p;
cout << i << endl; // OK.
cout << p << endl; // Undefined behaviour!

I think that reinterpret_cast is rather useless for conversion between
pointer types, I guess the behaviour for these cases has just been designed
for the sake of completeness.

Best regards,

Matthias Hofmann

llewelly

unread,
May 27, 2004, 7:19:51 AM5/27/04
to
ka...@gabi-soft.fr writes:

> llewelly <llewe...@xmission.dot.com> wrote in message
> news:<861xl9v...@Zorthluthik.local.bar>...
>> ka...@gabi-soft.fr writes:
>
>> > At this point, of course, we should distinguish between the standard
>> > and reality. In reality, a quick grep through /usr/include on my
>> > system turns up quite a few definitions of the form
>> > _[a-z][a-zA-Z0-9_]*,
>
>> Whoa. Try _[a-z0-9]* instead, as _[A-Z]* are still reserved at class,
>> namespace, or function scope.
>
> Whoa, yourself. "_[a-z0-9]*" will match any _ in the file:-). Look
> again at the regular expression I used:-).

Sorry, you were right all along - I missed the leading [a-z], and
then fumbled my own regex. Now I'm thinking it ought to be
_[a-z0-9][a-zA-Z0-9_]* , but maybe I'm just wrong again.

> In fact, things seem to be improving. I just tried:
>
> find /usr/include -type f | xargs egrep '# *define *_[a-z]'
>
> under Solaris 2.8, and came up with absolutely nothing. This definitely
> wasn't the case on some earlier versions. Similarly -- the same thing
> over the includes for g++ 3.3.1 turns up nothing, where as those for
> 2.95.2 still reveal a few.

With the default settings, g++ still defines a few things which
intrude into the user namespace, but those come from the
compiler, not the header files. They can all be turned off with
-ansi, or -std=c++98 . You can see them with:

$ touch foo.cc ; g++ -E -dM -std=c++98 foo.cc | grep -v '__' | grep -v '_[A-Z]'

If I leave out the -std=c++98 , I get 'i386' and 'unix' . Obviously
the first will be different under solaris. This is fine with me
since -std=c++98 is the flag for maximum feasible conformance.

>
>> > including one or two macros. This has been the case for every
>> > system I've ever encountered. So regardless of what the standard
>> > says, why take the risk?
>
>> IMO, underscores are for separating words in an identifier, and as
>> such belong between words, and not at the beginning or the
>> end. (Though I admit I've broken this rule a time or two, and of
>> course underscores at the end are safe.)
>
> IMO, underscores have no place in code, since I've occasionally had
> fonts in which they weren't visible:-).

Eewww. I'm glad I've never had to read code with such a font. Since the
standard library uses '_' in more than a few places, such a font
is nearly as bad as those in which '1', 'l', and 'I' all look the
same.

> Seriously, it is a question of
> coding convention.

Yes. So while I prefer using underscores to separate words, in
practice I just do whatever the team is already doing.

> There are practical reasons for avoiding them at the
> start of a word, and they are an esthetic abomination at the end, but
> other than that, as long as you don't double them...

Some people like them at the end. I used to, until I realized I tended
to miss them when reading and forget them when typing new code. I
don't see them as an esthetic abomination, but I've come to avoid
underscores at the end for readability.

llewelly

unread,
May 27, 2004, 7:20:33 AM5/27/04
to
ka...@gabi-soft.fr writes:
[snip]

> I guess my habits as an old C programmer are just too ingrained to
> break. (And I still think it would be nicer if we had a set of "raw
> memory" types.)
[snip]
There is a lot of ugliness which I think comes from the fact that
'char' is really used for at least 3 distinct jobs: a small
integral type, a character type, and a raw memory type.

Francis Glassborow

unread,
May 27, 2004, 8:34:08 PM5/27/04
to
In message <86r7t6b...@Zorthluthik.local.bar>, llewelly
<llewe...@xmission.dot.com> writes

>ka...@gabi-soft.fr writes:
>[snip]
> > I guess my habits as an old C programmer are just too ingrained to
> > break. (And I still think it would be nicer if we had a set of "raw
> > memory" types.)
>[snip]
>There is a lot of ugliness which I think comes from the fact that
> 'char' is really used for at least 3 distinct jobs: a small
> integral type, a character type, and a raw memory type.

But by intent if not by definition:

small integer type: signed char (or possibly unsigned char but never
char)
character type: char or wchar_t
raw memory: unsigned char

The problem is that not all programmers abide by the intent.


--
Francis Glassborow ACCU
Author of 'You Can Do It!' see http://www.spellen.org/youcandoit
For project ideas and contributions: http://www.spellen.org/youcandoit/projects

Matthias Hofmann

unread,
May 27, 2004, 9:12:57 PM5/27/04
to
<ka...@gabi-soft.fr> schrieb in im Newsbeitrag:
d6652001.04052...@posting.google.com...
>
> In C, it is fairly clear that the only guaranteed way to access the
> underlying raw memory and get the actual bit pattern is through an
> unsigned char*. For the moment, the various citations of the C++
> standard in this thread have me thoroughly confused. Is it:
>
> - char* is correct, and unsigned char* is not guaranteed,
> - unsigned char* is correct, and char* is not guaranteed,
> - both are correct and guaranteed, or
> - neither is actually guaranteed, and you can't do it in "standard"
> C++?
>
> Personally, I would have expected C compatiblity to reign here.

I am not an expert, but the way I interpret the standard, both plain char* and
unsigned char* are guaranteed. As you have cited from the C standard, the
resulting bit pattern is unspecified when you copy memory using a plain
char. However, strengthening the guarantee to include plain char pointers
does not break existing C code, where only unsigned char pointers are used.
Therefore, the only compatibility problem would arise if you port C++ code
back to C, but in that case, you would probably have a lot of other problems
besides chars... ;-)

But I still wonder what is allowed to happen to the bit pattern of a signed
char in C++ when the machine uses 1's complement representation (assuming 8
bit chars and 32 bit ints):

// Bit pattern should be "11111110" (shouldn't it?)
signed char c = -1;

// What is the bit pattern now - "11111111" or "00000000"?
++c;

// And now? 0x00000000 or 0xFFFFFFFF?
int i = c;

And does it make any difference whether plain char is signed or unsigned, now
that we have learned that both plain char and unsigned char can be used to
copy a value forth and back?

Best regards,

Matthias Hofmann

Alan McKenney

unread,
May 27, 2004, 9:13:25 PM5/27/04
to
"Matthias Hofmann" <hof...@anvil-soft.com> wrote in message news:<c92i9c$mp5$1...@news1.nefonline.de>...

> Alan McKenney <alan_mc...@yahoo.com> schrieb in im Newsbeitrag:
> 16a885f9.04052...@posting.google.com...
> > Maciej Sobczak <no....@no.spam.com> wrote in message
> news:<c89rhq$85t$1...@atlantis.news.tpi.pl>...
> > > Hi,
> > >
> > > Matthias Hofmann wrote:
> > >
>
> > > > I read in the FAQ that a pointer to a char can be implemented much
> > > > differently from a pointer to an int, or any other type. Maybe this is
> the
> > > > reason why the static_cast does not work.

<snip>

> > This becomes an issue on a machine like a CRAY-XMP, where
> > the smallest directly addressable memory is a 64-bit word,
> > and characters are packed 8 to a word.
> >
> > "int *" is implemented as a memory address, while
> > "char *" is implemented as
> >
> > ( memory_address << 3 ) | ( byte_no_within_word )
> >
> >
> > The naive implementation of reinterpret_cast<char *>(&i)
> > (use the same bit represenation) would result in a
> > pointer to a completely different word.
> >
> > If the address of i is 0x10000, then cp would point
> > to byte 0 in word 0x2000.


<snip>

> I think that reinterpret_cast is rather useless for conversion between
> pointer types, I guess the behaviour for these cases has just been designed
> for the sake of completeness.

<snip>

It *can* be used for conversion between pointer types. I do it all
the time to cast between "const char *" and "const unsigned char *".
I think this would even work on a Cray-XMP.

However, you have to know enough about the implementations you
use to know whether it will work.

One problem is that the most popular platforms -- Intel, SPARC,
Mac, etc. -- use byte addressing, and all pointers are byte
pointers. On these, "reinterpret_cast" for pointers works as
most users expect.

When we old-timers who have used -- or still use -- computers
with 36 or 60-bit words, 6- and/or 7-bit characters, address registers
that automatically load from (or store to) memory when they
are set, or "tagged" architectures bring such machines up, less
experienced posters will dismiss us as crazy or stupid.

"reinterpret_cast" is perfectly usable, as long as you are willing
to check each time you port your code to a new machine that
it still does what you expect in each case you use it.

ka...@gabi-soft.fr

unread,
May 28, 2004, 12:03:56 PM5/28/04
to
Francis Glassborow <fra...@robinton.demon.co.uk> wrote in message
news:<f9SIxvXS...@robinton.demon.co.uk>...

> In message <86r7t6b...@Zorthluthik.local.bar>, llewelly
> <llewe...@xmission.dot.com> writes
>>ka...@gabi-soft.fr writes:
>>[snip]
>>> I guess my habits as an old C programmer are just too ingrained to
>>> break. (And I still think it would be nicer if we had a set of
>>> "raw memory" types.)
>>[snip]
>>There is a lot of ugliness which I think comes from the fact that
>> 'char' is really used for at least 3 distinct jobs: a small
>> integral type, a character type, and a raw memory type.

> But by intent if not by definition:

> small integer type: signed char (or possibly unsigned char but never
> char)
> character type: char or wchar_t
> raw memory: unsigned char

> The problem is that not all programmers abide by the intent.

The problem is that there was a lot of code written before this intent
was invented:-). The original C that I used didn't even have signed
char, for example.

I agree that what you suggest is a good convention. (It must be, since
it is the one I use:-).) If you think it should be the "intent" of the
standard, however, then some things need clarification. Like, for
example, why (as has been pointed out here), there is no standard
guaranteed way of going from char* to unsigned char* (but I think this
is more an oversight, and that it was intended to guarantee it), or why IO
to and from raw memory uses char instead of unsigned char.

--
James Kanze GABI Software

Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

ka...@gabi-soft.fr

unread,
May 28, 2004, 12:07:53 PM5/28/04
to
llewelly <llewe...@xmission.dot.com> wrote in message
news:<86y8neb...@Zorthluthik.local.bar>...

> ka...@gabi-soft.fr writes:
> > IMO, underscores have no place in code, since I've occasionally had
> > fonts in which they weren't visible:-).

> Eewww. I'm glad I've never had to read code with such a font. Since the
> standard library uses '_' in more than a few places, such a font
> is nearly as bad as those in which '1', 'l', and 'I' all look the
> same.

I've never had to develop code with such a font, but I have encountered
it when reading email or news -- including mails or news with embedded
code.

For that matter, if emacs/GNUS sees a word with a _ at both ends, it
suppresses the _'s, and displays it in italics or bold. I'm not sure
which, since the font I actually use doesn't have either, and there is
no change in the display. So something like _ABC_ doesn't come out
right.

> > Seriously, it is a question of coding convention.

> Yes. So while I prefer using underscores to seperate words, in
> practice I just do whatever the team is already doing.

I actually prefer(red) underscores too. But all of the projects I've
been on for over 10 years have used camel case, so it's become more or
less a habit now. Just goes to show that you can get used to
anything:-).

> > There are practical reasons for avoiding them at the
> > start of a word, and they are an esthetic abomination at the end, but
> > other than that, as long as you don't double them...

> Some people like them at the end. I used to, until I realized I tended
> to miss them when reading and forget them when typing new code. I
> don't see them as an esthetic abomination, but I've come to avoid
> underscores at the end for readability.

Beauty is in the eye of the beholder:-). My personal preference is for
either myVar or my_var. Or maVar/ma_var, depending on who I'm working
for. (Luckily, the convention in the last German company I worked for
was m_var. meine_var starts getting to be a bit long:-).)
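
[Editor's aside, not part of the original post: a throwaway
illustration of the styles being compared; all identifiers invented.]

    int myVar  = 0;   // camel case
    int my_var = 0;   // underscore-separated
    int m_var  = 0;   // the "m_" member prefix mentioned above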

--
James Kanze GABI Software

Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Gabriel Dos Reis

unread,
May 29, 2004, 7:51:25 AM5/29/04
to
ka...@gabi-soft.fr writes:

[...]

| I agree that what you suggest is a good convention. (It must be, since
| it is the one I use:-).) If you think it should be the "intent" of the
| standard, however, then some things need clarification. For example,
| why (as has been pointed out here) there is no standard-guaranteed way
| of going from char* to unsigned char* (though I think this is more an
| oversight, and that the intent was to guarantee it), or why IO to and
| from raw memory uses char instead of unsigned char.

You must not have been following the recent discussion on the LWG
reflector :-)

--
Gabriel Dos Reis
g...@integrable-solutions.net

Ben Hutchings

unread,
May 30, 2004, 9:56:48 PM5/30/04
to
Gabriel Dos Reis wrote:
> ka...@gabi-soft.fr writes:
>
> [...]
>
>| I agree that what you suggest is a good convention. (It must be, since
>| it is the one I use:-).) If you think it should be the "intent" of the
>| standard, however, then some things need clarification. For example,
>| why (as has been pointed out here) there is no standard-guaranteed way
>| of going from char* to unsigned char* (though I think this is more an
>| oversight, and that the intent was to guarantee it), or why IO to and
>| from raw memory uses char instead of unsigned char.
>
> You must not have been following the recent discussion on the LWG
> reflector :-)

That is a singularly unhelpful observation. Why don't you enlighten us?

Matthias Hofmann

unread,
Jun 2, 2004, 8:51:45 AM6/2/04
to
Alan McKenney <alan_mc...@yahoo.com> wrote in message
16a885f9.04052...@posting.google.com...
>
> > I think that reinterpret_cast is rather useless for conversion between
> > pointer types, I guess the behaviour for these cases has just been
> > designed for the sake of completeness.
>
> <snip>
>
> It *can* be used for conversion between pointer types. I do it all
> the time to cast between "const char *" and "const unsigned char *".
> I think this would even work on a Cray-XMP.

According to 5.2.10/7 the result of such a cast is unspecified. Note that
plain char, unsigned char and signed char are distinct types. I have no
doubt that it works in practice, but according to the standard it does not
have to.
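
[Editor's sketch of the two routes under discussion, not part of the
original post; the variable names are invented.]

    const char* text = "abc";

    // What Alan describes: 5.2.10/7 leaves the result of this
    // pointer-to-pointer reinterpret_cast unspecified, even though it
    // works on every common implementation.
    const unsigned char* u1 =
        reinterpret_cast<const unsigned char*>(text);

    // The alternative goes through void*; whether *that* chain is
    // airtight for char*/unsigned char* is the other open question in
    // this thread.
    const unsigned char* u2 =
        static_cast<const unsigned char*>(static_cast<const void*>(text));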

[snip]

> "reinterpret_cast" is perfectly usable, as long as you are willing
> to check each time you port your code to a new machine that
> it still does what you expect in each case you use it.

Well, you can do *anything* you like if you are willing to go thru and
retest the entire code each time you port it... ;-)

Best regards,

Matthias Hofmann

James Kanze

unread,
Jun 3, 2004, 7:15:47 AM6/3/04
to
"Matthias Hofmann" <hof...@anvil-soft.com> writes:

|> Well, you can do *anything* you like if you are willing to go thru
|> and retest the entire code each time you port it... ;-)

Do you mean to say that you deliver code for a machine without having
tested it? Any time you port, you do have to go through and retest
everything. Any time you upgrade the compiler, you have to go through
and retest everything.

In fact, anytime anything in your code or your production chain changes,
you need to do full regression tests before release. This is why
serious software vendors don't release new versions for each individual
bug fix.

--
James Kanze
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France +33 (0)1 30 23 00 34

Matthias Hofmann

unread,
Jun 5, 2004, 5:39:16 PM6/5/04
to
James Kanze <ka...@gabi-soft.fr> wrote in message
m2y8n51...@thomas-local.gabi-soft.fr...

> "Matthias Hofmann" <hof...@anvil-soft.com> writes:
>
> |> Well, you can do *anything* you like if you are willing to go thru
> |> and retest the entire code each time you port it... ;-)
>
> Do you mean to say that you deliver code for a machine without having
> tested it? Any time you port, you do have to go through and retest
> everything. Any time you upgrade the compiler, you have to go through
> and retest everything.

Well, I have actually never ported code to another machine so far. This is
mainly because I am a games programmer, and due to the massive use of
DirectX and Windows API calls, my code is highly platform dependent. Of
course we thoroughly test the game on each target platform before releasing
it.

But my point actually was that when your code complies with the standard,
you should not *have* to do these tests, although they are recommended. Your
code may be standard compliant, but maybe the compiler on the new platform
isn't.

However, I was also trying to say that it is not a good idea to write code
that is wrong with respect to the standard, so that it might fail even if
the compiler does comply with the standard. Or do you really check the
assembly code created every place you use a reinterpret_cast? Even if it
works at the moment, it might be a mere coincidence, and the program might
crash as soon as the pointer gets a different value, one that it simply
never had during your testing.
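
[Editor's sketch of the kind of "works by coincidence" failure being
described, not part of the original post; names invented.]

    #include <cstddef>
    #include <cstring>

    int read_int_at(const char* buffer, std::size_t offset)
    {
        // Fragile: compiles everywhere, but the dereference can trap on
        // strict-alignment hardware as soon as (buffer + offset) is not
        // suitably aligned for int -- i.e. it can pass every test and
        // still fail for pointer values the tests never exercised.
        //   return *reinterpret_cast<const int*>(buffer + offset);

        // Safer: copy the bytes instead of reinterpreting the pointer.
        int value;
        std::memcpy(&value, buffer + offset, sizeof value);
        return value;
    }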

Best regards,

Matthias

ka...@gabi-soft.fr

unread,
Jun 7, 2004, 6:58:17 PM6/7/04
to
"Matthias Hofmann" <hof...@anvil-soft.com> wrote in message
news:<c9sffq$9nc$1...@news1.nefonline.de>...

> James Kanze <ka...@gabi-soft.fr> wrote in message
> m2y8n51...@thomas-local.gabi-soft.fr...
> > "Matthias Hofmann" <hof...@anvil-soft.com> writes:

> > |> Well, you can do *anything* you like if you are willing to go
> > |> thru and retest the entire code each time you port it... ;-)

> > Do you mean to say that you deliver code for a machine without
> > having tested it? Any time you port, you do have to go through and
> > retest everything. Any time you upgrade the compiler, you have to go
> > through and retest everything.

> Well, I have actually never ported code to another machine so far.
> This is mainly because I am a games programmer, and due to the massive
> use of DirectX and Windows API calls, my code is highly platform
> dependent. Of course we thoroughly test the game on each target
> platform before releasing it.

So why worry about this?

> But my point actually was that when your code complies with the
> standard, you should not *have* to do these tests, although they are
> recommended. Your code may be standard compliant, but maybe the
> compiler on the new platform isn't.

Whether or not your code complies with the standard, you DO have to do
these tests. Because, of course, there may be unintentional errors in
it which don't, in fact, comply with the standard. Things like
dependencies on order of initialization (which slip in incredibly
easily) or order of evaluation. And how many programmers here could
really swear that their code will work on a 49 bit signed magnitude
machine, with trapping representations in ints, even though the intent
was that it contained no hardware dependencies?
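
[Editor's sketch of how such an initialization-order dependency slips
in, not part of the original post; file and variable names invented.]

    // registry.cpp
    #include <string>
    #include <vector>
    std::vector<std::string> g_registry;      // dynamically initialized

    // plugin.cpp
    #include <string>
    #include <vector>
    extern std::vector<std::string> g_registry;

    struct Registrar
    {
        // Whether g_registry has been constructed when this runs depends
        // on the order of dynamic initialization across translation
        // units, which the standard leaves unspecified.  It can pass
        // every test with one toolchain and crash with another.
        Registrar() { g_registry.push_back("plugin"); }
    };
    static Registrar g_registrar;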

You test, or you don't release.

> However, I was also trying to say that it is not a good idea to write
> code that is wrong with respect to the standard, so that it might fail
> even if the compiler does comply with the standard.

It's a tradeoff. If you've followed the discussion carefully, you'll
realize that the standard isn't as precise as we would like, and that it
seems in contradiction with the C standard (although a footnote says
that the intent is that it be compatible), and that taken (too)
literally, basic_stream<char> wouldn't be implementable. And of course,
you're considering something that "everybody does", and that everybody
has done since the days of C, so a compiler vendor can't afford not to
support it.

On the other hand, what is the cost of trying to be conformant, given
that it isn't 100% clear what conformance really means? Is it worth the
cost?

> Or do you really check the assembly code created every place you use a
> reinterpret_cast?

I never check the assembly code of anything. I do read the
documentation of the compiler, which will often tell me what I need to
know.

> Even if it works at the moment, it might be a mere coincidence, and
> the program might crash as soon as the pointer gets a different value,
> one that it simply never had during your testing.

But that's true regardless of what you do, conform or not.

--
James Kanze GABI Software

Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung

9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
