Trying to come to terms with typecasting operators

Rune Allnor

Nov 23, 2009, 4:44:19 PM

Hi all.

I need to inspect the bit patterns in a buffer. To do that, I want to
print the value of each character to the screen as a numerical value
in hex format.

Below is an example program that shows a number of more or less
successful attempts to achieve this. The buffer c is initialized with
the hexadecimal numbers 0x30, 0x31, 0x32 and 0x33, which happen
to be the ASCII values of the characters '0', '1', '2' and '3'.

In attempt a) I just pipe the ASCII values to std::cout along with a
hint that I want the hex values to be printed. The attempt is clearly
unsuccessful, as the characters, not the hex values, are printed.

In attempt b) I type cast the values of the buffer from uchar to
uint. This has the desired effect, at the expense of using a
C-style type cast, which is something I want to avoid, if at all
possible.

Attempt c), where I achieve the desired result by assigning the
uchar value to be printed to the temporary uint variable d is a bit
clumsy, since it involves a temporary variable that is not strictly
necessary, either from a semantic or algorithmic point of view.

So the question is if it is possible to achieve the desired result
by using a reinterpret_cast<> or static_cast<> directly on the uchar
values inside the buffer c?

I can't see any obvious way to achieve this, since chars
have a different memory footprint than any of the int types
I am aware of, that otherwise would suit the purpose.

Are there any data types with the same memory footprint as
chars, that are interpreted by the IO library as numeric integer
values, as opposed to text glyphs?

Rune

--------------- Output ----------------
a) Naive output:
0 1 2 3

b) C-style type cast:
30 31 32 33

c) Type cast by variable assignment:
30 31 32 33
---------------------------------------

/////////////////////////////////////////////////////////////////////////////
#include <iomanip>
#include <iostream>

int main()
{
    // Hex values of digits: 0 1 2 3
    unsigned char c[] = {0x30,0x31,0x32,0x33};
    std::cout << "a) Naive output:" << std::endl;
    for (size_t n = 0; n != 4; ++n)
    {
        std::cout << std::hex << c[n] << " ";
    }
    std::cout << std::endl << std::endl;

    std::cout << "b) C-style type cast:" << std::endl;
    for (size_t n = 0; n != 4; ++n)
    {
        std::cout << std::hex << (unsigned int) c[n] << " ";
    }
    std::cout << std::endl << std::endl;

    std::cout << "c) Type cast by variable assignment:" << std::endl;
    for (size_t n = 0; n != 4; ++n)
    {
        size_t d = c[n];
        std::cout << std::hex << d << " ";
    }
    std::cout << std::endl;
    return 0;
}

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Neil Butterworth

Nov 24, 2009, 12:36:40 AM

I can't see what is wrong with the obvious:

std::cout << std::hex << static_cast<unsigned int>(c[n]) << " ";

And I don't see why you think having a "different memory footprint"
precludes a cast - a cast creates a new value which may have a
completely different "footprint" from the thing the cast is applied to.
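
To make the point about "footprint" concrete, here is a minimal sketch. The helper names hex_of and size_of_cast_result are made up for illustration and appear nowhere in the original program. It shows that static_cast produces a new unsigned int value from the byte, while the buffer element itself stays one char wide:

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: formats one byte as hex text. The cast converts
// the unsigned char *value* to unsigned int, so the stream prints a
// number instead of a character.
std::string hex_of(unsigned char byte)
{
    std::ostringstream os;
    os << std::hex << static_cast<unsigned int>(byte);
    return os.str();
}

// Hypothetical helper: the result of the cast is a separate value whose
// size is that of unsigned int; the storage of `byte` is not touched.
std::size_t size_of_cast_result(unsigned char byte)
{
    return sizeof(static_cast<unsigned int>(byte));
}
```

Calling hex_of(c[n]) for each element of the buffer from the original post should give the same text as attempt b).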

Neil Butterworth

Daniel Krügler

Nov 24, 2009, 12:50:19 AM

On 23 Nov., 22:44, Rune Allnor <all...@tele.ntnu.no> wrote:
> Hi all.
>
> I need to inspect the bit patterns in a buffer. To do that, I want to
> print the value of each character to the screen as a numerical value
> in hex format.
>
> Below is an example program that shows a number of more or less
> successful attempts to achieve this. The buffer c is initialized with
> the hexadecimal numbers 0x30, 0x31, 0x32 and 0x33, which happen
> to be the ASCII values of the characters '0', '1', '2' and '3'.
>
> In attempt a) I just pipe the ASCII values to std::cout along with a
> hint that I want the hex values to be printed. The attempt is clearly
> unsuccessful, as the characters, not the hex values, are printed.

Yes, this is intended behavior.

> In attempt b) I type cast the values of the buffer from uchar to
> uint. This has the desired effect, at the expense of using a
> C-style type cast, which is something I want to avoid, if at all
> possible.

So why not use a static_cast then?

> Attempt c), where I achieve the desired result by assigning the
> uchar value to be printed to the temporary uint variable d is a bit
> clumsy, since it involves a temporary variable that is not strictly
> necessary, either from a semantic or algorithmic point of view.
>
> So the question is if it is possible to achieve the desired result
> by using a reinterpret_cast<> or static_cast<> directly on the uchar
> values inside the buffer c?

No. What is your hesitation about a static_cast? You could
also simply rely on implicit conversion with a very tiny helper
function:

inline int to_int(char c) { return c; }

You could also easily define your own IO manipulator, which
performs this for you and would perform the to-hex conversion
in one step:

struct char_as_hex_t {
    int c;
    char_as_hex_t(char c) : c(c) {}
};

inline char_as_hex_t as_hex(char c) {
    return char_as_hex_t(c);
}

template <typename Ch, typename Tr>
inline std::basic_ostream<Ch, Tr>& operator<<(
    std::basic_ostream<Ch, Tr>& os,
    char_as_hex_t manip)
{
    os << std::hex << manip.c;
    return os;
}

and use it like this:

std::cout << as_hex('a') << std::endl;

If you are really cool, you define char_as_hex_t
as a template, which is specialized for all integral
types with special handling for the character types.
In this case I would rename it to "as_hex" or so.
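
A minimal sketch of that template idea follows. The names hex_value, as_hex_t and to_hex_string are illustrative assumptions, not part of any library. Two design choices here go beyond what the post describes and are also assumptions: character types are routed through unsigned char first, so that a negative plain char does not print sign-extended, and the stream's format flags are restored afterwards, so that std::hex does not stay in effect.

```cpp
#include <ios>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical trait: yields the value that should actually be streamed.
// Most types pass through unchanged.
template <typename T>
struct hex_value {
    static T get(T v) { return v; }
};

// Character types are widened to unsigned int via unsigned char, so the
// stream prints a number and values above 0x7f are not sign-extended.
template <> struct hex_value<char> {
    static unsigned int get(char v) { return static_cast<unsigned char>(v); }
};
template <> struct hex_value<signed char> {
    static unsigned int get(signed char v) { return static_cast<unsigned char>(v); }
};
template <> struct hex_value<unsigned char> {
    static unsigned int get(unsigned char v) { return v; }
};

// Wrapper object carrying the value to be printed.
template <typename T>
struct as_hex_t {
    T value;
    explicit as_hex_t(T v) : value(v) {}
};

template <typename T>
inline as_hex_t<T> as_hex(T v) { return as_hex_t<T>(v); }

template <typename Ch, typename Tr, typename T>
std::basic_ostream<Ch, Tr>& operator<<(std::basic_ostream<Ch, Tr>& os,
                                       as_hex_t<T> manip)
{
    // Assumption: restore the caller's flags so hex is not left switched on.
    const std::ios_base::fmtflags saved = os.flags();
    os << std::hex << hex_value<T>::get(manip.value);
    os.flags(saved);
    return os;
}

// Convenience helper for trying the manipulator without std::cout.
template <typename T>
std::string to_hex_string(T v)
{
    std::ostringstream os;
    os << as_hex(v);
    return os.str();
}
```

With this sketch, std::cout << as_hex(c[n]) works for the unsigned char buffer as well as for plain integers.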

> I can't see any obvious way to achieve this, since chars
> have a different memory footprint than any of the int types
> I am aware of, that otherwise would suit the purpose.
>
> Are there any data types with the same memory footprint as
> chars, that are interpreted by the IO library as numeric integer
> values, as opposed to text glyphs?

This depends on which sizeof ratios the fundamental types
have ;-)
On some machines you may find sizeof(char) == sizeof(long double).
IMO your search for "same memory footprint" in this kind of
example is premature optimization of the worst kind: on most
machines an int function argument is more advantageous than a
(1-octet) char, because the char has to be unpacked from the
register again, while int typically already has the proper
register size. You could use unsigned short, but I would
never do this in this example - just stay with int.

HTH & Greetings from Bremen,

Daniel Krügler

Francis Glassborow

Nov 24, 2009, 12:53:10 AM

What has memory footprint got to do with static_cast<> ?

#include <cstddef>
#include <iostream>

int main() {
    // Hex values of digits: 0 1 2 3
    unsigned char c[] = {0x30,0x31,0x32,0x33};

    std::cout << "Using static_cast<> " << std::endl;

    for (size_t n = 0; n != 4; ++n) {
        // Convert the char value to an integer value before streaming it
        std::cout << std::hex << static_cast<unsigned int> (c[n])
                  << " ";
    }

    std::cout << std::endl << std::endl;

    return 0;
}

Note that casts are generally about values not about storage types. In
C++ problems can arise if you cast to a reference type but that is not
the case here.

Rune Allnor

Nov 24, 2009, 12:51:40 AM

Hm.

Wrote the code and the post in parallel, so I lost the
point first time around.

In the code below, a test using reinterpret_cast<> has been
added, along with the corresponding added output.
As can be seen, each char value is preceded by a number
of leading zeros that I am unable to get rid of. I assume
this has something to do with memory footprints and/or byte
ordering (I am using VS2008 on a PC, which, according to

http://en.wikipedia.org/wiki/Endianness

means that the internal byte organization is little-endian).

Rune

> --------------- Output ----------------
> a) Naive output:
> 0 1 2 3
>
> b) C-style type cast:
> 30 31 32 33
>
> c) Type cast by variable assignment:
> 30 31 32 33

d) reinterpret_cast:
00000030 00000031 00000032 00000033

> ---------------------------------------
>
> /////////////////////////////////////////////////////////////////////////////

> #include <iomanip>
> #include <iostream>
>
> int main()
> {
> // Hex values of digits: 0 1 2 3
> unsigned char c[] = {0x30,0x31,0x32,0x33};
> std::cout << "a) Naive output:" << std::endl;
> for (size_t n = 0; n != 4; ++n)
> {
> std::cout << std::hex << c[n] << " ";
> }
> std::cout << std::endl << std::endl;
>
> std::cout << "b) C-style type cast:" << std::endl;
> for (size_t n = 0; n != 4; ++n)
> {
> std::cout << std::hex << (unsigned int) c[n] << " ";
> }
> std::cout << std::endl << std::endl;
>
> std::cout << "c) Type cast by variable assignment:" << std::endl;
> for (size_t n = 0; n != 4; ++n)
> {
> size_t d = c[n];
> std::cout << std::hex << d << " ";
> }
> std::cout << std::endl;

///////////////////////////////////////////////////
std::cout << "d) reinterpret_cast:" << std::endl;
for (size_t n = 0; n != 4; ++n)
{
    std::cout << std::hex << reinterpret_cast<int*> (c[n]) << " ";
}
std::cout << std::endl << std::endl;

///////////////////////////////////////////////////

Francis Glassborow

Nov 24, 2009, 11:00:22 AM

Rune Allnor wrote:
> Hm.
>
> Wrote the code and the post in parallel, so I lost the
> point first time around.
>
> In the code below, a test using reinterpret_cast<> has been
> added, along with the corresponding added output.
> As can be seen, each char value is preceded by a number
> of leading zeros that I am unable to get rid of. I assume
> this has something to do with memory footprints and/or byte
> ordering (I am using VS2008 on a PC, which, according to
>
> http://en.wikipedia.org/wiki/Endianness
>
> means that the internal byte organization is little-endian).
>

But reinterpret_cast<> is exactly the case where bit patterns are being
considered along with memory 'footprints'. It is very low level and
intentionally unlikely to be useful in a portable program. Easy
detection was one reason for the long, ugly name (seriously, that was
part of the design of C++ casts).

Nick Hounsome

Nov 24, 2009, 11:01:40 AM

On 24 Nov, 05:51, Rune Allnor <all...@tele.ntnu.no> wrote:
> Hm.
>
> Wrote the code and the post in parallel, so I lost the
> point first time around.
>
> In the code below, a test using reinterpret_cast<> has been
> added, along with the corresponding added output.
> As can be seen, each char value is preceded by a number
> of leading zeros that I am unable to get rid of. I assume
> this has something to do with memory footprints and/or byte
> ordering (I am using VS2008 on a PC, which, according to

Output format of pointers is not portable and so should never be used
except for debug. (I've seen some horrible implementations that treat
pointers as signed int!!!).

Incidentally, you probably should have had setw(2) together with
setfill('0') in your original hex output, because you want 0x01 to
come out as 01, not 1.
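
A minimal sketch of such a two-digit hex dump, assuming a hypothetical helper named hex_dump that collects the text in a string rather than writing to std::cout. std::hex and std::setfill stay in effect on the stream once set, while std::setw only applies to the next output and so has to be repeated for every byte:

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: renders `count` bytes as two-digit hex values
// separated by single spaces.
std::string hex_dump(const unsigned char* data, std::size_t count)
{
    std::ostringstream os;
    os << std::hex << std::setfill('0');  // both settings persist on `os`
    for (std::size_t n = 0; n != count; ++n) {
        if (n != 0) {
            os << ' ';
        }
        // setw is reset after each formatted output, so set it every time.
        os << std::setw(2) << static_cast<unsigned int>(data[n]);
    }
    return os.str();
}
```

For the buffer in the original post, hex_dump(c, 4) would be used in place of the loop in attempt b).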

Also reinterpret_cast is the one new style cast that you really really
want to avoid since it is the least restrictive (The only thing that
the compiler can complain about is trying to use it to remove
constness or volatility)

A char really IS a small integer in both C and C++, and unsigned int
is definitely at least as big as char, so static_cast<unsigned int> is
not in the least bit dodgy. This is why you don't even need a cast to
assign to a temporary variable.

A char really IS NOT a pointer which is why you have to resort to
reinterpret_cast or C style cast to treat it as one.

> http://en.wikipedia.org/wiki/Endianness
>
> means that the internal byte organization is little-endian).
>

Endianness is irrelevant when considering single bytes (char) and
nothing in your output implies little endian anyway.
The only way to see endianness is to take a multibyte integral or
pointer value and treat it as a sequence of bytes:

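void* x = ...; // or "int x=..."
// View the object as raw bytes; unsigned char avoids sign extension.
unsigned char* cp = reinterpret_cast<unsigned char*>(&x); // note the &
for (size_t i = 0; i < sizeof(x); ++i)
    cout << setfill('0') << setw(2) << hex << static_cast<unsigned int>(cp[i]) << ' ';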
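
A self-contained sketch of the same idea, assuming a hypothetical helper named bytes_in_memory_order. It lists the bytes of an unsigned int in the order they sit in memory, so the result depends on the platform:

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: returns the bytes of `value` in memory order as
// two-digit hex values separated by spaces.
std::string bytes_in_memory_order(unsigned int value)
{
    // Inspecting an object through unsigned char* is permitted.
    const unsigned char* cp = reinterpret_cast<const unsigned char*>(&value);

    std::ostringstream os;
    os << std::hex << std::setfill('0');
    for (std::size_t i = 0; i < sizeof(value); ++i) {
        if (i != 0) {
            os << ' ';
        }
        os << std::setw(2) << static_cast<unsigned int>(cp[i]);
    }
    return os.str();
}
```

Assuming a 4-byte unsigned int, bytes_in_memory_order(0x01020304u) lists the least significant byte (04) first on a little-endian machine and the most significant byte (01) first on a big-endian one.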

> Rune

John H.

Nov 24, 2009, 3:40:32 PM

On Nov 23, 4:44 pm, Rune Allnor <all...@tele.ntnu.no> wrote:
> So the question is if it is possible to achieve the desired result
> by using a reinterpret_cast<> or static_cast<> directly on the uchar
> values inside the buffer c?
>
> I can't see any obvious way to achieve this, since chars
> have a different memory footprint than any of the int types
> I am aware of, that otherwise would suit the purpose.

static_cast can be thought of as taking a value of one type and
creating a corresponding value in another type. "Corresponding" can
pretty much mean any relationship, but often means the value of the
created type will be as similar as possible to the value of the source
type.

int num1 = static_cast<long>(3.14);
Here, 3.14 was cast to a long, which is an integer type, so a 3 is
created from the 3.14 and stored in num1.

float num2 = static_cast<float>(3);
Here 3 was cast to a float, so a 3.0f is created and stored in
num2.

float num3 = static_cast<float>(3.1415926535897932384626433832795);
Here, pi to many digits of accuracy was converted to a float, which
has fewer digits of accuracy.

double num4 = static_cast<float>(3.1415926535897932384626433832795);
Here, pi to many digits of accuracy was converted to a float, which
has fewer digits of accuracy. This then gets stored in the
double num4, which could have held more digits of accuracy, but
those digits were already discarded by the cast to float, so when the
float is stored into the double, the extra mantissa bits are just zeros.

reinterpret_cast is different. I am going to say it is usually
applied to pointers and my description will reflect this, but there
may be valid applications to non-pointer types. The reinterpret_cast
takes the bits pointed at by a pointer of some type, and allows those
bits to be accessed as if they were the bits of a variable of a
different type.

Here is an example that shows how the two types of casts behave
differently (details may differ depending on platform):

// num1 has a value of 3 and a byte pattern of 0x00000003
int num1 = 3;
// num2 has a value of 3 and a byte pattern of 0x40400000
float num2 = static_cast<float>(num1);
// num3 has a value of 4.20390e-045 and a byte pattern of 0x00000003
float num3 = *reinterpret_cast<float*>(&num1);
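
The same comparison can be sketched without dereferencing a reinterpreted pointer, by copying the bytes with std::memcpy instead. The helper names converted_value, bits_of and float_from_bits are made up for illustration, and the bit patterns in the comments assume a 32-bit IEEE 754 float and a 32-bit unsigned int:

```cpp
#include <cstddef>
#include <cstring>

// Value conversion: the int 3 becomes the float 3.0f.
float converted_value(int n)
{
    return static_cast<float>(n);
}

// Bit-level view: copies the bytes of a float into an unsigned int.
// Assumption: both types are 32 bits wide; the size guard only prevents
// writing past the smaller object on unusual platforms.
unsigned int bits_of(float f)
{
    unsigned int bits = 0;
    const std::size_t n = sizeof(f) < sizeof(bits) ? sizeof(f) : sizeof(bits);
    std::memcpy(&bits, &f, n);
    return bits;
}

// The reverse direction: treats an integer bit pattern as a float.
float float_from_bits(unsigned int bits)
{
    float f = 0.0f;
    const std::size_t n = sizeof(f) < sizeof(bits) ? sizeof(f) : sizeof(bits);
    std::memcpy(&f, &bits, n);
    return f;
}
```

Under those assumptions, bits_of(converted_value(3)) is 0x40400000, while float_from_bits(3) is the tiny value that the reinterpret_cast line above produces.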

> std::cout << std::hex << c[n] << " ";

I think the std::cout behavior for char and unsigned char is to output
the character itself, but for int, long etc. it will
output the numeric representation. So to get the desired behavior, we
have to get to a type that triggers the number rather than character
behavior:
std::cout << std::hex << static_cast<unsigned int>(c[n]) << std::endl;