What does operating on raw bytes mean in a C++ context?

Paul

Nov 3, 2018, 4:53:14 PM

A cryptopals.com problem involves translating hex to base 64.
For example, 49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d

should produce
SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t

I'm confused by the instruction: "Always operate on raw bytes, never on encoded strings. Only use hex and base64 for pretty-printing."

What does "raw bytes" mean in terms of the input/output parameters?
Presumably it means I shouldn't have a const std::string& as input
and a std::string as output?

Does anyone know what it means to translate hex to base 64 "by operating
on raw bytes" in a C++ context?

Thank you,

Paul

Alf P. Steinbach

Nov 3, 2018, 5:17:56 PM

On 03.11.2018 21:52, Paul wrote:
> A cryptopals.com problem involves translating hex to base 64.
> For example, 49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d
>
> should produce
> SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t
>
> I'm confused by the instruction: "Always operate on raw bytes, never on encoded strings. Only use hex and base64 for pretty-printing."

As I read it, the instruction is so unclear as to be pretty meaningless.

> What does "raw bytes" mean in terms of the input/output parameters.

I would guess that, given a string "49276d206b", it is not the string
itself that should be encoded as base 64, but rather the byte values
that it *denotes*.

Then you run into two issues: the assumed byte size (8 bits? 16 bits?
what?), and the endianness assumed for the string.

But the example clears that up: you can just try the most reasonable
assumption first, and other assumptions if that doesn't reproduce the
example. And if none of these match the example, then perhaps my
interpretation of the instruction is wrong. So. :)

> Presumably it means I shouldn't have a const std::string& as input
> and a std::string as output?

No no, `std::string` is fine as a container of raw bytes.

E.g. it's used as a raw byte container by some Google networking API.

So, `std::string` doesn't necessarily imply any encoding. Where an
encoding has to be assumed, then usually the only reasonable assumption is
the program's Basic Execution Character Set, as discussed in the
standard. But if that isn't UTF-8, then in some cases one has to be very
careful about interpreting a textual string as BECS or UTF-8.

> Does anyone know what it means to translate hex to base 64 "by operating
> on raw bytes" in a C++ context?

I don't, but my thoughts about it are above.

"Raw byte" just means uninterpreted byte values.

Cheers!,

- Alf

Paul

Nov 3, 2018, 6:10:20 PM

Thanks a lot.
Ok, so std::string is fine. What would be an example of something
that you're not allowed to do?

Öö Tiib

Nov 3, 2018, 7:45:58 PM

On Saturday, 3 November 2018 22:53:14 UTC+2, Paul wrote:
> A cryptopals.com problem involves translating hex to base 64.
> For example, 49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d
>
> should produce
> SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t
>
> I'm confused by the instruction: "Always operate on raw bytes, never on encoded strings. Only use hex and base64 for pretty-printing."
>
> What does "raw bytes" mean in terms of the input/output parameters.

Perhaps "raw bytes" is meant as "unencoded bytes".
It is advice to operate on unencoded bytes instead of on hex-encoded
or base64-encoded textual representations.

> Presumably it means I shouldn't have a const std::string& as input
> and a std::string as output?

I don't think that is what was meant.

> Does anyone know what it means to translate hex to base 64 "by operating
> on raw bytes" in a C++ context?

Perhaps it just means that you need to translate hex to bytes and
those bytes to base64 ... not to attempt to translate hex text
directly to base64 text.

Ben Bacarisse

Nov 3, 2018, 8:21:47 PM

Paul <peps...@gmail.com> writes:

> A cryptopals.com problem involves translating hex to base 64.
> For example, 49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d
>
> should produce
> SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t
>
> I'm confused by the instruction: "Always operate on raw bytes, never
> on encoded strings. Only use hex and base64 for pretty-printing."

It's probably because the site is language agnostic. You really would
care what this means if you were using, say, Python.

> What does "raw bytes" mean in terms of the input/output parameters.
> Presumably it means I shouldn't have a const std::string& as input
> and a std::string as output?

I disagree with the advice you've had that std::string is OK for this
sort of work. You might get away with it for this first task, but zero
bytes can be a problem in std::string objects.

I'd use std::vector<unsigned char>. The unsigned is to smooth the way
for arithmetic and bit operations.

--
Ben.

Alf P. Steinbach

Nov 3, 2018, 9:22:45 PM

On 04.11.2018 01:21, Ben Bacarisse wrote:
> Paul <peps...@gmail.com> writes:
>
>> A cryptopals.com problem involves translating hex to base 64.
>> For example, 49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d
>>
>> should produce
>> SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t
>>
>> I'm confused by the instruction: "Always operate on raw bytes, never
>> on encoded strings. Only use hex and base64 for pretty-printing."
>
> It's probably because the site is language agnostic. You really would
> care what this mean if you were using, say, Python.
>
>> What does "raw bytes" mean in terms of the input/output parameters.
>> Presumably it means I shouldn't have a const std::string& as input
>> and a std::string as output?
>
> I disagree with the advice you've had that std::string is OK for this
> sort of work. You might get away with it for this first task, but zero
> bytes can be a problem in std::string objects.

`std::string` has no problem with zero-bytes.

Perhaps you're thinking of using `.c_str()` to convert to C-string.

That's a different string representation, that does have such a problem.
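
A minimal sketch of that distinction, for illustration only. The helper names `makeBytesWithZero` and `lengthSeenThroughCStr` are made up here and do not come from the thread or from any library.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: builds a 3-byte string whose middle byte is zero.
std::string makeBytesWithZero()
{
    std::string bytes;
    bytes.push_back('\x01');
    bytes.push_back('\0');   // an embedded zero byte is stored like any other
    bytes.push_back('\x02');
    return bytes;
}

// Hypothetical helper: the length a C-string API would see via c_str().
std::size_t lengthSeenThroughCStr(const std::string& s)
{
    return std::strlen(s.c_str());   // stops at the first zero byte
}
```

Here `makeBytesWithZero().size()` is 3, while `lengthSeenThroughCStr` reports 1 for the same object: the data is intact in the `std::string`, and only the C-string view loses the tail.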

> I'd use std::vector<unsigned char>. The unsigned is to smooth the way
> for arithmetic and bit operations.

I think I'd also use a vector of traditional byte type, `unsigned char`.
But there's no /technical/ problem with using `std::string`.

After all, if it's good enough for this for Google, it's good enough,
even though other considerations IMO make it a less than perfect choice.
Those other considerations include that the default item type, `char`,
is typically signed, which needs more conversion operations sprinkled in
the code, which is an invitation to bugs to enter please, free
admission. And judging by what I've seen of questions about this, the
non-technical considerations include that it's easy for novices to get
confused about whether a string represents binary data or text.
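
As a hedged illustration of the signedness point, the sketch below combines two bytes held in a `std::string`. The function name `combineBytes` is hypothetical, and the caller is assumed to pass at least two bytes.

```cpp
#include <string>

// Hypothetical helper: combines the first two bytes of s into a 16-bit value.
// Precondition (assumed, not checked): s.size() >= 2.
// The casts matter: where plain char is signed, s[i] for a byte such as 0xff
// is negative and would be sign-extended when promoted to int.
unsigned combineBytes(const std::string& s)
{
    const unsigned hi = static_cast<unsigned char>(s[0]);
    const unsigned lo = static_cast<unsigned char>(s[1]);
    return (hi << 8) | lo;
}
```

With a `std::vector<unsigned char>` the two casts would not be needed, which is the practical argument for the vector.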

Cheers!,

- Alf

Ben Bacarisse

Nov 3, 2018, 10:57:49 PM

"Alf P. Steinbach" <alf.p.stein...@gmail.com> writes:

> On 04.11.2018 01:21, Ben Bacarisse wrote:
>> Paul <peps...@gmail.com> writes:
>>
>>> A cryptopals.com problem involves translating hex to base 64.
>>> For example, 49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d
>>>
>>> should produce
>>> SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t
>>>
>>> I'm confused by the instruction: "Always operate on raw bytes, never
>>> on encoded strings. Only use hex and base64 for pretty-printing."
>>
>> It's probably because the site is language agnostic. You really would
>> care what this mean if you were using, say, Python.
>>
>>> What does "raw bytes" mean in terms of the input/output parameters.
>>> Presumably it means I shouldn't have a const std::string& as input
>>> and a std::string as output?
>>
>> I disagree with the advice you've had that std::string is OK for this
>> sort of work. You might get away with it for this first task, but zero
>> bytes can be a problem in std::string objects.
>
> `std::string` has no problem with zero-bytes.
>
> Perhaps you're thinking of using `.c_str()` to convert to C-string.
>
> That's a different string representation, that does have such a
> problem.

That's a part of it, yes, though I was thinking in more general terms
about the interaction between std::string and null-terminated character
arrays. The std::string API uses a lot of CharT * parameters that are
taken to be null-terminated. Even trying to initialise a std::string
with a null-containing array can trip up the unwary.
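
A small sketch of the initialisation pitfall mentioned above. The function names are made up for illustration; the two `std::string` constructors used are the standard `const char*` and pointer-plus-length ones.

```cpp
#include <string>

// The const char* constructor reads only up to the first zero byte.
std::string truncatedAtZero()
{
    const char raw[] = {'a', '\0', 'b', '\0'};
    return std::string(raw);              // holds just "a"
}

// Passing the length explicitly keeps every byte, zeros included.
std::string allBytesKept()
{
    const char raw[] = {'a', '\0', 'b'};
    return std::string(raw, sizeof raw);  // holds all 3 bytes
}
```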

It's all manageable with a few simple rules, but I don't see the point
for cryptographic manipulation. You won't be using the specifically
string-oriented parts of the std::string interface.

<snip>
--
Ben.

Pavel

Nov 4, 2018, 12:50:13 AM

In practice, organizations still use the COW (copy-on-write) version of "std::string", which is less
efficient than vector in both memory and time, especially for short strings,
unless its COW feature is needed. Regardless, if I got to choose the API for
this facility, I would probably select a traditional algorithm approach, i.e.
something like:

template <typename InIter, typename OutIter>
OutIter ReEncodeHexToBase64(InIter beg, InIter end, OutIter out);

This way, you can use it to produce the result in whichever container or stream you
need it, with no intermediate copying.
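
One possible way to fill in that signature is sketched below; it is not Pavel's code, which was not posted. It assumes 8-bit bytes, `char` input elements, the standard Base64 alphabet with '=' padding, and it throws on bad input. `hexDigitValue` and the `hexToBase64` wrapper are hypothetical helpers added for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical helper: value of one hex digit; throws on anything else.
inline unsigned hexDigitValue(char c)
{
    if (c >= '0' && c <= '9') return static_cast<unsigned>(c - '0');
    if (c >= 'a' && c <= 'f') return static_cast<unsigned>(c - 'a' + 10);
    if (c >= 'A' && c <= 'F') return static_cast<unsigned>(c - 'A' + 10);
    throw std::invalid_argument("not a hex digit");
}

template <typename InIter, typename OutIter>
OutIter ReEncodeHexToBase64(InIter beg, InIter end, OutIter out)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    std::uint32_t buffer = 0;   // bit accumulator; only the low bits are used
    int bits = 0;               // number of not-yet-emitted bits in buffer
    std::size_t nibbles = 0;    // hex digits consumed so far

    for (; beg != end; ++beg) {
        buffer = (buffer << 4) | hexDigitValue(*beg);
        bits += 4;
        ++nibbles;
        if (bits >= 6) {        // enough bits for one base64 character
            bits -= 6;
            *out++ = alphabet[(buffer >> bits) & 0x3F];
        }
    }
    if (nibbles % 2 != 0)
        throw std::invalid_argument("odd number of hex digits");

    if (bits > 0) {             // 2 or 4 bits left: final character plus padding
        *out++ = alphabet[(buffer << (6 - bits)) & 0x3F];
        *out++ = '=';
        if (bits == 2) *out++ = '=';
    }
    return out;
}

// Hypothetical convenience wrapper around the iterator version.
std::string hexToBase64(const std::string& hex)
{
    std::string result;
    ReEncodeHexToBase64(hex.begin(), hex.end(), std::back_inserter(result));
    return result;
}
```

The bytes are never materialised in a container here; the accumulator plays the role of the "raw bytes", and the same template can write into a `std::ostream_iterator<char>` instead of a string.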

Just my 2c
-Pavel

Alf P. Steinbach

Nov 4, 2018, 3:42:57 AM

On 04.11.2018 05:49, Pavel wrote:
> In practice, organizations still use cow version of "std::string" which is less
> efficient than vector in both memory and time especially for short strings --
> unless its cow feature is needed.

You mean, less efficient unless at some point it's copied.

Have you timed this, for an optimized build?

Cheers!,

- Alf

Jorgen Grahn

Nov 4, 2018, 12:09:02 PM

On Sat, 2018-11-03, Paul wrote:
> A cryptopals.com problem involves translating hex to base 64.

> For example, 49276d206b696c6c696e6720796f757220627261696e20
> 6c696b65206120706f69736f6e6f7573206d757368726f6f6d
>
> should produce
> SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t

It's a confusion in terminology, and in levels of abstraction.

You (or that site) say the source is "hex", but hexadecimal notation
is just a sometimes convenient way of visualizing numbers as text.

It's common to think of memory as a sequence of bytes ("raw bytes"),
and to visualize them as hex, but that doesn't mean memory /is/ hex.

> I'm confused by the instruction: "Always operate on raw bytes, never
> on encoded strings. Only use hex and base64 for pretty-printing."

Me too. I can only guess what that means (unless it's Python-specific
like someone implied).

Let's formulate a better exercise:

Base64 encodes a sequence of 8-bit bytes[0] as ASCII[1] text in a
fairly compact manner, according to RFC <something>. Implement it,
as one of the functions:

void encode(std::ostream& os, const void* data, std::size_t len);

// *it must be something that can be cast to unsigned char
template<class FwdIterator>
void encode(std::ostream& os, FwdIterator begin, FwdIterator end);

Also write unit tests.
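
A minimal sketch of the first of those signatures, under stated assumptions: 8-bit bytes, and the standard alphabet with '=' padding as defined in RFC 4648. The `encodeToString` wrapper is a hypothetical addition that makes the function easy to unit test.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Writes the Base64 form of len octets starting at data to os.
void encode(std::ostream& os, const void* data, std::size_t len)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    const unsigned char* p = static_cast<const unsigned char*>(data);

    // Each full group of three octets (24 bits) becomes four characters.
    for (; len >= 3; p += 3, len -= 3) {
        const unsigned long n = (static_cast<unsigned long>(p[0]) << 16)
                              | (static_cast<unsigned long>(p[1]) << 8)
                              | p[2];
        os << alphabet[(n >> 18) & 0x3F] << alphabet[(n >> 12) & 0x3F]
           << alphabet[(n >> 6) & 0x3F] << alphabet[n & 0x3F];
    }

    // One or two leftover octets are zero-extended and padded with '='.
    if (len == 1) {
        const unsigned long n = static_cast<unsigned long>(p[0]) << 16;
        os << alphabet[(n >> 18) & 0x3F] << alphabet[(n >> 12) & 0x3F] << "==";
    } else if (len == 2) {
        const unsigned long n = (static_cast<unsigned long>(p[0]) << 16)
                              | (static_cast<unsigned long>(p[1]) << 8);
        os << alphabet[(n >> 18) & 0x3F] << alphabet[(n >> 12) & 0x3F]
           << alphabet[(n >> 6) & 0x3F] << '=';
    }
}

// Hypothetical convenience wrapper, handy in unit tests.
std::string encodeToString(const void* data, std::size_t len)
{
    std::ostringstream os;
    encode(os, data, len);
    return os.str();
}
```

Unit tests can then compare `encodeToString("Man", 3)` against "TWFu", and check the padded cases with inputs of one and two octets.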

/Jorgen

[0] IIRC you can encode a sequence of bits too, e.g. four or 81 bits,
but I think you can ignore that possibility.

[1] Perhaps one shouldn't assume ASCII ...

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Pavel

Nov 4, 2018, 1:44:10 PM

Alf P. Steinbach wrote:
> On 04.11.2018 05:49, Pavel wrote:
>> In practice, organizations still use cow version of "std::string" which is less
>> efficient than vector in both memory and time especially for short strings --
>> unless its cow feature is needed.
>
> You mean, less efficient unless at some point it's copied.

Almost; to be precise, "unless it's copied and not changed thereafter". For
example, some people pass strings by value instead of by const reference in function
parameters to save an indirection (which is a big part of why they got stuck
with the COW string).

>
> Have you timed this, for an optimized build?

Not recently, no. I did some 9-10 years ago while building some symbol store. I
only recall that the results were largely consistent with my expectations; but
the winner was neither vector nor string but a custom-built fixed-size string,
copied by value (again, I was mainly concerned with short strings at the time).

>
>
> Cheers!,
>
> - Alf

-Pavel

Pavel

Nov 4, 2018, 1:52:28 PM

Paul wrote:
> A cryptopals.com problem involves translating hex to base 64.
> For example, 49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d
>
> should produce
> SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t
>
> I'm confused by the instruction: "Always operate on raw bytes, never on encoded strings. Only use hex and base64 for pretty-printing."

My guess is that they meant to emphasize that the solution should base64-encode
the bytes decoded from hex encoding rather than hex-encoded bytes themselves.
This seems like stating the obvious; but is it possible some takers just jumped
into base64 encoding and forgot to hex-decode?

>
> What does "raw bytes" mean in terms of the input/output parameters.
> Presumably it means I shouldn't have a const std::string& as input
> and a std::string as output?
>
> Does anyone know what it means to translate hex to base 64 "by operating
> on raw bytes" in a C++ context?
>
> Thank you,
>
> Paul
>

HTH
-Pavel

Paul

Nov 5, 2018, 7:58:17 AM

Ok, the following code should satisfy requirements but it hasn't been
extensively tested. Feedback is welcome. I decided to code from
scratch without using library functions.

Thanks,

Paul

// Problem is https://cryptopals.com/sets/1/challenges/1
//49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d
// should produce
// SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t
#include <iostream>
#include <unordered_map>
#include <cctype>
#include <cmath>
#include <string>
#include <utility>
#include <vector>

std::unordered_map<char, int> buildHexMap()
{
    std::unordered_map<char, int> hexMap;
    for(char letter = 'a'; letter <= 'f'; ++letter)
        hexMap[std::toupper(letter)] = hexMap[letter] = 10 - 'a' + letter;

    // Start at '0' so that zero is mapped explicitly rather than by default insertion.
    for(char letter = '0'; letter <= '9'; ++letter)
        hexMap[letter] = letter - '0';

    return hexMap;
}

// Use this map to build a std::vector<int> from a string
std::vector<int> hex(const std::string& hexString, std::unordered_map<char, int> hexMap = buildHexMap())
{
    std::vector<int> result(hexString.size());
    for(int i = 0; i < result.size(); ++i)
        result[i] = hexMap[hexString[i]];

    return result;
}

// https://en.wikipedia.org/wiki/Base64 is reference
std::unordered_map<int, char> build64Map()
{
    constexpr int alphabetSize = 26;
    std::unordered_map<int, char> base64Map;
    for(char letter = 'A'; letter <= 'Z'; ++letter)
    {
        base64Map[letter - 'A'] = letter;
        base64Map[alphabetSize - 'a' + std::tolower(letter)] = std::tolower(letter);
    }

    for(char letter = '0'; letter <= '9'; ++letter)
        base64Map[52 - '0' + letter] = letter;

    base64Map[62] = '+';
    base64Map[63] = '/';

    return base64Map;
}

// A naive conversion can result in excessive zeros at the front.
// These are now removed.
std::vector<int> trim(const std::vector<int>& vec)
{
    int i = 0;
    while(i < vec.size() && !vec[i])
        ++i;

    if(i == vec.size())
        return {0};

    return std::vector<int>(vec.begin() + i, vec.end());
}

// A block of 3 hex digits is equivalent to a block of two base 64 digits
// Use this equivalence to transform a vector of hex digits to a vector of base 64 digits
std::vector<int> hexToBase64(const std::vector<int>& hex)
{
    if(hex.empty())
        return hex;

    constexpr int hexBlock = 3;
    constexpr int base64Block = 2;
    constexpr int convertBase = 64;
    constexpr int hexBase = 16;

    const double hexSize = hex.size(); // cast to double for ceiling operation

    std::vector<int> result( std::ceil(hexSize/hexBlock) * base64Block);
    int finalComponent = result.size() - 1;
    for(int i = hexSize - 1; i >= 0; i -= hexBlock)
    {
        int hexValue = hex[i];
        if(i)
            hexValue += hexBase * hex[i - 1];
        if(i >= 2)
            hexValue += hexBase * hexBase * hex[i - 2];

        result[finalComponent--] = hexValue % convertBase;
        result[finalComponent--] = hexValue / convertBase;
    }

    return trim(result);
}

std::string hexToBase64(const std::string& hexString)
{
    const std::vector<int>& base64 = hexToBase64(hex(hexString));
    std::unordered_map<int, char> base64Map = build64Map();
    std::string result;
    for(int i : base64)
        result += base64Map[i];

    return result;
}

int main()
{
    const std::string hex = "49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d";
    const std::string answer = "SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t";

    std::cout << ( hexToBase64(hex) == answer ? "Test passed" : "Test failed");
}

Juha Nieminen

Nov 5, 2018, 9:59:41 AM

Paul <peps...@gmail.com> wrote:
> A cryptopals.com problem involves translating hex to base 64.
> For example, 49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d
>
> should produce
> SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t
>
> I'm confused by the instruction: "Always operate on raw bytes, never on encoded strings. Only use hex and base64 for pretty-printing."
>
> What does "raw bytes" mean in terms of the input/output parameters.

It simply means that if the input is in hexadecimal, you first decode it
into the corresponding bytes, e.g. into a std::vector<unsigned char>
(hexadecimal "00" corresponds to the byte value 0, "01" corresponds to
the byte value 1 and so on, up to "ff" corresponding to the byte value
255), and then you output those bytes in base64.

Converting from an ASCII hexadecimal representation into bytes is quite
easy: for each pair of ASCII characters, see if the first one is between
'0' and '9', and if it is, subtract '0' from it. If it's between 'a' and
'f', subtract 'a' from it and add 10. This gives you the upper 4 bits.
Do the same for the second character, and it gives you the lower
4 bits. If you calculated them e.g. into the variables ub and lb,
the byte value will be ub*16+lb.
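
The decoding step described above can be sketched as follows. The names `hexDigitValue` and `hexToBytes` are made up for illustration, and two things go beyond the description: upper-case digits are accepted, and malformed input throws.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: value of one hex digit; throws on anything else.
unsigned hexDigitValue(char c)
{
    if (c >= '0' && c <= '9') return static_cast<unsigned>(c - '0');
    if (c >= 'a' && c <= 'f') return static_cast<unsigned>(c - 'a' + 10);
    if (c >= 'A' && c <= 'F') return static_cast<unsigned>(c - 'A' + 10);
    throw std::invalid_argument("not a hex digit");
}

// Decodes pairs of hex digits into byte values.
std::vector<unsigned char> hexToBytes(const std::string& hex)
{
    if (hex.size() % 2 != 0)
        throw std::invalid_argument("odd number of hex digits");

    std::vector<unsigned char> bytes;
    bytes.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        const unsigned ub = hexDigitValue(hex[i]);      // upper 4 bits
        const unsigned lb = hexDigitValue(hex[i + 1]);  // lower 4 bits
        bytes.push_back(static_cast<unsigned char>(ub * 16 + lb));
    }
    return bytes;
}
```

For the challenge input, the first three decoded bytes are 0x49, 0x27 and 0x6d, i.e. the characters "I'm".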

Converting bytes into base64 is a bit more complicated but there
are easy tutorials out there.

Alf P. Steinbach

Nov 5, 2018, 11:35:18 AM

Are you sure that CHAR_BIT, the number of bits per byte, equals 8?

> Converting bytes into base64 is a bit more complicated but there
> are easy tutorials out there.

Cheers!,

- Alf

Paul

Nov 5, 2018, 3:01:42 PM

Thanks a lot, but you're giving me advice on how to do something that
I thought I had already done. Is there any reason why the code I presented
is not a valid solution?

Paul

Jorgen Grahn

Nov 5, 2018, 3:15:44 PM

On Mon, 2018-11-05, Paul wrote:
> On Sunday, November 4, 2018 at 6:52:28 PM UTC, Pavel wrote:
>> Paul wrote:
>> > A cryptopals.com problem involves translating hex to base 64.
>> > For example, 49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d
>> >
>> > should produce
>> > SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t
>> >
>> > I'm confused by the instruction: "Always operate on raw bytes, never on encoded strings. Only use hex and base64 for pretty-printing."
>> My guess is that they meant to emphasize that the solution should base64-encode
>> the bytes decoded from hex encoding rather than hex-encoded bytes themselves.
>> This seems like stating the obvious; but is it possible some takers just jumped
>> into 64-bit encoding and forgot to hex-decode?
>> >
>> > What does "raw bytes" mean in terms of the input/output parameters.
>> > Presumably it means I shouldn't have a const std::string& as input
>> > and a std::string as output?
>> >
>> > Does anyone know what it means to translate hex to base 64 "by operating
>> > on raw bytes" in a C++ context?
>> >
>> > Thank you,
>> >
>> > Paul
>> >
>>
>> HTH
>> -Pavel
>
> Ok, the following code should satisfy requirements but it hasn't been
> extensively tested. Feedback is welcome. I decided to code from
> scratch without using library functions.

What library functions? You use plenty of the standard library (not
doing so would be crazy) but of course you don't use someone else's
Base64 encoder.

> // Problem is https://cryptopals.com/sets/1/challenges/1

I still don't understand it, but I accept your interpretation that the
input is a string of hex digits, even though that's problematic (see
below).

> //49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d
> // should produce
> // SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t

That just repeats what main() says better.

...

Your hex decoder below is one reason I don't like the interpretation
of the exercise: you have to do tedious input validation and error
handling. "Hello world!" isn't a hex string. "Abc" is probably a typo
rather than 1 1/2 bytes. "01 f0 ff" is a hex string that's
human-readable, but you don't handle that one well.

For reference, this is one such function I've written, and used a lot.
I pretty much need all of the documented features for it to be useful
in practice.

/**
 * Decode [begin .. end) from a hex dump (e.g. "f0 00 ba 12")
 * into octet buffer 'buf', which is assumed to be large enough.
 *
 * Tolerated input is hex digits and whitespace. Any amount of whitespace
 * is ok, except it must not appear between nybbles:
 *
 * "12 34 56"  - ok
 * "1234 56"   - also fine; same thing
 * "123456"    - also fine
 * "123 456"   - not ok; 0x12 is returned and "3 456" remains
 *               undecoded
 *
 * Returns the number of octets read, and updates 'begin' to
 * the first undecoded character much like strtoul(3) does.
 */
size_t hexread(uint8_t* const buf,
               const char** begin, const char* const end);
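
Only the declaration was posted, so the following is a guess at one implementation that satisfies the documented behaviour, not Jorgen's actual code. The `nybble` helper is hypothetical, and leaving `begin` just past any skipped whitespace when decoding stops is an assumption.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>

namespace {
// Hypothetical helper: value of a hex digit, or -1 if c is not one.
int nybble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}
}

std::size_t hexread(std::uint8_t* const buf,
                    const char** begin, const char* const end)
{
    const char* p = *begin;
    std::size_t n = 0;
    for (;;) {
        // Whitespace is allowed between octets, not between nybbles.
        while (p != end && std::isspace(static_cast<unsigned char>(*p)))
            ++p;
        if (end - p < 2)
            break;                      // nothing, or only half an octet, left
        const int hi = nybble(p[0]);
        const int lo = nybble(p[1]);
        if (hi < 0 || lo < 0)
            break;                      // stop at the first undecodable pair
        buf[n++] = static_cast<std::uint8_t>(hi * 16 + lo);
        p += 2;
    }
    *begin = p;                         // first undecoded character
    return n;
}
```

For the input "123 456" this reads one octet (0x12) and leaves `begin` pointing at "3 456", matching the documented "not ok" case.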

> std::vector<int> hex(const std::string& hexString,
> std::unordered_map<char, int> hexMap = buildHexMap())

Why would you ever want to pass in a different "hexmap"?

[snip]

/Jorgen

Juha Nieminen

Nov 5, 2018, 3:22:02 PM

Alf P. Steinbach <alf.p.stein...@gmail.com> wrote:
> Are you sure that CHAR_BIT, the number of bits per byte, equals 8?

Is any computer system where CHAR_BIT isn't 8 even running
anymore?

woodb...@gmail.com

Nov 5, 2018, 3:51:08 PM

On Saturday, November 3, 2018 at 7:21:47 PM UTC-5, Ben Bacarisse wrote:
> Paul <peps...@gmail.com> writes:
>
> > A cryptopals.com problem involves translating hex to base 64.
> > For example, 49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d
> >
> > should produce
> > SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t
> >
> > I'm confused by the instruction: "Always operate on raw bytes, never
> > on encoded strings. Only use hex and base64 for pretty-printing."
>
> It's probably because the site is language agnostic. You really would
> care what this mean if you were using, say, Python.
>
> > What does "raw bytes" mean in terms of the input/output parameters.
> > Presumably it means I shouldn't have a const std::string& as input
> > and a std::string as output?
>
> I disagree with the advice you've had that std::string is OK for this
> sort of work. You might get away with it for this first task, but zero
> bytes can be a problem in std::string objects.

I agree. Just because a tech giant does something
doesn't mean they know what they are doing.
https://duckduckgo.com is proving that every day, right?

Brian
Ebenezer Enterprises
https://github.com/Ebenezer-group/onwards

Scott Lurndal

Nov 5, 2018, 3:52:26 PM

Yes. The Unisys ClearPath Dorado systems descended from the Sperry/Univac 1100
come to mind immediately.

woodb...@gmail.com

Nov 5, 2018, 4:16:07 PM

I think for servers, desktops and phones CHAR_BIT is
almost always 8, but embedded devices are another story.

Brian
Ebenezer Enterprises
http://webEbenezer.net

Juha Nieminen

Nov 6, 2018, 2:13:45 AM

Does it even have a modern C++ compiler? Does anybody even know how to use
such a thing?

David Brown

Nov 6, 2018, 4:19:16 AM

There are three kinds of systems that have CHAR_BIT > 8 :

1. Dinosaur mainframes. There are not many of these still in action,
but they do exist. I don't know if there is much C++ on such systems -
COBOL is more likely.

2. Very niche processors and DSP's with very odd byte sizes. If you are
not in the business of custom ASIC systems, you will never come across
them - and your code will never be used on them. Even in the embedded
world, these things are unusual. You'll be using a specialised C
compiler or assembly, not C++.

3. General purpose DSP's with 16-bit or 32-bit char, like TI TMS320F2xxx
or Analog Devices SHARC. I believe some of these can be programmed in
C++ these days, but they have fallen out of style for anything but
specialised DSP applications like audio or video codecs.

I think it is fair to say that if your code would ever be running on
something that has CHAR_BIT not equal to 8, you would know it.
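
If code does rely on 8-bit bytes, one way to make sure "you would know it" is to state the assumption so that a build on an unusual platform fails early. A small sketch; `lowOctet` is a hypothetical helper showing the alternative of masking values to 8 bits explicitly.

```cpp
#include <climits>

// Refuse to compile where a byte is not exactly 8 bits wide.
static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");

// Hypothetical helper: keeps only the low 8 bits of a value, which makes
// octet arithmetic explicit even where unsigned char could hold more.
constexpr unsigned lowOctet(unsigned value)
{
    return value & 0xFFu;
}
```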

Alf P. Steinbach

Nov 6, 2018, 6:27:31 AM

Embedded computing stuff. E.g. Texas Instruments digital signal processors.

Cheers!,

- Alf

Juha Nieminen

Nov 6, 2018, 8:33:55 AM

David Brown <david...@hesbynett.no> wrote:
> I think it is fair to say that if your code would ever be running on
> something that has CHAR_BIT not equal to 8, you would know it.

That being said, the way I presented the solution would still work
even if CHAR_BIT is larger than 8. (Every element of the vector is
handled as if it were 8 bits in size, but there's nothing that would
break if it were larger.)

It would only malfunction if CHAR_BIT is less than 8. The assumption
I made in the solution I presented is that chars are *at least* 8 bits
in size.

Scott Lurndal

Nov 6, 2018, 9:02:03 AM

1) Unknown, but unlikely. Does have a C compiler IIRC.
2) Yes, of course. They're still running production for many large
companies (including airline reservation systems).

David Brown

Nov 6, 2018, 9:03:57 AM

I hadn't actually read that bit of the thread!

CHAR_BIT is /always/ at least 8, so you are safe there.

Scott Lurndal

Nov 6, 2018, 10:18:37 AM

Wasn't always. 6-bit characters were very common (DEC, Burroughs) a few
decades ago (e.g. at the time C was developed).

Manfred

Nov 6, 2018, 1:55:39 PM

But the ISO standard requires CHAR_BIT to be greater than or equal to 8.

Robert Wessel

Nov 7, 2018, 10:18:22 AM

AFAIK, it does not. It does have a C89 compiler.

There are also some DSPs still out there with CHAR_BIT as 16 or 32.

James Kuyper

Nov 7, 2018, 9:48:20 PM

On 11/6/18 10:18, Scott Lurndal wrote:
> David Brown <david...@hesbynett.no> writes:

....

>> CHAR_BIT is /always/ at least 8, so you are safe there.
>>
>
> Wasn't always. 6-bit characters were very common (DEC, Burroughs) a few
> decades ago (e.g. at the time C was developed).

That's only relevant to the claim above if there was an implementation
of <limits.h> for those machines which had CHAR_BIT < 8.

Any fully conforming implementation of any version of C must have a
<limits.h> which specifies CHAR_BIT >= 8. That's been a requirement for
as long as there has been a C standard.