
C++ way to convert ASCII digits to Integer?


Peter Olcott

May 26, 2009, 9:18:55 PM
I remember that there is a clean C++ way to do this, but I
forgot what it was.


andreas....@googlemail.com

May 26, 2009, 9:30:01 PM
On May 27, 9:18 am, "Peter Olcott" <NoS...@SeeScreen.com> wrote:
> I remember that there is a clean C++ way to do this, but, I
> forgot what it was.

I don't know what you mean by 'clean C++ way' but one way to do it is:
int ascii_digit_to_int ( const char asciidigit ) {
    if ( asciidigit < '0' ||
         asciidigit > '9' ) {
        throw NotADigitException();
    }
    return (int) asciidigit - 48; // 48 => '0'
}
Or you can use atoi or similar.
Or you can use a std::stringstream:

std::stringstream sstr("3");
int value;
sstr >> value;

Hope that helps
Andreas

Peter Olcott

May 26, 2009, 10:04:58 PM
Yes that was it, and here are the two functions that I
derived from this:

#include <string>
#include <sstream>

int stringToInteger(std::string s)
{
    int Integer;
    std::stringstream sstr;
    sstr << s;
    sstr >> Integer;
    return Integer;
}

double stringToDouble(std::string s)
{
    double Double;
    std::stringstream sstr;
    sstr << s;
    sstr >> Double;
    return Double;
}


<andreas....@googlemail.com> wrote in message
news:5bb33290-694a-4942...@x1g2000prh.googlegroups.com...

Sam

May 26, 2009, 10:27:56 PM
Peter Olcott writes:

> Yes that was it, and here are the two functions that I
> derived from this:
>
> #include <string>
> #include <sstream>
>
> int stringToInteger(std::string s)
> {
> int Integer;
> std::stringstream sstr;
> sstr << s;
> sstr >> Integer;
> return Integer;
> }

Right. Except if you pass a parameter such as "foo", what you'll get,
instead, is a pseudo-random number generator.

For robustness, you must also check fail(), and do something meaningful in
that case.

> double stringToDouble(std::string s)
> {
> double Double;
> std::stringstream sstr;
> sstr << s;
> sstr >> Double;
> return Double;
> }

Ditto.
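
A minimal sketch of the kind of check Sam is describing, written as a
stricter variant of the earlier stringToInteger (the name
stringToIntegerChecked and the choice of std::invalid_argument are
illustrative, not from the thread):

#include <sstream>
#include <stdexcept>
#include <string>

int stringToIntegerChecked(const std::string& s)
{
    std::istringstream sstr(s);
    int value;
    if (!(sstr >> value)) {      // extraction failed: "foo", "", etc.
        throw std::invalid_argument("not an integer: " + s);
    }
    return value;
}

An even stricter version could also reject trailing junk such as "12abc"
by checking that the stream is exhausted after the extraction.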

blargg

May 27, 2009, 12:07:48 AM
andreas.koestler wrote:
> On May 27, 9:18 am, "Peter Olcott" <NoS...@SeeScreen.com> wrote:
> > I remember that there is a clean C++ way to do this [convert
> > ASCII digits to Integer], but, I forgot what it was.

>
> I don't know what you mean by 'clean C++ way' but one way to do it is:
>
> int ascii_digit_to_int ( const char asciidigit ) {
> if ( asciidigit < '0' ||
> asciidigit > '9' ) {
> throw NotADigitException();
> }
> return (int) asciidigit - 48; // 48 => '0'
> }
>
> Or you can use atoi or similar.
> Or you use the std::stringstream:

Not if the machine doesn't use ASCII; only a function like yours above
is fully portable. But perhaps the original poster wanted a function
that would convert from the host's native textual representation to an
integer, in which case the above function would not be a good idea, and
atoi or stringstream would.

andreas....@googlemail.com

May 27, 2009, 3:26:38 AM
> Not if the machine doesn't use ASCII; only a function like yours above
> is fully portable. But perhaps the original poster wanted a function
> that would convert from the host's native textual representation to an
> integer, in which case the above function would not be a good idea, and
> atoi or stringstream would.
Blargg, please explain... :)

Bart van Ingen Schenau

May 27, 2009, 3:06:37 AM
andreas....@googlemail.com wrote:

> On May 27, 9:18 am, "Peter Olcott" <NoS...@SeeScreen.com> wrote:
>> I remember that there is a clean C++ way to do this, but, I
>> forgot what it was.
>
> I don't know what you mean by 'clean C++ way' but one way to do it is:
> int ascii_digit_to_int ( const char asciidigit ) {
> if ( asciidigit < '0' ||
> asciidigit > '9' ) {
> throw NotADigitException();
> }
> return (int) asciidigit - 48; // 48 => '0'

To make the function usable also on non-ASCII implementations, and to
remove the need for a comment, you should write that last line as:
return asciidigit - '0';

> }
> Or you can use atoi or similar.

Better use strtol rather than atoi. It provides much better behaviour in
error situations.
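
A minimal sketch of what strtol-based conversion with error checking might
look like (the function name and exception types are illustrative):

#include <cerrno>
#include <cstdlib>
#include <stdexcept>
#include <string>

long stringToLong(const std::string& s)
{
    errno = 0;
    char* end = 0;
    const long value = std::strtol(s.c_str(), &end, 10);
    if (end == s.c_str() || *end != '\0') {   // no digits, or trailing junk
        throw std::invalid_argument("not a number: " + s);
    }
    if (errno == ERANGE) {                    // overflow or underflow
        throw std::out_of_range("out of range: " + s);
    }
    return value;
}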

> Or you use the std::stringstream:
>
> std::stringstream sstr("3");
> int value;
> sstr >> value;

Or you use boost::lexical_cast<> (which uses stringstreams internally,
but with proper error handling).
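
For reference, a minimal sketch of the boost::lexical_cast variant;
malformed input makes it throw boost::bad_lexical_cast:

#include <boost/lexical_cast.hpp>
#include <iostream>
#include <string>

int main()
{
    try {
        int value = boost::lexical_cast<int>(std::string("3"));
        std::cout << value << '\n';        // prints 3
        boost::lexical_cast<int>("foo");   // throws
    } catch (const boost::bad_lexical_cast& e) {
        std::cout << "conversion failed: " << e.what() << '\n';
    }
    return 0;
}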

>
> Hope that helps
> Andreas

Bart v Ingen Schenau
--
a.c.l.l.c-c++ FAQ: http://www.comeaucomputing.com/learn/faq
c.l.c FAQ: http://c-faq.com/
c.l.c++ FAQ: http://www.parashift.com/c++-faq-lite/

Pascal J. Bourguignon

May 27, 2009, 4:51:44 AM
"andreas....@googlemail.com" <andreas....@googlemail.com> writes:

> From: blargg <blarg...@gishpuppy.com>
> andreas.koestler wrote:
> > On May 27, 9:18 am, "Peter Olcott" <NoS...@SeeScreen.com> wrote:
> > > I remember that there is a clean C++ way to do this [convert
> > > ASCII digits to Integer], but, I forgot what it was.
> >
> > I don't know what you mean by 'clean C++ way' but one way to do it is:
> >
> > int ascii_digit_to_int ( const char asciidigit ) {
> > if ( asciidigit < '0' ||
> > asciidigit > '9' ) {
> > throw NotADigitException();
> > }
> > return (int) asciidigit - 48; // 48 => '0'
> > }
> >
> > Or you can use atoi or similar.
> > Or you use the std::stringstream:

Actually the quoted code is wrong. On a machine using EBCDIC, '0' = 240, not 48.

#include <iso646.h>

struct ASCII {
enum Code {
NUL = 0, SOH, STX, ETX, EOT, ENQ, ACK, BELL, BACKSPACE, TAB,
NEWLINE, VT, PAGE, RETURN, SO, SI, DLE, DC1, DC2, DC3, DC4, NAK,
SYN, ETB, CAN, EM, SUB, ESCAPE, FS, GS, RS, US, SPACE,
EXCLAMATION_MARK, QUOTATION_MARK, NUMBER_SIGN, DOLLAR_SIGN,
PERCENT_SIGN, AMPERSAND, APOSTROPHE, LEFT_PARENTHESIS,
RIGHT_PARENTHESIS, ASTERISK, PLUS_SIGN, COMMA, HYPHEN_MINUS,
FULL_STOP, SOLIDUS, DIGIT_ZERO, DIGIT_ONE, DIGIT_TWO,
DIGIT_THREE, DIGIT_FOUR, DIGIT_FIVE, DIGIT_SIX, DIGIT_SEVEN,
DIGIT_EIGHT, DIGIT_NINE, COLON, SEMICOLON, LESS_THAN_SIGN,
EQUALS_SIGN, GREATER_THAN_SIGN, QUESTION_MARK, COMMERCIAL_AT,
LATIN_CAPITAL_LETTER_A, LATIN_CAPITAL_LETTER_B,
LATIN_CAPITAL_LETTER_C, LATIN_CAPITAL_LETTER_D,
LATIN_CAPITAL_LETTER_E, LATIN_CAPITAL_LETTER_F,
LATIN_CAPITAL_LETTER_G, LATIN_CAPITAL_LETTER_H,
LATIN_CAPITAL_LETTER_I, LATIN_CAPITAL_LETTER_J,
LATIN_CAPITAL_LETTER_K, LATIN_CAPITAL_LETTER_L,
LATIN_CAPITAL_LETTER_M, LATIN_CAPITAL_LETTER_N,
LATIN_CAPITAL_LETTER_O, LATIN_CAPITAL_LETTER_P,
LATIN_CAPITAL_LETTER_Q, LATIN_CAPITAL_LETTER_R,
LATIN_CAPITAL_LETTER_S, LATIN_CAPITAL_LETTER_T,
LATIN_CAPITAL_LETTER_U, LATIN_CAPITAL_LETTER_V,
LATIN_CAPITAL_LETTER_W, LATIN_CAPITAL_LETTER_X,
LATIN_CAPITAL_LETTER_Y, LATIN_CAPITAL_LETTER_Z,
LEFT_SQUARE_BRACKET, REVERSE_SOLIDUS, RIGHT_SQUARE_BRACKET,
CIRCUMFLEX_ACCENT, LOW_LINE, GRAVE_ACCENT, LATIN_SMALL_LETTER_A,
LATIN_SMALL_LETTER_B, LATIN_SMALL_LETTER_C, LATIN_SMALL_LETTER_D,
LATIN_SMALL_LETTER_E, LATIN_SMALL_LETTER_F, LATIN_SMALL_LETTER_G,
LATIN_SMALL_LETTER_H, LATIN_SMALL_LETTER_I, LATIN_SMALL_LETTER_J,
LATIN_SMALL_LETTER_K, LATIN_SMALL_LETTER_L, LATIN_SMALL_LETTER_M,
LATIN_SMALL_LETTER_N, LATIN_SMALL_LETTER_O, LATIN_SMALL_LETTER_P,
LATIN_SMALL_LETTER_Q, LATIN_SMALL_LETTER_R, LATIN_SMALL_LETTER_S,
LATIN_SMALL_LETTER_T, LATIN_SMALL_LETTER_U, LATIN_SMALL_LETTER_V,
LATIN_SMALL_LETTER_W, LATIN_SMALL_LETTER_X, LATIN_SMALL_LETTER_Y,
LATIN_SMALL_LETTER_Z, LEFT_CURLY_BRACKET, VERTICAL_LINE,
RIGHT_CURLY_BRACKET, TILDE, RUBOUT
};
};

int ascii_digit_to_int ( const char asciidigit ) {
    if( (asciidigit < ASCII::DIGIT_ZERO) or (ASCII::DIGIT_NINE < asciidigit) ){
        throw NotADigitException();
    }else{
        return( (int)asciidigit - ASCII::DIGIT_ZERO );
    }
}

--
__Pascal Bourguignon__

James Kanze

May 27, 2009, 6:04:12 AM
On May 27, 3:30 am, "andreas.koest...@googlemail.com"

<andreas.koest...@googlemail.com> wrote:
> On May 27, 9:18 am, "Peter Olcott" <NoS...@SeeScreen.com> wrote:

> > I remember that there is a clean C++ way to do this, but, I
> > forgot what it was.

> I don't know what you mean by 'clean C++ way' but one way to
> do it is: int ascii_digit_to_int ( const char asciidigit ) {
> if ( asciidigit < '0' ||
> asciidigit > '9' ) {
> throw NotADigitException();
> }
> return (int) asciidigit - 48; // 48 => '0'

That's wrong. There's no guarantee that '0' is 48. I've worked
on machines where it is 240. (Of course, the term asciidigit is
very misleading on such machines, because you're really dealing
with an ebcdicdigit.)

You are guaranteed that the decimal digits are consecutive, so
digit - '0' works. Of course, as soon as you do that, someone
will ask for support for hexadecimal. The simplest solution is
just to create a table, correctly initialize it, and then:

return table[ digit ] < 0
? throw NotADigitException()
: table[ digit ] ;
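
A minimal sketch of such a table, extended to hexadecimal as suggested; the
names DigitTable and digit_to_int are illustrative, and std::invalid_argument
stands in for the NotADigitException used earlier in the thread:

#include <climits>
#include <stdexcept>

struct DigitTable {
    int value[UCHAR_MAX + 1];
    DigitTable() {
        for (int i = 0; i <= UCHAR_MAX; ++i) value[i] = -1;
        const char digits[] = "0123456789abcdef";
        for (int i = 0; digits[i] != '\0'; ++i)
            value[static_cast<unsigned char>(digits[i])] = i;
        const char upper[] = "ABCDEF";
        for (int i = 0; upper[i] != '\0'; ++i)
            value[static_cast<unsigned char>(upper[i])] = 10 + i;
    }
};

int digit_to_int(char digit)
{
    static const DigitTable table;   // built once, on first use
    const int result = table.value[static_cast<unsigned char>(digit)];
    if (result < 0) throw std::invalid_argument("not a digit");
    return result;
}
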
> }

> Or you can use atoi or similar.
> Or you use the std::stringstream:

> std::stringstream sstr("3");
> int value;
> sstr >> value;

That's the normal way of converting a stream of digits into a
number in internal format.

--
James Kanze (GABI Software) email:james...@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Michael Doubez

May 27, 2009, 7:12:59 AM
On 27 mai, 12:04, James Kanze <james.ka...@gmail.com> wrote:
> On May 27, 3:30 am, "andreas.koest...@googlemail.com"
>
> <andreas.koest...@googlemail.com> wrote:
> > On May 27, 9:18 am, "Peter Olcott" <NoS...@SeeScreen.com> wrote:
> > > I remember that there is a clean C++ way to do this, but, I
> > > forgot what it was.
[snip]

> > Or you can use atoi or similar.

atoi cannot report errors and should be avoided if the format has not
been previously validated.

> > Or you use the std::stringstream:
> > std::stringstream sstr("3");
> > int value;
> > sstr >> value;
>
> That's the normal way of converting a stream of digits into a
> number in internal format.

An equivalent, if indigestible, way is to use the num_get() facet of the
locale directly:

#include <iostream>
#include <iterator>
#include <locale>
#include <sstream>

int main()
{
    using namespace std;

    locale loc;
    ios_base::iostate state = ios_base::goodbit;
    long value = 0;
    istringstream is("3");

    use_facet<num_get<char> >(loc).get(
        istreambuf_iterator<char>(is), istreambuf_iterator<char>(),
        is, state, value);

    cout << value << '\n';   // prints 3
    return 0;
}

You can modify the formatting of the number by setting fmtflags in
'is'.

You can wrap it in something like the numeric_cast<>() of Boost.

--
Michael

blargg

May 27, 2009, 3:05:19 PM
blargg wrote:
> andreas.koestler wrote:
> > On May 27, 9:18 am, "Peter Olcott" <NoS...@SeeScreen.com> wrote:
> > > I remember that there is a clean C++ way to do this [convert
> > > ASCII digits to Integer], but, I forgot what it was.
> >
> > I don't know what you mean by 'clean C++ way' but one way to do it is:
> >
> > int ascii_digit_to_int ( const char asciidigit ) {
> > if ( asciidigit < '0' ||
> > asciidigit > '9' ) {
> > throw NotADigitException();
> > }
> > return (int) asciidigit - 48; // 48 => '0'
> > }
[...]

> Not if the machine doesn't use ASCII; only a function like yours above
> is fully portable.

Whoops, that's wrong too, as the above function uses '0' and '9', which
won't be ASCII on a non-ASCII machine. So the above should really use 48
and 57 in place of those character constants, to live up to its name.
Otherwise, on a machine using ASCII, it'll work, but on another, it'll be
broken and neither convert from ASCII nor the machine's native character
set!
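
A minimal sketch of the strictly-ASCII version described above, with the
code points 48..57 written out; std::invalid_argument stands in for the
thread's NotADigitException:

#include <stdexcept>

int ascii_digit_to_int(char asciidigit)
{
    const unsigned char c = static_cast<unsigned char>(asciidigit);
    if (c < 48 || c > 57) {   // outside the ASCII codes for '0'..'9'
        throw std::invalid_argument("not an ASCII digit");
    }
    return c - 48;
}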

Default User

May 27, 2009, 6:42:04 PM
blargg wrote:

The requirements for the numerals in the character set specify that their
values be consecutive and increasing.

So digit - '0' will always give you the numeric value of the numeral
the character represents, regardless of whether it is ASCII or not. The
same is not true for digit - 48.

The original problem specified conversion from ASCII, but that's not
likely what the OP really wanted. If so, then a preliminary step to
convert to ASCII could be performed, but that's probably not what was
really desired.

Brian

blargg

May 28, 2009, 1:27:25 AM

And for ASCII the numerals are fixed at 48 through 57, also consecutive
and increasing.

> So digit - '0' will always give you the numeric value of the numeral
> the character represents, regardless of whether it is ASCII or not. The
> same is not true for digit - 48.

But the function is to convert from ASCII to an integer. The input WILL
always be ASCII (or else the caller has violated the contract). It
should not subtract '0', as that would break the function on a non-ASCII
machine.

> The original problem specified conversion from ASCII, but that's not
> likely what the OP really wanted.

But that's what ascii_digit_to_int should implement, or else at the very
least it's named wrong.

> If so, then a preliminary step to convert to ASCII could be performed,

[...]

Or maybe the input data are known to always be ASCII. This is very
common when parsing binary file formats which embed text data.

Pete Becker

May 28, 2009, 7:59:14 AM
Default User wrote:
>
> The original problem specified conversion from ASCII, but that's not
> likely what the OP really wanted.

If you write code that you think your boss wants instead of the code
your boss said he wants you won't last long in your job. If you think
the specification is wrong, ask the person who is responsible for it.

--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of
"The Standard C++ Library Extensions: a Tutorial and Reference"
(www.petebecker.com/tr1book)

Default User

May 28, 2009, 12:05:34 PM
Pete Becker wrote:

> Default User wrote:
> >
> > The original problem specified conversion from ASCII, but that's not
> > likely what the OP really wanted.
>
> If you write code that you think your boss wants instead of the code
> your boss said he wants you won't last long in your job. If you think
> the specification is wrong, ask the person who is responsible for it.

That really depends on circumstances. For the most part in my career as
a software engineer, supervisors have not created specifications to
that level.


Brian

Pete Becker

May 28, 2009, 1:11:29 PM

If you ignore a specification because you think it's wrong you're simply
wrong. If you're winging it without specifications you have a completely
different set of issues.

Default User

May 28, 2009, 2:59:06 PM
Pete Becker wrote:

> Default User wrote:
> > Pete Becker wrote:

> > That really depends on circumstances. For the most part in my
> > career as a software engineer, supervisors have not created
> > specifications to that level.
>
> If you ignore a specification because you think it's wrong you're
> simply wrong. If you're winging it without specifications you have a
> completely different set of issues.


I'm not following. My supervisors have not typically set requirements to
that level. That is to say, the broad strokes of the task are set, and the
engineers define and implement the lower-level requirements.


Brian

Drew Lawson

May 28, 2009, 3:06:40 PM
In article <78857qF...@mid.individual.net>

I take it that Pete's interpretation is that, if your supervisor
says "ASCII," then ASCII is a carved-in-stone requirement that you
need to meet. My take, and possibly yours, is that some of my
supervisors (like many Usenet posters) would say "ASCII" but mean
"text."

Having the code handle what was said, but not what was intended
wouldn't (in my experience) save my job.


--
|Drew Lawson | If you're not part of the solution |
| | you're part of the precipitate. |

Pete Becker

May 28, 2009, 3:32:58 PM

And, presumably, if you feel that the broad strokes are wrong you
discuss them with whoever set them out rather than ignoring them and
doing what you think is right.

The original question was about parsing ASCII characters representing
numbers. Someone responded that that's probably not what was wanted (and
I agree), and wrote code that did something different. The point I made
was simply that writing different code from what was asked for is not
the right approach. If you think what was asked for (regardless of the
"level" of what was asked for) is wrong, point out why you think it's
wrong and ask for clarification.

Pete Becker

May 28, 2009, 3:34:38 PM
Drew Lawson wrote:
> In article <78857qF...@mid.individual.net>
> "Default User" <defaul...@yahoo.com> writes:
>> Pete Becker wrote:
>>
>>> Default User wrote:
>>>> Pete Becker wrote:
>>>> That really depends on circumstances. For the most part in my
>>>> career as a software engineer, supervisors have not created
>>>> specifications to that level.
>>> If you ignore a specification because you think it's wrong you're
>>> simply wrong. If you're winging it without specifications you have a
>>> completely different set of issues.
>> I'm not following. My supervisors have not typically set requirements to
>> that level. That is to say, the broad strokes of task are set, and the
>> engineers will define and implement the lower-level requirements.
>
> I take it that Pete's interpretation is that, if your supervisor
> says "ASCII," then ASCII is a carved-in-stone requirement that you
> need to meet. My take, and possibly yours, is that some of my
> supervisors (like many Usenet posters) would say "ASCII" but mean
> "text."
>

No, that's obvious nonsense. What I said is that if you think ASCII is
wrong, ask about it.

Pete Becker

May 28, 2009, 3:43:43 PM
Pete Becker wrote:
> Drew Lawson wrote:
>> In article <78857qF...@mid.individual.net>
>> "Default User" <defaul...@yahoo.com> writes:
>>> Pete Becker wrote:
>>>
>>>> Default User wrote:
>>>>> Pete Becker wrote:
>>>>> That really depends on circumstances. For the most part in my
>>>>> career as a software engineer, supervisors have not created
>>>>> specifications to that level.
>>>> If you ignore a specification because you think it's wrong you're
>>>> simply wrong. If you're winging it without specifications you have a
>>>> completely different set of issues.
>>> I'm not following. My supervisors have not typically set requirements to
>>> that level. That is to say, the broad strokes of task are set, and the
>>> engineers will define and implement the lower-level requirements.
>>
>> I take it that Pete's interpretation is that, if your supervisor
>> says "ASCII," then ASCII is a carved-in-stone requirement that you
>> need to meet. My take, and possibly yours, is that some of my
>> supervisors (like many Usenet posters) would say "ASCII" but mean
>> "text."
>>
>
> No, that's obvious nonsense. What I said is that if you think ASCII is
> wrong, ask about it.
>

Sorry, dangling reference. "that's obvious nonsense" refers to
"carved-in-stone".

blargg

May 28, 2009, 9:13:18 PM
Pete Becker wrote:
> Drew Lawson wrote:
> > Default User writes:
> >> Pete Becker wrote:
> >>> Default User wrote:
> >>>> Pete Becker wrote:
> >>>> That really depends on circumstances. For the most part in my
> >>>> career as a software engineer, supervisors have not created
> >>>> specifications to that level.
> >>> If you ignore a specification because you think it's wrong you're
> >>> simply wrong. If you're winging it without specifications you have a
> >>> completely different set of issues.
> >> I'm not following. My supervisors have not typically set requirements to
> >> that level. That is to say, the broad strokes of task are set, and the
> >> engineers will define and implement the lower-level requirements.
> >
> > I take it that Pete's interpretation is that, if your supervisor
> > says "ASCII," then ASCII is a carved-in-stone requirement that you
> > need to meet. My take, and possibly yours, is that some of my
> > supervisors (like many Usenet posters) would say "ASCII" but mean
> > "text."
>
> No, that's obvious nonsense. What I said is that if you think ASCII is
> wrong, ask about it.

And by asking, you avoid taking things into your own hands, help them use
the correct terminology in the future (if text is what they really meant),
and show that you are paying attention. I wouldn't want to work with
someone who went and implemented something differently than what we agreed
on, just because he thought he knew better (or thought it was wrong but
went and implemented it without mentioning the apparent problem to me).

James Kanze

May 29, 2009, 4:36:34 AM
On May 28, 1:59 pm, Pete Becker <p...@versatilecoding.com> wrote:
> Default User wrote:

> > The original problem specified conversion from ASCII, but
> > that's not likely what the OP really wanted.

> If you write code that you think your boss wants instead of
> the code your boss said he wants you won't last long in your
> job.

In most places I've worked, writing what the boss wants, rather
than what he says, is good for your career. Of course...

> If you think the specification is wrong, ask the person
> who is responsible for it.

If you think that what he is actually asking for is not what he
wants, you are better off asking, just to be sure.

In the end, it depends on the boss.

And in this case, Default User is probably right: far too many
newbies use "ascii" to mean text. (Given that ASCII is for all
intents and purposes dead, it's highly unlikely that they really
want ASCII.)

Gerhard Fiedler

May 29, 2009, 9:08:28 AM
James Kanze wrote:

> (Given that ASCII is for all intents and purposes dead, it's highly
> unlikely that they really want ASCII.)

I'm not sure, but I think in the USA there is quite a number of
programmers who don't think beyond ASCII when thinking of text
manipulation.

Gerhard

Joe Greer

May 29, 2009, 11:04:34 AM
Gerhard Fiedler <gel...@gmail.com> wrote in news:2ijwirmpswzq
$.d...@gelists.gmail.com:

I believe you are correct for shops developing in-house software or
possibly just using text for debugging/logging (that is, internal use
only). However, the last couple of places I worked were certainly at least
trying to develop for an international market. :)

joe

James Kanze

May 31, 2009, 5:25:55 AM

In just about every country, there are quite a number of
programmers who don't think:-). The fact remains that the
default encoding used by the system, even when configured for
the US, is not ASCII. Even if you're not "thinking" beyond
ASCII, your program must be capable of reading non-ASCII
characters (if only to recognize them and signal the error).

osmium

May 31, 2009, 10:22:07 AM
James Kanze wrote:

> On May 29, 3:08 pm, Gerhard Fiedler <geli...@gmail.com> wrote:
>> James Kanze wrote:
>>> (Given that ASCII is for all intents and purposes dead, it's
>>> highly unlikely that they really want ASCII.)
>
>> I'm not sure, but I think in the USA there is quite a number
>> of programmers who don't think beyond ASCII when thinking of
>> text manipulation.
>
> In just about every country, there are quite a number of
> programmers who don't think:-). The fact remains that the
> default encoding used by the system, even when configured for
> the US, is not ASCII. Even if you're not "thinking" beyond
> ASCII, your program must be capable of reading non-ASCII
> characters (if only to recognize them and signal the error).

Is it your point that an ASCII compliant environment would have to signal an
error if the topmost bit in a byte was something other than 0? Or do you
have something else in mind? I don't have the *actual* ASCII standard
available but I would be surprised if that was expressed as a *requirement*.
After all, the people that wrote the standard were well aware that there was
no such thing as a seven-bit machine.


Alf P. Steinbach

May 31, 2009, 10:52:01 AM
* osmium:

> James Kanze wrote:
>
>> On May 29, 3:08 pm, Gerhard Fiedler <geli...@gmail.com> wrote:
>>> James Kanze wrote:
>>>> (Given that ASCII is for all intents and purposes dead, it's
>>>> highly unlikely that they really want ASCII.)
>>> I'm not sure, but I think in the USA there is quite a number
>>> of programmers who don't think beyond ASCII when thinking of
>>> text manipulation.
>> In just about every country, there are quite a number of
>> programmers who don't think:-). The fact remains that the
>> default encoding used by the system, even when configured for
>> the US, is not ASCII. Even if you're not "thinking" beyond
>> ASCII, your program must be capable of reading non-ASCII
>> characters (if only to recognize them and signal the error).
>
> Is it your point that an ASCII compliant environment would have to signal an
> error if the topmost bit in a byte was something other than 0?

I think James is perhaps referring to routines like isdigit family.

Some of them take an int argument and have UB if the argument value is
outside the range 0...(unsigned char)(-1).

So with most implementations you get UB if you simply pass a char directly
as the argument and that char is beyond the ASCII range, because then it
will be negative.
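
A minimal sketch of the usual workaround: route the char through unsigned
char before it reaches the <cctype> classification functions, so a negative
value can never be passed:

#include <cctype>

bool is_digit_char(char ch)
{
    return std::isdigit(static_cast<unsigned char>(ch)) != 0;
}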


> Or do you
> have something else in mind? I don't have the *actual* ASCII standard
> available but I would be surprised if that was expressed as a *requirement*.

See above.


> After all, the people that wrote the standard were well aware that there was
> no such thing as a seven-bit machine.

On the contrary, the seven bit nature of ASCII was to facilitate communication
over e.g. serial links with software parity check, where each byte was
effectively seven bits (since one bit was used for parity).


Cheers & hth.,

- Alf

--
Due to hosting requirements I need visits to <url: http://alfps.izfree.com/>.
No ads, and there is some C++ stuff! :-) Just going there is good. Linking
to it is even better! Thanks in advance!

Alf P. Steinbach

May 31, 2009, 10:54:13 AM
* Alf P. Steinbach:

Forgot to add, one of the early PDPs had, as I recall, configurable byte size... :-)

James Kanze

Jun 1, 2009, 7:03:55 AM
On May 31, 4:22 pm, "osmium" <r124c4u...@comcast.net> wrote:
> James Kanze wrote:
> > On May 29, 3:08 pm, Gerhard Fiedler <geli...@gmail.com> wrote:
> >> James Kanze wrote:
> >>> (Given that ASCII is for all intents and purposes dead, it's
> >>> highly unlikely that they really want ASCII.)

> >> I'm not sure, but I think in the USA there is quite a number
> >> of programmers who don't think beyond ASCII when thinking of
> >> text manipulation.

> > In just about every country, there are quite a number of
> > programmers who don't think:-). The fact remains that the
> > default encoding used by the system, even when configured for
> > the US, is not ASCII. Even if you're not "thinking" beyond
> > ASCII, your program must be capable of reading non-ASCII
> > characters (if only to recognize them and signal the error).

> Is it your point that an ASCII compliant environment would
> have to signal an error if the topmost bit in a byte was
> something other than 0?

My point is that the actual bytes you'll be reading may contain
non-ASCII characters, whether you like it or not, and that your
program has to handle them in order to be correct. (Of course,
lots of programs limit their input. None of my programs which
deal with text, for example, allow control characters like STX
or DC1; such a character in the input will trigger an error. As
will an illegal UTF-8 sequence, if the program is inputting
UTF-8.)

> Or do you have something else in mind? I don't have the
> *actual* ASCII standard available but I would be surprised if
> that was expressed as a *requirement*. After all, the people
> that wrote the standard were well aware that there was no such
> thing as a seven-bit machine.

ASCII defined code points in the range 0-127. Any other value
is not ASCII. (And the usual arrangement on a PDP-10 was five
seven-bit bytes in a 36-bit word, with one bit left over.)

James Kanze

Jun 1, 2009, 7:10:43 AM
On May 31, 4:52 pm, "Alf P. Steinbach" <al...@start.no> wrote:
> * osmium:
> > James Kanze wrote:

> >> On May 29, 3:08 pm, Gerhard Fiedler <geli...@gmail.com> wrote:
> >>> James Kanze wrote:
> >>>> (Given that ASCII is for all intents and purposes dead, it's
> >>>> highly unlikely that they really want ASCII.)
> >>> I'm not sure, but I think in the USA there is quite a number
> >>> of programmers who don't think beyond ASCII when thinking of
> >>> text manipulation.
> >> In just about every country, there are quite a number of
> >> programmers who don't think:-). The fact remains that the
> >> default encoding used by the system, even when configured for
> >> the US, is not ASCII. Even if you're not "thinking" beyond
> >> ASCII, your program must be capable of reading non-ASCII
> >> characters (if only to recognize them and signal the error).

> > Is it your point that an ASCII compliant environment would
> > have to signal an error if the topmost bit in a byte was
> > something other than 0?

> I think James is perhaps referring to routines like isdigit
> family.

> Some of them take int argument and have UB if the argument
> value is outside 0...(unsigned char)(-1).

s/Some/All/

The standard says 0...UCHAR_MAX or EOF. But UCHAR_MAX and
(unsigned char)(-1) are, of course, guaranteed to be equal. And
EOF is guaranteed to be negative, so there can never be any
ambiguity between one of the characters and EOF.

> So with most implementations you get UB if you simply pass a
> char directly as argument and that char is beyond ASCII range,
> because then it will be negative.

That is, of course, something that you always have to consider.
The "official" answer, in C++, is to use the corresponding
functions in <locale>. Which have been carefully designed to be
even more verbose than the C function with the cast, and to run
several orders of magnitude slower. (But other than that,
they're fine.)
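
For reference, a minimal sketch of the <locale> variant being referred to
(the wrapper name is illustrative):

#include <locale>

bool is_digit_in_locale(char ch)
{
    std::locale loc;               // the current global locale
    return std::isdigit(ch, loc);  // <locale> overload; no cast needed
}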

> > After all, the people that wrote the standard were well
> > aware that there was no such thing as a seven-bit machine.

> On the contrary, the seven bit nature of ASCII was to
> facilitate communication over e.g. serial links with software
> parity check, where each byte was effectively seven bits
> (since one bit was used for parity).

I'm not sure what the original rationale was. One mustn't
forget that at the time, six-bit codes were quite common.
Moving to seven bits probably seemed to be the minimum solution
to support both upper case and lower case. And given the
transmission speeds at the time (110 baud for a teletype), every
bit gained helped. But the fact that you could put the
character with parity into an octet was probably a consideration
as well.

James Kanze

Jun 1, 2009, 7:14:32 AM
On May 31, 4:54 pm, "Alf P. Steinbach" <al...@start.no> wrote:
> * Alf P. Steinbach:

> Forgot to add, one of the early PDPs had, as I recall,
> configurable byte size... :-)

Programmable. The PDP-10 was word-addressed, with special
instructions to access bytes. The "byte address" used some of
the high-order bits to specify the bit offset of the byte in the
word, and the number of bits in the byte. Incrementing a "byte
pointer" added the number of bits to the bit offset, and if the
result exceeded 36 minus the byte size, it incremented the base
address and set the bit offset to 0.

The fact that the bit offset was in the high bits led to another
interesting effect:
assert( (unsigned)( p + 1 ) > (unsigned)( p ) ) ;
would often fail if p was a char*.

Alf P. Steinbach

Jun 1, 2009, 8:09:37 AM
* James Kanze:

It's the first time I can say we're completely on the same wavelength. He he. :-)


>>> After all, the people that wrote the standard were well
>>> aware that there was no such thing as a seven-bit machine.
>
>> On the contrary, the seven bit nature of ASCII was to
>> facilitate communication over e.g. serial links with software
>> parity check, where each byte was effectively seven bits
>> (since one bit was used for parity).
>
> I'm not sure what the original rationale was. One mustn't
> forget that at the time, six bit codes were quite commun.
> Moving to seven bits probably seemed to be the minimum solution
> to support both upper case and lower. And that given the
> transmission speeds at the time (110 baud for a teletype), every
> bit gained helped. But the fact that you could put the
> character with parity into an octet was probably a consideration
> as well.

Yeah, I think perhaps that was 20-20 hindsight rationalization. Now that I try
to shake and rattle old memory cubes a little, they spit out some vague
recollection of using 7-bit serial comms without any parity.


Cheers,

Pascal J. Bourguignon

Jun 2, 2009, 4:19:25 AM
"osmium" <r124c...@comcast.net> writes:

ASCII is an encoding designed for Information Interchange. The
availability of a seven-bit machine was irrelevant. But there were
indeed transfer protocols based on 7-bit data (plus 1-bit parity).

So when you considered octets holding ASCII bytes, the most
significant bit could be always 0, always 1, or odd or even parity.


Even today, you can configure a terminal (such as xterm) to encode
the Meta key in the most significant bit of an octet, for use by programs
such as emacs, leaving only 7 bits for the ASCII code of the key
pressed. (But again, this is a transfer-protocol thing, not relevant
to how emacs or any system or application encodes its characters.)


Now, concerning these applications, and relevant to C and C++, the
point is that char may be signed or unsigned, and often it's signed.
Which means that non-ASCII octets are often interpreted as negative
values. Whether the application is able to handle such 'char' values
is a matter of good practice. Very few programs use unsigned char
consistently and comprehensively. Some code reviews did occur when
UTF-8 was introduced, so things have improved slightly.
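
A minimal sketch of the pitfall Pascal describes, assuming a UTF-8 lead
byte stored in a plain char:

#include <iostream>

int main()
{
    char c = '\xC3';   // lead byte of e.g. U+00E9 ("é") in UTF-8
    if (c < 0) {
        // Where plain char is signed, the octet 0xC3 shows up as a
        // negative value (typically -61).
        std::cout << "char is signed here: " << static_cast<int>(c) << '\n';
    }
    const unsigned char u = static_cast<unsigned char>(c);
    std::cout << static_cast<int>(u) << '\n';   // prints 195
    return 0;
}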

--
__Pascal Bourguignon__
