
Strange way to convert char to int (?)


jo...@wezayzo.com
Sep 13, 2005, 4:19:59 AM

I found this code snippet in a book (Killer Game Programming in Java, O'Reilly):

int i = ch - '0'; // We assume that ch is a digit ranging from 0 to 9.

How does this work? Why/how does the subtraction of
two chars result in an int?

Pep
Sep 13, 2005, 4:23:55 AM

jo...@wezayzo.com wrote:

Primitive data types. A char is an int is a char :)

jo...@wezayzo.com
Sep 13, 2005, 4:31:40 AM

Ah, ok. Thanks!

Thomas Hawtin
Sep 13, 2005, 4:34:01 AM

Any arithmetic on bytes, shorts, chars and ints is always done by first
widening the type to an int. For char, the unsigned 16-bit Unicode value
is used. Implicitly converting a character to a number was probably a
mistake in the language design, but it's not about to change now.

Anyway, in most Western languages the characters representing '0' to '9'
have values, IIRC, 48, 49, 50, ... 57. So if you do, say, '1' - '0', then
that is the equivalent of 49 - 48, i.e. 1.

The code isn't the most reliable way of doing it. Unicode has a number of
ranges of digits. Character.digit is better.
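
For instance, a minimal sketch of the two approaches (the class and
variable names here are just for illustration):

public class DigitDemo {
    public static void main(String[] args) {
        char ch = '7';

        // Quick and dirty: relies on '0'..'9' having consecutive code values.
        int bySubtraction = ch - '0';             // 55 - 48 = 7

        // Character.digit knows about every Unicode digit range and
        // returns -1 for anything that isn't a digit in the given radix.
        int byDigit = Character.digit(ch, 10);    // 7
        int notADigit = Character.digit('x', 10); // -1

        System.out.println(bySubtraction + " " + byDigit + " " + notADigit);
    }
}

Character.digit also handles other radices, e.g. Character.digit('f', 16)
gives 15.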

Tom Hawtin
--
Unemployed English Java programmer
http://jroller.com/page/tackline/

jo...@wezayzo.com
Sep 13, 2005, 4:49:44 AM

>> I found this code snippet in a book (Killer Game Programming in Java, O'Reilly):
>>
>> int i = ch - '0'; // We assume that ch is a digit ranging from 0 to 9.
>>
>> How does this work? Why/how does the subtraction of
>> two chars result in an int?
>
> Any arithmetic on bytes, shorts, chars and ints is always done by first
> widening the type to an int. For char, the unsigned 16-bit Unicode value
> is used. Implicitly converting a character to a number was probably a
> mistake in the language design, but it's not about to change now.
>
> Anyway, in most Western languages the characters representing '0' to '9'
> have values, IIRC, 48, 49, 50, ... 57. So if you do, say, '1' - '0', then
> that is the equivalent of 49 - 48, i.e. 1.
>
> The code isn't the most reliable way of doing it. Unicode has a number of
> ranges of digits. Character.digit is better.

Ok, thanks for the insight.

Roedy Green
Sep 13, 2005, 6:08:23 AM

Chars are automatically promoted to ints before doing arithmetic. So
are bytes. So are shorts. The JVM works with 32-bit stack slots and has
no arithmetic instructions for anything narrower than an int.
So '2' - '0'
becomes
50 - 48 = 2

This is a fast way of converting a single char digit to a binary int.

He is computing the difference between the codes for '2' and '0',
which conveniently is the binary value 2 because of the logical pattern
of code assignment. See http://mindprod.com/jgloss/unicode.html
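
A rough sketch of the same trick used on a whole string of ASCII digits
(the class and method names are just made up for the example):

public class DigitsToInt {
    // Assumes every character of s really is '0'..'9'; there is no validation.
    static int parseDigits(String s) {
        int result = 0;
        for (int i = 0; i < s.length(); i++) {
            // char - char is done as int arithmetic: '2' - '0' == 50 - 48 == 2
            result = result * 10 + (s.charAt(i) - '0');
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parseDigits("204")); // prints 204
    }
}

A stray non-digit character silently produces garbage here, which is
exactly why the -1 return from Character.digit is handy.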

--
Canadian Mind Products, Roedy Green.
http://mindprod.com Again taking new Java programming contracts.

jo...@wezayzo.com
Sep 13, 2005, 8:37:53 AM

>> I found this code snippet in a book (Killer Game Programming in Java, O'Reilly):
>>
>> int i = ch - '0'; // We assume that ch is a digit ranging from 0 to 9.
>>
>> How does this work? Why/how does the subtraction of
>> two chars result in an int?
>
> Chars are automatically promoted to ints before doing arithmetic. So
> are bytes. So are shorts. The JVM works with 32-bit stack slots and has
> no arithmetic instructions for anything narrower than an int.
> So '2' - '0'
> becomes
> 50 - 48 = 2
>
> This is a fast way of converting a single char digit to a binary int.
>
> He is computing the difference between the codes for '2' and '0',
> which conveniently is the binary value 2 because of the logical pattern
> of code assignment. See http://mindprod.com/jgloss/unicode.html

Roedy, thanks.

Oliver Wong
Sep 13, 2005, 11:09:08 AM

<jo...@wezayzo.com> wrote in message
news:quhdi1ho8qtk1uc1j...@4ax.com...

BTW, I think this is a bad way to convert characters to the numbers they
represent. I've written about a safer alternative on my blog at
http://nebupookins.net/entry.php?id=260 which correctly converts the Unicode
characters for Roman numerals and Chinese/Japanese characters to the integers
they represent, for example.
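
For the curious, here is a small sketch of the sort of thing
Character.getNumericValue can handle that the plain subtraction can't
(the class name is made up; the printed values are what I'd expect from
the Unicode tables):

public class NumericValueDemo {
    public static void main(String[] args) {
        char eight = '\u2167'; // ROMAN NUMERAL EIGHT

        // getNumericValue uses the character's Unicode numeric value.
        System.out.println(Character.getNumericValue(eight)); // 8

        // Subtracting '0' just subtracts raw code values: 0x2167 - 0x30.
        System.out.println(eight - '0'); // 8503
    }
}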

I see from the book title that this is for a game, and one could argue that
since this is a game, the code should be very fast. My counter-argument is
that I seriously doubt that converting chars to integers is going to
be the bottleneck in your game.

- Oliver


Roedy Green
Sep 13, 2005, 4:03:22 PM

On Tue, 13 Sep 2005 15:09:08 GMT, "Oliver Wong" <ow...@castortech.com>
wrote or quoted :

> I see from the book title that this is for a game, and one could argue that
> since this is a game, the code should be very fast. My counter-argument is
> that I seriously doubt that converting chars to integers is going to
> be the bottleneck in your game.

On the other hand, your game is not defined to work with Roman
numerals. That would be considered an error.

I think there is room for both. The strongest argument for using your
way is that it leaves programs open to easier internationalisation.
English-speaking programmers tend to forget that their code, if
successful, will be internationalised.

Oliver Wong
Sep 13, 2005, 4:50:14 PM

"Roedy Green" <loo...@mindprod.com.invalid> wrote in message
news:40cei115ojgri93av...@4ax.com...

> On the other hand, your game is not defined to work with Roman
> numerals. That would be considered an error.

If the design document doesn't specify a behaviour for Roman numeral
input one way or another, I think actually parsing those Roman numerals
would be "good" in the sense of "least surprising for the user" and "more
robust", as opposed to, say, crashing, or returning an undefined value (and
then later crashing).

If the design document DOES say that upon detecting a Roman numeral, an
error should be reported (or more likely "On any value other than 0, 1, 2,
3, 4, 5, 6, 7, 8 or 9, an error should be reported"), then obviously my
solution would be violating the requirements of the program.

- Oliver


Roedy Green
Sep 13, 2005, 9:58:08 PM

On Tue, 13 Sep 2005 20:50:14 GMT, "Oliver Wong" <ow...@castortech.com>
wrote or quoted :

> If the design document doesn't specify a behaviour for Roman numeral
> input one way or another, I think actually parsing those Roman numerals
> would be "good" in the sense of "least surprising for the user" and "more
> robust", as opposed to, say, crashing, or returning an undefined value (and
> then later crashing).
>
> If the design document DOES say that upon detecting a Roman numeral, an
> error should be reported (or more likely "On any value other than 0, 1, 2,
> 3, 4, 5, 6, 7, 8 or 9, an error should be reported"), then obviously my
> solution would be violating the requirements of the program.

On the other paw, perhaps one in 10,000 people entering a Roman
numeral into your program would do it on purpose. So the principle of
least astonishment suggests the best thing to do is reject it.

Oliver Wong
Sep 14, 2005, 10:58:10 AM

"Roedy Green" <loo...@mindprod.com.invalid> wrote in message
news:dr0fi1tvhec92bs0f...@4ax.com...

> On the other paw, perhaps one in 10,000 people entering a Roman
> numeral into your program would do it on purpose. So the principle of
> least astonishment suggests the best thing to do is reject it.

When I say that Character.getNumericValue() parses Roman numerals, I
don't mean the string "VIII", but the actual Unicode character whose
codepoint in hexadecimal is 0x2167. So I personally think it'd be unlikely
that someone would "accidentally" enter that character.

Also, I don't know if this is the case for digits, but there are
distinct alphabetic characters in Unicode which, in every font I've seen,
look identical. The Cyrillic character \u0430 and the Latin character \u0061
both look like 'a' in most fonts. If this ever happens for digits as well,
the user's keyboard might be mapped to a locale in which the character
that the key labelled '9' generates looks identical to '9', but subtracting
'0' from that character gives 400 or something. That would be an example of
accidental use of an international character, which should nevertheless be
accepted to cause the least astonishment.
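
A quick sketch of that kind of scenario using a real localized digit,
ARABIC-INDIC DIGIT FOUR, which a keyboard mapped to an Arabic locale
could produce (the class name is made up; the printed values are what
I'd expect):

public class LocaleDigitDemo {
    public static void main(String[] args) {
        char four = '\u0664'; // ARABIC-INDIC DIGIT FOUR

        // Character.digit and getNumericValue both understand it as a 4.
        System.out.println(Character.digit(four, 10));       // 4
        System.out.println(Character.getNumericValue(four)); // 4

        // Subtracting '0' just subtracts raw code values: 0x0664 - 0x30.
        System.out.println(four - '0'); // 1588
    }
}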

- Oliver

