Numeric literals in other than base 10 - was Annoying octal notation

James Harris

Aug 22, 2009, 5:54:41 PM

On 22 Aug, 10:27, David <71da...@libero.it> wrote:

... (snipped a discussion on languages and other systems interpreting
numbers with a leading zero as octal)

> > Either hexadecimal should have been 0h or octal should
> > have been 0t :=)
>
>
> I have seen the use of Q/q instead in order to make it clearer. I still
> prefer Smalltalk's 16rFF and 8r377.
>
>
> Two interesting options. In a project I have on I have also considered
> using 0q as indicating octal. I maybe saw it used once somewhere else
> but I have no idea where. 0t was a second choice and 0c third choice
> (the other letters of oct). 0o should NOT be used for obvious reasons.
>
> So you are saying that Smalltalk has <base in decimal>r<number> where
> r is presumably for radix? That's maybe best of all. It preserves the
> syntactic requirement of starting a number with a digit and seems to
> have greatest flexibility. Not sure how good it looks but it's
> certainly not bad.
>
>
> > 0xff & 0x0e | 0b1101
> > 16rff & 16r0e | 2r1101
>
> > Hmm. Maybe a symbol would be better than a letter.

...

> > Or Ada's 16#FF#, 8#377#...

> > I forget if DEC/VMS FORTRAN or Xerox Sigma FORTRAN used x'FF' or
> > 'FF'x, and o'377' or '377'o

...

>
> What about 2_1011, 8_7621, 16_c26h or 2;1011, 8;7621, 16;c26h ?

They look good - which is important. The trouble (for me) is that I
want the notation for a new programming language and already use these
characters. I have underscore as an optional separator for groups of
digits - 123000 and 123_000 mean the same. The semicolon terminates a
statement. Based on your second idea, though, maybe a colon could be
used instead as in

2:1011, 8:7621, 16:c26b

I don't (yet) use it as a range operator.
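Below is a minimal Python sketch of how a lexer might read the <base><separator><digits> forms discussed so far: Smalltalk's 16rff, the colon form 2:1011 and the hash form 16#c26b. The name parse_based is hypothetical. Treating underscores as optional group separators is an assumption borrowed from the 123_000 convention above. Neither is part of any of the proposals.

```python
import re

# <decimal base><separator><digits>, where the separator is ':' (2:1011),
# 'r' (Smalltalk's 16rff) or '#' (16#c26b). Underscores are dropped as
# optional digit-group separators, as in 123_000.
BASED = re.compile(r"(\d+)([:r#])([0-9A-Za-z_]+)")


def parse_based(text: str) -> int:
    m = BASED.fullmatch(text)
    if m is None:
        raise ValueError(f"not a based literal: {text!r}")
    base = int(m.group(1))
    if not 2 <= base <= 36:
        raise ValueError(f"base out of range: {base}")
    # int() checks every digit against the base and raises ValueError otherwise
    return int(m.group(3).replace("_", ""), base)


for lit in ("2:1011", "8:7621", "16:c26b", "16rFF", "8r377", "2#1111_0000"):
    print(lit, "=", parse_based(lit))
```

The base is always written in decimal and the literal always starts with a digit. A lexer can therefore commit to "number" on the first character and defer the choice of base until it reaches the separator.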

I could also use a hash sign. Although I allow hash to begin
comments, a comment hash cannot be preceded by anything other than
whitespace, so these would be usable

2#1011, 8#7621, 16#c26b

I have no idea why Ada which uses the # also apparently uses it to end
a number

2#1011#, 8#7621#, 16#c26b#

Copying this post also to comp.lang.misc. Folks there may either be
interested in the discussion or have comments to add.

James

Mel

Aug 22, 2009, 7:16:39 PM

James Harris wrote:

> I have no idea why Ada which uses the # also apparently uses it to end
> a number
>
> 2#1011#, 8#7621#, 16#c26b#

Interesting. They do it because of this example from
<http://archive.adaic.com/standards/83rat/html/ratl-02-01.html#2.1>:

2#1#E8 -- an integer literal of value 256

where the E introduces an exponent (a power of the base, here 2) and,
because it follows the closing #, can't be taken as a digit. That is to say

16#1#E2

would also equal 256, since it's 1*16**2 .
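To make the point concrete, here is a small Python sketch that evaluates Ada-style based integer literals of the form base#digits#[E exponent]. It simplifies the Ada rules: it handles integers only and accepts underscores anywhere among the digits. The name ada_based is hypothetical. It does follow two real Ada rules: the exponent counts powers of the base, and the base must lie between 2 and 16.

```python
import re

# base#digits#[E[+]exponent]; the exponent counts powers of the base.
ADA_BASED = re.compile(r"(\d+)#([0-9A-Fa-f_]+)#(?:[Ee]\+?(\d+))?")


def ada_based(text: str) -> int:
    m = ADA_BASED.fullmatch(text)
    if m is None:
        raise ValueError(f"not an Ada based integer literal: {text!r}")
    base = int(m.group(1))
    if not 2 <= base <= 16:  # Ada allows bases 2 to 16 only
        raise ValueError(f"base out of range: {base}")
    value = int(m.group(2).replace("_", ""), base)
    exponent = int(m.group(3) or 0)
    return value * base ** exponent


print(ada_based("2#1#E8"), ada_based("16#1#E2"), ada_based("16#E#"))  # 256 256 14
```

The last call shows why the closing # matters. Between the hashes, E is the hex digit 14; after the closing hash, it introduces the exponent.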

Mel.

Richard Harter

Aug 22, 2009, 11:38:59 PM

On Sat, 22 Aug 2009 14:54:41 -0700 (PDT), James Harris
<james.h...@googlemail.com> wrote:

>On 22 Aug, 10:27, David <71da...@libero.it> wrote:
>
>... (snipped a discussion on languages and other systems interpreting
>numbers with a leading zero as octal)
>
>> > Either hexadecimal should have been 0h or octal should
>> > have been 0t :=)
>>
>>
>> I have seen the use of Q/q instead in order to make it clearer. I still
>> prefer Smalltalk's 16rFF and 8r377.
>>
>>
>> Two interesting options. In a project I have on I have also considered
>> using 0q as indicating octal. I maybe saw it used once somewhere else
>> but I have no idea where. 0t was a second choice and 0c third choice
>> (the other letters of oct). 0o should NOT be used for obvious reasons.
>>
>> So you are saying that Smalltalk has <base in decimal>r<number> where
>> r is presumably for radix? That's maybe best of all. It preserves the
>> syntactic requirement of starting a number with a digit and seems to
>> have greatest flexibility. Not sure how good it looks but it's
>> certainly not bad.

I opine that a letter is better; special characters are a
valuable piece of real estate. However for floating point you
need at least three letters because a floating point number has
three parts: the fixed-point part, the exponent base, and the
exponent. Now we can represent the radices of the individual
parts with the 'r' scheme, e.g., 2r101001, but we need separate
letters to designate the exponent base and the exponent. B and E
are the obvious choices, though we want to be careful about a
confusion with 'b' in hex. For example, using 'R',

3R20.1B2E16Rac

is 20.1 in trinary (6 1/3) times 2**172 (hex ac).
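The example can be checked with exact rational arithmetic. This Python sketch makes several assumptions:

* Upper-case R, B and E are markers and digits are lower case, which is one way to avoid the 'b'-in-hex confusion.
* The exponent base is written in decimal.
* The exponent is non-negative.

The names parse_float and digits_value are hypothetical.

```python
import re
from fractions import Fraction

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

# [<radix>R]<mantissa>[B<exponent base>E[<radix>R]<exponent>]
FLOAT = re.compile(r"(?:(\d+)R)?([0-9a-z.]+)(?:B(\d+)E(?:(\d+)R)?([0-9a-z]+))?")


def digit(ch: str, base: int) -> int:
    d = DIGITS.find(ch)  # -1 for characters that are not digits at all
    if not 0 <= d < base:
        raise ValueError(f"{ch!r} is not a digit in base {base}")
    return d


def digits_value(text: str, base: int) -> Fraction:
    """Exact value of a digit string with an optional radix point."""
    if not 2 <= base <= 36:
        raise ValueError(f"base out of range: {base}")
    whole, _, frac = text.partition(".")
    value = Fraction(0)
    for ch in whole:
        value = value * base + digit(ch, base)
    scale = Fraction(1, base)
    for ch in frac:
        value += digit(ch, base) * scale
        scale /= base
    return value


def parse_float(text: str) -> Fraction:
    m = FLOAT.fullmatch(text)
    if m is None:
        raise ValueError(f"not a literal: {text!r}")
    radix, mantissa, exp_base, exp_radix, exponent = m.groups()
    value = digits_value(mantissa, int(radix or 10))
    if exp_base is not None:
        power = digits_value(exponent, int(exp_radix or 10))
        value *= Fraction(int(exp_base)) ** int(power)
    return value


x = parse_float("3R20.1B2E16Rac")
print(x == Fraction(19, 3) * 2 ** 172)  # True: 20.1 in base 3 is 6 1/3
```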

I grant that this example looks a bit gobbledegookish, but normal
usage would be much simpler. The notation doesn't handle
balanced trinary; however I opine that balanced trinary requires
special notation.

Richard Harter, c...@tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
No one asks if a tree falls in the forest
if there is no one there to see it fall.

Dmitry A. Kazakov

Aug 23, 2009, 4:21:52 AM

On Sat, 22 Aug 2009 14:54:41 -0700 (PDT), James Harris wrote:

> They look good - which is important. The trouble (for me) is that I
> want the notation for a new programming language and already use these
> characters. I have underscore as an optional separator for groups of
> digits - 123000 and 123_000 mean the same. The semicolon terminates a
> statement. Based on your second idea, though, maybe a colon could be
> used instead as in
>
> 2:1011, 8:7621, 16:c26b
>
> I don't (yet) use it as a range operator.
>
> I could also use a hash sign as although I allow hash to begin
> comments it cannot be preceded by anything other than whitespace so
> these would be usable
>
> 2#1011, 8#7621, 16#c26b
>
> I have no idea why Ada which uses the # also apparently uses it to end
> a number
>
> 2#1011#, 8#7621#, 16#c26b#

If you are going Unicode, you could use the mathematical notation, which is

1011₂, 7621₈, c26b₁₆

(subscript specification of the base). Yes, it might be difficult to type
(:-)), and it would require some look-ahead in the parser. One of the
advantages of Ada notation is that a numeric literal always starts with a
decimal digit. That makes things simple for a recursive descent parser. I
guess this choice was intentional; back in 1983 a complex parser would eat
too many resources...
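Here is a rough Python sketch of why the subscript form needs look-ahead. The base sits at the end, so the lexer must find the trailing subscript digits before it can interpret the first character. The helper parse_subscript is hypothetical. Note also that c26b₁₆ starts with a letter, so a lexer that dispatches on the first character would treat it as an identifier; Ada's leading-digit rule avoids exactly that.

```python
SUBSCRIPT_DIGITS = "₀₁₂₃₄₅₆₇₈₉"
TO_ASCII = str.maketrans(SUBSCRIPT_DIGITS, "0123456789")


def parse_subscript(lit: str) -> int:
    # Scan back from the end: the base is only known once the trailing
    # subscript digits have been found.
    i = len(lit)
    while i > 0 and lit[i - 1] in SUBSCRIPT_DIGITS:
        i -= 1
    digits, sub = lit[:i], lit[i:]
    if not digits or not sub:
        raise ValueError(f"not a subscripted literal: {lit!r}")
    base = int(sub.translate(TO_ASCII))
    if not 2 <= base <= 36:
        raise ValueError(f"base out of range: {base}")
    return int(digits, base)


print(parse_subscript("1011₂"), parse_subscript("7621₈"), parse_subscript("c26b₁₆"))
```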

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

garabik-ne...@kassiopeia.juls.savba.sk

Aug 23, 2009, 6:08:43 AM

In comp.lang.python James Harris <james.h...@googlemail.com> wrote:
> On 22 Aug, 10:27, David <71da...@libero.it> wrote:

...
>>

>> What about 2_1011, 8_7621, 16_c26h or 2;1011, 8;7621, 16;c26h ?
>
> They look good - which is important. The trouble (for me) is that I
> want the notation for a new programming language and already use these
> characters. I have underscore as an optional separator for groups of
> digits - 123000 and 123_000 mean the same.

Why not just use the space? 123 000 looks better than 123_000, and
is not syntactically ambiguous (at least in python). And as it
already works for string literals, it could be applied to numbers, too…
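Below is a minimal sketch of the space-grouping idea as a lexer rule. It assumes a hypothetical language in which decimal digit groups are separated by single spaces. Python itself does not accept this; since Python 3.6, PEP 515 allows underscores instead, as in 123_000. The name lex_numbers is illustrative. The rule is only safe if the grammar never lets two numeric literals sit side by side.

```python
import re

# Hypothetical rule: a decimal literal is digit groups separated by single spaces.
GROUPED = re.compile(r"\d+(?: \d+)*")


def lex_numbers(src: str):
    """Return the integer value of each space-grouped literal in src."""
    return [int(m.group().replace(" ", "")) for m in GROUPED.finditer(src)]


print(lex_numbers("x = 123 000 + 4 567"))  # [123000, 4567]
print(int("123_000"))  # Python 3.6+ accepts underscores instead: 123000
```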

--
-----------------------------------------------------------
| Radovan Garabík http://kassiopeia.juls.savba.sk/~garabik/ |
| __..--^^^--..__ garabik @ kassiopeia.juls.savba.sk |
-----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!

Ben Finney

Aug 23, 2009, 10:01:46 AM

garabik-ne...@kassiopeia.juls.savba.sk writes:

> Why not just use the space? 123 000 looks better than 123_000, and is
> not syntactically ambiguous (at least in python). And as it already
> works for string literals, it could be applied to numbers, too…

+1 to all this. I think this discussion was had many months ago, but
can't recall how it ended back then.

--
\ “Only the educated are free.” —Epictetus, _Discourses_ |
`\ |
_o__) |
Ben Finney

bartc

Aug 23, 2009, 11:57:57 AM

<garabik-ne...@kassiopeia.juls.savba.sk> wrote in message
news:h6r4fb$18a$1...@aioe.org...

> In comp.lang.python James Harris <james.h...@googlemail.com> wrote:
>> On 22 Aug, 10:27, David <71da...@libero.it> wrote:
>
> ...
>>>
>>> What about 2_1011, 8_7621, 16_c26h or 2;1011, 8;7621, 16;c26h ?
>>
>> They look good - which is important. The trouble (for me) is that I
>> want the notation for a new programming language and already use these
>> characters. I have underscore as an optional separator for groups of
>> digits - 123000 and 123_000 mean the same.
>
> Why not just use the space? 123 000 looks better than 123_000, and
> is not syntactically ambiguous (at least in python).

If "_" is also used to introduce a non-base-10 literal, using spaces to
group the digits of a hexadecimal number might result in:

16_1234 ABCD

I'd say that that was ambiguous (depending on whether a name can follow a
number; if you have an operator called ABCD, then that would be a problem),
unless each block of digits used its own base:

16_1234 16_ABCD
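Bart's concern can be reproduced with two hypothetical lexer rules in Python:

* Rule A: 16_ starts a hex literal, and its digit groups are separated by spaces.
* Rule B: each digit group repeats the 16_ prefix.

The regular expressions and the helper first_literal are illustrative assumptions.

```python
import re

# Rule A: 16_ starts a hex literal; further digit groups follow after a space.
GROUPS_SHARE_PREFIX = re.compile(r"16_[0-9A-Fa-f]+(?: [0-9A-Fa-f]+)*")
# Rule B: every digit group carries its own 16_ prefix.
GROUPS_OWN_PREFIX = re.compile(r"16_[0-9A-Fa-f]+(?: 16_[0-9A-Fa-f]+)*")


def first_literal(rule, src: str):
    """Lex one literal at the start of src; return (value, rest of src)."""
    m = rule.match(src)
    if m is None:
        raise ValueError(f"no literal at start of {src!r}")
    digits = m.group().replace("16_", "").replace(" ", "")
    return int(digits, 16), src[m.end():]


print(first_literal(GROUPS_SHARE_PREFIX, "16_1234 ABCD"))  # the name ABCD is swallowed
print(first_literal(GROUPS_OWN_PREFIX, "16_1234 ABCD"))    # ABCD is left for the next token
```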

> And as it
> already works for string literals, it could be applied to numbers, too…

String literals are conveniently surrounded by quotes, so they're a bit easier
to recognise.

--
Bart

James Harris

Aug 23, 2009, 4:55:19 PM

On 23 Aug, 04:38, c...@tiac.net (Richard Harter) wrote:
> On Sat, 22 Aug 2009 14:54:41 -0700 (PDT), James Harris
> <james.harri...@googlemail.com> wrote:
> >On 22 Aug, 10:27, David <71da...@libero.it> wrote:
>
> >... (snipped a discussion on languages and other systems interpreting
> >numbers with a leading zero as octal)
>
> >> > Either hexadecimal should have been 0h or octal should
> >> > have been 0t :=)
>
> >> I have seen the use of Q/q instead in order to make it clearer. I still
> >> prefer Smalltalk's 16rFF and 8r377.
>
> >> Two interesting options. In a project I have on I have also considered
> >> using 0q as indicating octal. I maybe saw it used once somewhere else
> >> but I have no idea where. 0t was a second choice and 0c third choice
> >> (the other letters of oct). 0o should NOT be used for obvious reasons.
>
> >> So you are saying that Smalltalk has <base in decimal>r<number> where
> >> r is presumably for radix? That's maybe best of all. It preserves the
> >> syntactic requirement of starting a number with a digit and seems to
> >> have greatest flexibility. Not sure how good it looks but it's
> >> certainly not bad.
>
> I opine that a letter is better; special characters are a
> valuable piece of real estate.

Very very true.

> However for floating point you
> need at least three letters because a floating point number has
> three parts: the fixed-point part, the exponent base, and the
> exponent. Now we can represent the radices of the individual
> parts with the 'r' scheme, e.g., 2r101001, but we need separate
> letters to designate the exponent base and the exponent. B and E
> are the obvious choices, though we want to be careful about a
> confusion with 'b' in hex. For example, using 'R',
>
> 3R20.1B2E16Rac

Ooh err!

> is 20.1 in trinary (6 1/3) times 2**172 (hex ac).
>
> I grant that this example looks a bit gobbledegookish,

You think? :-)

> but normal
> usage would be much simpler. The notation doesn't handle
> balanced trinary; however I opine that balanced trinary requires
> special notation.

When the programmer needs to construct such values, how about allowing
him or her to specify something like

(20.1 in base 3) times 2 to the power of 0xac

Leaving out how to specify (20.1 in base 3) for now this could be

(20.1 in base 3) * 2 ** 0xac

The compiler could convert this to a constant.
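Here is a sketch of that constant folding, using Python's ast module (Python 3.8 or later). The syntax for "(20.1 in base 3)" is left open above, so here it is replaced by a name x bound to its exact value, 19/3 (2*3 + 0 + 1/3). The helper fold_constant is hypothetical. It handles only integer constants, names, * and integer powers.

```python
import ast
from fractions import Fraction


def fold_constant(expr: str, names: dict) -> Fraction:
    """Evaluate a constant expression exactly, as a compiler might fold it."""

    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return Fraction(node.value)
        if isinstance(node, ast.Name):
            return names[node.id]  # stands in for a literal parsed elsewhere
        if isinstance(node, ast.BinOp):
            left, right = ev(node.left), ev(node.right)
            if isinstance(node.op, ast.Mult):
                return left * right
            if isinstance(node.op, ast.Pow) and right.denominator == 1:
                return left ** int(right)  # integer power keeps the result exact
        raise ValueError(f"cannot fold {ast.dump(node)}")

    return ev(ast.parse(expr, mode="eval"))


# (20.1 in base 3) = 2*3 + 0 + 1/3 = 19/3, bound here to the name x.
value = fold_constant("x * 2 ** 0xac", {"x": Fraction(19, 3)})
print(value == Fraction(19, 3) * 2 ** 172)  # True
```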

James

James Harris

Aug 23, 2009, 5:42:16 PM

On 23 Aug, 00:16, Mel <mwil...@the-wire.com> wrote:
> James Harris wrote:
> > I have no idea why Ada which uses the # also apparently uses it to end
> > a number
>
> > 2#1011#, 8#7621#, 16#c26b#
>
> Interesting. They do it because of this example from
> <http://archive.adaic.com/standards/83rat/html/ratl-02-01.html#2.1>:

Thanks for providing an explanation.

>
> 2#1#E8 -- an integer literal of value 256
>
> where the E prefixes a power-of-2 exponent, and can't be taken as a digit of
> the radix. That is to say
>
> 16#1#E2
>
> would also equal 256, since it's 1*16**2 .

Here's another suggested number literal format. First, keep the
familiar 0x and 0b of C and others and add 0t for octal. (T is the
third letter of octal as X is the third letter of hex.) The numbers
above would be

0b1011, 0t7621, 0xc26b

Second, allow an arbitrary number base by putting base and number in
quotes after a zero as in

0"2:1011", 0"8:7621", 0"16:c26b"

This would work for arbitrary bases and allows an exponent to be
tagged on the end. It only depends on zero followed by a quote mark
not being used elsewhere. Finally, although it uses a colon, the colon is
confined to the quotes, so it remains free for use elsewhere in the language.

Another option:

0.(2:1011), 0.(8:7621), 0.(16:c26b)

where the three characters "0.(" begin the sequence.
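Below is a minimal Python sketch of a lexer for the three forms proposed here: the 0x/0b/0t prefixes, 0"base:digits" and 0.(base:digits). It handles integers only, and the name parse_literal is hypothetical. Every form starts with a zero, and the character after the zero tells them apart.

```python
import re

PREFIX_BASES = {"x": 16, "b": 2, "t": 8}  # 0t for octal, as proposed above

LITERAL = re.compile(
    r'0([xbt])([0-9A-Fa-f_]+)'        # 0xc26b, 0b1011, 0t7621
    r'|0"(\d+):([0-9A-Za-z_]+)"'      # 0"16:c26b"
    r'|0\.\((\d+):([0-9A-Za-z_]+)\)'  # 0.(16:c26b)
)


def parse_literal(text: str) -> int:
    m = LITERAL.fullmatch(text)
    if m is None:
        raise ValueError(f"not a literal: {text!r}")
    prefix, pdigits, qbase, qdigits, bbase, bdigits = m.groups()
    if prefix is not None:
        base, digits = PREFIX_BASES[prefix], pdigits
    elif qbase is not None:
        base, digits = int(qbase), qdigits
    else:
        base, digits = int(bbase), bdigits
    if not 2 <= base <= 36:
        raise ValueError(f"base out of range: {base}")
    return int(digits.replace("_", ""), base)


print([parse_literal(s) for s in ("0b1011", "0t7621", "0xc26b", '0"2:1011"', "0.(16:c26b)")])
```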

Comments? Improvements?

James

James Harris

Aug 23, 2009, 5:45:39 PM

On 23 Aug, 21:55, James Harris <james.harri...@googlemail.com> wrote:

...

> > However for floating point you
> > need at least three letters because a floating point number has
> > three parts: the fixed-point part, the exponent base, and the
> > exponent. Now we can represent the radices of the individual
> > parts with the 'r' scheme, e.g., 2r101001, but we need separate
> > letters to designate the exponent base and the exponent. B and E
> > are the obvious choices, though we want to be careful about a
> > confusion with 'b' in hex. For example, using 'R',
>
> > 3R20.1B2E16Rac
>
> Ooh err!
>
> > is 20.1 in trinary (6 1/3) times 2**172 (hex ac).
>
> > I grant that this example looks a bit gobbledegookish,
>
> You think? :-)
>
> > but normal
> > usage would be much simpler. The notation doesn't handle
> > balanced trinary; however I opine that balanced trinary requires
> > special notation.
>
> When the programmer needs to construct such values how about allowing
> him or her to specify something like
>
> (20.1 in base 3) times 2 to the power of 0xac
>
> Leaving out how to specify (20.1 in base 3) for now this could be
>
> (20.1 in base 3) * 2 ** 0xac

Using the suggestion from another post would convert this to

0.(3:20.1) * 2 ** 0xac

Scott David Daniels

Aug 23, 2009, 6:50:29 PM

James Harris wrote:...

> Another option:
>
> 0.(2:1011), 0.(8:7621), 0.(16:c26b)
>
> where the three characters "0.(" begin the sequence.
>
> Comments? Improvements?

I did a little interpreter where non-base 10 numbers
(up to base 36) were:

.7.100 == 64 (octal)
.9.100 == 100 (decimal)
.F.100 == 256 (hexadecimal)
.1.100 == 4 (binary)
.3.100 == 9 (trinary)
.Z.100 == 46656 (base 36)
Advantages:
Tokenizer can recognize chunks easily.
Not visually too confusing.
No issue of what base the base indicator is expressed in.

--Scott David Daniels
Scott....@Acm.Org

bartc

Aug 23, 2009, 7:04:37 PM

"Scott David Daniels" <Scott....@Acm.Org> wrote in message
news:kN2dnSZR5b0BWAzX...@pdx.net...

It can be assumed however that .9. isn't in binary?

That's a neat idea. But an even simpler scheme might be:

.octal.100
.decimal.100
.hex.100
.binary.100
.trinary.100

until it gets to this anyway:

.thiryseximal.100

--
Bartc

Piet van Oostrum

Aug 24, 2009, 3:51:37 AM

>>>>> Scott David Daniels <Scott....@Acm.Org> (SDD) wrote:

>SDD> James Harris wrote:...

>>> Another option:
>>>
>>> 0.(2:1011), 0.(8:7621), 0.(16:c26b)
>>>
>>> where the three characters "0.(" begin the sequence.
>>>
>>> Comments? Improvements?

>SDD> I did a little interpreter where non-base 10 numbers
>SDD> (up to base 36) were:

>SDD> .7.100 == 64 (octal)
>SDD> .9.100 == 100 (decimal)
>SDD> .F.100 == 256 (hexadecimal)
>SDD> .1.100 == 4 (binary)
>SDD> .3.100 == 9 (trinary)
>SDD> .Z.100 == 46656 (base 36)

I wonder how you wrote that interpreter, given that some answers are wrong.
--
Piet van Oostrum <pi...@cs.uu.nl>
URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
Private email: pi...@vanoostrum.org

James Harris

Aug 24, 2009, 4:25:35 AM

On 24 Aug, 09:05, Erik Max Francis <m...@alcyone.com> wrote:

...

> >> Here's another suggested number literal format. First, keep the
> >> familiar 0x and 0b of C and others and add 0t for octal. (T is the
> >> third letter of octal as X is the third letter of hex.) The numbers
> >> above would be
>
> >> 0b1011, 0t7621, 0xc26b
>
> >> Second, allow an arbitrary number base by putting base and number in
> >> quotes after a zero as in
>
> >> 0"2:1011", 0"8:7621", 0"16:c26b"
>

> > Why not just put the base first, followed by the value in quotes:
>
> > 2"1011", 8"7621", 16"c26b"
>
> It's always a bit impressive how syntax suggestions get more and more
> involved and, if you'll forgive me for saying, ridiculous as the
> conversation continues. This is starting to get truly nutty.

Why do you say that here? MRAB's suggestion is one of the clearest
there has been. And it incorporates the other requirements: starts
with a digit, allows an appropriate alphabet, has no issues with
spacing digit groups, shows clearly where the number ends and could
take an exponent suffix.
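Here is a Python sketch of MRAB's base"digits" form. It covers the properties listed above: a leading decimal digit, digits up to base 36, spaces between digit groups, and an unambiguous end. The e<exponent> suffix after the closing quote is a hypothetical extension; this sketch takes it to mean a power of the base, as in Ada. The name parse_quoted is illustrative.

```python
import re

# base"digits", spaces allowed between digit groups, plus a hypothetical
# e<exponent> suffix after the closing quote (a power of the base).
QUOTED = re.compile(r'(\d+)"([0-9A-Za-z]+(?: [0-9A-Za-z]+)*)"(?:e(\d+))?')


def parse_quoted(text: str) -> int:
    m = QUOTED.fullmatch(text)
    if m is None:
        raise ValueError(f"not a quoted literal: {text!r}")
    base = int(m.group(1))
    if not 2 <= base <= 36:
        raise ValueError(f"base out of range: {base}")
    value = int(m.group(2).replace(" ", ""), base)
    return value * base ** int(m.group(3) or 0)


print(parse_quoted('2"1111 0000"'), parse_quoted('16"c26b"'), parse_quoted('16"e"e2'))
```

As with Ada's closing #, the closing quote means an e inside the quotes is a hex digit while an e after them starts the exponent, so '16"e"e2' is 14 * 16**2.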

James

Erik Max Francis

Aug 24, 2009, 4:30:11 AM

In your opinion. Obviously not in others. Which is pretty obviously
what I meant, so the rhetorical question is a bit weird here.

There's a reason that languages designed by committee end up as horrific
nightmares.

--
Erik Max Francis && m...@alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 18 N 121 57 W && AIM/Y!M/Skype erikmaxfrancis
Do not seek death. Death will find you.
-- Dag Hammarskjold

James Harris

Aug 24, 2009, 4:47:44 AM

On 24 Aug, 09:30, Erik Max Francis <m...@alcyone.com> wrote:
> James Harris wrote:
> > On 24 Aug, 09:05, Erik Max Francis <m...@alcyone.com> wrote:
> >>>> Here's another suggested number literal format. First, keep the
> >>>> familiar 0x and 0b of C and others and add 0t for octal. (T is the
> >>>> third letter of octal as X is the third letter of hex.) The numbers
> >>>> above would be
> >>>> 0b1011, 0t7621, 0xc26b
> >>>> Second, allow an arbitrary number base by putting base and number in
> >>>> quotes after a zero as in
> >>>> 0"2:1011", 0"8:7621", 0"16:c26b"
> >>> Why not just put the base first, followed by the value in quotes:
> >>> 2"1011", 8"7621", 16"c26b"
> >> It's always a bit impressive how syntax suggestions get more and more
> >> involved and, if you'll forgive me for saying, ridiculous as the
> >> conversation continues. This is starting to get truly nutty.
>
> > Why do you say that here? MRAB's suggestion is one of the clearest
> > there has been. And it incorporates the other requirements: starts
> > with a digit, allows an appropriate alphabet, has no issues with
> > spacing digit groups, shows clearly where the number ends and could
> > take an exponent suffix.
>
> In your opinion. Obviously not in others. Which is pretty obviously
> what I meant, so the rhetorical question is a bit weird here.

Don't get defensive.... Yes, in my opinion, if you like, but you can't
say "obviously not in others" as no one else but you has commented on
MRAB's suggestion.

Also, when you say "This is starting to get truly nutty" would you
accept that that's in your opinion?

> There's a reason that languages designed by committee end up as horrific
> nightmares.

True but I would suggest that mistakes are also made by designers who
do not seek the opinions of others. There's a balance to be struck
between a committee and an ivory tower.

James

NevilleDNZ

Aug 24, 2009, 8:22:42 AM

On Aug 23, 9:42 pm, James Harris <james.harri...@googlemail.com>
wrote:

> The numbers above would be
>
> 0b1011, 0t7621, 0xc26b

Algol68 has the type BITS, which is converted to INT with the ABS
operator.
The numbers above would be:
2r1011, 8r7621, 16rc26b

"r" is for radix: http://en.wikipedia.org/wiki/Radix

The standard supports 2r, 4r, 8r & 16r only.

The standard supports LONG BITS, LONG LONG BITS, etc., but does not
include UNSIGNED.

Compare gcc's:

bash$ cat num_lit.c
#include <stdio.h>
int main(void) {
    /* 0b1111 is a GCC extension (standard C only from C23) */
    printf("%d %d %d %d\n", 0xffff, 07777, 9999, 0b1111);
    return 0;
}

bash$ ./num_lit
65535 4095 9999 15

With Algol68's: https://sourceforge.net/projects/algol68/

bash$ cat num_lit.a68
main:(
printf(($g$,ABS 16rffff,ABS 8r7777,9999,ABS 2r1111,$l$))
)

bash$ algol68g ./num_lit.a68
+65535 +4095 +9999 +15
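For comparison, here are the same four values in Python 3. There, octal needs the 0o prefix (PEP 3127), and a literal like 07777 is rejected as a syntax error instead of silently meaning octal. The tuple names are illustrative.

```python
# Python 3 literals: 0o marks octal; a bare leading zero as in 07777 is a
# SyntaxError rather than silently meaning octal.
literals = (0xffff, 0o7777, 9999, 0b1111)
print(*literals)  # 65535 4095 9999 15

# The same values parsed from strings with an explicit base, closer in
# spirit to Algol 68's 16rffff or Smalltalk's 16rFF.
from_strings = (int("ffff", 16), int("7777", 8), int("9999", 10), int("1111", 2))
print(from_strings == literals)  # True
```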

Enjoy
N

Scott David Daniels

Aug 24, 2009, 12:18:20 PM

Piet van Oostrum wrote:
>>>>>> Scott David Daniels <Scott....@Acm.Org> (SDD) wrote:
>
>> SDD> James Harris wrote:...
>>>> Another option:
>>>>
>>>> 0.(2:1011), 0.(8:7621), 0.(16:c26b)
>>>>
>>>> where the three characters "0.(" begin the sequence.
>>>>
>>>> Comments? Improvements?
>
>> SDD> I did a little interpreter where non-base 10 numbers
>> SDD> (up to base 36) were:
>
>> SDD> .7.100 == 64 (octal)
>> SDD> .9.100 == 100 (decimal)
>> SDD> .F.100 == 256 (hexadecimal)
>> SDD> .1.100 == 4 (binary)
>> SDD> .3.100 == 9 (trinary)
>> SDD> .Z.100 == 46656 (base 36)
>
> I wonder how you wrote that interpreter, given that some answers are wrong.

Obviously I started with a different set of examples and edited them
after starting to make a table that could be interpreted in each base.
After doing that, I forgot to double-check, and lo and behold .Z.1000 =
46656, while .Z.100 = 1296. Since it has been decades since I've had
access to that interpreter, this is all from memory.
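In the scheme as described, the character between the dots is the largest digit of the base, so the base is that digit's value plus one. The small Python sketch below applies that rule; parse_dotted is a hypothetical name. It gives .Z.100 = 1296 and .Z.1000 = 46656. Applied mechanically, the rule also shows that .3. denotes base 4, so .3.100 is 16, and trinary would be written .2.100.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def parse_dotted(text: str) -> int:
    """.<largest digit>.<digits>: the base is the largest digit's value plus one."""
    parts = text.split(".")
    if len(parts) != 3 or parts[0] != "" or len(parts[1]) != 1:
        raise ValueError(f"not a dotted literal: {text!r}")
    base = DIGITS.find(parts[1].upper()) + 1
    if base < 2:  # '0' (or a non-digit) cannot name a base
        raise ValueError(f"bad base indicator: {parts[1]!r}")
    return int(parts[2], base)


for lit in (".7.100", ".9.100", ".F.100", ".1.100", ".3.100", ".2.100", ".Z.100", ".Z.1000"):
    print(lit, "==", parse_dotted(lit))
```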

--Scott David Daniels
Scott....@Acm.Org

robin

Aug 27, 2009, 12:07:20 PM

"James Harris" <james.h...@googlemail.com> wrote in message
news:bc3607b3-7fdd-43fd...@32g2000yqj.googlegroups.com...

On 22 Aug, 10:27, David <71da...@libero.it> wrote:

>They look good - which is important. The trouble (for me) is that I
>want the notation for a new programming language and already use these
>characters. I have underscore as an optional separator for groups of
>digits - 123000 and 123_000 mean the same. The semicolon terminates a
>statement. Based on your second idea, though, maybe a colon could be
>used instead as in

XPL uses "(2)1011" for base 4,
"(3)03212" for octal,
"(4)0741" for base 16.

PL/I uses 8FXN for numeric hex and X suffix for a hex character constant.
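In the XPL examples above, the number in parentheses reads as the number of bits per digit, so the base is 2 raised to that number. Here is a Python sketch under that reading. The name parse_xpl is hypothetical, and limiting the sketch to 1 to 4 bits per digit (binary up to hex) is an assumption.

```python
import re

# (n)digits with n bits per digit, so the base is 2**n.
XPL_BITS = re.compile(r"\(([1-4])\)([0-9A-Fa-f]+)")


def parse_xpl(text: str) -> int:
    m = XPL_BITS.fullmatch(text)
    if m is None:
        raise ValueError(f"not an XPL bit-string literal: {text!r}")
    base = 2 ** int(m.group(1))
    return int(m.group(2), base)  # ValueError if a digit is too large for the base


print(parse_xpl("(2)1011"), parse_xpl("(3)03212"), parse_xpl("(4)0741"))
```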