Alphabetizing Numbers

John Kristian

unread,

Feb 10, 2010, 8:59:41 PM2/10/10

to

How can numbers be represented as character strings, in such a way
that the alphabetical order of the strings is the same as the order of
the corresponding numbers?

For example, the integers 1, 2, ... can be represented as the letters
A, B, ... but this system doesn't extend beyond a limited range of
integers. Ordinary Hindu-Arabic notation works for rational numbers
in the interval [0, 10), but "10" < "2" alphabetically. A more
general system would extend to all real numbers.

For the sake of discussion, let's use English characters, ordered by
US-ASCII code. The numerals come first, followed by upper case
letters and then lower case letters, with punctuation interspersed.

It's encouraging to know alphabetical order is a total order, and
dense over some infinite sets of character strings. An example is a
set of strings that don't end with the first letter of the alphabet,
such as strings of letters that don't end with A.

John Kristian

unread,

Feb 11, 2010, 12:58:22 AM2/11/10

to

Sorry I neglected to define alphabetical order. I mean, for two
distinct strings X and Y, X < Y iff X is zero length, or the first
character of X < the first character of Y, or the first characters are
equal and the rest of X < the rest of Y. For example, these strings
are in alphabetical order: "A", "AB", "ABC", "BA", "BAAA", "BAC".

James Dow Allen

unread,

Feb 11, 2010, 2:59:52 AM2/11/10

to

On Feb 11, 8:59 am, John Kristian <jk2...@engineer.com> wrote:
> How can numbers be represented as character strings, in such a way
> that the alphabetical order of the strings is the same as the order of
> the corresponding numbers?

A simple scheme that comes to mind is the Elias Gamma code,
rather popular in data compression. There are many versions
of that code; here's one adjusted to have the desired sorting
property. (Elias Gamma code operates on the positive integers.
After describing it, I'll show how to then get the reals.)

Express the positive integer in binary, drop the leading one, count
the bits remaining (call the count N), prefix N 1's followed by zero.

For example, to express 25, follow the above instructions to
get
25 -> 11001 -> 1001 (N=4) -> (11110)1001 -> 111101001

Since 0 cannot be coded (and 1 will cause trouble, see below),
increase the integer by 2 before using the above code.
Elias' code has an "instaneous decoding" property; you don't
need that but it has the benefit that you won't need a decimal
point to separate the fractional part.

For a positive real number, say 10111.000111 in binary, your code
will be
111101001000111
Starting with the integer part, 10111 = 23, I added 2 as mentioned
above, took the Elias code for 25, concatenated the fractional part.

For negative number (-x), simply use the inverse (x), but change all
0's to 1's and vice versa. Zero will be no problem (e.g. use "1").

This all uses a 2-sized alphabet, instead of 26-sized.
Adjust to suit. YMMV.

* * * * * * * *

Another almost-example that comes to mind is the (normalized)
floating-point number format used on certain old computers like CDC's.
They *did* have your desired property for the numbers they could
represent. However, they couldn't represent *all* real numbers.

The Hackenbush number system also comes to mind.

James Dow Allen

Mensanator

unread,

Feb 11, 2010, 2:40:14 PM2/11/10

to

On Feb 10, 7:59 pm, John Kristian <jk2...@engineer.com> wrote:
> How can numbers be represented as character strings, in such a way
> that the alphabetical order of the strings is the same as the order of
> the corresponding numbers?

Use fixed width padded with spaces.

In MS-Access, I use the function format$([number],"@@@@@@@@") which
returns

" 9"
" 10"
" 934567"

whose alphabetical sort matches its numerical sort.

>
> For example, the integers 1, 2, ... can be represented as the letters
> A, B, ... but this system doesn't extend beyond a limited range of
> integers. Ordinary Hindu-Arabic notation works for rational numbers
> in the interval [0, 10), but "10" < "2" alphabetically. A more
> general system would extend to all real numbers.

Split the characteristic from the mantissa, use padded fixed width
for the characteristic and append the mantissa. This will align the
decimal points and again, alphabetical sort matches numerical sort.

Examples:

" 1.12345"
" 933.011"
" 1000.3"
" 1000.534567"

John Kristian

unread,

Feb 11, 2010, 4:49:01 PM2/11/10

to

Fixed width can represent only a limited interval, roughly [0, R**W)
where R is the number of characters in the alphabet and W is the width
of the string that represents the integer part.

mike

unread,

Feb 11, 2010, 6:08:20 PM2/11/10

to

In article <421d22bf-306e-4818-a7ab-
58f272...@u5g2000prd.googlegroups.com>, jk2...@engineer.com says...

How about adding prepending the Hindu-Arabic number string with one 'a'
for each digit that appears before the decimal point. So 2 turns into
'a2', and 10 turns into 'aa10'. Now 3.0 < 3.14159 < 12.345 and 'a3.0' is
alphabetically after 'a3.14159' which similarly is alphabetically after
'aa12.345' etc. I think that should work for all +ve reals. If you want
to include the -ve reals then find a character (lets call it #) that is
alphabetically after any of (a,0,1....9) and prepend negatives with a
suitable number of #'s. I.e. 10 > 3 > -1 > -11 and 'aa10', 'a3', '#-1',
'##-10' are all in alphabetical order (assuming '#' appears later in the
alphabet than 'a' or '0'...'9').

Cheers
Mike

Tim Little

unread,

Feb 11, 2010, 7:03:01 PM2/11/10

to

On 2010-02-11, John Kristian <jk2...@engineer.com> wrote:
> How can numbers be represented as character strings, in such a way
> that the alphabetical order of the strings is the same as the order of
> the corresponding numbers?

Quite easily: e.g. "A", "AA", "AAA", etc. It is easy to devise more
compact encodings.

However, you cannot use all the possible strings to represent natural
numbers, since between any two distinct strings there are infinitely
many other strings in their ordering: a property not true of natural
numbers.

> For the sake of discussion, let's use English characters, ordered by
> US-ASCII code. The numerals come first, followed by upper case
> letters and then lower case letters, with punctuation interspersed.

For example: Let N be the number of initial '~' characters (the last
printable US-ASCII character). They are followed by 2^N characters
(not beginning with '~') representing a number in base 95. Only the
shortest representation for a given number shall be considered valid.

There are a great many variations on this sort of rule.

- Tim

Mensanator

unread,

Feb 11, 2010, 7:39:48 PM2/11/10

to

On Feb 11, 3:49 pm, John Kristian <jk2...@engineer.com> wrote:
> Fixed width can represent only a limited interval, roughly [0, R**W)
> where R is the number of characters in the alphabet and W is the width
> of the string that represents the integer part.
>

Is that of any practical concern? Certainly not in my work.

John Kristian

unread,

Feb 12, 2010, 2:03:35 PM2/12/10

to

On Feb 11, 3:08 pm, mike <m....@irl.cri.replacethiswithnz> wrote:
> How about adding prepending the Hindu-Arabic number string
> with one 'a' for each digit that appears before the decimal point.

That works for non-negative real numbers. James Dow Allen pointed out
a way to get the negative integers; that is to construct the string
for the absolute value (a positive integer), but with the alphabet
reversed. For example, 2, 1, 0, -1, -2 would be represented as "a2",
"a1", "a0", "98", "97". Negative zero is "9a". But I can't think how
to do the negative non-integers. -1 > -1.1, but "98" <
"98.8" (different alphabetical order).

Peter Webb

unread,

Feb 12, 2010, 9:07:33 PM2/12/10

to

"John Kristian" <jk2...@engineer.com> wrote in message
news:cc0ff8d4-f22e-42ca...@y7g2000prc.googlegroups.com...

Fixed width can represent only a limited interval, roughly [0, R**W)
where R is the number of characters in the alphabet and W is the width
of the string that represents the integer part.

_______________________________________
Computer floating points can also only represent a limitted interval. Just
make sure that the string range is larger than the double point range.

Tim Little

unread,

Feb 13, 2010, 10:33:42 PM2/13/10

to

On 2010-02-12, John Kristian <jk2...@engineer.com> wrote:
> That works for non-negative real numbers. James Dow Allen pointed out
> a way to get the negative integers; that is to construct the string
> for the absolute value (a positive integer), but with the alphabet
> reversed. For example, 2, 1, 0, -1, -2 would be represented as "a2",
> "a1", "a0", "98", "97". Negative zero is "9a". But I can't think how
> to do the negative non-integers. -1 > -1.1, but "98" <
> "98.8" (different alphabetical order).

You can fairly easily map all of Q into strings.

First, use any of the usual order-preserving bijections of Q with
Q /\ (0,1), e.g. piecewise linear mappings from [-(n+1), -n] to
[1/(n+3), 1/(n+2)] and [n, n+1] to [1 - 1/(n+2), 1 - 1/(n+3)].

Then use a "factorial" fraction representation. All rationals
terminate in this form, though large prime factors in the denominator
will entail a long representation. A simple example:
4/7 = 1/2! + 0/3! + 1/4! + 3/5! + 3/6! + 3/7!.

Finally you can encode each digit: e.g. (in letters) "BABDDD" for 4/7.
If you get to 96! you will need character pairs to represent all the
possible digits in printable US-ASCII, triples at (95^2+1)!, etc.
Some of the earlier digits could be combined to slightly better fill
the available character set, e.g. 4/7 = 13/4! + 21/6! + 24/8!.

- Tim