Best search algorithm to find condition within a range

jonas.t...@gmail.com

unread,

Apr 7, 2015, 5:44:43 AM4/7/15

to

I want todo faster baseconversion for very big bases like base 1 000 000, so instead of adding up digits i search it.

I need the fastest algorithm to find the relation to a decimal number.
Digmult is an instance of base at a digitplace (base^x) what i try to find is the digit for the below condition is true and the loop break.

*********************************
for (digit=0;digit<=base;digit++) {
if((digit+1)*digmult>decNumber)break;
}
*********************************

So i am looking for the digit where following condition true.

if((digit)*digmult<decNumber) AND if((digit+1)*digmult>decNumber) then BREAK;

One could start at half base searching, but then i Think i've read that using 1/3 closing in faster?

I Think also i remember that if the search space so big that at least 22 or 23 guesses, needed.A random Oracle may even faster?

Just pick up a number and get lucky, is it any truth to that?

Below the actual algorithm.

<SCRIPT LANGUAGE="Javascript">
//CONVERT A DECIMAL NUMBER INTO ANYBASE
function newbase(decNumber,base){
digits=1;
digmult=1;
while(digmult*base<=decNumber){
digmult=digmult*base
digits++;
}
digsave=digmult;
while(decNumber>0 || digits>0){
loop=1;
digit=0;
for (digit=0;digit<=base;digit++) {
if((digit+1)*digmult>decNumber)break;
}
out[digits]=digit;
digmult=digmult*digit;
decNumber=decNumber-digmult;
digsave=digsave/base;
digmult=digsave;
digits--;
}
return out;
}

var out= [];
base=256;
number=854544;
out=newbase(number,base);
out.reverse();
document.write("Number = ",out,"<BR>");
</script>

Dave Angel

unread,

Apr 7, 2015, 9:30:36 AM4/7/15

to pytho...@python.org

On 04/07/2015 05:44 AM, jonas.t...@gmail.com wrote:
>
>
> I want todo faster baseconversion for very big bases like base 1 000 000, so instead of adding up digits i search it.

For this and most of the following statements: I can almost guess what
you're trying to say. However, I cannot. No idea why you're adding up
digits, that sounds like casting out nines. And in base-N, that would
be casting out (N-1)'s.

What's the it you're trying to search?

How do you know the baseconversion is the bottleneck, if you haven't
written any Python code yet?

>
> I need the fastest algorithm to find the relation to a decimal number.

What relation would that be? Between what and what?

> Digmult is an instance of base at a digitplace (base^x) what i try to find is the digit for the below condition is true and the loop break.
>

You haven't defined a class "Base" yet. In fact, I don't see any Python
code in the whole message.

>
> *********************************
> for (digit=0;digit<=base;digit++) {
> if((digit+1)*digmult>decNumber)break;
> }
> *********************************

>
> So i am looking for the digit where following condition true.
>
> if((digit)*digmult<decNumber) AND if((digit+1)*digmult>decNumber) then BREAK;

You could try integer divide. That's just something like
digit = decNumber // digmult
But if you think hard enough you'd realize that

If that code were in Python, I could be more motivated to critique it.
The whole algorithm could be much simpler. But perhaps there is some
limitation of javascript that's crippling the code.

How would you do it if you were converting the base by hand? I
certainly wouldn't be doing any trial and error. For each pass, I'd
calculate quotient and remainder, where remainder is the digit, and
quotient is the next value you work on.

--
DaveA

jonas.t...@gmail.com

unread,

Apr 7, 2015, 10:11:07 AM4/7/15

to

I am doing it just like i would do it by hand finding the biggest digit first. To do that i need to know nearest base^exp that is less than the actual number. Add up the digit (multiply) it to the nearest smaller multiple. Subtract that number (base^exp*multiple).

Divide / Scale down the exponent with base. And record the digit.
And start looking for next digit doing same manipulation until remainder = 0.

And that is what i am doing.

Dave Angel

unread,

Apr 7, 2015, 10:27:20 AM4/7/15

to pytho...@python.org

On 04/07/2015 10:10 AM, jonas.t...@gmail.com wrote:
> Den tisdag 7 april 2015 kl. 15:30:36 UTC+2 skrev Dave Angel:

<snip>

>>
>> If that code were in Python, I could be more motivated to critique it.
>> The whole algorithm could be much simpler. But perhaps there is some
>> limitation of javascript that's crippling the code.
>>
>> How would you do it if you were converting the base by hand? I
>> certainly wouldn't be doing any trial and error. For each pass, I'd
>> calculate quotient and remainder, where remainder is the digit, and
>> quotient is the next value you work on.
>>
>>
>> --
>> DaveA
>
> I am doing it just like i would do it by hand finding the biggest digit first. To do that i need to know nearest base^exp that is less than the actual number. Add up the digit (multiply) it to the nearest smaller multiple. Subtract that number (base^exp*multiple).
>
> Divide / Scale down the exponent with base. And record the digit.
> And start looking for next digit doing same manipulation until remainder = 0.
>
> And that is what i am doing.
>

Then I don't know why you do the call to reverse() in the top-level code.

If I were doing it, I'd have no trial and error in the code at all.
Generate the digits right to left, then reverse them before returning.

For example, if you want to convert 378 to base 10 (it's binary
internally), you'd divide by 10 to get 37, remainder 8. Save the 8, and
loop again. Divide 37 by 10 and get 3, remainder 7. Save the 7. Divide
again by 10 and get 0, remainder 3. Save the 3

Now you have '8', '7', '3' So you reverse the list, and get
'3', '7', '8'

--
DaveA

Denis McMahon

unread,

Apr 7, 2015, 10:30:15 AM4/7/15

to

On Tue, 07 Apr 2015 09:29:59 -0400, Dave Angel wrote:

> On 04/07/2015 05:44 AM, jonas.t...@gmail.com wrote:

>> I want todo faster baseconversion for very big bases like base 1 000
>> 000, so instead of adding up digits i search it.

> How do you know the baseconversion is the bottleneck, if you haven't
> written any Python code yet?

He doesn't. He doesn't comprehend that as far as a computer is concerned
an integer has no specific 'base', it's only when presented in a form for
humans to read that it gets base information added in the representation.

He's making these and other similar errors in the javascript groups too.

I suspect he's one of those people that spends his time thinking up
elaborate solutions that he has no idea how to implement as a response to
dreamt up non existent problems.

--
Denis McMahon, denismf...@gmail.com

Ian Kelly

unread,

Apr 7, 2015, 10:32:56 AM4/7/15

to Python

On Tue, Apr 7, 2015 at 3:44 AM, <jonas.t...@gmail.com> wrote:
>
>
> I want todo faster baseconversion for very big bases like base 1 000 000, so instead of adding up digits i search it.
>
> I need the fastest algorithm to find the relation to a decimal number.
> Digmult is an instance of base at a digitplace (base^x) what i try to find is the digit for the below condition is true and the loop break.
>
>
> *********************************
> for (digit=0;digit<=base;digit++) {
> if((digit+1)*digmult>decNumber)break;
> }
> *********************************
>
> So i am looking for the digit where following condition true.
>
> if((digit)*digmult<decNumber) AND if((digit+1)*digmult>decNumber) then BREAK;

I'm not sure that I understand what it is that you're trying to
accomplish. Are you trying to find the digits without using division
because "division is slow"? If that's the case, then let me show you
something.

$ python -m timeit -s "n = 1523837293" "n // 1000000"
1000000 loops, best of 3: 0.286 usec per loop

$ python -m timeit -s "n = 1523" "n * 1000000; (n+1) * 1000000"
1000000 loops, best of 3: 0.455 usec per loop

In Python, one addition and two multiplications are already slower
than a single division. The only way you're going to beat division by
using trial multiplication is if the first digit that you try is
always correct. To do that, you would need an oracle feeding your
search algorithm, and then you might as well just use the oracle.

> One could start at half base searching, but then i Think i've read that using 1/3 closing in faster?

Do you mean binary search? That would be an improvement over the
linear search algorithm you've shown. Whether a trinary search might
be faster would depend on the distribution of the numbers you expect.
If they're evenly distributed, it will be slower.

> I Think also i remember that if the search space so big that at least 22 or 23 guesses, needed.A random Oracle may even faster?
>
> Just pick up a number and get lucky, is it any truth to that?

On average, a random Oracle with a search space of 1000000 will need
1000000 guesses.

Chris Angelico

unread,

Apr 7, 2015, 10:34:44 AM4/7/15

to pytho...@python.org

On Wed, Apr 8, 2015 at 12:26 AM, Dave Angel <da...@davea.name> wrote:
> For example, if you want to convert 378 to base 10 (it's binary internally),
> you'd divide by 10 to get 37, remainder 8. Save the 8, and loop again.
> Divide 37 by 10 and get 3, remainder 7. Save the 7. Divide again by 10 and
> get 0, remainder 3. Save the 3
>
> Now you have '8', '7', '3' So you reverse the list, and get
> '3', '7', '8'

Technically, it doesn't matter that it's stored in binary. All that
matters is that it's stored in some way that you can perform division
on. I used to do this kind of thing in assembly language, pushing
digits onto the stack, then popping them off afterward. (It's actually
easier to make a column of numbers right-justified, as you can simply
create a buffer of the right size - assuming you're working with a
machine word and can know the maximum size - and populate it from the
far end.) But yes, this is the standard way to do base conversions.

ChrisA

jonas.t...@gmail.com

unread,

Apr 7, 2015, 10:36:31 AM4/7/15

to

Bullshit declare two integers in any language one 7 and one 4 and then write x=7+4; if you find a programming language where that does not yield 11 tell me.

Integers are internally assumed to be base 10 otherwise you could not calculate without giving the base.

All operations on integers addition, subtraction, multiplication and division assume base 10.

Chris Angelico

unread,

Apr 7, 2015, 10:49:18 AM4/7/15

to pytho...@python.org

On Wed, Apr 8, 2015 at 12:36 AM, <jonas.t...@gmail.com> wrote:
> Bullshit declare two integers in any language one 7 and one 4 and then write x=7+4; if you find a programming language where that does not yield 11 tell me.
>
> Integers are internally assumed to be base 10 otherwise you could not calculate without giving the base.
>
> All operations on integers addition, subtraction, multiplication and division assume base 10.

You misunderstand how computers and programming languages work. What
you're seeing there is that *integer literals* are usually in base 10;
and actually, I can point to plenty of assembly languages where the
default isn't base 10 (it's usually base 16 (hexadecimal) on IBM PCs,
and probably base 8 (octal) on big iron). This is nothing to do with
the internal representation, and all to do with source code.

ChrisA

MRAB

unread,

Apr 7, 2015, 11:00:53 AM4/7/15

to pytho...@python.org

On 2015-04-07 15:36, jonas.t...@gmail.com wrote:
> Den tisdag 7 april 2015 kl. 16:30:15 UTC+2 skrev Denis McMahon:
>> On Tue, 07 Apr 2015 09:29:59 -0400, Dave Angel wrote:
>>
>>> On 04/07/2015 05:44 AM, jonas.t...@gmail.com wrote:
>>
>>>> I want todo faster baseconversion for very big bases like base
>>>> 1 000 000, so instead of adding up digits i search it.
>>
>>> How do you know the baseconversion is the bottleneck, if you
>>> haven't written any Python code yet?
>>
>> He doesn't. He doesn't comprehend that as far as a computer is
>> concerned an integer has no specific 'base', it's only when
>> presented in a form for humans to read that it gets base
>> information added in the representation.
>>
>> He's making these and other similar errors in the javascript groups
>> too.
>>
>> I suspect he's one of those people that spends his time thinking
>> up elaborate solutions that he has no idea how to implement as a
>> response to dreamt up non existent problems.
>>

> Bullshit declare two integers in any language one 7 and one 4 and
> then write x=7+4; if you find a programming language where that does
> not yield 11 tell me.
>
> Integers are internally assumed to be base 10 otherwise you could not
> calculate without giving the base.
>
> All operations on integers addition, subtraction, multiplication and
> division assume base 10.
>

Sorry to say this, but that's nonsense.

It doesn't matter what base it's working in internally; usually it's
base 2 (binary), because that's simpler to implement.

It's only when you're converting from or to text that you need specify
a base. Humans prefer base 10, so they've make that the default.

Dave Angel

unread,

Apr 7, 2015, 11:05:07 AM4/7/15

to pytho...@python.org

On 04/07/2015 10:36 AM, jonas.t...@gmail.com wrote:

> All operations on integers addition, subtraction, multiplication and division assume base 10.
>

There have been machines where that was true, but I haven't worked on
such for about 30 years. On any machines I've programmed lately, the
arithmetic is done in binary by default, and only converted to decimal
for printing.

Not that the internal base is usually relevant, of course.

--
DaveA

Grant Edwards

unread,

Apr 7, 2015, 11:05:33 AM4/7/15

to

On 2015-04-07, Chris Angelico <ros...@gmail.com> wrote:
> On Wed, Apr 8, 2015 at 12:36 AM, <jonas.t...@gmail.com> wrote:
>
>> Integers are internally assumed to be base 10 otherwise you could not
>> calculate without giving the base.
>>
>> All operations on integers addition, subtraction, multiplication and
>> division assume base 10.
>
> You misunderstand how computers and programming languages work. What
> you're seeing there is that *integer literals* are usually in base
> 10; and actually, I can point to plenty of assembly languages where
> the default isn't base 10 (it's usually base 16 (hexadecimal) on IBM
> PCs, and probably base 8 (octal) on big iron).

I'd be curious to see some of those assemblers. I've used dozens of
assemblers over the years for everything from microprocessors with a
few hundred bytes of memory to mini-computers and mainframes. I've
never seen one that didn't default to base 10 for integer literals.

I'm not saying they don't exist, just that it would be interesting to
see an example of one.

--
Grant Edwards grant.b.edwards Yow! I'm a fuschia bowling
at ball somewhere in Brittany
gmail.com

Ian Kelly

unread,

Apr 7, 2015, 11:07:36 AM4/7/15

to Python

You're conflating the internal representation of the integer with the
formatting that is done to display the integer. When you do
"print(x)", the computer doesn't just dump the internal representation
of x onto the display. It formats x as character data and displays
*that*. For integers, the vast majority of programming languages will
do the formatting as base 10 by default, since that is the format
preferred by most humans.

Dave Angel

unread,

Apr 7, 2015, 11:24:19 AM4/7/15

to pytho...@python.org

I can't "show" it to you, but the assembler used to write microcode on
the Wang labs 200VP and 2200MVP used hex for all its literals. I wrote
the assembler (and matching debugger-assembler), and if we had needed
other bases I would have taken an extra day to add them in.

That assembler was not available to our customers, as the machine
shipped with the microcode in readonly form. Not quite as readonly as
the Intel processors of today, of course.

Additionally, the MSDOS DEBUG program used hex to enter in its literals,
if i recall correctly. Certainly when it disassembled code, it was in hex.

--
DaveA

Chris Angelico

unread,

Apr 7, 2015, 11:38:12 AM4/7/15

to pytho...@python.org

On Wed, Apr 8, 2015 at 1:23 AM, Dave Angel <da...@davea.name> wrote:
> Additionally, the MSDOS DEBUG program used hex to enter in its literals, if
> i recall correctly. Certainly when it disassembled code, it was in hex.

Indeed, and that's where I learned 80x86 assembly coding (I didn't
have an actual assembler at the time). DEBUG didn't even have the
option of using other bases; other assemblers might give you "decimal
by default, or adorn them for other bases" (eg MASM's &H notation),
but DEBUG forced you to convert to hex manually.

Although, to be fair, DEBUG was said to have a "mini-assembler" built
in. It was never designed to replace actual assemblers, AFAIK. I just
happened to use it that way. :)

ChrisA

jonas.t...@gmail.com

unread,

Apr 7, 2015, 11:41:12 AM4/7/15

to

Well of course you use same principles like a binary search setting min and max, closing in on the digit. In this case the searched numbers > base^exp and number< base^exp+1.

But since the search is within large bases upto 32-bit space, so base 4294967295 is the biggest allowed. I need to find the nearest less exp in base for each (lets call them pseudo digits). But as you see there it will take time to add them up. So better doing a binary search, you know min-max half (iteration). You can do the same for a random oracle min max within range, and if the number of tries in general over 22 i think a random oracle do it better than a binary search.

It was a long time since i did this, but i do know there is a threshold where searching min max with the oracle will be faster than the binary search.

jonas.t...@gmail.com

unread,

Apr 7, 2015, 11:43:28 AM4/7/15

to

No that is not what i am saying, i am saying if you do operations on two integers the machine will assume base 10.

jonas.t...@gmail.com

unread,

Apr 7, 2015, 11:47:53 AM4/7/15

to

No i don't i say the operations assume base ten other wise INT A=7,B=4 Calculate C=A+B would not yield 11 as an answer. The programming languages assume integer operations are done in base 10, at least the one where you can not specify in what base the integer arithmetic is done. And there probably is such languages, Python maybe?

MRAB

unread,

Apr 7, 2015, 12:02:36 PM4/7/15

to pytho...@python.org

I have a book called "Choosing and using 4 Bit Microcontrollers" by
Philip McDowell. In it is an example assembly listing for an OKI 6351
microcontroller that uses unadorned hexadecimal literals.

jonas.t...@gmail.com

unread,

Apr 7, 2015, 12:18:17 PM4/7/15

to

Den tisdag 7 april 2015 kl. 16:30:15 UTC+2 skrev Denis McMahon:

Well Denis if it hasn't a base then how can you add integers, the integer operations assume base 10.

Dave Angel

unread,

Apr 7, 2015, 12:34:32 PM4/7/15

to pytho...@python.org

On 04/07/2015 11:40 AM, jonas.t...@gmail.com wrote:
> Den tisdag 7 april 2015 kl. 16:32:56 UTC+2 skrev Ian:
>> On Tue, Apr 7, 2015 at 3:44 AM, <jonas.t...@gmail.com> wrote:
>>>
>>>
>>> I want todo faster baseconversion for very big bases like base 1 000 000, so instead of adding up digits i search it.
>>>
>>> I need the fastest algorithm to find the relation to a decimal number.
>>> Digmult is an instance of base at a digitplace (base^x) what i try to find is the digit for the below condition is true and the loop break.
>>>
>>>
>>> *********************************
>>> for (digit=0;digit<=base;digit++) {
>>> if((digit+1)*digmult>decNumber)break;
>>> }
>>> *********************************
>>>

<snip>

>>
>>> One could start at half base searching, but then i Think i've read that using 1/3 closing in faster?
>>
>> Do you mean binary search? That would be an improvement over the
>> linear search algorithm you've shown. Whether a trinary search might
>> be faster would depend on the distribution of the numbers you expect.
>> If they're evenly distributed, it will be slower.
>>
>>> I Think also i remember that if the search space so big that at least 22 or 23 guesses, needed.A random Oracle may even faster?
>>>
>>> Just pick up a number and get lucky, is it any truth to that?
>>
>> On average, a random Oracle with a search space of 1000000 will need
>> 1000000 guesses.
>
> Well of course you use same principles like a binary search setting min and max, closing in on the digit. In this case the searched numbers > base^exp and number< base^exp+1.
>
> But since the search is within large bases upto 32-bit space, so base 4294967295 is the biggest allowed. I need to find the nearest less exp in base for each (lets call them pseudo digits). But as you see there it will take time to add them up. So better doing a binary search, you know min-max half (iteration). You can do the same for a random oracle min max within range, and if the number of tries in general over 22 i think a random oracle do it better than a binary search.
>
> It was a long time since i did this, but i do know there is a threshold where searching min max with the oracle will be faster than the binary search.
>

Once again, there's no point in doing a search, when a simple integer
divide can give you the exact answer. And there's probably no point in
going left to right when right to left would yield a tiny, fast program.

I haven't seen one line of Python from you yet, so perhaps you're just
yanking our chain. I'm not here to optimize Javascript code.

Using only Python 3.4 and builtin functions, this function can be
implemented straightforwardly in 7 lines, assuming number is nonnegative
integer, and base is positive integer. It definitely could be done
smaller, but then the code might be more confusing.

--
DaveA

jonas.t...@gmail.com

unread,

Apr 7, 2015, 1:08:12 PM4/7/15

to

So you can tell me the first (higest) digit of the integer 2932903594368438384328325832983294832483258958495845849584958458435439543858588435856958650865490

Using base 429496729?

How long time did it take to find it?

Ian Kelly

unread,

Apr 7, 2015, 1:46:36 PM4/7/15

to Python

>>> def to_base(number, base):
... digits = []
... while number > 0:
... digits.append(number % base)
... number //= base
... return digits or [0]
...
>>> to_base(2932903594368438384328325832983294832483258958495845849584958458435439543858588435856958650865490, 429496729)
[27626525, 286159541, 134919277, 305018215, 329341598, 48181777,
79384857, 112868646, 221068759, 70871527, 416507001, 31]

> How long time did it take to find it?

About 15 microseconds.

Terry Reedy

unread,

Apr 7, 2015, 2:56:51 PM4/7/15

to pytho...@python.org

On 4/7/2015 1:44 PM, Ian Kelly wrote:

>>>> def to_base(number, base):
> ... digits = []
> ... while number > 0:
> ... digits.append(number % base)
> ... number //= base
> ... return digits or [0]
> ...
>>>> to_base(2932903594368438384328325832983294832483258958495845849584958458435439543858588435856958650865490, 429496729)
> [27626525, 286159541, 134919277, 305018215, 329341598, 48181777,
> 79384857, 112868646, 221068759, 70871527, 416507001, 31]

> About 15 microseconds.

% and probably // call divmod internally and toss one of the results.
Slightly faster (5.7 versus 6.1 microseconds on my machine) is

def to_base_dm(number, base):
digits = []
while number > 0:
number, rem = divmod(number, base)
digits.append(rem)
return digits or [0]

--
Terry Jan Reedy

Ian Kelly

unread,

Apr 7, 2015, 3:20:50 PM4/7/15

to Python

On Tue, Apr 7, 2015 at 12:55 PM, Terry Reedy <tjr...@udel.edu> wrote:
> On 4/7/2015 1:44 PM, Ian Kelly wrote:
>
>>>>> def to_base(number, base):
>>
>> ... digits = []
>> ... while number > 0:
>> ... digits.append(number % base)
>> ... number //= base
>> ... return digits or [0]
>> ...
>>>>>
>>>>>
>>>>> to_base(2932903594368438384328325832983294832483258958495845849584958458435439543858588435856958650865490,
>>>>> 429496729)
>>
>> [27626525, 286159541, 134919277, 305018215, 329341598, 48181777,
>> 79384857, 112868646, 221068759, 70871527, 416507001, 31]
>> About 15 microseconds.
>
>
> % and probably // call divmod internally and toss one of the results.
> Slightly faster (5.7 versus 6.1 microseconds on my machine) is

Not on my box.

$ python3 -m timeit -s "n = 1000000; x = 42" "n % x; n // x"
10000000 loops, best of 3: 0.105 usec per loop
$ python3 -m timeit -s "n = 1000000; x = 42" "divmod(n,x)"
10000000 loops, best of 3: 0.124 usec per loop

Chris Angelico

unread,

Apr 7, 2015, 3:25:51 PM4/7/15

to Python

On Wed, Apr 8, 2015 at 5:19 AM, Ian Kelly <ian.g...@gmail.com> wrote:
>> % and probably // call divmod internally and toss one of the results.
>> Slightly faster (5.7 versus 6.1 microseconds on my machine) is
>
> Not on my box.
>
> $ python3 -m timeit -s "n = 1000000; x = 42" "n % x; n // x"
> 10000000 loops, best of 3: 0.105 usec per loop
> $ python3 -m timeit -s "n = 1000000; x = 42" "divmod(n,x)"
> 10000000 loops, best of 3: 0.124 usec per loop

Nor mine, though eliminating the global lookup cuts some of the time:

rosuav@sikorsky:~$ python3 -m timeit -s "n = 1000000; x = 42" "n % x; n // x"
1000000 loops, best of 3: 0.231 usec per loop
rosuav@sikorsky:~$ python3 -m timeit -s "n = 1000000; x = 42" "divmod(n,x)"
1000000 loops, best of 3: 0.305 usec per loop
rosuav@sikorsky:~$ python3 -m timeit -s "n = 1000000; x = 42; dm =
divmod" "dm(n,x)"
1000000 loops, best of 3: 0.26 usec per loop

But this is microbenchmarking. It's pretty pointless.

ChrisA

Ben Bacarisse

unread,

Apr 7, 2015, 3:27:20 PM4/7/15

to

I get similar results, but the times switch over when n is large enough
to become a bignum.

--
Ben.

Ian Kelly

unread,

Apr 7, 2015, 3:29:24 PM4/7/15

to Python

On Tue, Apr 7, 2015 at 1:19 PM, Ian Kelly <ian.g...@gmail.com> wrote:
> On Tue, Apr 7, 2015 at 12:55 PM, Terry Reedy <tjr...@udel.edu> wrote:
>> On 4/7/2015 1:44 PM, Ian Kelly wrote:
>>
>>>>>> def to_base(number, base):
>>>
>>> ... digits = []
>>> ... while number > 0:
>>> ... digits.append(number % base)
>>> ... number //= base
>>> ... return digits or [0]
>>> ...
>>>>>>
>>>>>>
>>>>>> to_base(2932903594368438384328325832983294832483258958495845849584958458435439543858588435856958650865490,
>>>>>> 429496729)
>>>
>>> [27626525, 286159541, 134919277, 305018215, 329341598, 48181777,
>>> 79384857, 112868646, 221068759, 70871527, 416507001, 31]
>>> About 15 microseconds.
>>
>>
>> % and probably // call divmod internally and toss one of the results.
>> Slightly faster (5.7 versus 6.1 microseconds on my machine) is
>
> Not on my box.
>
> $ python3 -m timeit -s "n = 1000000; x = 42" "n % x; n // x"
> 10000000 loops, best of 3: 0.105 usec per loop
> $ python3 -m timeit -s "n = 1000000; x = 42" "divmod(n,x)"
> 10000000 loops, best of 3: 0.124 usec per loop

But curiously, if I time the whole function, then my results mirror
yours; I wonder why that is. I don't see anything obvious in the
disassembly that would explain it.

Ian Kelly

unread,

Apr 7, 2015, 4:16:33 PM4/7/15

to Python

On Tue, Apr 7, 2015 at 9:40 AM, <jonas.t...@gmail.com> wrote:
> Well of course you use same principles like a binary search setting min and max, closing in on the digit. In this case the searched numbers > base^exp and number< base^exp+1.
>
> But since the search is within large bases upto 32-bit space, so base 4294967295 is the biggest allowed. I need to find the nearest less exp in base for each (lets call them pseudo digits). But as you see there it will take time to add them up. So better doing a binary search, you know min-max half (iteration). You can do the same for a random oracle min max within range, and if the number of tries in general over 22 i think a random oracle do it better than a binary search.

Okay, so you're talking about doing a binary search but picking the
partition point randomly instead of evenly?

> It was a long time since i did this, but i do know there is a threshold where searching min max with the oracle will be faster than the binary search.

I don't know where you got this idea from, but it's fiction. If you
split the space in half, then you are always reducing the amount of
work remaining by a factor of 2, and the total work is O(log n). If
you do an uneven split, then while you will sometimes make a good
partition and reduce the amount of work by more than a factor of two,
more often the target will be in the larger subspace and you will have
reduced the amount of work by less than a factor of two. In the worst
case of this, you only remove one element from the search space at a
time and end up with a linear search.

As a demonstration of this, here's a function which does a binary
search for a number in a range and returns the number of partitions it
had to perform to find the number:

>>> def binsearch(n, min, max):
... if min == max: return 0
... med = (max + min) // 2
... if n <= med:
... return 1 + binsearch(n, min, med)
... else:
... return 1 + binsearch(n, med+1, max)
...

And here's the average number of partitions needed to find a number in
various ranges:

>>> sum(binsearch(i, 1, n) for i in range(1, n+1)) / n
16.68928
>>> n = 1000000
>>> sum(binsearch(i, 1, n) for i in range(1, n+1)) / n
19.951424
>>> n = 10000000
>>> sum(binsearch(i, 1, n) for i in range(1, n+1)) / n
23.3222784

Now let's try a variant function that splits the range at one third
instead of evenly:

>>> def trisearch(n, min, max):
... if min == max: return 0
... tern = (min + min + max) // 3
... if n <= tern:
... return 1 + trisearch(n, min, tern)
... else:
... return 1 + trisearch(n, tern+1, max)
...
>>> n = 100000
>>> sum(trisearch(i, 1, n) for i in range(1, n+1)) / n
17.88387
>>> n = 1000000
>>> sum(trisearch(i, 1, n) for i in range(1, n+1)) / n
21.502276
>>> n = 10000000
>>> sum(trisearch(i, 1, n) for i in range(1, n+1)) / n
25.1157592

The ternary version invariably takes more partitions on average to
find the goal number. Now let's try a third variant that partitions
the range randomly:

>>> def randsearch(n, min, max):
... if min == max: return 0
... part = randrange(min, max)
... if n <= part:
... return 1 + randsearch(n, min, part)
... else:
... return 1 + randsearch(n, part+1, max)
...
>>> n = 100000
>>> sum(randsearch(i, 1, n) for i in range(1, n+1)) / n
22.17863
>>> n = 1000000
>>> sum(randsearch(i, 1, n) for i in range(1, n+1)) / n
26.78371
>>> n = 10000000
>>> sum(randsearch(i, 1, n) for i in range(1, n+1)) / n
31.3916557

As you can see, this one does a lot worse on average, and at n=1e6
it's already performing worse than either of the previous versions did
at n=1e7. I'm not even going to try this for n=1e8 because it would
take too long to run.

Mark Lawrence

unread,

Apr 7, 2015, 4:42:33 PM4/7/15

to pytho...@python.org

On 07/04/2015 17:33, Dave Angel wrote:
> On 04/07/2015 11:40 AM, jonas.t...@gmail.com wrote:
>

> I haven't seen one line of Python from you yet, so perhaps you're just
> yanking our chain.
>

This wouldn't surprise me, he's the guy who refused point blank to
modify his use of gg some 18 months ago.

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

Serhiy Storchaka

unread,

Apr 7, 2015, 4:58:18 PM4/7/15

to pytho...@python.org

Try large numbers.

jonas.t...@gmail.com

unread,

Apr 7, 2015, 6:35:18 PM4/7/15

to

I am not sure you guys realised, that althoug the size of the factors to muliply expands according to base^(exp+1) for each digitplace the number of comparissons needed to reach the digit place (multiple of base^exp+1) is constant with my approach/method.

Steven D'Aprano

unread,

Apr 7, 2015, 6:37:11 PM4/7/15

to

On Wed, 8 Apr 2015 12:36 am, jonas.t...@gmail.com wrote:

> Bullshit declare two integers in any language one 7 and one 4 and then
> write x=7+4; if you find a programming language where that does not yield
> 11 tell me.

In Forth, you can set the base to any arbitrary integer (within reason).

steve@orac:/home/steve$ gforth
Gforth 0.7.0, Copyright (C) 1995-2008 Free Software Foundation, Inc.
Gforth comes with ABSOLUTELY NO WARRANTY; for details type `license'
Type `bye' to exit
7 4 + . 11 ok
hex ok
7 4 + . B ok
7 4 + ok
2 base ! ok
. 1011 ok

The dot . prints the value on the top of the stack. In Python terms, the
closest equivalent would be:

print (7 + 4) # prints 11 by default
# set the base to hex
print (7 + 4) # prints B
x = 7 + 4
# set the base to 2
print (x) # prints 1011

except that Python doesn't allow you to change the base used by ints, it is
always decimal.

In Forth, however, setting the base doesn't just change the *display* of
integers, it also changes how you enter them:

3
:7: Undefined word
>>>3<<<
Backtrace:
$B7252EDC throw
$B725F638 no.extensions
$B7253054 interpreter-notfound1
ok

So in base 2 mode, it doesn't recognise 3 as a number. If I want to enter
three, I have to enter it in binary:

11 ok
decimal ok
. 3 ok

> Integers are internally assumed to be base 10 otherwise you could not
> calculate without giving the base.

That is *absolutely not* the case in Forth. It's not even the case in
Python: integers are actually stored in either binary (base 2) or some very
large base, I think equivalent to base 256, depending on the version of
Python and the size of the int.

> All operations on integers addition, subtraction, multiplication and
> division assume base 10.

That's just ridiculous.

In Python, like most other languages, the only thing which assumes base 10
is entry and printing of integers.

--
Steven

Cameron Simpson

unread,

Apr 7, 2015, 6:51:58 PM4/7/15

to Steven D'Aprano, pytho...@python.org

On 08Apr2015 08:36, Steven D'Aprano <steve+comp....@pearwood.info> wrote:
>On Wed, 8 Apr 2015 12:36 am, jonas.t...@gmail.com wrote:
>> Bullshit declare two integers in any language one 7 and one 4 and then
>> write x=7+4; if you find a programming language where that does not yield
>> 11 tell me.
>
>In Forth, you can set the base to any arbitrary integer (within reason).

[...snip...]

In dc also (UNIX's reverse polish arbitrary precision "decimal calculator").
You can set the input and output representation bases, and proceed without
leading base indicators.

Cheers,
Cameron Simpson <c...@zip.com.au>

Hofstadter's Law: It always takes longer than you expect, even when you take
into account Hofstadter's Law.
- Douglas Hosfstadter, Godel, Escher, Bach: an Eternal Golden Braid

Steven D'Aprano

unread,

Apr 7, 2015, 6:52:05 PM4/7/15

to

On Wed, 8 Apr 2015 01:43 am, jonas.t...@gmail.com wrote:

> No that is not what i am saying, i am saying if you do operations on two
> integers the machine will assume base 10.

We understand what you are saying. You are simply WRONG.

There hasn't been a machine in common use that used other than binary for
integers for probably forty years now, and there has *never* been a PC that
has used other than binary for integers. When you write 12345 as an integer
in a programming language, it is NOT stored internally as five decimal
digits 1 2 3 4 5 in *nearly all languages*. There might be one or two
exceptions, e.g. Hypertalk would store it as a string of bytes 0x3132333435
in hexadecimal, or in binary 0011000100110010001100110011010000110101.

(As you can guess, Hypertalk is not the most efficient of languages.)

But most languages will store the decimal int 12345 in binary:

0011000000111001 # 16 bit word size

00000000000000000011000000111001 # 32 bit word size

(Ignoring Big Endian/Little Endian issues -- some machines store the bytes
the other way around.)

The fact is computer languages automatically convert source code from
(usually) decimal base to whatever their internal storage format uses,
which is normally base 2. They don't work in decimal, using the same
decimal routines you learned in school.

--
Steven

Steven D'Aprano

unread,

Apr 7, 2015, 6:55:32 PM4/7/15

to

On Wed, 8 Apr 2015 12:32 am, Ian Kelly wrote:

> On average, a random Oracle with a search space of 1000000 will need
> 1000000 guesses.

Surely on average it will only take 500000 guesses?

Best case is that it gets lucky the first time (1 guess). Worst case is that
it guesses every wrong answer until there is only one answer it hasn't
given, which is right (1000000 guesses). I assume that the Oracle is smart
enough to never repeat a guess.

--
Steven

Steven D'Aprano

unread,

Apr 7, 2015, 6:57:27 PM4/7/15

to

On Tue, 7 Apr 2015 07:44 pm, jonas.t...@gmail.com wrote:

> I want todo faster baseconversion for very big bases like base 1 000 000,
> so instead of adding up digits i search it.

What digits would you use for base one-million?

Base 2 uses 0 1.
Base 3 uses 0 1 2.
Base 10 uses 0 1 2 3 4 5 6 7 8 9.
Base 16 uses 0 1 2 3 4 5 6 7 8 9 A B C D E F.

Base one million uses what?

How would you write down 12345 in base one-million?

--
Steven

Ian Kelly

unread,

Apr 7, 2015, 7:32:05 PM4/7/15

to Python

I didn't make that assumption. With replacement, the mean is 1000000,
per the binomial distribution.

Message has been deleted

Steven D'Aprano

unread,

Apr 7, 2015, 8:18:35 PM4/7/15

to

On Wed, 8 Apr 2015 03:07 am, jonas.t...@gmail.com wrote:

> So you can tell me the first (higest) digit of the integer
>
2932903594368438384328325832983294832483258958495845849584958458435439543858588435856958650865490

First digit:
py> s
= '2932903594368438384328325832983294832483258958495845849584958458435439543858588435856958650865490'
py> s[0]
'2'

Highest digit:

py> max(s)
'9'

> Using base 429496729?

That makes no sense. With base 429496729 you need to have 429496729
different digits. Of course it is possible that, just by chance, you end up
with something that only includes the digits 0 through 9. Of course that is
possible, just like there are decimal numbers which only use digits 0
through 1, e.g. 10001.

If you want to go the other way, and convert s FROM decimal INTO base
429496729, you need to tell us what digits to use. You need 429496729
different digits.

> How long time did it take to find it?

A tiny fraction of a second.

py> from timeit import Timer
py> setup = "s
= '2932903594368438384328325832983294832483258958495845849584958458435439543858588435856958650865490'"
py> t = Timer("s[0]; max(s)", setup)
py> min(t.repeat())
7.158415079116821

That's just over 7 seconds to calculate both the first and the highest digit
one million times, or about 7 microseconds each time.

--
Steven

Dave Angel

unread,

Apr 7, 2015, 8:38:35 PM4/7/15

to pytho...@python.org

Baloney.

But even if it were true, a search is slower than a divide.

--
DaveA

Steven D'Aprano

unread,

Apr 7, 2015, 8:38:57 PM4/7/15

to

On Wed, 8 Apr 2015 03:44 am, Ian Kelly wrote:

>>>>
to_base(2932903594368438384328325832983294832483258958495845849584958458435439543858588435856958650865490,
>>>> 429496729)
> [27626525, 286159541, 134919277, 305018215, 329341598, 48181777,
> 79384857, 112868646, 221068759, 70871527, 416507001, 31]

They're not exactly *digits* though, are they? Without an easy to use set of
429496729 different symbols, the whole exercise is rather pointless. It's
not more compact: 97 decimal digits, versus 121 characters in the list
representation. 110 if you strip out the spaces between items. It's
certainly not more memory efficient: the long int 293...490 takes 56 bytes,
compared to 80 bytes for just the list, not including the memory used by
its 12 int items. (Results may vary in other versions of Python.) You can't
do arithmetic on it faster than Python's built-ins.

Besides, it isn't clear to me whether Jonas wants to convert decimal
293...490 *into* base 429496729 as you have done, or *base 429496729*
293...490 into decimal.

--
Steven

Gregory Ewing

unread,

Apr 7, 2015, 8:49:22 PM4/7/15

to

jonas.t...@gmail.com wrote:
> No i don't i say the operations assume base ten other wise INT A=7,B=4
> Calculate C=A+B would not yield 11 as an answer.

The answer, when converted to base 10, will still be
11 regardless of the base in which the arithmetic is
performed.

For example, in base 2:

A = 0111 (decimal 7)
B = 0100 (decimal 4)
----
C = 1011

Now convert binary 1011 to decimal and see what
you get.

--
Greg

Gregory Ewing

unread,

Apr 7, 2015, 9:00:12 PM4/7/15

to

Steven D'Aprano wrote:

> What digits would you use for base one-million?

Interesting question. Unicode currently has about
75,000 CJK characters, so we would need to find about
12 more independently developed cultures with similar
writing systems to get enough characters. I suggest
revisiting the issue when interstellar travel and
exploration have been better developed.

--
Greg

Steven D'Aprano

unread,

Apr 7, 2015, 9:18:24 PM4/7/15

to

On Wed, 8 Apr 2015 10:38 am, Steven D'Aprano wrote:

> On Wed, 8 Apr 2015 03:44 am, Ian Kelly wrote:
>
>>>>>
>
to_base(2932903594368438384328325832983294832483258958495845849584958458435439543858588435856958650865490,
>>>>> 429496729)
>> [27626525, 286159541, 134919277, 305018215, 329341598, 48181777,
>> 79384857, 112868646, 221068759, 70871527, 416507001, 31]
>
>
> They're not exactly *digits* though, are they?

Oh, I forgot... I think this is why Python long ints effectively uses a base
256 internal storage. If memory serves me correctly, internally a long int
is stored as an array of bytes using digits:

\x0 \x1 \x2 ... \xFF

(in decimal, 0 to 255). Each digit takes a single byte, so it's nicely
compact, and Python includes a bunch of fast algorithms for doing
arithmetic on these.

If the array is always a multiple of four bytes, then that is logically
equivalent to using base 4294967296 with 32-bit digits:

\x0\x0\x0\x0 \x0\x0\x0\x1 \x0\x0\x0\x2 ... \xFF\xFF\xFF\xFF

(in decimal, 0 to 4294967295). It's still fundamentally binary though, but I
suppose it's not entirely pointless to talk about base 2**32 or base 2**64
numbers, in the context of a high-performance Big Num implementation. But
it wouldn't be practical to use *arbitrary bases* up to 2**32 or 2**64, and
it certainly isn't practical to have human beings read or write numbers in
such large bases.

--
Steven

Mark Lawrence

unread,

Apr 7, 2015, 9:39:30 PM4/7/15

to pytho...@python.org

If it's available during PyCon can't we borrow the time machine? :)

Chris Angelico

unread,

Apr 7, 2015, 9:46:43 PM4/7/15

to pytho...@python.org

On Wed, Apr 8, 2015 at 8:51 AM, Steven D'Aprano
<steve+comp....@pearwood.info> wrote:
> There hasn't been a machine in common use that used other than binary for
> integers for probably forty years now, and there has *never* been a PC that
> has used other than binary for integers. When you write 12345 as an integer
> in a programming language, it is NOT stored internally as five decimal
> digits 1 2 3 4 5 in *nearly all languages*. There might be one or two
> exceptions, e.g. Hypertalk would store it as a string of bytes 0x3132333435
> in hexadecimal, or in binary 0011000100110010001100110011010000110101.
>
> (As you can guess, Hypertalk is not the most efficient of languages.)

FWIW, REXX does the same thing as Hypertalk - the integer 12345 and
the string "12345" are identical. If you add the string "1" and the
string "12345", you get the string "12346". (Concatenation is a
separate operation, and would result in the string "112345".) However,
Steven is still correct in that it is not stored as five decimal
digits; it's stored as five bytes in your system encoding. On my
systems, those were all ASCII-compatible, so it'd be 0x3132333435
(plus some metadata about string length and stuff, not germane to the
discussion), though I suspect that an EBCDIC system would store it as
0xF1F2F3F4F5 instead.

ChrisA

Chris Angelico

unread,

Apr 7, 2015, 9:49:50 PM4/7/15

to pytho...@python.org

You could use base 1,114,112 fairly readily in any decent modern
programming language. That'll happily represent base one-million.

ChrisA

Marko Rauhamaa

unread,

Apr 8, 2015, 12:31:26 AM4/8/15

to

Steven D'Aprano <steve+comp....@pearwood.info>:

> On Wed, 8 Apr 2015 03:07 am, jonas.t...@gmail.com wrote:
>> Using base 429496729?
>
> That makes no sense. With base 429496729 you need to have 429496729
> different digits.

You can use integers as digits. We already do that when we express times
and angles in base-60. I think even the Babylonians might have used such
structured digits.

>>> x = 2932903594368438384328325832983294832483258958495845849584958458435439543858588435856958650865490
>>> while x >= 429496729: x //= 429496729
...
>>> x
31

Marko

jonas.t...@gmail.com

unread,

Apr 8, 2015, 1:51:03 AM4/8/15

to

Well digit places have to be comma separated i think i told you, in low bases we have unique digits so there is no problems write out numbers without separaation of digitplaces. But when the base is higher than the number of digits you have a choice to make. Either you invent a new numeral(number digit.

I think you realise that BB = 11,11

So your example would of course be ,12345

I hope you see A,B,C,D,E,F is only replacement for 10,11,12,13,14,15

wxjm...@gmail.com

unread,

Apr 8, 2015, 2:08:04 AM4/8/15

to

=========

Why should a "digit" contain a single/unique character?

Representation of the number 257 in base 256:

257 (base 10) --> FF 02 (base 256)

jonas.t...@gmail.com

unread,

Apr 8, 2015, 2:12:18 AM4/8/15

to

No need Greg it is an easy task creating visual interpretable symbols from low to high using lines in a point/pixel space. I could create the set in a day.

And if i use colour space it is even easier.

The hard part is making the point space visible and interpretabel without calculation and holding remainders in head.

But then again when numbers have more than 5 digitplaces people tend to interpretate them digitwise anyway.

10010 is interpretated as a single digit by brain.
7582560 people mostly interpretate as 7 m 582 000 and 560.

You can tell how people group and think about numbers by having them reading up a very long telephone number.

wxjm...@gmail.com

unread,

Apr 8, 2015, 2:18:24 AM4/8/15

to

======

Oops, typo, erratum

*** 257 (base 10) --> 01 01 (base 256) ***

instead of

jonas.t...@gmail.com

unread,

Apr 8, 2015, 2:20:00 AM4/8/15

to

Well Steven you just draw a line under connecting them all, and now you are allowed to call whatever combination you write down a single digit.

And you have just created 429496729 unique symbols ;), in a pencil stroke.
I know at least one intergalactic poster that will be impressed ;)

Ian Kelly

unread,

Apr 8, 2015, 3:16:24 AM4/8/15

to Python

On Tue, Apr 7, 2015 at 4:35 PM, <jonas.t...@gmail.com> wrote:
> I am not sure you guys realised, that althoug the size of the factors to muliply expands according to base^(exp+1) for each digitplace the number of comparissons needed to reach the digit place (multiple of base^exp+1) is constant with my approach/method.

No it isn't. You do one comparison on every iteration of your while
loop, and you do one iteration for every digit. How is that constant?

jonas.t...@gmail.com

unread,

Apr 8, 2015, 3:28:23 AM4/8/15

to

Ok the number of comparissons is linear for each digitplace not exponential.

jonas.t...@gmail.com

unread,

Apr 8, 2015, 3:35:59 AM4/8/15

to

Den onsdag 8 april 2015 kl. 09:16:24 UTC+2 skrev Ian:

The for loop should be replaced with a version of binary search making comparisson instead of adding to digit. It will be alot faster with a base using 32-bits.

But the point is the number of operations is linear for each digitplace with my approach/method, you will multiply bigger numbers. But the numbers of comparissons and multiplications is linear over the digitspace.

Mel Wilson

unread,

Apr 8, 2015, 9:40:46 AM4/8/15

to

On Tue, 07 Apr 2015 23:19:49 -0700, jonas.thornvall wrote:

> And you have just created 429496729 unique symbols ;), in a pencil
> stroke.

No. You did that, when you said base 429496729. Representing the
symbols in a computer is no problem, any Python long int can do that. To
display the symbols, you can use PIL to make up glyphs out of coloured
pixels 864x864. You can keep your glyphs in GIFs. Where you keep them
is up to you. Keeping them in Tumbolia and computing them out as
required will work well.

Ian Kelly

unread,

Apr 8, 2015, 10:36:46 AM4/8/15

to Python

In other words, the first part of your algorithm where you find the
largest digit place takes O(log n) operations. But the second part
where you do a search for each digit takes O(b * log n) operations, so
the overall number of operations is O(b * log n). If you replace the
linear search with a binary search, that's still O(log b * log n).

Meanwhile, the division method that I and others have repeatedly
suggested does only one comparison and one division per digit, which
is just O(log n).

Note that this analysis doesn't take into account the complexity of
the individual arithmetic operations, but that should affect either
algorithm equally.

Ian Kelly

unread,

Apr 8, 2015, 10:38:20 AM4/8/15

to Python

On Tue, Apr 7, 2015 at 7:18 PM, Steven D'Aprano
<steve+comp....@pearwood.info> wrote:
> On Wed, 8 Apr 2015 10:38 am, Steven D'Aprano wrote:
>
>> On Wed, 8 Apr 2015 03:44 am, Ian Kelly wrote:
>>
>>>>>>
>>
> to_base(2932903594368438384328325832983294832483258958495845849584958458435439543858588435856958650865490,
>>>>>> 429496729)
>>> [27626525, 286159541, 134919277, 305018215, 329341598, 48181777,
>>> 79384857, 112868646, 221068759, 70871527, 416507001, 31]
>>
>>
>> They're not exactly *digits* though, are they?
>
> Oh, I forgot... I think this is why Python long ints effectively uses a base
> 256 internal storage. If memory serves me correctly, internally a long int
> is stored as an array of bytes using digits:
>
> \x0 \x1 \x2 ... \xFF
>
> (in decimal, 0 to 255). Each digit takes a single byte, so it's nicely
> compact, and Python includes a bunch of fast algorithms for doing
> arithmetic on these.

According to the comments in longintrepr.h, Python uses either 15- or
30-bit digits, determined at configure time.

jonas.t...@gmail.com

unread,

Apr 8, 2015, 10:56:15 AM4/8/15

to

Well many interpretate a numeral digit as a single unique symbol representing a number written out in one pensttroke. I just pointed out that using handwriting or underscore a compination can be considered a numerical digit in itself.

There is no need for inventing a new set of characters representing 32-bit numbers. You will not be able to learn them by heart anyway, unless they build on a interpretation system binaries, decimals.

Of course you could rather easily create a new set building on Points and lines and their interpretation.

Well just saying the Babylonians and Sumerians already did this, even the old greek had some sort of system with 120 unique digits. But what would be the need for such system when we so easy can create a unique permutation representing any number, using our 10 digits.

Denis McMahon

unread,

Apr 8, 2015, 1:28:34 PM4/8/15

to

Bzzzzttttt. Wrong.

0101(256) is 0 * 256^3 + 1 * 256^2 + 0 * 256^1 + 1 * 256^0

= 65537

The whole point of "base x" is that any number in the range 0 .. x^1 is
represented with a single characterisation, otherwise you don't have
"base x".

This is the same fundamental issue as the OP is failing to understand -
base x notation is a human readability and representation thing, not an
inherent feature of numbers.

--
Denis McMahon, denismf...@gmail.com

Mel Wilson

unread,

Apr 8, 2015, 1:34:39 PM4/8/15

to

On Wed, 08 Apr 2015 07:56:05 -0700, jonas.thornvall wrote:

> There is no need for inventing a new set of characters representing
> 32-bit numbers. You will not be able to learn them by heart anyway,
> unless they build on a interpretation system binaries, decimals.

See Jorge Luis Borges, _Funes the Memorious_. Gotta keep up with the
literature.

wxjm...@gmail.com

unread,

Apr 8, 2015, 1:51:04 PM4/8/15

to

====

No, not really.

01 01 in a *decimal* representation, a set of 10 characters.

01 01 means (should be understood as) 1 x 256**1 + 1 X 256**0 = 257

In other words, the usual "human" set of characters 0..9 with a
base != 10.

The idea is just to show, one can work with a "limited" set of chars
and it's no necessary to have the cardinal of a set of characters
corresponding to the base.

jonas.t...@gmail.com

unread,

Apr 8, 2015, 3:28:34 PM4/8/15

to

One thing is true though the bigger the chunks the less operations doing add.
So arithmetic may turn out to be obsolete and replaced with search in lookuptables.

jonas.t...@gmail.com

unread,

Apr 8, 2015, 3:36:43 PM4/8/15

to

When doing by hand and working within the digitspace of a single decimal digit like 3+4 the operation is implicit. You do not really add anything you use a lookup table.

But if you take an expression like 193+169, most people perform the arithmetic. Unless they do not happen to be your favourit numbers and you learnt the sum by heart.

Top down or bottom up is basicly the choices.

Mark Lawrence

unread,

Apr 8, 2015, 4:28:39 PM4/8/15

to pytho...@python.org

Are you trolling or do you simply not have the faintest idea?

Mark Lawrence

unread,

Apr 8, 2015, 4:31:01 PM4/8/15

to pytho...@python.org

I suggest that you put this on python ideas before writing up a PEP, it
seems like a sure fire winner to me.

BartC

unread,

Apr 8, 2015, 5:23:05 PM4/8/15

to

On 07/04/2015 23:57, Steven D'Aprano wrote:
> On Tue, 7 Apr 2015 07:44 pm, jonas.t...@gmail.com wrote:
>
>
>> I want todo faster baseconversion for very big bases like base 1 000 000,
>> so instead of adding up digits i search it.
>
> What digits would you use for base one-million?
>
> Base 2 uses 0 1.
> Base 3 uses 0 1 2.
> Base 10 uses 0 1 2 3 4 5 6 7 8 9.
> Base 16 uses 0 1 2 3 4 5 6 7 8 9 A B C D E F.
>
> Base one million uses what?
>
> How would you write down 12345 in base one-million?
>

That's only necessary when printing out the numbers as text (or reading
them in). Even then, you don't need a million unique symbols; groups of
letters or digits will do. The simplest way to represent a digit is
write it out in normal base 10.

So the the number 3,012,345, in base 1000000, could represented in text
form as the two 'digit':

(3, 12345)

ie. 3*1000000 + 12345*1. In internal binary, each digit can just be
stored in the normal form, probably as one digit per 32-bit integer.

(I have a big integer library that works exactly this way, using a
decimal representation, but using base 1000000 with digits 0 to 999999,
rather than base 10 and digits 0 to 9.)

--
Bartc

Marko Rauhamaa

unread,

Apr 8, 2015, 6:06:48 PM4/8/15

to

BartC <b...@freeuk.com>:

> So the the number 3,012,345, in base 1000000, could represented in
> text form as the two 'digit':
>
> (3, 12345)
>
> ie. 3*1000000 + 12345*1. In internal binary, each digit can just be
> stored in the normal form, probably as one digit per 32-bit integer.
>
> (I have a big integer library that works exactly this way, using a
> decimal representation, but using base 1000000 with digits 0 to
> 999999, rather than base 10 and digits 0 to 9.)

It is common to implement big integers in base-2**16, base-2**32 or
base-2**64. Arithmetics is performed just like in elementary school.

Marko

Chris Kaynor

unread,

Apr 8, 2015, 8:02:54 PM4/8/15

to Marko Rauhamaa, pytho...@python.org

While basically true, its not quite true: there are generally more
efficient but more complex algorithms for the computations than are
typically taught in school. For example, multiplication will often be
done with the Karatsuba algorithm, which is O(n^1.585)[1], rather than
long multiplication, which is O(n^2). Processors internally often use
shift-and-add, which is a variation on long multiplication, and is
fairly easy to implement in hardware for fixed-size integers.

[1] http://en.wikipedia.org/wiki/Karatsuba_algorithm

Steven D'Aprano

unread,

Apr 8, 2015, 10:32:40 PM4/8/15

to

On Wed, 8 Apr 2015 11:49 am, Chris Angelico wrote:

> You could use base 1,114,112 fairly readily in any decent modern
> programming language. That'll happily represent base one-million.

Well, not really...

Here is the breakdown of Unicode code points by category, as of Python 3.3:

# Other
Cc: 65 (control characters)
Cf: 139 (format characters)
Cn: 864415 (unassigned)
Co: 137468 (private use)
Cs: 2048 (surrogates)

# Letters
Ll: 1751 (lowercase)
Lm: 237 (modifier)
Lo: 97553 (other)
Lt: 31 (titlecase)
Lu: 1441 (uppercase)

# Marks
Mc: 353 (spacing combining)
Me: 12 (enclosing)
Mn: 1280 (nonspacing)

# Numbers
Nd: 460 (decimal digit)
Nl: 224 (letter)
No: 464 (other)

# Punctuation
Pc: 10 (connector)
Pd: 23 (dash)
Pe: 71 (close)
Pf: 10 (final quote)
Pi: 12 (initial quote)
Po: 434 (other)
Ps: 72 (open)

# Symbols
Sc: 48 (currency)
Sk: 115 (modifier)
Sm: 952 (math)
So: 4404 (other)

# Separator
Zl: 1 (line)
Zs: 18 (paragraph)
Zp: 1 (space)

Clearly we shouldn't use control or format characters, surrogates,
separators, marks, etc. (At least, I hope it is clear why you don't want,
say, newlines, to be used as digits.) Punctuation is borderline, as are
symbols, since that won't interoperate well with anything else. How can you
parse number+number if the numbers themselves might contain + signs? I
wouldn't use unassigned code points, as that is all but guaranteed to lead
to future problems, but I might reluctantly allow private use. That leaves
us the following which *may* be suitable:

Co: 137468 (private use)
Ll: 1751 (lowercase)
Lo: 97553 (other)
Lt: 31 (titlecase)
Lu: 1441 (uppercase)
Nd: 460 (decimal digit)
Nl: 224 (letter)
No: 464 (other)
Sc: 48 (currency)
Sm: 952 (math)
So: 4404 (other)

which comes to a total of 244796, far short of a million. Add in the 632
punctuation marks if you like, and we're short.

There are other problems too:

- Confusables. Can you tell the difference between AΑА versus АAΑ,
or ВΒB versus BΒВ? Or even O versus 0?

- Lack of glyphs for the majority of those code points in most fonts.
Most numbers will look like a sequence of boxes.

- Difficulty of data entry.

- Some people's digits will not have the value that they expect,
e.g. digit '1' might not have the numeric value 1, for at least
all-but-ten of the 460 different decimal digits in use.

- Realistically, who is going to use this?

Even as an intellectual exercise, using huge bases for human input and
output isn't very useful. The idea of using massive implicit bases for the
internal implementation of BigNums is quite reasonable, but for human input
and output, it doesn't fly.

--
Steven

Chris Angelico

unread,

Apr 8, 2015, 10:40:14 PM4/8/15

to pytho...@python.org

On Thu, Apr 9, 2015 at 12:32 PM, Steven D'Aprano
<steve+comp....@pearwood.info> wrote:
> - Realistically, who is going to use this?

Nobody. I was never suggesting it as a serious option; just threw it
out there as another dumb alternative :)

ChrisA

wxjm...@gmail.com

unread,

Apr 9, 2015, 2:14:41 AM4/9/15

to

Le jeudi 9 avril 2015 04:40:14 UTC+2, Chris Angelico a écrit :

====

(Most) everything is doable.
There are even people who are recommending to
represent this decimal number 8364 with a
sequence of these three decimal numbers 226 130 172
;-)
jmf

BartC

unread,

Apr 9, 2015, 5:19:37 AM4/9/15

to

On 08/04/2015 23:06, Marko Rauhamaa wrote:
> BartC <b...@freeuk.com>:
>
>> So the the number 3,012,345, in base 1000000, could represented in
>> text form as the two 'digit':
>>
>> (3, 12345)
>>
>> ie. 3*1000000 + 12345*1. In internal binary, each digit can just be
>> stored in the normal form, probably as one digit per 32-bit integer.
>>
>> (I have a big integer library that works exactly this way, using a
>> decimal representation, but using base 1000000 with digits 0 to
>> 999999, rather than base 10 and digits 0 to 9.)
>
> It is common to implement big integers in base-2**16, base-2**32 or
> base-2**64.

This was a development of a version that used base-10 strings to encode
the numbers.

It might be a magnitude slower than, say, built-in Python big integers,
but a huge advantage had been in being able to print out the results
easily without needing division.

(I know that there is some algorithm to print an arbitrary precision
binary number using shifts and masks, but I didn't know it then.)

> Arithmetics is performed just like in elementary school.

Not in the school I went to; we used decimal!

--
Bartc

Marko Rauhamaa

unread,

Apr 9, 2015, 6:00:16 AM4/9/15

to

BartC <b...@freeuk.com>:

> On 08/04/2015 23:06, Marko Rauhamaa wrote:
>> Arithmetics is performed just like in elementary school.
>
> Not in the school I went to; we used decimal!

The basic arithmetic algorithms are independent of the base.

For example, here's how you can add two 128-bit integers in C using
64-bit digits:

typedef struct {
uint64_t lo, hi;
} uint128_t;

uint128_t add128(uint128_t x, uint128_t y)
{
uint128_t result = {
.lo = x.lo + y.lo,
.hi = x.hi + y.hi
};
if (result.lo < x.lo)
result.hi++;
return result;
}

Works both for signed and unsigned integers.

Marko

Alain Ketterlin

unread,

Apr 9, 2015, 7:57:59 AM4/9/15

to

Marko Rauhamaa <ma...@pacujo.net> writes:

> The basic arithmetic algorithms are independent of the base.

Right.

> For example, here's how you can add two 128-bit integers in C using
> 64-bit digits:
>
> typedef struct {
> uint64_t lo, hi;
> } uint128_t;
>
> uint128_t add128(uint128_t x, uint128_t y)
> {
> uint128_t result = {
> .lo = x.lo + y.lo,
> .hi = x.hi + y.hi
> };
> if (result.lo < x.lo)
> result.hi++;
> return result;
> }
>
> Works both for signed and unsigned integers.

(Slightly off-topic, but...)

No, it would not work for signed integers (i.e., with lo and hi of
int64_t type), because overflow is undefined behavior for signed.

-- Alain.

Marko Rauhamaa

unread,

Apr 9, 2015, 8:45:26 AM4/9/15

to

Alain Ketterlin <al...@dpt-info.u-strasbg.fr>:

> No, it would not work for signed integers (i.e., with lo and hi of
> int64_t type), because overflow is undefined behavior for signed.

All architectures I've ever had dealings with have used 2's-complement
integers. Overflow is well-defined, well-behaved and sign-independent
wrt addition, subtraction and multiplication (but not division).

Marko

Alain Ketterlin

unread,

Apr 9, 2015, 8:56:17 AM4/9/15

to

You are confused: 2's complement does not necessarily mean modular
arithmetic. See, e.g.,
http://stackoverflow.com/questions/16188263/is-signed-integer-overflow-still-undefined-behavior-in-c

-- Alain.

Dave Angel

unread,

Apr 9, 2015, 9:19:39 AM4/9/15

to pytho...@python.org

So the C standard can specify such things as undefined. The
architecture still will do something specific, right or wrong, and
that's what Marko's claim was about. The C compiler has separate types
for unsigned and for signed, while the underlying architecture of every
twos complement machine I have used did add, subtract, and multiply as
though the numbers were unsigned (what you call modular arithmetic).

In my microcoding days, the ALU did only unsigned arithmetic, while the
various status bits had to be interpreted to decide whether a particular
result was overflow or had a carry. It was in interpreting those status
bits that you had to use the knowledge of whether a particular value was
signed or unsigned.

Then the microcode had to present those values up to the machine
language level. And at that level, we had no carry bits, or overflow,
or anything directly related.

--
DaveA

Marko Rauhamaa

unread,

Apr 9, 2015, 9:28:50 AM4/9/15

to

Alain Ketterlin <al...@dpt-info.u-strasbg.fr>:

Ah, ok, I misunderstood your point. However, what I meant originally is
that the given uintX_t implementation would work even if it were given
intX_t arguments, and the returned uintX_t can be assigned to an intX_t
variable safely.

Marko

Marko Rauhamaa

unread,

Apr 9, 2015, 9:33:51 AM4/9/15

to

Dave Angel <da...@davea.name>:

> So the C standard can specify such things as undefined. The
> architecture still will do something specific, right or wrong, and
> that's what Marko's claim was about. The C compiler has separate types
> for unsigned and for signed, while the underlying architecture of
> every twos complement machine I have used did add, subtract, and
> multiply as though the numbers were unsigned (what you call modular
> arithmetic).

Alain did have a point. *If* I had used int64_t in my algorithm, it
might not have worked because the compiler could legally produce code
that crashes or does weird things in case of a signed integer overflow.

In this day and age, heavily optimized compilers (including gcc)
actively exploit undefined behavior in code generation.

However, I didn't use int64_t, I used uint64_t.

Marko

Marko Rauhamaa

unread,

Apr 9, 2015, 9:50:00 AM4/9/15

to

Marko Rauhamaa <ma...@pacujo.net>:

> In this day and age, heavily optimized compilers (including gcc)
> actively exploit undefined behavior in code generation.

Example: Find the value of INT_MAX programmatically.

========================================================================
#include <stdio.h>

int main()
{
int n = 1;
while (n + 1 > n)
n++;
printf("%d\n", n);
return 0;
}
========================================================================

========================================================================
$ # Use gcc-4.9.2
$ cc -o test test.c
$ ./test
2147483647
$ cc -O9 -o test test.c
$ ./test
^C
========================================================================

The command "cc -S -O9 test.c" shows that the compiler compiles the
while loop into:

========================================================================
.L2:
jmp .L2
========================================================================

Marko

Alain Ketterlin

unread,

Apr 9, 2015, 9:58:07 AM4/9/15

to

Marko Rauhamaa <ma...@pacujo.net> writes:

> Dave Angel <da...@davea.name>:
>
>> So the C standard can specify such things as undefined. The
>> architecture still will do something specific, right or wrong, and
>> that's what Marko's claim was about. The C compiler has separate types
>> for unsigned and for signed, while the underlying architecture of
>> every twos complement machine I have used did add, subtract, and
>> multiply as though the numbers were unsigned (what you call modular
>> arithmetic).
>
> Alain did have a point. *If* I had used int64_t in my algorithm, it
> might not have worked because the compiler could legally produce code
> that crashes or does weird things in case of a signed integer
> overflow.

Yes, exactly. Since the language semantics don't guarantee anything, the
compiler can do whatever it likes. Changing your example to use int64_t
instead of uint64_t would let the compiler remove the "overflow" test.
Because, in:

z = x+y; // all signed ints
if ( z < x )
...

either there was no overflow (and the condition is false), or there was,
and the result is undefined, i.e., "z<x" can be considered false also.

> In this day and age, heavily optimized compilers (including gcc)
> actively exploit undefined behavior in code generation.

They do. Moreover, it looks like it is very difficult to warn the user
(I'm not even sure compilers catch cases like the one above).

> However, I didn't use int64_t, I used uint64_t.

Yes, your code was ok.

-- Alain.

Chris Angelico

unread,

Apr 9, 2015, 10:08:48 AM4/9/15

to pytho...@python.org

On Thu, Apr 9, 2015 at 11:57 PM, Alain Ketterlin
<al...@dpt-info.u-strasbg.fr> wrote:
> Because, in:
>
> z = x+y; // all signed ints
> if ( z < x )
> ...
>
> either there was no overflow (and the condition is false), or there was,
> and the result is undefined, i.e., "z<x" can be considered false also.

Do you mean "all unsigned ints" here? Because if y is negative, the
condition will be true without overflow. If you didn't, then I'm
puzzled as to where the undefined behaviour is coming from.

ChrisA

Alain Ketterlin

unread,

Apr 9, 2015, 10:53:42 AM4/9/15

to

Ouch, you're right, I tried to stick with Marko's example and forgot the
basics. I meant "signed ints", but the "removable" condition should be
something like:

if ( x>0 && y>0 && z<x )
...

i.e., some condition that is never true (either false or undefined).
Thanks for the correction.

If the variables are all unsigned ints, the potential overflow has a
well-defined meaning, and the code is correct and doesn't trigger
undefined behavior. This was Marko's initial example.

-- Alain.

jonas.t...@gmail.com

unread,

Apr 9, 2015, 10:54:54 AM4/9/15

to

Aside from the base changer i've written a bignumb anybase generic multiplication, addition and subtraction routine. My goal is to make the arithmetic in the base of choice whatever size.

And i think i can do it, i already have mult, add , subt working in anybase but not bignumb.

Does this mult work for arbitrary size?
Or will it but out at some point?

<SCRIPT LANGUAGE="Javascript">
/* MULTIPLY TWO VARIABLES */

//CONVERT A DECIMAL NUMBER INTO ANYBASE
function newbase(decNumber,base){
var out =[];
digits=1;
digmult=1;
while(digmult*base<=decNumber){
digmult=digmult*base
digits++;
}
digsave=digmult;
while(decNumber>0 || digits>0){
loop=1;
digit=0;
for (digit=0;digit<=base;digit++) {
if((digit+1)*digmult>decNumber)break;
}
out[digits]=digit;
digmult=digmult*digit;
decNumber=decNumber-digmult;
digsave=digsave/base;
digmult=digsave;
digits--;
}

return out;

}

function naiveMult(base, multOne, multTwo)
{
var total = [];
var addResult = [];
total[0] = 0;

//Check which array bigger minimize multiplications
if (multOne.length>multTwo.length){ mshort=multOne.length; mlong=multTwo.length;}
else { mshort=multOne.length; mlong=multTwo.length;}

for (var i = 0; i < mshort; i ++ )
{
multresult = [];
remainder = 0;
tempVal = 0;
tempDig = 0;
j = 0;

if(i > 0)
{
for(zero = 0; zero < i; zero ++ ) multresult[zero] = 0;
}

while(j < mlong)
{
intMult++;

tempVal = (multOne[i] * multTwo[j]) + remainder;
remainder = 0;

if (tempVal >= base)
{
tempDig = tempVal % base;
remainder = (tempVal - tempDig) / base;
multresult[j + i] = tempDig;
}
else
{
multresult[j + i] = tempVal;
}
j ++ ;
}

if(remainder != 0)
{
multresult[j + i] = remainder;
}
addResult=naiveAdd(base, total, multresult);
total=addResult;
}
return total;
}

/* ADD TWO VARIABLES */

function naiveAdd(base, arrOne, arrTwo)
{

shortArr=[];
longArr=[];
addResult = [];
remainder = 0;
//Find shortest and longest array
if (arrOne.length >= arrTwo.length) {shortArr = arrTwo; longArr = arrOne;}
else {shortArr = arrOne;longArr = arrTwo;}
for (i = 0; i < shortArr.length; i ++ )
{
intAdd ++ ;
arrOne[i] = parseInt(arrOne[i]);
arrTwo[i] = parseInt(arrTwo[i]);
if (isNaN(arrOne[i])) arrOne[i] = 0;
if (isNaN(arrTwo[i])) arrTwo[i] = 0;
addResult[i] = arrOne[i] + arrTwo[i] + remainder;

if (addResult[i] >= base)
{
addResult[i] = addResult[i] - base;
remainder = 1;
}
else
{
remainder = 0;
}
}
//Copying values from the shorter string
while(i<longArr.length) {longArr[i]=parseInt(longArr[i]);addResult[i]=longArr[i]+remainder; remainder=0; i++;}
//If strings of equal lenght but there is a remainder;
if (remainder == 1) addResult[i++] = 1;
else addResult.splice(i, 1);
return addResult;
}

/* MAIN */
function fetchValues()
{
intMult=0;
intAdd=0;
base=10;
x=1;
y=1;
document.calc.result.value="";
document.calc.stats.value="";
document.calc.outbase.value="";
strEval = document.calc.eval.value;
genArr = strEval.split("*");
//fONE=newbase(genArr[0],base);
//document.write(fONE[2]);
//document.calc.outbase.value +=fONE+" + \n";
//fTWO=newbase(genArr[1],base);
//document.calc.outbase.value +=fTWO+"\n";

arrOne=genArr[0].split("").reverse();
arrTwo=genArr[1].split("").reverse();
base = document.calc.formbase.value;
out=naiveMult(base,arrOne,arrTwo);
document.calc.stats.value +="Multiplications = "+intMult+"\n";
document.calc.result.value +=" Product = " + out.reverse()+"\n";
}
</SCRIPT>
<!DOCTYPE html>
<html lang="en">
<head><meta charset="utf-8"/></head>
<body bgcolor="gold" onload="fetchValues()">
<H1>Big numbers</H1><P>
<FORM name=calc onSubmit="fetchValues(); return false;">
<B>INPUT DEC-><input type=submit name="calc" value="Calc"> <input type="text" name="eval" value="32432439*12" size="300"><br>
BASE <input type="text" name="formbase" value="10" size="10"><br>
This calculation<input type="text" name="outbase" value="" size="60"><br>
<FONT COLOR="white">
RESULT<br>
<textarea name="result" rows="20" cols="120"></textarea><br>
STATUS</B><br>
<textarea name="stats" cols="120" rows="40" value=""></textarea>
</FORM>
</body>
</html>

Chris Angelico

unread,

Apr 9, 2015, 11:02:24 AM4/9/15

to pytho...@python.org

On Fri, Apr 10, 2015 at 12:53 AM, Alain Ketterlin
<al...@dpt-info.u-strasbg.fr> wrote:
> Ouch, you're right, I tried to stick with Marko's example and forgot the
> basics. I meant "signed ints", but the "removable" condition should be
> something like:
>
> if ( x>0 && y>0 && z<x )
> ...
>
> i.e., some condition that is never true (either false or undefined).
> Thanks for the correction.

Ah, yep. And yes, that's absolutely right - the compiler's allowed to
treat that as never true.

ChrisA

Ian Kelly

unread,

Apr 9, 2015, 11:04:53 AM4/9/15

to Python

On Thu, Apr 9, 2015 at 8:54 AM, <jonas.t...@gmail.com> wrote:
> Aside from the base changer i've written a bignumb anybase generic multiplication, addition and subtraction routine. My goal is to make the arithmetic in the base of choice whatever size.
>
> And i think i can do it, i already have mult, add , subt working in anybase but not bignumb.
>
> Does this mult work for arbitrary size?
> Or will it but out at some point?

Dunno, I have no interest in verifying your code for you. Especially
since this is a Python list and said code is not Python.

If you want to build confidence that your code works, I suggest
writing some unit tests for it. (I'd also suggest keeping it separate
from your html in a .js file, which would make it a lot easier to test
and reuse.)

Steven D'Aprano

unread,

Apr 9, 2015, 11:12:09 AM4/9/15

to

On Thu, 9 Apr 2015 11:33 pm, Marko Rauhamaa wrote:

> In this day and age, heavily optimized compilers (including gcc)
> actively exploit undefined behavior in code generation.

Optimization and undefined behaviour reminds me of a story my wife told me,
of a time she was crossing the US with some roadies. As they were driving
across the desert highway, one of the roadies noticed that they were
driving in the wrong direction for their next gig.

"Who cares?" said the driver, "we're making fantastic time!"

--
Steven

jonas.t...@gmail.com

unread,

Apr 9, 2015, 11:12:42 AM4/9/15

to

I guess when working on highlevel arithmetic one does not need to think about such trivial? things, the expression is broken down into pairs fed into the functions and the parser handle the signs?

jonas.t...@gmail.com

unread,

Apr 9, 2015, 11:21:46 AM4/9/15

to

Well with the condition i can get the bignumb arithmetic fully working, i am on my way of working bignumb arithmetic in anybase.

I think this will be the first implementation ever to use arithmetic in the base of your choice.

And i know for a fact (i've factored RSA in the 200 digit range on a 486 in a couple of hours 98-)

So there is more to this than fancy overlay, it is about the digit representation at the binary level, and how the binary groups interpretated.

So in the end, even with none optimised type like Javascript integer it will save operations at high level as well as lowlevel.

That said highlevel and highbase arithmetic will work alot faster than binary.
And using modular arithmetic you may find out that some operations requiering exponential work like factoring will turn out to be done in polynomial time.

Steven D'Aprano

unread,

Apr 9, 2015, 11:21:58 AM4/9/15

to

Am I missing something? Did you over-trim the quoting?

How can the compiler possibly be allowed to treat x>0 && y>0 && z<x as
always false?

x = y = 1
z = 0

gives x>0 && y>0 && z<x as true.

--
Steven

Chris Angelico

unread,

Apr 9, 2015, 11:34:30 AM4/9/15

to pytho...@python.org

It's not really like that. Imagine you're loading up the stuff on the
back of your truck. Do you have to research the narrowest tunnel
you'll ever have to go through? Nah. You just make sure it's no wider
than the legal limit, because any tunnel that's narrower than that
isn't your problem. If you ever _do_ meet an overly-narrow tunnel,
you'll run into problems, because you assumed that such a thing wasn't
possible.

The C compiler is generally going to assume a whole slew of principles
that aren't technically guaranteed. For instance, it'll probably
assume that CPU registers will accurately retain what they were set
to, and likewise memory (unless it's marked 'volatile'). Put your code
out in space, and that might no longer be true, but you can't expect
the compiler to cope with that. As far as it's concerned, it's
impossible for a CPU register to arbitrarily change without notice.
It's equally impossible for the addition of two positive signed
integers to result in a negative integer.

ChrisA

Chris Angelico

unread,

Apr 9, 2015, 11:37:00 AM4/9/15

to pytho...@python.org

On Fri, Apr 10, 2015 at 1:21 AM, Steven D'Aprano
<steve+comp....@pearwood.info> wrote:
> On Fri, 10 Apr 2015 01:02 am, Chris Angelico wrote:
>
>> On Fri, Apr 10, 2015 at 12:53 AM, Alain Ketterlin
>> <al...@dpt-info.u-strasbg.fr> wrote:
>>> Ouch, you're right, I tried to stick with Marko's example and forgot the
>>> basics. I meant "signed ints", but the "removable" condition should be
>>> something like:
>>>
>>> if ( x>0 && y>0 && z<x )
>>> ...
>>>
>>> i.e., some condition that is never true (either false or undefined).
>>> Thanks for the correction.
>>
>> Ah, yep. And yes, that's absolutely right - the compiler's allowed to
>> treat that as never true.
>
> Am I missing something? Did you over-trim the quoting?

Yes, because the significant line was about three levels of quote
back, so I didn't bother to go retrieve it. Immediately before the if:

z = x + y;

ChrisA

Steven D'Aprano

unread,

Apr 9, 2015, 12:15:32 PM4/9/15

to

On Fri, 10 Apr 2015 01:34 am, Chris Angelico wrote:

> It's equally impossible for the addition of two positive signed
> integers to result in a negative integer.

Why oh why do C programmers tell such porkies??? *semi-wink*

[steve@ando c]$ gcc add.c
[steve@ando c]$ ./a.out
a = 1
b = 2147483647
a+b = -2147483648

Looks like a negative integer to me, but then, given that its undefined
behaviour, perhaps the compiler flipped some pixels on the screen so it
merely *looks* negative.

Here's the source code:

[steve@ando c]$ cat add.c
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
int a;
a = 1;
int b;
b = 2147483647;
printf("a = %d\n", a);
printf("b = %d\n", b);
printf("a+b = %d\n", a+b);
return 0;
}

--
Steven

Marko Rauhamaa

unread,

Apr 9, 2015, 12:25:24 PM4/9/15

to

Chris Angelico <ros...@gmail.com>:

> As far as it's concerned, it's impossible for a CPU register to
> arbitrarily change without notice. It's equally impossible for the
> addition of two positive signed integers to result in a negative
> integer.

The standard says that any program that takes a signed integer out of
its valid range is broken and deserves anything that happens to it.

I say it's the standard that is broken.

Marko

Ian Kelly

unread,

Apr 9, 2015, 12:30:53 PM4/9/15

to Python

On Thu, Apr 9, 2015 at 9:21 AM, <jonas.t...@gmail.com> wrote:
> And i know for a fact

What do you know for a fact?

> (i've factored RSA in the 200 digit range on a 486 in a couple of hours 98-)

Bullshit. RSA-200 wasn't factored until 2005, and the group that did
it required three months on a cluster of 80 2.2-GHz Opterons.

http://en.wikipedia.org/wiki/RSA_numbers#RSA-200

If you're going to spout lies in order to try to create credentials
for yourself that we don't care about anyway, at least try to keep it
in the realm of believability.

Chris Angelico

unread,

Apr 9, 2015, 12:36:52 PM4/9/15

to pytho...@python.org

On Fri, Apr 10, 2015 at 2:15 AM, Steven D'Aprano
<steve+comp....@pearwood.info> wrote:
> On Fri, 10 Apr 2015 01:34 am, Chris Angelico wrote:
>
>> It's equally impossible for the addition of two positive signed
>> integers to result in a negative integer.
>
> Why oh why do C programmers tell such porkies??? *semi-wink*
>
> [steve@ando c]$ gcc add.c
> [steve@ando c]$ ./a.out
> a = 1
> b = 2147483647
> a+b = -2147483648
>
>
> Looks like a negative integer to me, but then, given that its undefined
> behaviour, perhaps the compiler flipped some pixels on the screen so it
> merely *looks* negative.

Precisely. Another compiler, or another CPU architecture, is welcome
to behave differently. *According to the C standard*, you just did
something impossible, so it's equally valid for the output to be 5, 3,
7, or 25, all of which would probably be produced by Princess Ida's
compiler. (Mind you, her compiler might do that when asked to add 2
and 2, so I'm not sure it's all that helpful a comparison. [1])

> Here's the source code:
>
> [steve@ando c]$ cat add.c

> a = 1;
> b = 2147483647;

> printf("a+b = %d\n", a+b);

The first undefined part here is that this operation might not even
overflow. If the compiler uses a 64-bit int, this will simply produce
2147483648. But even assuming that it can't represent that total in a
signed int, there's nothing in the spec that says it has to become
negative.

ChrisA

[1] http://diamond.boisestate.edu/gas/princess_ida/webop/pi_10d.html