
Runs of zeros in the decimal representation of powers of two


Clive Tooth

Mar 29, 2012, 5:43:26 AM
Over the years I have been interested in finding runs of consecutive
zeros in the decimal representation of powers of two.

https://oeis.org/A006889
http://groups.google.com/group/sci.math/browse_frm/thread/ded1ca15ed91bace/ca4e8f58743ed138
http://groups.google.com/group/sci.math/browse_frm/thread/1fff1784a90513c4/e2f8ea5abd5e8694

In February 2007 Sacha Roscoe found the first run of 17 consecutive
zeros - in 2^918,583,174. So, by the five year rule ("Things that
seemed very difficult five years ago now seem merely difficult.") it
is time to look for 18 zeros. If anybody wants to try to beat me to
the 18 zeros - go for it! :)

a(n) is the smallest m such that 2^m has a run of n consecutive zeros
in its decimal form.

n a(n) Predicted
1 10 10
2 53 30
3 242 100
4 377 300
5 1,491 1,000
6 1,492 3,000
7 6,801 10,000
8 14,007 30,000
9 100,823 100,000
10 559,940 300,000
11 1,148,303 1,000,000
12 4,036,338 3,000,000
13 4,036,339 10,000,000
14 53,619,497 30,000,000
15 119,476,156 100,000,000
16 146,226,201 300,000,000
17 918,583,174 1,000,000,000
18 ? 3,000,000,000

The "Predicted" column is sqrt(10^(n+1)) (to 1 sig fig), which often
provides a good guess.
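The smaller entries of the table are cheap to reproduce by brute force; a
quick sketch (Python, plain string search - fine up to about n = 6, after
which cleverer methods are needed):

```python
def a(n):
    """Smallest m such that 2^m contains a run of n consecutive zeros."""
    target = "0" * n
    m, v = 1, 2
    while target not in str(v):
        m += 1
        v *= 2
    return m

print([a(n) for n in range(1, 7)])  # [10, 53, 242, 377, 1491, 1492]
```

This reproduces the first six rows of the table above in well under a second.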

--
Clive Tooth

Spotlight

Mar 29, 2012, 7:15:53 AM
In article <hib8n750e9s6s6nko...@4ax.com>,
Clive Tooth <cli...@gmail.com> wrote:

>Over the years I have been interested in finding runs of consecutive
>zeros in the decimal representation of powers of two.

Do you have any suggestions for efficient implementation? A naive
program would store one decimal digit per byte in an array, and
repeatedly loop over the array doubling it and looking for runs of
zeros as it does so.

Two obvious optimisations are:

- Use multiple digits per array element, effectively using a base of,
  say, 10^8 instead of 10. This has the disadvantage of making it
harder to spot runs of zeros. (However, a run of ten zeros implies
a big digit with at least three leading zeros, so we can use a
simple comparison to rule out most cases.)

- Since the length of a run of zeros can only increase by one for
each doubling, look at the length of the longest run and then
multiply by a power of two instead of two itself. This will
require a division per (big) digit because the carry will no longer
be only 0 or 1. Looking for 18-digit runs we will usually be able
to multiply by 2^9 since there will usually not be runs of greater
than 9 zeros in the expected size of number.
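The first optimisation might look like this (a Python sketch; limb layout
and names are illustrative). The comparison trick works because a run of
ten zeros forces some limb either to be zero outright or to have at least
three leading zeros in its 8-digit field, i.e. a value below 10^5:

```python
BASE = 10 ** 8  # eight decimal digits per limb, little-endian

def double(limbs):
    """Double a little-endian base-10^8 limb array in place."""
    carry = 0
    for i, d in enumerate(limbs):
        t = 2 * d + carry
        limbs[i] = t % BASE
        carry = t // BASE
    if carry:
        limbs.append(carry)

def suspicious(limbs):
    """Indices of limbs with at least three leading zeros in their
    8-digit field - the only places a run of ten zeros can touch."""
    return [i for i, d in enumerate(limbs) if d < 10 ** 5]
```

Only the (rare) suspicious limbs need the expensive digit-by-digit check.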

We're still looking at around 10^16 operations on the big digits...

-- Richard

Clive Tooth

Mar 29, 2012, 12:24:31 PM
Here are some notes on my current program...

I am using a computer which has an Intel Core i7-2600 with 4 cores.
Thanks to hyper-threading this presents as (almost) 8 processors. The
computer has 16GB of RAM. (Although the program is not memory-hungry.)
The current power of two is about 2^(1.06 billion). The exponent
increases by about 200 every second - but that rate is continually
decreasing.

The current power of two is held to base 10^6. Each base 10^6 "digit"
is held in a 32-bit integer. This array currently has 54 million
elements. Its length is increased by a million elements whenever it
gets full - about every 20 million powers of two.

The program is written in C# and has a Master thread and 8 Worker
threads.

The Master decides on every iteration whether to just double the
current power of two, or to multiply it by 2,048 (=2^11).

The Master partitions the array of integers into about 500
almost-equal sized chunks and then releases the Workers. The Workers
then go through the list of chunks and - for each chunk - either
double it or multiply it by 2,048. As each Worker processes a chunk it
looks out for zero (base 10^6) digits and records their position in a
global table. When all the chunks have been handled the Master wakes
up.

First of all, the Master handles the chunk-to-chunk carries and, if
the array of integers is now full, allocates another 10^6 integers. On
a x2048 iteration the chunk-to-chunk carry can propagate through
several elements of a chunk, possibly destroying digits previously
recorded as zero or creating new zero digits.

The Master then looks through the table of zero digits. [The Master
must handle the case where one of these digits was previously marked
as zero, but is no longer zero.] The program then examines the digits
adjacent to the zero digit and determines the exact number of
consecutive decimal zeros. If there is a run of 11 (or more) decimal
zeros anywhere then there must be a zero base-10^6 digit, so any run
of at least 11 decimal zeros will be found. A run of 10 decimal zeros
will be missed if it occurs as "m00000 00000n" across two base 10^6
digits. So, if no run of 11 zeros is found, the longest run of decimal
zeros is ten or less. In this case the next iteration of the program
can multiply the current power of two by 2^11. This happens about
99.2% of the time, at the moment.
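As a sketch of that exact-run determination (Python, little-endian
base-10^6 limbs; names and layout are mine, not the program's): the run
through a zero limb extends left into the trailing zeros of the next
more significant limb, and right into the leading zeros of the next
less significant one.

```python
LIMB_DIGITS = 6  # decimal digits per base-10^6 limb

def run_through(limbs, i):
    """Exact decimal zero-run length through zero limb i
    (limbs are little-endian, so limbs[i+1] is more significant)."""
    lo, hi = i, i
    while lo - 1 >= 0 and limbs[lo - 1] == 0:
        lo -= 1
    while hi + 1 < len(limbs) and limbs[hi + 1] == 0:
        hi += 1
    run = LIMB_DIGITS * (hi - lo + 1)
    if hi + 1 < len(limbs):          # trailing zeros of the limb above
        v = limbs[hi + 1]
        while v % 10 == 0:
            run += 1
            v //= 10
    if lo - 1 >= 0:                  # leading zeros of the limb below
        run += LIMB_DIGITS - len(str(limbs[lo - 1]))
    return run

print(run_through([1, 0, 120], 1))  # 12  (the number 120000000000001)
```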

The result of eleven doublings is illustrated here:

"aaaaaaaaaaaa5" is a string of digits which, every time it is doubled,
produces a new zero digit on the right. (If that string is n digits
long it will, in fact, be some odd multiple of 5^n.)
Z....Z is a string of zeros
cccccc is any string

(a fixed width font is best here)

Step Extra zeros

aaaaaaaaaaaa5Z......Z1cccccc

1: aaaaaaaaaaa50Z......Z2cccccc 1
2: aaaaaaaaaa500Z......Z4cccccc 2
3: aaaaaaaaa5000Z......Z8cccccc 3
4: aaaaaaaa50000Z.....Z16cccccc 3
5: aaaaaaa500000Z.....Z32cccccc 4
6: aaaaaa5000000Z.....Z64cccccc 5
7: aaaaa50000000Z....Z128cccccc 5
8: aaaa500000000Z....Z256cccccc 6
 9: aaa5000000000Z...Z512cccccc 7
10: aa50000000000Z...Z1024cccccc 7
11: a500000000000Z...Z2048cccccc 8

Clearly, 8 zeros is the most we can get from 11 doublings.

So, if Z....Z is a string of ten zeros, and we are looking for 18
zeros, it is safe to multiply by 2^11 (=2048). Note that if we have a
run of 11 zeros and we multiply by 2^11 we might miss a run of 18
zeros which would have appeared at the 2^9 or 2^10 stage but which
then decreases to a run of only 17 zeros during the last one or two
doublings.

The program displays various progress metrics every few seconds. One
of these is a notional "picoseconds per digit": k times 10^12 divided
by the notional number of digits of powers of two that it has handled
in the last k seconds (including those powers of two that it skipped
over by multiplying by 2048). At the moment this number is sitting at
about 16. This is a metric that can probably be usefully applied to
compare any algorithms which are looking for the 18 zeros. The total
number of digits to be examined is likely of the order of 10^18, so
this picoseconds-per-digit figure leads to a crude time estimate for
the complete task: about 6 months.
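The arithmetic behind that estimate is short enough to write down
(figures as quoted above):

```python
digits_total = 1e18   # order-of-magnitude digits to examine
ps_per_digit = 16     # current throughput
seconds = digits_total * ps_per_digit * 1e-12
print(f"{seconds / 86400:.0f} days")  # about 185 days, i.e. roughly 6 months
```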

Other points:

The search for 18 consecutive zeros should start at about 2^918
million. However, actually computing that number in base million is a
challenge. I use a link to the Mathematica kernel running on my
computer to do this. Mathematica takes about 5 minutes to perform the
computation.

The program writes a restart file to disk every hour or if I manually
terminate the program. The program uses .NET serialization and
deserialization to save and restore (parts of) the C# object which
performs the computations. Before serializing the object the program
checks the current power of two for correctness. First it checks that
every base million digit is less than 1 million. Then it divides the
current power of two by a ten digit prime and stores the remainder. It
then computes (2^currentPower) mod thatPrime (by the usual method) and
compares the two results. If they are different we have a problem. In
fact it runs the check twice: once with a certain prime, and once
with a random integer - just out of paranoia.
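A sketch of that check in Python (the ten-digit prime here is my own
pick for illustration; the program's actual prime isn't stated):

```python
def residue(limbs, p, base=10 ** 6):
    """Value of a little-endian base-10^6 limb array mod p,
    by Horner evaluation from the most significant limb down."""
    r = 0
    for d in reversed(limbs):
        r = (r * base + d) % p
    return r

def check_power(limbs, exponent, p=1_000_000_007):
    """Compare the stored value mod p against an independent
    computation of 2^exponent mod p by fast modular exponentiation."""
    return residue(limbs, p) == pow(2, exponent, p)
```

The independent side, pow(2, exponent, p), costs only O(log exponent)
multiplications, so the whole check is cheap compared with one doubling pass.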

The number of chunks (about 500) which I mentioned above is not
determined very scientifically. I started off with just 8 chunks - but
then if one thread gets held back for some reason all the other
threads will just wait for it to finish (and not be able to help it)
before Master can be woken up. I thought that a smaller chunk size
would make this problem less serious.

A different threading approach would be to allocate various ranges of
powers of two to different threads and let each report back after a
day or two and be given a new range.

See...
http://groups.google.com/group/microsoft.public.dotnet.languages.csharp/browse_frm/thread/cf40546615f38239#
http://groups.google.com/group/microsoft.public.dotnet.languages.csharp/browse_frm/thread/477b516cc60977fd#
for some discussion of a slightly earlier version of the program.

--
Clive Tooth

1treePetrifiedForestLane

Mar 29, 2012, 1:50:15 PM
wow, base-one-million;
not going to get a lot of zeroes in *that*.

Ross A. Finlayson

Mar 29, 2012, 3:21:45 PM
On Mar 29, 9:24 am, Clive Tooth <cli...@gmail.com> wrote:

> A different threading approach would be to allocate various ranges of
> powers of two to different threads and let each report back after a
> day or two and be given a new range.
>

"Buckets."

> See...http://groups.google.com/group/microsoft.public.dotnet.languages.csha...http://groups.google.com/group/microsoft.public.dotnet.languages.csha...
> for some discussion of a slightly earlier version of the program.
>
> --
> Clive Tooth
>
>

Why wouldn't you look for the partial products with half that many
zeros?

If the algorithm is worked up so as to maintain the value as partial
products instead of in "extended" or "indefinite" precision, then some
of the products that form the value would have more zeros in their
expansion, and any product of those would have more too (if it had
more). Then just priority-queue those.

Right then,

Ross Finlayson

christian.bau

Apr 2, 2012, 2:39:39 PM
Here's what I would change:

I'd probably use base 10^18 or 10^19, with inline assembler for 64 x
64 -> 128 bit multiplication. Dividing a 128 bit product by 10^18 or
10^19 can be done using multiplications.

x86 is very fast when operations are independent, but much slower when
they are dependent, so I'd split the number into 4 parts and handle
them in parallel (as many as fit into registers).

You can multiply by more than 2^11: If you have 18 consecutive zeroes,
and you multiply by 2^39, then you still have 6 consecutive zeroes
left. 6 zeroes are rare. So where you have 6 zeroes in a row, you then
have to go back and calculate the intermediate results - but only for
those 6 digits and maybe 12 before (I'd have to work that out more
precisely).

Your speed may be bandwidth limited, especially with 8 processors
running simultaneously, so you might do a few passes on say 1 MB worth
of data, to keep everything cached.

Clive Tooth

Apr 3, 2012, 12:21:20 PM
On Mon, 2 Apr 2012 11:39:39 -0700 (PDT), "christian.bau"
<christ...@cbau.wanadoo.co.uk> wrote:

>Here's what I would change:
>
>I'd probably use base 10^18 or 10^19, with inline assembler for 64 x
>64 -> 128 bit multiplication. Dividing a 128 bit product by 10^18 or
>10^19 can be done using multiplications.

Using such a large base makes it difficult to find runs of zeros. How
would you find 18 decimal zeros?


>x86 is very fast when operations are independent, but much slower when
>they are dependent, so I'd split the number into 4 parts and handle
>them in parallel (as many as fit into registers).

I am not sure I understand this, Christian. Care to elaborate a
little?


>You can multiply by more than 2^11: If you have 18 consecutive zeroes,
>and you multiply by 2^39, then you still have 6 consecutive zeroes
>left. 6 zeroes are rare. So where you have 6 zeroes in a row, you then
>have to go back and calculate the intermediate results - but only for
>those 6 digits and maybe 12 before (I'd have to work that out more
>precisely).

Spotting a run of 6 decimal zeros in base 10^18, 10^19 or even 10^6
seems tricky to me.


>Your speed may be bandwidth limited, especially with 8 processors
>running simultaneously, so you might do a few passes on say 1 MB worth
>of data, to keep everything cached.

This is an interesting idea. But handling the chunk-to-chunk carry
might be awkward.

--
Clive Tooth

christian.bau

Apr 3, 2012, 7:23:47 PM
I've given this a bit more thought.

Let's say 2^n has 18 consecutive zeroes. Then for example 2^(n+23)
will still have 11 consecutive zeroes, because the digits right of the
first of the zeroes have been multiplied by 2^23 < 10^7, so at most 7
zeroes are gone. We can also show that 2^(n-10) must have 11
consecutive zeroes: multiplying by 2 can create only one zero digit,
not two. So multiplying by 2^10 could have created only 10 zero
digits, so 8 zero digits must have been there already. But the digits
to the right of the first of the 18 zero digits must have had a value <
1/1024 (or we would have carried a new non-zero digit into the run), so
there must have been 3 zero digits to the right as well, for a total of 11.

Instead of the number 11 we can choose any number, and we get this
table:

2^(n-1) .. 2^(n+3): 17 zero digits
2^(n-2) .. 2^(n+6): 16 zero digits
2^(n-3) .. 2^(n+9): 15 zero digits
2^(n-5) .. 2^(n+13): 14 zero digits
2^(n-7) .. 2^(n+16): 13 zero digits
2^(n-8) .. 2^(n+19): 12 zero digits
2^(n-10) .. 2^(n+23): 11 zero digits
2^(n-11) .. 2^(n+26): 10 zero digits
2^(n-12) .. 2^(n+29): 9 zero digits
2^(n-14) .. 2^(n+33): 8 zero digits
2^(n-15) .. 2^(n+36): 7 zero digits
2^(n-17) .. 2^(n+39): 6 zero digits

So we can multiply for example by 2^34 at each step and check for 11
consecutive zeroes, and if we don't find them then there cannot be 18
consecutive zeroes.
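The table can be checked mechanically (a Python sketch, with the 18-zero
target hard-coded; the two formulas restate the guarantees derived above):

```python
from math import ceil, floor, log10

def surviving_after(k):
    """Zeros guaranteed in 2^(n+k), given 18 in 2^n: the digits right of
    the run are multiplied by 2^k < 10^ceil(k*log10(2)), so at most that
    many zeros are destroyed."""
    return 18 - ceil(k * log10(2))

def required_before(m):
    """Zeros guaranteed in 2^(n-m): each doubling creates at most one
    zero, and the tail must have been < 1/2^m, which forces
    floor(m*log10(2)) extra zeros on the right."""
    return 18 - m + floor(m * log10(2))

for zeros, m, k in [(17, 1, 3), (16, 2, 6), (15, 3, 9), (14, 5, 13),
                    (13, 7, 16), (12, 8, 19), (11, 10, 23), (10, 11, 26),
                    (9, 12, 29), (8, 14, 33), (7, 15, 36), (6, 17, 39)]:
    assert surviving_after(k) == zeros and required_before(m) == zeros
print("table verified")
```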

We do the calculation in base 10^k. Checking for k+1 or k+2
consecutive zero digits is not trivial, but very fast: if there are
k+1 or k+2 consecutive zero digits, then the highest two or three
digits of some base 10^k number must all be zero. And that happens
in only 1 in 100 or 1 in 1000 cases.

This works out quite well if we choose k = 9 and a step of 2^34: do
the calculation in base 10^9 and at each step multiply by 2^34. We
look for sequences of 11 consecutive zeroes; this can only happen if
one of the base 10^9 digits is less than 10^6, which happens in only
1 in 1000 cases. When that happens you can check

bool elevenzeroes;
if      (number > 99999) elevenzeroes = (next % 100000000 == 0);
else if (number > 9999)  elevenzeroes = (next % 10000000 == 0);
else if (number > 999)   elevenzeroes = (next % 1000000 == 0);
else if (number > 99)    elevenzeroes = (next % 100000 == 0);
else if (number > 9)     elevenzeroes = (next % 10000 == 0);
else if (number > 0)     elevenzeroes = (next % 1000 == 0);
else                     elevenzeroes = (previous <= 9999999
                             || (previous <= 99999999 && next % 10 == 0)
                             || next % 100 == 0);

This checks for 11 consecutive zeroes, which will be exceedingly rare.
We need to keep part of the previous result around in case we find
11 consecutive zeroes and have to check for 18 digits.
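For what it's worth, the ladder can be cross-checked against a plain
string search on the 27-digit window (a Python transliteration; here
"nxt" is the more significant neighbouring limb and "prev" the less
significant one, so the ladder reports runs ending in the middle limb's
leading zeros - runs ending elsewhere are caught when the scan reaches
that limb):

```python
def eleven_zeroes(prev, number, nxt):
    """Transliteration of the branch ladder above (base-10^9 limbs)."""
    if number > 99999:
        return nxt % 100000000 == 0
    if number > 9999:
        return nxt % 10000000 == 0
    if number > 999:
        return nxt % 1000000 == 0
    if number > 99:
        return nxt % 100000 == 0
    if number > 9:
        return nxt % 10000 == 0
    if number > 0:
        return nxt % 1000 == 0
    return (prev <= 9999999
            or (prev <= 99999999 and nxt % 10 == 0)
            or nxt % 100 == 0)

def window_has_run(prev, number, nxt, n=11):
    """Direct check on the zero-padded 27-digit window, for comparison."""
    return "0" * n in f"{nxt:09d}{number:09d}{prev:09d}"
```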

christian.bau

Apr 3, 2012, 7:34:26 PM
On Apr 3, 5:21 pm, Clive Tooth <cli...@gmail.com> wrote:
> On Mon, 2 Apr 2012 11:39:39 -0700 (PDT), "christian.bau"
>
> <christian....@cbau.wanadoo.co.uk> wrote:

> >x86 is very fast when operations are independent, but much slower when
> >they are dependent, so I'd split the number into 4 parts and handle
> >them in parallel (as many as fit into registers).

The two things that limit the speed of code are latency and
throughput.

A modern x86 processor can for example start a multiplication every
cycle (throughput = 1 per cycle). However, the result may be available
only after say 4 cycles (latency = 4 cycles). So let's say you take x,
and multiply by y, and multiply by y again and so on: then every
multiplication must wait the full 4 cycles before it can start, and you
get only one multiplication every four cycles. On the other hand, if you
take x1, x2, x3, and x4, multiply each by y, multiply each by y again
and so on, then a multiplication can start every cycle, and you get
one multiplication per cycle.

So if you want to find out how long your code takes, you need to write
down which operations depend on which previous operations, and which
is the chain of dependent operations with the longest latency. That
determines how long the code takes. But an x86 processor can usually
do two or three instructions in the same cycle. So if you have say 9
instructions with a latency of 10 cycles, then you probably have
enough throughput to do 18 or 27 instructions in those 10 cycles. So
if you operate on three separate bits of data, you can often do three
times as much work in the same time.

1treePetrifiedForestLane

Apr 4, 2012, 8:21:51 PM
whew; how does a division work,
with respect to ... can you take "partial products"
without floating-point processes?

what is the latter-day ... oops, IEEE specification?

no/less/infinite jokes about speed reading,
thank you.

Clive Tooth

Apr 5, 2012, 3:46:21 AM
Thanks for all this, Christian. I will start experimenting...

--
Clive Tooth

Clive Tooth

Apr 12, 2012, 6:45:13 PM
I have implemented this *2^34 method, using base 10^9. It is about
three times faster than my previous method (multiplying by 2^11). The
overall time has reduced from 16 picoseconds per digit to about 5.5 ps
per digit.

--
Clive Tooth