Faster integer square roots through a seed

Albert van der Horst

unread,

Aug 29, 2017, 5:01:26 AM8/29/17

to

The problem with Newton iteration is the initial value.
Once we are near the root it goes like a spear.

Mostly if we are calculating many square roots, the arguments
are near each other. Like in the "slightly better prime
counting program" of Anton Ertl.
We can take advantage of that by remembering the second
to last iteration. We also must make sure to do at least
one iteration before the possible final step, to make
sure the first iterated result is larger than the square root.
(It is really subtle.)

-------8<---------------------8<----------
\ For N return FLOOR of the square root of n.

VARIABLE seed 1 seed !
\ While calculating roots near each other., the seed can be kept.
\ Otherwise this can be used to save 10 iterations.
: init-seed DUP 10 RSHIFT 1024 MAX seed ! ;

: SQRT DUP IF
>R seed @
R@ OVER / OVER + 2/ NIP DUP .
BEGIN R@ OVER / OVER + 2/ DUP . 2DUP > WHILE NIP REPEAT seed ! RDROP
THEN ;
------------------------------------------

Now we can do a test, with this SQRT that prints all intermediate
results. The test also prints the argument and the outcome.
--------------------
: test
1000000000 10 0 DO DUP I 100000 * + DUP . SQRT 4 SPACES . CR LOOP
DROP
1000000000 10 0 DO DUP I 100000 * - DUP . SQRT 4 SPACES . CR LOOP
DROP
;
--------------------
S[ ] OK test
1000000000 500000000 250000001 125000002 62500004 31250009 15625020
7812541 3906334 1953294 976902 488962 245503 124788 66400 40730 32640
31638 31622 31622 31622
1000100000 31624 31624 31624
1000200000 31625 31625 31625
1000300000 31627 31627 31627
1000400000 31629 31629 31629
1000500000 31630 31630 31630
1000600000 31632 31632 31632
1000700000 31633 31633 31633
1000800000 31635 31635 31635
1000900000 31637 31637 31637
1000000000 31622 31622 31622
999900000 31621 31621 31621
999800000 31619 31619 31619
999700000 31618 31618 31618
999600000 31616 31616 31616
999500000 31614 31614 31614
999400000 31613 31613 31613
999300000 31611 31611 31611
999200000 31610 31610 31610
999100000 31608 31608 31608

S[ 1000000000 1000000000 ] OK

De test shows that if the initial value is bad, Newton becomes
a sort of binary search. Then the first value is already spot on,
even between 1000900000 and 1000000000 , such that the second
iteration can confirm we are on the root.

The cost of two divisions for a square root is about as good
as it gets.

Don't initialise the seed to 0.

Have fun!

Groetjes Albert
--
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

minf...@arcor.de

unread,

Aug 29, 2017, 8:15:46 AM8/29/17

to

I would seed with the old approximation: sqrt(1+x) ~ 1+x/2.

hughag...@gmail.com

unread,

Aug 29, 2017, 2:11:55 PM8/29/17

to

On Tuesday, August 29, 2017 at 2:01:26 AM UTC-7, Albert van der Horst wrote:
> The problem with Newton iteration is the initial value.
> Once we are near the root it goes like a spear.

This is my DSQRT that has been in the novice-package for many years:

: <dsqrt> ( Darg -- Uroot ) \ first estimate (Darg must be positive)
d2/ 1 >r begin 2dup d0= not while
d2/ d2/ r> 2* >r repeat
2drop r> ;

: uaverage ( Ua Ub -- Umean ) \ rounds down
u>d rot u>d d+ \ -- Dsum
d2/ drop ; \ -- Usum/2 \ the high word is always zero

: dsqrt ( Darg -- Uroot ) \ square root
2dup d0< abort" *** DSQRT needs a positive argument ***"
2dup d0= if drop exit then
2dup <dsqrt> begin >r \ -- Darg \r: -- estimate
2dup r@ um/mod nip \ -- Darg new-estimate
r@ uaverage \ -- Darg root
dup r> - abs 2 < until \ -- Darg root
nip nip ;

Robert L.

unread,

Aug 29, 2017, 2:45:21 PM8/29/17

to

;; newton integer square root
(define (isqrt-newton n)
(let loop ((x n) (y (quotient (+ 1 n) 2)))
(if (>= y x)
x
(loop y (quotient (+ y (quotient n y)) 2)))))

;; integer square root (non-newton)
(define (isqrt n)
(if (< n 2)
n
(let* ((small (* 2 (isqrt (quotient n 4))))
(large (+ 1 small)))
(if (> (* large large) n) small large))))

(define (test func)
(do ((i 999222000 (+ 1 i)))
((= i 999888000))
(func i)))

> (time (test isqrt-newton))
cpu time: 421 real time: 469 gc time: 0
> (time (test isqrt))
cpu time: 359 real time: 375 gc time: 0

--
For two years ... he was held in solitary confinement in the Toronto West
Detention Centre.... [A] court in Mannheim sentenced him to five years
imprisonment for the crime of "popular incitement" under Germany's notorious
"Holocaust denial" statute. www.revisionists.com/revisionists/zundel.html

Albert van der Horst

unread,

Aug 29, 2017, 9:41:52 PM8/29/17

to

In article <26a4d034-ea61-4745...@googlegroups.com>,

That accomplishes little.
It is almost the same than 1 SEED ! .

Let's try:
1000000000 CONSTANT x
OK
1 x 2/ + seed !
OK
x sqrt .

250000001 125000002 62500004 31250009 15625020 7812541 3906334 1953294 \

976902 488962 245503 124788 66400 40730 32640 31638 31622 31622 31622 OK

1 seed !
x sqrt .

500000000 250000001 125000002 62500004 31250009 15625020 7812541 3906334 \
1953294 976902 488962 245503 124788 66400 40730 32640 31638 31622 \

31622 31622 OK

If you use my INIT-SEED you save 10 iterations for large numbers.
Then after that you save even more.
x init-seed
OK
x sqrt
488793 245419 124746 66381 40722 32639 31638 31622 31622 OK
x sqrt
31622 31622 OK

Seeding by 1 is better for small numbers.With my INIT-SEED you'll
need 10 iterations:

10 DUP init-seed SQRT
512 256 128 64 32 16 8 4 3 3 OK

Paul Rubin

unread,

Aug 29, 2017, 10:10:41 PM8/29/17

to

alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) writes:
>>I would seed with the old approximation: sqrt(1+x) ~ 1+x/2.
> That accomplishes little.
> It is almost the same than 1 SEED ! .

I couldn't understand quite what was meant. But if you know x and
s=sqrt(x), and y is near x, then the obvious initial guess is
s+(y-x)/(2*s) .

minf...@arcor.de

unread,

Aug 30, 2017, 1:20:30 AM8/30/17

to

You are right of course. 1+x/2 is the first Taylor approximation for small
numbers. It doesn't fit to large numbers. Mea culpa.

m...@iae.nl

unread,

Aug 30, 2017, 4:22:29 AM8/30/17

to

.. which is the two-term Taylor expansion for sqrt(x)
around x=x0, isn't it? This works very nicely, at the
cost of one (not necessarily accurate) division.

-marcel

Paul Rubin

unread,

Aug 30, 2017, 4:35:02 AM8/30/17

to

m...@iae.nl writes:
> .. which is the two-term Taylor expansion for sqrt(x)
> around x=x0, isn't it?

Yep, that's what Newton's method is, too, come to think of it ;-).

If multiplication is cheaper then division, it might be worth putting
on a third order term: s + (y-x)/(2*s) + (y-x)/(4*x*s) if I have it right.

Paul Rubin

unread,

Aug 30, 2017, 4:35:38 AM8/30/17

to

Paul Rubin <no.e...@nospam.invalid> writes:
> s + (y-x)/(2*s) + (y-x)/(4*x*s)

That was supposed to say s + (y-x)/(2*s) - (y-x)/(4*x*s)
of course.

m...@iae.nl

unread,

Aug 30, 2017, 4:59:34 AM8/30/17

to

You lost me there: now I need two divisions
and three multiplications for an initial
value that is not critical anyway?

-marcel

Paul Rubin

unread,

Aug 30, 2017, 5:11:56 AM8/30/17

to

m...@iae.nl writes:
> You lost me there: now I need two divisions and three multiplications
> for an initial value that is not critical anyway?

Well remember what you were going to do with the initial value?
That got an iteration or two out of the way.

Albert van der Horst

unread,

Aug 30, 2017, 6:05:07 AM8/30/17

to

In article <87tw0po...@nightsong.com>,

Guys, guys. Do some experimentation. It is not even clear whether
you're talking about a first guess. or the case I want to consider
improving a previous result that is one or two off.

You never can do better than the two divisions I need.

Suppose I want sqrt(87) the result must be 9. Suppose my initial
guess is 9. That is correct, but how can I know?

First step (obligatory) 9 -> 9
Now it is sure that the iterator is above or at the square root
second step 9->9
The result doesn't come down. so we're finished.

Andrew Haley

unread,

Aug 30, 2017, 1:22:44 PM8/30/17

to

Albert van der Horst <alb...@cherry.spenarnc.xs4all.nl> wrote:
> In article <87tw0po...@nightsong.com>,
> Paul Rubin <no.e...@nospam.invalid> wrote:
>>m...@iae.nl writes:
>>> You lost me there: now I need two divisions and three multiplications
>>> for an initial value that is not critical anyway?
>>
>>Well remember what you were going to do with the initial value?
>>That got an iteration or two out of the way.
>
> Guys, guys. Do some experimentation. It is not even clear whether
> you're talking about a first guess. or the case I want to consider
> improving a previous result that is one or two off.
>
> You never can do better than the two divisions I need.
>
> Suppose I want sqrt(87) the result must be 9. Suppose my initial
> guess is 9. That is correct, but how can I know?
>
> First step (obligatory) 9 -> 9
> Now it is sure that the iterator is above or at the square root
> second step 9->9
> The result doesn't come down. so we're finished.

That's interesting.

So, for a first guess, count the bits in x (i.e. the characteristic of
its log base 2): there are fast instructions on some processors to do
this. Divide the number of bits by 2, then shift x
right that many bits. This is a very rough integer approximation to
the square root.

87 has 6 bits. 6 2 / is 3.

So, our guess is 87 3 rshift, 10. We now have

10 87 10 / + 2/

And we're done. Larger numbers would take more iterations. This
does, of course, require a fast algorithm to calculate floor(log base
2). On machines that can't do that quickly this isn't such a good
algorithm.

Andrew.

Paul Rubin

unread,

Aug 31, 2017, 12:12:56 AM8/31/17

to

alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) writes:

> Suppose I want sqrt(87) the result must be 9.

I didn't even realize this was about integer square roots. Maybe go
with some shifts/find-first-1, and a table lookup on the top few bits.

The Beez

unread,

Sep 7, 2017, 7:25:03 AM9/7/17

to

No problem at all:
- Get the highbit of the square;
- if uneven, add 1;
- Shift the square right by that number.

It gives significantly better results than 1+x/2.

E.g.

x sqrt(x) x/2+1 shift
1 1 1 0
2 1,414213562 2 1
4 2 3 2
8 2,828427125 5 2
16 4 9 4
32 5,656854249 17 5
64 8 33 8
128 11,3137085 65 11
256 16 129 16
512 22,627417 257 22
1024 32 513 32
2048 45,254834 1025 45
4096 64 2049 64
8192 90,50966799 4097 90
16384 128 8193 128
32768 181,019336 16385 181
65536 256 32769 256
131072 362,038672 65537 362
262144 512 131073 512
524288 724,0773439 262145 724
1048576 1024 524289 1024
2097152 1448,154688 1048577 1448
4194304 2048 2097153 2048
8388608 2896,309376 4194305 2896
16777216 4096 8388609 4096
33554432 5792,618751 16777217 5792
67108864 8192 33554433 8192
134217728 11585,2375 67108865 11585
268435456 16384 134217729 16384
536870912 23170,47501 268435457 23170
1073741824 32768 536870913 32768
2147483648 46340,95001 1073741825 46340
4294967296 65536 2147483649 65536

Some code:
\ 4tH library - FLS FFS - Copyright 2011 J.L. Bezemer
\ You can redistribute this file and/or modify it under
\ the terms of the GNU General Public License

\ An implementation of the BSD ffs() and fls() functions.

\ Bits are numbered starting from 1, starting at the right-most (least
\ significant) bit. A return value of zero from any of these words means
\ that the argument was zero.

[UNDEFINED] firstbit [IF]
: firstbit ( n1 -- n2)
dup if 1 swap begin dup 1 and 0= while swap 1+ swap 1 rshift repeat drop then
;

: lastbit ( n1 -- n2)
dup if 1 swap begin dup 1 <> while swap 1+ swap 1 rshift repeat drop then
;
[THEN]

This is a highspeed integer SQRT. I measured it to be ten times faster than a vanilla one (note this is 4tH, so you might have to tweak it a bit):

\ Thanks for asking. Those routines are public domain (originally posted to
\ comp.sys.arm a long time ago), so you can use them freely for any purpose.
\ Cheers, Wilco Dijkstra

\ http://github.com/irungentoo/filter_audio/blob/master/other/spl_sqrt_floor.c

\ Algorithm:
\ Successive approximation of the equation (root + delta) ^ 2 = N
\ until delta < 1. If delta < 1 we have the integer part of SQRT (N).
\ Use delta = 2^i for i = cell-bits/2 - 1 .. 0.

\ Output precision is half cell size. Note for large input values
\ (close to MAX-N), the highest bit of the low half cell) contains
\ the MSB information (a non-sign value).

\ This routine is significantly faster than the standard SQRT which
\ is defined in MATH.4TH

[UNDEFINED] isqrt [IF]
[UNDEFINED] cell-bits [IF] include lib/constant.4th [THEN]
: isqrt ( n1 -- n2)
0 swap -1 cell-bits 2 / 1- do ( r v)
over 1 i lshift + ( r v t)
i lshift over - dup 0> 0= ( r v t-v f)
if abs rot 2 i lshift or swap rot then drop
-1 +loop drop 2/ ( r)
;
[THEN]

May be also interesting: https://sourceforge.net/p/forth-4th/wiki/Inside%20Zen%20FSQRT/

Hans Bezemer

The Beez

unread,

Sep 7, 2017, 7:31:13 AM9/7/17

to

This should be:

- Get the highbit of the square;

- If uneven, add 1;
- Divide by 2;

- Shift the square right by that number.

A rough Excel equivalent is: =INT(0.5+((LOG(A2)/LOG(2))))/2

Hans Bezemer

Albert van der Horst

unread,

Sep 7, 2017, 11:37:09 AM9/7/17

to

In article <aa2cf464-c57c-49c6...@googlegroups.com>,

The Beez <the.bee...@gmail.com> wrote:
>On Thursday, August 31, 2017 at 6:12:56 AM UTC+2, Paul Rubin wrote:
>> alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) writes:
>> > Suppose I want sqrt(87) the result must be 9.
>>
>> I didn't even realize this was about integer square roots. Maybe go
>> with some shifts/find-first-1, and a table lookup on the top few bits.
>
>No problem at all:
>- Get the highbit of the square;
>- if uneven, add 1;
>- Shift the square right by that number.
>

<SNIP>

>It gives significantly better results than 1+x/2.

If you're going to loop 32 times, you've already lost the speed contest.

You must find the last bit using technisques like:

HEX
: COUNT-BITS
DUP 05555555555555555 AND SWAP 1 RSHIFT 05555555555555555 AND +
DUP 03333333333333333 AND SWAP 2 RSHIFT 03333333333333333 AND +
DUP 00F0F0F0F0F0F0F0F AND SWAP 4 RSHIFT 00F0F0F0F0F0F0F0F AND +
0FF MOD ;

I thought this horse was already dead.

Some one please add this to the faq:

What is the fastest method to do integer square roots?

And make an excerpt of the existing posts.

P.S.
Note that my seed method is used in special circumstances to add even
more speed to the fastest method.

>May be also interesting:
https://sourceforge.net/p/forth-4th/wiki/Inside%20Zen%20FSQRT/

That seems to say that integer calculations can be used to speed up
fp calculations. Nowadays that is questionable.

Since the 8087 (IBM PC anno 1985) FSQRT can be a single
assembler instruction.

Try to beat this

LOCATE FSQRT
SCR # 228
0 ( -fp- FSQRT FABS FNEGATE F1/N FMOD ) CF: ?32+ \ AvdH B6apr03
1
2 \ All: ( r -- r )
3 CODE FSQRT FSQRT, NEXT, END-CODE
...

ASSEMBLER LOCATE FSQRT,
SCR # 211
0 ( ASSEMBLER-CODES-PENTIUM --fp_1 ) CF: ?32+ \ A7oct19 AvdH
1 0800 00D8 7 2FAMILY, FADD, FMUL, FCOM, FCOMP, FSUB, -- FDIV,
2
3 0800 10D9 6 2FAMILY, FST, FSTP, FLDENV, FLDCW, FSTENV, FSTCW,
4 0100 E0D9 6 2FAMILY, FCHS, FABS, -- -- FTST, FXAM,
5 0100 E8D9 4 2FAMILY, FLD1, FLDL2T, FLDL2E, FLDPI,
6 0100 ECD9 3 2FAMILY, FLDLG2, FLDLN2, FLDZ,
7 0100 F0D9 4 2FAMILY, F2XM1, FYL2X, FPTAN, FPATAN,
8 0100 F4D9 4 2FAMILY, FXTRACT, FPREM1, FDECSTP, FINCSTP,
>>>>> 9 0100 F8D9 4 2FAMILY, FPREM, FYL2XP1, FSQRT, FSINCOS,
10 0100 FCD9 4 2FAMILY, FRNDINT, FSCALE, FSIN, FCOS,
11 00D9 2PI FLD, C8D9 2PI FXCH, D0D9 2PI FNOP,
12
13 0800 00DA 4 2FAMILY, FIADD, FIMUL, FICOM, FICOMP,
14 1000 20DA 2 2FAMILY, FISUB, FIDIV,
15 E9DA 2PI FUCOMPP,

Intel/AMD's microcode definitely beats any Forth code.

>
>Hans Bezemer

Hans Bezemer

unread,

Sep 10, 2017, 5:03:16 AM9/10/17

to

alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) Wrote in
message:

> If you're going to loop 32 times, you've already lost the speed contest.

Very true. BTW, that's what happens when you do integer SQRT
without a good estimate.

> You must find the last bit using technisques like:
>
> HEX
> : COUNT-BITS
> DUP 05555555555555555 AND SWAP 1 RSHIFT 05555555555555555 AND +
> DUP 03333333333333333 AND SWAP 2 RSHIFT 03333333333333333 AND +
> DUP 00F0F0F0F0F0F0F0F AND SWAP 4 RSHIFT 00F0F0F0F0F0F0F0F AND +
> 0FF MOD ;
>

Wrong. The fastest method is an unrolled loop of checking a cell
byte by byte, combined with an indexed lookup of the byte
value.

Hans Bezemer

--

----Android NewsGroup Reader----
http://usenet.sinaapp.com/

Albert van der Horst

unread,

Sep 10, 2017, 10:44:33 AM9/10/17

to

In article <59b4ffbc$0$833$e4fe...@news.xs4all.nl>,

Hans Bezemer <the.bee...@gmail.com> wrote:
>alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) Wrote in
> message:
>
>> If you're going to loop 32 times, you've already lost the speed contest.
>Very true. BTW, that's what happens when you do integer SQRT
> without a good estimate.

That's why the seed technique is brilliant.

>
>> You must find the last bit using technisques like:
>>
>> HEX
>> : COUNT-BITS
>> DUP 05555555555555555 AND SWAP 1 RSHIFT 05555555555555555 AND +
>> DUP 03333333333333333 AND SWAP 2 RSHIFT 03333333333333333 AND +
>> DUP 00F0F0F0F0F0F0F0F AND SWAP 4 RSHIFT 00F0F0F0F0F0F0F0F AND +
>> 0FF MOD ;
>>
>Wrong. The fastest method is an unrolled loop of checking a cell
> byte by byte, combined with an indexed lookup of the byte
> value.

I didn't claim that it was "the" fastest method, as you do.
That is hard to tell without a comprehensive survey.
I was much more cautious with "techniques like".
If your table has time to get swapped out of the cache before the next
square root is called, it may not be so fast compared to an inline
version of e.g. the above code.

>
>Hans Bezemer

Groetjes Albert

Hans Bezemer

unread,

Sep 10, 2017, 1:37:24 PM9/10/17

to

Albert van der Horst wrote:

> I didn't claim that it was "the" fastest method, as you do.
> That is hard to tell without a comprehensive survey.
> I was much more cautious with "techniques like".
> If your table has time to get swapped out of the cache before the next
> square root is called, it may not be so fast compared to an inline
> version of e.g. the above code.

Well, this uses a table of 510 bytes, so I wonder if it will break the bank.
Note it WILL require changes for 64bit, but those are trivial (repeat inner
lines 4 more times). Tested with gForth.

Hans Bezemer

8 constant char-bits
4 constant /cell

create (msb)
1 c, 2 c, 2 c, 3 c, 3 c, 3 c, 3 c, 4 c, 4 c, 4 c, 4 c, 4 c, 4 c, 4 c, 4 c,
5 c, 5 c, 5 c, 5 c, 5 c, 5 c, 5 c, 5 c, 5 c, 5 c, 5 c, 5 c, 5 c, 5 c, 5 c,
5 c, 6 c, 6 c, 6 c, 6 c, 6 c, 6 c, 6 c, 6 c, 6 c, 6 c, 6 c, 6 c, 6 c, 6 c,
6 c, 6 c, 6 c, 6 c, 6 c, 6 c, 6 c, 6 c, 6 c, 6 c, 6 c, 6 c, 6 c, 6 c, 6 c,
6 c, 6 c, 6 c, 7 c, 7 c, 7 c, 7 c, 7 c, 7 c, 7 c, 7 c, 7 c, 7 c, 7 c, 7 c,
7 c, 7 c, 7 c, 7 c, 7 c, 7 c, 7 c, 7 c, 7 c, 7 c, 7 c, 7 c, 7 c, 7 c, 7 c,
7 c, 7 c, 7 c, 7 c, 7 c, 7 c, 7 c, 7 c, 7 c, 7 c, 7 c, 7 c, 7 c, 7 c, 7 c,
7 c, 7 c, 7 c, 7 c, 7 c, 7 c, 7 c, 7 c, 7 c, 7 c, 7 c, 7 c, 7 c, 7 c, 7 c,
7 c, 7 c, 7 c, 7 c, 7 c, 7 c, 7 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c,
8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c,
8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c,
8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c,
8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c,
8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c,
8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c,
8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c,
8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c, 8 c,
does> swap chars + c@ ;

create (lsb)
1 c, 2 c, 1 c, 3 c, 1 c, 2 c, 1 c, 4 c, 1 c, 2 c, 1 c, 3 c, 1 c, 2 c, 1 c,
5 c, 1 c, 2 c, 1 c, 3 c, 1 c, 2 c, 1 c, 4 c, 1 c, 2 c, 1 c, 3 c, 1 c, 2 c,
1 c, 6 c, 1 c, 2 c, 1 c, 3 c, 1 c, 2 c, 1 c, 4 c, 1 c, 2 c, 1 c, 3 c, 1 c,
2 c, 1 c, 5 c, 1 c, 2 c, 1 c, 3 c, 1 c, 2 c, 1 c, 4 c, 1 c, 2 c, 1 c, 3 c,
1 c, 2 c, 1 c, 7 c, 1 c, 2 c, 1 c, 3 c, 1 c, 2 c, 1 c, 4 c, 1 c, 2 c, 1 c,
3 c, 1 c, 2 c, 1 c, 5 c, 1 c, 2 c, 1 c, 3 c, 1 c, 2 c, 1 c, 4 c, 1 c, 2 c,
1 c, 3 c, 1 c, 2 c, 1 c, 6 c, 1 c, 2 c, 1 c, 3 c, 1 c, 2 c, 1 c, 4 c, 1 c,
2 c, 1 c, 3 c, 1 c, 2 c, 1 c, 5 c, 1 c, 2 c, 1 c, 3 c, 1 c, 2 c, 1 c, 4 c,
1 c, 2 c, 1 c, 3 c, 1 c, 2 c, 1 c, 8 c, 1 c, 2 c, 1 c, 3 c, 1 c, 2 c, 1 c,
4 c, 1 c, 2 c, 1 c, 3 c, 1 c, 2 c, 1 c, 5 c, 1 c, 2 c, 1 c, 3 c, 1 c, 2 c,
1 c, 4 c, 1 c, 2 c, 1 c, 3 c, 1 c, 2 c, 1 c, 6 c, 1 c, 2 c, 1 c, 3 c, 1 c,
2 c, 1 c, 4 c, 1 c, 2 c, 1 c, 3 c, 1 c, 2 c, 1 c, 5 c, 1 c, 2 c, 1 c, 3 c,
1 c, 2 c, 1 c, 4 c, 1 c, 2 c, 1 c, 3 c, 1 c, 2 c, 1 c, 7 c, 1 c, 2 c, 1 c,
3 c, 1 c, 2 c, 1 c, 4 c, 1 c, 2 c, 1 c, 3 c, 1 c, 2 c, 1 c, 5 c, 1 c, 2 c,
1 c, 3 c, 1 c, 2 c, 1 c, 4 c, 1 c, 2 c, 1 c, 3 c, 1 c, 2 c, 1 c, 6 c, 1 c,
2 c, 1 c, 3 c, 1 c, 2 c, 1 c, 4 c, 1 c, 2 c, 1 c, 3 c, 1 c, 2 c, 1 c, 5 c,
1 c, 2 c, 1 c, 3 c, 1 c, 2 c, 1 c, 4 c, 1 c, 2 c, 1 c, 3 c, 1 c, 2 c, 1 c,
does> swap chars + c@ ;

: (msb@) 1- (msb) + nip ; ( n shift byte -- bit)
: (lsb@) 1- (lsb) + nip ; ( n shift byte -- bit)

: lastbit ( n1 -- n2)

/cell 1- char-bits * over over rshift dup if (msb@) exit then
drop char-bits - over over rshift 255 and dup if (msb@) exit then
drop char-bits - over over rshift 255 and dup if (msb@) exit then
drop char-bits - over 255 and dup if (msb@) exit then
drop drop dup xor
;

: firstbit ( n1 -- n2)

0 over 255 and dup if (lsb@) exit then
drop char-bits + over over rshift 255 and dup if (lsb@) exit then
drop char-bits + over over rshift 255 and dup if (lsb@) exit then
drop char-bits + over over rshift dup if (lsb@) exit then
drop drop dup xor
;