New edax 4.3.alpha1 source code available on google-code

Richard Delorme

unread,

Aug 9, 2012, 3:00:34 AM8/9/12

to edax-r...@googlegroups.com

As Toshihiko Okuhara sent to me several move generators, and one faster than the current one, I upload a snapshot of the future 4.3 version, currently in development.

The download page is here:

http://code.google.com/p/edax-reversi/downloads/list

I would like to thanks Toshihiko Okuhara for his contribution to Edax code.

--
Richard

Stéphane Nicolet

unread,

Aug 10, 2012, 6:39:02 AM8/10/12

to edax-r...@googlegroups.com, Stéphane Nicolet, HGE0...@nifty.ne.jp

Hi all,

Thanks Richard and Toshihiko san for this new version of Edax !
I've had a look at the new move generators, they are very nice
pieces of code indeed :-)

I thought that maybe you might be interested be the following
idea, which uses carry propagation to flip discs in the MSB
to LSB direction : it might avoid the use of arrays in some
more cases...

Hope this helps, Stéphane

// Example code to flip discs to the left of H1 for Othello.
//
// This code use carry propagation and a CountLeadingZero()
// function to be fast, so it is probably only useful on
// processors/languages with a CountLeadingZero() intrinsic.
//
// Notes : a) SHL is the "shift left" operator;
//
// b) SHR is the "shift right" operator;
//
// c) to flip discs upward across lines (H8->A1 or H8->H1),
// it may be necessary to fill the opponent_discs
// bitboard with dummy discs (using a OR mask)
//
// d) the following identities might be usefull to calculate
// the right number of discs for upward flipping :
//
// "n shr 3" == "n div 9"
// if n is a multiple of 9 , O <= n <= 63
//
// "(n+7) shr 3" == "n div 7"
// if n is a multiple of 7, 0 <= n <= 49

-------

square := $00000080; //
$00000080 is square H1
player_left := player_discs AND $0000003F; //
mask player's discs on squares A1-F1
result := (opponent_discs + (player_left SHL 1)) AND square; //
use carry propagation to flip discs

if (result <> 0) then
begin
number_of_flipped := COUNT_LEADING_ZEROS(player_left) - 25;
flipped_discs := square - (square SHR number_of_flipped);
end;

-------

Stéphane Nicolet

... /// ... \\\ ...

youri matiounine

unread,

Aug 23, 2012, 6:36:09 PM8/23/12

to edax-r...@googlegroups.com, Stéphane Nicolet, HGE0...@nifty.ne.jp

New move generating code is very interesting. I imagine "bitscan" version is the fastest?
I am using something similar in my move generators. Here is what i have for moves to A7, vertical line:

UINT64 fliiped_all_A7_vert[7];
    DWORD bits;

    _BitScanReverse64(&bits,(O & 0x0000010101010100ULL) ^ 0x0000010101010101ULL);
    if(!_bittest64(&P,bits))
        bits=0;
    flipped=fliiped_all_A7_vert[bits];

Here code "if(!_bittest64(&P,bits)) bits=0; " compiles to just 3 assembler instructions: put 0 is a register; BT; conditional move.
And array "fliiped_all_A7_vert" contains precomputed masks for all possible values of "bits", including zero. This way if move is not possible, it still works.

I have this coded in assember - compiler keeps first argument of "bittest" function in memory, which is slow. Here is assember version of the same code:

mov rax, 7f7f7f7f7f7f7f00H ; border mask(vert)

and rax, r10 ; mask out borders(vert)

not rax ; !x(vert)

mov r8, 0000010101010101H ; mask-out all but vert6(vert)

and r8, rax ; !x, 0(vert)

bsr rcx, r8 ; search. Now rcx is 40-8*x(vert)

xor r8d, r8d ; init to 0(vert)

bt r9, rcx ; check player(vert)

cmovc r8, ?all_av_move@@3PA_KA[rcx+8*946] ; mask (40-8*x is already in rcx)(vert)

All of my code is in
http://code.google.com/p/ymatioun/

Regards, Youri.

From: Stéphane Nicolet <cas...@free.fr>
To: edax-r...@googlegroups.com
Cc: Stéphane Nicolet <cas...@free.fr>; HGE0...@nifty.ne.jp
Sent: Friday, August 10, 2012 6:39 AM
Subject: Re: New edax 4.3.alpha1 source code available on google-code

Richard Delorme

unread,

Aug 24, 2012, 1:52:21 AM8/24/12

to edax-r...@googlegroups.com

Thank you all for your interest in this new move generation codes.

Here is a comparison between my own kindergarten approach, and Toshihiko Okuhara's carry propagation that looks the fastest one, with various compiler (gcc 4.7 clang 3.0 and icc 12.1), using pgo (on icc & gcc only), and code targeted to x64, x64-modern (with popcount) and x86. The ffo20-39 test was run 5 times, using 1 core on my laptop (with a sandy bridge cpu running at 3.1 Ghz), and fedora 17 as OS. The accuracy for the timing is around 1%.

kindergarten carry acceleration

-----------------------------------------------------

x64-modern-pgo-icc 00:05,18 00:05,18 0,00%

x64-modern-icc 00:05,19 00:05,11 1,61%

x64-icc 00:05,38 00:05,28 1,86%

x64-modern-pgo-gcc 00:05,40 00:05,39 0,07%

x64-modern-gcc 00:05,57 00:05,40 3,11%

x64-gcc 00:05,72 00:05,62 1,74%

x64-modern-clang 00:05,90 00:05,81 1,58%

x64-clang 00:06,03 00:05,94 1,58%

x86-clang 00:09,06 00:08,39 7,96%

x86-icc 00:09,21 00:08,89 3,60%

x86-gcc 00:10,26 00:09,71 5,62%

as you can see, the acceleration vary from 0% to 8% and strongly depends on the compiler used and the targeted abi. In practice, the acceleration is only important on 32 bits abi (the x86), where optimized code to this target was used only with the carry approach. To my humble opinion, I do not think it is important to be fast here, as running Edax on a 32 bits system is whatever much slower than running it on a 64 bits system; so I am not interested to optimize the kindergarten approach to this system (although this can be done).

What is exciting me to further improve code generation speed, is the announcement of the AVX2 extensions in future CPU (coming in 2013). Code like "(O & 0x0002040810204080ULL)* 0x0101010101010101ULL >> 57" will be replaced by a single instruction (PDEP), and will be easy to reverse without using an array (PEXT). This will probably make the kindergarten approach very competitive.

So see you in the future for faster move generator.

Reply all

Reply to author

Forward