Re: Is this group still alive?


Ron Peacetree

Feb 25, 2015, 9:23:14 PM
to edax-r...@googlegroups.com, me
-----Original Message-----
>From: Edax <edax.r...@gmail.com>
>Sent: Feb 25, 2015 6:37 PM
>To: edax-r...@googlegroups.com
>Subject: Re: Is this group still alive?
>
>On 25/02/2015 16:08, Ron Peacetree wrote:
>> Enough time has passed since the original design of edax in ~1998 that
>> there have been some significant changes in HW architecture:
>I do not think there is a single line of Edax 1.0 (the 1998 version) in
>Edax 4.x. Edax has been rewritten from scratch between each major
>version. It does take into account some recent changes in hardware
>architecture: 64 bit instructions, multi-core CPU, etc. that were not
>available (at a reasonable price) in 1998.
>> =as of this writing, one can buy commodity systems (albeit servers at
>> present) that support 1-2 TeraBytes of RAM
>> =1+ TeraByte SSDs with ~RAM speed IO paths (SATA Express) are now reality.
>They won't replace ram. RAM is still ~1000x faster than SSD for access
>time.
>
Replacing RAM is not the point.
OTOH, being able to do random access IO to permanent storage at multi GB/sec speeds is something we've never been able to do before...

>> =GPU functionality is becoming much more tightly integrated to CPUs
>> =GPUs are far better general purpose computing devices than they were
>> even 2 years ago
>> =HW support for transactional memory
>> =HW support for automated fine grain parallelism
>All this is very recent (in Haswell CPU or higher), and sometimes buggy
>(transactional memory on the first Haswell or Broadwell CPU).
>
Yes, it's very recent as of this writing.
In the 6-12 months it will take to modify edax to take best advantage of all this, these architectural features will be debugged and become standard.

>>
>> All of the above suggests that it might be time to see if the
>> performance of edax could be significantly improved:
>> =support for transposition tables up to 1+ TB
>That's easy to do, but, considering the transposition table size,
>/bigger/ does not mean /better/.
>
In at least 1 respect, bigger does mean better.
The only way search can be a graph search rather than a tree search is if the Ttable contains the relevant nodes of the search.
If edax is searching at 100M nps and our time budget is ~180 secs/move, it visits ~18 billion nodes per move (100M x 180), so we need a Ttable of ~18 billion entries to maximize the chances of finding every potential transposition that could be discovered by the search.
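
To make the arithmetic above concrete, here is a minimal sketch in Python. Two things in it are my assumptions, not facts about edax: the 16 bytes per entry is a placeholder (I have not checked the actual entry layout), and I am treating the table size as a power of two given as a bit count, the way hash-table-size 30 is used elsewhere in this message.

```python
def table_requirements(nodes_per_second, seconds_per_move):
    """Return (nodes visited per move, bits needed to index that many entries)."""
    nodes = nodes_per_second * seconds_per_move
    # Smallest b such that 2**b >= nodes, computed with exact integer math.
    bits = (nodes - 1).bit_length()
    return nodes, bits

ENTRY_BYTES = 16  # ASSUMPTION: placeholder entry size, not edax's real layout

nodes, bits = table_requirements(100_000_000, 180)
table_gib = (1 << bits) * ENTRY_BYTES / 2**30

print(f"nodes per move : {nodes:,}")          # 18,000,000,000
print(f"bits needed    : {bits}")             # 35
print(f"table size     : {table_gib:.0f} GiB at {ENTRY_BYTES} bytes/entry")  # 512 GiB
```

Under those assumptions the table lands in the hundreds of GiB, which is why the 1+ TB RAM systems matter here.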

The deeper we search, the more important the Ttable becomes in helping us approximate the Real Minimal Graph rather than a tree search.
At some point, that starts paying very large dividends in terms of both search efficacy and the quality of the returned result.
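
A toy illustration of the tree vs. graph point, in Python. This is not edax code and not Othello: the "game" is a hypothetical one where a position is a pair of counters and each move increments one of them, so different move orders reach the same position. The table here is an unbounded set, which is the idealized case where every relevant node fits.

```python
def count_nodes(depth, use_table):
    """Count nodes visited when exploring the toy game to a fixed depth."""
    seen = set()      # idealized transposition table: never evicts anything
    visited = 0

    def search(state, remaining):
        nonlocal visited
        if use_table:
            key = (state, remaining)
            if key in seen:
                return            # transposition: already searched, skip it
            seen.add(key)
        visited += 1
        if remaining == 0:
            return
        a, b = state
        search((a + 1, b), remaining - 1)   # move 1: bump the first counter
        search((a, b + 1), remaining - 1)   # move 2: bump the second counter

    search((0, 0), depth)
    return visited

tree_nodes = count_nodes(16, use_table=False)
graph_nodes = count_nodes(16, use_table=True)
print(tree_nodes, graph_nodes)   # 131071 153
```

The gap widens with depth (exponential vs. quadratic in this toy), but only as long as the table can actually hold the positions being revisited.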

>> =seeing if the magic bitboard code can better leverage the GPU
>> infrastructure
>
>Probably not. As far as I know, it is still very slow to switch between
>GPU & CPU. It means you cannot have the move generator on the GPU & the
>search in the CPU.
>
Please note I was talking about the GPU support in the CPU architecture as well as any external GPU that might be present.
Clearly there is low switching cost when the GPU is part of the CPU die ;-)

Even for the die-to-die switching case you mention above, the "Big Three" (AMD, Intel, and nVidia) are all driving CPU <-> GPU switching cost down substantially and quickly; Intel's 128MB cache chip that can be used by both CPU and GPU is one example.
At the least, there's enough improvement in this area that it warrants some experimentation.

>> =seeing where else move generation and evaluation could be improved by
>> the HW changes
>Move generation can benefit from some new instructions available in the
>Haswell, for example with the following code:
>https://code.google.com/r/okuharaandroid-edax-reversi/source/browse/src/flip_avx.c
>
Neat code. I will comment more on it after a thorough examination.

>>
>> The laptop I'm writing this on is an i7-4860HQ with 32GB of RAM and an
>> nVidia GeForce GTX 980M running Win 8.1
>> the "stock" Edax 4.3.2 distro routinely searches 20-40M nps with peaks
>> of 50-65M nps on this system.
>> (hash-table-size 30, n-tasks 4, level 32 or 34)
>A few tricks to run Edax faster:
>- Try a smaller hash-table-size (25 should be near optimal for level
>32-34) & n-tasks 8.
>
My CPU has 4 cores with Hyper-Threading support.
I set level 32 and varied n-tasks, with a fresh run each time.
For n-tasks 1-4, nps scales nearly linearly.
For n-tasks 5-8, nps scaling is very slight.
I'll try to format the results and post them.
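
For the write-up, something like the following Python sketch is how I would turn raw nps readings into speedup and per-task efficiency. The numbers in the example dictionary are made-up placeholders chosen only to show the shape of the calculation; they are NOT my measurements.

```python
def scaling_report(nps_by_tasks):
    """Map n-tasks -> (speedup vs. 1 task, efficiency = speedup / n-tasks)."""
    base = nps_by_tasks[1]
    report = {}
    for n in sorted(nps_by_tasks):
        speedup = nps_by_tasks[n] / base
        report[n] = (speedup, speedup / n)
    return report

# PLACEHOLDER values in M nps, for illustration only (not real results).
placeholder_nps = {1: 10.0, 2: 20.0, 4: 40.0, 8: 44.0}

report = scaling_report(placeholder_nps)
for n, (speedup, efficiency) in report.items():
    print(f"n-tasks {n}: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

Efficiency near 100% up to the physical core count and a sharp drop beyond it is the pattern described above.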

I'll try to redo the experiments I did some time ago varying cache size and post those as well.

>- Try to recompile Edax (stock Edax is optimized for a generic CPU; if you
>recompile it, it will be optimized for your own CPU).
>
What compiler suite do you recommend for Windows?

>- Avoid Windows (10% slower than Linux or Mac OS/X).
>
Meh. 10% is not worth a different OS install. ;-)

Ron

