Lc0 loses on time??


MindMeNot

Sep 15, 2018, 3:20:56 PM
to LCZero
I cannot figure out why lc0 is losing on time in cutechess-cli, both against SFDev and against itself. How do I fix this? I've also attached the PGN file of the 2 games.

Started game 1 of 2 (SFDev vs Lc0.11198)
Finished game 1 (SFDev vs Lc0.11198): 1-0 {Black loses on time}
Score of SFDev vs Lc0.11198: 1 - 0 - 0  [1.000] 1
Started game 2 of 2 (Lc0.11198 vs SFDev)
Finished game 2 (Lc0.11198 vs SFDev): 0-1 {White loses on time}
Score of SFDev vs Lc0.11198: 2 - 0 - 0  [1.000] 2
Elo difference: inf +/- nan
Finished match

Here's the command line to reproduce:

@echo off
SET outfile="out.txt"
echo Cutechess started at %date% %time%. Output is redirected to %outfile%
echo Cutechess started at %date% %time% > %outfile%

cutechess-cli.exe -tournament round-robin -rounds 2 -games 1 -repeat -concurrency 1 -pgnout out.pgn -recover ^
-resign movecount=2 score=500 -draw movenumber=100 movecount=3 score=50 ^
-engine name=SFDev option.Threads=1 option.Hash=1024 cmd=engines\SFDev\stockfish_18090410_x64_bmi2.exe ^
-engine name=Lc0.11198 cmd=engines\Lc0\Lc0.exe arg="--weights=engines\lc0\11198.pb" ^
-each proto=uci timemargin=10 tc=30/30+1 book=book\bookfish.bin bookdepth=2 dir=. >> %outfile%

echo Tournament ended at %date% %time%.
echo Tournament ended at %date% %time%. >> %outfile%
pause
out.pgn

Francesco Tommaso

Sep 15, 2018, 4:07:23 PM
to LCZero
Is it using CPU or GPU?

MindMeNot

Sep 15, 2018, 4:13:22 PM
to LCZero
CUDA version on a 750ti gpu. 1100 nps at start position for net 11198 after 1 minute.

Ivo Doko

Sep 15, 2018, 5:14:37 PM
to LCZero
I saw someone in another thread mentioned hard disk spin-up delay might be causing this issue, but I don't remember where I saw it.

Are you running off a hard disk or an SSD (or something else)?

Jon Mike

Sep 15, 2018, 5:28:50 PM
to LCZero
I ran into the same problems. I found that most 20xxx networks caused the problem for me. The LucasChess GUI was able to help the IDs move in time. I tried changing prefetch to 0 and inference to 16 with overhead at 200 ms, with no luck. Then I downloaded lc0 (v18) and had many of the same problems. I found that most of the 20xxx networks needed about 45 minutes to not lose on time.

Jon Mike

Sep 15, 2018, 5:30:09 PM
to LCZero
Version 18 has the 309 fix, so it should help, and adjusting overhead shouldn't be needed.

tibo...@gmail.com

Sep 16, 2018, 8:56:46 AM
to LCZero
I would suggest reducing "scale thinking time" (a 0.17 UCI option).
The default is 2.400000. I use half of that, 1.200000, which performs well assuming slow hardware.


David Grosvenor

Sep 16, 2018, 11:15:45 AM
to LCZero
@Jon Where did you download Lc0 Vers. 18? I can only find Version 17 … 

David Grosvenor

Sep 16, 2018, 11:33:18 AM
to LCZero
You could try to reduce minibatch-size (default 256) - I once set it to 512 and got loads of time losses as a consequence (GTX 1060). So it might as well solve your problem to reduce the default size. Only an educated guess though …. 

MindMeNot

Sep 16, 2018, 2:03:05 PM
to LCZero
Guys, I think I found a bug!

By using the "-debug" switch in cutechess and analyzing the log, it appears that cutechess sometimes sends negative time left on the clock (wtime -1 or btime -1).
If lc0 is presented with negative time left, it analyzes indefinitely, which breaks any and all time controls.

Try it: open lc0.exe, type "go wtime -1 btime 100 movestogo 10" and press enter.

I don't know if this is lc0's fault for mishandling such situations or cutechess's for sending negative values willy-nilly, but SF returns immediately when presented with this situation.
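
For anyone who wants to check their own "-debug" log for this, here is a minimal Python sketch that flags every "go" command carrying a negative wtime or btime. The function name and the sample log prefixes are made up for illustration; only the "go wtime ... btime ..." part is taken from this thread.

```python
import re

# Any wtime/btime field followed by an integer (possibly negative).
CLOCK_FIELD = re.compile(r"\b([wb]time)\s+(-?\d+)")
GO_COMMAND = re.compile(r"\bgo\b")


def find_negative_clocks(lines):
    """Return (line_number, field, value) for each negative clock in a go command."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if not GO_COMMAND.search(line):
            continue  # not a go command, nothing to check
        for field, value in CLOCK_FIELD.findall(line):
            if int(value) < 0:
                hits.append((number, field, int(value)))
    return hits


# Illustrative lines only; the prefixes before "go" are hypothetical.
sample_log = [
    "123 >Lc0.11198(1): go wtime -1 btime 100 movestogo 10",
    "456 >SFDev(0): go wtime 900 btime 850 movestogo 9",
]
print(find_negative_clocks(sample_log))
```

To scan a real log, pass the open file object instead of the sample list.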

Ivo Doko

Sep 16, 2018, 2:35:51 PM
to LCZero
That certainly seems to be a bug in cutechess.

lc0 shouldn't be faulted for working with the assumption that the received remaining time is non-negative (it's a perfectly valid assumption), so imo it makes no sense to call this a bug in lc0. It may be edited to not evaluate indefinitely if the remaining time is negative, but that still leaves engines spuriously receiving negative time from cutechess, thus being randomly unfairly penalised, which makes cutechess matches invalid for evaluating engines.
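
To illustrate what "not evaluating indefinitely" could look like, here is a hypothetical Python sketch of a defensive per-move budget. This is not lc0's actual time manager; the function name, the even split over the remaining moves, and the overhead parameter are all assumptions made for the example.

```python
def time_budget_ms(remaining_ms, moves_to_go, overhead_ms=0):
    """Hypothetical per-move budget: remaining time split evenly over the moves left.

    A negative clock (such as the "wtime -1" case) is clamped to zero, so the
    caller gets a zero budget and can move at once instead of searching forever.
    """
    usable_ms = max(0, remaining_ms - overhead_ms)  # clamp negative clocks to zero
    moves = max(1, moves_to_go)  # guard against a zero or negative move count
    return usable_ms // moves


print(time_budget_ms(-1, 10))     # negative clock gives a zero budget
print(time_budget_ms(1000, 100))  # 1000 ms over 100 moves
```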

Jon Mike

Sep 16, 2018, 4:27:28 PM
to LCZero
latest build v18

As I mentioned, LucasChess will solve many of the time issues.  (If you get cutechess working without time issues please enlighten us)

Jupiter

Sep 17, 2018, 2:37:52 AM
to LCZero
What version of cutechess-cli do you use?

Jupiter

Sep 17, 2018, 2:56:57 AM
to LCZero
Try experimenting with the value of this option.

option name Aversion to search if change unlikely type string default 1.330000

Aversion to search if change unlikely

command line name is
--futile-search-aversion

Increase its value from 1.33 to 10

Try 2, 3, 4, 5, 6 ...
until time forfeit is zero.
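
Checking whether the forfeit count has reached zero can be automated by counting the "loses on time" results in the cutechess-cli output. Below is a minimal Python sketch; the function name is made up, and the line format is assumed to match the "Finished game ..." lines posted at the top of this thread, where the first engine named is White.

```python
import re
from collections import Counter

# Format assumed from the output quoted earlier in this thread, e.g.
# "Finished game 1 (SFDev vs Lc0.11198): 1-0 {Black loses on time}"
FINISHED = re.compile(
    r"Finished game \d+ \((.+) vs (.+)\): \S+ \{(White|Black) loses on time\}"
)


def count_time_forfeits(lines):
    """Count time forfeits per engine name from cutechess-cli output lines."""
    forfeits = Counter()
    for line in lines:
        match = FINISHED.search(line)
        if match:
            white, black, loser = match.groups()
            forfeits[white if loser == "White" else black] += 1
    return forfeits


output_lines = [
    "Finished game 1 (SFDev vs Lc0.11198): 1-0 {Black loses on time}",
    "Finished game 2 (Lc0.11198 vs SFDev): 0-1 {White loses on time}",
]
print(count_time_forfeits(output_lines))
```

Rerunning the match with each candidate value and feeding the redirected output file through this function gives the forfeit count per setting.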

MindMeNot

Sep 17, 2018, 6:50:31 AM
to LCZero
On Monday, 17 September 2018 08:37:52 UTC+2, Jupiter wrote:
What version of cutechess-cli do you use?

I'm using 1.0.0 from this page: https://github.com/cutechess/cutechess/releases
 

On Monday, 17 September 2018 08:56:57 UTC+2, Jupiter wrote:
Try experimenting with the value of this option.

option name Aversion to search if change unlikely type string default 1.330000

Aversion to search if change unlikely

command line name is
--futile-search-aversion

Increase its value from 1.33 to 10

Try 2, 3, 4, 5, 6 ...
until time forfeit is zero.

I'd rather wait for a bugfix for the rogue negative times before running other tests to avoid pollution.

Jupiter

Sep 17, 2018, 7:25:51 AM
to LCZero
There is a newer Windows build of both the cutechess GUI and CLI; let's see if it still sends negative wtime/btime.

v1.1.0 (CLI and GUI 64-bit builds)
I don't know why there hasn't been an official release for this yet, you can consider this an RC (release candidate):

MindMeNot

Sep 17, 2018, 9:30:31 AM
to LCZero
I tried the newer build as you suggested, but it didn't change anything. However, removing "timemargin" prevents cutechess from sending negative values, and lc0 doesn't get stuck anymore. That can be considered a workaround.
But lc0 keeps losing on time... In particular, big nets lose on time to smaller nets, everything else being equal.
I tried matching 11198 (20x256) vs ccrl net (10x128) vs 9155 (6x64) in a tournament with an extreme tc=100/1 (1000ms per 100 moves). Results are as follows.

11198 vs 9155: 11198 loses on time while 9155 still has 940-920ms on the clock.
11198 vs ccrl net: 11198 loses on time while ccrl net still has 780-800ms on the clock.
ccrl net vs 9155 is more interesting. 9155 always uses less time, but the wtime/btime ratio grows with game length. Here are the last commands sent to the ccrl and 9155 nets for every game where they meet, sorted by the last number, which is the ratio (wtime/btime or btime/wtime, whichever is bigger). None of them ended in a time forfeit:

go wtime 403 btime 827 movestogo 14  2.052
go wtime 545 btime 832 movestogo 33  1.526
go wtime 585 btime 878 movestogo 40  1.5
go wtime 588 btime 874 movestogo 40  1.49
go wtime 572 btime 841 movestogo 38  1.47
go wtime 845 btime 579 movestogo 38  1.46
go wtime 862 btime 601 movestogo 41  1.434
go wtime 838 btime 587 movestogo 39  1.427
go wtime 601 btime 852 movestogo 42  1.418
go wtime 861 btime 613 movestogo 43  1.4
go wtime 864 btime 642 movestogo 47  1.346
go wtime 889 btime 691 movestogo 54  1.286
go wtime 691 btime 888 movestogo 55  1.285
go wtime 696 btime 893 movestogo 55  1.283
go wtime 886 btime 695 movestogo 59  1.275
go wtime 920 btime 728 movestogo 60  1.2637
go wtime 748 btime 922 movestogo 62  1.23
go wtime 903 btime 734 movestogo 60  1.23
go wtime 916 btime 783 movestogo 67  1.17
go wtime 810 btime 946 movestogo 72  1.168
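
The ratio in the last column can be recomputed from each command with a few lines of Python. This is only a sketch with a made-up function name; it assumes both clocks are present and positive.

```python
import re


def clock_ratio(go_command):
    """Return max(wtime, btime) / min(wtime, btime) for a UCI go command."""
    wtime = int(re.search(r"\bwtime\s+(\d+)", go_command).group(1))
    btime = int(re.search(r"\bbtime\s+(\d+)", go_command).group(1))
    return max(wtime, btime) / min(wtime, btime)


# First and sixth rows of the list above.
print(round(clock_ratio("go wtime 403 btime 827 movestogo 14"), 3))
print(round(clock_ratio("go wtime 845 btime 579 movestogo 38"), 3))
```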

Jupiter

Sep 17, 2018, 12:22:50 PM
to LCZero
extreme tc=100/1 (1000ms per 100 moves). Results are as follows.

This is very fast indeed.

Lc0 is very flexible when it comes to time management. It has more than 2 options to influence time control. Tweaking it to suit the time control and hardware used to avoid time forfeits is not a bad idea.

MindMeNot

Sep 17, 2018, 1:22:24 PM
to LCZero
I would expect avoiding forfeits to be the default behavior. I shouldn't have to manually tweak it to not lose on time...
The behavior I'm seeing is very strange too; I mean the simple choice of network having a catastrophic influence on time management.
I'll wait for the official v0.18 release, which apparently has a bunch of time-related fixes, and report back.

Andy Olsen

Sep 17, 2018, 2:36:55 PM
to LCZero
MindMeNot, can you test a dev version of Lc0 to make sure it fixes the issue? Before this version Lc0 was prone to losing on time. If you're on Windows you can download CUDA (Nvidia GPU) version here:

That binary includes a fix for losing on time:


On Monday, September 17, 2018 at 12:22:24 PM UTC-5, MindMeNot wrote:

MindMeNot

Sep 17, 2018, 4:08:28 PM
to LCZero
OK, I tried the same test with that dev build.
The situation definitely got better. Now the difference between ccrl and 9155 is lower than 40ms (getting wider for longer games) in the reported wtime and btime on the last move of their games against each other.
11198 still loses on time, though to a lesser degree than before. Now there's "only" 500-550ms separating it from both ccrl and 9155 at the moment of the loss on time (as opposed to 780 and 920 ms respectively with 0.17).
Essentially, 11198 still lasts exactly 30 moves before forfeiting, and this didn't change with the new lc0 version. Still 30 moves. Sometimes it manages a lucky win in fewer moves, but eh.


Btw, here's the script used to run this test:

@echo off
SET outfile="out.txt"
echo Cutechess started at %date% %time%. Output is redirected to %outfile%
echo Cutechess started at %date% %time% > %outfile%

cutechess-cli.exe -tournament round-robin -rounds 10 -games 2 -repeat -debug -concurrency 1 -pgnout out.pgn -recover ^
-resign movecount=2 score=500 -draw movenumber=100 movecount=3 score=50 ^
-engine name=Lc0.9155 cmd=engines\Lc0\Lc0.exe arg="--weights=engines\lc0\9155.txt" ^
-engine name=Lc0.ccrl cmd=engines\Lc0\Lc0.exe arg="--weights=engines\lc0\128x10-base-200000.pb" ^
-engine name=Lc0.11198 cmd=engines\Lc0\Lc0.exe arg="--weights=engines\lc0\11198.pb" ^
-each proto=uci tc=100/1 book=book\bookfish.bin bookdepth=2 dir=. >> %outfile%


echo Tournament ended at %date% %time%.
echo Tournament ended at %date% %time%. >> %outfile%
pause

Andy Olsen

Sep 17, 2018, 6:04:08 PM
to LCZero
Oh, I noticed you're doing tc=100/1, and that means 100 moves in 1 second? I don't think Lc0 can support such fast speeds. Please try testing something a little slower.

It needs at least enough time to do a single NN eval, plus probably some extra overhead. A rough guess is that ~0.5s per move should be OK; with less than that you're in danger.
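
To put numbers on that, here is a small Python sketch of the average time available per move for a cutechess-style "moves/seconds+increment" time control. The function name is made up, and it ignores book moves and any time carried over between periods.

```python
def average_budget_ms(moves_per_period, period_seconds, increment_seconds=0.0):
    """Average milliseconds available per move for a moves/seconds+increment control."""
    return 1000.0 * period_seconds / moves_per_period + 1000.0 * increment_seconds


print(average_budget_ms(100, 1))    # tc=100/1     -> 10.0 ms per move
print(average_budget_ms(30, 30, 1))  # tc=30/30+1  -> 2000.0 ms per move
```

So tc=100/1 leaves about 10 ms per move, roughly 50 times less than the ~0.5s rough guess, while the tc=30/30+1 from the first post leaves about 2 seconds.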



On Monday, September 17, 2018 at 3:08:28 PM UTC-5, MindMeNot wrote:

Jon Mike

Sep 17, 2018, 7:50:38 PM
to LCZero
These settings helped lc0 to move fast.
aversion: 4 to 10
Prefetch 0-32
Inference 1-256
Time: 1.05

These settings helped lc0 to consider an alternative second PV
Cpuct: 1-3
FPU: -.08 to 1

Jupiter

Sep 17, 2018, 10:40:54 PM
to LCZero
I ran some test games with v0.17.0 (the opencl build, but using backend=blas) on 1 thread with the Ender38 (4x64) net, starting from 12-men-startpos.pgn, on an i7-2600 CPU at TC 60s + 1s against Texel 1.07. To avoid time forfeits, I used the following options and values, found after some trials.

v0.17.0:
--futile-search-aversion=5
Move time overhead in milliseconds set to 1000

With v0.18.0-dev 17Sept18 opencl, backend=blas, I just used the defaults, and no time forfeits were found. This version also uses 1 thread.

Test schedule:
v0.17.0 vs Texel, 50 games
v0.18.0-dev vs Texel, 50 games

Below is the Atime(s) (average time per move in seconds).
So v0.18-dev is effectively using the available time.

Rank Name                                          Games  Atime(s)
   1 Lc0 v0.18.0-dev 17Sept18 Ender38 blas 1t          50     2.58
   2 Texel 1.075 64bit popcnt                         100     1.55
   3 Lc0 v0.17.0 Ender38 blas 1t                       50     1.46
   
Game results:
Texel on 1 thread is set at 0.0 rating points as anchor.

The lead of v0.18-dev over v0.17 is only 0.5 points, but it defeated Texel twice.

Overall the TC management of v0.18-dev is improved.

Summary

   # PLAYER                                      :  RATING  ERROR  POINTS  PLAYED   (%)
   1 Texel 1.075 64bit popcnt                    :     0.0   ----    67.5     100    68
   2 Lc0 v0.18.0-dev 17Sept18 Ender38 blas 1t    :  -126.0   57.5    16.5      50    33
   3 Lc0 v0.17.0 Ender38 blas 1t                 :  -134.1   60.2    16.0      50    32

White advantage = 44.27 +/- 21.86
Draw rate (equal opponents) = 74.87 % +/- 7.81

Head to head statistics:

1) Texel 1.075 64bit popcnt                    0.0 :    100 (+38,=59,-3),  67.5 %

   vs.                                             :  games (  +,  =, -),   (%) :    Diff,    SD, CFS (%)
   Lc0 v0.18.0-dev 17Sept18 Ender38 blas 1t        :     50 ( 19, 29, 2),  67.0 :  +126.0,  29.3,  100.0
   Lc0 v0.17.0 Ender38 blas 1t                     :     50 ( 19, 30, 1),  68.0 :  +134.1,  30.7,  100.0

2) Lc0 v0.18.0-dev 17Sept18 Ender38 blas 1t -126.0 :     50 (+2,=29,-19),  33.0 %

   vs.                                             :  games ( +,  =,  -),   (%) :    Diff,    SD, CFS (%)
   Texel 1.075 64bit popcnt                        :     50 ( 2, 29, 19),  33.0 :  -126.0,  29.3,    0.0

3) Lc0 v0.17.0 Ender38 blas 1t              -134.1 :     50 (+1,=30,-19),  32.0 %

   vs.                                             :  games ( +,  =,  -),   (%) :    Diff,    SD, CFS (%)
   Texel 1.075 64bit popcnt                        :     50 ( 1, 30, 19),  32.0 :  -134.1,  30.7,    0.0
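
As a rough cross-check of the rating and percentage columns, the textbook logistic Elo formula can be sketched in Python as below. This is an approximation only: the rating tool that produced the tables also models draw rate and White advantage, so its numbers need not match exactly. The function name is made up.

```python
def expected_score(rating_diff):
    """Expected score (0..1) for a player rated rating_diff Elo above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


# Ratings relative to the Texel anchor at 0.0, taken from the summary table.
print(round(100 * expected_score(-126.0), 1))  # v0.18.0-dev, table shows 33 %
print(round(100 * expected_score(-134.1), 1))  # v0.17.0, table shows 32 %
```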


Bench of v0.17.0 using Ender38 net at 2 threads and 2 minutes/position on 4 positions.
Average nps = 3135