AZ vs LZ Graph Comparison

Sam Jukes
Apr 28, 2018, 3:25:00 PM
to LCZero



I made a quick graph comparing the ratings of AZ and LZ against the number of games played, assuming LZ's latest strength is in the high 2700s.


It seems AZ, after being stuck in the mid 2000s, 'discovered' something and shot up into the 3000s. I am hoping LZ will get this jump too, although it is a little overdue... Maybe this is because of past bugs or a different network size.


Still it looks like it is making good progress!



FWCC1
Apr 28, 2018, 3:51:05 PM
to LCZero
Great job doing this comparison. Unfortunately, we don't have Google's resources.

Trevor
Apr 28, 2018, 5:05:57 PM
to LCZero
Those AZ jumps look a lot like what happens when you reduce the learning rate. See: https://datascience.stackexchange.com/questions/23213/why-does-decreasing-the-sgd-learning-rate-cause-a-massive-increase-in-accuracy

Notice in the paper that the first big jump occurs almost precisely at 100k steps, and there is a similar jump in Shogi strength at 100k steps as well. The Alpha Zero paper states: "The learning rate was set to 0.2 for each game, and was dropped three times (to 0.02, 0.002 and 0.0002 respectively) during the course of training."

If that is what it is, I can nonetheless imagine that there are good reasons to *not* decrease Leela's learning rate right now.
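The schedule quoted from the AlphaZero paper can be sketched as a step-decay function. This is a minimal illustration, not DeepMind's code: only the base rate (0.2) and the three drop values come from the quote. The first boundary at 100k steps is an assumption based on where the first jump appears, and the 300k and 500k boundaries are hypothetical placeholders.

```python
import bisect

# Boundaries are ASSUMPTIONS: 100k matches the observed jump, the rest are made up.
DROP_STEPS = [100_000, 300_000, 500_000]
# Rates from the paper quote: 0.2, then dropped to 0.02, 0.002 and 0.0002.
LR_VALUES = [0.2, 0.02, 0.002, 0.0002]  # one more value than boundaries


def learning_rate(step: int) -> float:
    """Return the learning rate in effect at a given training step."""
    # bisect_right counts how many boundaries have been reached or passed,
    # which is exactly the index of the rate currently in use.
    return LR_VALUES[bisect.bisect_right(DROP_STEPS, step)]


for s in (0, 99_999, 100_000, 650_000):
    print(s, learning_rate(s))
```

If the learning-rate explanation is right, each jump in the AZ curve would line up with one of these boundaries rather than with some new "discovery" by the network.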

Jesse Jordache
Apr 29, 2018, 1:58:50 PM
to LCZero
It's funny that one of A0's early points of stalling out before shooting up again almost exactly coincides with the 25 or so iterations it took Leela to finally get stronger than network 125, after slipping because of the version 0.6 weirdness.

I wondered about that though - thanks for the information about A0's graph, and the link.

Redshift
Apr 29, 2018, 3:28:07 PM
to LCZero
This is comparing apples to oranges, as A0's curve describes its performance running on multiple TPUs, with a huge number of MCTS visits compensating for any innate tactical deficiencies of the network. Lc0, when tested, does not get that benefit.

Jeremy Zucker
Apr 29, 2018, 8:19:19 PM
to Redshift, LCZero
According to the AlphaZero paper, "During training, each MCTS used 800 simulations." LCZero also uses 800 simulations per MCTS.  
AlphaZero ran 5,000 first-generation TPUs for 9 hours, generating 44 million games. LCZero has now played for 2 months, generating 10 million games.
AlphaZero took 700k training steps, where each step used a mini-batch of 4096 positions, each with its associated MCTS playout statistics and win/draw/loss value.
LCZero has so far generated 219 networks with median number of games per network = 42k, but I don't know how many training steps were run on each network, nor do I know how many positions are in each mini-batch. 
It is not clear to me from the paper how many blocks and filters AlphaZero used, but AlphaGo Zero used 20- and 40-block architectures with 256 filters. LCZero v0.7 uses a 10-block architecture with 128 filters, and we are testing a 15x192 (15 blocks, 192 filters) architecture now.
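The figures above can be put side by side with a little arithmetic. This is a rough sketch under stated assumptions: the game counts, step count and batch size are the ones quoted in this post, "positions sampled" counts every position drawn into a mini-batch (repeats included), and `tower_weights` is a hypothetical helper that counts only the 3x3 convolution weights of the residual tower, ignoring the input convolution, the policy/value heads, biases and batch-norm parameters.

```python
# Figures quoted in this post.
AZ_GAMES = 44_000_000
LZ_GAMES = 10_000_000
AZ_STEPS = 700_000
AZ_BATCH = 4096


def positions_sampled(steps: int, batch: int) -> int:
    """Total positions drawn across all mini-batches (repeats included)."""
    return steps * batch


def tower_weights(blocks: int, filters: int) -> int:
    """Approximate weight count of a residual tower (assumption: 3x3 convs only).

    Each residual block is taken to hold two 3x3 convolutions mapping
    `filters` channels to `filters` channels.
    """
    return blocks * 2 * 3 * 3 * filters * filters


print(f"AZ/LZ game ratio: {AZ_GAMES / LZ_GAMES:.1f}x")
print(f"AZ positions sampled: {positions_sampled(AZ_STEPS, AZ_BATCH):,}")
for blocks, filters in [(20, 256), (40, 256), (10, 128), (15, 192)]:
    print(f"{blocks}x{filters}: ~{tower_weights(blocks, filters):,} tower weights")
```

By this rough measure a 20x256 tower has eight times the convolution weights of a 10x128 one (twice the blocks, four times the filters squared), so the two runs differ noticeably in network size even where the training setup is similar.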

So it is not so much comparing apples and oranges as comparing Honeycrisp and Granny Smith.

Sincerely,

Jeremy


Jesse Jordache
Apr 29, 2018, 9:10:01 PM
to LCZero
If you look at the various spreadsheets the devs have made available and linked from the FAQ page (which no one seems to do), you'll find a graph comparing A0 to L0 and to "Max L0", which is L0 with so much processing power that more would make no difference. L0 and Max L0 bracket A0's curve, which is what you'd expect.