Why is Fat Fritz 1st on CCRL?

Sam Jukes

Nov 27, 2019, 8:36:23 AM
to LCZero
On the CCRL blitz list, Fat Fritz is now in 1st place, ahead of Lc0 and Stockfish.
https://www.computerchess.org.uk/ccrl/404/
It has a huge plus score against every engine it has played except Lc0 (+3−6=72), including Stockfish-Dev (+14−3=64).
It has incredible scores against the other engines too:
- Allie (+22−0=56)
- Komodo 13 (+27−1=52)
- Houdini 6 (+33−2=46)

Yet on CCC, Fat Fritz played Lc0 (+4−14=84) and Stockfish (+6−16=80) and, as you can see, was beaten rather soundly.
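To put these mini-matches on one scale, a W/L/D line can be turned into a score fraction (a draw counts as half a point) and then into the rating gap implied by the standard logistic Elo model. This is a minimal sketch assuming that model; the function names are illustrative, and a single number like this says nothing about error bars.

```python
import math


def score_fraction(wins: int, losses: int, draws: int) -> float:
    """Points per game, counting a draw as half a point."""
    games = wins + losses + draws
    if games == 0:
        raise ValueError("no games played")
    return (wins + 0.5 * draws) / games


def elo_difference(score: float) -> float:
    """Rating gap implied by an expected score under the logistic Elo model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


# Results quoted in this thread, from Fat Fritz's point of view: (wins, losses, draws).
results = {
    "CCRL vs Lc0": (3, 6, 72),
    "CCRL vs Stockfish-Dev": (14, 3, 64),
    "CCC vs Lc0": (4, 14, 84),
    "CCC vs Stockfish": (6, 16, 80),
}

for label, (w, l, d) in results.items():
    s = score_fraction(w, l, d)
    print(f"{label}: score {s:.3f}, Elo difference {elo_difference(s):+.0f}")
```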

What could be the explanation for this?

Stefan Pohl

Nov 27, 2019, 10:43:17 AM
to LCZero
On CCRL, they use a super-fast RTX 2080 for Lc0 and Fat Fritz and don't care about the Leela ratio, so their results are just rubbish.
The results on CCC are much more realistic.
Look at my rating list (Leela ratio 1.3) for a valid rating of Fat Fritz:

Pawel SalsaDura

Nov 27, 2019, 5:32:31 PM
to LCZero
And who cares about your Leela ratio anymore? If next year they release even more powerful FP16 graphics cards, should we ignore their performance because they are too powerful? That is pure nonsense! In this race, graphics cards are starting to dominate processors, and we should accept this as it is and not look for EXCUSES!!

Dietrich Kappe

Nov 27, 2019, 6:01:55 PM
to LCZero
When I formulated the Leela ratio, it was to help make test results comparable, even if only in a rough and approximate way. Otherwise you had match results on a 1060 vs 4 CPUs and on a 1080 vs 1 CPU that were inexplicably different. I pegged the 1.0 point at an 875 multiple based on reports of the A0 vs SF8 match. That was not to say that 1.0 was the “right” ratio, just that we were in the same ballpark as A0, which we hoped to equal and surpass.
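The 875 multiple matches the AlphaZero preprint, which reported AlphaZero searching about 80 thousand positions per second against about 70 million for Stockfish 8 (70,000,000 / 80,000 = 875). This minimal sketch assumes the ratio is Lc0's nodes per second divided by Stockfish's, scaled by 875, so that A0-like conditions come out at 1.0. The function name is hypothetical, and the multiplier is a parameter because the 1.0 point is a convention.

```python
# DeepMind's AlphaZero preprint reported ~80 thousand positions/s for
# AlphaZero against ~70 million for Stockfish 8: 70e6 / 80e3 = 875.
A0_MULTIPLE = 70_000_000 / 80_000


def leela_ratio(lc0_nps: float, sf_nps: float, multiple: float = A0_MULTIPLE) -> float:
    """Assumed form of the ratio: (Lc0 nps / Stockfish nps) * multiple.

    1.0 reproduces the AlphaZero vs Stockfish 8 speed relationship; values
    above 1.0 mean the GPU side is relatively faster than in that match.
    """
    if lc0_nps <= 0 or sf_nps <= 0:
        raise ValueError("node rates must be positive")
    # Multiply first so the A0 reference case comes out as exactly 1.0.
    return lc0_nps * multiple / sf_nps


print(leela_ratio(80_000, 70_000_000))  # A0 vs SF8 reference point; prints 1.0
```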

There are many flaws with the leela ratio, but we do need some way of comparing tests on unequal platforms like CPU and GPU. I welcome anyone who has a better idea.

Jesse Jordache

Nov 27, 2019, 6:13:43 PM
to LCZero
Even if hardware capabilities start driving things - which hasn't happened yet - think of the Leela ratio as saying "pound for pound".

Also, tensor processors may be improving faster than CPUs, but so far Lc0 hasn't been able to get around processing bottlenecks. It doesn't scale up well, so if you get rid of the Leela ratio, they'll bring out Clusterfish (yes, it exists).

NuclearPawn

Nov 27, 2019, 7:49:28 PM
to LCZero
What kind of nonsense are you talking about? You're basically saying to an ant, "Hey, don't make excuses for getting wrecked by a bee." True strength is proven on equal ground. CCRL might as well have SF on 512 cores vs Leela on a GTX 960. By your logic, this would be fair. No excuses.


On Wednesday, November 27, 2019 at 5:32:31 PM UTC-5, Pawel SalsaDura wrote:

Pawel SalsaDura

Nov 27, 2019, 8:01:57 PM
to LCZero
I'm talking about the NuclearPawn kind of nonsense! We don't have to use the Leela ratio at all; instead we can use a price-to-price comparison, which will be fair enough for me! Just take a 700 USD processor and set it up against a 700 USD graphics card; that is all we should care about! The Leela ratio is nonsense, and it's gonna get worse as time goes by and new cards come to the market!

NuclearPawn

Nov 27, 2019, 8:05:39 PM
to LCZero
That's a BS ratio. I don't know if you're serious or trolling.

Pawel SalsaDura

Nov 27, 2019, 8:26:07 PM
to LCZero
The one who's trolling is you! Price comparison is the simplest way of comparing the potential of new vs old technology! And the technology that is cheaper and better wins! You don't have to know all the technicalities behind it; it is just cheaper and better, that is all!

Markus Domanski

Nov 28, 2019, 3:01:02 AM
to LCZero
Good marketing?


On Wednesday, November 27, 2019 at 14:36:23 UTC+1, Sam Jukes wrote:

Hoang Hiep Vu

Nov 28, 2019, 3:09:36 AM
to LCZero
Price can change, and it is not *physically* inherent. Leela uses the CPU too, while SF does not need a GPU. If you want to compare on price, don't ignore other things like the CPU and memory prices; compare the whole system. With a $1000 budget, an SF machine with a 3950X can give a Leela machine with an RTX 2080 (and a crappy CPU) a hard time. Now there is also the Threadripper 3970X (32 cores, 64 threads), which SF will like very much.

On the other hand, let's look at something more "physical": power consumption. Within the same power envelope for the whole system, I'm sure SF would win against Leela at the moment.

Shah

Nov 28, 2019, 8:13:57 AM
to LCZero
The physical power consumption idea is nice.
Another important question that might allow such a comparison is: who *scales* better?
That is: on the same HW, increase the TC in steps.
If, from a certain TC onward, one engine gets increasingly better, it is the better engine,
even if it loses at short TC.
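One way to turn this into a number, sketched under the assumption that each TC step doubles the previous one: play engine A against engine B on the same hardware at several time controls and fit a least-squares line of A's score against log2(TC). The function below is hypothetical and only computes the trend. With the error bars of small matches, a slope near zero should not be over-read.

```python
import math


def scaling_slope(results: list[tuple[float, float]]) -> float:
    """Least-squares slope of engine A's score against log2(time control).

    `results` holds (time_control_seconds, score_of_A) pairs, where the
    score is A's points per game (0.0 to 1.0) at that time control, all
    played on the same hardware. A positive slope means A gains score with
    each doubling of the time control; a negative slope means B does.
    """
    if len(results) < 2:
        raise ValueError("need results from at least two time controls")
    xs = [math.log2(tc) for tc, _ in results]
    ys = [score for _, score in results]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("time controls must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```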

Pawel SalsaDura

Nov 28, 2019, 10:00:48 AM
to LCZero
The reference point for such a comparison would be the RTX 2060, which costs 350 USD and has a TDP of 160 W. Now, please find a processor in that price range that could compete with this card! You can't! Leela on this card would outperform all AB engines running on similarly priced processors, so why, as an owner of this card, would I be interested in the Leela ratio? Does it make any sense? Even a more expensive processor would have trouble beating the 2060, and this is what really matters! Also, TDP can be taken into consideration together with price range, so it doesn't have to be just price. We can combine these two factors. As a matter of fact, what you guys are trying to do with the Leela ratio is break down the computational power of processors and graphics cards into factors and then directly compare those factors to one another, but again, what is the point of that? In the case of graphics cards we get more processing power for less, so this technology is more efficient, and it wins! No need for any ratios, just take it as it is!!

Jonathan Rosenthal

Nov 28, 2019, 10:14:25 AM
to LCZero
It's actually a very complicated matter. An RTX 2080 is absurdly overpowered for CCRL when you consider that the rating list is quite old and normalizes to hardware from 2005.

I'm not even sure how they deal with the hardware asymmetry, as the normalization is done by reducing the time control on faster hardware. If they don't even reduce the time control for a GPU-based engine, then the comparison is even more lopsided. To make matters worse, an engine not using the GPU at all would perform better on the GPU system simply because its time is not reduced to the 2005 standard.

On the flip side, if the GPU-based engine has its time scaled based on CPU speed, its performance will end up better on a system with a slower CPU...
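The asymmetry can be sketched with a hypothetical scaling rule. It assumes the list shortens the time control in proportion to a CPU benchmark score relative to the reference machine (the exact CCRL procedure isn't given in this thread, and the speed units here are arbitrary). Because only the CPU is measured, a GPU-bound engine on the same graphics card gets more thinking time, and so searches more nodes, on the box with the slower CPU.

```python
def scaled_time_control(base_seconds: float, reference_cpu_speed: float, cpu_speed: float) -> float:
    """Hypothetical rule: faster CPUs get proportionally less time than the reference machine."""
    if reference_cpu_speed <= 0 or cpu_speed <= 0:
        raise ValueError("benchmark speeds must be positive")
    return base_seconds * reference_cpu_speed / cpu_speed


def gpu_bound_nodes(gpu_nps: float, seconds: float) -> float:
    """Nodes searched by an engine whose speed is set by the GPU, not the CPU."""
    return gpu_nps * seconds


BASE_SECONDS = 60.0  # illustrative base time control
REFERENCE_CPU = 1.0  # reference-machine benchmark, arbitrary units
GPU_NPS = 1.0        # identical GPU in both machines, arbitrary units

# Same graphics card, two different CPUs: the slower CPU earns more time.
for cpu_speed in (2.0, 4.0):
    tc = scaled_time_control(BASE_SECONDS, REFERENCE_CPU, cpu_speed)
    print(f"CPU speed {cpu_speed}: {tc:.0f} s, GPU-bound nodes {gpu_bound_nodes(GPU_NPS, tc):.0f}")
```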

I feel for projects like CCRL, which have the unenviable task of somehow dealing with the heterogeneous, non-comparable hardware of their various testers now that engine needs are so asymmetric.

DBg

Nov 28, 2019, 1:08:56 PM
to LCZero
Sorry if I sound rude, but have you heard of control groups? In any science, when trying to establish results, you need to control variables so that the results stay comparable. My understanding is that Lc0 is not a commercial enterprise racing to field the best gladiator at all times, but works in a reasonably scientific manner, to learn what works and what does not. I may be biased by my interests, but I do think that the above statement is right....


On Wednesday, November 27, 2019 at 5:32:31 PM UTC-5, Pawel SalsaDura wrote:

Dietrich Kappe

Nov 28, 2019, 1:56:09 PM
to LCZero
The Leela ratio is simply a rough benchmark that allows you to compare the relative power of GPU vs CPU across different systems. For consistency, it should be measured with a net like 11248 or 32930 against SF9 on however many cores you are planning to use. It’s a compute benchmark, not a chess benchmark.

There is no “right” ratio. The 875 multiple was simply picked to focus matches and tests on A0's playing conditions, as at the time we were trying to play as well as A0. As GPUs become more powerful per $, the ratios possible on commodity systems will creep up, a problem you don’t have when AB engines are playing on the same CPU. It will get even more confusing when CPUs start to incorporate GPU-like capabilities.

So relax. The Leela ratio is just a number, not your enemy.

Sam Jukes

Nov 28, 2019, 3:17:53 PM
to LCZero


I just got an email from ChessBase advertising that Fat Fritz is now the #1 engine in the world (based on CCRL). Very disingenuous.





Francesco Tommaso

Nov 28, 2019, 8:56:25 PM
to LCZero
This is very sad

glbchess64

Nov 29, 2019, 1:26:31 AM
to LCZero
Fat Fritz is not first on CCRL: you, and ChessBase too (for commercial reasons, I suppose), left out a very important piece of information. There are error bars, and they don't allow us to really know the order of the first 3 engines in the list.

The list just says that FF is strong and close to the level of Leela and SF 10. This is not a discovery!

This CCRL list is very long and the error bars are large. Obviously some engines are misplaced, maybe even the first one.
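How wide those error bars are can be estimated from the game counts alone. This minimal sketch uses a normal approximation to the mean score (each game scores 1, ½ or 0) and converts the interval ends to Elo with the logistic model. The rating tools behind the lists may compute their intervals differently, so treat this only as a rough check. Applied to the 81-game CCRL mini-match between Fat Fritz and Lc0 (+3−6=72) quoted at the start of the thread, the approximate 95% interval runs from a negative to a positive Elo difference.

```python
import math


def elo_from_score(score: float) -> float:
    """Logistic Elo model: rating gap implied by an expected score."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_interval(wins: int, losses: int, draws: int, z: float = 1.96) -> tuple[float, float, float]:
    """(low, estimate, high) Elo difference using a normal approximation.

    z = 1.96 gives an approximate 95% interval.
    """
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    if not 0.0 < score < 1.0:
        raise ValueError("interval undefined for a 0% or 100% score")
    # Per-game variance of the result around the mean score.
    variance = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / n
    margin = z * math.sqrt(variance / n)
    # Keep the interval ends strictly inside (0, 1) so the Elo transform is defined.
    eps = 1e-9
    low = max(score - margin, eps)
    high = min(score + margin, 1.0 - eps)
    return elo_from_score(low), elo_from_score(score), elo_from_score(high)


low, est, high = elo_interval(3, 6, 72)  # Fat Fritz vs Lc0 on CCRL, as quoted
print(f"{est:+.0f} Elo, 95% interval {low:+.0f} to {high:+.0f}")
```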

Jesse Jordache

Nov 29, 2019, 11:40:13 AM
to LCZero
The Leela ratio is the most widely used/understood number for comparing chess apples and chess oranges at this point.  Maybe later it will be supplanted by something less ad hoc, but I'm comfortable with it.

Oh, also, the Lc0 used in that chart was part of the 49000 series, which is not the strongest of the T40s.

smashu2

Nov 29, 2019, 3:58:23 PM
to LCZero
The Leela ratio is completely arbitrary. It was determined by Google: at that ratio, and at the 1 minute per move they used, AlphaZero was beating SF8 convincingly. There is nothing scientific or special about that number other than that it is the number DeepMind chose.

Dietrich Kappe

Nov 29, 2019, 6:26:35 PM
to LCZero
You understand that the Leela ratio isn’t a number but a formula? Sure, picking a particular value, like 1.0, and saying that that is the “right” value is arbitrary. If you like, you can use a different multiplier and call it something else, like “the curmudgeon ratio.” :-)

Francesco Tommaso

Nov 29, 2019, 9:26:08 PM
to LCZero
I think what he means is that the ratio selected by Google (DeepMind) in their experiment, later named the Leela ratio by you, is arbitrary. And it is. However, it's an important ratio, since this whole project began as an attempt to replicate DeepMind's work.

Pawel SalsaDura

Nov 29, 2019, 11:08:08 PM
to LCZero
The problem is not with the Leela ratio itself but rather with how some Leela and AB engine fans overestimate it. They just can't grasp that this formula is only a conventional value, a rough value, and as such it can be changed, and with it all previous tests and arrangements will prove worthless. It is not a panacea for computing imbalance, because this is not only about computing. It is more about a new approach and development than computing. Nobody will convince me that to compare Leela's performance on an RTX card to SF we have to use a 90-core processor because otherwise the result will be distorted. This is the biggest BS that Leela ratio fans are trying to tell us. We can even imagine that with enough training, T60 on an RTX card will become unbeatable for modern processors; what then? Will the ratio still be such an excuse?

molinac...@yahoo.com

Nov 30, 2019, 4:07:45 AM
to LCZero
I also support testing that brings cost (TCO) into the spotlight, abandoning the Leela ratio. This won't give you the reliability of a formula, because prices fluctuate over time and depend on where you live, but at any given time and place, some basics can be established.

Some components are needed to run either an AB engine or an NN. In both cases you need a monitor, keyboard, mouse, case, PSU, HDD and RAM. These components are of no consequence when comparing. The motherboard, on the other hand, deserves extra attention.

For AB engines, you want a strong processor, which means a good MOBO with a quality VRM. If you plan on buying with an NN in mind, you can save money on this front, as a low-power processor will do the job just fine and the graphics card's VRM will take care of cleaning up the current delivered to the GPU.

Finally, of course, you have to look at CPU and GPU prices, but they aren't mutually exclusive. As mentioned, the NN will also use CPU cores (although not as many as an AB engine needs), and conversely, the AB engine will also need a GPU, even if just for 2D output. The real comparison should be done like this:

(Cheap CPU + cheap MOBO + expensive GPU) vs (expensive CPU + expensive MOBO + cheap GPU)
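This minimal sketch of that comparison assumes the shared parts (monitor, keyboard, mouse, case, PSU, HDD, RAM) cancel out, so only CPU, motherboard and GPU prices are entered. The class and function names are hypothetical, and the prices are whatever you pay locally at the time.

```python
from dataclasses import dataclass


@dataclass
class Build:
    """Only the parts whose price differs between an NN box and an AB box."""
    cpu: float
    motherboard: float
    gpu: float

    def differing_cost(self) -> float:
        # Monitor, keyboard, mouse, case, PSU, HDD and RAM are needed by
        # both builds, so they are left out of the comparison.
        return self.cpu + self.motherboard + self.gpu


def nn_minus_ab_cost(nn_build: Build, ab_build: Build) -> float:
    """Positive: the NN build (cheap CPU/MOBO, expensive GPU) costs more.

    Negative: the AB build (expensive CPU/MOBO, cheap GPU) costs more.
    """
    return nn_build.differing_cost() - ab_build.differing_cost()
```

Fill both `Build`s with local prices at the time of testing; builds whose difference is close to zero are the ones this post would pit against each other.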

Lothar Jung

Nov 30, 2019, 5:22:11 AM
to LCZero

Lothar Jung

Nov 30, 2019, 5:46:53 AM
to LCZero

The Leela ratio is a linear function, based on the report of A0 vs SF8.
Tests on Discord indicated that it does not work on weak hardware and/or at short TC.
One can see this especially with T60 nets, where tests under TCEC conditions are very much in favour of T60's Elo strength.
I assume there should be a minimum TC and nodes/sec, especially for big nets.
Or perhaps a new ratio based on SF10 and an RTX 2080 for small, medium and big nets.

Lothar Jung

Nov 30, 2019, 6:28:40 AM
to LCZero

You can see what I mean in the current match on Twitch/Endosani.

Tryfon Gavriel

Nov 30, 2019, 6:52:32 AM
to LCZero
Hi Guys

I am checking one example game from the CCRL games link, which I may do a video about at some point.

I put it into a study here:


It seems as though Fat Fritz played with very high accuracy compared to Leela in this example game.

Very interesting.

Maybe others can check the decisive games vs Leela and Stockfish.

But this Leela network is a bit old now, isn't it? Leela network ID: 49921

Both apparently had the RTX 2080.

Cheers, K

Tryfon Gavriel

Nov 30, 2019, 7:16:55 AM
to LCZero
Addendum:


It seems that from the 9 decisive games, Leela won the 'virtual match' 6-3 vs Fat Fritz.


has Leela playing 1257 games, whilst Fat Fritz has played 753.

Now, going by the basic research methods I have done, it seems that a smaller sample size could lead to a higher rating. Surely, ideally, they need to play the same number of games against the same opponents in a tournament format?

However, if their virtual match is considered, 6-3 seems a pretty good winning score. I discounted all the draws and just filtered those 753 Fat Fritz games in ChessBase.
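Whether 6-3 in the decisive games is more than luck can be checked with a one-sided sign test: if the two engines were equally strong, each decisive game would be a coin flip. This minimal sketch assumes independent games and ignores draws, as in the filtered set described above.

```python
from math import comb


def sign_test_p_value(wins: int, decisive: int) -> float:
    """P(at least `wins` wins in `decisive` games) if each is a 50/50 coin flip."""
    if not 0 <= wins <= decisive:
        raise ValueError("wins must be between 0 and the number of decisive games")
    favourable = sum(comb(decisive, k) for k in range(wins, decisive + 1))
    return favourable / 2**decisive


print(f"{sign_test_p_value(6, 9):.3f}")  # prints 0.254
```

Between equally strong engines, a 6-3 or better result would turn up about a quarter of the time, so nine decisive games on their own are weak evidence either way.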

Leela basically still seems on top given the evidence at CCRL itself. Maybe I am missing some information though.

Cheers, K

glbchess64

Nov 30, 2019, 7:47:57 AM
to LCZero
T49 was not a good choice either, since it is roughly equal to T40.T8.610 (which was released earlier); I don't know why CCRL uses it. CCRL uses this net at very short TC, where it is likely that LD2 (smaller and faster) or T40B (same size but stronger) is better.

Jesse Jordache

Nov 30, 2019, 10:17:49 AM
to LCZero
Strictly speaking, a ratio is a number. :P

FormazChar

Nov 30, 2019, 7:46:46 PM
to LCZero
Which is important for a meaningful rating list. Metrics such as nodes/s, power and ops/s are key for comparing results across different hardware.