A network, in this case the 11248, is just a file containing many weights. You can't optimize the network itself without changing its nature; the magic lies in how these weights are found.
There is a theoretical optimal configuration of the weights that would give the best performance possible at this network size, and weight configurations are reached through training. The training happens with the network playing against itself, but with high variability in move choices (high temperature). This guarantees exploration and novelty, the finding of new ideas, which in turn produce new networks that incorporate those ideas. And so on.
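To make "high temperature" concrete, here is a minimal sketch of temperature-based move sampling from MCTS visit counts. This is an illustration of the general AlphaZero-style technique, not Lc0's actual code; the visit counts and temperature values are made up.

```python
import numpy as np

def sample_move(visit_counts, temperature):
    """Pick a move index from MCTS visit counts.

    temperature -> 0 : always pick the most-visited move (greedy play).
    temperature = 1 : sample proportionally to visit counts.
    temperature > 1 : flatter distribution, more exploration/novelty.
    """
    counts = np.asarray(visit_counts, dtype=float)
    if temperature == 0:
        return int(np.argmax(counts))
    probs = counts ** (1.0 / temperature)
    probs /= probs.sum()
    return int(np.random.choice(len(counts), p=probs))

# Hypothetical visit counts for four candidate moves:
visits = [800, 150, 40, 10]
greedy = sample_move(visits, temperature=0)  # always move 0
```

Higher temperature during self-play means weaker moves get tried more often, which is exactly what feeds new ideas back into training.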
The parameter tuning done lately is mainly related to two things:
1 - How exploration (which moves are considered) is done, given a fixed number of nodes searched: should the search be shallower but more diverse, or should it consider fewer moves and go deeper into those variations?
2 - How the things that were learned are applied to the new weights: for example, how important are the principles found in those specific cases (novelties)? Should they be given more or less weight than what the network already knows or already thought was optimal?
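Point 1 corresponds roughly to the exploration term of the PUCT formula used in AlphaZero-style search. A minimal sketch (the constant and values here are illustrative, not Lc0's actual defaults):

```python
import math

def puct_score(q, prior, parent_visits, child_visits, c_puct):
    """AlphaZero-style selection score: exploitation (q) plus an
    exploration bonus weighted by the policy prior.  A larger c_puct
    spreads the search over more moves (breadth); a smaller one
    commits deeper to the moves the network already likes (depth)."""
    u = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + u

# With few visits the exploration bonus dominates; as a move gets
# visited more, the bonus shrinks and q takes over.
early = puct_score(q=0.1, prior=0.3, parent_visits=100, child_visits=1, c_puct=2.0)
late = puct_score(q=0.1, prior=0.3, parent_visits=100, child_visits=50, c_puct=2.0)
```

Point 2 is, loosely, the territory of the learning-rate and training-window settings: how strongly the new self-play data pulls the weights away from what they already encode.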
My explanation is far from precise, but I hope it gives some clues about the internal workings of the training and helps with my argument.
DeepMind already made optimisations to these parameters. Of course, their focus is broader than ours, since they are not specifically concerned with chess but with board games in general, such as Go and Shogi. Given this, it is fair to conclude that we can find better parameters for chess than the originals.
The problem is that all these parameter-tuning tests are being done with big networks, and we only know the results months later. If we are to surpass DeepMind's tuning, we should do it in a more efficient way, as Mike pointed out: with smaller networks.
One parameter we did test, and it was the first one, was network size. The project started with a very small network and slowly increased its size. It was clear that while smaller networks were reaching plateaus, bigger networks surpassed them with all parameters other than network size held the same.
When we eventually reached the 20x256 size, the same as AlphaZero, we simply stopped. The best network we have so far, even with bugs, is the 11258. Which parameters were used? DeepMind's! The result: a network that IS AlphaZero. It plays almost identically, and the benchmarks show its Elo is very close to AlphaZero's. Many benchmarks under the same conditions as the paper led to the same result; Dietrich also showed some of these.
How do we surpass AlphaZero, then? According to the strategy currently in execution: by outsmarting the people at DeepMind, finding better parameters on the first or second try, since we can't afford to spend years testing parameters.
My humble suggestion: let's increase network size. Look at the difference between the 15x192 and the 20x256: with the same parameters you end up with a much stronger network. Why not increase the size a bit further and see?
So far no one has given a good argument against it. I would be very satisfied if someone came here and said something like: "Hey, Francesco, you are an idiot. It does not work because..." and gave a good, solid argument. I would be very relieved to know that we are on the best path, even if I am an ignorant fool.
The time-consumption argument, that bigger networks take more time to train, is flawed. We have wasted more than 100 million games on tests that so far look like complete failures (I hope I am wrong!).
With that amount of games we could have fully trained a 28x362 network: a 41% size increase in each dimension, which leads to a 1.41^2 increase in time, i.e. about twice the current training time, taking the 48 million games used for the 11258 as the baseline.
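The back-of-the-envelope arithmetic can be checked directly. Note this uses the post's own assumption that training cost scales with the product of the two dimensions; it is an estimate, not a measured figure.

```python
# Scaling estimate: grow both dimensions of the 20x256 network by 41%.
blocks, filters = 20, 256
scale = 1.41

new_blocks = round(blocks * scale)    # 28
new_filters = round(filters * scale)  # 361 (the post rounds to 362)

# Assumed cost model: time scales with the product of the two
# dimensions, so a 1.41x increase in each gives ~1.41^2 = ~2x time.
cost_factor = scale ** 2              # ~1.99

baseline_games = 48_000_000           # games used to train the 11258
needed_games = cost_factor * baseline_games  # ~95 million
wasted_games = 100_000_000            # games spent on the failed tests
```

So by this estimate, the games already spent would indeed have covered a full training run of the larger network.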
Regards,
Francesco