Hi guys,
Thanks for the responses.
I believe the best approach would be for you to coordinate and try to REPEAT the same test cases/parameters, each under your own test conditions/environment.
So imagine Jupiter wants to perform a certain test.
He should specify the parameters in the Test Plan sheet (NN ID to be tested, Time Controls, ...) and insert his results in the table next to it.
Then Blakely checks the test Jupiter performed and repeats it with the same test conditions, but under his own environment (CPU, GPU...).
Later on, whenever Blakely, for example, wants to test a new NN with some other parameters, Jupiter can again repeat that same test. That way we will be able to MEASURE and COMPARE how the NN is really evolving, since we will have different references producing results under the same specifications.
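To make this concrete, one row of the Test Plan sheet and the results next to it could be modelled roughly like this. This is only a sketch: the class names, field names and all the values are made up for illustration and are not the actual sheet layout.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PlannedTest:
    """One test case from the plan (hypothetical fields)."""
    nn_id: str          # NN ID to be tested
    time_control: str   # time control, written however the plan writes it
    games: int          # fixed number of games per run


@dataclass
class Result:
    """One tester's run of a planned test in their own environment."""
    test: PlannedTest
    tester: str
    cpu: str
    gpu: str
    wins: int
    draws: int
    losses: int


def group_by_test(results):
    """Collect every tester's result under the planned test it repeats."""
    grouped = defaultdict(list)
    for r in results:
        grouped[r.test].append(r)
    return dict(grouped)


# Placeholder values only, not real measurements.
case = PlannedTest(nn_id="NN-placeholder", time_control="tc-placeholder", games=50)
runs = [
    Result(case, "Jupiter", "cpu-A", "gpu-A", wins=0, draws=50, losses=0),
    Result(case, "Blakely", "cpu-B", "gpu-B", wins=0, draws=50, losses=0),
]
by_test = group_by_test(runs)
```

Because the planned test is the grouping key, two runs only end up side by side if they really used the same NN ID, time control and number of games.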
This way we will avoid the SMOKE tests (not very meaningful tests) that many people are running. They hardly provide any statistical evidence of where we are: they do not have enough samples, and their results cannot be compared against other people's environments. We need test runs with the same number of samples (I set a fixed number of 50 games).
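On the sample-size point, here is a rough sketch of how a match result (wins/draws/losses) can be turned into an Elo difference with an approximate 95% interval. It assumes the usual logistic Elo formula and a normal approximation for the score; the function name is mine and this is not part of the Test Plan sheet.

```python
import math


def elo_with_interval(wins, draws, losses, z=1.96):
    """Return (elo, low, high): Elo difference and an approximate interval.

    Assumes the logistic Elo model and a normal approximation of the
    mean score, so treat the interval as a rough guide only.
    """
    n = wins + draws + losses
    if n == 0:
        raise ValueError("no games played")
    score = (wins + 0.5 * draws) / n
    if score <= 0.0 or score >= 1.0:
        raise ValueError("Elo is undefined for a 0% or 100% score")

    # Per-game variance of the score (win = 1, draw = 0.5, loss = 0).
    variance = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / n
    stderr = math.sqrt(variance / n)

    def to_elo(s):
        return -400.0 * math.log10(1.0 / s - 1.0)

    # Clamp the bounds so the logarithm stays defined.
    low = max(score - z * stderr, 1e-6)
    high = min(score + z * stderr, 1.0 - 1e-6)
    return to_elo(score), to_elo(low), to_elo(high)
```

The interval narrows as more games are added, so runs with the same number of games give intervals of comparable width, which makes results from different testers easier to compare.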
I hope you all understand that it is important not only to collect all the data in the same document (so that it is easier to find and validate), but also to actually have a TEST PLAN, so that people willing to contribute with testing can select a test case that you designed, and others can repeat it and compare the results against their own hardware.
I know you are all busy and are doing this just to help the project, but if we want to provide some more meaningful data, it would be great if you could make the small effort to check what your fellow tester buddies have done, and simply try to repeat it.
Thanks for the effort!