Publishing sources should probably be the last requirement

Григорий Назаров

Apr 27, 2024, 10:08:13 AM
to Hutter Prize
Hello!

The problem is that submission validation is not an easily predictable process. For example, the compressor might run much slower on the test machine because some loop wasn't vectorized for that particular architecture. Even the same highly optimized binary can easily show a 50% speed difference on different processors. Some undefined behavior (UB) can cause a crash on a particular glibc version. The Geekbench result provided doesn't match the test machine: it lists only 4 GB of RAM. All in all, it's impossible to be 100% sure that everything will go just fine, or even nearly fine.

Why is this important? Because it's a contest. There is no single correct solution; it specifically requires pushing for less-than-one-percent improvements, and even a 10% difference in performance can be fatal.

There's a possible solution: "we report the problem, you fix it". But it doesn't cover everything. Finding UB one cannot reproduce is a very interesting task; highly optimizing code for a machine one doesn't possess is very interesting as well. This might easily take weeks, or require significant rewrites.

Meanwhile, the source code is already published, with documented sources and even an algorithm description, under a license that explicitly allows any legal use. What happens if, in a week or two, some other contestant publishes another solution based on the same idea while the 'fixing' process is still underway? What if the rewriting takes a month or two?

It doesn't seem fair to forbid using the same idea, but this leaves the original publisher extremely vulnerable. My proposed solution: require publishing the sources and the idea only once all other criteria are met and validated; the 30-day period starts then. Better yet, require properly licensed sources with the submission, but without publishing them immediately. This makes the 'well-documented' clause verifiable, and OSI-approved licenses explicitly allow publishing the sources later without the author's involvement; that is, after all, what these licenses were designed for.

Sorry for the long text,
Thanks for your attention,
Grigorii

James Bowery

Apr 27, 2024, 11:59:01 AM
to Hutter Prize
TLDR: We already do not require publication of source code prior to the public comment period.  We should probably emphasize this fact in the formal rules.

First of all, there is no need to apologize for your "long text": it is important for us to understand the concerns of prospective contestants who are serious enough to understand the nuances of the competition, as you clearly are. Although your questions are partially answered in the rules and FAQ on the Hutter Prize website, feedback such as yours may nevertheless help us refine that documentation. The very thing that makes winning an indicator of a contestant's value virtually guarantees that the contestant will have invested a great deal of effort, and that effort is due respect.

More below...

On Saturday, April 27, 2024 at 9:08:13 AM UTC-5 whitec...@gmail.com wrote:
> Hello!
>
> The problem is that submission validation is not an easily predictable process. For example, the compressor might run much slower on the test machine because some loop wasn't vectorized for that particular architecture. Even the same highly optimized binary can easily show a 50% speed difference on different processors.

The "spirit" of the Hutter Prize's hardware resource constraints is to avoid "The Hardware Lottery" described by Sara Hooker. By avoiding that "lottery", we minimize bias in scientific research directions while also encouraging broad participation. General-purpose CPUs are the least biased and most widely available implementation of the UTM fiction. Your nuanced concerns about vectorization, etc., have thus far been handled by the judges bending over backwards to accommodate contestants; ask prior winners if you need assurance. It may be that we'll eventually need to semi-automate the judging process, which would require contestants to rent time on a cloud instance for their pre-submission testing.
 
> Some undefined behavior (UB) can cause a crash on a particular glibc version.

Worse, all recent submissions have crashed on the Ryzen architecture. This has not blocked awards to contestants, although it has been as much an inconvenience to the judges as to the contestants. That inconvenience has never come to anything remotely like 5 weeks of full-time work by either party, and is quite unlikely to; it has been more like a day or two to resolve.

> The Geekbench result provided doesn't match the test machine: it lists only 4 GB of RAM.

The 4 GB of RAM is irrelevant. That is a very old Geekbench result, from before the 10x increase in the Hutter Prize payout and the corresponding increase in permitted resources, including RAM to 10 GB.

> All in all, it's impossible to be 100% sure that everything will go just fine, or even nearly fine.

Indeed, for the judging process, things have not gone "just fine, or even nearly fine", but, again, this has inconvenienced the judges more than the contestants.

More relevant, if not urgent, is the need for the rules to upgrade from Geekbench 5 to Geekbench 6, as well as to expand the set of machines that may be considered a "test machine", including cloud machines.

> Why is this important? Because it's a contest. There is no single correct solution; it specifically requires pushing for less-than-one-percent improvements, and even a 10% difference in performance can be fatal.

In practice, none of the issues you have raised has impacted actual awards. If it turns out, for example, that using AVX-512 instructions restricts the population of contestants to those who possess such CPUs, that is something we will likely have to live with. Thus far this particular edge case has not been decisive.

> There's a possible solution: "we report the problem, you fix it". But it doesn't cover everything. Finding UB one cannot reproduce is a very interesting task; highly optimizing code for a machine one doesn't possess is very interesting as well. This might easily take weeks, or require significant rewrites.

The case you describe pertains, as you say, to a "particular glibc version". If a particular glibc version is required for a winning submission to work, judging will be flexible. We require that the installation instructions include such requirements for the same reason we require the results to be reproducible by parties independent of the contestant.

> Meanwhile, the source code is already published, with documented sources and even an algorithm description.

That happens only once the judges have determined that the submission has passed muster with them. The public comment period is not intended to be decisive, but rather to surface issues the judges may have missed.
 
> With a license that explicitly allows any legal use. What happens if, in a week or two, some other contestant publishes another solution based on the same idea while the 'fixing' process is still underway? What if the rewriting takes a month or two?

Even in the unlikely scenario where a substantial rewrite is required due to feedback during the comment period (which comes only after the judges have satisfied themselves that an award is due, and which is the only time publication is necessary), any entry someone else submits will have to start at the closed-source stage of the judging process, during which the judges will demand full documentation of the improvements. If that stage overlaps with the original submission's rewrite period, and the new submission already embodies the rewrite, it will be clear that the new submission is little more than assistance to the original contestant, and it will also demonstrate that the rewrite is unlikely to cost the original contestant much labor.

In any event, the judges will be extremely biased toward the original contestant, even if the rewrite takes a long time. If, on the other hand, a malicious contestant makes a fraudulent submission that the judges fail to catch because its documentation lies, when the only real "improvement" was fixing the issue that arose in the aforementioned highly unlikely situation, that will be apparent to the original contestant if to no one else, and the judging bias will remain with the original contestant.

> It doesn't seem fair to forbid using the same idea, but this leaves the original publisher extremely vulnerable. My proposed solution: require publishing the sources and the idea only once all other criteria are met and validated.
>
> The 30-day period starts then. Better yet, require properly licensed sources with the submission, but without publishing them immediately. This makes the 'well-documented' clause verifiable, and OSI-approved licenses explicitly allow publishing the sources later without the author's involvement; that is, after all, what these licenses were designed for.

See the TLDR above.
 
> Sorry for the long text,
> Thanks for your attention,
> Grigorii

Thanks for the feedback, Grigorii! 