"...identify relevant sub-networks during training and promote the prunability of the network. However, these approaches do typically not lead to training speedups..."
Sara Hooker's influential paper "The Hardware Lottery" is about exactly this kind of bias in research into the nature of learning (research, as distinct from development, a.k.a. engineering, a.k.a. technology). In the current instance, it pays to notice three things:
- The very term "GPU" exemplifies the lottery's "randomness": the "G" stands for "Graphics", meaning the hardware was repurposed from gaming graphics to machine learning. We were lucky that gaming is a huge market, capable of capitalizing companies like NVIDIA.
- Sparsity means fewer "parameters" (i.e., algorithmic bits) to model the data under consideration (the "corpus", in the case of language models), hence the relevance to Solomonoff Induction.
- "Prunability" as a route to sparse models is very much in line with what we observe in the neocortex: the infant brain starts out with more connections than the adult brain, hence the relevance to the natural science of cognition.
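The pruning idea behind the bullets above can be illustrated with simple magnitude pruning: drop the smallest-magnitude weights and keep the rest. This is a minimal hypothetical sketch, not the method from the quoted paper:

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the fraction `sparsity` of weights with smallest magnitude."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)  # number of weights to remove
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))          # a dense 64-parameter "layer"
pw = magnitude_prune(w, 0.75)        # prune 75% of the parameters
print(np.count_nonzero(pw))          # 16 of 64 weights survive
```

The surviving 25% of parameters define a sparse sub-network; the Hardware Lottery point is that dense GPU kernels gain little speed from such sparsity even though the model itself has shrunk.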
By comparison, Matt Mahoney's Large Text Compression Benchmark, by removing the resource constraints, is about development: it leverages current infrastructure to enable economically valuable engineering advances, subject to those same economic biases.
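The premise shared by both benchmarks, that better compression implies better modeling of the data, can be illustrated with a crude off-the-shelf compressor. Here zlib stands in for the far stronger models these benchmarks actually rank; the texts are made-up examples:

```python
import random
import string
import zlib

def compressed_bits_per_char(text: str, level: int = 9) -> float:
    """Compressed size in bits per input character: lower = better modeled."""
    data = text.encode("utf-8")
    return 8 * len(zlib.compress(data, level)) / len(data)

# Structured text with learnable regularities vs. seeded pseudo-random noise
natural = "the quick brown fox jumps over the lazy dog " * 50
rng = random.Random(0)
noise = "".join(rng.choice(string.printable) for _ in range(len(natural)))

print(compressed_bits_per_char(natural))  # far below 8 bits/char
print(compressed_bits_per_char(noise))    # near-incompressible
```

A compressor that captures more of a corpus's structure emits fewer bits per character, which is why compressed size of a fixed corpus works as a benchmark for learning.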
Both of these approaches provide unique value to the machine-learning world. Those who bridle at the Hutter Prize's resource constraints should consider the quote above about sparsity being ignored because of the Hardware Lottery, and recognize that Matt's Large Text Compression Benchmark deserves a prize of its own, which they may wish to underwrite.