Getting Lossless Compression Adopted For Rigorous LLM Benchmarking

James Bowery

Jan 17, 2024, 5:17:02 PM

to Hutter Prize

The increasing recognition that "Language Modeling Is Compression" has not yet been accompanied by recognition that lossless compression is the most principled unsupervised loss function for world models in general, including foundation language models in particular.

Take, for instance, the unprincipled definition of "parameter count" found not only in the LLM scaling law literature, but also in the zoo of what statisticians call "Information Criteria for Model Selection". The reductio ad absurdum of "parameter count" is arithmetic coding, where an entire dataset can be encoded as a single "parameter" of arbitrary precision.
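To make the reductio concrete, here is a minimal sketch of packing a whole dataset into one arbitrary-precision number. It is a simplified stand-in for arithmetic coding: it assumes a uniform model over byte values, whereas a real arithmetic coder narrows the interval according to model probabilities. The function names are hypothetical.

```python
from fractions import Fraction

def encode_as_one_parameter(data: bytes) -> Fraction:
    """Map a byte string to a single exact rational in [0, 1)."""
    low, width = Fraction(0), Fraction(1)
    for b in data:
        width /= 256          # uniform model: each byte gets 1/256 of the interval
        low += b * width      # move to the sub-interval for this byte
    return low

def decode_from_one_parameter(param: Fraction, n: int) -> bytes:
    """Recover n bytes from the single 'parameter' (n is side information)."""
    out = bytearray()
    for _ in range(n):
        param *= 256
        b = int(param)        # integer part is the next byte
        out.append(b)
        param -= b
    return bytes(out)

dataset = b"Language Modeling Is Compression"
theta = encode_as_one_parameter(dataset)
restored = decode_from_one_parameter(theta, len(dataset))
```

The model has exactly one "parameter", theta, yet it stores the dataset verbatim, which is why counting parameters without counting their bits says little.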

By contrast, the algorithmic bit of information (whether part of an executable instruction or program literal) is an unambiguous quantity up to the choice of instruction set. If you want to quibble about that instruction set choice, take it up with John Tromp because what I'm about to propose obviates that along with a lot of other "arguments".

Since any executable archive of any kind of data can serve as a model of the world generating that data, it follows that any executable archive of any text corpus can serve as a language model with a rigorous "parameter count". Therefore, a procedure that runs LLM benchmarks against any such executable archive as a language model contributes a uniquely rigorous data point to the literature on LLM scaling laws.
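As a rough illustration of what a bit-level "parameter count" could look like, the sketch below sums the bits of a decompressor stub and its compressed payload. This is only an assumption-laden toy: it uses zlib, the stub is a hypothetical few lines of Python, and it ignores the size of the Python interpreter and the zlib library, which a rigorous count under a fixed instruction set would have to include.

```python
import zlib

def archive_bits(decompressor: bytes, payload: bytes) -> int:
    """Total size in bits of decompressor code plus compressed data."""
    return 8 * (len(decompressor) + len(payload))

# Toy corpus; any text corpus could stand in here.
corpus = b"the cat sat on the mat. " * 200
payload = zlib.compress(corpus, 9)

# Hypothetical decompressor stub, counted as part of the archive.
stub = b"import zlib\ndef restore(p):\n    return zlib.decompress(p)\n"

# Check that stub + payload really reproduce the corpus losslessly.
namespace = {}
exec(stub.decode(), namespace)
assert namespace["restore"](payload) == corpus

total_bits = archive_bits(stub, payload)
```

Here `total_bits` plays the role that "parameter count" plays on the x-axis of a scaling-law plot, but measured in bits rather than in parameters of unspecified precision.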

So, what I'm proposing is that authors of lossless compression algorithms consider adding a command-line option that, at the end of decompression, saves the state of the decompression process in a file that can be read back in and executed as a language model -- with the full understanding that these language models will perform very poorly on the vast majority of LLM benchmarks. The point is not to produce high quality language models. The point is to increase rigor in the research community by providing some initial data points that exemplify the approach.
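A minimal sketch of the idea, assuming a toy order-2 character context model in place of a real compressor's predictor. The statistics such a model holds at the end of decompression are written to a file, then read back and queried for next-character probabilities. All names here are hypothetical, and real compressors carry far richer state than this.

```python
import json
import os
import tempfile

ORDER = 2  # hypothetical context length for this toy model

def final_state(text: str, order: int = ORDER) -> dict:
    """Context counts as an adaptive model would hold them after the last symbol."""
    counts = {}
    for i, ch in enumerate(text):
        ctx = text[max(0, i - order):i]
        bucket = counts.setdefault(ctx, {})
        bucket[ch] = bucket.get(ch, 0) + 1
    return counts

def save_state(counts: dict, path: str) -> None:
    with open(path, "w", encoding="utf-8") as f:
        json.dump(counts, f)

def load_state(path: str) -> dict:
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def next_char_probs(counts: dict, context: str, order: int = ORDER) -> dict:
    """Use the saved state as a language model: P(next char | context)."""
    ctx = context[-order:] if order > 0 else ""
    seen = counts.get(ctx, {})
    total = sum(seen.values())
    return {ch: n / total for ch, n in seen.items()} if total else {}

# Demo: dump the state after "decompressing" a corpus, then reload and query it.
corpus = "abababab"
path = os.path.join(tempfile.mkdtemp(), "model_state.json")
save_state(final_state(corpus), path)
model = load_state(path)
probs = next_char_probs(model, "ab")
```

A benchmark harness would then only need the `next_char_probs`-style interface, so the same evaluation could in principle be run against any compressor that exposes its final state this way.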
