Proposal to add enwik9 to the "gym"

James Bowery

Aug 3, 2022, 2:53:20 PM
to Hutter Prize
I've proposed that enwik9 be turned into a game for the OpenAI "gym":

https://github.com/openai/gym/issues/3018

Here is the text of the proposal:

Proposal

Include the Hutter Prize corpus (enwik9) as a "game" for the purpose of sample-efficient reinforcement language modeling.

Motivation

Recently, EfficientZero demonstrated sample-efficient reinforcement learning in the gym. The current interest in large language models generally overlooks three deficiencies: model size, sample size, and planning. Ignoring model size is particularly egregious, as it implicitly ignores the strongest theorem in unsupervised model selection: Solomonoff Induction (i.e., approximating the Algorithmic Information of the sense data to generate optimal predictions). The sample sizes of the LLM corpora are intractable for all but the wealthiest institutions, and the lack of planning ability in LLMs implies a general inability to perform in dynamical environments.

Pitch

Consider a simple "game" consisting of one of two moves each time step -- 0 or 1 -- yielding positive or negative reinforcement according to whether the move correctly predicts the next bit in the enwik9 corpus. Obviously, the agent could simply contain the enwik9 corpus and score perfectly. However, this very fact points to the motivation for Solomonoff Induction as part of an AGI that must learn from its environment. Therefore, while the utility function for the agent would remain its hits minus misses, the SoTA measure would be the size of the agent itself -- the degree to which it approaches the Solomonoff Induction model (i.e., the Algorithmic Information content of the agent's observations). While there are bound to be a variety of ways of measuring this size, as well as a variety of ways of incorporating resource utilization (TPUs, CPUs, time, etc.) into any SoTA metric, the same is true of other SoTA measures in the gym.
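
To make the mechanics concrete, here is a minimal sketch of such an environment in Python, assuming the classic gym.Env API (reset returning an observation, step returning a 4-tuple) and a local copy of enwik9; the class name, file path, and the +1/-1 reward values are illustrative assumptions of mine, not part of any existing gym environment:

    import gym
    from gym import spaces


    class Enwik9BitPredictionEnv(gym.Env):
        """Guess the next bit of enwik9 each step; reward is +1 for a
        hit and -1 for a miss, so the cumulative return is the
        hits-minus-misses utility described above."""

        def __init__(self, corpus_path="enwik9"):
            # Loads the full 10^9-byte corpus into memory; a sketch,
            # not a memory-efficient implementation.
            with open(corpus_path, "rb") as f:
                self.data = f.read()
            self.num_bits = len(self.data) * 8
            self.action_space = spaces.Discrete(2)  # the guess: 0 or 1
            # Observation is just the cursor position (+1 for the
            # terminal position past the last bit).
            self.observation_space = spaces.Discrete(self.num_bits + 1)
            self.pos = 0

        def _bit_at(self, i):
            # Most-significant-bit-first within each byte (an
            # arbitrary convention chosen for this sketch).
            return (self.data[i // 8] >> (7 - i % 8)) & 1

        def reset(self):
            self.pos = 0
            return self.pos

        def step(self, action):
            actual = self._bit_at(self.pos)
            reward = 1 if action == actual else -1
            self.pos += 1
            done = self.pos >= self.num_bits
            # Reveal the true bit so the agent can condition on history.
            return self.pos, reward, done, {"actual_bit": actual}

The observation is deliberately just the cursor position; the true bit is revealed through the info dict after each guess, so an agent must build its own model of the history of revealed bits -- exactly the compression-style modeling the proposal is after.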

Such a reinforcement learning agent would define "scientist" in rigorously operational terms and, in approximating the actual information content of Wikipedia, provide epistemological insights into that critical resource.

Finally, as EfficientZero is a sample-efficient derivative of MuZero, and MuZero has inspired the bold hypothesis that "Reward Is Enough", it would be most interesting to see the various approaches to reward-driven science arising from the competitive environment of the gym.

Alternatives

The Hutter Prize for Lossless Compression of Human Knowledge and The Large Text Compression Benchmark were both constructed to advance epistemology through knowledge modeling. However, the field's lack of familiarity with Solomonoff's fundamental finding in AGI, combined with the profligacy of computational resources and data, has led to a situation in which SoTA performance metrics largely elide a foundational component of AGI.
