NiNOR Complexity

jabowery

Apr 15, 2024, 9:52:20 PM
to Algorithmic Information Theory
Is the following thesis an adequate counter to the philosophical nuisance of KC's open parameter of UTM choice in AIT? 

If not, what does it lack (accepting that it needs rigor involving such things as an integer parameter for the size of read/write-memory DCGs such as flip-flops, address decoders, etc.)?

If so, is there prior research that I missed?  

Thesis

The Algorithmic Information Criterion (AIC), based on Kolmogorov Complexity, is a valid and principled criterion for model selection in the natural sciences, despite objections regarding the choice of the underlying Turing Machine.

Supporting Arguments

  1. Universality of NiNOR Gates: A finite, directed cyclic graph (DCG) of N-input NOR (NiNOR) gates can, given a starting state, perform any computation that a Turing Machine with finite memory can execute. This universality suggests that the choice of a specific computational model can be principled, akin to choosing an axiomatic basis in mathematics (see the sketch following this list).
  2. Minimization of NiNOR Complexity: By creating an instruction set emulation program that simulates a directed cyclic graph of NiNOR gates (which in turn provides the instruction set), and another program written in that instruction set to output a given dataset, a parameter-free definition of NiNOR Complexity is established: the minimum combined length of these two programs. Note that since both programs are written in the same non-arbitrary instruction set, this factors out whatever arbitrary Universal Turing Machine might be chosen to emulate the instruction set emulator.
  3. Philosophical Consistency with Scientific Methods: By removing an "arbitrary" parameter from Kolmogorov Complexity's definition of Algorithmic Information, Solomonoff's proofs can be revisited without any parameter more subjective than the proof of NOR gate universality. All that must be given up is the notion of infinities. Moreover, this revised definition of an Algorithmic Information Criterion for model selection retains its relevance to the dynamical systems of the natural world, a decisive advantage over statistical information criteria.

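As a concrete illustration of points 1 and 2 (a rough sketch only, in Python; the netlist format, the SR-latch example, and all function names are my own illustrative assumptions rather than part of the thesis): a directed cyclic graph of NOR gates, updated synchronously, is a finite-state machine whose feedback edges are its memory. NiNOR Complexity would then be the minimum combined length of (a) a program that emulates such a gate graph realizing an instruction set and (b) a program in that instruction set that outputs the dataset.

```python
# Minimal sketch: synchronous simulation of a directed *cyclic* graph of
# N-input NOR gates.  The netlist encoding and example are illustrative only.

def nor(*bits):
    """N-input NOR: outputs 1 iff every input is 0."""
    return 0 if any(bits) else 1

def step(gates, state, external):
    """One synchronous update of every gate.

    gates:    per-gate list of input signals, each ('x', i) for external input i
              or ('g', j) for the previous output of gate j (feedback = cycles).
    state:    previous gate outputs (the machine's finite memory).
    external: current external input bits.
    """
    def value(src):
        kind, idx = src
        return external[idx] if kind == 'x' else state[idx]
    return [nor(*(value(s) for s in ins)) for ins in gates]

# Example: a cross-coupled NOR pair (an SR latch), the smallest cyclic NOR
# graph with memory.  External inputs are [S, R]; gate 0 is Q, gate 1 is Q-bar.
latch = [
    [('x', 1), ('g', 1)],   # gate 0 (Q)     = NOR(R, gate 1)
    [('x', 0), ('g', 0)],   # gate 1 (Q-bar) = NOR(S, gate 0)
]

state = [0, 0]
state = step(latch, state, external=[1, 0])   # S=1, R=0: set
state = step(latch, state, external=[0, 0])   # S=0, R=0: hold
print(state)   # [1, 0] -- the cycle remembers the 'set' with no external help
```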
Counterarguments

  • Claim of Arbitrary Turing Machine Choice: Critics argue that the choice of Turing Machine in determining Kolmogorov Complexity is arbitrary because one can tailor a machine's instruction set to trivially minimize the complexity of any given dataset.
  • Reductio ad Absurdum on Turing Machine Instruction Set: Critics might use a reductio ad absurdum approach by proposing a Turing Machine whose instruction set includes a command that outputs the entire dataset in a single instruction, thus falsely reducing the complexity to an absurdly low value.

Rebuttals

  1. Non-Arbitrariness in Computational Model Choice: The choice of a particular model and its instruction set reflects underlying computational principles (e.g., the universality of NiNOR gates) and is not more arbitrary than foundational decisions in other scientific and mathematical fields.
  2. Logical Flaw in Critics’ Argument: The critic’s approach to arbitrarily defining a Turing Machine’s instruction set to minimize complexity does not properly consider the complexity of the instruction set itself in which the dataset is encoded. By focusing on trivializing the output instruction, they overlook the broader implications of the instruction set’s design, which fundamentally contributes to the system's overall complexity. This misrepresents the principle of Kolmogorov Complexity, which aims to measure the minimal description length of the dataset in a way that genuinely reflects its informational content, rather than artificially minimizing it through tailored instruction sets.

Conclusion

The critique against the Algorithmic Information Criterion (AIC) using Kolmogorov Complexity, based on the supposedly arbitrary choice of Turing Machine, does not withstand scrutiny. Properly understood and applied, the AIC robustly captures the essential complexity of datasets, consistent with Solomonoff's proofs. This complexity includes the design of the instruction set itself, which must not be arbitrarily minimized to misrepresent the dataset's intrinsic informational content. Thus, the AIC remains a principled and effective method for model selection in the natural sciences. Indeed, prior criticisms based on the supposedly subjective choice of UTM are not only specious but harmful to the scientific enterprise.

Gabriel Leuenberger

Apr 16, 2024, 9:50:08 AM
to Algorithmic Information Theory
Yes, here is one more argument a critic could bring: the critic could agree with your reference machine but then still reject the criterion after all.
He would agree with the reference machine because its simplicity makes it free of any bias, as you explained. But instead of going along with your criterion, the critic would then rely on the universal prior m(x), which is arguably better than K(x). You could then counter him via Levin's coding theorem, K(x) = -log m(x) + O(1), which says that the two criteria would be almost the same. But then the critic could retort by pointing out that the additive +O(1) term could be too large, thus invalidating your criterion and invalidating Occam's razor for many practical purposes, where the difference between the complexities of the theories in question would be smaller than the upper bound on the additive term. And this is the reason why I started a different thread where we can fix the additive term and save Occam's razor. See here: https://groups.google.com/g/ait0/c/yyGViH6dzCw/m/eqm0b-GnAAAJ
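For readers skimming the thread, the two relations at issue can be written out compactly (standard statements; H_1, H_2 and the constant c are just illustrative names):

```latex
% Levin's coding theorem for prefix complexity K and the universal prior m:
\[
  K(x) = -\log_2 m(x) + O(1),
  \qquad\text{i.e.}\qquad
  \exists\, c \;\forall x:\; \bigl| K(x) + \log_2 m(x) \bigr| \le c .
\]
% The critic's worry: if two candidate theories H_1, H_2 for the same data satisfy
\[
  \bigl| K(H_1) - K(H_2) \bigr| < c ,
\]
% then the two criteria can rank them differently, so Occam's razor only bites
% when the complexity gap exceeds the machine-dependent constant c.
```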

Cole Wyeth

Apr 16, 2024, 5:33:31 PM
to Algorithmic Information Theory
Good afternoon,

What you are describing seems to be closely related to my (and Carl Sturtivant's) "Index Complexity," which is essentially circuit complexity. 

I've attached a pdf of the Physica D paper.

Best,
Cole Wyeth
Circuit_complexity_AIT.pdf

Gabriel Leuenberger

Apr 16, 2024, 5:53:54 PM
to Algorithmic Information Theory
This indeed seems to match Bowery's idea. In your paper you also write:
"In Section 5.2 we derive upper and lower bounds on the number of errors we should expect to make with this approximation. These results are loosely analogous to the coding theorem in algorithmic information theory"  
So, did you by any chance also manage to obtain a smaller additive constant than the one in Levin's original coding theorem of AIT?

Cole Wyeth

Apr 16, 2024, 6:43:51 PM
to Algorithmic Information Theory
The bound I derived was more of a frequentist result. It concerns the performance of a predictor based on the smallest circuit, so probably it is more similar to MDL than the coding theorem. I'm guessing my proof could be rephrased in terms of non-uniform PAC learnability. 

I did not actually relate lg mu(x) to I(x). This seems somewhat difficult to do because there are many circuits of each size, but it would be an interesting question to investigate. The direction lg mu(x) <= I(x) holds without any constant error when nu is 2^(-n) and probably holds more generally. I think the hard direction is lg mu(x) >= I(x) + O(1). For a first step, any string computed by a small circuit should also be computed by many larger circuits. The exact statistics are a matter of combinatorics on DAGs. If you would like to pursue results on this (with or without my help) I'd be happy to chat and excited to see what you find!
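To make the counting question concrete, here is a brute-force toy (Python; this is not the Index Complexity definition from the paper, and the gate basis and names are my own assumptions): the size of the smallest 2-input NOR circuit computing a given truth table, found by enumerating all circuits of each size.

```python
# Illustrative only: brute-force "size of the smallest NOR circuit" for a
# Boolean function given as a truth table.  Not the definition from the
# attached paper; just a toy for the "many circuits per size" question.
from itertools import product

def smallest_nor_circuit(truth, n_inputs, max_gates=6):
    """Least number of 2-input NOR gates whose *last* gate computes `truth`
    (a tuple of 2**n_inputs bits), or None if no circuit of <= max_gates works."""
    n_rows = 2 ** n_inputs
    # value each input wire takes on every row of the truth table
    wires = [tuple((row >> i) & 1 for row in range(n_rows)) for i in range(n_inputs)]
    for k in range(1, max_gates + 1):
        # each new gate picks two (possibly equal) earlier wires as its inputs
        choices = [list(product(range(n_inputs + g), repeat=2)) for g in range(k)]
        for picks in product(*choices):
            cols = list(wires)
            for a, b in picks:
                cols.append(tuple(int(not (x or y)) for x, y in zip(cols[a], cols[b])))
            if cols[-1] == truth:
                return k
    return None

# XOR of two inputs (rows ordered x1 x0 = 00, 01, 10, 11):
print(smallest_nor_circuit((0, 1, 1, 0), n_inputs=2))   # 5 -- the classic NOR-gate count for XOR
```

Already with two inputs there are (n_inputs + g)^2 wiring choices for the g-th gate, so the number of circuits of size k grows as a product of squares; that growth is exactly the kind of combinatorics-on-DAGs statistic mentioned above.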

Best,
Cole

James Bowery

Apr 20, 2024, 2:20:21 AM
to ai...@googlegroups.com
It's important to retain the generality of finite directed cyclic graphs, as I originally described. Although this is, in a non-trivial sense, also at a low level in the Chomsky Hierarchy, it is nevertheless true that all physically realizable computers are finite state.

I'm acutely aware of this and its implications.  But in the final analysis, there is a distinction between the semantics of the Turing complete codes we chose and the underlying physical realization of that Turing complete fiction.

Four years ago I took some heat in this very group about my rewrite of the MDL Principle article in Wikipedia. That rewrite was necessary because Cambridge's pandemic modeling contact told me that the Algorithmic Information Criterion was the same as the Bayesian Information Criterion! The phrase "The Minimum Description Length Principle" originated with Rissanen's Turing-incomplete "MDL" of the 1970s. We would all like to believe we can get by without taking such errors with deadly seriousness, and just "Trust The Science". But now that we are facing quite plausible existential risks imposed by the failure to distinguish between "IS" and "OUGHT" within AIXI's two components, AIT and SDT, it is no longer an ethical position to be lax about DAGs vs DCGs.

So, please, I beg of you: despite my resort to NiNOR DCGs as a quasi-Gödel numbering, do NOT set us back to the 1970s with Rissanen's attempt to render "AIT" computationally tractable.


David Dowe

Apr 20, 2024, 4:45:31 AM
to Algorithmic Information Theory
Hi James and all, and I trust that all are well.

Both re the use of the acronym AIC and re the stated history of MDL and related work (e.g.,
algorithmic information theory and the work of R J Solomonoff), I feel now might be a good time
to offer some comments.


First, re the term Algorithmic Information Criterion and the acronym AIC that I believe was used
for this in recent days in this thread, a couple of comments.  Akaike's Information Criterion is a
statistical approach that has been used in machine learning (e.g., neural networks) and has been
referred to as AIC for decades.
My readings of the writings of Ray Solomonoff (1926-2009) include mention of the term
"algorithmic probability", which Ray also abbreviated to ALP - indeed, I recall once discussing with
Ray that the (political party) Australian Labor Party uses this same acronym.
Given Ray's use of (his term) ALP (and I can give references, including to a survey I did of Ray's work)
and given that AIC has been used for decades to refer to Akaike's Information Criterion - given these
two precedents - I suggest taking care regarding acronyms, for the sake of clarity and/or disambiguation.


Next, while I can understand the mention of J Rissanen's minimum description length (MDL) work
from 1978 and subsequently, I note the seeming omission in this thread to date of the decade-earlier
minimum message length (MML) work of Chris Wallace (1933-2004) and colleagues, going back to
C S Wallace and D M Boulton (Computer J, 1968).
One place where one can see the relevant MML references prior to 1978 spelt out is p. 901 (the first page) of Dowe (2011). Here are some relevant works involving Chris Wallace and David Boulton from 1968 to 1975:
    Christopher S Wallace and David M Boulton. An information measure for classification. Computer Journal, 11(2):185–194, 1968.
    David M Boulton and Christopher S Wallace. The information content of a multistate distribution. Journal of Theoretical Biology, 23(2):269–278, 1969.
    David M Boulton and Christopher S Wallace. A program for numerical classification. The Computer Journal, 13(1):63–69, 1970.
    David M Boulton. Numerical classification based on an information measure. M.Sc. thesis, Basser Computing Dept., University of Sydney, Sydney, Australia, 1970.
    David M Boulton and Christopher S Wallace. A comparison between information measure classification. In Proc. of the Australian & New Zealand Association for the Advancement of Science (ANZAAS) Congress, August 1973. Abstract.
    David M Boulton and Christopher S Wallace. An information measure for hierarchic classification. Computer Journal, 16(3):254–261, 1973.
    David M Boulton and Christopher S Wallace. An information measure for single link classification. Computer Journal, 18(3):236–238, 1975.
    Christopher S Wallace and David M Boulton. An invariant Bayes method for point estimation. Classification Society Bulletin, 3(3):11–34, 1975.
    David M Boulton. The Information Measure Criterion for Intrinsic Classification. PhD thesis, Dept. Computer Science, Monash University, Clayton, Australia, August 1975.

As above, Dowe (2011, page 901) is one place where these MML references from
1968-1975, prior to 1978, are spelt out.  I can find at least one other such place if
anyone is interested.

J J Rissanen's MDL and G Schwarz's Bayesian Information Criterion (BIC) were both
published in 1978, both advocating the same approach back then in 1978.  (Akaike's AIC
pre-dates these, and MML pre-dates all of these.)  In the time since 1978, MDL has evolved.

Whatever one's views on MML (going back to 1968) and MDL (going back to 1978), given the
discussion in this thread re Solomonoff-Kolmogorov complexity and algorithmic information
theory (AIT), I mention https://doi.org/10.1093/comjnl/42.4.270 (Wallace and Dowe, Computer J,
1999a, "Minimum Message Length and Kolmogorov complexity").  That issue of the Computer J
- namely, Volume 42, Issue 4, 1999 https://academic.oup.com/comjnl/issue/42/4 - also includes
contributions by R J Solomonoff, J J Rissanen, (former PhD student of A N Kolmogorov) L A Levin,
and other relevant researchers.


P.S.: Partly a little off topic, this https://academic.oup.com/comjnl/issue/51/5 (2008) surveys
C S Wallace's work - not just MML but also (e.g.) random number generation, fast multiplication, etc.
The notion of universality probability - later dissected in http://dx.doi.org/10.1098/rsta.2011.0319 
(G Barmpalias and D L Dowe, 2012) - is also due to Chris Wallace.


Keep well,  thank you for your consideration,  cheers  and  sincerely    from David D
   

Cole Wyeth

Apr 20, 2024, 11:03:51 AM
to Algorithmic Information Theory
Hi James,

I don't think I understand the connection between requiring cyclic graphs and the paragraph about MDL. In fact I am unaware of the history in that paragraph and do not understand how it is relevant.

The accusation that I'm being "unethically" lax about the distinction between DAGs and DCGs makes no sense to me. I'm aware of the distinction between uniform and non-uniform models of computation, which should be quite clear in my paper. I used circuits to study the complexity of binary functions and because neural networks seem best approximated by circuits. Solomonoff induction is probably a better model for optimal inductive inference. These are just two different problems. But my work is clearly related to your original suggestion and request for prior research.

By the way, if you wanted to stick with a universal TM instead of a finite state machine, you could use the universal TM that accepts circuit descriptions of a transition function.
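For what it's worth, here is a toy version of that suggestion (Python; the description format, output convention, and names are my own assumptions, not anything from the paper): a tape machine whose transition function is supplied as a NOR-gate netlist rather than a lookup table, so the machine's description is essentially a circuit description.

```python
# Toy sketch: a tape machine whose transition function is given as a NOR
# netlist.  The encoding below (and the trivial example) is illustrative only.

def eval_netlist(netlist, inputs):
    """Evaluate a feed-forward NOR netlist.
    Each gate is a list of indices into the growing wire list
    (the input bits first, then earlier gate outputs)."""
    wires = list(inputs)
    for gate in netlist:
        wires.append(0 if any(wires[i] for i in gate) else 1)
    return wires

def run(netlist, n_state_bits, tape, steps):
    """Each step feeds (state bits + read bit) to the netlist; the last
    n_state_bits + 2 wires are read back as (new state..., write bit, move-right bit)."""
    state, head = [0] * n_state_bits, 0
    for _ in range(steps):
        wires = eval_netlist(netlist, state + [tape[head]])
        out = wires[-(n_state_bits + 2):]
        state, write, move_right = out[:n_state_bits], out[-2], out[-1]
        tape[head] = write
        head = min(max(head + (1 if move_right else -1), 0), len(tape) - 1)
    return tape

# A trivial transition circuit over inputs [s, r]: always write 1, always move
# right, next state 0.  Gates: NOT r; constant 0; constant 1; constant 1.
netlist = [[1], [1, 2], [3], [3]]
print(run(netlist, n_state_bits=1, tape=[0, 0, 0, 0], steps=4))   # [1, 1, 1, 1]
```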

James Bowery

Apr 20, 2024, 12:54:22 PM
to David Dowe, Algorithmic Information Theory
In defense of "AIC", and with acute awareness that the Akaike Information Criterion has not only occupied that acronym but is also the most widely cited information criterion, with the possible exception of the Bayesian:

The gold standard information criterion requires Turing complete codes, and none of Wikipedia's 20 information criteria qualify. Even "GIC" (which is not so listed), for Gold standard Information Criterion, already resolves to "Generalized Information Criterion".

It is no hyperbole to characterize this as a disastrous state of affairs. The veritable explosion of information criteria is symptomatic of an underlying intellectual malaise qualitatively on par with physics being stuck in the ancient Greek kinematic mode of description. Imagine Newton's dynamics relegated to a status so esoteric as to elicit puzzlement at those "fringe" physicists who insist on it.

Statistics, in accord with the name, rarely ventures into the recurrent (let alone recursive) mode of description required by dynamics, hence by reality, hence by the very notion of truth.

Choosing to violate the memetic "turf" of Akaike is not in spite of the fact that it is the most widely cited information criterion, but precisely because it is. The time is long past due to put the entire field of statistical "information" criteria into perspective.

"Parameter" counts?  Good grief.  Give me one "parameter" and I can, with arithmetic coding, encompass the universe. 

Every single bit of algorithmic information is a parameter in the Gold Standard.
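To spell out that last point with a toy sketch (Python; this is not a full arithmetic coder, and the names are illustrative): a single exact rational "parameter" in [0, 1) can carry an arbitrarily long bit string, which is why raw parameter counts say nothing by themselves.

```python
# Toy illustration: one real-valued "parameter" theta in [0, 1) can encode an
# arbitrary finite bit string, so counting parameters measures nothing here.
from fractions import Fraction

def pack(bits):
    """Encode a bit string as one exact rational 'parameter'."""
    theta = Fraction(0)
    for i, b in enumerate(bits, start=1):
        theta += Fraction(b, 2 ** i)
    return theta

def unpack(theta, n):
    """Recover the first n bits from the 'parameter'."""
    bits = []
    for _ in range(n):
        theta *= 2
        bit = int(theta >= 1)
        bits.append(bit)
        theta -= bit
    return bits

data = [1, 0, 1, 1, 0, 0, 1, 0]
theta = pack(data)
assert unpack(theta, len(data)) == data   # the whole "dataset" fits in one parameter
```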
