Attempt to overcome difficulties with implicit knowledge in Bayesian updates?

Clément Michaud

Mar 14, 2025, 2:28:12 PM
to open-nars
Hey Pei,

I have been thinking about the nature of uncertainty and probabilities lately, and about the difficulties we have with standard probability in AI. I have written an article on Medium to try to formalize one important problem I see. Writing it reminded me of an article you wrote on Bayesian updates in 2004, which I read a few months ago. Here is my article: https://medium.com/@clement.michaud/the-limits-of-probability-theory-can-we-model-the-truly-unknown-09e98237ef7f

I have two questions for you:
1. Do you think the problem I am describing in my article is connected to the limitation you were describing in yours?
2. If yes, have you attempted to develop a mathematical framework aimed at solving that specific problem? I mean regardless of what you do in NARS.

Thanks,
Clément

Pei Wang

Mar 15, 2025, 6:34:02 PM
to open...@googlegroups.com
Hi Clément,

I agree with your writing. This issue is not completely unnoticed - people have been talking about "known unknowns vs. unknown unknowns" - though few have traced the issue back to probability theory. I think the reasons include:
  • Since a sample space can be infinite, it also includes the unknown outcomes (as you mentioned).
  • In principle, all "possible worlds" can be described in a representation language, so nothing is ever really "unknown" (as in Rudolf Carnap's Logical Foundations of Probability, 1950).
  • It can be solved using technical tricks, such as introducing "virtual nodes" into Bayesian networks - nodes that are not explicitly expressed but are used to explain certain phenomena - which can be considered a way of extending the sample space.
Of course, I don't think any of these solves the problem, but they make it less obvious, as if the sample space were not a real limitation. The practical reason is that in traditional statistics, continuous and incremental revision of probabilistic models is not absolutely necessary. It is now inevitable in AI, but the current mainstream is after whatever "works" and pays little attention to this kind of theoretical issue.

Regards,

Pei



Pei Wang

Mar 15, 2025, 6:40:57 PM
to open...@googlegroups.com
As for the 2nd question, I don't see any alternative other than a NARS-like solution: giving up a consistent distribution function (call it probability or something else) and letting each belief have its own evidential base and degree of support, which requires two numbers (though they can be interpreted differently), as argued in "Formalization of Evidence: A Comparative Study".
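
To make the two-number idea concrete, here is a minimal sketch in Python, loosely following the NAL truth-value definitions (frequency f = w+/w and confidence c = w/(w+k), where w+ is the positive evidence, w the total evidence, and k a constant):

# Minimal sketch of evidence-based truth values, loosely following NAL.
# Each belief carries its own evidential base; no global distribution is
# assumed, so adding a new belief never forces renormalizing the others.

K = 1.0  # evidential horizon constant (k in NAL); 1 is the usual choice

class Belief:
    def __init__(self, positive, total):
        self.positive = positive  # w+: amount of positive evidence
        self.total = total        # w : total amount of evidence

    @property
    def frequency(self):  # f = w+ / w
        return self.positive / self.total

    @property
    def confidence(self):  # c = w / (w + k); approaches 1 as evidence accumulates
        return self.total / (self.total + K)

    def revise(self, other):
        # Revision pools evidence from (assumed disjoint) evidential bases.
        return Belief(self.positive + other.positive, self.total + other.total)

b1 = Belief(3, 4)   # f = 0.75, c = 0.80
b2 = Belief(1, 2)   # f = 0.50, c ~ 0.67
b3 = b1.revise(b2)  # f ~ 0.67, c ~ 0.86

The point is that each belief is revised locally from its own evidence; nothing has to sum to 1 across beliefs.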

Pei



Clément Michaud

Mar 15, 2025, 8:56:27 PM
to open-nars
Good to know that we are talking about the same thing. It's funny that I discovered the link with your paper at the very end of my exploration. I literally made the connection right before I was about to publish my article and I edited it accordingly.

Regarding your response: to anyone arguing that we can take the biggest space as the sample space, I would argue that this leads to many paradoxes... Burali-Forti, Russell's paradox, and what not. Also, if the sample space is the biggest space, assuming this concept of a biggest space even exists, one cannot take the power set as the event space, because there should be nothing bigger than the sample space... My research led me to acknowledge that there is probably no such thing as a biggest space; rather, there exists an infinite hierarchy of spaces (proper classes or Grothendieck universes), which is somehow also in agreement with dependent type theory having hierarchies of types. So now the question is: how do we make the connection between one space and the next?
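
For what it's worth, the same hierarchy shows up directly in a proof assistant; a minimal Lean 4 illustration (just the stock universe mechanism, nothing specific to probability):

#check Type      -- Type : Type 1
#check Type 1    -- Type 1 : Type 2
universe u
#check Type u    -- Type u : Type (u + 1)
-- There is no "Type of all Types": assuming one reproduces Girard's paradox,
-- the type-theoretic cousin of Burali-Forti.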

Clément

Pei Wang

Mar 16, 2025, 4:37:35 PM
to open...@googlegroups.com
Even if the relationship between sample spaces is clear, the probability distribution functions defined on them cannot be "extended" from one space to a larger one; they must be redefined. That is why a "Dynamic Sample Space" is practically impossible. Using your die example: if a 7th outcome is introduced, there is no easy way to locally modify the probability distribution.
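
A toy sketch in Python makes the point visible: once a 7th face appears, renormalization changes every old probability, even though no new evidence about the old faces arrived.

# Toy sketch: a distribution maintained by renormalizing evidence counts.
counts = {face: 10 for face in range(1, 7)}  # equal evidence for a six-sided die

def distribution(counts):
    total = sum(counts.values())
    return {face: n / total for face, n in counts.items()}

p = distribution(counts)  # P(1) = 1/6 ~ 0.167
counts[7] = 10            # a 7th outcome enters the sample space
p = distribution(counts)  # P(1) = 1/7 ~ 0.143: the whole function was redefined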

Regards,

Pei

Christopher Hong

Mar 21, 2025, 3:50:55 PM
to open-nars
Hopefully it's ok to briefly chime in here.  I've been following this thread and have been interested in this topic.

I think there is some nuance to the concepts of sample space and epistemic uncertainty.  Taking computer vision as an example, computer vision problems (even ones that change over time) all have an approximately fixed sample space, namely (I, J, R, G, B, T), where (I, J) are the image coordinates, (R, G, B, ...) are the color channels, and T is the time dimension.  Computationally, this sample space varies only in a small number of likely inconsequential, mathematically fixed ways that are handled quite well by deep learning methods.  The advent of deep learning introduced the concept of intermediate sample spaces that can be learned in the I.I.D./backprop framework.  I think it should be obvious from a deep learning perspective that the further up the layers you move, the more problematic epistemic uncertainty becomes; but most novel things in upper sample spaces (as defined by intermediate embeddings) are just different combinations of things already seen in lower layers, and the lower you go, the more rarely you find a need for a truly novel variable (i.e., a perceptron, in DL terms).
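
As a toy sketch (Python, with tiny dimensions standing in for real image sizes), the pixel-level sample space is just a fixed finite product:

import itertools

I, J, LEVELS, T = 2, 2, 4, 2  # toy stand-ins for height, width, channel depth, time
pixel_space = list(itertools.product(
    range(I), range(J),                           # image coordinates
    range(LEVELS), range(LEVELS), range(LEVELS),  # R, G, B intensities
    range(T)))                                    # time step
print(len(pixel_space))  # I * J * LEVELS**3 * T = 512: huge in practice, but fixed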

Assuming that some method of deep learning is viable for AGI, I think the problem lies in the I.I.D./backprop framework, which forces a division between training and deployment and fixes the neural net architecture.

Of course these are just my own biases.

Jeff Thompson

Mar 22, 2025, 4:54:57 AM
to open...@googlegroups.com
Hi all. Just a note that Pei's paper "On Defining Artificial Intelligence" was cited on Machine Learning Street Talk, during an interview with Iman Mirzadeh, who is very politely trying to say that he doesn't believe that LLMs do reasoning.

Cheers,
- Jeff

Christopher Hong

Mar 22, 2025, 11:07:00 AM
to open-nars
Personally I think it would be interesting to see a mathematical framework incorporating epistemic uncertainty that could:
  1. directly augment the perceptron building block in DL,
  2. facilitate (limited) neural growth or appropriate pruning,
  3. enable non-I.I.D. backprop.

Pei Wang

Mar 22, 2025, 7:18:24 PM
to open...@googlegroups.com
Hi Christopher,

At the front end of a modality, such as vision, the "sample space" is typically constant in terms of recognizable stimuli. However, this is not the case for the stimulus patterns, which form the space where the decisions are made. Take language as an example: the English alphabet is fixed, but the space of English words is not. Even if a limit is put on the length of a word, it is still impractical to assume a fixed sample space of words on which a probability distribution function can be defined and maintained, as language usage changes (which challenges the I.I.D. assumption). As soon as "new words" are allowed, the sample space is not constant anymore.
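
As a toy illustration in Python: a distribution over a fixed vocabulary simply has no answer for a new coinage.

# Toy sketch: a fixed alphabet vs. an open set of words.
from collections import Counter

vocabulary = {"cat", "dog", "fish"}  # the assumed-fixed sample space of words
counts = Counter(cat=5, dog=3, fish=2)

def prob(word):
    if word not in vocabulary:
        raise KeyError(f"{word!r} is outside the sample space")
    return counts[word] / sum(counts.values())

print(prob("cat"))     # 0.5
print(prob("selfie"))  # KeyError: the model must be redefined, not just updated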

Regards,

Pei

Pei Wang

Mar 22, 2025, 7:24:28 PM
to open...@googlegroups.com
Hi Jeff,

Thanks for the message. It's right on time, as "Reasoning in LLM" will be the topic of my upcoming TRiPS talk, in which I will mention the article by Iman Mirzadeh et al.

Regards,

Pei


Christopher Hong

Mar 23, 2025, 11:22:36 AM
to open-nars
I would agree in distinguishing "hard" vs. "soft" modalities (those grounded in physical limitations vs. those that are not).  Language is a product of culture; it could only be a hard modality if human culture and its operation were fixed, but that is clearly a bad assumption.

Clément Michaud

Mar 25, 2025, 5:45:20 AM
to open-nars

Hello Christopher,

Thanks for interacting in this thread; I like observing the various points of view.

Personally I make a clear distinction between what I call "standard" probability, as formally defined by Kolmogorov, and what is called a probability distribution in neural nets: usually a categorical distribution produced by a softmax or equivalent, based on axioms from information theory. My article is clearly about the former, not specifically the latter. However, softmax, for instance, might suffer from the same kind of issue, since the set of categories is not supposed to evolve during training either. In my article I try to analyze the situation from as unbiased a perspective as possible, and that's why I will not take the perceptron as the foundation of my analysis. Without denying all the great things that NNs have brought to the table, it seems to me that this concept was forced on us by our observations of biological neurons, and there is no guarantee that it will eventually be the best solution; I would rather approach the problem from first principles and necessary conditions instead.
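
To illustrate the parallel, a toy Python sketch: a softmax head freezes the category set into the architecture, much as Kolmogorov's axioms freeze the sample space.

import math

def softmax(logits):
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 0.5, -1.0]  # a head trained with exactly 3 categories
print(softmax(logits))     # probabilities over those 3 categories, forever
# A 4th category cannot receive any probability without changing the output
# layer and retraining: the category set is frozen at training time.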

Having said that, the important bit that I'm most interested in, and that is shared by both frameworks, is the concept of the amount of information gathered after an observation (the surprisal, whose expected value is the entropy). This is what I focus on at the moment. Also note that there is a third interpretation of probabilities that I also find very interesting: the perspective taken in quantum physics, which sees the wave function as representing the probability of observing a given state of a particle in a given configuration.
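
Concretely, the quantity I mean is the surprisal of an observation; a toy Python sketch:

import math

def surprisal(p):
    return -math.log2(p)  # information content of an observation, in bits

print(surprisal(0.5))    # 1.0 bit: an even coin flip
print(surprisal(1 / 6))  # ~2.58 bits: one face of a fair die
# As p -> 0 the surprisal grows without bound: an outcome the model assigned
# zero probability carries "infinite" surprise, which is exactly where the
# fixed-sample-space problem bites.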


Regards,

Clément
