Dear Shubhamkar, thank you for your interest in NARS vision.
I was looking into the application of NARS to, say, visual perception,
since perception seems like a core requirement for an AGI system to
work in the real world.
I agree.
There may be something more fundamental corresponding to the early
processes in the human visual system. A related line of work would
involve what kinds of preprocessing should be done before the input is
forwarded to NARS, and how the different kinds of non-NARS
preprocessing would limit the system.
Preprocessing: For vision in NARS, a pre-processing step could use a deep convolutional network to extract the bounding boxes of objects, so that NARS receives both the location and the class of each object captured by its vision sensor. The benefit of this approach is that NARS receives high-level information directly (e.g., "there is a car to the left of the image, and a person to the right"); the downside is that NARS misses out on the low-level sensory information, and the high-level information is static (e.g., NARS cannot recognize a picture of a "car" unless the pre-processor is also designed to recognize a "car" and pass that information to NARS).
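To make this concrete, here is a minimal sketch of just the translation step, assuming the detector (the deep CNN) runs elsewhere and hands back (label, bounding box) pairs. The helper names and the coarse left/center/right encoding are my own illustrative choices, and the exact Narsese syntax varies between NARS implementations (e.g., OpenNARS vs. ONA):

```python
# Hypothetical sketch: translating object-detector output into Narsese.
# The detector itself (a deep CNN such as Faster R-CNN or YOLO) is assumed
# to run elsewhere; here we only convert its (label, box) output into input
# statements for NARS. Term names and the left/center/right encoding are
# illustrative choices, not a fixed NARS convention.

def horizontal_region(box, image_width):
    """Map a bounding box (x_min, y_min, x_max, y_max) to a coarse location term."""
    x_min, _, x_max, _ = box
    center_x = (x_min + x_max) / 2
    if center_x < image_width / 3:
        return "left"
    if center_x < 2 * image_width / 3:
        return "center"
    return "right"

def detections_to_narsese(detections, image_width):
    """Emit Narsese judgments such as '<{car0} --> car>.' for each detection."""
    statements = []
    for i, (label, box) in enumerate(detections):
        instance = f"{label}{i}"
        statements.append(f"<{{{instance}}} --> {label}>.")  # object class
        statements.append(f"<{{{instance}}} --> [{horizontal_region(box, image_width)}]>.")  # location
    return statements

# Example: two detections from a 640-pixel-wide frame.
detections = [("car", (20, 100, 180, 300)), ("person", (500, 80, 600, 350))]
for statement in detections_to_narsese(detections, image_width=640):
    print(statement)
```

Note how everything below the (label, box) level is discarded before NARS ever sees it; that is exactly the information loss described above.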
Pure NARS: There is also the possibility of inputting the raw image data directly into NARS, with no pre-processing involved, as shown in the MNIST experiment. Using this method, NARS was able to memorize images of digits with up to 100% accuracy, and could classify new images of digits with up to ~43% accuracy. This approach, however, is speculative; it is reminiscent of a neuro-symbolic technique.
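As a rough illustration of what "raw image data" could look like on the NARS side, here is one possible pixel-level encoding. It is a sketch of the general idea only, not necessarily the encoding used in the actual MNIST experiment:

```python
import numpy as np

# Hypothetical sketch of feeding raw pixels to NARS: binarize a 28x28 image,
# turn each bright pixel into a property of the image instance, and (during
# training) relate the instance to its digit class. This is one possible
# encoding, not necessarily the one used in the actual MNIST experiment.

def image_to_narsese(image, image_id, label=None, threshold=128):
    """image: 28x28 array of grayscale values in [0, 255]."""
    statements = []
    for (row, col), value in np.ndenumerate(image):
        if value >= threshold:
            # Each bright pixel becomes a property term like [p_3_17].
            statements.append(f"<{{img{image_id}}} --> [p_{row}_{col}]>.")
    if label is not None:
        # Supervised signal: this image instance belongs to the digit class.
        statements.append(f"<{{img{image_id}}} --> digit{label}>.")
    return statements

# Example with random noise standing in for an MNIST sample.
rng = np.random.default_rng(seed=0)
fake_image = rng.integers(0, 256, size=(28, 28))
statements = image_to_narsese(fake_image, image_id=0, label=7)
print(statements[0], "...", statements[-1])
```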
1. How would the performance change with an increasing number of training images?
Theoretically, the accuracy should improve, since NARS would be able to pick out and memorize new visual features it has not seen before, and so could recognize new images. In practice, the performance did not improve further; perhaps the new features learned by NARS were drowned out by already-learned, more common features.
2. As far as humans are concerned, Pylyshyn, in "Is Vision Continuous
with Cognition?", seems to present evidence that early vision is
cognitively impenetrable, while visual perception as a whole can indeed
be cognitively penetrable. He also raises the question of what even
counts as the input to the system. In light of that, it would be
interesting to see how the performance changes as we change the inputs
to the MNIST system - say, by using edge information instead of
pixel-level activations.
Thank you for the reference, I will be sure to read it. As for edge detection, that is a keen insight and would be interesting to try in NARS. In the previous MNIST experiment, the images are black-and-white and very low resolution (28x28 pixels), so the pixel values in this case are already very similar to edge detections.
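If we did want to try it, the change to the pipeline is small: run an edge detector before the encoding step. A minimal sketch, assuming the image_to_narsese encoder from the earlier sketch and a standard Sobel filter (the threshold value is an arbitrary choice):

```python
import numpy as np
from scipy import ndimage

# Hypothetical sketch: run a Sobel edge detector before the Narsese encoding
# step, so NARS receives edge pixels instead of raw intensities. The
# image_to_narsese encoder is carried over from the sketch above; the
# threshold is an arbitrary illustrative value.

def edge_map(image, threshold=100):
    """Return a binary (0/255) edge map of a grayscale image via Sobel gradients."""
    gx = ndimage.sobel(image.astype(float), axis=1)  # horizontal gradient
    gy = ndimage.sobel(image.astype(float), axis=0)  # vertical gradient
    magnitude = np.hypot(gx, gy)
    return np.where(magnitude >= threshold, 255, 0)

# The NARS input pipeline changes in one place: encode edges, not pixels.
rng = np.random.default_rng(seed=0)
fake_image = rng.integers(0, 256, size=(28, 28))
edges = edge_map(fake_image)
# statements = image_to_narsese(edges, image_id=0, label=7)
```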
I also have two other thoughts related to this work. One is that even
if it is in theory possible to learn
everything-that-is-necessary-to-perform-tasks-like-a-human, would a
constraint on available computing power actually permit learning all of
that, and would it not be advisable to instead provide the right kinds
of hard-coding? I think a similar question also came up during the AGI
2022 NARS workshop - there was a hyperparameter that was hardcoded for
the purpose of exploration, but it was also demonstrated that it is
possible to learn the hyperparameter.
Hopefully I understand the question: it is feasible to determine the "correct" hard-coding for a specific task, but it is much harder to determine the correct system for accomplishing various tasks, especially tasks across different domains. In that case, a more general learning mechanism is useful.
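As a toy illustration of the trade-off (the task, the "decay" parameter, and the scoring function below are all hypothetical stand-ins, not part of any NARS implementation): a hard-coded value is cheap and correct for the task the designer anticipated, while a learned value costs extra computation but adapts to whatever task is actually given.

```python
# Toy illustration of hard-coding vs. learning a hyperparameter. The task,
# the "decay" parameter, and the scoring function are hypothetical stand-ins,
# not part of any NARS implementation.

def evaluate_task(decay):
    """Stand-in for running the system on one task and scoring the result."""
    return -(decay - 0.7) ** 2  # pretend 0.7 happens to be optimal here

# Hard-coded: free at runtime, but only correct for the anticipated task.
HARD_CODED_DECAY = 0.7

# Learned: a simple search recovers a good value for whatever task is given,
# at the cost of extra computation per task.
candidates = [i / 10 for i in range(1, 10)]
learned_decay = max(candidates, key=evaluate_task)

print(HARD_CODED_DECAY, learned_decay)  # they agree only because the
# designer guessed right; on a new task, only the learned value adapts.
```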
The other is that if an AGI system is to communicate effectively with
humans, or learn from the books and works written and made for humans -
which is also how a large part of human learning happens - it feels
reasonable that it should at least have the capability to emulate
humans in certain basic experiences, even if its own primary experience
is fairly different from a human's.
I agree, if you mean experiencing embodiment (physical movement, vision, hunger, etc.). Otherwise, the real-world experiences the AGI system reads about will not be grounded in its own personal experience.
Regards,
Christian Hahm