Sutton's bitter lesson

Paul McQuesten

Jul 18, 2020, 6:14:00 PM
to link-grammar
Linas,

I think this reinforces your view of learning from data, instead of adding more human-curated rules:

Linas Vepstas

Jul 18, 2020, 6:53:58 PM
to link-grammar, opencog
Well yes. What's truly remarkable is how frequently that lesson has to be re-learned.  There are vast swaths of the AI industry that still have not learned it, and are deluding themselves into thinking that they've made bold progress, when they've gotten nowhere at all, and seem blithely unaware that they are repeating the same mistake... again.

I refer, of course, to the deep-learning true-believers. They have made the fundamental mistake of thinking that their various network designs provide an adequate representation of reality.  How little they seem to realize that all that code, running hand-tuned on some GPU, is just, and I quote Sutton here, "leveraged human understanding of the special structure of chess". Except: cross out "chess" and replace it with "dimensional reduction" or "weight vector" or whatever buzzword-bingo is popular in the deep-learning field these days.

I'm back again to insisting that "patterns matter". If you can't spot the pattern, you've not accomplished anything. Neural nets can't spot patterns. They're certainly interesting for various reasons, but, as an AGI technology, they are every bit as much a dead end as the hand-crafted English link-grammar dictionary.

This is one reason I'm sort of plinking away, working on unfashionable things. I simply think that they are more generic and more powerful.  But perhaps the problem is recursive: perhaps I'm just "leveraging my human understanding of the special structure of patterns", and will hit a wall someday.  For now, it seems that my wall is more distant.  If only I could convince others ...

--linas


--
You received this message because you are subscribed to the Google Groups "link-grammar" group.
To unsubscribe from this group and stop receiving emails from it, send an email to link-grammar...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/link-grammar/464d1f92-00b7-4780-870a-2156229b4567o%40googlegroups.com.


--
Verbogeny is one of the pleasurettes of a creatific thinkerizer.
        --Peter da Silva

Linas Vepstas

Jul 18, 2020, 8:26:37 PM
to opencog, link-grammar
The word "training" is problematic. If you mean "memorize an association list of pairs" (e.g. faces+text-string) well, technically that is "training" in the AI jargon file, but it's of little utility for AGI.

The word "pattern" is problematic. Exactly what a "pattern" is, is ... tricky. Much (most? almost all?) of my effort is about trying to define "what is a pattern, anyway". I'm not sure what you had in mind when you used that word.  (It's a tricky word. Everyone obviously knows what it means, but how do you turn it into an algorithmically graspable "thing"?)

--linas

On Sat, Jul 18, 2020 at 6:44 PM Dave Xanatos <xan...@xanatos.com> wrote:

"If you can't spot the pattern, you've not accomplished anything."

 

Every significant – and truly useful - advance I've made on my own language apprehension code has been based on recognizing a pattern, and coding for it.  I fully agree.

 

Can a neural network be trained on patterns instead of things?

 

Can code designed to recognize – for example, faces (like eigenfaces) – be trained to instead recognize blocks of data that look the same, despite perhaps being in vastly dissimilar fields?

 

Apologies if I'm intruding, or seem to be "out of my lane"… a popular buzzword these days.

 

Dave – LONG time lurker…

Dave Xanatos

Jul 18, 2020, 8:31:16 PM
to ope...@googlegroups.com, link-grammar

"If you can't spot the pattern, you've not accomplished anything."

 

Every significant – and truly useful - advance I've made on my own language apprehension code has been based on recognizing a pattern, and coding for it.  I fully agree.

 

Can a neural network be trained on patterns instead of things?

 

Can code designed to recognize – for example, faces (like eigenfaces) – be trained to instead recognize blocks of data that look the same, despite perhaps being in vastly dissimilar fields?

 

Apologies if I'm intruding, or seem to be "out of my lane"… a popular buzzword these days.

 

Dave – LONG time lurker…

Kyle Downs

Jul 18, 2020, 8:31:16 PM
to ope...@googlegroups.com, link-grammar
Too much concern for other human behavior

Richard Hudson

Jul 19, 2020, 8:01:29 AM
to link-g...@googlegroups.com, Linas Vepstas, opencog

Another voice from a long-time lurker: patterns are only relevant if they can be recognised by humans. Some can, others can't. E.g. I don't think humans are good at recognising that cba is the reverse of abc, but I imagine a machine might well pick it up. So maybe you need a mixture of hand-crafting for the pattern types and uncontrolled searching for those patterns.

One refinement of this idea is that purely reactive learning could turn into proactive learning as you learn what to expect. In language, you learn that big often stands before book, and that words which in other ways are like big often stand before words that are otherwise like book, so you generalise to adjectives standing before nouns; then whenever you hit a word which you think is an adjective, you actively look for its noun - a very different learning strategy from purely statistical learning. These expectations gradually turn into a grammar, where you can talk about dependencies and dependency types (e.g. subjects versus objects). I know that sounds like hand-crafting creeping back in through the back door, but all that's hand-crafted is your initial set of pattern types.
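The step "words which in other ways are like big" can be approximated by comparing words through the contexts they occur in. What follows is a minimal sketch, assuming right-neighbour count vectors and cosine similarity (a standard distributional-similarity choice that Dick does not specify); the toy corpus and the helper names `right_contexts` and `cosine` are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt

def right_contexts(sentences):
    """Map each word to a Counter of the words that immediately follow it."""
    ctx = defaultdict(Counter)
    for sent in sentences:
        words = sent.split()
        for left, right in zip(words, words[1:]):
            ctx[left][right] += 1
    return ctx

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical toy corpus: "big" and "small" appear before the same words.
corpus = [
    "a big book fell",
    "a small book fell",
    "a big dog ran",
    "a small dog ran",
]
ctx = right_contexts(corpus)
print(cosine(ctx["big"], ctx["small"]))  # identical right contexts
print(cosine(ctx["big"], ctx["book"]))   # no shared right contexts
```

Once "big" and "small" are grouped, the union of their right contexts becomes an expectation: after either word, something like "book" or "dog" should follow. That is the proactive step Dick describes, and the sketch does not attempt it.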

Best wishes for your thinking. Dick

Mike Innamorato

Jul 19, 2020, 3:41:36 PM
to ope...@googlegroups.com, link-grammar
Hi Linas
My first time entering the room and curious to see how this works.  Replace “if” with “when”

Dave Xanatos

Jul 19, 2020, 3:41:36 PM
to ope...@googlegroups.com, link-grammar

I appreciate your response.  Honestly, I have been finding that as of late, I have been coming up against these "definition" issues with greater frequency.

 

I don't claim to be an expert.  I'm probably – mostly – an idiot in this field, but I did figure out a way to discern if the utterance of a speaker (human) is a question or a statement, with 98.4% accuracy.  It even can detect an indirect question as well as a direct question.

 

I found … a pattern – in English speech, and I coded it. 

 

Can my robots *answer* the question when it is detected – mostly no.  But they can identify the utterance better than anything I've tried from any other source.

 

Beyond this, my last message to you was about recognizing patterns in data, the way scripts recognize patterns in an image.  Often they are all just three-dimensional matrices.

 

For example, would it be possible to "train" (hold on here… lol) a net to recognize a given data pattern, then have it look at different databases/data lakes/wads of random data… and recognize if that particular data pattern existed in those other regions?
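Dave's question has a classical, non-neural baseline: slide a fixed template along the data and score each window with normalized cross-correlation (Pearson r), which ignores differences in offset and scale. What follows is a minimal sketch on 1-D numeric data, assuming a threshold of 0.95 and a made-up series; `zscore` and `find_pattern` are hypothetical names. It only finds rescaled copies of one fixed shape, not the more abstract "patterns" the thread is debating.

```python
from math import sqrt

def zscore(xs):
    """Standardize a window to mean 0 and (population) standard deviation 1."""
    n = len(xs)
    mean = sum(xs) / n
    sd = sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / sd for x in xs] if sd else [0.0] * n

def find_pattern(series, template, threshold=0.95):
    """Return start indices where the window's Pearson r with the template >= threshold."""
    t = zscore(template)
    m = len(t)
    hits = []
    for i in range(len(series) - m + 1):
        w = zscore(series[i:i + m])
        r = sum(a * b for a, b in zip(w, t)) / m  # mean product of z-scores = Pearson r
        if r >= threshold:
            hits.append(i)
    return hits

template = [0, 1, 3, 1, 0]
# Hypothetical data: a shifted-and-scaled copy at index 3, an exact copy at index 10.
series = [5, 5, 5, 10, 12, 16, 12, 10, 5, 5, 0, 1, 3, 1, 0, 7]
print(find_pattern(series, template))
```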

 

I apologize if I'm oversimplifying things.  I'm imagining that data structures would share a common architecture across disparate fields, and may be recognized in this manner.

 

Again, I may be overstepping my experience, and I apologize for wasting your time if so.   I have had some very rewarding experiences with language parsing/understanding based on coding for patterns that were simply determined from my own neurology/intuitions as a native speaker of the language I code in (English).  My intuitions tell me that there are possibly ways to view data in the same way as we view images, and to recognize a "data image" in much the same way.

 

You are correct, I believe, that neural nets can't spot patterns.  But humans can.  That seems to be one thing we do really well.  But if we feed neural nets examples of patterns, instead of things, can we come up with something new?

 

I wish I had better words here.  While I can see the structures I am referring to, I can't seem to really articulate them…

 

Feel free to tell me to go back to Comp Sci 101 if I am not offering anything here, I won't be offended 😊

 

I love what you are all doing here.  I spend a lot of time imagining cognitive architectures and methods of creating something akin to genuine "understanding" in a coded base…. 

 

Feel free to tell me to shut up and leave you alone 😊

 

Dave

Linas Vepstas

Jul 19, 2020, 6:11:01 PM
to Richard Hudson, link-grammar, opencog
Hi Dick,

Well, yes, exactly! But let me provide some insight.

On Sun, Jul 19, 2020 at 7:01 AM Richard Hudson <r.hu...@ucl.ac.uk> wrote:

Another voice from a long-time lurker: patterns are only relevant if they can be recognised by humans. Some can, others can't. E.g. I don't think humans are good at recognising that cba is the reverse of abc, but I imagine a machine might well pick it up. So maybe you need a mixture of hand-crafting for the pattern types and uncontrolled searching for those patterns.

Clearly, you were thinking of "garden-path sentences", which many computer language systems can parse just fine, but which humans stumble over, when they are not completely baffled by them: "The horse raced past the barn fell".

What is happening here is an interesting interaction between "metric distance" and algorithms.  Roughly speaking, an algorithm that takes N steps to find something means that the "something is distance-N away from the trivial case". There's also an idea of "relative distance": for example, two complex sentences that differ by a single-word substitution can be judged as being close to each other, instead of far apart (they are not 2N-apart).
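The "relative distance" idea can be made concrete with a word-level edit distance. What follows is a minimal sketch, assuming Levenshtein distance over tokens as the metric (one choice among many; no particular metric is named above). Two sentences that differ by one substituted word come out at distance 1, however long they are.

```python
def edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance over tokens: insert, delete, substitute each cost 1."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute (free when the words match)
            ))
        prev = curr
    return prev[-1]

s1 = "the horse raced past the barn fell".split()
s2 = "the horse walked past the barn fell".split()
print(edit_distance(s1, s2))  # one substitution
print(edit_distance(s1, []))  # delete every word
```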

Different algorithms have different embodiments of "distance". The abc vs. cba example is famous in computer science: it is a theorem that no finite automaton (equivalently, no true regular expression) can recognize the language of strings followed by their own reversal, such as abc followed by cba, once the strings can be arbitrarily long. Reversal is the standard textbook example of why push-down automata (the machines that recognize context-free languages) are needed, and of how they differ from finite-state machines.
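The reversal example can be shown directly. What follows is a minimal sketch of a stack-based recognizer for strings of the form w#reverse(w), the textbook marked-centre variant; the `#` marker is an assumption that makes the midpoint explicit (without it, plain even-length palindromes need a nondeterministic push-down automaton to guess the middle). `is_marked_reversal` is a hypothetical name.

```python
def is_marked_reversal(s: str, marker: str = "#") -> bool:
    """Accept strings w + marker + reverse(w) using an explicit stack.

    Phase 1 pushes every symbol before the marker; phase 2 pops one symbol per
    input symbol and requires a match. The stack can grow without bound, which
    is exactly the memory a finite automaton lacks.
    """
    stack = []
    seen_marker = False
    for ch in s:
        if not seen_marker:
            if ch == marker:
                seen_marker = True
            else:
                stack.append(ch)
        elif ch == marker or not stack or stack.pop() != ch:
            return False
    return seen_marker and not stack

print(is_marked_reversal("abc#cba"), is_marked_reversal("abc#abc"))
```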

The success of deep learning and neural nets is *entirely* due to their having a dramatically different distance metric. Vastly different. The famous example is reasoning by analogy: "King is to Queen as man is to XXX". Neural nets can solve this problem instantly (I think this is an example from one of the foundational papers), whereas attacking it by building traditional parsing and synonym-learning takes vast amounts of CPU and complexity, is error-prone and opaque, and gives poor results in general.
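Here is why the analogy is "instant" once embeddings exist: it reduces to vector arithmetic plus a nearest-neighbour lookup (the vector-offset method popularized by word2vec). What follows is a minimal sketch with hypothetical, hand-made 3-D vectors in which one axis loosely encodes "royalty" and another "gender"; real embeddings are learned and have hundreds of dimensions. `EMB` and `analogy` are hypothetical names.

```python
from math import sqrt

# Hypothetical toy embeddings (hand-made, not learned).
EMB = {
    "king":  (0.9, 0.8, 0.1),
    "queen": (0.9, 0.2, 0.1),
    "man":   (0.1, 0.8, 0.2),
    "woman": (0.1, 0.2, 0.2),
    "apple": (0.0, 0.5, 0.9),
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def analogy(a, b, c, emb=EMB):
    """Solve 'a is to b as c is to ?' via the offset b - a + c, then nearest neighbour."""
    target = tuple(eb - ea + ec for ea, eb, ec in zip(emb[a], emb[b], emb[c]))
    candidates = [w for w in emb if w not in (a, b, c)]  # exclude the query words
    return max(candidates, key=lambda w: cosine(emb[w], target))

print(analogy("king", "queen", "man"))
```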

This is part of what I mean when I say "pattern". Consider the "blind men and the elephant" story: if you touch the elephant, you get a rough surface. If you smell it, it ... smells like (whatever elephants smell like). If you have extremely bad, blurry vision, you see a ball of grey that moves on a background.  What does grey have to do with rough texture? These different sense-perceptions are like different algorithms. The "patterns" they are good at are wildly different. There's no one algorithm to rule them all.

I think there is a deep mathematical basis for all this, but I can't sketch it here. I draw insight from operator algebras, invariant measures, spectra of transfer operators, sofic shifts and the PBW theorem for representation theory and tensor algebras. These, in a certain sense, provide tools for thinking about algorithms and the distances between the things that the algorithm can visit/discover/explore.  The simplest examples include geometric state machines over Cantor sets, which work with certain very simple fractals (ergodic systems, Bernoulli shifts) as their "natural domain".  So -- fractals look very complicated, until you know what their "secret" is, and then many of them become "simple", even though the act of "recognizing" them ("which fractal is this?") is hard.
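The simplest case can be written down. The middle-thirds Cantor set is the set of numbers in [0, 1] that have a base-3 expansion using only the digits 0 and 2. A two-state machine reading digits recognizes it, and the map x → 3x mod 1 just deletes the leading digit, a one-sided shift on digit sequences (with i.i.d. 0/2 digits of probability 1/2 each, this is the textbook Bernoulli shift). What follows is a minimal sketch using exact rational arithmetic; the helper names are hypothetical.

```python
from fractions import Fraction

def ternary_digits(x: Fraction, n: int) -> list[int]:
    """First n base-3 digits of x in [0, 1), computed exactly (greedy expansion)."""
    digits = []
    for _ in range(n):
        x *= 3
        d = int(x)  # floor, since x >= 0
        digits.append(d)
        x -= d
    return digits

def cantor_automaton(digits) -> bool:
    """Two-state machine: 'alive' while only 0s and 2s are read; a 1 is absorbing 'dead'."""
    state = "alive"
    for d in digits:
        if state == "alive" and d == 1:
            state = "dead"
    return state == "alive"

def shift(x: Fraction) -> Fraction:
    """x -> 3x mod 1: drops the leading ternary digit."""
    y = 3 * x
    return y - int(y)

quarter = Fraction(1, 4)  # 0.020202... in base 3, so it lies in the Cantor set
print(ternary_digits(quarter, 6), cantor_automaton(ternary_digits(quarter, 30)))
print(ternary_digits(shift(quarter), 6))
```

The greedy expansion mishandles endpoints such as 1/3, which it writes as 0.1000... rather than 0.0222..., so a real recognizer needs to treat those two expansions as equivalent.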

Roughly speaking, the relationship between a "grammar" and the language it generates (the set of all syntactically valid sentences) is the same as the relationship between a fractal-generator and the fractal it produces. Learning a grammar is like deducing the fractal generator, given samples of fractals. Neural nets just provide one particular viewpoint on solving this problem.
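The grammar/fractal analogy can be made literal with an L-system (Lindenmayer's parallel rewriting grammar, the formalism behind Prusinkiewicz's algorithmic botany). What follows is a minimal sketch: two rules generate successive approximations of the Cantor set, and the same rewriter runs Lindenmayer's original algae example. The symbol meanings and the text rendering are my assumptions. "Learning" would be the inverse problem, recovering the rules from the strings, which is not shown.

```python
def rewrite(axiom: str, rules: dict[str, str], generations: int) -> str:
    """Apply every production rule in parallel, once per generation (a D0L-system)."""
    s = axiom
    for _ in range(generations):
        s = "".join(rules.get(ch, ch) for ch in s)
    return s

# Cantor-set L-system: 'A' = draw a segment, 'B' = leave a gap of the same length.
CANTOR = {"A": "ABA", "B": "BBB"}
# Lindenmayer's algae model: string lengths grow as Fibonacci numbers.
ALGAE = {"A": "AB", "B": "A"}

if __name__ == "__main__":
    for g in range(4):
        # Stretch each generation to 27 cells so the levels line up.
        cells = 3 ** (3 - g)
        print("".join(("#" if ch == "A" else " ") * cells for ch in rewrite("A", CANTOR, g)))
```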

Some of what I am saying can be made very concrete in a very pretty way. More than thirty years ago, Przemyslaw Prusinkiewicz started out with very simple grammars generating very simple fractals, and over the decades he has ended up with this: http://algorithmicbotany.org/papers/  Enjoy the pictures, but ..

But ... if you are at all serious about AGI, or about "solving biology", one of the most important things you can do is to start reading his papers from 1985, onwards, through the late 1980's, early 1990's .. late 1990's ... the 2000's ... and give them deep attention and thought. The deep secrets of logic, reasoning, natural language, common sense and human understanding are buried in there, even though it is superficially labelled "botany".

One refinement of this idea is that purely reactive learning could turn into proactive learning as you learn what to expect. In language, you learn that big often stands before book, and that words which in other ways are like big often stand before words that are otherwise like book, so you generalise to adjectives standing before nouns; then whenever you hit a word which you think is an adjective, you actively look for its noun - a very different learning strategy from purely statistical learning. These expectations gradually turn into a grammar, where you can talk about dependencies and dependency types (e.g. subjects versus objects). I know that sounds like hand-crafting creeping back in through the back door, but all that's hand-crafted is your initial set of pattern types.

Well, I think this is *exactly* what I'm trying to do with https://github.com/opencog/learn and so perhaps the word "statistical" is misleading. There's a problem with the word "statistics": for most people, it's about those stunningly dull and boring books you may have been forced to read, talking about expectation values and standard deviations and hypothesis testing and p < 0.05 significance levels. The reality of statistics is very, very different.
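For readers who hear "statistics" and think of p-values: the raw material of this kind of learning is much humbler, namely counting which words occur next to which. What follows is a minimal sketch, assuming ordered adjacent word pairs scored by pointwise mutual information (a common association measure in language-learning work). It is an illustration only, not code from the opencog/learn repository, and `adjacent_pair_pmi` is a hypothetical name.

```python
from collections import Counter
from math import log2

def adjacent_pair_pmi(sentences):
    """PMI(x, y) = log2( p(x, y) / (p_left(x) * p_right(y)) ) for ordered adjacent pairs.

    p(x, y) is the pair's relative frequency; p_left and p_right are the
    marginals over the left and right slots of all observed pairs.
    """
    pairs = Counter()
    for sent in sentences:
        words = sent.split()
        pairs.update(zip(words, words[1:]))
    total = sum(pairs.values())
    left, right = Counter(), Counter()
    for (x, y), n in pairs.items():
        left[x] += n
        right[y] += n
    return {
        (x, y): log2((n / total) / ((left[x] / total) * (right[y] / total)))
        for (x, y), n in pairs.items()
    }

# Hypothetical toy corpus.
corpus = ["a big book fell", "a small book fell", "a big dog ran", "a small dog ran"]
pmi = adjacent_pair_pmi(corpus)
print(round(pmi[("book", "fell")], 3), round(pmi[("a", "big")], 3))
```

Pairs that stick together more than their individual frequencies predict ("book fell") score higher than pairs explained by a frequent word ("a big").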

But I can't find a better word. For example, quantum mechanics is 99.999% "statistics", but if you use the word "quantum", people think of something totally different. If you say "fractals", it's 99.999% the statistical nature of the Cantor set, but most people don't know that, and think it's something else. If you look at the distribution of chaotic orbits on Riemann surfaces and the Artin zeta: well, what the Hecke-algebra -- that's 99.9999% statistics again, but when you say statistics, most people don't think "oh, Riemann surfaces, modular forms, I see what you mean". So I'm at a loss for the correct word for the kind of statistical learning I am talking about. Calling it "quantum fractal modular surface learning" is something only a brazen marketing agency could come up with, and so I'm stuck with "statistical learning" for now. It really, really sucks, since it is a HUGE impediment to getting the ideas across.

Oh ye minters of catchy phrases, I appeal to you now!

-- Linas

Linas Vepstas

Jul 19, 2020, 6:14:16 PM
to opencog, link-grammar
Hi Jose,

On Sun, Jul 19, 2020 at 1:21 PM Jose Ignacio Rodriguez-Labra <nachos...@gmail.com> wrote:
This is a very interesting conversation. Thank you all for sharing your insights. Perhaps I could add that another thing to consider is that maybe humans don't have what I would call general pattern recognizers. Perhaps our ability to recognize patterns comes from pattern-recognizing modules that are specialized to work in various domains (i.e. visual, time-frequency, audio, conceptual, etc.) created through evolution. This would be similar to a pattern recognizer 'trained' by a neural network, lacking generality. Perhaps there are patterns in the world we don't have the mental faculty to detect. In that case, the pattern recognition would be developed through the evolutionary domain, rather than the learning domain.

Yes. Here is a paper that talks about this in a very illuminating kind of way:

"Forced moves or good tricks in design space?  Landmarks in the
    evolution of neural mechanisms for action selection", Tony J. Prescott
    (2007) https://www.academia.edu/30717257/Forced_Moves_or_Good_Tricks_in_Design_Space_Landmarks_in_the_Evolution_of_Neural_Mechanisms_for_Action_Selection


--linas


Now, if this is the case for biological brains, it doesn't mean it is impossible to develop a general pattern recognizer, able to be fed multi-dimensional data (images, audio, internal neural processes, etc) and have it be able to recognize similarities/patterns within the data, all developed during learning. Generally, when we recognize a pattern, we've built a model of the pattern and we're able to make predictions on it. The thing I can't wrap my head around is what kind of structure/architecture or mathematical model would be able to do the recognition?

Regards,
Jose Ignacio Rodriguez-Labra

Linas Vepstas

Jul 19, 2020, 7:26:10 PM
to opencog, link-grammar
Hi Dave,

On Sat, Jul 18, 2020 at 10:57 PM Dave Xanatos <xan...@xanatos.com> wrote:

I appreciate your response.  Honestly, I have been finding that as of late, I have been coming up against these "definition" issues with greater frequency.


:-)

 

I don't claim to be an expert.  I'm probably – mostly – an idiot in this field, but I did figure out a way to discern if the utterance of a speaker (human) is a question or a statement, with 98.4% accuracy.  It even can detect an indirect question as well as a direct question.


I'm not sure how to respond to that.  It sounds like you are plugging a product of some sort. These days, I am more interested in abstractions, and so whenever I see something like "98.4% accurate" I quickly scurry in the opposite direction.  I'm pretty sure that no poet and no mathematician has ever claimed that their work was 98.4% accurate. (So math is just poetry: free verse that has to rhyme. 😃😃 See Lockhart's Lament for details. https://www.maa.org/external_archive/devlin/LockhartsLament.pdf )

 

I found … a pattern – in English speech, and I coded it. 

 

Can my robots *answer* the question when it is detected – mostly no.  But they can identify the utterance better than anything I've tried from any other source.

 

Beyond this, my last message to you regards recognizing patterns in data, the way scripts recognize patterns in an image.  Often they are all just three dimensional matrices.

 

For example, would it be possible to "train" (hold on here… lol) a net to recognize a given data pattern, then have it look at different databases/data lakes/wads of random data… and recognize if that particular data pattern existed in those other regions?


Yeah, I think that was the claim that the neural-net folks started to make in the 1980's (or earlier), and you can safely say that they have fully made good on this claim; if you doubt it, you haven't spent enough time with TensorFlow.

 

I apologize if I'm oversimplifying things.  I'm imagining that data structures would share a common architecture across disparate fields, and may be recognized in this manner.

 

Again, I may be overstepping my experience, and I apologize for wasting your time if so.   I have had some very rewarding experiences with language parsing/understanding based on coding for patterns that were simply determined from my own neurology/intuitions as a native speaker of the language I code in (English).  My intuitions tell me that there are possibly ways to view data in the same way as we view images, and to recognize a "data image" in much the same way.

 

You are correct, I believe, that neural nets can't spot patterns.  But humans can.  That seems to be one thing we do really well.  But if we feed neural nets examples of patterns, instead of things, can we come up with something new?

 

I wish I had better words here.  While I can see the structures I am referring to, I can't seem to really articulate them…


I think Richard Feynman had a quotable quote about exactly that.

 

Feel free to tell me to go back to Comp Sci 101 if I am not offering anything here, I won't be offended 😊


One of the easiest ways of being offensive is to imply, even in a very roundabout fashion, that someone else's life-work is unimportant, useless, or worse: as Hillary so finely put it, "deplorable".  Good thing we didn't elect her, eh?

In science, athletics and Hollywood stardom, it is very easy for one scientist/athlete/star to imply that the work of another is wrong, trite, substandard, obvious, or "even a child could do that". And there lies the fountainhead of rivalry.

 

I love what you are all doing here.  I spend a lot of time imagining cognitive architectures and methods of creating something akin to genuine "understanding" in a coded base…. 


Well, when you say "cognitive architecture", I guess you mean this: https://xanatos.com/airobotics.asp and in that sense, I think that this is also where Ben started out, long ago (and I think he is still there, to some degree): the idea that if you just wire together some piece-parts in some clever way, the way you might build an automobile or an airplane, it will just locomote or fly (or, in this case, "think").

I've forged some kind of working relationship with Ben, and others, even though I wholeheartedly reject the fundamental cornerstone, the idea behind "cognitive architecture".  It's a little bit like trying to build an atom bomb by trying to figure out, to "reverse engineer", how to pack 100 tonnes of TNT into a box no bigger than a basketball.  To me, it's a misperception of the task at hand.  It's Sutton's bitter lesson, in a somewhat more disguised form.

--linas