AI-Generated Code Has A Staggeringly Stupid Flaw


John F Sowa

Jul 27, 2024, 7:41:24 PM
to ontolo...@googlegroups.com, ontolog...@googlegroups.com, CG
Here is another of the many reasons why Generative AI requires other methods -- such as the 70 years of AI and computer science -- to test, evaluate, and correct anything and everything that it "generates".

As the explanation below says, it does not "UNDERSTAND" what it is doing.  It just finds and reproduces patterns in its huge volume of data.  Giving it more data gives it more patterns to choose from.  But more data does nothing to help it understand any of them.

This method enables it to surpass human abilities on IQ tests, law exams, medical exams, etc. -- for the simple reason that the answers to those exams can be found somewhere on the WWW.  In other words, Generative AI does a superb job of CHEATING on exams.  But it is hopelessly clueless in solving problems whose solution depends on understanding the structure and the goal of the problem.

For similar reasons, the article mentions that self-driving cars fail in complex environments, such as busy streets in city traffic.  The number and kinds of situations are far more varied and complex than anything they have been trained on.  Carnegie Mellon University is heavily involved in testing self-driving cars because Pittsburgh has the most complex and varied traffic patterns.  It has more bridges than any other city in the world.  It also has three major rivers, many hills and valleys, steep winding roads, complex intersections, tunnels, foot traffic, and combinations of any or all of the above.

Drivers who test self-driving cars in Pittsburgh say that they can't go for twenty minutes without having to grab the steering wheel to prevent an accident.   (By the way, I learned to drive in Pittsburgh.  Then I went to MIT and Harvard, where the Boston patterns are based on 300-year-old cow paths.)

John

________________________________________________

AI-Generated Code Has A Staggeringly Stupid Flaw
It simply doesn’t work.
https://medium.com/predict/ai-generated-code-has-a-staggeringly-stupid-flaw-42b2d79f3443
. . .
So, what is the problem with AI-generated code?

Well, one of the internet’s favourite developers, Jason Thor Hall of Pirates Software fame, described it best in a recent short. He said, “We have talked to people who’re using AI-generated code, and they are like, hey, it would take me about an hour to produce this code and like 15 minutes to debug. And then they are like, oh, the AI could produce it in like 1 minute, and then it would take me like 3 hours to debug it. And they are like, yeah, but it produced it really fast.”

In other words, even though AI can write code way faster than a human programmer, it does such a poor job that making the code useful actually makes it far less efficient than getting a qualified human to just do the job in the first place.
. . .
Well, AI doesn’t actually understand what it is doing. These generative AI models are basically overdeveloped predictive-text programs. They use statistics drawn from a stupidly large pool of data to figure out what the next character or word should be. No AI actually ‘knows’ how to code. It isn’t cognitively trying to solve the problem; instead it finds an output that matches the statistics of the data it has been trained on. As a result, it gets things massively wrong, constantly, because the AI isn’t actually trying to solve the problem you think it is. Even when the coding problem you are asking the AI to solve is well represented in its training data, it can still fail to generate a usable solution, simply because it doesn’t actually understand the laws and rules of the coding language. The issue gets even worse when you ask it to solve a problem it has never seen before, as the statistical models it uses simply can’t be extrapolated, causing the AI to produce absolute nonsense.

This isn’t just a problem with AI-generated code but with every AI product, such as self-driving cars. Moreover, this isn’t a problem that can be easily solved. You can’t just shove more training data into these AIs, and we are starting to hit a point of diminishing returns when it comes to AI training (read more here). So, what is the solution?

Well, when we treat AI as what it actually is, a statistical model, we can have tremendous success. For example, AI structural designs, such as those in the Czinger hypercar, are incredibly efficient and effective. But it falls apart when we treat AI as a replacement for human workers. Despite its name, AI isn’t intelligent, and we shouldn’t treat it as such. [End]
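
To make the article's "predictive text" point concrete, here is a deliberately tiny bigram sketch in Python (an illustration only, not code from the article, and nothing like the scale of a real LLM).  It generates text purely from word-pair statistics in its training data, with no representation of meaning or goals:

# Toy bigram "language model": it can only reproduce word-pair statistics
# from its training text; it has no notion of meaning, goals, or correctness.
import random
from collections import defaultdict

def train(text):
    """Count which word follows which in the training text."""
    counts = defaultdict(lambda: defaultdict(int))
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts, start, length=10):
    """Pick each next word in proportion to how often it followed the previous one."""
    word, output = start, [start]
    for _ in range(length):
        followers = counts.get(word)
        if not followers:
            break
        choices, weights = zip(*followers.items())
        word = random.choices(choices, weights=weights)[0]
        output.append(word)
    return " ".join(output)

corpus = "the cat sat on the mat and the dog sat on the rug"
print(generate(train(corpus), "the"))

Scaled up by many orders of magnitude and applied to tokens instead of whole words, the same statistical idea underlies the code generators discussed above.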

Michael DeBellis

Aug 6, 2024, 11:16:06 PM
to ontolog-forum
I agree. I've asked ChatGPT and Copilot for SPARQL queries that are not extremely complicated: either things I thought I would attempt rather than going back to the documentation, or in some cases queries to get DBpedia or Wikidata info, because I find the way they structure data not very intuitive, and it takes me forever to figure out how to find things like all the major cities in India. (If anyone knows some good documentation on the DBpedia or Wikidata models, please drop a note.) I think part of the problem is that people see what looks like well-formatted code and assume it actually works. None of the SPARQL queries I've gotten has ever worked.
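
For concreteness, here is the kind of query being asked for, as a minimal sketch in Python with SPARQLWrapper against the public Wikidata endpoint (the library choice is an assumption, and the Wikidata identifiers below -- Q515 city, Q668 India, P17 country, P1082 population -- should be double-checked against the current data model):

# Sketch: ask Wikidata for Indian cities with a population over one million.
from SPARQLWrapper import SPARQLWrapper, JSON

QUERY = """
SELECT ?city ?cityLabel ?population WHERE {
  ?city wdt:P31/wdt:P279* wd:Q515 ;   # instance of (a subclass of) city
        wdt:P17 wd:Q668 ;             # country: India
        wdt:P1082 ?population .
  FILTER(?population > 1000000)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?population)
LIMIT 50
"""

sparql = SPARQLWrapper("https://query.wikidata.org/sparql",
                       agent="kg-query-sketch/0.1")   # hypothetical user agent
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["cityLabel"]["value"], row["population"]["value"])

The property path wdt:P31/wdt:P279* is often needed because many cities are typed as subclasses of "city" rather than as Q515 directly, which is one of the non-obvious details that makes these models hard to query by hand.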

We did an experiment in February this year with dental clinicians in India: we gave them a bunch of questions and had them use ChatGPT to get answers. They rated the answers very highly, even though almost all of the answers were incomplete, out of date, or had minor or major errors.  On the other hand, when I ran the same questions through ChatGPT in May (in both cases I used 3.5), the results were radically different: almost all the answers were spot on.

And for coding, I have to say I find the AI support in PyCharm (my Python IDE) to be a great time saver. Most of the time now I never finish typing: the AI figures out what I'm doing from the patterns in my code, puts the suggested completion in grey, and all I do is hit tab. It's also interesting how it learned. My code is fairly atypical Python because it involves manipulating knowledge graphs, and at first I was getting mostly worthless suggestions. But after a few days it figured out the patterns for reading from and writing to the graph, and it has been an incredible benefit. I like it for the same reason I always copy and paste names whenever I can rather than typing them: it drastically cuts down on typing errors.
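
For context, the repetitive read/write pattern described above looks roughly like the sketch below, written with rdflib and a made-up namespace (the actual library, schema, and property names are not given in this thread, so everything here is illustrative):

# Sketch of repetitive knowledge-graph read/write code using rdflib.
# The ex: namespace and the property names are made up for illustration.
from rdflib import Graph, Namespace, Literal

EX = Namespace("http://example.org/kg#")
g = Graph()
g.bind("ex", EX)

# Write: assert a few triples about a (hypothetical) patient record.
patient = EX["patient_001"]
g.add((patient, EX.hasName, Literal("A. Nonymous")))
g.add((patient, EX.hasCondition, EX.Gingivitis))

# Read: find every subject that has a recorded condition.
for subject, _, condition in g.triples((None, EX.hasCondition, None)):
    print(g.value(subject, EX.hasName), "->", condition)

Code like this is highly regular -- the same add-triple and triples/value calls repeated with different names -- which is exactly the kind of surface pattern a completion engine can pick up after seeing it a few times.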

All this reminds me of the debates people had about Wikipedia. Some people thought it was worthless because you can always find some example of vandalism where there is garbage in an article, and other people think it is the greatest thing on the Internet. The answer is somewhere in the middle. Wikipedia is incredibly useful, and it is an amazing example of how people can collaborate just to contribute their knowledge; the way people collaborate on that site is so different from most of the Internet. But you should never use it as a primary source: always check the references. That's the way I feel about Generative AI. Like Wikipedia, I think it is a great resource, in spite of the fact that some people claim it can do much more than it really can and that it can still be wrong. It's just another tool, and if used properly, an incredibly useful one.

Michael

John F Sowa

Aug 7, 2024, 4:19:29 PM
to ontolo...@googlegroups.com, CG
Michael,

The examples you cite illustrate the strengths and weaknesses of LLMs.  They show why multiple methods of evaluation are necessary. 

1. The failures mentioned in paragraph 1 show that writing a program requires somebody or something that can understand a problem statement and generate a sequence of commands (in some detailed notation) to specify a method for solving that problem.  LLMs can't do that.

2. The second paragraph shows that ChatGPT had a better selection of answers available in May, or perhaps an improved ability to find them.  It's possible that very few dental clinicians had ever used ChatGPT for that purpose.  Your experiment and the work by the dental clinicians in India may have added enough new patterns that dental clinicians worldwide would have benefited.

3. The third paragraph shows how ChatGPT learns how to do what it does best:  translate from one notation to another.  Since you did all the problem analysis to generate Python with miscellaneous errors, it learned how to translate your personal dialect of Python to the official Python syntax.  That is an excellent example of LLMs at their best.  It was learning how to translate, not learning how to understand.

4. I would say that there is a major difference.  Wikipedia is not improved by any method of learning (by humans or machines).  Instead, some articles are excellent products of collaboration by experts on the subject matter.  But other articles were written hastily by people who don't have the expertise or the patience to do thorough research on the topic.  The Wikipedia editors usually mark those articles that require further attention.  But there are many articles that fall through the cracks -- nobody knows whether they are accurate or not.

John
 


From: "Michael DeBellis" <mdebe...@gmail.com>


Michael DeBellis

Aug 8, 2024, 9:49:10 AM
to ontolo...@googlegroups.com
John, I agree with everything you said. That's an interesting point about how our experiment may have influenced ChatGPT itself. I just assumed that it got better over time, but it's an intriguing idea that, by having many clinicians ask the same questions, it learned and got better in that domain over time.

Cheers,
Michael


Ravi Sharma

Aug 13, 2024, 2:46:43 AM
to ontolo...@googlegroups.com
John
Are we at a point where:
1. We can turn AI off, say on phone apps and desktops?
2. We can limit the content that AI can access as input, to hopefully focus the results better?
Regards
Thanks.
Ravi
(Dr. Ravi Sharma, Ph.D. USA)
NASA Apollo Achievement Award
Former Scientific Secretary, ISRO HQ
Ontolog Board of Trustees
Particle and Space Physics
Senior Enterprise Architect
SAE Fuel Cell Standards Member




John F Sowa

Aug 13, 2024, 3:16:31 PM
to ontolo...@googlegroups.com, CG, Arun Majumdar
Ravi,

There is a huge difference between the theoretical issues about what LLMs (or the smaller SLMs) can do as the underlying technology and what any particular software system can do.

The limitations of LLM technology (summarized in the note that started this thread) cannot be overcome by systems that just add various interfaces to the LLMs.  But applications that combine LLMs with other technology (from AI, computer science, or many kinds of applications) may support a much wider range of functionality.

Examples that we have discussed before include Wolfram's use of LLMs to support an English-like front end to their powerful Mathematica system.  That combination does everything that anyone has done with Mathematica, and the English-like front end gives users a simpler and friendlier interface.  Many other companies are adding such front ends to their existing systems, with varying degrees of success.

Our VivoMind system in 2010 included technology that was very powerful and ACCURATE for applications that cannot be done with LLMs even today.  See https://jfsowa.com/talks/cogmem.pdf 

Our new Permion.ai system combines a newer version of what VivoMind could do with LLMs to support the interface.  Arun Majumdar and I have discussed these issues in talks that we gave in the past year or so.

I believe that is the wave of the future:  use LLMs as one component of an AI system that uses other kinds of technology to implement functionality that LLMs, by themselves, cannot support.
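
As a schematic of that division of labour, the sketch below reduces both components to placeholder functions; no particular LLM, product, or API is implied, and both function bodies are hypothetical stubs:

# Schematic of a hybrid system: an LLM translates an English request into a
# formal expression, and a trusted deterministic engine actually evaluates it.
# Both functions below are hypothetical placeholders, not real APIs.

def llm_translate(english_request: str) -> str:
    """Placeholder: ask an LLM to turn English into a formal expression."""
    raise NotImplementedError("connect an LLM of your choice here")

def symbolic_solve(expression: str) -> str:
    """Placeholder: evaluate the expression with a deterministic back end
    (computer algebra, theorem prover, database query engine, etc.)."""
    raise NotImplementedError("connect the trusted back end here")

def answer(english_request: str) -> str:
    expression = llm_translate(english_request)   # LLM: natural language in, notation out
    result = symbolic_solve(expression)           # engine: does the actual reasoning
    return expression + " = " + result            # the answer is traceable to the engine

# Intended use (requires real implementations of the two placeholders):
# print(answer("What is the integral of x squared from 0 to 1?"))

The LLM's only job is translation into a notation that the deterministic component can check and evaluate, which is the division of labour described above.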

Answer to your question:   The features you're asking for in your note would be very easy to implement -- just add an on/off button for features you don't want.  That does not require new technology.  It just requires somebody to add that button.

John
 
-------------------------------------------------------------------
From: "Ravi Sharma" <drravi...@gmail.com>