Some remarks on doing AI research with the help of AI!


dr.mt...@gmail.com

Jun 23, 2025, 5:37:11 AM
to Shen
The experiment of working with ChatGPT was immensely stimulating and enjoyable.  But there is an important caveat.  ChatGPT cannot be trusted.  It is simply not reliable enough.

It is good at making suggestions, and it is unfailingly enthusiastic and supportive.  But some of the stuff it offers up is not right.  In particular THORN, at one point, was producing malformed proofs, which ChatGPT lauded as correct.  In fact THORN was solving the problems but not actually rendering the proofs properly: bits of failed search were interpolated with parts of the genuine proof.  I noticed the problem myself and fixed it myself; ChatGPT thought the faulty proof was fine.

It also offered to give an English rendering of the correct proof, which turned out to be a rendering not of that proof but of another proof it held in memory.

In short, ChatGPT cannot be trusted.  You have to, ironically, treat it just like a fallible human being when involved in proof or programming.  Don't take its assurances for granted, though it is, undoubtedly, very useful and a triumph of engineering.

Mark

Joel McCracken

Jun 23, 2025, 12:33:19 PM
to qil...@googlegroups.com
The quality of AI responses reminds me of Stack Overflow: it's worth looking at, but really not trustworthy. I was just using Cursor yesterday to write some bash, and it made a very basic mistake in all its generated code that meant variable updates were being lost in subshells (in bash, assignments made inside a pipeline or command substitution happen in a child process and never reach the parent shell). I do find it useful, but TBH I've come to appreciate that using it effectively is its own skill that requires development and experimentation.

In my head, I think of it as a kind of codification of intuition; I bet that if we combine these LLMs with plain old AI, we will see some really interesting results. So like A*, but where the fitness function for the next option to test would rely on LLM token probabilities given the prior inputs.
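
Roughly the shape I have in mind, as a minimal sketch - lm_logprob, expand and is_goal are placeholder names, nothing here is a real model API, and without an admissible heuristic it is really best-first search rather than true A*:

    import heapq
    import itertools

    def lm_logprob(context, token):
        # Stand-in scorer: a real system would ask a language model for
        # log P(token | context). This toy version just prefers shorter
        # tokens so the sketch runs without any model behind it.
        return -0.1 * len(token)

    def llm_guided_search(start, expand, is_goal, max_steps=10000):
        # Best-first search over partial sequences, ordered by cumulative
        # log-probability. heapq is a min-heap, so scores are negated;
        # the counter breaks ties without comparing sequences.
        counter = itertools.count()
        frontier = [(0.0, next(counter), [start])]
        for _ in range(max_steps):
            if not frontier:
                return None
            neg_score, _, seq = heapq.heappop(frontier)
            if is_goal(seq):
                return seq
            for token in expand(seq):
                heapq.heappush(frontier,
                               (neg_score - lm_logprob(seq, token),
                                next(counter),
                                seq + [token]))
        return None

With real token log-probabilities plugged into lm_logprob, the frontier would expand in order of model confidence, which is the "intuition as fitness function" idea above.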

I've thought about this, but I don't think it will be easy to get there, just because there isn't a straightforward way to combine these technologies; my suspicion is that there will be at least several iterations of retraining before we see something that can "think".


nha...@gmail.com

Jul 2, 2025, 2:57:03 PM
to Shen
I'm using ChatGPT at the moment to generate disposable Elisp code to automate certain types of tedious code refactoring. I've found it generates stuff that mostly works, with a few edge cases that don't. However, trying to coach it into producing something correct degrades after a certain point: it just starts producing worse code and getting increasingly confused. Maybe it's a context window limitation.

Bruno Deferrari

Jul 2, 2025, 3:00:25 PM
to qil...@googlegroups.com
The back and forth really hurts performance. You can try editing the original prompt and adding more information there instead.

--
BD

Bruno Deferrari

Jul 2, 2025, 3:02:48 PM
to qil...@googlegroups.com

nha...@gmail.com

Jul 3, 2025, 1:51:16 PM
to Shen
Thank you, those links are very helpful.

jono338

Jul 5, 2025, 4:42:33 AM
to Shen
Just to add: if you've been using the free version of ChatGPT, I'd highly recommend trying the full (paid) version, at least for one month - it's only $20. It is WAY better than the free version. But you're right, it's a skill to use ChatGPT effectively in a technical conversation - it makes mistakes, and you have to be able to spot them. I'm starting to think the "knack" comes out of a substantial amount of work in a technical problem-solving area, like programming - after a while of working with ChatGPT, you develop a sensitivity for the errors that it slips in. Even the paid version can get into a death spiral.

The other factor about its "abilities" is that they seem to be proportional to the amount of code it has seen, which makes sense. It is very sharp with Haskell code, and sometimes it really *seems* like it understands the code, but it made a whole lot more errors on a more obscure language like Mercury, though it was still amazingly good.