Some important things are getting lost in the rush to AI coding. LLMs don't have a coherent, consistent view of large-scale structure, and that is essential for large systems especially. They don't have good reasoning ability; I've had them flail away like any intern. And most of all, correctness and freedom from bugs and errors cannot be achieved by testing alone. In addition, good specifications are more important than ever for AI-written software, but that step seems to be mostly ignored.
Pydantic and Hypothesis are aids to finding corner cases. Data validation as offered by Pydantic makes sense to me. So do the automatic generation of test cases and the simplification of failure reports that Hypothesis offers.
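To make the Hypothesis part concrete, here is a toy sketch of the two things it automates: generating random test cases and simplifying ("shrinking") a failing case before reporting it. This is not Hypothesis's API, only a hand-rolled standard-library illustration of the idea. Every name in it (`dedupe`, `no_duplicates_property`, `find_counterexample`, `shrink`) is hypothetical, and the deliberately buggy `dedupe` exists only to have a corner case to find.

```python
import random


def dedupe(xs):
    """Deliberately buggy: meant to drop all duplicates, but it only
    drops duplicates that sit next to each other."""
    out = []
    for x in xs:
        if not out or out[-1] != x:
            out.append(x)
    return out


def no_duplicates_property(xs):
    """The property under test: the result contains no repeated values."""
    result = dedupe(xs)
    return len(set(result)) == len(result)


def find_counterexample(seed=0, trials=1000):
    """Generate random small lists until one violates the property."""
    rng = random.Random(seed)  # seeded so a failure can be reproduced
    for _ in range(trials):
        xs = [rng.randint(0, 5) for _ in range(rng.randint(0, 10))]
        if not no_duplicates_property(xs):
            return xs
    return None


def shrink(xs, fails):
    """Greedily delete single elements while the input still fails,
    so the reported case is as short as this strategy can make it."""
    current = list(xs)
    progress = True
    while progress:
        progress = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if fails(candidate):
                current = candidate
                progress = True
                break
    return current


if __name__ == "__main__":
    raw = find_counterexample()
    if raw is None:
        print("no counterexample found")
    else:
        minimal = shrink(raw, lambda xs: not no_duplicates_property(xs))
        print("raw failing input:   ", raw)
        print("shrunk failing input:", minimal)
```

The raw failing input is whatever random list happens to trip the bug. After shrinking, it is always a three-element list of the form [a, b, a], which points straight at the non-adjacent-duplicate corner case. Real Hypothesis does considerably more (it also shrinks the values themselves and remembers past failures), so treat this only as an illustration of why the simplified reports are useful.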
Thanks for posting about Eurisko. It seems to have passed me by at the time. Fascinating!
Just my two cents on all this "agentic programming fashionware": I don't want to delegate big parts of my understanding of complex systems, including software, to non-deterministic systems that "hallucinate" in undetectable ways. I try to confine the stuff I don't understand about the software artifacts I build to small parts, where I ask the AI (or Apparent Intelligence, as I like to call it) specific questions and make small commits.
I keep my token usage small and use anonymous AI systems like duck.ai or Lumo that don't use my data for training. Amid all the AI rush, that seems to me a minimalist approach with little compromise.
I am doubtful and wary of grandiloquent visions of a single convergent future for diverse people and worldviews, particularly when they come from tech bros. Unfortunately, "agentic programming" seems to be one such vision.
Cheers,
Offray
--
So far I have had a 100% failure rate in trying to get Claude or ChatGPT to help solve small but tricky problems. They have burned up a large number of free tokens and made many suggestions but haven't solved the problems.
They also don't seem to have a large enough context window to maintain the overall picture of long threads, so I don't see how they could create a decent large system design. Sufficiently complete specifications for just one component would probably overflow the context window.
Given that we know that good quality and reliable operation cannot be achieved by testing alone, I don't see how widespread adoption is going to be anything but a long-term software nightmare. All these systems may *seem* to work, but watch out when they hit edge cases...