Has AI Actually Improved Software Quality in Production Systems?


Yaowen Zhang

Feb 20, 2026, 9:19:22 AM (5 days ago) Feb 20
to software-d...@googlegroups.com

With the rapid rise of AI coding assistants, there has been a lot of discussion about whether AI will replace software engineers. But I think a more fundamental question is worth asking:


Has AI actually made software better?


Specifically, in real-world production systems, have we observed measurable improvements in:

  • Software quality (fewer bugs, better reliability)?

  • Performance (faster execution, lower latency)?

  • Memory efficiency (lower memory footprint, fewer allocations)?

  • Overall user experience?


AI has clearly made it much faster to generate code. But software engineering has never been primarily constrained by typing speed. The real challenges have always been around system design, managing complexity, defining correct abstractions, and making sound trade-offs.


In your experience, has AI led to objectively better systems along these dimensions? Or has it mainly accelerated code production without fundamentally improving the underlying quality characteristics?


I’m particularly interested in observations from production environments rather than small demos or prototypes.

J.R. Hill

Feb 20, 2026, 12:45:09 PM (5 days ago) Feb 20
to software-design-book
Absolutely not, and I don't think the technology is capable of it. (I'm not anti-AI; I do use it professionally.)

High-quality code is not statistically probable, and LLMs are statistical machines that produce statistically probable code. The odds that they produce stellar code are about the same as the odds of a hallucination. (Although, as Baldur Bjarnason has noted, to an LLM everything is a hallucination.)

I have seen it work at a junior level at best, and just like a junior engineer, it requires a lot of guidance and correction. The benefit is that it's cheap, but unless you're a beginner, you won't get results that surpass your own ability to specify. As an engineer commanding LLMs or swarms or Ralph loops or whatever, you end up with an experience much like a software manager's: the ability to direct production and get results, but less of an intuitive feel for the software itself and the value of its abstractions.

LLM-based code reviews are worse. They're not without value, but I think the best products advertise 65% success rates. I'm not a math genius, but a 35% failure rate would be grounds for firing if it cost a salary to keep around. It doesn't, though, and that's the rub.

High-profile examples of AI-produced code being lower quality keep popping up: the deterioration of Windows 11, AWS outages, a number of CVEs. I don't know if this is the right audience for getting data points on AI, though. Maybe check out pivot-to-ai.com.

LLMs are about quantity constrained by specification, and even then it's a messy process. That can still be valuable, depending on the goals. That's just software engineering. Not much has changed about the fundamentals of software since Brooks' "No Silver Bullet": the bottleneck to quality is not iteration speed.

-Justin

Yaowen Zhang

Feb 21, 2026, 8:44:08 AM (4 days ago) Feb 21
to J.R. Hill, software-design-book

I fully agree. I recently revisited No Silver Bullet and found it still highly applicable in the AI era. Fred Brooks observed that “no single development, in either technology or management technique, … promises even one order-of-magnitude improvement within a decade in productivity, in reliability, in simplicity.”

In my view, AI does not change the core point: meaningful gains will come from a portfolio of improvements rather than a single silver bullet.




On Sat, Feb 21, 2026 at 01:45, J.R. Hill <jrhi...@gmail.com> wrote:
--
You received this message because you are subscribed to the Google Groups "software-design-book" group.
To unsubscribe from this group and stop receiving emails from it, send an email to software-design-...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/software-design-book/f1f585f9-89f1-4eb2-ab77-dccc48ea7eaan%40googlegroups.com.

Joshua Miller

Feb 22, 2026, 9:26:02 PM (3 days ago) Feb 22
to software-design-book
Ken

The DORA research program studies the capabilities that software teams must have in order to drive software delivery, and recently it has been looking into the effects of AI adoption in the workplace. It's based on a survey of five thousand professionals. You can view the most recent report here:


I think it gives a good answer to your question about software quality. The report states:

"[...] a majority (59%) of survey respondents also observe that AI has positively impacted their code quality. 31% perceive this increase to be only “slight” and another 30% observe neither positive nor negative impacts. However, just 10% of respondents perceive any negative impacts on their code quality as a result of AI use."

They also find that the effect on code quality is amplified when combined with AI access to internal data.

Josh

Peter Ludemann

Feb 23, 2026, 2:47:25 AM (3 days ago) Feb 23
to software-design-book
On Sunday, February 22, 2026 at 6:26:02 PM UTC-8 Joshua Miller wrote:

"[...] a majority (59%) of survey respondents also observe that AI has positively impacted their code quality. [...]"

That study also says that ~60% of respondents are in the cluster that delivers "high quality" software. I suppose this is self-reported, because I very much doubt that even a quarter of the software I've seen over the years is "high quality" or even "medium quality". (Are my standards too high? I don't think so, but then I have worked on high-reliability systems and on core systems that a lot of other software depended on.)

One thing I've found "AI" good for is summarizing discussions about products, which is very helpful given the dreadful state of much software documentation, especially regarding behavior on edge cases. But for more specific questions, the "3rd answer on StackOverflow" still seems to be better.

Jonathan Camenisch

Feb 23, 2026, 8:50:27 AM (2 days ago) Feb 23
to Yaowen Zhang, software-design-book
I have been actively looking for ways to use AI to reduce entropy/improve quality, and I think it can be used for that end even today in some particular ways:

1. It can find flaws in existing code. With all the caveats about hallucinations, etc., it can help a skilled engineer find flaws of different kinds. One source of data on how effective this is: the massive increase in vulnerabilities being found in open source software libraries in the last few months.

2. It can enable tool building that would not have been feasible before. Sometimes a quickly-conjured test harness or troubleshooting tool, regardless of its internal code quality, can improve the quality of an engineer's insight, and ultimately the production code they ship.

2b. It can enable an engineer to implement the "right" architectural approach in certain situations where the up-front cost might have been prohibitive before.

2c. It can enable exploration of multiple approaches in situations where time would not have allowed this before.

There are probably other ways too, but these are ones I've found valuable so far.
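To illustrate point 2 concretely: the kind of throwaway differential harness I have in mind can be a few dozen lines. Everything below is made up for the example; the two `normalize` functions stand in for a legacy code path and a candidate rewrite, and the harness just replays recorded inputs through both and reports divergences.

```python
# Throwaway differential harness: run the same inputs through two
# implementations and report any divergence. Disposable by design;
# its own code quality matters less than the insight it produces.

def legacy_normalize(s):          # stand-in for the existing code path
    return s.strip().lower()

def candidate_normalize(s):       # stand-in for the refactored code path
    return " ".join(s.split()).lower()

def diff_implementations(cases, old, new):
    """Return the cases where the two implementations disagree,
    as (input, old_output, new_output) tuples."""
    return [(c, old(c), new(c)) for c in cases if old(c) != new(c)]

if __name__ == "__main__":
    recorded_inputs = ["  Hello World  ", "FOO", "a  b"]
    for case, got_old, got_new in diff_implementations(
            recorded_inputs, legacy_normalize, candidate_normalize):
        print(f"divergence on {case!r}: {got_old!r} vs {got_new!r}")
```

On these inputs the harness flags `"a  b"`, where the legacy path preserves the internal double space and the candidate collapses it: exactly the kind of behavioral edge a quick tool surfaces before production code ships.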


Joshua Miller

Feb 23, 2026, 9:11:31 PM (2 days ago) Feb 23
to software-design-book
Peter,

The clusters near the beginning of the report aren't based on an assessment of code quality. Software delivery throughput and software delivery instability are based on calculations, product performance is based on end-user experience, and team performance is perceived team effectiveness. DORA doesn't assess what counts as high-quality code, at least from what I can find.

Also, this brings up a good point about who was included in the study. It's important to understand who the respondents are and what they're working on: the median age of the service they work on is 8 years, the majority work on high-reliability/critical software, the majority work at a company with more than 1,000 employees (22% of those with more than 10,000), the median experience in the role is 6 years, and the median age is 41.

On page 38, DORA builds a model to find which effects change when AI usage increases, and it found an 89% probability that a positive effect on code quality exists. That might be important context for the quote I mentioned earlier, which I should have included, because it shows correlation rather than raw numbers from the survey. Also, interestingly, the model shows a heavy negative effect on software delivery stability.

Thanks,
Josh

Paul Becker

Feb 24, 2026, 9:12:44 AM (yesterday) Feb 24
to software-d...@googlegroups.com

And then there's the ethical question: okay, if this thing is actually starting to become intelligent, and it's solving intellectual problems that used to require humans, then at what point does controlled use of AI constitute slavery? What is our obligation to provide training data that exposes AIs to the fullness of reality? To provide embodiment? How will future machine intelligences look back on our treatment of their ancestors?

The old joke about computers being rocks that we tricked into thinking seems a lot more prophetic and a lot less funny, these days.

paul

Yaowen Zhang

9:44 AM (9 hours ago) 9:44 AM
to Paul Becker, software-d...@googlegroups.com
I just asked Gemini to do a deep-research pass for me; below is the summary of the research:

1. The Productivity Paradox: Perception vs. Reality

While nearly 91% of developers use AI assistants, the promised efficiency gains have largely plateaued around 10%.

  • The Chaperoning Effect: Randomized controlled trials show that while developers believe AI makes them 20% faster, they are actually 19% slower on complex tasks. This is due to the time required to review and debug "almost-right" AI code.

  • Onboarding Gains: One notable success is in onboarding, where the time for new hires to reach their 10th pull request has been cut in half.

2. Reliability and Security Concerns

The volume of code has increased, but its integrity has declined.

  • Issue Multiplier: AI-assisted pull requests contain 1.7 times more issues than human-authored ones, particularly regarding logic and correctness.

  • Security Vulnerabilities: Over 51% of AI-generated programs contain at least one security vulnerability, and credential exposure occurs nearly twice as often as in manual coding.

3. Architectural Erosion and Technical Debt

The ease of generating code has led to "vibe coding," which prioritizes local fixes over global system health.

  • Collapsing Maintenance: Refactoring activity has collapsed by 60%, while code duplication has increased by 48% as AI tools generate similar solutions without recognizing opportunities for abstraction.

  • The "AI Slop" Crisis: Codebases are becoming "semantically hollow," meaning they function but no longer accurately reflect complex business logic, making them harder for humans to maintain.
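As a rough illustration of how a duplication trend like the one above could even be measured (my own sketch, not anything from the research summary), one crude approach is to hash sliding windows of normalized source lines and flag any window that occurs more than once; real clone detectors are token-based and far more sophisticated.

```python
# Minimal copy-paste detector: hash sliding windows of normalized
# lines and report windows appearing in more than one place.
from collections import defaultdict

def find_duplicate_windows(lines, window=3):
    """Map each repeated window of `window` normalized lines to the
    0-based line numbers where it starts."""
    seen = defaultdict(list)
    normalized = [ln.strip() for ln in lines]
    for i in range(len(normalized) - window + 1):
        key = tuple(normalized[i:i + window])
        if any(key):                      # skip all-blank windows
            seen[key].append(i)
    return {k: v for k, v in seen.items() if len(v) > 1}

code = """\
total = 0
for x in items:
    total += x
print(total)
total = 0
for x in items:
    total += x
""".splitlines()

dupes = find_duplicate_windows(code)   # the summing loop appears twice
```

Tracking the size of `dupes` over a repository's history is the kind of signal that would show duplication rising as generated code accumulates.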

4. System Performance and User Experience

The impact on system performance is split between low-level gains and application-level bloat.

  • Compiler Optimization Success: Frameworks like Google’s MLGO have used machine learning to achieve a 3% to 7% reduction in binary size and slight improvements in datacenter queries per second (QPS).

  • Incident Management: AI-powered observability has significantly improved operational resilience, reducing Mean Time to Resolution (MTTR) by 40% to 70% and customer-visible outages by 30% to 50%.

5. The Shifting Role of the Engineer

By 2026, the developer's role is shifting from manual implementer to "architectural orchestrator". Successful teams have moved away from "mega-prompts" toward strategic decomposition, treating AI-generated code as untrusted and funneling it through rigorous automated quality gates. The consensus among engineering leaders is that organizations must "vibe, then verify" to prevent AI-driven velocity from destroying long-term system stability.
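For the "automated quality gates" point, here is one minimal, hypothetical example of such a gate in Python: a crude check that rejects a patch whose added lines appear to hard-code a credential. The patterns and function names are illustrative only; a production gate would layer dedicated secret scanners, linters, tests, and human review on top.

```python
# Illustrative quality gate: flag added lines that look like
# hard-coded credentials before an AI-generated patch can merge.
import re

SECRET_PATTERNS = [
    # assignment of a quoted literal to a credential-ish name
    re.compile(r"(?i)\b(api[_-]?key|secret|password|token)\b\s*=\s*['\"][^'\"]+['\"]"),
    # the characteristic shape of an AWS access key id
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
]

def gate_patch(added_lines):
    """Return the added lines that look like hard-coded credentials."""
    return [ln for ln in added_lines
            if any(p.search(ln) for p in SECRET_PATTERNS)]

patch = [
    'API_KEY = "sk-live-abc123"',   # should be flagged
    "retries = 3",                  # harmless config
]
violations = gate_patch(patch)      # non-empty: the gate fails the merge
```

The design point is that the gate treats every patch as untrusted regardless of author, which is exactly the "vibe, then verify" posture described above.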


Justin Hill

12:24 PM (6 hours ago) 12:24 PM
to Yaowen Zhang, Paul Becker, software-d...@googlegroups.com
Can you clarify what you mean by research?

Is this a survey of academic studies? If so I'd be interested in seeing the actual studies more than a summary. Without it, it's hard to know what information here is true and what is just a statistically plausible answer that could be true.

David Hess

12:26 PM (6 hours ago) 12:26 PM
to software-d...@googlegroups.com
This rings true from what I’ve experienced and observed.

This is a useful summary, but since an LLM was involved, could you include non-LLM references to dispel concerns that the details are hallucinated?

Dave
