AI-Generated Code Has More Bugs, Study Finds

According to TheRegister.com, a new report from AI code review platform CodeRabbit has analyzed 470 open source pull requests and found AI-generated code is buggier. The State of AI vs Human Code Generation report states that AI-authored pull requests contain an average of 10.83 issues, compared to 6.45 in human-written ones—that’s 1.7x more. The issues are also more severe, with 1.4x more critical and 1.7x more major issues. Specifically, AI code was 2.74x more likely to introduce XSS vulnerabilities and 1.82x more likely to implement insecure deserialization. The only area where AI outperformed was spelling, with human PRs having 1.76x more spelling errors.
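
To make those two vulnerability classes concrete, here is a minimal sketch, not taken from the report itself, of the patterns it names: rendering user input into HTML without escaping (reflected XSS) and loading untrusted bytes with pickle (insecure deserialization), each shown next to a safer alternative.

```python
# Illustrative sketch only; none of this code comes from the CodeRabbit report.
# It shows the two vulnerability classes the report calls out, each next to
# the safer alternative a reviewer would expect instead.

import json
import pickle
from html import escape


def greeting_unsafe(user_name: str) -> str:
    # XSS-prone: user input is dropped straight into markup, so a value
    # like "<script>alert(1)</script>" executes in the visitor's browser.
    return f"<p>Hello, {user_name}!</p>"


def greeting_safe(user_name: str) -> str:
    # Escaping first neutralizes any markup hidden in the input.
    return f"<p>Hello, {escape(user_name)}!</p>"


def load_settings_unsafe(blob: bytes) -> object:
    # Insecure deserialization: pickle can run attacker-controlled code
    # while reconstructing objects from untrusted bytes.
    return pickle.loads(blob)


def load_settings_safe(blob: bytes) -> object:
    # JSON only rebuilds plain data structures, never arbitrary objects.
    return json.loads(blob)


if __name__ == "__main__":
    hostile = "<script>alert(1)</script>"
    print(greeting_unsafe(hostile))  # markup survives intact
    print(greeting_safe(hostile))    # markup is escaped
```

Whether or not any given model produces these exact patterns, this is the general shape of issue the report says reviewers are catching far more often in AI-authored PRs.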

The Productivity Paradox

Here’s the thing: this report, and others like it, are starting to quantify a gut feeling many dev teams have had. AI tools like GitHub Copilot absolutely crank out code faster. But if that code comes with nearly twice the defect load and requires more intense review, are you really saving time? David Loker from CodeRabbit hit the nail on the head—these tools increase output but introduce “predictable, measurable weaknesses.” It’s a classic quantity versus quality trade-off. And it’s not just CodeRabbit; a Cortex report last month noted that while PRs per author went up 20%, incidents per PR rose by 23.5%. So you’re shipping more, but breaking more too. Is that progress?

Conflicting Studies, Confusing Picture

Now, before we write off the clankers entirely, it’s worth noting the research isn’t unanimous. An August 2025 paper from University of Naples researchers found AI code was generally simpler, though more prone to unused constructs. And a January 2025 study from Monash and Otago had GPT-4 passing more test cases than human devs, even if its code was more complex. So the jury is, frankly, still out. But the consistency of the warnings about security and logic flaws is hard to ignore. When you’re deploying code to a production environment, especially in critical industrial or infrastructure settings where reliability is non-negotiable, these findings are a huge red flag.

What This Means For Developers

Basically, the era of blind trust in AI-generated code is over—if it ever began. The metrics show that AI is a powerful assistant, not a replacement. It’s like having a super-fast junior developer who’s brilliant at syntax but has shaky judgment on architecture and security. That makes the review process more critical than ever. And let’s be real: if a July 2025 METR study found that “AI tooling slowed developers down,” we have to ask whether we’re optimizing for the right thing. Is it lines of code per hour, or is it stable, secure, maintainable features per week? The massive volume of CVEs, with Microsoft patching 1,139 in 2025 as noted by Trend Micro’s Dustin Childs, suggests we’re already in a fragile state. Microsoft itself warns about the security implications of Copilot Actions. Throwing buggier AI code into that mix is playing with fire.

The New Reality of Coding

So where does this leave us? I think it forces a maturity shift. Using AI to code isn’t a free lunch. It requires stricter guardrails, more sophisticated review tools (ironically, often AI-powered ones like CodeRabbit), and a developer mindset that shifts from author to auditor. The promise of AI doubling your output is a mirage if it also doubles your bug triage meetings. The data is a call to action: invest in better review processes, security linters, and testing frameworks. Because the code is coming faster, whether we’re ready or not. And as the vulnerability landscape shows, the consequences of getting it wrong are only getting bigger.
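
For what a stricter guardrail can look like in practice, here is a minimal pre-merge gate sketch. It assumes a Python project using Bandit as the security linter and pytest for tests, with a src/ and tests/ layout; those tool and path choices are my illustration, not anything the report prescribes.

```python
# Minimal pre-merge gate sketch: block the merge if the security linter
# or the test suite fails. Assumes Bandit and pytest are installed and
# that the repo uses a src/ + tests/ layout (illustrative assumptions).

import subprocess
import sys

CHECKS = [
    ["bandit", "-r", "src", "-q"],  # security linter over the source tree
    ["pytest", "tests", "-q"],      # full test suite
]


def main() -> int:
    for cmd in CHECKS:
        print(f"running: {' '.join(cmd)}")
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"gate failed on: {' '.join(cmd)}")
            return result.returncode
    print("all gates passed")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Wire something like this into CI so it runs on every pull request, human-written or not; the point is that the same automated bar applies no matter who, or what, wrote the diff.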
