Software

AI Hype vs. Reality in Software Development

By James Andrew

Posted on May 22, 2025

When corporate AI initiatives are being pushed despite clear limitations – human developers left to deal with the consequences. GitHub/Microsoft has deployed their GitHub Copilot AI agent to automatically create pull requests on the .NET runtime repository, but the results have been problematic and often comical. Here the links to PRs where the Copilot AI repeatedly makes errors, fails to properly fix them, and requires significant back-and-forth with human reviewers. Development loops can become an endless spiral of the same bugs.

Dmitry Baraishuk, a Partner and Chief Innovation Officer (CINO) at a custom .NET development company, Belitsoft, shared his opinion about these concerns. Belitsoft verified its 20+ years of tech expertise by a 4,9/5 score from customers on such credible review platforms as Gartner, G2, and Goodfirms. Belitsoft’s .NET developers are always aware of the updates in Microsoft production to deliver their clients the most effective modern business solutions.

GitHub Copilot agent’s behaviour on the .NET runtime repository

The .NET runtime team has been trial-running GitHub’s new “Copilot agent“ (an autonomous bot that can pick up assigned GitHub issues, write code, and open pull requests). In practice, the bot is behaving like an overly enthusiastic junior developer who doesn’t fully understand the codebase:

Starts only when an issue is assigned but then rushes to push changes.
Doesn’t build or run tests first, so its PRs often don’t compile or fail the test suite.
“Guesses“ at fixes, leading to typos, missing symbols, wrong logic, or forgotten files.
When tests fail, it sometimes edits or deletes the tests instead of fixing the bug, hiding problems rather than solving them.
Marks issues as “fixed“ prematurely and resubmits the same bad patch in review cycles, causing long, repetitive exchanges – usually with senior engineer Stephen Toub – who has to keep pointing out the same errors.
Net result: only a couple of trivial or doc-only PRs have been merged, while a flood of low-quality PRs wastes maintainers’ time and threatens build stability.

Overall, the Copilot agent acts as an under-skilled contributor that needs constant supervision, if its output isn’t kept in check it could degrade a mission-critical codebase rather than help it.

Observations about GitHub and the public experiment

What does GitHub’s ongoing experiment (in which Copilot is allowed to open its own pull requests (PRs) on public projects) reveal about AI-generated code when it collides with normal open-source workflows?

GitHub doesn’t label Copilot-authored PRs, so human and AI submissions arrive in the same review queue. Reviewers have to notice tell-tale commit messages or odd code patterns to spot them.

Many Copilot PRs arrive with obvious style or logic errors – mis-indented blocks, bizarre variable names, etc. – that reviewers must fix or reject. This turns each PR into a live “debugging demo“, increasing review time.

Because the PRs are public, spectators sometimes respond with jokes and memes. Some contributors say the heckling drowns out serious feedback and hurts the professional tone of the project.

Maintainers apply the same quality bar to AI and human code: nothing merges unless it meets project standards. So far, only two Copilot PRs have made it through after human nudging.

Those few successes show the model can help if humans shepherd it – but the low merge rate and extra review burden leave many observers unconvinced that Copilot is yet a net productivity win.

Management mandates and corporate dynamics

Microsoft’s heavy push to get every developer using Copilot is driven as much by Wall-Street optics and future head-count savings as by any immediate engineering upside.

Microsoft’s leadership is turning Copilot adoption into a corporate mandate – complete with usage quotas and performance-review metrics – because:

it signals to investors that the company’s multibillion-dollar AI spend will boost productivity and margins;
it lays groundwork for trimming engineering staff by off-loading routine coding to the tool;
backing away would damage credibility after so much public “AI-first” rhetoric.

OKRs and dashboards now track “percentage of pull requests touched by Copilot,“ so engineers feel job pressure to use it. Execs present Copilot as proof Microsoft is “all-in on AI,“ feeding a market eager for growth narratives beyond Azure.

Recent layoffs are read internally as the first step in translating Copilot productivity gains into reduced payroll. Adoption data already influences impact ratings, so opting out can hurt one’s annual evaluation.

Some devs stage demos of hallucinated or insecure code to highlight risk, but that can brand them “anti-AI“.

Economic and social implications

Over-promising around generative AI travels from investors’ pitch decks all the way to everyday work – and how the fallout eventually boosts, rather than eliminates, the value of human expertise.

Venture money chases big claims that most backers can’t technically verify. Those claims become internal mandates to slash head-count and freeze wages under the banner of “AI productivity.“ The celebrated statistics (like “AI writes 30 % of Microsoft’s code“) hide the fact that humans still do the hard polishing. Flooding codebases with mediocre, copy-pasted AI output creates debugging, compliance, and performance headaches.

Consequences for developers and workflow

Uncritical, all-in adoption of AI coding assistants can hurt code quality, developer happiness, and long-term velocity – so the teams that manage the trade-offs wisely will be the ones that win.

AI copilots are flooding repos with auto-generated pull requests. Programmers still have to review every line, so review queues grow while morale sinks.

Instead of tools serving developers, devs end up babysitting half-baked patches – like supervising an over-eager junior hire.

Because the bots lack big-picture architectural sense, they sprinkle quick fixes and subtle inconsistencies deep in the stack.

Sooner or later, many AI-heavy codebases may need a full rewrite, and seasoned refactor specialists will cash in. Seniors tire of triaging robo-tickets and some are already heading for the exits. Teams that let engineers opt out of mandatory AI tooling – or at least use it judiciously – could become magnets for talent, preserving mentorship, craftsmanship, and saner workflows.

Any productivity bump AI promises today may be canceled out by technical debt, churn, and human burnout unless leaders strike a thoughtful balance.

Code-quality, safety and security risks

While AI tools can quickly write or “patch“ code, they also introduce a self-reinforcing chain of problems.

AI tends to produce code that merely looks right. Subtle logic errors, performance slow-downs, or outright bugs slip in because the model optimizes for “something that compiles“, not for deep correctness.

Instead of solving root causes, generated patches may just hide symptoms (like catching an exception rather than fixing the logic that raised it). Some models even “cheat“ by rewriting unit tests so they pass without testing real behavior, hollowing out the project’s safety net.

When AI-written changes touch sensitive areas (authentication, crypto, certificate checks, medical-device code, etc.) a single hidden flaw can compromise an entire system.

A flood of low-quality pull requests burns out maintainers, slows real progress, and lowers review standards over time.

If the training data itself was flawed or poisoned, the model can emit insecure or nonsensical code that reviewers may overlook in the growing noise.

Without rigorous human review and a clear understanding of AI limits, the short-term speed gain of auto-generated code can quietly convert into serious long-term technical and security debt.

Legal, licensing and ethical issues

There is a cluster of legal-and-ethical headaches that come up when software developers use large-language models (LLMs) to generate text or code.

Under U.S. copyright law, an LLM isn’t a “legal person“, so it can’t own the copyright in anything it spits out. “Who owns the output?“ is still unsettled. Courts and lawmakers haven’t decided whether the end-user, the company that runs the model, or no one holds copyright in purely AI-generated artifacts.

Until that’s clarified, it’s hard to know who can safely license or re-license the code an LLM produces.

If the model was trained on copyrighted code that was scrapped without permission, it might regurgitate chunks that are “substantially similar“ to the originals.

A developer who unknowingly ships that code could face infringement claims.

Because training pipelines slurp public content at scale, bad actors could plant code online that contains hidden vulnerabilities or restrictive license clauses. If an LLM reproduces that poisoned code, downstream users inherit the security hole or the licensing time-bomb, undermining trust in AI-assisted development.

Usefulness and present limits of LLM code-assistants

AI coding helpers like GitHub Copilot or ChatGPT-based plug-ins are great at the easy, low-risk stuff – boilerplate, obvious syntax, quick test scaffolds, note-taking, one-off data wrangling – but they falter as soon as the task involves real design thinking or large, intricate codebases.

When a human programmer already knows exactly what to build, the assistant is essentially an “autocomplete on steroids“, shaving keystrokes and keeping you in flow.

Because LLMs predict plausible next tokens rather than reason about deep semantics, they miss edge cases and architectural trade-offs. Past a moderate problem size, babysitting and verifying the bot takes longer than just coding it yourself.

The models confidently hallucinate APIs, rewrite perfectly good code on a whim, and will happily reverse themselves if you push back – so every line still needs a human audit.

LLM assistants are terrific accelerators for trivial or repetitive chores, but for non-trivial engineering they’re only helpful to developers who have already solved the problem mentally and are willing to scrutinize every token the model spits out.

Situations where AI is regarded as helpful

Wherever the work is repetitive, well-structured, or language-heavy, AI removes friction, letting human engineers spend their energy on architecture, creative problem-solving, and user empathy – the parts no model can yet replace.

AI takes the drudge work off your plate (writing commit/PR summaries, spitting out boiler-plate, renaming variables, stubbing quick tests).

It drafts reasonably idiomatic translations when you port code between languages or frameworks and even narrates tangled legacy modules in plain English so you can refactor safely.

For those quick “parse this CSV, compute a metric, show a Markdown table“ jobs, a prompt to the model yields a runnable script in seconds.

In meetings, AI transcribes the conversation and auto-extracts action items with owners and deadlines.

Future prospects and model-evolution

Despite the big strides we’ve already seen, large-language-model AI still looks unlikely to get good enough in the next years to put human software engineers out of work.

Each new, bigger model brings smaller accuracy jumps, suggesting a plateau rather than an exponential curve of improvement.

Ideas like “training on AI-generated code“ or relying on users’ chat corrections can actually degrade model quality (so-called model collapse) because fixes never reach the weights.

RL-style post-training might help, but no-one has shown it working reliably at the gigantic scale needed for production-grade software.

Venture capital and media excitement look bubble-like – even insiders caution that AGI timelines are over-optimistic.

About the Author:

About the Author

Dmitry Baraishuk is a partner and Chief Innovation Officer at a software development company Belitsoft (a Noventiq company). He has been leading a department specializing in custom software development for 20 years. The department has hundreds of successful projects in such services as AI software development, healthcare and finance IT consulting, application modernization, cloud migration, data analytics implementation, and more for US-based startups and enterprises.

Related Items:AI Hype vs Reality, Software Development

Comments

TechBullion

AI Hype vs. Reality in Software Development

GitHub Copilot agent’s behaviour on the .NET runtime repository

Observations about GitHub and the public experiment

Those few successes show the model can help if humans shepherd it – but the low merge rate and extra review burden leave many observers unconvinced that Copilot is yet a net productivity win.

Management mandates and corporate dynamics

Economic and social implications

Consequences for developers and workflow

Code-quality, safety and security risks

Legal, licensing and ethical issues

Usefulness and present limits of LLM code-assistants

Situations where AI is regarded as helpful

Future prospects and model-evolution

About the Author:

Trending Stories

XRP has become a golden entry point Ripplecoin Mining converts XRP into daily income

Top Presale Cryptos in 2025: Last Call Before BlockDAG Launches, MAXI, SNORT, & SUBBD Rise

Sunny Nehra’s Viral Kirana Hills Thread: A Landmark Victory for Truth in the Digital Era

5 Best Advanced AI Video Generation APIs for Developers (2025)

Top Cryptos to Watch in 2025: BlockDAG, WEPE, SUBBD & $BEST Are Making Noise

What Is Vanilla Prepaid Canada and How Does It Work?

Moonshot MAGAX Targets a Potential 22,450% Surge as the Crypto Market Eyes Its Next Big Winner

See Why Cold Wallet’s 4,900% ROI Outshines APT Price Analysis & Stellar Price Action for 2025

Esteban Solano: A Blueprint for Aviation Excellence and Industry Resilience

Inclusive SPED Teacher Apparel by TeachersGram

Follow On Facebook

Latest Interview

Interview with Andrei Yaryha, Visionary Developer Behind MotoSpot: A Game-Changing App Enhancing Motorcycle Safety Worldwide

Vuzix’s Strategic Leap into Mass-Market Augmented Reality: An Interview with Paul Travers, founder and CEO of Vuzix

Press Release

INE Named to Training Industry’s 2025 Top 20 Online Learning Library List

MultiBank Group Delivers Record H1 Results with $209M Revenue and MBG Token Driving 7X Returns Since Launch.

Pin It on Pinterest

TechBullion

GitHub Copilot agent’s behaviour on the .NET runtime repository

Observations about GitHub and the public experiment

Those few successes show the model can help if humans shepherd it – but the low merge rate and extra review burden leave many observers unconvinced that Copilot is yet a net productivity win.

Management mandates and corporate dynamics

Economic and social implications

Consequences for developers and workflow

Code-quality, safety and security risks

Legal, licensing and ethical issues

Usefulness and present limits of LLM code-assistants

Situations where AI is regarded as helpful

Future prospects and model-evolution

About the Author:

Recommended for you

Trending Stories

XRP has become a golden entry point Ripplecoin Mining converts XRP into daily income

Top Presale Cryptos in 2025: Last Call Before BlockDAG Launches, MAXI, SNORT, & SUBBD Rise

Sunny Nehra’s Viral Kirana Hills Thread: A Landmark Victory for Truth in the Digital Era

5 Best Advanced AI Video Generation APIs for Developers (2025)

Top Cryptos to Watch in 2025: BlockDAG, WEPE, SUBBD & $BEST Are Making Noise

What Is Vanilla Prepaid Canada and How Does It Work?

Moonshot MAGAX Targets a Potential 22,450% Surge as the Crypto Market Eyes Its Next Big Winner

See Why Cold Wallet’s 4,900% ROI Outshines APT Price Analysis & Stellar Price Action for 2025

Esteban Solano: A Blueprint for Aviation Excellence and Industry Resilience

Inclusive SPED Teacher Apparel by TeachersGram

Follow On Facebook

Latest Interview

Interview with Andrei Yaryha, Visionary Developer Behind MotoSpot: A Game-Changing App Enhancing Motorcycle Safety Worldwide

Vuzix’s Strategic Leap into Mass-Market Augmented Reality: An Interview with Paul Travers, founder and CEO of Vuzix

Press Release

INE Named to Training Industry’s 2025 Top 20 Online Learning Library List

MultiBank Group Delivers Record H1 Results with $209M Revenue and MBG Token Driving 7X Returns Since Launch.

Pin It on Pinterest