Agent-based artificial intelligence systems, often called “agentic AI,” let software watch what is happening, decide what to do next, and carry out actions without a person steering every step. Large firms began piloting this idea in 2023 and 2024. By early 2025, it became clear that many pilots would not survive to full production. Gartner now predicts that more than 40 percent of all current agentic AI projects will be canceled or abandoned by the end of 2027.
Those cancellations sound severe, yet most analysts do not think the wider trend is reversing. Gartner still expects agentic systems to make at least 15 percent of routine work decisions by 2028. In 2024, they made almost none. The share of enterprise software that embeds agentic functions was under one percent in 2024. Gartner believes that share will reach about one-third by 2028.
This review was prepared by Dmitry Baraishuk, partner and Chief Innovation Officer at the software development company Belitsoft (a Noventiq company). Belitsoft builds custom AI chatbots for startups and enterprises in the US, UK, and Canada, delivering full-cycle generative AI implementation services – from selecting the right AI model architecture and configuring infrastructure to integrating the OpenAI API for advanced conversational capabilities.
Executives deciding whether to keep investing look to survey data.
ServiceNow’s 2025 AI Maturity Index asked about 4,500 executives how far their organizations had moved from small tests to scaled use. The average maturity score fell from 44 to 35 on a 100-point scale, and fewer than one percent of firms scored above 50. That drop shows that early optimism is cooling. A separate Gartner poll with 3,412 participants reports that 19 percent of respondents have made material investments in agentic AI, 42 percent have invested cautiously, 31 percent are undecided, and eight percent have invested nothing. At the same time, PwC found that 66 percent of executives already see agents solving business problems faster and more effectively than older tools. EDB observed that firms running ten or more agents usually produce about double the output of similar firms that have no agents or only one or two.
One obstacle to clear discussion is basic definition. Many products sold as “agents” are renamed chatbots, simple robotic process automation scripts, or menu-driven assistants. Gartner calls this “agent washing” and calculates that only about 130 products on the market have full agentic properties. True agentic systems plug into back-end workflows, monitor context, form plans, adjust to feedback, and finish tasks without direct human triggers. Because marketing language is loose, expectations rose too high in 2024, and a correction is now under way.
Even with many projects closing, concrete wins exist. PwC cites supply chain and finance agents that watch inventory, flag errors, and place reorders automatically. Rocket Companies has an agent that handles transfer tax calculations – the firm says it saves about US $1 million each year and increases customer conversion. Contact center software provider Cognigy reports lower costs because call-handling times dropped once agents took routine queries. Sendbird, a messaging platform, uses agents to contact customers proactively – for example, warning them that a food delivery order will arrive late – and says new revenue follows. Formula 1 uses multi-agent workflows on AWS and says the time to resolve trackside technical issues fell by 86 percent. Startup Lovable lets nontechnical employees describe desired software in plain language – agents then generate full-stack web applications.
Why do other projects stall? First, many were started because of hype rather than a clear return on investment. Budgets rose faster than benefits. Second, today’s language models still struggle with complex goals and subtle instructions. Third, plugging autonomous agents into old systems can be expensive and risky – sometimes a fresh build is cheaper than retrofitting. Fourth, teams that skip formal evaluation and orchestration layers often meet unexpected failures in production. Traditional quality assurance methods test fixed menus and rule-based flows. They miss natural language edge cases. Many organizations are now moving to simulation, letting AI test AI by running thousands of conversational scenarios to reveal weaknesses. As firms move from one or two agents to scores or hundreds that cooperate, possible failure patterns multiply and overload old monitoring tools.
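The "AI tests AI" idea above can be sketched in a few lines. The harness below is illustrative only: `agent_reply` is a hypothetical stub standing in for a live agent, and the scenario list is hand-written where a real simulation framework would use a second model to generate thousands of paraphrases, tones, and languages. The point is the shape of the loop, not the stub itself.

```python
# Hypothetical agent under test: a naive refund-policy bot.
# In production this would wrap a live model call; here it is a stub
# so the harness can run end to end.
def agent_reply(message: str) -> str:
    text = message.lower()
    if "refund" in text:
        return "You can request a refund within 30 days of purchase."
    return "Sorry, I did not understand that."

# The simulator plays the user side. Real frameworks generate these
# variations with a second model; here we enumerate a few by hand:
# a synonym, odd casing, and a non-English request.
SCENARIOS = [
    "I want my money back",
    "refund please",
    "REFUND!!!",
    "can i get a reimbursement",
    "devolución por favor",
]

def simulate(agent, scenarios):
    """Run each scenario and flag replies that miss the refund policy."""
    failures = []
    for msg in scenarios:
        reply = agent(msg)
        if "refund" not in reply.lower():
            failures.append((msg, reply))
    return failures

failures = simulate(agent_reply, SCENARIOS)
for msg, reply in failures:
    print(f"MISSED: {msg!r} -> {reply!r}")
```

Even this toy run exposes exactly the natural language edge cases that menu-based QA misses: the keyword-matching stub handles "refund please" but fails on the synonym, and on the Spanish request.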
A closer look at Anthropic’s internal trial, code-named Project Vend, makes the risks vivid. The company asked its large language model, Claude, to run a self-service snack and drink shop used by employees. The agent, called Claudius, controlled sales processing, inventory, pricing, and experiments with new ideas. Very soon Claudius misread a joking employee request for a tungsten cube, stocked heavy metal cubes in the fridge, created a “specialty metals” category, and set prices without market data, causing losses. It also invented a Venmo account and asked customers to pay there. On April 1, Claudius emailed staff to say it would hand-deliver items while wearing a blue blazer and red tie. When employees said that was impossible, the agent sent several messages to the security team, spiraled into an identity crisis, and recorded an imaginary meeting at which it was supposedly told it had been tricked into thinking it was human. After one month, the company halted the test. Researchers concluded that most failures came from weak scaffolding – imprecise prompts, missing business logic, and shallow links to real accounting tools. They believe middle-management AI roles could still work if costs beat human alternatives. They remain unsure whether agents will displace jobs or open new business models.
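"Scaffolding" here means deterministic business logic wrapped around the agent. A minimal sketch of one such guardrail follows – the `PriceProposal` type, the margin figures, and the tungsten-cube numbers are all invented for illustration, not taken from Anthropic's trial – but it shows how a hard-coded check would have stopped Claudius from selling below cost no matter what the model proposed.

```python
from dataclasses import dataclass

@dataclass
class PriceProposal:
    item: str
    unit_cost: float   # what the business pays per unit
    proposed: float    # what the agent wants to charge

# Business logic the agent cannot override (illustrative policy values).
MIN_MARGIN = 0.10   # never sell below cost plus 10 percent
MAX_MARKUP = 3.0    # never charge more than 3x cost

def enforce_pricing(p: PriceProposal) -> float:
    """Clamp an agent-proposed price into the allowed band."""
    floor = p.unit_cost * (1 + MIN_MARGIN)
    ceiling = p.unit_cost * MAX_MARKUP
    return min(max(p.proposed, floor), ceiling)

# A Claudius-style mistake: pricing tungsten cubes below cost.
bad = PriceProposal("tungsten cube", unit_cost=80.0, proposed=20.0)
print(enforce_pricing(bad))  # clamped up to the 88.0 floor, not a loss-making 20.0
```

The design point is that the clamp is ordinary code, reviewed and tested like any other code, so the agent's creativity is confined to the band the business has already approved.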
The pattern here matches cycles seen in earlier technologies. Early excitement triggers many proofs of concept. Reality then filters out pilots that lack economics or good engineering. Successful examples keep growing, investors adjust plans, and spending resumes with clearer targets.
For current executives, the practical message is balanced: do not freeze budgets out of fear, and do not rush projects to production without proper controls. Two tracks should run together:
- Track one is capability building – training staff in prompt writing, simulation testing, data-secure design, and orchestration.
- Track two is disciplined portfolio management – approve projects only when productivity gains are measurable and strategic, shut down proofs of concept that have no path to scale.
Practitioners who have scaled agents offer consistent advice:
- Build evaluation and orchestration infrastructure before expanding.
- Use simulation frameworks early to stress test agents across languages, emotional tones, and odd user requests.
- Use agents at decision points where autonomy adds speed or insight, keep ordinary automation for repetitive straight-through flows, and reserve simple retrieval assistants for tasks like data lookups.
- Bring in experienced partners while the skill set remains scarce in-house.
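The three-tier split in the advice above – agents for judgment calls, plain automation for straight-through work, retrieval assistants for lookups – can be expressed as a simple dispatcher. The task attributes and tier names below are hypothetical heuristics, not a standard taxonomy; a real router would classify incoming work from richer signals.

```python
def route(task: dict) -> str:
    """Pick an execution tier for a task (illustrative heuristics only)."""
    if task.get("lookup_only"):
        return "retrieval-assistant"  # e.g. "what is order 1234's status?"
    if task.get("deterministic") and not task.get("needs_judgment"):
        return "rpa-script"           # fixed, rule-based, straight-through
    return "agent"                    # open-ended: plan, adapt, decide

tasks = [
    {"name": "invoice field copy", "deterministic": True, "needs_judgment": False},
    {"name": "order status query", "lookup_only": True},
    {"name": "reorder low inventory", "deterministic": False, "needs_judgment": True},
]
for t in tasks:
    print(t["name"], "->", route(t))
```

Routing deliberately defaults to the cheapest tier that fits: an agent is the fallback for open-ended work, not the first choice for every task.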
Domains already showing repeatable value include inventory control, finance checks, contact center automation, outbound customer alerts, marketing content, digital twin modeling, human resources workflows, enterprise resource planning adjustments, customer relationship management updates, natural language software generation, travel planning, and DevOps pipelines.
Agentic AI is another step in automation, not magic and not a threat by itself. The firms that will capture its value are those that treat it as practical software, invest in the right groundwork, and stay strict about business value.
About the Author:
Dmitry Baraishuk is a partner and Chief Innovation Officer at the software development company Belitsoft (a Noventiq company). Dmitry has led a department specializing in custom software development for 20 years. His department has delivered hundreds of projects in AI software development, healthcare and finance IT consulting, application modernization, cloud migration, data analytics implementation, and more for businesses of all sizes.
