Most enterprise AI agent purchases go wrong in the same place: the demo. A vendor shows an agent answering questions fluently for ten minutes, procurement sees a line item that looks like software, and six months later the platform is quietly abandoned because nobody thought to ask where the agent’s knowledge lives, who can see its output, or what happens when it makes a mistake inside a customer conversation.
This guide is a checklist for asking those questions before the contract is signed. It is vendor-neutral in structure. Where specific products appear, they appear as examples of a category, and every category has alternatives worth pricing.
First, know which kind of platform you are buying
“AI agent platform” now covers at least four genuinely different product shapes, and comparing across them on a single spreadsheet produces nonsense.
Workflow automation tools such as Zapier and n8n connect apps through triggers and predefined steps, with AI increasingly embedded in those steps. They are excellent at deterministic pipelines. They are not built for open-ended work where the agent decides its own next action.
Self-hosted agent runtimes such as OpenClaw run on your own infrastructure. You get maximum control and data locality, and you take on all operations, patching, and uptime yourself. For firms with strict data residency rules and a capable platform team, this is a legitimate choice rather than a compromise.
Autonomous agent services such as Manus sell the agent as a finished worker: you give it a task, it goes away and executes. Strong for individual productivity, thinner on the team-management and governance layer.
Agent workspace platforms give each agent a persistent cloud environment with files, tools, and chat-channel connections, managed at a team or organization level. This is the category most enterprise buyers actually mean when they say they want “AI employees,” and it is where the rest of this guide concentrates, though the evaluation criteria apply broadly.
Memory and file grounding
The single most predictive question I ask vendors: where does the agent’s knowledge live, and can my team read it?
Chat-history-based memory degrades. Context windows fill, threads get summarized, and the agent’s understanding of your business becomes an artifact of compression choices you cannot inspect. Platforms that ground agents in real files behave differently: policies, contracts, SOPs, and prior work sit in a drive the agent reads from and writes to, and a human can open any of those files, correct them, and version them.
The distinction matters operationally. When a file-grounded agent gives a wrong answer, you fix a document. When a chat-memory agent gives a wrong answer, you argue with a black box.
Related question: does the platform distinguish between long-term knowledge and per-conversation context? Good architectures separate durable files from isolated sessions, so one customer’s conversation never contaminates another’s, and temporary instructions do not silently become permanent behavior. Ask the vendor to show you, concretely, where a fact goes when the agent needs to remember it next month.
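As a rough illustration of that separation, here is a minimal sketch using a hypothetical in-memory model. The names `KnowledgeBase`, `Session`, and `promote` are invented for this example and are not any vendor's API, and the policy text is a placeholder. The point is only that session notes stay in the session unless something explicitly writes them to a durable file.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Durable, human-readable files (hypothetical stand-in for a drive)."""
    files: dict[str, str] = field(default_factory=dict)

    def write(self, path: str, text: str) -> None:
        self.files[path] = text


@dataclass
class Session:
    """Per-conversation context, discarded when the conversation ends."""
    session_id: str
    notes: list[str] = field(default_factory=list)

    def remember_for_now(self, note: str) -> None:
        self.notes.append(note)

    def promote(self, kb: KnowledgeBase, path: str, text: str) -> None:
        # The only route from temporary context to long-term knowledge
        # is an explicit write that a human can later open and correct.
        kb.write(path, text)


kb = KnowledgeBase()
alice = Session("customer-alice")
bob = Session("customer-bob")

alice.remember_for_now("Prefers email follow-up")
alice.promote(kb, "policies/returns.md", "Placeholder return policy text.")
```

In this sketch nothing Alice says reaches Bob's session, and the only thing that survives until next month is the file at `policies/returns.md`, which a person can read and edit.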
Sandboxing and audit trails
Agents that only generate text are low risk. Agents that browse the web, run terminal commands, execute code, and modify files are a different asset class, and they are also the ones that deliver most of the value. The control question is containment.
Each agent should run in an isolated sandbox: its own compute environment, its own file volume, no lateral access to other agents or tenants. Ask whether isolation is per agent or per customer, and what the vendor’s answer is for noisy-neighbor and data-bleed scenarios.
Then ask about observability. Can a supervisor watch what an agent is doing while it works, including its browser activity and terminal commands, rather than reading a summary afterward? Can you review the file changes an agent made, ideally with version control underneath, so a bad change can be rolled back? Platforms that expose git-style history over agent work give compliance teams something real to audit. For regulated industries, this is often the difference between a pilot that clears review and one that dies in legal.
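To make "git-style history" concrete, the following is a simplified sketch, not how any particular platform or git itself stores changes. It assumes a hypothetical append-only log in which a rollback is recorded as a new entry, so the audit trail keeps both the bad change and the correction. `FileHistory` and `Change` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Change:
    author: str
    path: str
    content: str


@dataclass
class FileHistory:
    """Append-only log of writes; nothing is ever edited or removed."""
    log: list[Change] = field(default_factory=list)

    def write(self, author: str, path: str, content: str) -> None:
        self.log.append(Change(author, path, content))

    def current(self, path: str) -> Optional[str]:
        # The latest entry for a path is its current content.
        for change in reversed(self.log):
            if change.path == path:
                return change.content
        return None

    def rollback(self, path: str, by: str) -> None:
        versions = [c for c in self.log if c.path == path]
        if len(versions) < 2:
            raise ValueError(f"no earlier version of {path}")
        # Restore the previous content as a new entry rather than
        # deleting the bad one, so reviewers can see what happened.
        self.write(by, path, versions[-2].content)


history = FileHistory()
history.write("support-agent", "faq.md", "v1")
history.write("support-agent", "faq.md", "v2 with an error")
history.rollback("faq.md", by="supervisor")
```

After the rollback the file reads as its first version again, while the log still holds three entries: the original, the faulty change, and the supervisor's restoration.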
A caveat worth writing down: several platforms advertise read-only access to uploaded source files while still allowing agents to create new files in their workspace. Both facts can be true at once. Make sure your security review covers what agents can write, not only what they can read.
Channel integrations
An agent nobody talks to produces nothing. In practice, adoption tracks how easily the agent appears inside the tools your staff already use, which for most organizations means some mix of Slack, Microsoft Teams, WhatsApp, Telegram, or Discord, plus regional platforms like Feishu and WeCom for teams operating in China.
Two evaluation points beyond the checkbox list. First, session mapping: when the agent is in a WhatsApp group and three separate direct messages, does each conversation get its own isolated context, or does everything blur together? Blurred context is a privacy incident waiting to happen in customer-facing use. Second, be clear that channels are entry points, not storage. If the platform treats a Slack thread as the agent’s memory, knowledge evaporates when the thread scrolls away. Durable material should land in files.
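Session mapping can be pictured as keying each context by channel and conversation together. The sketch below is a hypothetical in-memory store, with made-up conversation identifiers, and says nothing about how a given vendor implements it.

```python
from collections import defaultdict

# One context per (channel, conversation) pair; names are illustrative.
SessionKey = tuple[str, str]


class SessionStore:
    def __init__(self) -> None:
        self._contexts: dict[SessionKey, list[str]] = defaultdict(list)

    def append(self, channel: str, conversation_id: str, message: str) -> None:
        self._contexts[(channel, conversation_id)].append(message)

    def context(self, channel: str, conversation_id: str) -> list[str]:
        # Return a copy so callers cannot mutate another session's history.
        return list(self._contexts.get((channel, conversation_id), []))


store = SessionStore()
store.append("whatsapp", "group:sales-team", "Q3 targets are attached")
store.append("whatsapp", "dm:customer-a", "My order number is 1234")
```

A useful vendor question follows directly from the key: what exactly identifies a conversation on each channel, and can two conversations ever resolve to the same one?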
For custom surfaces, check for a real API. An OpenAPI-documented REST interface with embed tokens for frontend use means your developers can put the agent inside your own product without exposing master credentials in a browser.
Team permissions and organizational boundaries
Enterprise usage fails without boundaries. The questions to ask are unglamorous and decisive.
Can you separate departments, clients, or subsidiaries into distinct workspaces with independent files, members, and billing? Can an agent be scoped to one business function with only the files that function needs, rather than everything the company has ever uploaded? Least-privilege applies to agents exactly as it applies to employees, and platforms differ widely in how naturally they support it.
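One way to reason about agent scoping is as an allow-list of folder roots checked on every file access. The sketch below assumes POSIX-style drive paths and a hypothetical helper `is_allowed`. Real platforms will enforce this differently, but the behavior to test for is the same.

```python
from pathlib import PurePosixPath


def is_allowed(path: str, allowed_roots: list[str]) -> bool:
    """Return True only if path sits under one of the agent's allowed roots."""
    candidate = PurePosixPath(path)
    # Reject traversal segments outright instead of trying to resolve them.
    if ".." in candidate.parts:
        return False
    for root in allowed_roots:
        try:
            # relative_to raises ValueError when candidate is outside root.
            candidate.relative_to(PurePosixPath(root))
            return True
        except ValueError:
            continue
    return False


# Hypothetical scope for an HR agent.
hr_agent_roots = ["/drive/hr/policies", "/drive/hr/onboarding"]
```

With this scope, `is_allowed("/drive/hr/policies/leave.md", hr_agent_roots)` is true, while a finance file, a `..` traversal, or a sibling folder such as `/drive/hr/policies-old` is refused because the comparison is per path segment rather than per string prefix.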
Also check the mobility of assets. If a team builds a valuable agent inside one workspace, can it be transferred to another workspace when the org chart changes, with its files and configuration intact? Small feature, large consequence at year three.
Pricing models: per seat, per agent, per credit
Three models dominate, and each distorts behavior differently.
Per-seat pricing charges for human users. Familiar to procurement, but it penalizes wide adoption and has little relationship to what agents actually cost the vendor, which is compute.
Per-agent pricing charges for each deployed agent, usually with a pool of usage credits included. Buda, a Drive-based agent platform, is a representative example of this model: a free tier with no credit card required, then $20 per agent per month on its Plus plan and $100 per agent per month on Pro, with usage credits shared at the workspace level. The per-agent model maps spending to deployed capability rather than to headcount, which tends to suit teams running a few heavily used agents.
Pure consumption pricing charges for usage alone. It is the most honest model economically and the hardest to budget, and finance teams generally hate it (I used to think unpredictability was a minor objection; watching a quarterly forecast meeting cured me of that).
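The arithmetic behind the three models is simple enough to put in a few lines. In the sketch below the team size, seat price, and credit price are placeholders chosen for illustration, not any vendor's list price. Only the $20 per-agent figure comes from the example quoted earlier.

```python
def per_seat(seats: int, price_per_seat: float) -> float:
    """Monthly cost when the vendor bills per human user."""
    return seats * price_per_seat


def per_agent(agents: int, price_per_agent: float) -> float:
    """Monthly cost when the vendor bills per deployed agent."""
    return agents * price_per_agent


def consumption(credits_used: int, price_per_credit: float) -> float:
    """Monthly cost when the vendor bills for usage alone."""
    return credits_used * price_per_credit


# Hypothetical team: 40 staff sharing 3 agents.
seat_cost = per_seat(seats=40, price_per_seat=15.0)        # placeholder price
agent_cost = per_agent(agents=3, price_per_agent=20.0)     # Plus price cited above
light_month = consumption(credits_used=2_000, price_per_credit=0.01)   # placeholder
heavy_month = consumption(credits_used=30_000, price_per_credit=0.01)  # placeholder
```

Under these assumptions the per-seat bill scales with headcount (40 × $15), the per-agent bill with deployed agents (3 × $20), and the consumption bill swings fifteenfold between a light and a heavy month, which is the forecasting problem finance objects to.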
One definitional trap: usage credits are not tokens. On most platforms a credit is a composite unit covering model calls plus third-party services the agent uses while working, such as browsing or media generation. Ask for the conversion logic in writing, ask what happens when credits run out mid-task, and ask whether unused credits expire. The answers vary more than the pricing pages suggest.
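To see why a credit is not a token, it helps to write the conversion down. The rates and service names below are entirely hypothetical. The real table is what you are asking the vendor to put in writing.

```python
# Hypothetical conversion table: credits charged per unit of each service.
CREDIT_RATES = {
    "model_call": 2,
    "browser_minute": 5,
    "image_generation": 20,
}


def credits_for(usage: dict[str, int]) -> int:
    """Total credits for a task, given units consumed per service."""
    return sum(CREDIT_RATES[service] * units for service, units in usage.items())


def can_finish(balance: int, usage: dict[str, int]) -> bool:
    # Whether the task is blocked, paused, or billed as overage when this
    # returns False is one of the answers to get from the vendor in writing.
    return credits_for(usage) <= balance


# Illustrative task: 30 model calls, 12 minutes of browsing, 1 generated image.
research_task = {"model_call": 30, "browser_minute": 12, "image_generation": 1}
```

With these made-up rates the task costs 30 × 2 + 12 × 5 + 1 × 20 = 140 credits, and browsing accounts for as much of the bill as the model calls do.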
Vendor lock-in and exit paths
Every platform in this market is young. Plan for the possibility that you will leave.
The practical test is export. Can you pull your files out in standard formats? Are agent instructions readable text you can take elsewhere, or configuration trapped in a proprietary UI? If the platform supports skills or workflow packages, are they inspectable, ideally stored as plain files in a repository you control? Open, file-based assets travel; embeddings in a vendor’s private index do not.
Ask also about the deployment spectrum. Some vendors offer self-hosted or on-premise options at the enterprise tier, which functions as both a compliance answer and an insurance policy. You may never exercise it. Its existence still changes the negotiation.
The operating model: who owns the agents
The evaluation criteria above are all about the platform. The most common failure I see is about the organization, so it belongs in the buyer’s guide too.
An agent grounded in files is only as current as its files. Someone has to own the knowledge base: adding the new return policy when it changes, retiring the old org chart, reviewing the questions the agent could not answer last month and filling the gaps. This is genuine recurring work, closer to maintaining internal documentation than to administering software. In organizations where documentation already has an owner, agents slot in naturally. In organizations where documentation is nobody’s job, the agent’s answers quietly rot, and the platform takes the blame.
Decide the ownership question before the pilot, not after. A workable minimal structure: one business owner per agent who is accountable for its knowledge and its escalations, and one platform administrator per workspace who handles members, permissions, and billing. Both roles can be fractions of existing jobs. Neither can be vacant.
Budget for the human side of rollout as well. Staff need to know what the agent is for, what it must never be used for, and how to flag a wrong answer so the flag actually reaches the file owner. A short feedback loop from wrong answer to corrected document, ideally closing within days, is what separates agents that improve from agents that plateau. Vendors can support this with usage analytics and unanswered-question reviews, and several do, but no vendor can supply the owner.
There is a procurement implication hiding here. Per-agent pricing makes this ownership model easy to reason about, since each line item on the invoice corresponds to a named function with a named owner. Per-seat pricing obscures it. Whichever model you choose, insist that your internal accounting can answer a simple question at renewal time: which agents earned their keep, and who vouches for each one?
Security review questions worth asking verbatim
A condensed list your security team can send as written questions:
- Is data encrypted in transit and at rest, and are uploaded files ever used to train models?
- What is the isolation boundary between tenants, and between agents within one tenant?
- What exactly can an agent write or modify, and how are those changes logged and reversed?
- How are chat sessions isolated between external users, especially in group-chat channels?
- Which subprocessors touch our data when an agent browses the web or calls third-party tools?
- What are the offboarding guarantees: deletion timelines, export formats, credential revocation?
- Is there an on-premise or self-hosted option, and what does it actually include?
Vendors comfortable in the enterprise answer these in a day. Hesitation is itself information.
Run the pilot like it will fail
My closing advice is procedural rather than technical. Pick one workflow with real stakes but a bounded blast radius: internal HR policy questions, first-line support triage with mandatory human escalation, weekly reporting from files your team already maintains. Load the actual documents, not sanitized samples. Give it thirty days, and define in advance what failure looks like, because “the team stopped using it” is the most common failure and the least measured one.
Then, before renewal, do the exercise almost nobody does: attempt an export. Pull the files, the instructions, and the audit history out of the platform and see what you are holding. If what comes out could be loaded into a competitor next quarter, you bought a platform. If not, you bought a dependency, and the price should reflect that.
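A first pass over an export can be automated. The sketch below assumes the export arrives as a directory of files and uses a deliberately crude test, the file extension, to separate formats another tool could plausibly read from everything else. The suffix list is an assumption to adjust, and `audit_export` is an illustrative name.

```python
from pathlib import Path

# Assumed notion of "standard formats"; adjust to your own requirements.
PORTABLE_SUFFIXES = {".md", ".txt", ".csv", ".json", ".yaml", ".yml", ".pdf", ".docx"}


def audit_export(export_dir: str) -> dict[str, list[str]]:
    """Split exported files into portable and opaque by file extension."""
    report: dict[str, list[str]] = {"portable": [], "opaque": []}
    root = Path(export_dir)
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        bucket = "portable" if path.suffix.lower() in PORTABLE_SUFFIXES else "opaque"
        # Record paths relative to the export root for a readable report.
        report[bucket].append(path.relative_to(root).as_posix())
    return report
```

A long "opaque" list is not proof of lock-in, but each entry is a question for the vendor: what is this file, and what else can open it?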
