Over the past 12 months, generative AI (GenAI) technology has opened the door to an entirely new area of performance for Chief Information Officers & Chief Data Officers. As of September 2023, however, surprisingly few enterprises have moved past the ‘hackathon phase’ to deploy GenAI models throughout the company. McKinsey’s State of AI in 2023 report indicates that less than a third of companies have deployed any form of AI across more than one business function. Even arguably one of the more basic applications of this technology – knowledge retrieval, often appearing as chatbots – hasn’t seen deployment at scale in most large organizations.
Common adoption roadblocks & model reliability
While cultural barriers play a part in delaying the early adoption of new technology, a more prevalent challenge we consistently see is model unreliability. Many of the world’s largest banks, private equity funds, and consultancies already know the frustration of chatbots that work intermittently, only to later produce an incorrect or “hallucinated” response, wasting more time than they save.
For users to trust these tools, the tools must deliver consistent accuracy, particularly since many end users may not have the context to assess a response’s credibility. Most of this unreliability stems from a single primary issue: poor data governance.
You must have data governance to build any successful AI system. Without it, models run on potentially irrelevant and low-quality data (e.g., inconsistent, outdated, duplicated) but still try to give an answer. Large language models (LLMs) in chatbot applications draw on diverse sources, from industry reports to employee policy documents to Slack messages and emails. Some CIOs we’ve talked to say they didn’t even see such information sources as ‘data’ until GenAI emerged, making the oversight in governance and quality controls around this data unsurprising. That oversight now poses a huge blocker to reliable model deployment.
The consequences of feeding unstructured data to LLMs
Consider a chatbot in a financial institution that answers questions about historical investments. If someone asks about a specific market size, the LLM might find several potential answers in its vast document database. Some might be estimates from a Slack conversation; others might come from official market reports, some of which are outdated or have significant footnotes with crucial assumptions. In trying to provide a reasonable answer, the LLM may lose the critical contextual information (which figures are estimates, which are outdated, which carry footnoted assumptions) that distinguishes these sources.
Another example might involve a query about the competitors of “Company X”. If the most relevant document refers to “Company X” only as “X Ltd.” in a recent email chain, the retrieval step may miss it entirely, and the LLM will lean on a less relevant source instead. Generally, when sifting through extensive information, LLMs often struggle to pick out the details relevant to the user’s context, a challenge exacerbated by the massive amounts of data that enterprises manage.
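To make the alias problem concrete, here is a minimal sketch of one mitigation: expanding the user’s query with known aliases before retrieval. The ALIASES map, expand_query(), and the toy retrieve() scorer are hypothetical placeholders rather than any particular product’s API; a real deployment would sit on a proper keyword or vector index.

```python
# Illustrative only: expand a query with known company aliases so a document that
# says only "X Ltd." still surfaces for a question about "Company X".

ALIASES = {
    "company x": ["company x", "x ltd.", "x limited"],
}

def expand_query(query: str) -> list[str]:
    """Return the original query plus variants with each known alias substituted in."""
    lowered = query.lower()
    variants = [lowered]
    for canonical, names in ALIASES.items():
        if canonical in lowered:
            variants.extend(lowered.replace(canonical, name) for name in names)
    return list(dict.fromkeys(variants))  # de-duplicate while preserving order

def retrieve(queries: list[str], documents: list[str], top_k: int = 3) -> list[str]:
    """Toy keyword-overlap scorer standing in for a real retrieval backend."""
    scored = []
    for doc in documents:
        doc_lower = doc.lower()
        score = max(sum(term in doc_lower for term in q.split()) for q in queries)
        scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]

docs = [
    "X Ltd. competitors include FirmCo and AcmeCorp in structured credit.",
    "Quarterly townhall notes and travel policy updates.",
]
print(retrieve(expand_query("Who are the competitors of Company X?"), docs))
```

Entity resolution can of course go much further (master data, embeddings), but even this simple normalization step keeps the retriever from skipping the most relevant document.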
In addition to unreliable retrieval of information, concerns about data safety also arise when implementing enterprise chatbots. We usually identify two primary obstacles: the risk of revealing sensitive data such as personally identifiable information (PII), and the challenge of controlling data access within an organization. Enterprises must ensure their chatbots don’t unintentionally disclose confidential or personal details to the models, and they must put measures in place so that specific teams or individuals can’t access data outside their designated areas.
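A sketch of what these two safeguards can look like in code, under simplified assumptions: documents carry access groups mirrored from their source systems, and obvious PII patterns are masked before anything is passed to the model. The Document class, group names, and regexes below are illustrative only; production systems would rely on the real ACLs and a dedicated PII-detection service.

```python
# Illustrative only: two safety gates applied before retrieved text reaches an LLM.
import re
from dataclasses import dataclass

@dataclass
class Document:
    text: str
    allowed_groups: set  # e.g. {"m&a-team"}, mirrored from the source system's permissions

EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")
PHONE = re.compile(r"\+?\d[\d\s()-]{7,}\d")

def authorized(docs: list, user_groups: set) -> list:
    """Keep only documents that share at least one access group with the user."""
    return [d for d in docs if d.allowed_groups & user_groups]

def redact(text: str) -> str:
    """Mask e-mail addresses and phone numbers before the text is sent to the model."""
    return PHONE.sub("[PHONE]", EMAIL.sub("[EMAIL]", text))

docs = [
    Document("Deal memo: contact jane.doe@bank.example, +44 20 7946 0000.", {"m&a-team"}),
    Document("Employee handbook: expense policy overview.", {"all-staff"}),
]
context = [redact(d.text) for d in authorized(docs, {"all-staff"})]
print(context)  # only the handbook excerpt survives; any PII in it would be masked
```

The key design point is that both gates run before retrieval results reach the model, so nothing the user is not entitled to see ever appears in the prompt.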
Data quality is the missing link for trustworthy AI
We believe the first step in addressing these data challenges is labeling the vast amounts of unstructured data with tags (‘metadata’) that capture essential contextual information, such as document version, purpose, and sensitivity. Only then can we intelligently assess data relevance, quality, and safety.
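A minimal sketch of what such tagging might look like in practice, with an assumed schema (source, purpose, version, sensitivity, as-of date) used to filter out over-sensitive content and rank the remaining passages by authority and recency. The field names and weights are illustrative; a real taxonomy would come from the organization’s governance program.

```python
# Illustrative only: attach metadata tags to unstructured content and use them
# at query time to filter and rank candidate passages.
from dataclasses import dataclass
from datetime import date

@dataclass
class TaggedChunk:
    text: str
    source: str        # e.g. "official market report" vs "slack thread"
    purpose: str       # e.g. "external publication", "internal discussion"
    version: str
    sensitivity: str   # e.g. "public", "internal", "restricted"
    as_of: date

SOURCE_WEIGHT = {"official market report": 1.0, "slack thread": 0.3}
SENSITIVITY_ORDER = ["public", "internal", "restricted"]

def eligible(chunk: TaggedChunk, max_sensitivity: str) -> bool:
    """Filter out anything more sensitive than the caller is cleared to see."""
    return SENSITIVITY_ORDER.index(chunk.sensitivity) <= SENSITIVITY_ORDER.index(max_sensitivity)

def rank(chunks: list, max_sensitivity: str = "internal") -> list:
    """Prefer authoritative, recent sources among chunks that pass the sensitivity filter."""
    allowed = [c for c in chunks if eligible(c, max_sensitivity)]
    return sorted(allowed, key=lambda c: (SOURCE_WEIGHT.get(c.source, 0.5), c.as_of), reverse=True)

chunks = [
    TaggedChunk("Market size is roughly $4.2B (back-of-envelope guess).", "slack thread",
                "internal discussion", "n/a", "internal", date(2023, 6, 1)),
    TaggedChunk("Market size $3.8B; excludes APAC, see footnote 3.", "official market report",
                "external publication", "v2", "public", date(2022, 11, 15)),
]
best = rank(chunks)[0]
print(best.text, f"({best.source}, {best.version}, as of {best.as_of})")
```

Here the official report outranks the Slack guess because of its source weight, and its version and footnote reference travel with it, so the answer can carry that provenance.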
Large language models can help companies improve operations, enrich customer experiences, and foster various innovations and efficiencies in the long run. However, a gap still exists between the experimental use cases that excite leaders and the roll-out of trustworthy, ready-to-use knowledge assistants that can safely serve large groups across an organization. To bridge this gap, enterprises must establish solid data governance frameworks that focus on data quality and safety. Otherwise, we find it hard to envision a safe and scalable enterprise GenAI deployment.