
Generative AI, The Next Big Thing in Intelligent Document Processing: Real-world Applications from an AI Solutions Architect

By James Andrew

Posted on November 6, 2023

The days of manual data entry and traditional document processing are numbered. In their place, advanced Artificial Intelligence (AI) and Machine Learning (ML) algorithms are emerging, promising efficiency, accuracy, and scalability. Anjanava Biswas, a senior AI Specialist Solutions Architect at Amazon Web Services (AWS), is among those leading this shift, with a particular focus on Intelligent Document Processing (IDP).

Biswas emphasizes that the sheer volume of documents industries handle today is staggering and that traditional manual methods and even legacy technologies just can’t keep up. That’s where AI and ML come into play. His projects at Amazon have helped improve document processing capabilities across many industries, including healthcare and finance.

AI and ML in Real-Life Document Processing

The ability to quickly and accurately process information is paramount in today’s competitive business sphere. Organizations across various sectors, including healthcare, finance, legal, retail, and manufacturing, are inundated with vast volumes of documents daily. These documents, rich in critical data, are pivotal in timely decision-making, ensuring customer satisfaction, expediting customer onboarding, and minimizing customer complaints. However, most of these documents are still processed manually, which poses significant challenges. Biswas notes that manual processing is time-consuming and susceptible to costly and critical errors. The limited automation available today further exacerbates the problem.

In the mortgage industry, Black Knight Inc. aims to standardize and validate bank statements by leveraging Amazon Web Services (AWS) automation and machine learning (ML) solutions. Their AI product, AIVA, boosts efficiency by reducing manual comparison tasks and using ML to decrease loan processing expenses. With the help of AI, Black Knight is able to offer faster and more dependable mortgage origination solutions, helping their clients experience better service and grow more efficiently.

At the same time, fintech company Paytm, India’s leading digital payments platform, collaborated with AWS to streamline its manual user onboarding process, specifically the Know Your Customer (KYC) procedure. Utilizing Amazon Textract, Paytm can extract user data from intricate identity documents with a 97% accuracy rate and detect image imperfections in real time, prompting onsite agents to retake photos if needed, thus avoiding repeated visits. With an ML-driven Optical Character Recognition system, it can swiftly and accurately process text from various identity documents, which eliminates manual data entry, reduces errors, and cuts down user authentication time from days to minutes.

With a background rooted in computer vision and language AI, Biswas has gained a nuanced understanding of document processing intricacies and the role of AI in navigating these complexities. Specializing in applied AI, he has honed the ability to tackle real-world document processing and ML integration challenges. Through his AI and ML projects, Biswas focuses on providing pragmatic solutions that address the core challenges faced in document handling and processing, bridging the gap between theoretical AI capabilities and tangible operational improvements.

Core innovations in intelligent document processing

Advanced natural language processing (NLP) capabilities

One of the core NLP capabilities within intelligent document processing comes from a product known as Amazon Comprehend. It is a cloud-based AI service offering from AWS that has a long list of NLP features. Drawing from Biswas’ expertise, Amazon Comprehend has introduced advanced features like one-step document classification and entity recognition from native documents.

These innovative features are designed explicitly for intelligent document processing, automatically categorizing documents and extracting business-specific information, such as dates, names, and other pertinent details. According to Biswas, the beauty of this set of features lies in its simplicity and efficiency, as it drastically reduces the overhead and time it takes to train Comprehend’s NLP-based AI models. Organizations no longer need to invest time and resources in gathering large amounts of documents just to train and experiment with the AI model.
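To make the one-step classification idea concrete, here is a minimal Python sketch. It assumes a custom Comprehend classifier endpoint already exists, and the helper names (build_classify_request, top_class) and the 0.5 score threshold are illustrative choices, not part of the service or of Biswas’ work. The request passes the raw document bytes plus a reader configuration so that text extraction and classification happen in a single call, and the second helper picks the highest-scoring class from a response shaped like the one the service returns.

```python
from typing import Any, Dict, Optional


def build_classify_request(document_bytes: bytes, endpoint_arn: str) -> Dict[str, Any]:
    """Build keyword arguments for Comprehend's ClassifyDocument operation.

    Passing raw bytes (for example a PDF or image) together with a
    DocumentReaderConfig lets the service read the native document itself,
    so no separate OCR step is needed before classification.
    """
    return {
        "EndpointArn": endpoint_arn,  # assumed: ARN of an existing custom classifier endpoint
        "Bytes": document_bytes,
        "DocumentReaderConfig": {
            "DocumentReadAction": "TEXTRACT_DETECT_DOCUMENT_TEXT",
            "DocumentReadMode": "SERVICE_DEFAULT",
        },
    }


def top_class(response: Dict[str, Any], min_score: float = 0.5) -> Optional[str]:
    """Return the best-scoring class name, or None if nothing clears min_score.

    min_score is an illustrative threshold; tune it for your own documents.
    """
    classes = response.get("Classes", [])
    if not classes:
        return None
    best = max(classes, key=lambda c: c["Score"])
    return best["Name"] if best["Score"] >= min_score else None


# Usage with boto3 (requires AWS credentials, so it is left commented out):
# import boto3
# comprehend = boto3.client("comprehend")
# response = comprehend.classify_document(**build_classify_request(pdf_bytes, endpoint_arn))
# print(top_class(response))
```

Documents that fall below the threshold return None, which a workflow could route to a human reviewer instead of trusting a weak prediction.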

Computer vision-based optical character recognition (OCR)

Biswas’ commitment to revolutionizing document processing is evident in Amazon’s IDP offerings. These tools, built on AWS’s AI infrastructure, are designed to extract information from many document types and formats seamlessly. What distinguishes these solutions is their capability to function without specialized ML knowledge. A prime example of this is Amazon Textract, which is an OCR system built with computer-vision technology.

Amazon Textract is a comprehensive ML service that adeptly extracts text, forms, and tables from scanned documents. It transcends the capabilities of conventional OCR systems by identifying, understanding, and extracting intricate data from forms and tables. Whether it’s a straightforward form or a complex legal document, Textract precisely discerns its structure.

Biswas highlights Textract’s prowess in detecting diverse data elements, such as form labels and tabulated data. This proficiency enables the transformation of traditional paper documents and PDFs into structured datasets, streamlining document workflows, augmenting content discovery, and facilitating the development of advanced document processing applications. Additionally, Biswas has championed enhancements to Textract’s Tables feature, ensuring even more efficient extraction of tabular data from diverse document types.
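As a rough illustration of how form labels become structured data, the Python sketch below walks a Textract AnalyzeDocument response and pairs each detected form key with its value. The function names are illustrative, and the code assumes the response was produced with the FORMS feature enabled; it is a simplified reading of the block structure, not a complete parser.

```python
from typing import Any, Dict


def _text_of(block: Dict[str, Any], blocks_by_id: Dict[str, Dict[str, Any]]) -> str:
    """Join the text of a block's CHILD words (and checkbox states) into one string."""
    parts = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = blocks_by_id[child_id]
            if child["BlockType"] == "WORD":
                parts.append(child["Text"])
            elif child["BlockType"] == "SELECTION_ELEMENT":
                parts.append(child["SelectionStatus"])
    return " ".join(parts)


def extract_form_fields(response: Dict[str, Any]) -> Dict[str, str]:
    """Map each form label (KEY block) to the text of its linked VALUE block(s)."""
    blocks_by_id = {b["Id"]: b for b in response.get("Blocks", [])}
    fields: Dict[str, str] = {}
    for block in blocks_by_id.values():
        is_key = (
            block["BlockType"] == "KEY_VALUE_SET"
            and "KEY" in block.get("EntityTypes", [])
        )
        if not is_key:
            continue
        key_text = _text_of(block, blocks_by_id)
        value_parts = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                for value_id in rel["Ids"]:
                    value_parts.append(_text_of(blocks_by_id[value_id], blocks_by_id))
        fields[key_text] = " ".join(p for p in value_parts if p)
    return fields


# Usage with boto3 (requires AWS credentials, so it is left commented out):
# import boto3
# textract = boto3.client("textract")
# response = textract.analyze_document(
#     Document={"Bytes": image_bytes}, FeatureTypes=["FORMS", "TABLES"]
# )
# print(extract_form_fields(response))
```

The result is a plain dictionary of label-to-value pairs, which is the kind of structured dataset that downstream workflows can validate, search, or store.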

Its ability to integrate with other AWS services positions Textract as a valuable asset in document processing. With tools like Textract, businesses can interpret and process documents, extracting text, handwriting, tables, and more with high accuracy. For instance, the UK government’s HM Land Registry is already innovating with Textract and Comprehend. With an intelligent document processing solution built using these AI tools, it has reduced document review time by 50% and alleviated manual workload for caseworkers.

Applications of generative AI in document processing

Generative AI garnered much attention in 2023. Popularized by software like ChatGPT, generative AI models are trained on vast datasets and can perform text generation tasks that traditional AI models could not. Biswas emphasizes that generative AI models such as large language models (LLMs) unlock possibilities in intelligent document processing that remained out of reach even with advanced AI tools like Textract and Comprehend.

According to Biswas, intelligent document processing uses AI tools like Amazon Textract and Comprehend to extract data from diverse documents efficiently and accurately, while generative AI, through large language models (LLMs), builds on that extraction and fills the gaps these ML models leave open.

Aside from effectively extracting data, generative AI in document processing unlocks capabilities such as summarizing hundreds of pages of a document, contextual question answering, and more. Biswas shared his research insights on generative AI through ODSC, highlighting the AI community’s shift towards generative AI-based document processing using Amazon Bedrock, Amazon’s newest generative AI offering. This research article delves deep into the possibilities of how LLMs are used to enhance AI-powered document processing, and the industry is not holding back on these innovations. Quantiphi, an award-winning Applied AI company, has already improved its document processing platform QDox with generative AI using these AWS AI models.
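One common way to summarize a document that is too long for a single model prompt is to summarize it in pieces and then summarize the summaries. The Python sketch below illustrates that pattern under stated assumptions: the llm argument is a hypothetical callable that takes a prompt string and returns generated text (for example, a thin wrapper around a model hosted on Amazon Bedrock), and the chunk size and prompt wording are placeholders. It is not taken from Biswas’ article or from any AWS reference implementation.

```python
from typing import Callable, List


def chunk_text(text: str, max_chars: int = 4000) -> List[str]:
    """Split text into chunks of at most max_chars, preferring paragraph breaks."""
    chunks: List[str] = []
    current = ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # A single paragraph longer than the limit is cut into fixed-size pieces.
        while len(para) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) > max_chars:
            # Adding this paragraph would overflow, so close the current chunk.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def summarize(text: str, llm: Callable[[str], str], max_chars: int = 4000) -> str:
    """Summarize each chunk, then combine the partial summaries in a final call.

    `llm` is a hypothetical prompt-in, text-out function supplied by the caller.
    """
    chunks = chunk_text(text, max_chars)
    if not chunks:
        return ""
    partials = [llm(f"Summarize the following document excerpt:\n\n{c}") for c in chunks]
    if len(partials) == 1:
        return partials[0]
    combined = "\n".join(partials)
    return llm(f"Combine these partial summaries into one summary:\n\n{combined}")
```

For very long documents the combined partial summaries may themselves exceed the model’s limit, in which case the combining step would need to be applied recursively. The text fed into summarize could come from an extraction step such as Textract.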

Finding the Right Balance

Biswas acknowledges that just as innovation is constant, so is the challenge of complexity. While he has contributed to numerous research works and solutions, he recognizes that the increasing intricacy of data necessitates an unending journey of development and refinement.

“As we push the boundaries of what AI and ML can achieve, the data we grapple with becomes more multifaceted. No matter how advanced, our solutions must be in perpetual evolution to keep pace,” he says.

Recognizing the importance of collective knowledge and the power of community, Biswas extends his expertise beyond the confines of his technical role. He is not just an AI and ML expert but also an influential speaker, taking to the stage to share insights, foster discussions, and glean learnings that can propel the industry forward.

Biswas has appeared at prominent events such as the AWS re:Invent Conference 2022 in Las Vegas and the AWS Global Summit Conference 2022 in San Francisco, where he discussed how AI innovations are changing the document processing landscape. He also presented how generative AI transforms intelligent document processing at the AWS AI/ML Solutions Day Conference 2023 in Palo Alto. For Biswas, these speaking engagements serve as platforms where he not only imparts knowledge but also engages in meaningful dialogues.

“By broadening the discourse, we not only share but also absorb, ensuring that our industry remains at the cutting edge,” Biswas emphasizes.

The AI expert highlights his commitment to continuously improving the AI and ML domain, ensuring that businesses, regardless of their ML proficiency, can benefit from faster and more accurate data extractions, leading to high-quality business decisions while reducing costs.
