Wednesday, October 15, 2025

RAG to the Rescue: Reimagining Enterprise Unit Test Management with AI (a PNSQC live blog)

There is an entire paper with a lot of great information behind this session, so rather than try to recap all of it, I actively encourage everyone to Read the paper (PDF). Additionally, this paper is a joint effort with Gaurav Rathor, Ajay Chandrakant Bhosle, and Nikhil Y Joshi. With that out of the way...

The authors present a practical framework that blends Retrieval-Augmented Generation (RAG) with the Model Context Protocol (MCP) to help generate and manage unit tests in large enterprises. The approach treats test creation, governance, auditability, and measurable business impact as first-class concerns. Again, there's a lot to dissect here, so check out the paper for the specifics.

Unit testing in big organizations is challenging, no question. Legacy code, multiple frameworks, technical debt, uncertain code coverage, and long fix times are all issues that many of us are familiar with. The authors suggest using AI to assist with test generation as well as test management.

Neat. So how would that work?

The proposal consists of four major parts that work together as a microservice (a minimal sketch of how they might fit together follows the list).

  • AI Agents interpret intent, plan steps, use memory, and invoke tools. They orchestrate the workflow rather than directly hitting data stores.
  • Model Context Protocol acts as a secure, typed gateway between agents and external systems. Think of it as a translator and auditor that controls what context reaches the model.
  • RAG Knowledge Layer retrieves code, prior tests, specs, domain docs, and compliance rules from a vector database to ground prompts in the organization’s reality.
  • LLM Generator builds unit tests from that curated context, returning deployable test classes to clients.
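
To make that division of labor concrete, here is a minimal Python sketch of how the four parts might relate. This is my own illustration of the architecture as described, not the authors' implementation; every class, method, and field name here is hypothetical.

```python
# Hypothetical sketch: agent -> MCP gateway -> RAG layer / LLM generator.
# None of these names come from the paper; they only mirror the roles above.
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class RetrievedContext:
    """Curated context the RAG layer returns for a target class."""
    source_code: str
    related_snippets: list[str] = field(default_factory=list)
    example_tests: list[str] = field(default_factory=list)


class KnowledgeLayer(Protocol):
    """RAG layer: retrieves code, prior tests, specs, and domain docs."""
    def retrieve(self, package: str, class_name: str) -> RetrievedContext: ...


class TestGenerator(Protocol):
    """LLM generator: turns curated context into a unit test class."""
    def generate_tests(self, context: RetrievedContext) -> str: ...


class McpGateway:
    """MCP-style gateway: the only path between the agent and external
    systems. It mediates every call and records an audit trail."""

    def __init__(self, knowledge: KnowledgeLayer, generator: TestGenerator):
        self._knowledge = knowledge
        self._generator = generator
        self.audit_log: list[str] = []

    def fetch_context(self, package: str, class_name: str) -> RetrievedContext:
        self.audit_log.append(f"retrieve {package}.{class_name}")
        return self._knowledge.retrieve(package, class_name)

    def generate(self, context: RetrievedContext) -> str:
        self.audit_log.append("generate_tests")
        return self._generator.generate_tests(context)


class TestAgent:
    """Agent: orchestrates the workflow; never touches data stores directly."""

    def __init__(self, gateway: McpGateway):
        self._gateway = gateway

    def create_tests_for(self, package: str, class_name: str) -> str:
        context = self._gateway.fetch_context(package, class_name)
        return self._gateway.generate(context)
```

The point of routing everything through the gateway object is the auditability claim above: if the agent can only reach the vector store and the model through one typed, logged chokepoint, you can control and review exactly what context reached the LLM.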

Given a request to create tests for a class, the agent pulls the file and metadata, queries the vector database by package and class, and retrieves related classes, utility code, mocks, example tests, and notes. That bundle becomes a structured prompt, delivered to the LLM through MCP. The output is a test class that aligns with domain logic and existing patterns.
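
The retrieval-and-prompt step is the heart of that flow. Below is a hedged sketch of what folding the retrieved bundle into a structured prompt could look like; the section headings, function name, and example strings are all my own invention, not anything from the paper.

```python
# Hypothetical prompt assembly for the retrieved bundle described above.
def build_prompt(target_source: str,
                 related_classes: list[str],
                 example_tests: list[str],
                 notes: list[str]) -> str:
    """Fold the retrieved bundle into one structured prompt for the LLM."""
    sections = [
        "## Class under test\n" + target_source,
        "## Related classes, utilities, and mocks\n" + "\n\n".join(related_classes),
        "## Existing test patterns to follow\n" + "\n\n".join(example_tests),
        "## Domain notes and compliance rules\n" + "\n".join(notes),
        "## Task\nWrite a complete unit test class for the class under test, "
        "matching the existing patterns above.",
    ]
    return "\n\n".join(sections)


# Illustrative usage with placeholder content:
prompt = build_prompt(
    target_source="public class InvoiceService { ... }",
    related_classes=["public class Invoice { ... }"],
    example_tests=["class InvoiceServiceTest { ... }"],
    notes=["Amounts are stored in cents; never use floating point."],
)
print(prompt.splitlines()[0])  # -> ## Class under test
```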

The pilot for this system was aimed at a specific and critical microservice. The team automated unit test generation, which delivered a number of quick coverage wins, and it also helped them track time saved and defect trends discovered.

According to Gaurav, their ROI modeling pegged manual test creation at about four hours per case. With the new RAG-driven approach, they were able to cut that down to about one hour, including review.
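
For a rough sense of what those numbers imply, here is the arithmetic. Only the four-hour and one-hour figures come from the talk; the case volume below is a made-up illustration.

```python
# Back-of-the-envelope ROI from the figures Gaurav cited.
manual_hours_per_case = 4.0   # manual test creation
rag_hours_per_case = 1.0      # RAG-driven, including review

cases = 200  # hypothetical volume, for illustration only
hours_saved = cases * (manual_hours_per_case - rag_hours_per_case)
reduction = 1 - rag_hours_per_case / manual_hours_per_case

print(f"{hours_saved:.0f} hours saved ({reduction:.0%} per case)")
# -> 600 hours saved (75% per case)
```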

Again, there is a lot to take in from these examples and I encourage checking out the paper (linked above) for the specifics.
