Transforming Quality Assurance with Agentic AI

TL;DR: In many organisations, small QA teams are responsible for writing and maintaining test suites across multiple engineering streams. QA capacity stays fixed even as AI agents increase development velocity. This article explores ways to integrate AI agents into existing testing processes to remove the QA bottleneck and reach the market faster, with greater reliability and lower costs.

During an AI-enabled mainframe modernisation engagement with a public sector client, we observed that the legacy environment had evolved with limited documentation, leaving testing activities heavily dependent on critical business knowledge concentrated among a small group of subject-matter experts. Attrition created knowledge gaps that further slowed testing efforts and increased delivery risk.

We used AI to analyse existing system artefacts and build a current-state understanding of application behaviour. In some cases, AI agents created test cases based solely on business requirements documents and architecture plans.

The testing team went from having no test cases to a solid test suite within hours rather than days or weeks, shortening time to market at significantly lower cost.

Introducing AI agents into testing creates uncertainty, because quality validation has traditionally depended on human judgment. However, AI can take over time-consuming execution tasks while humans stay in the loop for control.

Let's rethink the entire testing workflow as a two-layer system.

Reasoning Layer - Predominantly AI

AI agents perform the cognitive heavy lifting in the QA process, providing the scale, speed, and consistency needed to keep pace with AI-accelerated code generation.

They can analyse requirements, user stories, architecture artefacts, code changes, historical defects, and test results to:

Generate test cases and write the actual tests
Diagnose failures
Navigate the live DOM to look for things that could go wrong

Importantly, AI capability need not be confined to the developer workstation. AI agents can operate directly within CI/CD pipelines, automatically analysing pull requests, understanding code changes, generating or updating tests, and identifying potential regressions before software reaches production.
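As a rough illustration of one pipeline step, the sketch below flags changed source files in a pull request that arrive without a matching test change, which is the kind of gap an agent could then be asked to fill. The `src/` and `tests/test_<name>.py` layout and the function name `files_needing_tests` are assumptions made for this example, not a convention the article prescribes.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def files_needing_tests(changed_paths: Iterable[str]) -> List[str]:
    """Return changed source files whose matching test file is not in the same PR.

    Assumed (hypothetical) layout: code under src/, tests under tests/,
    and src/foo.py is covered by tests/test_foo.py.
    """
    changed = {PurePosixPath(p) for p in changed_paths}
    # Names of test files touched by this pull request.
    changed_tests = {p.name for p in changed if p.parts[:1] == ("tests",)}

    missing = []
    for path in sorted(changed):
        is_source = path.parts[:1] == ("src",) and path.suffix == ".py"
        if is_source and f"test_{path.name}" not in changed_tests:
            missing.append(str(path))
    return missing


pr_files = ["src/billing.py", "src/claims.py", "tests/test_claims.py", "README.md"]
print(files_needing_tests(pr_files))  # ['src/billing.py']
```

In a real pipeline the list of changed paths would come from the version control system, and the flagged files would be handed to the agent as candidates for new or updated tests.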

Traditional Testing Tools

Existing testing tools and automation frameworks, such as Playwright, CI/CD pipelines, test runners, and observability platforms, continue to function as usual in the new QA workflow. However, AI changes the quality and quantity of inputs these tools receive, making them significantly more effective.

Governance Layer - Predominantly Human

Human testers help organisations understand what software behaviour means for customers, employees, and business outcomes. The agent surfaces the work; the human governs the outcome.

Traditional QA has often been associated with repetitive execution: running regression suites, validating workflows, maintaining selectors, and documenting defects. AI agents are increasingly capable of automating much of this operational work at scale.

We are now entering a phase where human QA work becomes less about checklists and more about intent.

Human Judgment Still Matters

AI agents can read the DOM, APIs, logs, and test conditions, but they do not inherently understand business intent.

An AI testing agent cannot determine on its own whether:

A tax calculation aligns with regulatory expectations
A claims workflow feels trustworthy to a customer
A banking approval journey creates unnecessary friction
An accessibility experience feels intuitive for real users

Quality extends beyond functional correctness into trust, usability, fairness, and experience design. It requires human judgments grounded in empathy and business understanding.

Creative Exploration Remains a Human Strength

AI systems excel at systematic testing. They can follow patterns, execute instructions consistently, and scale repetitive validation. Humans, however, remain uniquely capable of creative exploration.

Real users rarely behave in perfectly logical ways. Experienced QA professionals instinctively test systems to surface unexpected behaviours:

Behaving like confused first-time users
Exploring edge cases outside documented requirements
Introducing unpredictable action sequences
Identifying workflows that technically function but feel frustrating in practice

Human behavioural thinking becomes even more valuable in AI-native applications, where interactions are conversational and less deterministic than those in traditional software systems.

Humans Define the Rules AI Operates Against

AI testing systems are only as effective as the context and guardrails surrounding them. Humans remain the custodians of intent.

As agentic testing matures, organisations will need humans to explicitly define:

Behavioural expectations
Business invariants
Compliance requirements
Accessibility standards
Acceptable risk boundaries

Humans must create context artefacts such as .cursorrules, CLAUDE.md, governance policies, and testing specifications to encode organisational intent into workflows AI agents can follow.
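One way to make such intent machine-checkable is to write business invariants as small named rules with a human owner. The sketch below is a minimal illustration under assumed names: the `Invariant` type, the `finance-lead` owner, and the order fields `net`, `tax`, and `total` are all hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable, Dict, List

Order = Dict[str, Decimal]


@dataclass(frozen=True)
class Invariant:
    name: str                       # plain-language statement of the rule
    owner: str                      # the human accountable for the rule
    check: Callable[[Order], bool]  # returns True when the rule holds


# Hypothetical rules written by people; agents test against them.
INVARIANTS = [
    Invariant("total equals net plus tax", "finance-lead",
              lambda o: o["total"] == o["net"] + o["tax"]),
    Invariant("tax is never negative", "finance-lead",
              lambda o: o["tax"] >= 0),
]


def violations(order: Order, invariants: List[Invariant] = INVARIANTS) -> List[str]:
    """Return the names of every invariant the observed order breaks."""
    return [inv.name for inv in invariants if not inv.check(order)]


bad_order = {"net": Decimal("100.00"), "tax": Decimal("20.00"), "total": Decimal("125.00")}
print(violations(bad_order))  # ['total equals net plus tax']
```

Keeping the rule name and owner next to the check means a failure report can say which business rule was broken and who decides what happens next.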

AI-Powered Testing in Practice

Here are some examples of how AI agents can support the testing process.

Test Case Generation

An AI agent reads user stories, requirements, and code changes and generates structured test cases. A human then reviews and refines them before anything moves forward. Approved test cases can either go straight to manual execution or feed directly into test code generation.
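The review gate described above could be modelled roughly as follows. This is a sketch under assumed names (`GeneratedCase`, `ReviewStatus`, `approve`, `ready_for_next_stage`), not a reference to any particular tool: agent output starts as a draft, and only cases approved by a named human move forward.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class ReviewStatus(Enum):
    DRAFT = "draft"        # produced by the agent, not yet reviewed
    APPROVED = "approved"  # accepted by a human reviewer
    REJECTED = "rejected"  # sent back for rework


@dataclass
class GeneratedCase:
    case_id: str
    title: str
    steps: List[str]
    expected: str
    source: str  # e.g. the user story the case was derived from
    status: ReviewStatus = ReviewStatus.DRAFT
    reviewer: Optional[str] = None


def approve(case: GeneratedCase, reviewer: str) -> GeneratedCase:
    """Record a human approval; an approval without a named reviewer is refused."""
    if not reviewer.strip():
        raise ValueError("an approval needs a named human reviewer")
    case.status = ReviewStatus.APPROVED
    case.reviewer = reviewer
    return case


def ready_for_next_stage(cases: List[GeneratedCase]) -> List[GeneratedCase]:
    """Only approved cases go on to manual execution or test code generation."""
    return [c for c in cases if c.status is ReviewStatus.APPROVED]


cases = [
    GeneratedCase("TC-1", "User can log in", ["Open login page"], "Dashboard shown", "US-1"),
    GeneratedCase("TC-2", "Locked account", ["Open login page"], "Error shown", "US-1"),
]
approve(cases[0], "alice")
print([c.case_id for c in ready_for_next_stage(cases)])  # ['TC-1']
```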

Test Code Generation

Once you have approved test cases, AI agents generate boilerplate tests in your stack of choice, such as Playwright tests for UI flows or integration tests for API and service-level coverage. A human checks the generated tests before integration.

Always-On Exploratory Testing

Agents continuously perform hypothesis-driven exploratory testing against every pull request or deployment candidate. Instead of exploratory testing occurring only during late-stage QA cycles, organisations gain continuous behavioural validation throughout development.
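A toy version of hypothesis-driven exploration is sketched below: state a hypothesis about the system ("the item count is never negative"), try every short action sequence, and report the first sequence that disproves it. The `Cart` class is a hypothetical system under test with a deliberate bug, and bounded enumeration is used here only because it is easy to follow; an agent would choose sequences far more selectively.

```python
from itertools import product
from typing import List, Optional

ACTIONS = ("add", "remove", "clear")


class Cart:
    """Toy system under test (hypothetical)."""

    def __init__(self) -> None:
        self.items = 0

    def add(self) -> None:
        self.items += 1

    def remove(self) -> None:
        # Deliberate bug: nothing stops a removal from an empty cart.
        self.items -= 1

    def clear(self) -> None:
        self.items = 0


def first_counterexample(max_length: int = 3) -> Optional[List[str]]:
    """Try all action sequences up to max_length, shortest first.

    Hypothesis under test: the item count is never negative.
    Returns the first sequence that breaks it, or None if none is found.
    """
    for length in range(1, max_length + 1):
        for sequence in product(ACTIONS, repeat=length):
            cart = Cart()
            for action in sequence:
                getattr(cart, action)()
                if cart.items < 0:
                    return list(sequence)
    return None


print(first_counterexample())  # ['remove']
```

Run against every pull request, a check like this turns a behavioural expectation into a continuously tested hypothesis, with the failing sequence available as a reproduction for a human to judge.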

Self-Healing Test Infrastructure

When frontend refactors break selectors or UI structures change, agents can analyse failures, identify updated DOM patterns, repair test suites automatically, and raise pull requests for the team to review.
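The repair step can be pictured with a small heuristic: remember what the target element looked like while the test still passed, scan the new DOM snapshot for the closest match, and propose a replacement selector for review. The sketch below uses only Python's standard `html.parser`. The scoring rule, the `min_score` threshold, and the shape of the `known` record are assumptions made for illustration, not how any specific tool works.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple


class ElementCollector(HTMLParser):
    """Collect (tag, attributes) for every start tag in an HTML snapshot."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: List[Tuple[str, Dict[str, Optional[str]]]] = []

    def handle_starttag(self, tag, attrs):
        self.elements.append((tag, dict(attrs)))


def propose_selector(known: dict, html: str, min_score: int = 2) -> Optional[str]:
    """Suggest a CSS selector for an element whose old selector no longer matches.

    `known` is what was recorded while the test passed, for example
    {"tag": "button", "attrs": {"type": "submit", "name": "checkout"}}.
    Returns None when no element is similar enough, so a human decides instead.
    """
    collector = ElementCollector()
    collector.feed(html)

    best, best_score = None, 0
    for tag, attrs in collector.elements:
        # One point for the same tag, one per attribute that still matches.
        score = int(tag == known.get("tag"))
        score += sum(1 for k, v in known.get("attrs", {}).items() if attrs.get(k) == v)
        if score > best_score:
            best, best_score = (tag, attrs), score

    if best is None or best_score < min_score:
        return None
    tag, attrs = best
    # Prefer stable hooks; assumes attribute values contain no double quotes.
    for key in ("data-testid", "id", "name"):
        if attrs.get(key):
            return f'{tag}[{key}="{attrs[key]}"]'
    return None


old_element = {"tag": "button", "attrs": {"id": "submit-btn", "type": "submit", "name": "checkout"}}
new_dom = (
    '<form><button id="checkout-submit" type="submit" name="checkout">Pay</button>'
    '<button id="cancel" type="button" name="cancel">Cancel</button></form>'
)
print(propose_selector(old_element, new_dom))  # button[id="checkout-submit"]
```

The function only proposes a selector. In keeping with the workflow above, the proposal would be raised as a pull request rather than applied silently.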

Final Words

The organisations that succeed in AI-enabled delivery will treat QA as a strategic human-AI capability embedded throughout the software lifecycle.

AI agents handle scale, speed, execution, and maintenance
Humans provide judgment, context, creativity, and governance

Humans move to a higher level of abstraction and ensure software remains aligned with human expectations and business outcomes. The future of QA is human-guided but agent-accelerated.