AI Code Generation Is Outpacing QA. SmartBear’s BearQ Aims to Close the Gap | TFiR

AI codegen is outpacing QA. SmartBear CEO Dan Faulkner explains how BearQ's agentic testing system delivers continuous application integrity at AI speed.

By Monika Chauhan May 11, 2026

Software has never shipped faster. AI coding tools like Claude Code and Codex have compressed development timelines so dramatically that features once built in days now emerge in hours. But that velocity is revealing a structural flaw: the rest of the software development lifecycle — particularly quality assurance — was never designed to operate at this speed. The result is a widening quality gap. Code accumulates faster than it can be validated. Manual testers can’t scale. Script-based automation breaks every time an application is updated. And the cost of escaped bugs — in lost revenue, regulatory exposure, and brand damage — is growing alongside the pace of development.

This is not a temporary adjustment problem. SmartBear’s survey of 273 software quality decision-makers found that 93% have adopted AI coding tools, yet 92% still depend on manual testing as part of their QA process. Seven in ten say application quality is already suffering. Sixty-eight percent fear the bottleneck will worsen. The math is simple: AI has industrialized code generation without a corresponding leap in how that code gets validated.

SmartBear, a company trusted by more than 16 million developers and testers across 32,000 organizations — including 75% of the world’s largest financial institutions — is now betting that the answer to an AI-created problem is an AI-native solution. Its new product, BearQ, is an agentic QA system that deploys always-on AI agents to autonomously explore, learn, and continuously test applications. It is the centerpiece of the SmartBear Application Integrity Core, a platform designed to deliver measurable assurance that software does what it was intended to do — at any speed, at any scale.

The conceptual shift BearQ represents is significant. Traditional testing asks: does the code work? Application integrity asks a harder question: does the application experience match what the developer actually intended? That reframing — from code-level validation to intent validation — sits at the heart of how SmartBear is repositioning the role of QA in the AI era.

The stakes are high. Organizations that cannot close this gap face a binary and painful choice: give back the velocity gains that AI code generation delivered, or ship undertested software and absorb the consequences. Neither is acceptable at enterprise scale.

The Guest: Dan Faulkner, CEO at SmartBear

Key Takeaways

AI code generation has created a massive QA bottleneck: 93% of enterprises use AI coding tools, but 92% still rely on manual testing — a structural mismatch that is getting worse, not better.
Script-based testing tools like Selenium and Playwright are structurally inadequate for AI-speed development — they’re brittle, have no memory, and can’t scale to cover the volume of code AI generates.
BearQ deploys a coordinated four-agent system (Explorer, QA Lead, Tester, Orchestration) that autonomously learns an application, generates and maintains tests, self-heals when the application changes, and reports continuously — without a human sitting in front of a GUI.
Intent validation is the new governance frontier: BearQ verifies that what AI code generators actually produce matches what the developer asked for — catching hallucinations, scope creep, and silent additions from frontier models.
Application integrity is the new quality standard: the industry is shifting from “does the code pass unit tests?” to “does the application continuously behave as intended across real user journeys?”

***

[expander_maker]

In this exclusive interview with Swapnil Bhartiya at TFiR, Dan Faulkner, CEO at SmartBear, discusses the widening quality gap created by AI-driven software development, explains why traditional QA approaches — manual testing, script-based automation, DIY testing stacks — are structurally unequipped to keep pace, and introduces BearQ™, SmartBear’s agentic, always-on QA system built to deliver continuous application integrity at AI speed and scale.

The State of QA in an AI-Accelerated SDLC

AI code generation has transformed one leg of the software development lifecycle — the writing of code itself. But every other step, from specification to review to testing to deployment, remains largely unchanged. The result is a bottleneck that was visible before AI and has now become a crisis.

Q: What are you seeing happen to traditional QA teams as AI code generation accelerates development?

Dan Faulkner: “We recently conducted quite a large survey and 92% of the enterprises who responded told us that they had a hybrid approach to QA for their applications, meaning they’re using some automation but are still using manual testing as part of their process. And so the vast majority of organizations continue to use manual testing. And at the same time, even before AI, nobody felt like they were doing as much testing as they would like to. There’s always a trade-off between velocity requirements (I’ve got to get this thing out there in the wild and into the hands of my customers) and coverage. So very few companies actually were achieving the functional coverage goals that they wanted to with their testing strategies. So now AI codegen has been adopted by almost everybody. 93% of the people in our survey said they were using AI codegen. And that has massively accelerated one leg of the SDLC: the creation of code. So we have essentially created an even bigger bottleneck in the quality department. And that’s really the problem that SmartBear is trying to address: make sure you don’t have to forgo all of the velocity benefits that you achieved with AI codegen by having to have this big bottleneck in QA.”

Why DIY and Script-Based Testing Falls Short

Many engineering teams built their quality workflows around open source automation tools — Selenium, Playwright, and similar scripting frameworks. These approaches served a pre-AI world reasonably well. They do not serve the AI-accelerated SDLC.

Q: What happens to homegrown and script-based testing approaches as teams scale under AI-driven development?

Dan Faulkner: “What happens frequently is people will use an open source solution, a Selenium, a Playwright, something like that. And you can do a lot with an SDET and those scripts. The first challenge with them is that script-based testing tends to be brittle. And so people often find that rather than increasing their functional coverage, they have to go back and rebuild the tests every time the application is updated. The second is that they don’t have memory. So you actually need a repository of your tests that shows the evolution of the test suite over time so that you’re not starting from scratch every time. So the combination of brittleness and a lack of memory in open source solutions really just renders those as not adequate for solving the whole problem in an AI-enabled lifecycle. So we are not here to say that nobody should ever use script-based testing, but it’s not going to solve the problem that the increased velocity of AI codegen poses for us.”

Q: So what is the gap, and why can’t it be solved by hiring more people or writing more scripts?

Dan Faulkner: “If you’re using it, you can keep using it for what you’ve got. Similarly, if you are using lots of manual testing, you can absolutely keep using your manual testers. But the gap — the gap between what you want to cover and what’s really being covered — that can’t be solved by more scripts and it can’t be solved by throwing people at the problem because it would require an impossible amount of hiring. And so that’s where our autonomous testing solution, BearQ, fits in. We’re essentially saying we will help you keep pace with the accelerated AI codegen and enable you to fill the gap that the AI codegen bottleneck creates in your quality coverage.”

How BearQ Works: The Four-Agent Architecture

BearQ is not a traditional automated testing tool with a new AI wrapper. It is a fundamentally different architecture — a team of coordinated AI agents that operate continuously and autonomously, modeled explicitly on how a human QA team would be structured.

Q: How does BearQ differ from traditional test automation, and how does it integrate with existing setups?

Dan Faulkner: “BearQ is a completely different approach to quality. We view it as the agentic analog to AI code generation and agentic code generation. What BearQ literally is, is an always-on set of QA teammates. So the minimum unit in BearQ is a team of four agents. There’s an agent that autonomously explores your application: you just give it credentials, it will learn how your application works and understand what your application is intended to do, and it builds a very thorough understanding. A second agent creates a testing strategy and a set of test cases from the knowledge that the exploration agent has gleaned. And then there’s a test execution agent which actually runs the tests, and they’re all managed by an orchestration agent. Just like a QA team has a QA manager, there’s this orchestration agent that keeps them in sync.”
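
The division of labor Faulkner describes can be sketched in a few lines of Python. This is a toy illustration, not BearQ's implementation or API: the class names `Explorer`, `QALead`, `Tester`, and `Orchestrator`, the `PlannedTest` record, the dictionary-based `FAKE_APP`, and every method are hypothetical, chosen only to show how exploration might feed test design, which feeds execution, under one coordinator.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a real application: screen name -> actions it offers.
FAKE_APP = {
    "login": ["submit_credentials"],
    "dashboard": ["open_report", "log_out"],
}


@dataclass(frozen=True)
class PlannedTest:
    screen: str
    action: str


class Explorer:
    """Learns what the application offers (here, by copying the fake app)."""

    def explore(self, app):
        return {screen: list(actions) for screen, actions in app.items()}


class QALead:
    """Turns the explorer's knowledge into a list of planned tests."""

    def plan(self, knowledge):
        return [
            PlannedTest(screen, action)
            for screen, actions in sorted(knowledge.items())
            for action in actions
        ]


class Tester:
    """Executes each planned test against the application."""

    def run(self, app, planned):
        # A test passes if the action it targets exists on that screen.
        return {t: t.action in app.get(t.screen, []) for t in planned}


class Orchestrator:
    """Keeps the other three agents in sync and produces a report."""

    def __init__(self):
        self.explorer, self.lead, self.tester = Explorer(), QALead(), Tester()

    def cycle(self, app):
        knowledge = self.explorer.explore(app)
        planned = self.lead.plan(knowledge)
        results = self.tester.run(app, planned)
        return {
            "planned": len(planned),
            "passed": sum(results.values()),
            "failed": [t for t, ok in results.items() if not ok],
        }


report = Orchestrator().cycle(FAKE_APP)
print(report)
```

In this sketch the orchestrator is the only object a caller touches, mirroring the "QA manager" role in the quote. A real system would explore a live application rather than a dictionary.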

Q: What does BearQ actually do once it’s deployed — what is the continuous cycle?

Dan Faulkner: “What that enables this little team of agents to do is keep pace at any scale and velocity that you want to with your application, fill in all the functional coverage gaps that you have. And every time your application is changed, automatically update the knowledge about the application, retire the tests that no longer make sense, add new tests that are needed, self-heal tests that need self-healing. And so it’s just a fundamentally different approach because all the other approaches — script-based, using low-code/no-code testing tools, or manual testing — are all predicated on the idea of a person sitting in front of a GUI and interacting with the application. That is not what happens with BearQ. BearQ runs fully autonomously and actually just prepares reports and data for our customers that say, ‘Here’s what I’ve done.’ So it can fit in with and absorb the tests that you already have. Or if you want to start from scratch, it can start from scratch.”
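
The retire, add, and self-heal cycle described above can be illustrated with a small reconciliation function. This is a simplified sketch under stated assumptions, not how BearQ works internally: the function name `reconcile`, the `(screen, action)` representation of a test, and the rule that "healing" means following an action that moved to another screen are all hypothetical.

```python
def reconcile(tests, new_app):
    """Update a test suite after the application changes.

    tests:   dict of test name -> (screen, action) the test exercises
    new_app: dict of screen -> list of actions, after the change
    Returns (updated_tests, log).
    """
    updated, log = {}, []
    for name, (screen, action) in tests.items():
        if action in new_app.get(screen, []):
            updated[name] = (screen, action)  # still valid, keep as is
            continue
        # Self-heal: the action still exists but now lives on another screen.
        new_home = next(
            (s for s, acts in sorted(new_app.items()) if action in acts), None
        )
        if new_home is not None:
            updated[name] = (new_home, action)
            log.append(f"healed {name}: {screen} -> {new_home}")
        else:
            log.append(f"retired {name}")  # the action no longer exists anywhere

    # Add tests for anything in the new application that nothing covers yet.
    covered = set(updated.values())
    for screen, actions in sorted(new_app.items()):
        for action in actions:
            if (screen, action) not in covered:
                name = f"test_{screen}_{action}"
                updated[name] = (screen, action)
                log.append(f"added {name}")
    return updated, log


old_tests = {
    "test_login_submit": ("login", "submit_credentials"),
    "test_dashboard_export": ("dashboard", "export_csv"),
    "test_dashboard_legacy": ("dashboard", "legacy_widget"),
}
changed_app = {
    "login": ["submit_credentials"],
    "reports": ["export_csv"],      # export moved here from the dashboard
    "dashboard": ["open_report"],   # new action; legacy_widget was removed
}

updated_tests, change_log = reconcile(old_tests, changed_app)
for entry in change_log:
    print(entry)
```

The log plays the role of the "Here's what I've done" report in the quote: one test is healed, one retired, and one added, with no person editing scripts by hand.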

Governance, Human Oversight, and Intent Validation

The shift to autonomous testing raises immediate governance questions — particularly for organizations in regulated industries. SmartBear has designed BearQ with structured guardrails, contextual controls, and complete transparency into agent activity. But Faulkner also identifies a deeper governance issue the industry has not yet fully grappled with: intent validation.

Q: How does BearQ handle human oversight and compliance requirements for organizations in regulated industries?

Dan Faulkner: “The human in the loop is absolutely required. You cannot run an AI SDLC as a black box. BearQ always shows its work. It will prepare summary reports, but it’s possible for people to drill in to any level of detail that they want to, to see how it’s made its decisions and it shows its work.”

Q: You mentioned intent validation — what is that, and why does it matter for AI-generated code specifically?

Dan Faulkner: “A second point of governance that I don’t think is getting enough coverage is this notion of intent validation. When you are doing prompt engineering and driving code generation at the prompt level — and power users of Claude Code and Codex are now working with no IDE, they’re just having discussions with these agents — how do you validate that what has been written is everything that you asked for and nothing you didn’t ask for? This notion of intent validation, has it really built what I wanted, is very, very important. And even the frontier models have a propensity to hallucinate, to ignore certain instructions, to add some stuff in because they’re trying to be helpful that maybe you didn’t ask for. So it’s very, very important that we actually validate the thing that comes out the backend of a codegen system is exactly what you wanted. And BearQ can really help there because it explores the application autonomously and builds this knowledge representation of what’s really been built, and then the tests reflect what’s really been built. So it’s a great way to validate that what has come out is really what you wanted. And we think that’s a super important part of the governance story as well.”
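
At its simplest, "everything that you asked for and nothing you didn't ask for" is a two-way set comparison between what was requested and what was observed in the built application. The sketch below assumes both sides can be reduced to comparable feature names, which is a strong simplification; `validate_intent` and the feature labels are hypothetical and are not part of any SmartBear product.

```python
def validate_intent(requested, observed):
    """Compare requested features with features observed in the built app."""
    requested, observed = set(requested), set(observed)
    return {
        "missing": sorted(requested - observed),       # asked for, not built
        "unrequested": sorted(observed - requested),   # built, never asked for
        "matches_intent": requested == observed,
    }


result = validate_intent(
    requested=["email_login", "password_reset"],
    observed=["email_login", "social_login"],
)
print(result)
```

Here the generated application dropped `password_reset` and silently added `social_login`, so both lists are non-empty and `matches_intent` is false. The hard part in practice is producing the `observed` side, which is what autonomous exploration is meant to supply.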

Defining Application Integrity

SmartBear has introduced a new framing for software quality in the AI era: application integrity. It is a shift from code-level validation to continuous, intent-driven assurance that the full application experience works as intended.

Q: Your survey found 70% of software experts say application quality is already suffering. What does “application integrity” actually mean, and what are you validating against?

Dan Faulkner: “Our notion is that as both the velocity of coding increases and the abstraction of coding increases, the integrity of applications will start to break down. And what we mean by integrity is: does it work, and does it do what you want it to do? So when we strive for application integrity, we are seeking to have the assurance that our application just works as intended — and to have that continuous assurance, because if the application is being iterated on time and time and time again, with multiple rewrites and huge amounts of code churn, we need to know continuously that it works and it’s doing what we intended. With respect to the validation, we’re not trying to validate against a benchmark. We’re trying to enable our customers to validate that the code and the application that has come out of the AI coding agent is actually what they intended — validating the functionality as what the developer intended the application to do.”

The Business Risk of Doing Nothing

The consequences of not closing the quality gap are not abstract. They manifest as a forced choice between two outcomes, neither of which organizations can sustain: give back the velocity gains AI delivered, or ship undertested software and absorb the consequences.

Q: What are the business risks if organizations don’t close this gap — what is actually at stake?

Dan Faulkner: “The risks of not closing the gap are essentially a massive amplification of the risks that have always been there. There are two types of business risk that an organization faces when deciding to release their application. The first thing they have to do is say, ‘Well, if I release this and I haven’t tested it fully, what’s the probability and the impact of an error for us?’ Now that could be lost revenue, brand reputation damage, lost customers. It could be a regulatory fine. They have to balance the desire to avoid those downsides against speed. They can’t just hold their application internally forever. They’ve got to get the thing out there so people can actually use it. And so they’ve constantly been making this trade-off. Now that trade-off was difficult before AI came around. Right now, they are generating all of these improvements in the application faster than they ever have, and it’s become much harder for them to actually answer that question. There’s this pileup of new capabilities that they want to get out to the market, but they just don’t have the capacity to test them properly. So you will either find delays that they don’t feel like they can live with (they’ll give back all of the velocity gains they got from code generation) or they’re going to have to take bigger risks and deploy untested software.”
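
The "probability and impact" question in the quote is an expected-loss comparison, and a short worked example makes the trade-off concrete. All figures below are invented for illustration and do not come from SmartBear's survey; the function names are hypothetical as well.

```python
def expected_loss(probability_of_escape, impact_cost):
    """Expected cost of a defect escaping: probability times impact."""
    return probability_of_escape * impact_cost


def release_now_is_cheaper(probability_of_escape, impact_cost,
                           delay_days, cost_per_day_of_delay):
    """True if shipping now costs less, in expectation, than delaying to test."""
    cost_of_delay = delay_days * cost_per_day_of_delay
    return expected_loss(probability_of_escape, impact_cost) < cost_of_delay


# Illustrative numbers only: a 25% chance of a 40,000 incident
# versus a 5-day delay that costs 1,500 per day.
thin_coverage = release_now_is_cheaper(0.25, 40_000, 5, 1_500)

# Same release, but better coverage has halved the chance of an escaped defect.
better_coverage = release_now_is_cheaper(0.125, 40_000, 5, 1_500)

print(thin_coverage, better_coverage)
```

With thin coverage the expected loss (10,000) exceeds the cost of the delay (7,500), so the team either waits or accepts the larger risk. Halving the escape probability brings the expected loss to 5,000 and makes shipping now the cheaper option, which is the case for raising coverage rather than choosing between delay and risk.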

Warning Signals Your QA Strategy Is No Longer Fit for Purpose

For engineering and QA leaders, Faulkner offers two concrete signals that a current testing strategy has been outpaced by the AI-accelerated SDLC.

Q: What are the earliest warning signals that a team’s QA strategy isn’t built for AI-speed development?

Dan Faulkner: “It would be time pressure. Are they getting pressure from the business to move application releases out more quickly? And if they find that that’s worsening as AI codegen is speeding things up, then they’re certainly going to have to make adjustments — and most companies will not allow them to go and hire endlessly to fill that gap. The second issue, and this is something that the market is seeing, is a reduction in quality — more bugs escaping into the wild. And that’s where people are starting to see what we mean when we say the integrity of the application starts to break down. The quality that their customer base is used to degrades because there’s so much time pressure to get the application out and there’s not the capacity to deal with it.”

Practical First Steps for Teams Moving to Agentic Testing

Faulkner’s advice for teams evaluating a move to autonomous testing begins not with vendor selection but with an honest audit of where they currently stand, and an acknowledgment that the world has changed.

Q: What are the practical first steps for organizations that want to move toward agentic testing without disrupting delivery?

Dan Faulkner: “I would actually just do a survey, do an audit of where you are currently. We think about testing in a few different ways. There’s code-level testing — your unit tests and all of the stuff that essentially tests the code in isolation, tells you that a function does what it’s meant to do, tells you that the code is clean. And actually the AI codegen tools are pretty good at that. Next, you’ve got to think about your application — your end-to-end testing, your performance testing, your cross-browser testing, and real device testing if you’re deploying your application across different operating systems and different endpoints — and just be honest about what kind of coverage do we have today and who’s doing it. Are we using tools? Are we using people? And get an honest sense of your level of comfort and recognize that you built that system for a pre-Claude Code/Codex world, and you are now in a post-Claude Code and Codex world. So you will need a new strategy to keep pace and to maintain the integrity of your applications. And I would say you don’t have to throw away the work that you’ve done. You don’t have to get rid of the people that you have, but you do need to address that bottleneck — and that’s where we think agentic testing and autonomous testing will come into play.”

The Broader Picture: A Chaotic but Historic Moment

Faulkner closes with a candid view of the current state of AI adoption in software development — one that acknowledges the board-level pressure, the spending without strategy, and the fundamental imbalance in where AI has and has not yet arrived in the SDLC.

Q: How do you personally view this moment in AI-driven software development?

Dan Faulkner: “It’s chaos. People are rushing to adopt a technology in ways that in prior generations would’ve been viewed as cavalier. There’s a tremendous amount of top-down pressure from boards to just do something with AI. There are pieces of IT budget being preserved as long as it’s AI without really understanding what the outcomes of that will be. If we think about the SDLC as a set of discrete tasks — whether you’re doing agile or waterfall or anything in between — you’ve got to figure out what you want to build, specify it, code it, review it, test it, test the compiled application in all of those environments. You’ve got to make sure that it’s secure and it’s following privacy regulations, then it’s got to be deployed and monitored. Only one of those areas in the SDLC has had this massive boost in acceleration from AI, and that is the writing of the code itself. Every other step of the SDLC has to catch up, and there’s going to be a lot of change and churn and expensive mistakes and surprising successes. So the world is all learning together at the fastest rate of adoption I’ve ever seen in technology.”

[/expander_maker]
