Agentic Testing — AI Agents That Write Tests, Find Bugs and Self-Heal
Posted on: 5/14/2026 10:17:26 AM
Table of contents
- 1. What Is Agentic Testing?
- 2. Three Generations of Testing — From Scripts to Agents
- 3. Agentic QA Architecture
- 4. Playwright Test Agents — The Planner, Generator, Healer Trio
- 5. MCP — Connecting LLMs to Testing Infrastructure
- 6. Self-Healing Tests — How It Works Under the Hood
- 7. Risk-Based Test Prioritization — Test Smarter, Not More
- 8. Agentic Testing Platforms Compared (2026)
- 9. CI/CD Pipeline Integration
- 10. Challenges and Limitations
- 11. The Future of QA — From Testing to Continuous Assurance
- 12. Conclusion
How many test cases run in your CI/CD pipeline every day? 500? 2,000? And how many of those are flaky — failing today, passing yesterday, nobody knows why? In 2026, a revolution is underway in Quality Assurance: Agentic Testing — where AI Agents don't just run tests, but write tests autonomously, find bugs, and self-heal when the UI changes. Forrester has officially renamed its category from "Continuous Automation Testing" to "Autonomous Testing Platforms", and over $1.5 billion has poured into AI testing startups in the first half of 2026 alone.
1. What Is Agentic Testing?
Agentic Testing is a software testing paradigm where autonomous AI Agents handle the entire testing lifecycle — from reading user stories, generating test cases, executing tests, analyzing results, to self-healing tests when the UI changes — with minimal human intervention.
Unlike traditional test automation where humans write scripts and machines execute them, Agentic Testing leverages LLMs (Large Language Models) combined with reasoning loops to understand the intent of a test rather than just following hardcoded selectors.
The Core Difference
Traditional automation: Human writes page.click('#submit-btn') → machine clicks → if id changes to #btn-submit → test fails → human fixes.
Agentic Testing: Agent understands "click the Submit button" → uses the accessibility tree to find the right element → if UI changes → agent re-discovers the new element → test still passes.
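To make the contrast concrete, here is a minimal, self-contained sketch of intent-based lookup against a simplified accessibility tree. The types and helper below are hypothetical illustrations, not a Playwright API:

```typescript
// Hypothetical sketch: resolving the intent "click the Submit button"
// against a simplified accessibility tree instead of a hardcoded selector.
interface AxNode {
  role: string;  // e.g. "button", "textbox"
  name: string;  // accessible name: visible text or aria-label
  id?: string;   // DOM id, free to change between releases
}

// Intent-based lookup: survives id/class renames as long as the
// element's role and accessible name stay the same.
function findByIntent(tree: AxNode[], role: string, name: string): AxNode | undefined {
  return tree.find(n => n.role === role && n.name === name);
}

// Before the UI change: <button id="submit-btn">Submit</button>
const before: AxNode[] = [{ role: "button", name: "Submit", id: "submit-btn" }];
// After: id renamed to "btn-submit". A CSS selector '#submit-btn' now fails...
const after: AxNode[] = [{ role: "button", name: "Submit", id: "btn-submit" }];

// ...but the intent-based lookup resolves the element in both trees.
console.log(findByIntent(before, "button", "Submit")?.id); // "submit-btn"
console.log(findByIntent(after, "button", "Submit")?.id);  // "btn-submit"
```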
2. Three Generations of Testing — From Scripts to Agents
| Generation | Approach | Characteristics | Representatives |
|---|---|---|---|
| Gen 1: Manual + Scripted | Humans write test scripts manually | Hardcoded selectors, brittle, high maintenance cost | Selenium, Cypress |
| Gen 2: AI-Assisted | Traditional scripts + AI features | Self-healing locators, AI suggestions, still script-based | Testim, Mabl, Katalon |
| Gen 3: AI-Native (Agentic) | Agents write, run, and fix tests | No scripts, no selectors — agents navigate by intent | Momentic, Playwright Agents, QA.tech |
3. Agentic QA Architecture
A complete Agentic QA system consists of 4 main layers, operating in a Plan → Act → Verify loop.
```mermaid
graph TB
    subgraph Product["Product Layer"]
        US["User Stories / Acceptance Criteria"]
    end
    subgraph Agent["Agentic Layer"]
        P["Planner Agent"]
        G["Generator Agent"]
        H["Healer Agent"]
    end
    subgraph Manage["Management Layer"]
        TC["Test Cases Store"]
        R["Results + Reports"]
    end
    subgraph Execute["Execution Layer"]
        PW["Playwright / Selenium"]
        CI["CI/CD Pipeline"]
    end
    US --> P
    P -->|"Plan"| G
    G -->|"Generate Tests"| TC
    TC --> PW
    PW -->|"Run"| R
    R -->|"Failure"| H
    H -->|"Self-Heal"| PW
    style US fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style P fill:#e94560,stroke:#fff,color:#fff
    style G fill:#e94560,stroke:#fff,color:#fff
    style H fill:#e94560,stroke:#fff,color:#fff
    style TC fill:#f8f9fa,stroke:#2c3e50,color:#2c3e50
    style R fill:#f8f9fa,stroke:#2c3e50,color:#2c3e50
    style PW fill:#4CAF50,stroke:#fff,color:#fff
    style CI fill:#4CAF50,stroke:#fff,color:#fff
```
Figure 1: 4-Layer Architecture of an Agentic QA System — From User Story to Self-Healing
3.1. The Plan - Act - Verify Loop
The heart of Agentic Testing is a continuous reasoning loop:
```mermaid
graph LR
    A["Plan: Read story, create test plan"] --> B["Act: Generate test code, execute on browser"]
    B --> C["Verify: Compare results with expected"]
    C -->|"Pass"| D["Report: Log results"]
    C -->|"Fail"| E["Analyze: Root cause analysis"]
    E -->|"UI changed"| F["Heal: Update locators"]
    E -->|"Real bug"| G["Alert: Create bug report"]
    F --> B
    style A fill:#e94560,stroke:#fff,color:#fff
    style B fill:#2c3e50,stroke:#fff,color:#fff
    style C fill:#4CAF50,stroke:#fff,color:#fff
    style D fill:#f8f9fa,stroke:#4CAF50,color:#2c3e50
    style E fill:#ff9800,stroke:#fff,color:#fff
    style F fill:#e94560,stroke:#fff,color:#fff
    style G fill:#ff9800,stroke:#fff,color:#fff
```
Figure 2: Plan-Act-Verify Loop — Agent distinguishes between UI changes and real bugs
The critical step is Analyze: the agent must distinguish between "UI changed but logic is correct" (needs healing) and "logic is wrong, this is a real bug" (needs alerting). This is where LLM reasoning shines — the agent reads the DOM diff, compares it with the original acceptance criteria, and makes a judgment call.
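A toy version of that judgment call can be sketched as a pure function. All names and the heuristic below are illustrative, not a real agent API:

```typescript
// Hypothetical sketch of the Analyze step: decide whether a failure
// should be healed (UI changed) or reported (real bug).
type Verdict = "heal" | "bug";

interface FailureContext {
  selectorStillExists: boolean;      // did the original locator resolve?
  equivalentElementFound: boolean;   // same role + accessible name elsewhere?
  assertionMatchesCriteria: boolean; // does observed behavior satisfy the acceptance criteria?
}

function analyzeFailure(ctx: FailureContext): Verdict {
  // Locator broke but an equivalent element exists: cosmetic UI change.
  if (!ctx.selectorStillExists && ctx.equivalentElementFound) return "heal";
  // Behavior violates the acceptance criteria: a real defect.
  if (!ctx.assertionMatchesCriteria) return "bug";
  // Default to reporting: a false bug report is cheaper than a masked defect.
  return "bug";
}

console.log(analyzeFailure({ selectorStillExists: false, equivalentElementFound: true, assertionMatchesCriteria: true }));  // "heal"
console.log(analyzeFailure({ selectorStillExists: true, equivalentElementFound: false, assertionMatchesCriteria: false })); // "bug"
```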
4. Playwright Test Agents — The Planner, Generator, Healer Trio
Playwright — the most popular testing framework from Microsoft — now ships with 3 specialized AI Agents, forming a complete agentic testing pipeline.
4.1. Planner Agent
The Planner Agent explores your application and automatically generates test plans for one or more user flows.
```typescript
// Illustrative API sketch: '@playwright/test/ai' and planTests are shown
// for concept; verify the actual interface against the current
// Playwright Agents documentation.
import { planTests } from '@playwright/test/ai';

const testPlan = await planTests({
  baseURL: 'https://myapp.com',
  scenarios: [
    'User logs in successfully',
    'User adds product to cart',
    'User completes checkout with credit card'
  ]
});

// Output: detailed test steps
// [
//   { action: 'navigate', url: '/login' },
//   { action: 'fill', selector: '[name="email"]', value: 'test@example.com' },
//   { action: 'click', selector: 'button:has-text("Sign In")' },
//   { action: 'assert', condition: 'url contains /dashboard' }
// ]
```
4.2. Generator Agent
The Generator takes the test plan from the Planner and produces complete, runnable Playwright test code, including assertions, error handling, and test data.
```typescript
// Illustrative API sketch: generateTests is shown for concept; verify
// the actual interface against the Playwright Agents documentation.
import { generateTests } from '@playwright/test/ai';

const testCode = await generateTests({
  plan: testPlan,
  framework: 'playwright',
  language: 'typescript',
  includeAssertions: true
});

// Output: runnable test file
// test('User logs in successfully', async ({ page }) => {
//   await page.goto('/login');
//   await page.getByRole('textbox', { name: 'Email' }).fill('test@example.com');
//   await page.getByRole('textbox', { name: 'Password' }).fill('secure123');
//   await page.getByRole('button', { name: 'Sign In' }).click();
//   await expect(page).toHaveURL(/dashboard/);
// });
```
Why Role-Based Locators?
Notice the agent generates code using getByRole() instead of hardcoded CSS selectors like #login-btn. This is a core strategy — role-based locators hook into the accessibility tree (semantic structure), making them far more resilient when developers rename classes or restructure the DOM.
4.3. Healer Agent — Self-Healing Tests on UI Change
The Healer is the most critical agent in the pipeline. When a test fails, it:
- Captures an accessibility tree snapshot at the point of failure
- Analyzes the cause: broken selector? Removed element? Changed flow?
- Generates a corrected interaction using new role-based locators
- Re-runs the test automatically
```typescript
// Illustrative sketch: healOnFailure is shown for concept; verify the
// actual healing workflow in the Playwright Agents documentation.
import { test } from '@playwright/test';
import { healOnFailure } from '@playwright/test/ai';

test('Checkout flow', async ({ page }) => {
  await healOnFailure(async () => {
    await page.goto('/cart');
    // If the "Checkout" button was renamed to "Proceed to Payment",
    // the Healer auto-detects the change and fixes the locator.
    await page.getByRole('button', { name: 'Checkout' }).click();
    await page.getByRole('textbox', { name: 'Card Number' }).fill('4242...');
    await page.getByRole('button', { name: 'Pay Now' }).click();
  });
});
```
According to Microsoft's published benchmarks, the Healer Agent achieves a success rate of over 75% on selector-related failures — meaning 3 out of 4 tests broken by UI changes are self-healed without any developer intervention.
5. MCP — Connecting LLMs to Testing Infrastructure
Model Context Protocol (MCP) plays a crucial role in modern Agentic QA architecture, serving as the middleware connecting LLMs with the entire testing infrastructure.
```mermaid
graph LR
    LLM["LLM (Claude, GPT)"] -->|"JSON-RPC"| MCP["MCP Server"]
    MCP --> PW["Playwright APIs"]
    MCP --> TD["Test Data Store"]
    MCP --> CI["CI/CD Pipeline"]
    MCP --> DT["Defect Tracker"]
    style LLM fill:#e94560,stroke:#fff,color:#fff
    style MCP fill:#2c3e50,stroke:#fff,color:#fff
    style PW fill:#4CAF50,stroke:#fff,color:#fff
    style TD fill:#4CAF50,stroke:#fff,color:#fff
    style CI fill:#4CAF50,stroke:#fff,color:#fff
    style DT fill:#4CAF50,stroke:#fff,color:#fff
```
Figure 3: MCP Server as the orchestration layer connecting LLMs to testing tools
Microsoft has officially released the Playwright MCP Server, enabling AI Agents to control browsers through a standardized protocol. This means any LLM supporting MCP — Claude, GPT, Gemini — can become a test agent without custom integration.
```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest", "--headless"]
    }
  }
}
```
Practical Benefits of MCP in Testing
- Portable: a test agent suite can switch between LLMs without rewriting code.
- Stateful: MCP supports persistent state, ideal for exploratory testing and long-running sessions.
- Composable: combine the Playwright MCP server with other MCP servers (database, API, file system) to create complex test scenarios.
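On the wire, each agent action travels as an MCP `tools/call` request over JSON-RPC. The fragment below sketches the shape of such a request; the tool and argument names are illustrative, so check the Playwright MCP server's tool list for the exact schema:

```json
{
  "jsonrpc": "2.0",
  "id": 7,
  "method": "tools/call",
  "params": {
    "name": "browser_click",
    "arguments": {
      "element": "Submit button"
    }
  }
}
```

Because every tool follows this same request shape, swapping the LLM on the other end of the connection requires no changes to the testing side.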
6. Self-Healing Tests — How It Works Under the Hood
Self-healing is where Agentic Testing truly outshines traditional test automation. Here's the detailed flow when a test fails:
```mermaid
graph TB
    A["Test Run Fails"] --> B["Capture Accessibility Tree"]
    B --> C["Compare with previous snapshot"]
    C --> D{"Failure type?"}
    D -->|"Selector not found"| E["Find equivalent element"]
    D -->|"Assertion fail"| F["Compare actual vs expected"]
    D -->|"Timeout"| G["Check network, loading state"]
    E --> H["Generate new locator (role-based)"]
    H --> I["Retry test"]
    F --> J{"Logic changed?"}
    J -->|"Yes"| K["Create Bug Report"]
    J -->|"No"| H
    G --> L["Adjust wait strategy"]
    L --> I
    I --> M{"Passes?"}
    M -->|"Yes"| N["Update test + commit"]
    M -->|"No"| K
    style A fill:#ff9800,stroke:#fff,color:#fff
    style D fill:#2c3e50,stroke:#fff,color:#fff
    style E fill:#e94560,stroke:#fff,color:#fff
    style H fill:#e94560,stroke:#fff,color:#fff
    style K fill:#ff9800,stroke:#fff,color:#fff
    style N fill:#4CAF50,stroke:#fff,color:#fff
    style I fill:#4CAF50,stroke:#fff,color:#fff
```
Figure 4: Self-Healing Flow — from failure detection to auto-fix or bug report
6.1. Accessibility Tree — The Key to Self-Healing
Instead of relying on CSS selectors or XPath (which break when the DOM changes), Agentic Testing uses the accessibility tree — a semantic structure browsers build for screen readers. Each element is described by:
- Role: button, textbox, link, heading...
- Name: visible text or aria-label
- State: enabled, disabled, checked, expanded...
When a developer changes <button id="submit">Submit</button> to <button class="btn-primary" data-action="submit">Submit</button>, the CSS selector #submit breaks immediately, but the accessibility tree still sees button[name="Submit"] — the agent locates the element precisely.
7. Risk-Based Test Prioritization — Test Smarter, Not More
Agentic Testing doesn't just write and fix tests — it also decides what to test first. Instead of running all 5,000 test cases on every commit, the agent analyzes:
- Code diff: Which files changed? Which modules are affected?
- Historical failure rate: Which tests fail most frequently?
- Business impact: Which flows matter most (checkout, payment, auth)?
- Dependency graph: Which modules depend on the changed code?
Result: up to 40% reduction in test execution time in CI/CD while maintaining coverage for high-risk areas.
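A simplified scoring heuristic over those four signals might look like the sketch below. The weights, signal names, and threshold are assumptions for illustration, not a documented algorithm:

```typescript
// Hypothetical risk-scoring sketch: weight each test by the signals
// the agent collects, then run the highest-scoring tests first.
interface TestSignals {
  touchesChangedCode: boolean;     // test covers a module in the code diff
  historicalFailureRate: number;   // 0..1, from past CI runs
  businessCritical: boolean;       // checkout, payment, auth flows
  dependsOnChangedModule: boolean; // transitive dependency on the diff
}

function riskScore(s: TestSignals): number {
  let score = 0;
  if (s.touchesChangedCode) score += 0.4;
  score += 0.2 * s.historicalFailureRate;
  if (s.businessCritical) score += 0.25;
  if (s.dependsOnChangedModule) score += 0.15;
  return score; // 0..1
}

// Select the subset to run on this commit (threshold is an assumption).
function selectTests<T extends TestSignals>(tests: T[], threshold = 0.5): T[] {
  return tests
    .filter(t => riskScore(t) >= threshold)
    .sort((a, b) => riskScore(b) - riskScore(a));
}
```

The same scores can feed the scheduling decision: everything above the threshold runs per-commit, the rest is deferred to the nightly regression run.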
Don't Skip Low-Risk Entirely
Risk-based prioritization doesn't mean dropping low-risk tests entirely. Best practice: run high-risk tests on every commit, full regression nightly or before release. The agent can auto-adjust schedules based on team velocity and deadlines.
8. Agentic Testing Platforms Compared (2026)
| Platform | Generation | Key Strengths | Best For |
|---|---|---|---|
| Playwright Agents | Gen 3 (Open-source) | Built-in Planner/Generator/Healer, free, large community | Teams wanting control and customization |
| Momentic | Gen 3 (AI-native) | Zero-script, natural language authoring, E2E + visual + API | Startups, fast-shipping teams |
| Katalon | Gen 2 - 3 (All-in-one) | Web + mobile + API + desktop, mature ecosystem, AI layer | Enterprise, multi-platform portfolios |
| QA.tech | Gen 3 | Autonomous agent explores app and finds bugs without test cases | Exploratory testing, MVPs |
| Testim (Tricentis) | Gen 2 | Smart locators, visual testing, Tricentis ecosystem | Enterprises already using Tricentis |
9. CI/CD Pipeline Integration
Agentic Testing reaches its full potential when integrated directly into CI/CD pipelines, creating a fully automated feedback loop.
```yaml
# .github/workflows/agentic-test.yml
# NOTE: the --ai-* flags below are illustrative; check your platform's
# CLI for the actual agent commands.
name: Agentic QA Pipeline

on:
  pull_request:
    branches: [main]

jobs:
  agentic-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Start application
        run: docker compose up -d

      - name: Run Planner Agent
        run: |
          npx playwright test --ai-plan \
            --scenarios="checkout,login,search" \
            --output=test-plan.json

      - name: Run Generator Agent
        run: |
          npx playwright test --ai-generate \
            --plan=test-plan.json \
            --output=tests/generated/

      - name: Execute with Healer
        run: |
          npx playwright test tests/generated/ \
            --ai-heal \
            --reporter=json \
            --output=results.json

      - name: Upload Results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: test-results
          path: results.json
```
Real-World Pipeline Flow
PR opened → Planner reads changed files, generates test plan for affected modules → Generator produces test code → Executor runs tests on headless browser → on failure, Healer self-fixes and retries → results reported as PR comment. Developers only intervene when Healer can't fix it (real bug).
10. Challenges and Limitations
Agentic Testing is powerful but not a silver bullet. Key challenges to keep in mind:
10.1. LLM Reliability
LLMs can hallucinate — generating tests with incorrect logic, wrong assertions, or missed edge cases. Agentic Testing is a collaborator, not a complete replacement for QA engineers. Humans still need to review test plans and validate critical assertions.
10.2. Token Costs
Every plan/generate/heal cycle consumes LLM tokens. A large test suite can cost hundreds of dollars per month in API fees. Cost strategy: use lightweight models (Haiku, GPT-4o-mini) for daily heal/generate, powerful models (Opus, GPT-4) for planning and review.
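The tiering strategy above can be sketched in a few lines. The model identifiers are placeholders, and the routing rule is an assumption, not a documented policy:

```typescript
// Hypothetical model-routing sketch: expensive reasoning models for
// planning and review, cheap models for the high-volume generate/heal
// loop that runs on every CI failure.
type AgentTask = "plan" | "review" | "generate" | "heal";

function pickModel(task: AgentTask): string {
  if (task === "plan" || task === "review") return "claude-opus";  // placeholder id
  return "claude-haiku";                                           // placeholder id
}

console.log(pickModel("plan")); // "claude-opus"
console.log(pickModel("heal")); // "claude-haiku"
```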
10.3. Security
Agents need access to your application (credentials, test data, staging environment). You must ensure:
- Tests run in isolated environments (staging)
- Credentials managed via secret managers, never hardcoded
- Agents have no access to real production data
Common Implementation Mistakes
1. Over-trusting AI: Skipping test plan review → agent tests wrong flow → false confidence.
2. Dropping manual testing entirely: Human exploratory testing still catches bugs that automation never thinks of.
3. No timeout for healing: Agent tries to heal indefinitely → CI pipeline stuck for hours.
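Mistake #3 has a simple guard: cap how long the agent may spend healing so the pipeline fails fast instead of hanging. A generic sketch (this helper is not a Playwright API, and `healAttempt` in the usage comment is hypothetical):

```typescript
// Bound any healing attempt with a hard timeout so CI never hangs.
async function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined = undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`healing timed out after ${ms}ms`)), ms);
  });
  try {
    // Whichever settles first wins: the work, or the deadline rejection.
    return await Promise.race([work, deadline]);
  } finally {
    if (timer !== undefined) clearTimeout(timer); // don't keep the event loop alive
  }
}

// Usage sketch: give a (hypothetical) heal attempt five minutes, then
// surface the original failure instead of blocking the pipeline:
// await withTimeout(healAttempt(), 5 * 60_000);
```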
11. The Future of QA — From Testing to Continuous Assurance
As agents take over authoring and maintenance, testing stops being a discrete phase and becomes continuous assurance: agents validate every change, explore the application around the clock, and report on quality continuously rather than only at release gates. The QA function increasingly defines intent and guardrails while agents handle execution.
12. Conclusion
Agentic Testing marks a turning point in QA history — from "humans write tests, machines run tests" to "AI agents autonomously handle the entire testing lifecycle." With open-source Playwright Test Agents, MCP standardizing connectivity, and AI-native platforms like Momentic already production-ready, the barrier to entry has never been lower.
However, the most important takeaway: Agentic Testing accelerates QA, it doesn't replace QA engineers. The QA role is shifting from "test writer" to "AI agent supervisor" — setting strategy, reviewing results, and intervening when needed. It's evolution, not replacement.
Disclaimer: The opinions expressed in this blog are solely my own and do not reflect the views or opinions of my employer or any affiliated organizations. The content provided is for informational and educational purposes only and should not be taken as professional advice. While I strive to provide accurate and up-to-date information, I make no warranties or guarantees about the completeness, reliability, or accuracy of the content. Readers are encouraged to verify the information and seek independent advice as needed. I disclaim any liability for decisions or actions taken based on the content of this blog.