I remember the first time I really understood the testing pyramid. It was mid-release: our E2E suite took about 40 minutes to run and failed halfway through. Nobody trusted its results anyway. We had plenty of unit tests, several integration tests, and a few flaky Selenium flows that broke after every couple of sprints of UI tweaks. Looking back, the whole experience was the pyramid working exactly as intended, steering us away from a layer that was too costly to rely on.
But if you work with modern AI testing tools, you may have noticed that some of the assumptions behind the testing pyramid no longer hold. Rethinking it is long overdue.
The Classic Pyramid
The testing pyramid has been one of the most widely accepted models in QA.

At the bottom, you have unit tests that are fast, independent, and cheap. In the middle sits integration testing, which checks whether parts of the system interact properly with each other. At the top, you have end-to-end (E2E) testing, which validates the entire application the way a user experiences it. E2E tests are also the most expensive, so the classic model says to keep them to a minimum.
The core idea is simple: don’t lean heavily on E2E tests, because they’re costly and fragile. For a long time, that advice made complete sense.
In his piece on the testing trophy, Kent C. Dodds shifted some of the focus from unit to integration tests and argued that your tests should reflect how people actually use your application. It’s a useful reframing, but both models still put E2E at the top, as a small, elite layer used sparingly because of its cost. AI is now challenging that assumption.
At their core, both models were built around the same reality: writing an E2E test meant grinding through a long Selenium script, hunting for selectors that constantly broke, watching tests fail after every small UI change, and spending more time debugging the test than writing it. Given all that, minimizing E2E was a sound cost-benefit decision. Rethinking the pyramid makes sense only because that cost structure has changed.
Why Agentic AI Changes the Economics
Most discussions of AI and the testing pyramid focus on what agents can do. I’d rather focus on what they change economically: the “cost per test,” meaning the combined cost of writing, maintaining, and running a test.
AI testing tools like QA.tech change those cost ratios in a fundamental way.
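To make the “cost per test” idea concrete, here is a toy model in Python. The three cost components come from the paragraph above; the class and function names and every number are hypothetical, chosen only to show how cutting writing and maintenance time changes how many E2E tests a fixed budget can support. They are not measurements from QA.tech or any real team.

```python
from dataclasses import dataclass


@dataclass
class PerTestCost:
    """Hypothetical cost of a single test, in engineer-hours."""
    write_hours: float               # one-off cost of authoring the test
    maintain_hours_per_month: float  # fixing the test after UI or flow changes
    run_hours_per_month: float       # triaging runs, reruns, and infra upkeep

    def total(self, months: int) -> float:
        """Total cost of owning the test over a planning horizon."""
        monthly = self.maintain_hours_per_month + self.run_hours_per_month
        return self.write_hours + months * monthly


def affordable_tests(budget_hours: float, cost: PerTestCost, months: int) -> int:
    """How many tests of this kind a fixed budget can pay for over the horizon."""
    return int(budget_hours // cost.total(months))


# Illustrative assumptions only, not benchmarks.
scripted_e2e = PerTestCost(write_hours=6.0, maintain_hours_per_month=1.5, run_hours_per_month=0.5)
agentic_e2e = PerTestCost(write_hours=0.5, maintain_hours_per_month=0.25, run_hours_per_month=0.25)

BUDGET_HOURS = 120.0
MONTHS = 12
print(affordable_tests(BUDGET_HOURS, scripted_e2e, MONTHS))  # 4
print(affordable_tests(BUDGET_HOURS, agentic_e2e, MONTHS))   # 18
```

With the same assumed 120-hour budget, the cheaper profile pays for 18 tests instead of 4. The exact ratio depends entirely on the made-up inputs; the point is that “keep E2E to a minimum” follows from these cost terms, so it shifts when they do.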
Faster Test Creation
Writing a traditional E2E test could take hours for the script alone. With agentic AI testing, you don’t write the script at all. Instead, you describe the test’s intent (its goal) in a natural-language prompt, and the agent handles the rest.
Test creation that used to take hours or days can now take minutes, which removes one of the biggest barriers to writing E2E tests in the first place.
Lower Maintenance Overhead
Maintenance effort drops significantly with agentic systems. Traditional E2E tests fail because of UI changes, timing issues, or missing selectors, whereas AI agents adapt to UI updates instead of breaking on them. As a result, fewer tests fail over time for reasons unrelated to actual bugs. This saves far more time than you might expect and removes a huge source of frustration.
Quick Execution
Execution time also improves. AI agents can run tests in parallel across browsers and environments without manual infrastructure setup, something scripted suites rarely achieved in practice.
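For a sense of what running “in parallel across browsers and environments” involves, the sketch below fans a set of user flows out over a browser-by-environment matrix using Python’s `concurrent.futures`. `run_flow` is a hypothetical placeholder that simply reports success; in a scripted suite it would drive a real browser, and hosting and scaling the workers behind it is the manual infrastructure work mentioned above. The browser and environment names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Assumed test matrix; a real team would use its own targets.
BROWSERS = ["chromium", "firefox", "webkit"]
ENVIRONMENTS = ["staging", "preview"]


def run_flow(flow: str, browser: str, env: str) -> tuple[str, str, str, bool]:
    """Hypothetical placeholder: a real runner would drive a browser here."""
    passed = True
    return (flow, browser, env, passed)


def run_matrix(flows: list[str], max_workers: int = 8) -> list[tuple[str, str, str, bool]]:
    """Run every flow on every browser/environment pair concurrently."""
    jobs = list(product(flows, BROWSERS, ENVIRONMENTS))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as `jobs`.
        return list(pool.map(lambda job: run_flow(*job), jobs))


results = run_matrix(["login", "checkout"])
print(len(results))  # 12: 2 flows x 3 browsers x 2 environments
```

The fan-out itself is short; the hard part in a scripted setup was everything behind `run_flow`: browsers, drivers, and machines to run them on.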
Less Flakiness
Traditional E2E tests often failed simply because they couldn’t handle real-world conditions, such as an element that loads late, an unexpected popup, or a shifted layout. With agentic AI, these flaky failures are much less common, and teams begin to trust their E2E suites again.
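The first of those conditions, waiting for an element that may load late, is easy to get wrong in a script. A fixed sleep is either too short (flaky) or too long (slow); the usual fix is to poll until a condition holds or a timeout expires, which is the idea behind Selenium’s `WebDriverWait`. Below is a minimal, library-free sketch of that polling pattern. `wait_until` is a hypothetical helper, and the “element” is simulated with a timestamp rather than a real page.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float = 5.0, interval: float = 0.05) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds pass.

    Returns as soon as the condition holds, so a fast page costs almost
    nothing, and only reports failure after the full timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Simulated element that "appears" 0.2 seconds from now.
appears_at = time.monotonic() + 0.2


def element_present() -> bool:
    return time.monotonic() >= appears_at


print(wait_until(element_present, timeout=2.0))  # True
print(wait_until(lambda: False, timeout=0.1))    # False
```

A scripted suite needed a call like this, with a sensible timeout, at every step that depends on asynchronous UI; missing even one is a classic source of flakiness.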
So, if your E2E tests are now:
- Faster to write
- Easier to maintain
- More stable to run
… then the original advice to “keep them to a minimum” starts to lose weight, not because it was wrong, but because the constraints it was based on have changed.
The Agentic Testing Pyramid
And as cost constraints change, the pyramid stops looking like a pyramid. With agentic AI, the structure starts to resemble a diamond, or maybe even a pentagon.

Unit Tests (Foundational)
Unit tests remain at the base, as they are fast, deterministic, and written by developers to catch logic errors. A unit test pinpoints exactly what a single function does under controlled conditions, and no AI agent replaces that precision, which is why this layer stays put.
Integration and Contract Tests (Unchanged)
Integration and contract tests remain critical, sitting just above unit tests. This aligns well with the testing trophy approach, which emphasizes writing tests that mirror how the software is actually used.
API boundaries, service contracts, and the shape of the data passed between services have always required dedicated testing. Agentic AI tools don’t replace integration tests, but unlike before, this layer no longer has to give up resources to fund E2E testing.
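As an illustration of testing the shape of the data being passed, here is a deliberately small consumer-side contract check. The `ORDER_CONTRACT` fields and the `/orders` payloads are hypothetical; dedicated tools such as Pact or a JSON Schema validator do this far more thoroughly, so treat this as a sketch of the idea rather than a recommended implementation.

```python
from typing import Any

# Hypothetical contract a consumer expects from an /orders endpoint.
ORDER_CONTRACT: dict[str, type] = {
    "id": str,
    "total_cents": int,
    "currency": str,
    "items": list,
}


def contract_violations(payload: dict[str, Any], contract: dict[str, type]) -> list[str]:
    """List every way `payload` breaks `contract` (empty list means it conforms)."""
    problems = []
    for field, expected in contract.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected):
            actual = type(payload[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    return problems


good = {"id": "ord_1", "total_cents": 4200, "currency": "EUR", "items": []}
bad = {"id": 42, "currency": "EUR", "items": []}
print(contract_violations(good, ORDER_CONTRACT))  # []
print(contract_violations(bad, ORDER_CONTRACT))
# ['id: expected str, got int', 'missing field: total_cents']
```

Extra fields in the payload are ignored here, mirroring how many consumer contracts pin down only what the consumer actually reads.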
Agentic E2E Tests (Growing Layer)
This is where the biggest shift happens: the agentic E2E layer grows until it can’t be ignored. It expands to cover broad user flows, the journeys common to most products, which AI agents maintain automatically. You can test login, checkout, onboarding, and edge-case flows without building a fragile suite that constantly breaks.
Exploratory Agent Tests (New Layer)
I call this a new layer because it didn’t exist before (or, at least, wasn’t part of the traditional testing pyramid).
Here, an exploratory agent takes a broader approach, trying paths you never explicitly scripted in a traditional test setup.
Airpelago faced similar challenges with their Playwright scripts, especially around complex map-based interactions such as planning drone inspection routes or exporting reports. After moving to QA.tech, all 20 of their primary user journeys are handled by agents, including interactive map widgets that hand-written scripts couldn’t reliably manage. That is a level of coverage the team had never achieved with scripts. The tests are faster to run, and they’ve also unlocked scenarios the team hadn’t considered testable.
The pyramid was shaped by its constraints. Take away the cost, and the optimal shape changes on its own.
What This Means for Your Team
The testing pyramid teams work with in 2026 doesn’t look like the classic version Martin Fowler popularized, and it shouldn’t. If you’re a QA lead or a developer, your job shifts from writing scripts to expressing intent.
Your focus moves to coverage. Instead of asking, “Do we really need another E2E test?”, ask, “What user behavior hasn’t been validated yet?” You’re the architect; the agents handle the build.
Unit tests still play an important role in catching small logic issues, and integration tests continue to protect system boundaries. But E2E tests are now cheap enough that teams can afford to write many more of them. The result is broader coverage and less risk of a broken user flow slipping into production unnoticed.
Wrap-up
The testing pyramid was not wrong, but it was designed for a time when E2E tests were time-consuming, fragile, and costly to keep running.
With agentic testing, those costs have dropped significantly. Because E2E tests are now easier to create and maintain, there is less reason to limit them, and the result is more comprehensive, behavior-driven coverage instead of gaps.
Unit and integration tests remain essential, but now you can also focus on what truly matters: real user experiences.
If you’d like to explore this further, check out “What is Agentic Testing?” and “Designing Tests for Intent, Not Selectors.” Both show clearly how testing is moving from maintaining scripts to defining intent.
And if you want to see what all this looks like in a real user flow, try QA.tech for free.
