Our goal at QA.tech is clear – we want to push the boundaries of what’s possible with AI in software testing.

I sat down with Vilhelm von Ehrenheim, Chief AI Officer and Co-Founder of QA.tech, to discuss the current state of AI, how autonomous AI agents are reshaping the testing of web applications, and the challenges that come with building these systems.

AI’s Progress: What’s Working and What Isn’t

AI has made significant progress lately, but Vilhelm pointed out that the hype around AI breakthroughs is starting to level off.

“The developments are real, but they’re more incremental than people expected,” he said. While models like OpenAI’s o1 have improved in handling reasoning tasks, the leap many were expecting hasn’t fully materialized.

Multimodality in AI: A Growing but Challenging Frontier

One of the more interesting trends in AI right now is the push towards multimodality—building models that can handle different types of input like text, images, and video. While this has the potential to change how AI deals with complex tasks, progress hasn’t been as fast as some had hoped.

“We still see quite a bit of focus on multimodality,” Vilhelm noted. “Especially with images and sound.” OpenAI’s GPT-4 models, for example, aim to integrate all modalities into a single model, enabling users to interact with the AI through text, images, or even video. “The idea is that you could talk to it or send it a picture or video, and it would understand and respond accordingly.”

But, as Vilhelm pointed out, building truly effective multimodal systems has hit some major roadblocks. “The capabilities of multimodality are still quite limited,” he explained. While the potential of these models is immense, the reality is more complex. “It seems to be a bigger challenge than many predicted,” Vilhelm added, acknowledging that integrating such varied data types into a cohesive model remains difficult.

Still, there have been some wins. “At the same time, the models are being integrated into more platforms,” Vilhelm said, noting that companies are starting to adopt multimodal AI solutions. Progress hasn’t been lightning-fast, but the continuous integration into business applications shows these models are becoming more useful in real-world use cases.

For testing complex web apps, where user interactions often mix text, images, and interactive elements, the advances in multimodal AI could be a game-changer. As these models mature, their ability to simulate diverse user behaviors and interactions could significantly improve the coverage and accuracy of automated testing systems.

The Transition from Scripted to Autonomous Testing

Traditional QA testing methods have long relied on scripts to automate tests, but these scripts often come with a downside: they need constant maintenance and updates.

“With every feature change, you have to rewrite or modify the scripts, and it becomes a tedious cycle,” Vilhelm explained. QA.tech’s autonomous agents were developed to address this very issue.

“Instead of requiring developers to write endless test scripts, our AI agents can run and adapt based on changes in the web app.”

The platform uses these AI agents to simulate how a human might interact with a web application, finding issues that might be missed with script-based tests.
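The difference can be sketched with a toy example (hypothetical code, not QA.tech’s actual implementation): a scripted test is bound to an exact selector and breaks the moment it changes, while an agent-style test resolves the target element from the user-visible intent.

```python
# Toy page model: each element has a CSS selector and a visible label.
page_v1 = [{"selector": "#btn-login", "label": "Log in"}]
page_v2 = [{"selector": "#auth-submit", "label": "Log in"}]  # selector renamed

def scripted_click(page, selector):
    """Scripted test: tied to an exact selector; breaks on refactors."""
    for el in page:
        if el["selector"] == selector:
            return f"clicked {el['selector']}"
    raise AssertionError(f"selector {selector!r} not found")

def agent_click(page, intent):
    """Agent-style test: resolves the element by user-visible intent."""
    for el in page:
        if intent.lower() in el["label"].lower():
            return f"clicked {el['selector']}"
    raise AssertionError(f"no element matching intent {intent!r}")

scripted_click(page_v1, "#btn-login")  # works on the old markup
agent_click(page_v2, "log in")         # still works after the rename
```

In the second version of the page, `scripted_click` raises because the selector no longer exists, while `agent_click` finds the renamed button by its label — a rough analogue of how an agent adapts to app changes without a rewritten script.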

“Our agents don’t follow predefined scripts. They simulate real user interactions with your web app, running tests dynamically,” Vilhelm said, adding: “The beauty of this is that the AI learns from every test it runs, improving with each iteration.”

QA.tech’s development team has invested heavily in training these agents, using data from repeated test cases to fine-tune their decision-making processes. “It’s not something that works perfectly out of the box, but we’re making progress,” Vilhelm said.

The platform focuses on testing web applications, where the complexity of user interactions can make it hard to catch every possible bug using traditional methods. By mimicking human behavior, the agents can identify issues that might otherwise be missed.

“Edge cases and unpredictable user behavior are where we really see the benefit. Things that slip through and are difficult to cover with traditional automated tests get caught with autonomous agents,” Vilhelm added.

The Challenges of Multi-Agent Systems

One area that remains complex is deploying multi-agent systems effectively in production environments.

“In theory, you’d want different agents collaborating—one for test execution, one for test planning, another for reporting—but getting them to sync reliably in a live environment has been tough,” Vilhelm noted.

This concept mirrors how a development team operates, with each member handling specific tasks. However, making AI agents perform at this level in real-world testing requires considerable refinement. “We’ve focused a lot on making sure our agents are dependable. It’s about continuous improvement—each test cycle feeds back into the system, making the agents smarter and more accurate,” he explained.

To improve the accuracy and utility of these AI agents, there’s been a growing interest in methods that optimize how they retrieve and use information, especially as they handle increasingly complex tasks.

This is where Retrieval-Augmented Generation (RAG) comes into play. RAG improves the factual accuracy of AI models by supplying them with relevant contextual information retrieved at generation time. The challenge lies in structuring data so that it becomes searchable and useful to the model. With RAG, AI systems can access relevant data on demand, improving their ability to deliver accurate results.

“There’s been a lot of focus on this lately,” Vilhelm explained. “Instead of training models on vast amounts of data, RAG enables the model to retrieve relevant information from specific documents in real-time. This means you’re feeding the model the information it needs in the moment, making the context more relevant.”
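The retrieval step Vilhelm describes can be sketched in a few lines (toy code: real systems use vector embeddings and a proper index rather than word overlap, and the document names here are invented):

```python
# Minimal RAG sketch: rank documents by word overlap with the query,
# then prepend the best match to the prompt as context.
docs = {
    "checkout": "The checkout flow requires a valid shipping address.",
    "login": "Login supports email and single sign-on.",
}

def retrieve(query, k=1):
    """Toy retriever: score each document by shared words with the query."""
    q = set(query.lower().split())
    scored = sorted(
        docs.items(),
        key=lambda kv: len(q & set(kv[1].lower().split())),
        reverse=True,
    )
    return [text for _, text in scored[:k]]

def build_prompt(query):
    """Feed the model the retrieved context alongside the question."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

The key idea is the same at any scale: instead of hoping the model memorized a fact during training, you fetch the relevant document at query time and put it directly into the context window.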

This approach is becoming particularly important as AI models handle larger and more complex tasks, such as QA testing, where accuracy is critical.

“The real challenge in AI right now is not just about improving the models themselves but structuring data in a way that makes it searchable and useful for the model to retrieve when needed,” Vilhelm added.

By organizing internal documents well, companies can make their data more accessible for AI to use in real-time, which helps create more reliable systems.

“It’s similar to tools like Perplexity that create search engines behind the scenes, indexing and retrieving information that the model needs to generate accurate answers. We’re seeing more companies build systems around RAG to make their AI more practical and scalable,” Vilhelm said.

Why Multi-Agent Systems Aren’t Fully There Yet

Building effective multi-agent systems is still one of the more complex areas in AI development. When asked about the state of these systems, Vilhelm got straight to the point:

“It’s promising but still unreliable in real-world production environments.”

The concept is similar to how a team of developers works, with each agent handling a specific part of the testing process—whether it’s planning, execution, or reporting.

“You’d think having multiple agents collaborating would solve a lot of problems, but getting them to work together consistently is a huge challenge,” Vilhelm said.
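The planner/executor/reporter split can be illustrated with a stub pipeline (hypothetical role names and stub logic; the hard production problem — keeping the agents in sync reliably — is exactly what a toy like this glosses over):

```python
# Hypothetical multi-agent split: planner -> executor -> reporter.
def planner(change_log):
    """Turn a list of app changes into concrete test tasks."""
    return [f"verify {change}" for change in change_log]

def executor(tasks):
    """Run each task; here, a stub that marks everything as passed."""
    return [{"task": t, "status": "passed"} for t in tasks]

def reporter(results):
    """Summarize the run for the team."""
    passed = sum(r["status"] == "passed" for r in results)
    return f"{passed}/{len(results)} checks passed"

report = reporter(executor(planner(["login form", "checkout button"])))
print(report)  # -> 2/2 checks passed
```

In this happy-path sketch the hand-offs are trivial function calls; in a live environment each stage can fail, retry, or drift out of date, which is where the coordination difficulty Vilhelm mentions comes from.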

How AI is Changing How Developers Work

As AI becomes a bigger part of the development process, it’s shifting how developers approach their work. Tools like GitHub Copilot, which suggests code as you write, are already reducing the manual workload for repetitive coding tasks, although they do not replace deeper architectural and problem-solving skills. AI agents are doing the same for testing. According to Vilhelm, this is just the beginning.

“AI is taking over more of the grunt work, letting developers focus on more strategic tasks,” Vilhelm explained. “Developers want to focus on building, not testing. AI is here to take over the repetitive, tedious tasks that bog them down.”

However, while this shift offers many advantages, it’s not without challenges.

“Developers can now do more with less effort, but this also means that junior developers may find it harder to get their foot in the door. It’s easier for a senior developer to handle these kinds of tasks because they have the architectural knowledge and know how to piece things together. You can manage with one experienced person rather than needing multiple juniors. This could lead to a growing skills gap,” Vilhelm observed.

Apart from freeing up time for more creative work, AI-driven QA is improving code quality.

“With AI-driven QA, developers get immediate feedback. Our agents run continuously, flagging potential issues early on in the development process,” Vilhelm added. This proactive approach helps prevent bugs from reaching production, leading to more stable and reliable products.

The Road Ahead for QA.tech

At QA.tech, the team has spent considerable time refining their agents.

“We’re getting there, but there’s a lot of iteration involved. What we’re doing is essentially training these agents by feeding them data from repetitive test cases,” Vilhelm explained. This continuous learning process helps improve the accuracy and reliability of the AI over time.

QA.tech’s platform has been publicly available since summer, and Vilhelm is optimistic about what’s coming next.

“We’re gearing up for a new release in October, and it’s going to focus on improving our agent functionality even further, making them more adaptable to different types of web applications,” he said.

With several customers already onboard and using the platform for their web applications, QA.tech is focusing on scaling and refining its AI-driven testing solutions.

“The feedback has been great so far. We’re learning a lot from how real teams are using the product, and that’s driving our next set of improvements,” Vilhelm noted.

It’s clear that autonomous systems are set to play a key role in the future of software testing. For developers, CTOs, and engineering teams, the integration of AI into QA workflows offers a powerful way to reduce manual effort, increase test coverage, and accelerate the development process.

“The future of QA is autonomous. We’re just getting started, but we’re excited about where this technology is headed,” Vilhelm concluded.