Release note

    Multi-turn evaluation foundations

    Evaluation went from a single-shot decision to a real reasoning loop.

    Behind the scenes, evaluation moved from a single LLM call deciding pass/fail to a multi-turn agent that can fetch screenshots from specific steps, expand summarized history, look at step metadata, and decide when it has enough information to commit to a verdict. This is the architectural shift that made the structured verdicts (March) and the Issue Reporter foundation (March) possible.
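
    The loop described above can be sketched roughly as follows. This is a minimal illustration, not QA.tech's implementation: the `Step`, `Action`, and `evaluate` names, the action kinds, and the turn limit are all assumptions made for the example, and the `policy` callable stands in for the LLM that picks the next move.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Step:
    """One recorded test step (hypothetical shape, for illustration only)."""
    index: int
    summary: str               # short summary shown to the agent up front
    detail: str                # full history, revealed only on request
    metadata: Dict[str, str]
    screenshot: str            # placeholder for an image reference


@dataclass
class Action:
    """What the agent wants to do next (hypothetical action kinds)."""
    kind: str                  # fetch_screenshot | expand_history | get_metadata | commit
    step: Optional[int] = None
    verdict: Optional[str] = None


# Stand-in for the LLM: reads the observations so far, picks the next action.
Policy = Callable[[List[str]], Action]


def evaluate(steps: List[Step], policy: Policy, max_turns: int = 8) -> Tuple[str, List[str]]:
    """Run a multi-turn evaluation until the policy commits to a verdict."""
    by_index = {s.index: s for s in steps}
    # The agent starts with summaries only and pulls in detail on demand.
    observations = [f"step {s.index}: {s.summary}" for s in steps]

    for _ in range(max_turns):
        action = policy(observations)
        if action.kind == "commit":
            return action.verdict or "inconclusive", observations

        step = by_index.get(action.step)
        if step is None:
            observations.append(f"error: unknown step {action.step}")
        elif action.kind == "fetch_screenshot":
            observations.append(f"screenshot {step.index}: {step.screenshot}")
        elif action.kind == "expand_history":
            observations.append(f"history {step.index}: {step.detail}")
        elif action.kind == "get_metadata":
            observations.append(f"metadata {step.index}: {step.metadata}")
        else:
            observations.append(f"error: unknown action {action.kind}")

    # Turn budget exhausted without a committed verdict.
    return "inconclusive", observations


def scripted_policy(observations: List[str]) -> Action:
    """Toy policy: inspect step 2's screenshot and metadata, then commit."""
    if not any(o.startswith("screenshot 2") for o in observations):
        return Action("fetch_screenshot", step=2)
    if not any(o.startswith("metadata 2") for o in observations):
        return Action("get_metadata", step=2)
    return Action("commit", verdict="fail")


steps = [
    Step(1, "opened login page", "navigated to /login", {"status": "200"}, "login.png"),
    Step(2, "submitted form", "clicked submit, saw error banner", {"status": "500"}, "error.png"),
]
verdict, trail = evaluate(steps, scripted_policy)
```

    The difference from a single-shot call is that the evidence gathered on each turn feeds into the next decision, and the agent itself chooses when to stop. The turn limit is a safeguard added for this sketch so that a policy that never commits still terminates.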
