Release note

    Multi-turn evaluation foundations

    Evaluation went from a single-shot decision to a real reasoning loop.

    Behind the scenes, evaluation moved from a single LLM call deciding pass/fail to a multi-turn agent that can fetch screenshots from specific steps, expand summarized history, look at step metadata, and decide when it has enough information to commit to a verdict. This is the architectural shift that made the structured verdicts (March) and the Issue Reporter foundation (March) possible.
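As a rough illustration of that loop shape, here is a minimal sketch. It is not QA.tech's implementation: every name in it (`Step`, `Action`, `evaluate`, `scripted_policy`, the `max_turns` cap, the "inconclusive" fallback) is a hypothetical stand-in, and the LLM is replaced by a scripted policy function so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical types: they only illustrate the shape of a multi-turn evaluator.
@dataclass
class Step:
    index: int
    summary: str      # short form shown to the evaluator up front
    detail: str       # full history, revealed only on request
    screenshot: str   # placeholder for image data or a URL
    metadata: dict

@dataclass
class Action:
    kind: str                      # "screenshot" | "expand" | "metadata" | "verdict"
    step: Optional[int] = None     # target step for information requests
    verdict: Optional[str] = None  # "pass" or "fail" when kind == "verdict"

# A policy maps the context gathered so far to the next action.
# In a real system this would be an LLM call; here it is a plain function.
Policy = Callable[[list], Action]

def evaluate(steps, policy, max_turns=8):
    """Run the loop until the policy commits to a verdict or turns run out."""
    # The evaluator starts with summarized history only.
    context = [f"step {s.index}: {s.summary}" for s in steps]
    for _ in range(max_turns):
        action = policy(context)
        if action.kind == "verdict":
            return action.verdict, context
        step = steps[action.step]
        if action.kind == "screenshot":
            context.append(f"screenshot {step.index}: {step.screenshot}")
        elif action.kind == "expand":
            context.append(f"detail {step.index}: {step.detail}")
        elif action.kind == "metadata":
            context.append(f"metadata {step.index}: {step.metadata}")
        else:
            raise ValueError(f"unknown action kind: {action.kind}")
    # Assumed fallback: no verdict was committed within the turn budget.
    return "inconclusive", context

def scripted_policy(context):
    """Stand-in for the LLM: inspect step 1, then commit to a verdict."""
    seen = " ".join(context)
    if "screenshot 1" not in seen:
        return Action("screenshot", step=1)
    if "metadata 1" not in seen:
        return Action("metadata", step=1)
    return Action("verdict", verdict="fail" if "500" in seen else "pass")

steps = [
    Step(0, "opened login page", "navigated to /login", "login.png", {"http_status": 200}),
    Step(1, "submitted form", "clicked Submit; banner shown", "after_submit.png", {"http_status": 500}),
]
verdict, context = evaluate(steps, scripted_policy)
print(verdict, len(context))
```

The point of the sketch is the contrast with a single call: the evaluator begins with summaries only, pulls in extra evidence one request at a time, and chooses for itself when to stop and commit.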
