Abstract
Computer Use Agents represent a paradigm shift in software interaction: AI models trained to operate interfaces visually, mimicking human interaction rather than relying on technical APIs. This presentation explores the underlying mechanics of these agents and demonstrates how to orchestrate them programmatically to execute common frontend testing tasks.
The session illustrates how agents using Claude 3.5 Sonnet, Opus 4.6, and the Chrome-MCP navigate complex graphical user interfaces, find creative workarounds for missing functionality, and document their findings. It also exposes significant "visual reasoning gaps" where current models struggle, such as interpreting angular displays like gauges or complex spatial networks like street maps, where the amount of visual detail can quickly exceed context limits. Attendees will gain a realistic assessment of the performance and reliability of Computer Use Agents in modern automation workflows.
Speaker
Stefan Dirnstorfer
CTO @Thetaris GmbH, Architect of ThetaML & Thetaris’ Testing Application, 42 Years into Software
Stefan Dirnstorfer is CTO at testup.io, a UI testing service that uses images and screenshots for automated testing. With 42 years of software development experience, having started at age 8, he has both followed and helped shape numerous technological trends. His expertise spans software development, operations, data analysis, and interface design, and his current focus is on automating visual tasks in test automation.