Computer Use Agents: The Frontier of Vision-Based Automation

Abstract

Computer Use Agents represent a paradigm shift in software interaction: AI models trained to operate interfaces visually, mimicking human interaction rather than relying on technical APIs. This presentation explores the underlying mechanics of these agents and demonstrates how to orchestrate them programmatically to execute common frontend testing tasks.

The session illustrates how agents (using Claude 3.5 Sonnet, Opus 4.6, and the Chrome-MCP) navigate complex graphical user interfaces, identify creative workarounds for missing functionality, and document their findings. However, it also exposes significant "visual reasoning gaps" where current models struggle. Examples include interpreting angular displays such as gauges, and navigating complex spatial networks such as street maps, which can quickly exceed context limits. Attendees will gain a realistic assessment of the performance and reliability of Computer Use Agents in modern automation workflows.


Speaker

Stefan Dirnstorfer

CTO @Thetaris GmbH, Architect of ThetaML & Thetaris’ Testing Application, 42 Years into Software

Stefan Dirnstorfer is CTO at testup.io, a UI testing service that uses images and screenshots for automated testing. With 42 years of software development experience—starting at age 8—he has both followed and helped shape numerous technological trends. His expertise spans software development, operations, data analysis, and interface design, with his current focus on automating visual tasks in test automation.

From the same track

Session

Architecting AI Driven Game Creation: From Research to Production Scale Systems

Wednesday Mar 18 / 02:45PM GMT

The next generation of games hasn’t been invented yet, but we can be pretty sure their creation will be heavily democratized by generative tools. The hard part isn’t just building smart models—it’s turning research-grade AI into production-ready, high‑fidelity creation tools.

Danielle An

Principal Engineer / GenAI Architect @Meta, Ph.D. with 15 years of professional experience in film, MR and gaming

Session

Optimizing Performance with Smart Middleware and Edge Handlers

Wednesday Mar 18 / 01:35PM GMT

Details coming soon.

Julie Qiu

Uber Tech Lead, Google Cloud SDK @Google, Building Client Libraries and Command Line Tools Across Different Language Ecosystems

Session

How React Internals are Adapting to Fine-Grained Reactivity

Wednesday Mar 18 / 10:35AM GMT

Details coming soon.

Eduardo Bouças

Distinguished Software Engineer @Netlify

Session

The Frontend Architect’s Guide to Multi-modal UIs

Wednesday Mar 18 / 03:55PM GMT

Details coming soon.