Computer Use Agents: The Frontier of Vision-Based Automation

Abstract

Computer Use Agents represent a paradigm shift in software interaction: AI models trained to operate interfaces visually, mimicking human interaction rather than relying on technical APIs. This presentation explores the underlying mechanics of these agents and demonstrates how to orchestrate them programmatically to execute common frontend testing tasks.

The session illustrates how agents, using Claude 3.5 Sonnet, Opus 4.6, and Chrome-MCP, navigate complex graphical user interfaces, find creative workarounds for missing functionality, and document their findings. It also covers significant "visual reasoning gaps" where current models struggle, such as interpreting angular displays like gauges or complex spatial networks like street maps, which can quickly exceed context limits. Attendees will gain a realistic assessment of the performance and reliability of Computer Use Agents in modern automation workflows.
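
To make the idea of programmatic orchestration concrete, here is a minimal sketch of the observe-decide-act loop that such an agent typically runs. It is not taken from the talk: the names `Action`, `run_agent`, `take_screenshot`, `perform`, and `policy` are hypothetical, and the model call is replaced by a scripted stand-in so the example runs without any API access.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One UI step chosen by the model (hypothetical schema)."""
    kind: str       # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


def run_agent(
    goal: str,
    take_screenshot: Callable[[], bytes],
    perform: Callable[[Action], None],
    policy: Callable[[str, bytes, List[Action]], Action],
    max_steps: int = 20,
) -> List[Action]:
    """Loop: capture the screen, ask the policy for an action, execute it."""
    history: List[Action] = []
    for _ in range(max_steps):
        shot = take_screenshot()              # the agent only sees pixels
        action = policy(goal, shot, history)  # in practice: a vision model call
        history.append(action)
        if action.kind == "done":             # the policy decides when to stop
            break
        perform(action)                       # in practice: mouse/keyboard driver
    return history


# Scripted stand-in for the model, used only for this illustration.
def scripted_policy(goal: str, shot: bytes, history: List[Action]) -> Action:
    script = [
        Action("click", x=100, y=200),
        Action("type", text="hello"),
        Action("done"),
    ]
    return script[len(history)]


performed: List[Action] = []
history = run_agent(
    goal="Fill in the search box",
    take_screenshot=lambda: b"",   # placeholder for a real screen capture
    perform=performed.append,      # record actions instead of driving a browser
    policy=scripted_policy,
)
```

In a real setup, `policy` would send the screenshot and goal to a vision-capable model and `perform` would drive the browser; the `max_steps` cap is one simple guard against runs that never reach a "done" action.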

Slides from this presentation can be found here.


Speaker

Stefan Dirnstorfer

CTO @Thetaris GmbH, Architect of ThetaML & Thetaris’ Testing Application, 42 Years into Software

Stefan Dirnstorfer is CTO at testup.io, a UI testing service that uses images and screenshots for automated testing. With 42 years of software development experience, starting at age 8, he has both followed and helped shape numerous technological trends. His expertise spans software development, operations, data analysis, and interface design, and his current focus is automating visual tasks in test automation.

Date

Wednesday Mar 18 / 11:45AM GMT (50 minutes)

Location

Mountbatten (6th Fl.)

Topics

GUI Agents, Computer Vision, Frontend Testing

From the same track

Session GenAI

Architecting AI Driven Game Creation: From Research to Production Scale Systems

Wednesday Mar 18 / 02:45PM GMT

I spent the last two years as an architect of the next mobile gaming platform at Meta, with the ambition that anyone can create, share, and play AI-generated mobile games. What I learned is that the hard part is no longer the generation; it's everything that happens after.

Danielle An

Principal Engineer / GenAI Architect @Meta, Ph.D. with 15 years of professional experience in film, MR and gaming

Session

Panel: Who Builds the Frontend Now? (And What Breaks When They Do)

Wednesday Mar 18 / 01:35PM GMT

This panel is for frontend and mobile engineers trying to make sense of what's actually happening versus what's hype. Practitioners from platform infrastructure, observability, serverless architecture, digital consultancy, and mobile development will share what they're seeing on the ground and debate what it means.

Luca Mezzalira

Principal Serverless Specialist Solutions Architect @AWS, Author of “Building Micro-Frontends”, International Speaker

James Hall

Founder and Director @Parallax, Author of jsPDF

Danielle An

Principal Engineer / GenAI Architect @Meta, Ph.D. with 15 years of professional experience in film, MR and gaming

Ivan Zarea

Director of Platform Engineering @Netlify

Session AI

Tools That Enable the Next 1B Developers

Wednesday Mar 18 / 10:35AM GMT

The developer population is growing by orders of magnitude. The new AI tooling is turning domain experts and operators into builders. Unfortunately, the tools and the platforms they are building on weren't designed for them.

Ivan Zarea

Director of Platform Engineering @Netlify

Session

Running AI at the Edge: Running Real Workloads Directly in the Browser

Wednesday Mar 18 / 03:55PM GMT

Running AI today often means choosing between Anthropic or OpenAI, and accepting the cost and privacy concerns that come with shipping data to third parties. What if there were another way?

James Hall

Founder and Director @Parallax, Author of jsPDF