Your Agent Sandbox Doesn't Know My Authz Model: A Standard-Shaped Hole

Abstract

Sandboxes are the first line of defence for agentic systems: restrict the bash commands, filter the URLs, lock down the filesystem. But sandboxes operate on the syntax of requests, not the semantics of your authorization model. They don't know whether GET /customer/id_1234 is in scope for the task at hand, and they can't tell the downstream system whether the request is coming from a human, a delegated agent, or a rogue process. The same applies to MCP tool calls — allowing read_file or create_issue tells you nothing about whether those actions make sense in context.
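To make the syntax-vs-semantics gap concrete, here is a minimal sketch (policy and URLs invented for illustration) of a URL-prefix allowlist. It can say *which endpoints* an agent may call, but not *which customer* is in scope for the current task:

```python
# Hypothetical sandbox policy: allow the customer API by URL prefix.
ALLOWED_PREFIXES = ["https://api.example.com/customer/"]

def sandbox_allows(url: str) -> bool:
    """Syntactic check only: matches the shape of the request, not its intent."""
    return any(url.startswith(prefix) for prefix in ALLOWED_PREFIXES)

# The customer the task is actually about passes...
print(sandbox_allows("https://api.example.com/customer/id_1234"))  # True
# ...and so does every other customer the task has no business touching:
print(sandbox_allows("https://api.example.com/customer/id_9999"))  # True
```

The allowlist answers "is this endpoint permitted?" while the authorization model needs to answer "is this resource in scope for this task?" — a question the sandbox never sees.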

MCP has given the industry a common protocol for tool calls, and in doing so has made agent authorization problems impossible to ignore. As the auth lead for MCP at Anthropic, I see these problems firsthand: they come up in enterprise deployments, in working group discussions, and in the gap between what the spec says and what teams actually need to build. This talk isn't about MCP specifically: it's about the authorization primitives that any agentic system needs, and the standards work required to provide them.

We'll examine four quirks that make agent authorization hard: granularity (scopes are too coarse for resource-level enforcement), attribution (is this the user, a delegated agent, or a service account?), information leakage (coarse tokens let agents see more than the task requires), and cardinality (consent flows don't scale to ephemeral agent sessions).
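The attribution quirk has a standards-shaped answer in OAuth token exchange: RFC 8693 defines an `act` (actor) claim that records who is performing a request on the subject's behalf. Here is a hedged sketch — the claim names (`sub`, `act`) follow RFC 8693, but the payload values and helper function are invented for illustration:

```python
def delegation_chain(claims: dict) -> list[str]:
    """Return [subject, actor, actor-of-actor, ...] from a token payload."""
    chain = [claims["sub"]]
    actor = claims.get("act")
    while actor is not None:
        chain.append(actor["sub"])
        actor = actor.get("act")  # nested "act" claims record longer chains
    return chain

# Token minted when the user calls the API directly:
direct = {"sub": "user:alice", "scope": "customers.read"}

# Token minted via token exchange when an agent acts for the user:
delegated = {
    "sub": "user:alice",                     # who the action is *for*
    "act": {"sub": "agent:support-triage"},  # who is *performing* it
    "scope": "customers.read",
}

print(delegation_chain(direct))     # ['user:alice']
print(delegation_chain(delegated))  # ['user:alice', 'agent:support-triage']
```

With the actor claim present, the downstream system can log, rate-limit, or deny the agent differently from the human, even though both tokens carry the same scope.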

A key design decision sits at the heart of all of this: should your agent act as a delegated extension of the user, or as its own service account? The answer has deep implications for auditability, blast radius, and what standards you need.

Then we'll take a whistle-stop tour of the emerging standards landscape trying to close these gaps — Rich Authorization Requests (RAR) and draft RAR metadata extensions, Cross-App Access (XAA) for frictionless token acquisition, and how patterns from SPIFFE and Workload Identity Federation might translate to agent identity. None of these fully solves the problem alone, but together they sketch the shape of what's missing.
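As a taste of how RAR addresses the granularity quirk: instead of a coarse scope, the client sends an `authorization_details` parameter (the RFC 9396 parameter name) describing the specific resource and actions it needs. In this sketch the `type` value, client ID, and resource URL are assumptions for illustration:

```python
import json
from urllib.parse import urlencode

# Rich Authorization Request: ask for one customer record, read-only,
# rather than a blanket "customers.read" scope.
authorization_details = [{
    "type": "customer_record",  # hypothetical API-defined detail type
    "actions": ["read"],
    "locations": ["https://api.example.com/customer/id_1234"],
}]

params = urlencode({
    "response_type": "code",
    "client_id": "agent-session-42",  # hypothetical ephemeral agent client
    "authorization_details": json.dumps(authorization_details),
})
print(params)  # query string for the authorization endpoint
```

The authorization server can then issue a token whose reach is exactly one record, shrinking both the blast radius and what the agent can see beyond the task.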
 


Speaker

Paul Carleton

Member of Technical Staff @Anthropic, Core Maintainer of MCP

Paul Carleton is a Core Maintainer of the Model Context Protocol and Auth Nerd at Anthropic, where he leads auth implementations across Anthropic's clients and the TypeScript and Python SDKs and consults on agent identity. He drives MCP conformance testing to ensure consistent behavior across the ecosystem.


Date

Wednesday Mar 18 / 02:45PM GMT (50 minutes)

Location

Fleming (3rd Fl.)


From the same track

Session: agentic coding

The Right 300 Tokens Beat 100k Noisy Ones: The Architecture of Context Engineering

Wednesday Mar 18 / 10:35AM GMT

Your agent has 100k tokens of context. It still forgets what you told it two messages ago.


Patrick Debois

AI Product Engineer @Tessl, Co-Author of the "DevOps Handbook", Content Curator at AI Native Developer Community


Baruch Sadogursky

DevRel Team and Context Engineering Management @Tessl AI, Co-Author of #LiquidSoftware and #DevOps Tools for #Java Developers, Java Champion, Microsoft MVP

Session

Explicit Semantics for AI Applications: Ontologies in Practice

Wednesday Mar 18 / 03:55PM GMT

Modern AI applications struggle not because of a lack of models, but because meaning is implicit, fragmented, and brittle. In this talk, we’ll explore how making semantics explicit (using ontologies and knowledge graphs) changes how we design, build, and operate AI systems.


Jesus Barrasa

Field CTO for AI @Neo4j

Session

Building an AI Ready Global Scale Data Platform

Wednesday Mar 18 / 01:35PM GMT

As organizations move from single-cloud setups to hybrid and multi-cloud strategies, they are under pressure to build data platforms that are both globally available and AI-ready.


George Peter Hantzaras

Engineering Director, Core Platforms @MongoDB, Open Source Ambassador, Published Author

Session

Beyond Benchmarks: How Evaluations Ensure Safety at Scale in LLM Applications

Wednesday Mar 18 / 11:45AM GMT

As LLM systems move from prototypes to production, the gap between benchmark performance and real-world reliability becomes impossible to ignore. Models that score well on benchmarks can still fail unpredictably when facing the complexity, ambiguity, and edge cases of real users.


Clara Matos

Director of Applied AI @Sword Health, Focused on Building and Scaling Machine Learning Systems