Can Claude Fix Itself? Using LLMs for Incident Response

Abstract

Details coming soon.


Speaker

Alex Palcuie

Member of Technical Staff in AI Reliability Engineering @Anthropic, Previously Staff Site Reliability Engineer on Google Cloud Platform

Alex Palcuie is a Member of Technical Staff in AI Reliability Engineering at Anthropic, where he works on keeping Claude reliable at scale. He has the unenviable task of fixing Claude without Claude's help when it goes down. Previously, he was a Staff Site Reliability Engineer on Google Cloud Platform (GCP) and a member of Google's Tech IRT (Incident Response Team), handling large-scale infrastructure incidents, including the kind where datacentres flood.

From the same track

Session

Sociotechnical Practices for Debugging

Tuesday Mar 17 / 10:35AM GMT

Details coming soon.

Hazel Weakly

Fellow @Nivenly Foundation; Director, Haskell Foundation; Experienced Leader Focusing on Organizational Change, Developer Experience, and Resilience Engineering

Session

Distributed Tracing

Observability in Microservices & Gaming

Tuesday Mar 17 / 11:45AM GMT

Details coming soon.

Nicholas Herring

Technical Director, Eve Online @CCP Games, Refiner of Internet Spaceships and Explorer of Feral Gordian Knots of Python

Session

Debugging Transient Failures in Serverless Workflows

Tuesday Mar 17 / 03:55PM GMT

Details coming soon.

Colin Douch

Site Reliability Engineer @DuckDuckGo

Session

Real-Time Observability for Cross-Border Payment Rails

Tuesday Mar 17 / 02:45PM GMT

Details coming soon.

Session

Unconference: Debugging Distributed Systems

Tuesday Mar 17 / 01:35PM GMT