How Eve Online Leverages Head Based Sampling to Observe "Fun"

Summary

Disclaimer: This summary has been generated by AI. It is experimental, and feedback is welcomed. Please reach out to info@qconlondon.com with any comments or concerns.

Nicholas Herring discusses the incorporation of observability into the longstanding game infrastructure of Eve Online. The presentation covers various aspects of maintaining and enhancing a 20-year-old game system while introducing modern elements.

Key Highlights:

  • Observability in Legacy Systems: Nicholas emphasizes how Eve Online integrated head-based sampling into its architecture, bypassing tail-based sampling to reduce complexity and avoid additional maintenance burdens.

  • Dynamic vs. Deterministic Sampling: The game uses a blend of dynamic and deterministic sampling. Challenges associated with Python’s limitations on processing capabilities influenced the shift towards head-based sampling.

  • Player Expression and System Performance: The presentation highlights Eve's unique player-driven experiences like creating missions and showcasing ship skins, which contribute to player expression in the virtual environment.

  • Monolithic to Modern Ecosystem Transition: Nicholas discusses wrapping legacy systems and new technologies under an umbrella termed 'Quasar.' This includes using contemporary tech stacks like Kubernetes and gRPC.

  • Focus on Player Interactions: The game tracks significant in-game events, such as battles involving numerous costly Titans, to balance operational costs against experience fidelity.

The presentation concludes with a discussion of the strategies and technological choices that enable Eve Online to balance performance and cost while fostering a vibrant player-driven universe.

This is the end of the AI-generated content.


Abstract

A pattern unique to video game software is the use of real-time interactions to express users' personalities.

Here we will talk about how we instrument the universe of New Eden to identify the traffic that matters, even the "fun" parts!

We'll explore how we achieve performant observability in a 20-year-old legacy system running alongside modern technologies:

  • 100% head-based sampling ecosystem
  • Blend of dynamic and deterministic sampling techniques
  • The blessing and curse that is the Exponential Moving Average (EMA)
  • Observing "fun" without breaking the bank
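
The techniques listed above can be illustrated with a minimal Python sketch. It is not CCP's implementation, and the abstract does not say how Eve Online computes its rates. The names `deterministic_keep` and `EmaRateSampler`, the keys `chat` and `fleet_fight`, and the `target_per_window` and `alpha` parameters are all hypothetical, chosen only for illustration. The sketch makes the keep/drop decision once, at the start of a trace (head-based), from a hash of the trace ID (deterministic), while the keep rate per traffic key is adjusted from an exponential moving average of recent volume (dynamic).

```python
import hashlib


def deterministic_keep(trace_id: str, rate: float) -> bool:
    """Head-based decision made from the trace ID alone.

    Every service that hashes the same trace ID with the same rate
    reaches the same verdict, so no coordination is needed.
    """
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # integer in [0, 2**64)
    return bucket < rate * 2**64


class EmaRateSampler:
    """Derives a per-key keep rate from an EMA of per-window event counts.

    Assumption for this sketch: each key should contribute roughly
    `target_per_window` kept traces per window.
    """

    def __init__(self, target_per_window: float, alpha: float = 0.2):
        self.target = target_per_window
        self.alpha = alpha  # weight given to the newest window
        self.ema = {}       # key -> smoothed events per window

    def observe_window(self, counts: dict) -> None:
        """Fold one window of raw counts into the moving averages."""
        for key in set(self.ema) | set(counts):
            seen = counts.get(key, 0)
            prev = self.ema.get(key, float(seen))  # seed with first observation
            self.ema[key] = self.alpha * seen + (1 - self.alpha) * prev

    def rate_for(self, key: str) -> float:
        """Keep everything for quiet keys, thin out busy ones."""
        expected = self.ema.get(key, 0.0)
        if expected <= self.target:
            return 1.0
        return self.target / expected

    def keep(self, key: str, trace_id: str) -> bool:
        return deterministic_keep(trace_id, self.rate_for(key))


sampler = EmaRateSampler(target_per_window=100, alpha=0.5)
sampler.observe_window({"chat": 50, "fleet_fight": 1000})
sampler.observe_window({"fleet_fight": 3000})  # sudden burst of traffic
```

After the second window the smoothed volume for `fleet_fight` is 2000 rather than 3000. This shows the general trade-off of an EMA: it damps noise, but it also lags behind a sudden spike, so the keep rate briefly stays higher than the target would suggest.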

Speaker

Nicholas Herring

Technical Director, Eve Online @CCP Games, Refiner of Internet Spaceships and Explorer of Feral Gordian Knots of Python

With over twenty years of combined experience across military and commercial applications of video game technologies, Nicholas has built patented distributed system technologies that bridge military tactical networks and real-time game engine simulation networks. In addition to running multiple research and development projects surrounding real-time human bioinformatics, he has also delivered multiple game titles running at cloud scale. He is currently applying this experience to modernizing imaginary spaceships for Eve Online.

Date

Tuesday Mar 17 / 11:45AM GMT (50 minutes)

Location

Whittle (3rd Fl.)

Topics

Distributed Tracing, sampling, legacy systems, video games, real-time systems

From the same track

Session Sociotechnical Leadership

Orienting, Understanding, Playing, Thriving: Debugging your Organisation

Tuesday Mar 17 / 10:35AM GMT

Debugging is both an art and a science. But more than that, it's an activity undertaken with deep intention: to understand and improve your systems. In the purely technical realm, we have an extraordinary range of tooling and techniques that can help us tackle this problem.

Hazel Weakly

Fellow @Nivenly Foundation; Director, Haskell Foundation; Experienced Leader Focusing on Organizational Change, Developer Experience, and Resilience Engineering

Session

Can Claude Fix Itself? Using LLMs for Incident Response

Tuesday Mar 17 / 02:45PM GMT

Can you throw an LLM at a production incident and expect useful results? A candid look from someone who runs a distributed AI system and reaches for Claude before reaching for a dashboard. Surprises, failures, and why the answer matters for every engineer carrying a pager.

Alex Palcuie

Member of Technical Staff in AI Reliability Engineering @Anthropic, Previously Staff Site Reliability Engineer on Google Cloud Platform

Session Observability

Wrangling Telemetry at Scale: A Guide to Self-Hosted Observability

Tuesday Mar 17 / 03:55PM GMT

Observability is supposed to help you tame complexity, but your Observability stack can quickly become just as complex as the systems it's meant to watch. For most teams, the answer is to pay someone else to deal with it.

Colin Douch

Site Reliability Engineer @DuckDuckGo

Session Observability

Are We All on the Same Page? Let’s Fix That - With AI Assistance

Tuesday Mar 17 / 05:05PM GMT

In distributed systems, incidents rarely fail because of missing signals - they fail because the right people aren’t mobilised quickly enough, and teams struggle to build a shared understanding under pressure.

Luis Mineiro

Director of Digital Foundation @ASOS.com, SRE Charmer, Previously @Delivery Hero and @Zalando

Session

Unconference: Debugging Distributed Systems

Tuesday Mar 17 / 01:35PM GMT