Keynote: Monkeys in Lab Coats: Applying failure testing research @Netflix
Location:
- Fleming / Whittle, 3rd fl.
Day of the Week:
- Wednesday
Industry and academia need each other. Far from the tire fires of production, university researchers have the time to ask big questions. Sometimes they get lucky and obtain answers that change how we think about large-scale systems! But detached from real-world constraints, systems research in academia risks irrelevance: inventing and solving imaginary problems. Industry owns the data, the workloads, and the know-how to realize large-scale infrastructures. It wants answers to the big questions, but often fears the risks associated with research. Academics, for their part, seek real-world validation of their ideas, but are often unwilling to adapt their “beautiful” models to the gritty realities of production deployments. Collaborations between industry and academia, despite their deep interdependence, are rare.
In this talk, we present our experience with a fruitful industry/academic collaboration. We describe how a “big idea”, lineage-driven fault injection, evolved from a theoretical model into an automated failure testing system that leverages Netflix’s state-of-the-art fault injection and tracing infrastructures. This collaboration required us to take risks, to accept defeats, and to constantly evolve our approach to “make it work”. We sketch the architecture of the automated failure testing system we built and some of its discoveries, while providing intuition for why it works. Along the way, we describe the challenges (expected as well as unexpected, technical as well as ideological) that arose, and how we overcame them.
Tracks
Covering innovative topics
Monday, 7 March
- Back to Java: What to expect in Java 9 and Spring 5
- Stream Processing @ Scale: Big data, fast-moving data. Practical implementation lessons on real-time data
- DevOps & CI/CD: Lessons and stories on optimizing the deployment pipeline
- Head-to-Tail Functional Languages: Free-range monads, tackling immutability, tales from production, and more...
- Architecting for Failure: Your system will fail. Take control before it takes you with it
- 21st Century Culture from Geeks on the Ground: New ways to organise technology companies and workplace culture
Tuesday, 8 March
- Architectures You've Always Wondered About: In-depth technical case studies from giants like Microsoft, Netflix, Google, Twitter, and more...
- Close to the Metal: Get efficiency back into your code with concepts like cache-efficient algorithms and lock-free data structures
- Containers (in production): Real-world lessons on scalability and reliability in production container deployments
- Modern CS in the real world: Real-world industry adoption of modern CS ideas
- Security, Incident Response & Fraud Detection: Master-level classes on building security into your system and responding to incidents when things go wrong
- Optimizing You: Keeping life in balance is always a challenge. Learning lifehacks
Wednesday, 9 March
- Disrupting Finance: Technology advances in finance (blockchain, P2P, machine learning, APIs)
- Modern Native Languages: Safe efficiency with Go, Rust, and Swift
- Full Stack Javascript: Level up JavaScript with topics like Angular, React/React Native, Node, Mongo/Couch/Other, Falcor, GraphQL, etc.
- Data Science & Machine Learning Methods: A developer's data science and machine learning toolkit
- Microservices for Mega-Architectures: Practical lessons on microservices success
- Modern Agile Development: Revisiting Agile today and tackling challenges we are seeing in the wild