Presentation: Distributed Consensus: Making Impossible Possible

Duration: 4:10pm - 5:00pm

Key Takeaways

  • Review the major work done in the area of consensus, and how that work has evolved in building modern resilient distributed systems. 
  • Gain a better understanding of what does and does not constitute a consensus problem.
  • Understand which algorithms are best suited to different situations.

Abstract

In this talk, we explore how to construct resilient distributed systems on top of unreliable components.

Starting almost two decades ago with Leslie Lamport’s work on organising the parliament of a Greek island, we will take a journey to today’s datacenters and the systems powering companies like Google, Amazon and Microsoft. Along the way, we will face interesting impossibility results, machines acting maliciously and the complexity of today’s networks.

Ultimately, we will discover how to reach agreement between many parties and, from this, how to construct new fault-tolerant systems that we can depend upon every day.

Interview

QCon: What’s the motivation for your talk?
Heidi: What I would like people to take away is an understanding of distributed consensus, so that when they see a consensus problem, they know (roughly) what kind of problem it is. Audience members should be able to consider which algorithm might fit that type of consensus problem. I would also like them to be able to identify when they don't need consensus. It’s not uncommon for people to believe everything is a consensus problem, and therefore use these really big, heavyweight algorithms. But in practice, you often don't actually need consensus. Being able to recognize when you do and don't need these solutions is a very important skill.
I'd like people to understand that consensus is non-trivial but very much achievable, if you understand your setting and choose the right approach for it.
QCon: What are some of the tips and tricks that help people recognize consensus systems?
Heidi: The key tip is to identify the guarantees that you need from the system. Does the system actually require strong consistency for all requests? Can it tolerate a weaker model? Is it sometimes okay to read stale data, and when is it not? What kind of scale are we talking about? What kind of network is there between these nodes, and how reliable is it? Do we need to worry about machines possibly acting maliciously or arbitrarily? What kind of fault-tolerance guarantees do you need? Do you actually need formally verified guarantees that the system remains correct if X failures occur under Y conditions? Or is a weaker guarantee, such as being up 99.9% of the time, okay? These are the types of questions we will discuss when considering what guarantees a system requires.
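To make the "if X failures occur" question concrete, here is a minimal Python sketch of the standard replica-count bounds, assuming the textbook fault models: majority-quorum protocols such as Paxos and Raft need n = 2f + 1 nodes to tolerate f crash failures, while classic Byzantine fault-tolerant protocols such as PBFT need n = 3f + 1 to tolerate f machines acting arbitrarily or maliciously. The function names are hypothetical, and the sketch deliberately ignores network assumptions such as timing and partitions.

```python
def min_cluster_size(f: int, byzantine: bool = False) -> int:
    """Smallest number of replicas that can tolerate f faulty nodes.

    Assumes the textbook bounds: 2f + 1 for crash faults (majority quorums,
    as in Paxos or Raft) and 3f + 1 for Byzantine faults (as in PBFT).
    """
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1 if byzantine else 2 * f + 1


def max_tolerated_faults(n: int, byzantine: bool = False) -> int:
    """Largest f that a cluster of n replicas can tolerate under the same bounds."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1) // 3 if byzantine else (n - 1) // 2


if __name__ == "__main__":
    # Compare how many replicas each fault model needs for f = 1, 2, 3.
    for f in range(1, 4):
        print(f"f={f}: crash needs {min_cluster_size(f)}, "
              f"Byzantine needs {min_cluster_size(f, byzantine=True)}")
```

Running it shows 3 versus 4 replicas for f = 1 and 7 versus 10 for f = 3; that gap is one reason to ask up front whether malicious or arbitrary behaviour is actually in scope for your system.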
QCon: Could you describe some of the patterns or idioms you plan to discuss?
Heidi: Yes. The plan is to take a chronological tour of distributed consensus, introducing the popular algorithms in different areas. Then, at the end, we will look back and ask: "What kind of settings do these algorithms fit?" Which algorithm would win the award for 'best in a low-latency setting'? Or which algorithm would win the award for 'best for throughput'? That kind of thing.
QCon: Who do you envision being the primary audience for this talk?
Heidi: Someone who works with distributed systems on a regular basis, but wants to understand their options on a deeper level when it comes to consensus.

Monday, 7 March

Tuesday, 8 March

Wednesday, 9 March

Conference for Professional Software Developers