Presentation: Streaming a Million Likes/Second: Real-Time Interactions on Live Video

Track: Streaming Data Architectures

Location: Churchill, G flr.

Duration: 11:50am - 12:40pm

Day of week: Monday

What You’ll Learn

  1. Find out about LinkedIn’s Real-time Distribution Platform, what it does and how it does it.
  2. Learn how to scale up to millions of users globally.

Abstract

When a broadcaster like the BBC streams a live video on LinkedIn, tens of thousands of viewers will watch it concurrently. Typically, hundreds of likes on the video will be streamed in real-time to all of these viewers. That amounts to a million likes/second streamed to viewers per live video. How do we make this massive real-time interaction possible across the globe? In this talk, I'll do a technical deep-dive into how we use the Play/Akka Framework and a scalable distributed system to enable live interactions like likes and comments at massive scale and extremely low cost, across multiple data centers.

Topics I will cover include:

  • Server-side and client-side frameworks for persistent connections.
  • Managing persistent connections with millions of active clients.
  • Pub/Sub architecture for real-time streaming with less than 100ms end-to-end latency to millions of connected clients. Hint: no Kafka!
  • Leveraging the same platform for other dynamic experiences like Presence.

Question: 

What is the work you're doing today?

Answer: 

I'm the Tech Lead for LinkedIn Messaging and LinkedIn’s Real-time Distribution Platform. This is a platform that we use to deploy server-to-client streaming technology to power many dynamic experiences on LinkedIn. This includes instant distribution of likes, comments and concurrent viewer counts on live videos, instant messaging, typing indicators, seen receipts, and even online presence, those green online indicators that you see when you message someone.

Question: 

What are the goals you have for the talk?

Answer: 

The talk is centered around the platform I just described, which supports real-time distribution of likes, comments, concurrent viewer counts, and notifications to the millions of connected viewers watching live videos on LinkedIn at any given time. I have a few goals.

First, I want to get the audience really excited about the importance of dynamic, interactive experiences in their apps. These days Instagram, Twitch, Facebook, and LinkedIn Learning are all moving toward this concept of getting people to interact with each other and learn from each other, and that's especially true for LinkedIn: in a professional context, we want people to learn from each other, build their networks, and connect with each other. I want the audience to see how they can apply such technology to their own apps.

Secondly, I want to do that by introducing them to the fundamental building blocks you would need to build such a system. Fundamentally, I believe those building blocks are: a persistent connection with your clients, which allows you to actually stream data to them; a way for clients to subscribe to the topics they're interested in; and a way for publishers to publish to those topics, so that you can stream relevant data down to the clients at scale.

And thirdly, I want to discuss challenges in distributed systems and how to solve them by starting small and then adding layers of simple architecture to reach massive scale. There's no magic there. I will do that by sharing real, practical experiences we had in doing so.
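As a rough illustration of those building blocks, here is a minimal, hypothetical sketch (plain Python, not LinkedIn's implementation) of the subscribe/publish side: a registry maps topics to connected clients, and a publish fans the event out over each subscriber's persistent connection. The `RealTimeDispatcher` and `Connection` names are invented for the example.

```python
from collections import defaultdict


class RealTimeDispatcher:
    """Toy in-memory pub/sub registry: clients subscribe to topics,
    and publishers fan events out to every subscribed connection."""

    def __init__(self):
        # topic -> set of persistent client connections
        self.subscriptions = defaultdict(set)

    def subscribe(self, topic, connection):
        self.subscriptions[topic].add(connection)

    def unsubscribe(self, topic, connection):
        self.subscriptions[topic].discard(connection)

    def publish(self, topic, event):
        # Stream the event down every persistent connection on this topic.
        for connection in self.subscriptions[topic]:
            connection.send(event)


class Connection:
    """Stand-in for one persistent client connection (e.g. SSE/WebSocket)."""

    def __init__(self):
        self.received = []

    def send(self, event):
        self.received.append(event)


dispatcher = RealTimeDispatcher()
viewer = Connection()
dispatcher.subscribe("video:42:likes", viewer)
dispatcher.publish("video:42:likes", {"type": "like", "count": 1})
```

The real system would hold many thousands of connections per node and replicate subscriptions across nodes and data centers, but the shape of the API is the same: subscribe, publish, fan out.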

Question: 

In the abstract, you mention you're using Akka and Play. Why were those particular frameworks used?

Answer: 

The biggest reason is scale. Both Play and Akka are completely asynchronous, event-driven frameworks, which means there are no blocking operations or shared state anywhere. That's the fundamental reason we chose them. Play, specifically, is a completely asynchronous web server framework that lets you serve a large number of requests with a very small number of threads, because a thread is in use only while actual work is being done for a request. Akka enables a concurrent, message-driven system built on the concept of actors. An actor's state is modified only via messages, so each actor can do its small task without having to worry about what is happening in the rest of the system. A thread is used only while an actor is processing a message and is reassigned, when idle, to the next actor that needs it. So a small number of threads can serve a large number of actors. In our case, each connection is maintained by one actor, and you're able to scale very effectively because those actors serve their connections independently, doing work only when activity happens.
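To make the thread economics concrete, here is a minimal sketch in plain Python (not Akka, and not LinkedIn's code) of many connection actors multiplexed over a small shared pool. A thread is borrowed only while a message is actually being processed; the per-actor lock stands in for Akka's guarantee that an actor handles one message at a time.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ConnectionActor:
    """One actor per client connection. The per-actor lock emulates
    Akka's one-message-at-a-time guarantee, so the actor's private
    state never needs coordination with the rest of the system."""

    def __init__(self, connection_id):
        self.connection_id = connection_id
        self.delivered = []              # actor-private state
        self._serial = threading.Lock()  # serialize message handling

    def receive(self, message):
        with self._serial:               # one message at a time
            self.delivered.append(message)


# A small shared pool serves every actor; threads are only in use
# while a message is being processed, then move to the next actor.
pool = ThreadPoolExecutor(max_workers=2)
actors = [ConnectionActor(i) for i in range(1000)]

futures = [pool.submit(actor.receive, {"like": True}) for actor in actors]
for f in futures:
    f.result()
pool.shutdown()
```

Here two worker threads comfortably serve a thousand "connections", which is the property that makes an actor per connection viable at scale.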

Question: 

Do you think the actor model makes it easier to reason about, say, threading?

Answer: 

Absolutely. The alternative is to manage these connections with, for example, scheduled executor services and thread pools. What you start to struggle with there is managing shared state, sizing the thread pool, making sure it has sufficient resources, and preventing starvation. And as the system grows, dedicating a thread to each connection results in very poor scaling characteristics. If you don't follow the discipline that actors impose, which is to pass messages to each other, act only when there is something to process, and share no state across actors, these systems become really hard to scale.
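A minimal sketch of that discipline (illustrative only, in plain Python rather than Akka): the state is owned by exactly one actor and mutated only by draining messages from its mailbox, so concurrent senders never touch shared state and no lock around the count is needed. The `CounterActor` name and poison-pill shutdown are invented for the example.

```python
import queue
import threading


class CounterActor:
    """The like count is owned by one actor and mutated only on the
    actor's own thread, by messages drained from its mailbox. Senders
    from any thread just enqueue; they never touch the state."""

    def __init__(self):
        self.mailbox = queue.Queue()
        self.count = 0
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            msg = self.mailbox.get()
            if msg is None:      # poison pill: stop processing
                break
            self.count += msg    # single-threaded mutation: no lock

    def tell(self, increment):
        self.mailbox.put(increment)

    def stop(self):
        self.mailbox.put(None)
        self._worker.join()


counter = CounterActor()
senders = [
    threading.Thread(target=lambda: [counter.tell(1) for _ in range(1000)])
    for _ in range(8)
]
for t in senders:
    t.start()
for t in senders:
    t.join()
counter.stop()
print(counter.count)  # 8000 — no increments lost, no lock on count
```

With raw thread pools and shared counters, the same program needs explicit locking (or atomics) to avoid lost updates, which is exactly the class of bug the message-passing model designs away.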

Question: 

What do you want people to leave the talk with?

Answer: 

As I said above, I really want the audience to walk away with practical advice, because we built this system from the ground up and I want to show real examples of how we did that and how we layered on top of the simple systems we started with: practical advice for building distributed systems that can support distribution of events to millions of connected clients, which they can go and apply to their applications directly. Secondly, I would like them to see how seemingly impossible scale can be achieved with simple building blocks. You take those building blocks, use powerful asynchronous frameworks like Play and Akka, and suddenly you're able to scale to a system that can serve viewers across the globe.

Speaker: Akhilesh Gupta

Sr. Staff Software Engineer @LinkedIn

Akhilesh is the technical lead for LinkedIn's real-time delivery infrastructure and LinkedIn Messaging. He has been working on revamping LinkedIn's offerings into instant, real-time experiences. Before this, he was the head of engineering for the Ride Experience program at Uber Technologies in San Francisco. He holds a Master's degree in CS from Stanford University.
