How We Created a High-Scale Notification System at Duolingo That Delivered Millions of Messages Within Seconds During a Super Bowl Commercial Break

Building a notification system may seem trivial, but what about building one that could reach millions of users within a few seconds? What about doing that right after your advertisement airs?

Event-based notification systems are no longer uncommon, but cost-effective examples of an on-demand, highly parallel notification system are rare. The complexity of building such a system comes from the intersection of system design, site reliability, and cloud resource management, all under pressure from the demands of an unhinged marketing campaign running on TV and the Web.

In this presentation, we will focus on:

  • How we built and tested a robust on-demand notification system.
  • What it takes to manage cloud resources and site reliability at the same time.
  • How to mitigate reliability issues with “zombie mode” and other relevant internal tooling we created.

Speaker

Vitor Pellegrino

Site Reliability Engineering @Duolingo

Vitor Pellegrino is a Staff Site Reliability Engineer at Duolingo. He leads several FinOps and Resilience initiatives in our Infrastructure Platform area, which is the group responsible for developing the platform that powers all of Duolingo's systems.

He has worked in various engineering leadership roles for over two decades with companies in Brazil, Germany, and the US. Most of his career has been spent on highly distributed microservices, high-traffic scenarios, and the challenges of implementing FinOps, SRE, and Continuous Delivery practices at an organizational level.

Speaker

Zhen Zhou

Software Engineer @Duolingo, Previously Theoretical Computer Science Enthusiast @CMU

Zhen is a senior software engineer at Duolingo. He is a core member of the Growth Infrastructure team, which is responsible for the backend microservices that fuel Duolingo’s vibrant social features and retention strategies.

He joined Duolingo in 2020 after graduating from Carnegie Mellon University. He is deeply passionate about designing and implementing robust backend infrastructure that directly impacts millions of users globally. He embraces challenges that come with evolving system designs and adapts services to meet the needs of a growing and diverse global audience.

Date

Monday Apr 8 / 02:45PM BST (50 minutes)

Location

Fleming (3rd Fl.)

Topics

platforms, Platform Engineering, architecture, scalability, Reliability, Asynchronous, System Design, Notification System, Cloud Resource Management, python

From the same track

Session architecture

Beyond Platform Thinking at RB Global – Build Things No One Expects, in a Place No One Expects It

Monday Apr 8 / 10:35AM BST

Ever wondered what it is really like to move from a poorly integrated COTS legacy architecture to a well-factored, API-driven, cloud-native platform and event-driven architecture, globally deployed in AWS EKS with modern observability and daily software releases?

Ranbir Chawla

SVP of Engineering @RB Global, Previously SME @Thoughtworks Digital Platform Strategy Team, 30+ Years in Software Engineering and Entrepreneurship

Session Data Architecture

Architecting for Data Products

Monday Apr 8 / 01:35PM BST

Data Mesh principles have brought data products to the forefront of data architecture discussions. However, due to the variety of applications and the vast technology landscape, it is hard to bring those principles into practice.

Danilo Sato

Global Head of Technology - Data & AI @Thoughtworks, Author of DevOps in Practice, CD4ML, and DataIQ UK’s 100 Most Influential People in Data in 2022–2023

Session

Everything Is a Plugin: How the Backstage Architecture Helps Platform Teams at Spotify and Beyond Spread Ownership and Deliver Value

Monday Apr 8 / 03:55PM BST

Back in 2014, platform engineers at Spotify identified a problem: the number of internal tools was growing rapidly, and as a result engineers around the company were struggling to find the tools and information they needed to work efficiently.

Pia Nilsson

Director of Engineering @Spotify

Mike Lewis

Staff Engineer @Spotify

Session

Unconference: Architectures You've Always Wondered About

Monday Apr 8 / 05:05PM BST

An unconference is a participant-driven meeting. Attendees come together, bringing their challenges and relying on the experience and know-how of their peers for solutions.

Session

0 → 1, shipping Threads in 5 months

Monday Apr 8 / 11:45AM BST

In Jan 2023, we received word that we’d need to build a microblogging service to compete with Twitter in a couple of months. This is the (technical) story of the small team that was assembled to take on that challenge, and that shipped a new social network in July.

Zahan Malkani

Software Engineer @Meta