Building a notification system may seem trivial, but what about building one that can reach millions of users within a few seconds? What about doing that right after your advertisement airs?
Event-based notification systems are no longer uncommon, but cost-effective examples of an on-demand, highly parallel notification system are rare. The complexity of building such a system comes from the intersection of system design, site reliability, and cloud resource management, all under the pressure of an unhinged marketing campaign across TV and the web.
In this presentation, we will focus on:
- How we built and tested a robust on-demand notification system.
- What it takes to manage cloud resources and site reliability at the same time.
- How to mitigate reliability issues with “zombie mode” and other relevant internal tooling we created.
Speaker
Vitor Pellegrino
Site Reliability Engineering @Duolingo
Vitor Pellegrino is a Staff Site Reliability Engineer at Duolingo. He leads different FinOps and Resilience initiatives in our Infrastructure Platform area, which is the group responsible for developing the platform that powers all of Duolingo's systems.
He has worked in various engineering leadership roles for over two decades with companies in Brazil, Germany, and the US. Most of his career has been spent on highly distributed microservices, high-traffic scenarios, and the challenges of implementing FinOps, SRE, and Continuous Delivery practices at an organizational level.
Speaker
Zhen Zhou
Software Engineer at @Duolingo, Previous Theoretical Computer Science Enthusiast @CMU
Zhen is a senior software engineer at Duolingo. He is a core member of the Growth Infrastructure team, which is responsible for the backend microservices that fuel Duolingo’s vibrant social features and retention strategies.
He joined Duolingo in 2020 after graduating from Carnegie Mellon University. He is deeply passionate about designing and implementing robust backend infrastructure that directly impacts millions of users globally. He embraces challenges that come with evolving system designs and adapts services to meet the needs of a growing and diverse global audience.