Conference: March 6-8, 2017
Workshops: March 9-10, 2017
Presentation: Scaling Instagram Infrastructure
Location:
- Fleming, 3rd flr.
Duration
Day of week:
- Tuesday
Level:
- Intermediate
Persona:
- Architect
Key Takeaways
- Learn some of the issues and solutions Instagram’s Infrastructure team had around scalability.
- Hear how Instagram improved single server capacity and improved network latency.
- Learn some of the tools, techniques, and metrics Instagram uses to support 500 million monthly users.
Abstract
Instagram is a social network mobile app that allows people to share the world's moments as they happen. It serves 300 million users daily throughout the world.
In this talk, we will give an overview of the infrastructure that supports its users at this scale.
Topics will include:
- a brief history of infrastructure evolution
- overall architecture and multi-data center support
- tuning of uWSGI parameters for scaling
- performance monitoring and diagnosis
- and the Django/Python upgrade (why, challenges and lessons learned)
Interview
Lisa: I am a software engineer on the Instagram Infrastructure Team. Our team’s main purpose is to keep our systems scalable. While doing that, we identify both short-term and long-term fixes around scale. Additionally, we work closely with many other teams on the product side to help them identify bottlenecks, and we make suggestions related to scale when they are shipping new features to our users.
Lisa: We are serving more than 500 million monthly active users, with 300M of them on Instagram every day.
Lisa: Our web tier stack is Django with Python, and we have backend services using Cassandra, MySQL, and Memcached. Those are basically our storage systems. We use Facebook’s Everstore as our photo storage. We also have an async tier with RabbitMQ and Celery.
Lisa: We do use containers. We basically use a variant of Linux LXC. Facebook has its own container system, Tupperware, which has also been discussed publicly. It’s a wrapper around LXC.
We moved from AWS to Facebook’s data centers about two years ago. When we made that move, we expanded to multiple data centers.
Lisa: The rationale is really just about accessing Facebook’s servers more conveniently. Otherwise, you always have the firewall and things like that in between. So we really could not take advantage of some of the things Facebook had, like monitoring and scaling. Aside from that, I think there were some VDM limitations that caused us issues around data replication.
Lisa: I will discuss different aspects of scaling: horizontal, vertical, and scaling the dev team. I will talk about how we scaled to multiple data centers; how we define scale-up and what tools we use and have built to identify scaling bottlenecks; and what we have done to enable product development velocity and our release process. Along with the things we have achieved, we’ll discuss some of the continued challenges and our plans to address them.