Conference: March 6-8, 2017
Workshops: March 9-10, 2017
Presentation: cgroupv2: Linux's New Unified Control Group System
Location:
- Whittle, 3rd flr.
Day of week:
- Tuesday
Level:
- Advanced
Persona:
- Developer
Key Takeaways
- Understand the changes implemented in cgroup v2.
- Learn what v2 offers tooling developers.
- Hear how SREs can improve the reliability of their systems by leveraging cgroup v2.
Abstract
cgroupv1 (or just "cgroups") has helped revolutionise the way that we manage and use containers over the past 8 years. A complete overhaul is coming: cgroupv2. This talk will go into why a new control group system was needed, the changes from cgroupv1, and practical uses that you can apply to improve the level of control you have over the processes on your servers.
We will go over:
- Design decisions and deviations for cgroupv2 compared to v1
- Pitfalls and caveats you may encounter when migrating to cgroupv2
- Discussion of the internals of cgroupv2
- Practical information about how we are using cgroupv2 inside Facebook
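One visible consequence of the unified design is how a process reports its membership. In the kernel's /proc/<pid>/cgroup file, each v1 hierarchy gets its own line of the form hierarchy-ID:controller-list:path, while the single v2 hierarchy appears as one line with hierarchy ID 0 and an empty controller list. The sketch below splits that file into its v1 and v2 parts; the function name parse_proc_cgroup is hypothetical and not taken from the talk. On a hybrid system, both kinds of line may be present.

```python
from typing import Dict, Optional


def parse_proc_cgroup(text: str) -> Dict[str, object]:
    """Split the contents of /proc/<pid>/cgroup into v1 and v2 memberships.

    Hypothetical helper for illustration. Each line has the form
    "hierarchy-ID:controller-list:cgroup-path".
    """
    v1: Dict[str, str] = {}
    v2: Optional[str] = None
    for line in text.splitlines():
        if not line:
            continue
        # maxsplit=2 keeps any colons inside the cgroup path intact.
        hierarchy_id, controllers, path = line.split(":", 2)
        if hierarchy_id == "0" and controllers == "":
            # The unified (v2) hierarchy: a single entry, no controller list.
            v2 = path
        else:
            # A v1 hierarchy, keyed by its controllers (e.g. "cpu,cpuacct").
            v1[controllers] = path
    return {"v1": v1, "v2": v2}
```

On a Linux host you could pass it the text of /proc/self/cgroup; on a system running only cgroupv2, the "v1" dictionary comes back empty.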
Interview
I work in the Web Foundation team. We are technically responsible for the operation of Facebook’s web servers, but we’ve morphed into a team that acts as the guardian of overall reliability at Facebook. We are the people who get involved when there are severe production incidents or service-level issues. Since problems ultimately tend to surface at the web server, we get involved in systemic issues across Facebook that affect users. Most of my work is in Linux debugging, so I spend a lot of my time on the command line working out problems in our stack. We also have people who focus more on debugging RPC, caching, PHP, and so on, so it really is a broad team. The main idea is that we bring together domain experts from across the company to figure out how we can drive Facebook’s reliability forward across a variety of areas.
I see cgroups as one of the two pillars of improving resilience against failure domains, the other being the dependencies between different applications and services. cgroups matter here because they help different services on the same machine operate harmoniously. As a resource management mechanism, cgroups are a potentially invaluable tool in any SRE’s toolbox.
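As a concrete illustration of cgroups as a resource management tool, the sketch below uses cgroup v2 interface files to place a process in its own cgroup with memory limits. Writing "+memory" to the parent's cgroup.subtree_control enables the memory controller for its children, memory.high sets a throttling threshold, memory.max sets a hard limit, and writing a PID to cgroup.procs migrates that process. The function name, cgroup name, and limits are hypothetical, and this is not a description of how Facebook configures its hosts. It assumes root privileges and that root is a writable cgroup2 directory, either the mount root or an empty parent, since v2 will not let a non-root cgroup that contains processes hand domain controllers to its children.

```python
from pathlib import Path


def create_limited_cgroup(root: Path, name: str, memory_high: int,
                          memory_max: int, pid: int) -> Path:
    """Create root/name with memory limits and move `pid` into it.

    Hypothetical helper. Assumes `root` is a cgroup2 directory we may write to.
    """
    # Allow child cgroups of `root` to use the memory controller.
    (root / "cgroup.subtree_control").write_text("+memory")

    # Creating a directory in the cgroup2 filesystem creates a child cgroup.
    cgroup = root / name
    cgroup.mkdir(exist_ok=True)

    # Above memory.high the group is throttled and put under reclaim pressure;
    # memory.max is the hard limit at which the OOM killer may be invoked.
    (cgroup / "memory.high").write_text(str(memory_high))
    (cgroup / "memory.max").write_text(str(memory_max))

    # Writing a PID to cgroup.procs migrates that process into the cgroup.
    (cgroup / "cgroup.procs").write_text(str(pid))
    return cgroup
```

For example, create_limited_cgroup(Path("/sys/fs/cgroup"), "batch-job", 1 << 30, 2 << 30, pid) would throttle a hypothetical batch job above 1 GiB and cap it at 2 GiB. The cgroup2 mount point varies between distributions and kernel versions, so it is passed in rather than hard-coded.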
There are two kinds of things to consider here. First, the reason cgroupv2 is called version 2 is that it introduces an API-breaking change from a userspace perspective. As such, one of the main things I want to get across in the talk is why we chose to create a new version of cgroups instead of just making incremental improvements to version 1. One of the problems with adopting new kernel technology is that unless people really see the motivation and value in adopting it, many of them keep using the legacy API, which is not an ideal situation for either side. Hopefully the talk will give those folks the motivation to go and check it out, and see how they can use it to concretely benefit their organisations. Second, cgroups are also used by all sorts of other technologies that require resource management, like Docker, systemd, YARN, and others. Most of those currently still support only version 1 (the exception is systemd, which has supported v2 since v226). We really need buy-in from influential people to get widespread adoption, understanding, and mature tooling. Every now and then I see issues on various projects’ bug trackers where the true resolution is to move to cgroupv2, but this is not always obvious to those who are not familiar with current developments in kernel technologies. As such, I want to incentivise people to give version 2 a look and see if their product could benefit from using it.
Version 2 has been in development since 2011, but it only became stable in Linux 4.5. Previously the cgroupv2 filesystem was available only experimentally, behind a developer flag; 4.5 is the point where we have a stable API (and no developer flag), and where I’d feel comfortable telling people they can use it in production. In 4.5, we made a commitment that the API will not change dramatically, or in a way that breaks people. Additional tooling was also added from 4.6 onwards, so that is a good kernel to start with if possible. While we do have a stable API in 4.5, we still have a way to go before we have all the features we want in version 2 (like some features around domain accounting). Some of the changes in cgroupv2 are in preparation for future improvements and developments which we simply could not implement under the design of the v1 hierarchy. As such, it’s important that this talk informs people about the future developments for cgroupv2 that have motivated our design decisions.
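Since 4.5 is given as the first kernel with a stable cgroupv2 API, a tool might check both the running kernel and whether a cgroup2 filesystem is mounted before relying on v2. A minimal sketch, assuming a Linux host; the function names are hypothetical. It scans mount-table text in the /proc/self/mounts format (device, mount point, filesystem type, options, dump, pass), where the v2 filesystem type is cgroup2, and compares the major.minor prefix of a kernel release string such as the one returned by platform.release().

```python
import re
from typing import Optional


def find_cgroup2_mount(mounts_text: str) -> Optional[str]:
    """Return the first mount point whose filesystem type is cgroup2, or None.

    `mounts_text` is the contents of /proc/self/mounts. Mount points that
    contain spaces are octal-escaped there and are returned as-is.
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass.
        if len(fields) >= 3 and fields[2] == "cgroup2":
            return fields[1]
    return None


def kernel_at_least(release: str, major: int, minor: int) -> bool:
    """Compare the leading "major.minor" of a kernel release string."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        # Unrecognised format: be conservative and report "too old".
        return False
    return (int(match.group(1)), int(match.group(2))) >= (major, minor)
```

On a live system you might combine them, calling kernel_at_least(platform.release(), 4, 5) and passing the text read from /proc/self/mounts to find_cgroup2_mount; checking for 4.6 instead would follow the suggestion to start from that kernel where possible.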
I’d like to reach three main audiences: people who are building tooling, people who are building CDNs and doing site reliability work, and other current cgroups users. The first group is people building development tools like Docker. While we already have some buy-in here, this is a chance to let them see and touch cgroupv2, feel comfortable using it, and migrate their APIs to support it. At Facebook, we use cgroupv2 not only for containerisation, but also for general system management. As such, the second group I’m looking to speak to is people like me at companies who are interested in improving their reliability, and who are wondering how cgroup version 2 might help them achieve that goal. The last group is people who currently use cgroups. In their case, the talk is mostly about knowledge sharing. I’d like them to have a clear incentive to start using cgroup v2 if they are already using v1, and a large part of that is helping them understand the improvements and changes surrounding v2.