Track:

Location:

Whittle, 3rd flr.

Duration:

2:55pm - 3:45pm

Day of week:

Tuesday

Level:

Advanced

Persona:

Developer

Key Takeaways

Understand the changes implemented in cgroup v2.
Learn what v2 offers tooling developers.
Hear how SREs can improve reliability in their systems by leveraging cgroup v2.

Abstract

cgroupv1 (or just "cgroups") has helped revolutionise the way that we manage and use containers over the past 8 years. A complete overhaul is coming -- cgroupv2. This talk will go into why a new control group system was needed, the changes from cgroupv1, and practical uses that you can apply to improve the level of control you have over the processes on your servers.

We will go over:

Design decisions and deviations for cgroupv2 compared to v1
Pitfalls and caveats you may encounter when migrating to cgroupv2
Discussion of the internals of cgroupv2
Practical information about how we are using cgroupv2 inside Facebook

Interview

Question:

What is the focus of what you’re doing at Facebook today?

Answer:

I work in the Web Foundation team. We are technically responsible for the operation of the web servers for Facebook, but we’ve morphed into a team that acts as the guardian of overall reliability at Facebook. We are the people who get involved when there are severe production incidents or when there are service-level issues. As things ultimately tend to end up at the web server, we get involved in systematic issues across Facebook that affect users. Most of my work is in Linux debugging, so I spend a lot of my time on the command line working out problems in our stack. We also have people who are more focused on debugging related to RPC, caching, PHP, and those kinds of things, so we really have a broad team. The main idea of the team is that we have domain experts from across the company, and we all come together to figure out how we can drive Facebook’s reliability forward across a variety of areas.

Question:

How did you get involved with cgroups?

Answer:

I see cgroups as one of the two pillars of improving resilience against failure domains, the other being dependencies between different applications and services. cgroups in general are really important there too in helping different services on the same machine operate in a harmonious manner. The use of cgroups as a resource management mechanism is a potentially invaluable tool in any SRE’s toolbox.

Question:

What’s the main goal of your talk?

Answer:

There are two kinds of things to consider here. First, the reason that cgroupv2 is called version 2 is that it is an API-breaking change from a userspace perspective. As such, one of the main things I want to get across in the talk is why we chose to have a new version of cgroups instead of just making incremental improvements to version 1. One of the problems with adopting new kernel technology is that unless people really see the motivation and value in adopting it, you end up with many continuing to use the legacy API, which is really not an ideal situation for either side. Hopefully the talk will give those folks the motivation to go and check it out, and see how they can use it to concretely benefit their organisations. Second, cgroups are also used by all sorts of other technology like Docker, systemd, Yarn, and others that require resource management. Most of those currently still only support version 1 (with the exception of systemd, which has supported v2 since v226). We really need buy-in from people who are influential to get widespread adoption, understanding, and mature tooling. I see issues every now and then on various projects’ bug trackers where the true resolution for the issue posed is to move to cgroupv2, but this is not always obvious to those not familiar with current developments in kernel technologies. As such, I want to incentivise people to give version 2 a look and see if their product could benefit from using it.

Question:

v2 was released with version 4.5 of the Linux kernel, correct?

Answer:

Version 2 has been in development since 2011, but we only made things stable in 4.5. Previously the filesystem was experimentally available via a developer flag, but 4.5 is the point where we have a stable API (and no developer flag), and I’d feel comfortable telling people they can use it in production. In 4.5, we made a commitment that the API will not change dramatically, or in a way that breaks people. Additional tooling was also added from 4.6 onwards, so this is a good kernel to start with if possible. While we do have a stable API in 4.5, we still have a way to go before we have all the features we want in version 2 (like some features around domain accounting). Some of the changes in cgroupv2 are in preparation for future improvements and developments which we simply could not implement under the design of the v1 hierarchy. As such, it’s important that this talk helps to inform people about the future developments for cgroupv2 that have motivated our design decisions.
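As a rough illustration of the version boundary described above, the following Python sketch checks whether a kernel release string is at least 4.5 and whether a cgroup v2 filesystem appears in `/proc/mounts`-style text. It is not from the talk: the helper names `kernel_at_least` and `cgroup2_mount_points` are hypothetical, and the only facts it relies on are that the v2 hierarchy is mounted with filesystem type `cgroup2` and that `/proc/mounts` lists the filesystem type in its third field.

```python
import platform
import re
from typing import List


def kernel_at_least(release: str, major: int, minor: int) -> bool:
    """Return True if a release string such as '4.15.0-112-generic' is >= major.minor."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= (major, minor)


def cgroup2_mount_points(mounts_text: str) -> List[str]:
    """Return the mount points whose filesystem type is 'cgroup2'."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and fields[2] == "cgroup2":
            points.append(fields[1])
    return points


if __name__ == "__main__":
    print("kernel >= 4.5:", kernel_at_least(platform.release(), 4, 5))
    try:
        with open("/proc/mounts") as mounts:
            print("cgroup2 mounts:", cgroup2_mount_points(mounts.read()))
    except FileNotFoundError:
        print("/proc/mounts not available on this system")
```

A kernel that passes the version check may still have no `cgroup2` mount, since many systems mount only the v1 hierarchies by default, so the two checks are worth reporting separately.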

Question:

Who is the target audience of this talk? Are you talking to a tooling vendor (like Docker) or are you talking to developers who are using a tool that might incorporate cgroup v2? Who’s the main focal point of the talk?

Answer:

I’d like to reach three main audiences: people who are building tooling, people who are building CDNs and doing site reliability work, and other current cgroups users. The first group is definitely people who are building development tools like Docker. While we already have some buy-in here, this is the chance to let them see and touch cgroupv2, make them feel comfortable with using it, and encourage them to migrate their APIs to support it. At Facebook, we use cgroupv2 not only for containerisation, but also for general system management. As such, the second group that I’m looking to speak to is people like me at companies who are interested in improving their reliability, and wondering how cgroup version 2 might help them achieve that goal. The last group is people who use cgroups currently. In their case, the talk is mostly about knowledge sharing. I’d like them to have a clear incentive to start using cgroup v2 if they are already using v1, and a large part of that is helping them to understand the improvements and changes surrounding v2.

Speaker: Chris Down

Production Engineer @ Facebook's Web Foundation team

Chris Down is a Production Engineer on Facebook's Web Foundation team, based in London. He is responsible for debugging and resolving major production issues, and improving the reliability and efficiency of Facebook's systems. He also is a contributor to Facebook's open source efforts, including osquery, an operating system instrumentation framework for OS X and Linux.

Find Chris Down at

Speaker page

https://www.linkedin.com/in/chrisldown

Tracks

Architecting for Failure

Building fault-tolerant systems that are truly resilient

Architectures You've Always Wondered about

QCon classic track. You know the names. Hear their lessons and challenges.

Modern Distributed Architectures

Migrating, deploying, and realizing modern cloud architecture.

Fast & Furious: Ad Serving, Finance, & Performance

Learn some of the tips and technicals of high speed, low latency systems in Ad Serving and Finance

Java - Performance, Patterns and Predictions

Skills embracing the evolution of Java (multi-core, cloud, modularity) and reinforcing core platform fundamentals (performance, concurrency, ubiquity).

Performance Mythbusting

Performance myths that need busting and the tools & techniques to get there

Dark Code: The Legacy/Tech Debt Dilemma

How do you evolve your code and modernize your architecture when you're stuck with part legacy code and technical debt? Lessons from the trenches.

Modern Learning Systems

Real world use of the latest machine learning technologies in production environments

Practical Cryptography & Blockchains: Beyond the Hype

Looking past the hype of blockchain technologies, alternate title: Weaselfree Cryptography & Blockchain

Applied JavaScript - Atomic Applications and APIs

Angular, React, Electron, Node: The hottest trends and techniques in the JavaScript space

Containers - State Of The Art

What is the state of the art, what's next, & other interesting questions on containers.

Observability Done Right: Automating Insight & Software Telemetry

Tools, practices, and methods to know what your system is doing

Data Engineering : Where the Rubber meets the Road in Data Science

Science does not imply engineering. Engineering tools and techniques for Data Scientists

Modern CS in the Real World

Applied, practical, & real-world dive into industry adoption of modern CS ideas

Workhorse Languages, Not Called Java

Workhorse languages not called Java.

Security: Lessons Learned From Being Pwned

How Attackers Think. Penetration testing techniques, exploits, toolsets, and skills of software hackers

Engineering Culture @{{cool_company}}

Culture, Organization Structure, Modern Agile War Stories

Softskills: Essential Skills for Developers

Skills for the developer in the workplace

Conference for Professional Software Developers

Presentation: cgroupv2: Linux's New Unified Control Group System
