Opening the Box: Diagnosing Operating-System Task-Scheduler Behavior on Highly Multicore Machines

An operating system task scheduler is responsible for placing tasks on cores and for selecting which task is allowed to run at any given time. As such, the scheduler is a critical component of any operating system and has a major impact on application performance. Still, scheduling decisions are buried deep within the operating system code. When diagnosing a performance problem (or even a performance improvement), it is therefore challenging to determine whether the scheduler is responsible and, if so, in what way. These challenges are compounded for highly multithreaded applications running on large multicore machines, because of the huge amount of information that must be examined.

In this talk, we present some tools that we have developed for visualizing the behavior of the Linux kernel task scheduler, and illustrate how these tools can be used to help diagnose performance problems. The tools presented are freely available at https://gitlab.inria.fr/schedgraph/schedgraph
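
To give a rough sense of what visualizing scheduler behavior involves, here is a minimal Python sketch that turns context-switch records into one text row per core. It is not taken from the schedgraph tools and does not use their input format. The record layout `(time_s, core, next_task)`, the sample data, and the function names are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical, simplified context-switch records: (time_s, core, next_task).
# Each record says that `next_task` started running on `core` at `time_s`;
# "idle" means the core had nothing to run. Real kernel traces carry far
# more detail than this.
EVENTS = [
    (0.0, 0, "A"),
    (0.0, 1, "idle"),
    (1.0, 0, "B"),
    (1.5, 1, "A"),
    (3.0, 0, "idle"),
    (4.0, 1, "idle"),
]
END_OF_TRACE = 4.0


def per_core_intervals(events, end_time):
    """Turn switch events into (task, start, end) intervals for each core."""
    by_core = defaultdict(list)
    for time_s, core, task in sorted(events):
        by_core[core].append((time_s, task))
    intervals = {}
    for core, switches in by_core.items():
        # Each interval ends when the next switch on the same core happens,
        # or at the end of the trace for the last one.
        ends = [t for t, _ in switches[1:]] + [end_time]
        intervals[core] = [
            (task, start, end)
            for (start, task), end in zip(switches, ends)
            if end > start  # drop zero-length intervals
        ]
    return intervals


def render_row(core_intervals, end_time, step):
    """Draw one core as text: one character per `step` seconds, '.' for idle."""
    row = []
    for i in range(int(end_time / step)):
        t = i * step
        task = next(
            (name for name, start, end in core_intervals if start <= t < end),
            "idle",
        )
        row.append("." if task == "idle" else task[0])
    return "".join(row)


if __name__ == "__main__":
    intervals = per_core_intervals(EVENTS, END_OF_TRACE)
    for core in sorted(intervals):
        print(f"core {core}: {render_row(intervals[core], END_OF_TRACE, 0.5)}")
```

Even in this toy data, laying the rows side by side makes a pattern visible: task A stops running on core 0 at time 1.0 and only resumes on core 1 at time 1.5, although core 1 was idle the whole time. On a machine with many cores and many threads, such patterns are hard to spot in raw event logs, which is the motivation for graphical tools.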

Interview:

What's the focus of your work these days?

I work on improving the quality of low-level systems software. This includes the Coccinelle tool for automating software evolution, which allows APIs to change flexibly without inducing developer pain; tools for analyzing software performance that take the impact of the operating system into account; and approaches to formal verification of systems code.

What's the motivation for your talk at QCon London 2024?

Modern operating systems are very complex. Still, it is possible to understand their impact on application performance. I want to encourage people to be aware that this impact exists, that tools exist for monitoring the impact of the operating system on application behavior, and that it is possible to organize the resulting information in a way that facilitates problem diagnosis.

How would you describe your main persona and target audience for this session?

Someone who is curious about operating systems and would like to understand the performance impact of operating system policies.

Is there anything specific that you'd like people to walk away with after watching your session?

Operating systems are not black boxes. It is possible to understand what they are doing and to anticipate problems.


Speaker

Julia Lawall

Senior Scientist @INRIA

Julia Lawall is a senior researcher at Inria Paris. Prior to joining Inria, she completed a PhD at Indiana University and was on the faculty at the University of Copenhagen. Her work focuses on issues around the correctness and performance of operating systems. She develops and maintains the Coccinelle program transformation system that has been extensively used on Linux kernel code, and has recently begun investigating the performance impact of the Linux kernel scheduler, as well as exploring formal verification of scheduler properties.

Date

Monday Apr 8 / 11:45AM BST (50 minutes)

Location

Windsor (5th Fl.)

Topics

Linux kernel scheduling visualization performance

From the same track

Session performance

A Walk Along the Complexity-Performance Curve

Monday Apr 8 / 10:35AM BST

Software performance and complexity are related. It’s common for refactoring to introduce unanticipated regressions, and for performance optimisations to attract scrutiny in code review; how much performance improvement is worth a perceived loss of readability?

Richard Startin

Senior Software Engineer @Datadog

Session

Panel: What Does the Future of Computing Look Like

Monday Apr 8 / 05:05PM BST

The future of computing promises to be revolutionary. This panel dives into cutting-edge advancements that will redefine how we interact with technology. We'll explore groundbreaking concepts and discuss their potential to transform our world.

Julia Lawall

Senior Scientist @INRIA

Matt Fleming

CTO @Nyrkiö, Former Linux Kernel Maintainer @Intel and @SUSE

Joe Rowell

Founding Engineer @poolside.ai, Low-Level Performance Engineer, Cryptographer and PhD Candidate @RHUL

Session

Practical Benchmarking: How To Detect Performance Changes in Noisy Results

Monday Apr 8 / 03:55PM BST

Finding statistically significant changes in performance results has always been challenging, but now that most of our code runs on hardware and infrastructure we don't own, we need methods and tools for detecting performance changes in noisy data.

Matt Fleming

CTO @Nyrkiö, Former Linux Kernel Maintainer @Intel and @SUSE

Session

Pitfalls of Unified Memory Models in GPUs

Monday Apr 8 / 01:35PM BST

Modern GPUs offer support for so-called unified memory, providing a universal address space for both CPUs and GPUs.

Joe Rowell

Founding Engineer @poolside.ai, Low-Level Performance Engineer, Cryptographer and PhD Candidate @RHUL

Session

Unconference: Performance Engineering Unleashed

Monday Apr 8 / 02:45PM BST

An unconference is a participant-driven meeting. Attendees come together, bringing their challenges and relying on the experience and know-how of their peers for solutions.