linux.conf.au 2021 | Presentation: So you're a Linux kernel developer? Name all subsystems.

Presented by

Pia Eichinger

Pia Eichinger is a Computer Science student at the University of Applied Sciences Regensburg, currently in its research master's programme. During her Bachelor's thesis, she conducted process analysis with a specific focus on Linux and developed a fascination for the kernel and its uses. Her current research interest is the topology of the Linux kernel, which she studies with quantitative software engineering methods.

Ralf Ramsauer

Ralf Ramsauer is a PhD student at the University of Applied Sciences Regensburg. His academic research focuses on finding successful long-term maintenance strategies for open source software in embedded industrial contexts. This covers the full software stack of embedded systems, from hardware-related, low-level virtualisation technologies via kernel space through to userland.

Stefanie Scherzinger

Stefanie Scherzinger is a professor at the University of Passau. Her research is influenced by her industry experience as a software engineer at IBM and Google. She currently focuses on maintaining applications backed by NoSQL data stores, and on systematic support for traditional database schema evolution.

Wolfgang Mauerer

Wolfgang Mauerer is a professor of theoretical computer science at the Technical University Regensburg, and a senior key expert at Siemens Corporate Research, Competence Centre Embedded Linux. He serves on the technical steering committee of the Linux Foundation's Civil Infrastructure Platform. His academic research deals with socio-technical software engineering and the industrial use of open source.

Abstract

It goes without saying that the kernel is split into several subsystems. But what defines a subsystem? An entry in MAINTAINERS? Then there would be more than 2000 of them, which is clearly not the case. As there is no official definition of "subsystem", we want to identify them: we are interested in which subsystems the kernel actually consists of and how they relate to each other. This gives newcomers better insight into the kernel, and it helps industrial vendors who perform development process analysis, which promises benefits for developers and the community alike. Beyond this, a precise documentation and definition of subsystems is also necessary for upcoming challenges such as certification in safety-critical environments (as pursued, for instance, by the Linux Foundation's ELISA project). Proper documentation also eases general quality assurance, helps with long-term maintenance, and eases the initial learning curve for newcomers.

Therefore, we decided to take a look at the bigger picture. Quite literally, actually. Our talk discusses methods to visualise the subsystem topology of the entire repository as graphs, based on data mined from the kernel. We measure how strongly the areas of responsibility of MAINTAINERS entries intersect, and cluster the entries by the intensity of their overlap, which effectively detects de-facto subsystems. The results reveal sensible, though sometimes surprising, structures. We compare de-facto and documented subsystems, and show numerous ways to use the data, ranging from improvements to the development process to formal certification efforts for safety-critical systems.
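The overlap-and-cluster idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the presenters' tooling. It parses a tiny, hypothetical excerpt written in the style of MAINTAINERS (only entry titles and `F:` file patterns are used) and matches the patterns against a hypothetical list of repository paths. It then weights each pair of entries by the Jaccard index of the files they cover, and merges entries whose overlap reaches an arbitrary threshold into connected components. The Jaccard metric, the 0.2 threshold, the simplified pattern matching and all function names are assumptions chosen for illustration; the talk may use a different overlap measure and clustering algorithm.

```python
from fnmatch import fnmatchcase
from itertools import combinations

# Hypothetical, heavily simplified excerpt in the style of the kernel's
# MAINTAINERS file: a title line followed by tagged lines. Only "F:"
# (file pattern) lines are used here; real entries also carry M:, L:, S:, ...
SAMPLE_MAINTAINERS = """\
NETWORKING [GENERAL]
F:\tnet/
F:\tinclude/net/

NETWORKING [IPv4/IPv6]
F:\tnet/ipv4/
F:\tnet/ipv6/

WIRELESS
F:\tnet/wireless/
F:\tinclude/net/cfg80211.h

EXT4 FILE SYSTEM
F:\tfs/ext4/
"""

# Hypothetical stand-in for the list of files in the repository.
FILES = [
    "net/core/dev.c",
    "net/ipv4/tcp.c",
    "net/ipv6/route.c",
    "net/wireless/core.c",
    "include/net/cfg80211.h",
    "include/net/sock.h",
    "fs/ext4/inode.c",
    "fs/ext4/super.c",
]


def parse_entries(text):
    """Map each entry title to its list of F: patterns (entries are blank-line separated)."""
    entries = {}
    for block in text.strip().split("\n\n"):
        lines = block.splitlines()
        title = lines[0].strip()
        entries[title] = [line[2:].strip() for line in lines[1:] if line.startswith("F:")]
    return entries


def matches(pattern, path):
    """Simplified matching (assumption): a trailing '/' covers everything below
    that directory; anything else is a shell-style glob whose '*' may cross '/'."""
    if pattern.endswith("/"):
        return path.startswith(pattern)
    return fnmatchcase(path, pattern)


def compute_coverage(entries, files):
    """Set of files each entry is responsible for."""
    return {
        title: {f for f in files if any(matches(p, f) for p in patterns)}
        for title, patterns in entries.items()
    }


def jaccard(a, b):
    """Overlap intensity of two file sets: |A & B| / |A | B| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(coverage, threshold):
    """Merge entries whose pairwise overlap reaches `threshold`, using union-find."""
    parent = {title: title for title in coverage}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in combinations(coverage, 2):
        if jaccard(coverage[a], coverage[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for title in coverage:
        groups.setdefault(find(title), []).append(title)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    cov = compute_coverage(parse_entries(SAMPLE_MAINTAINERS), FILES)
    for a, b in combinations(sorted(cov), 2):
        weight = jaccard(cov[a], cov[b])
        if weight > 0:
            print(f"{a} <-> {b}: {weight:.2f}")
    for group in cluster(cov, threshold=0.2):
        print("de-facto subsystem:", ", ".join(group))
```

With the sample data, the umbrella entry NETWORKING [GENERAL] overlaps with both NETWORKING [IPv4/IPv6] and WIRELESS (Jaccard index 0.33 each), so the three collapse into one de-facto subsystem while EXT4 FILE SYSTEM stays on its own. Connected components are the simplest possible choice; on real data a single broad entry can chain otherwise unrelated entries together, so a weighted community-detection algorithm would be a natural replacement.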