When tools break the kernel

| categories: fedora

The kernel is largely self-contained. This makes it great for running experiments and breaking things. It also means that most kernel bugs are self-contained. I say most because the kernel still depends on other core system packages, and when those change, the kernel can break as well.

All the low-level packages on your system are usually so well maintained that you don't even realize they are present[1]. binutils provides tools for working with binary files, including the assembler, which gets updates for things such as new instruction set extensions. Changes like these can break the kernel unexpectedly, though. glibc is another common source of updates that break the kernel. The word 'break' here does not mean the changes to glibc or binutils were incorrect. The kernel makes a lot of assumptions about what external packages provide, and things are bound to get out of sync occasionally. This is a big part of the purpose of rawhide: to find dependency problems and get them fixed as soon as possible.

With compiler updates, it can be harder to tell whether a change is a regression. Compiler optimizations are designed to improve code but may also change its behavior in unexpected ways. A good example is some recent optimizations related to constants. For those who haven't studied compilers, constant folding means identifying expressions that can be evaluated to a constant at compile time. gcc provides a builtin function, __builtin_constant_p, that lets code behave differently depending on whether an expression can be evaluated to a constant at compile time. This sounds fairly simple for cases such as __builtin_constant_p(0x1234), but it gets more complicated in real code, where the answer depends on how much the compiler's other analysis passes can prove about a value. The end result is that a new compiler optimization broke some assumptions about how the kernel was using __builtin_constant_p. One of the risks of using compiler builtin functions is that their behavior is defined, but only to some degree. Developers may argue that the compiler is doing something incorrect, but it often turns out to be easier just to fix the kernel.
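To make that concrete, here is a minimal sketch of the kind of pattern involved. It is not taken from the kernel: the names highest_bit, highest_bit_runtime and seen_as_constant are hypothetical, and it assumes gcc (or a compatible compiler) with a 32-bit unsigned int. The macro picks a foldable expression when the compiler reports a constant and falls back to a loop otherwise.

```c
#include <stdint.h>

/* Runtime path: index of the highest set bit, found with a loop. */
static inline unsigned int highest_bit_runtime(uint32_t x)
{
    unsigned int r = 0;

    while (x >>= 1)
        r++;
    return r;
}

/*
 * Hypothetical dispatch macro. If the compiler can prove x is a constant,
 * use an expression it can fold at compile time; otherwise call the loop.
 * Both branches have to give the same answer, because which one is chosen
 * is up to the compiler. Zero is handled explicitly since __builtin_clz(0)
 * is undefined.
 */
#define highest_bit(x)                                                  \
    (__builtin_constant_p(x)                                            \
        ? ((x) ? (unsigned int)(31 - __builtin_clz((uint32_t)(x))) : 0u) \
        : highest_bit_runtime(x))

/*
 * Reports whether the compiler treated the argument as a constant.
 * A literal passed here may or may not count as constant: it depends on
 * whether the call is inlined, which varies with optimization level and
 * compiler version.
 */
static inline int seen_as_constant(int n)
{
    return __builtin_constant_p(n);
}
```

Calling highest_bit(0x1234) and calling it on a value read at runtime should give the same result either way. The trouble starts when code relies on which branch was taken, as seen_as_constant does: its answer for the same literal can change from one compiler release to the next.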

Sometimes the compiler is just plain wrong. New optimizations may eliminate critical portions of code. Identifying such bugs is a special level of debugging. Typically, you end up staring at the code wondering how it could end up in such a situation. Then you get an idea that staring at assembly will somehow be less painful at which point you notice that a critical code block is missing. This may be followed by yelling. For kernel builds, comparing what gets pulled into the buildroot of working and non-working builds can be a nice hint that something outside the kernel has gone awry.

As a kernel developer, I am grateful to the fantastic maintainers of the packages the kernel depends on. Every time I've reported an issue in Fedora, the maintainers have been patient and have helped me figure out how to get the right debugging information to determine whether the issue is in gcc/binutils/glibc or the kernel. The kernel may be self-contained, but it still needs other packages to work.

  1. Until you remove them with the --force option, then you really miss them.