Kbuild tricks

| categories: fedora

Several of the tasks I've worked on recently have involved looking at some of the kernel's build infrastructure. This is all fairly well documented, which makes it nice to work with.

The kernel automatically generates some files at build time. This is mostly set up to be transparent to developers unless they go looking for them. The majority of these files are headers under include/generated. A good example of something that needs to be generated is the #define representing the kernel version (e.g. 4.15.12). The header file include/generated/bounds.h contains #defines for several enum constants calculated at build time. Cleverly, most of these files are only actually replaced if the generated output changes, which avoids unnecessary recompilation. Most of this work is handled by the filechk macro.
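For a concrete sense of what these generated headers look like, here's a rough sketch of the sort of thing include/generated/bounds.h ends up containing (the values are illustrative and depend on the config and kernel version):

    /* Sketch of a generated include/generated/bounds.h; the values shown
     * are illustrative and vary with the config and kernel version. */
    #ifndef __LINUX_BOUNDS_H__
    #define __LINUX_BOUNDS_H__
    /*
     * DO NOT MODIFY.
     * This file was generated by Kbuild
     */

    #define NR_PAGEFLAGS 22 /* __NR_PAGEFLAGS */
    #define MAX_NR_ZONES 4 /* __MAX_NR_ZONES */

    #endif /* __LINUX_BOUNDS_H__ */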

The C preprocessor is typically used on C files, as you might expect. It's not actually limited to C files though. Each architecture has to define a linker script which meets its architectural requirements. The linker language is common across architectures, so it's beneficial to have common definitions for typical sections such as initcalls and rodata. There's a global rule to run the preprocessor on any .lds.S file. Devicetree files also get preprocessed, which avoids a lot of copying and pasting of numerical defines.
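As a small, made-up illustration of the payoff: once linker scripts and devicetree sources go through the preprocessor, a single header of constants can be shared between C code and those files, so the magic numbers only live in one place. The header and names below are hypothetical, not real kernel files:

    /* example_layout.h -- hypothetical header, for illustration only */
    #ifndef _EXAMPLE_LAYOUT_H
    #define _EXAMPLE_LAYOUT_H

    /* #included from C code, from foo.lds.S, and from foo.dts alike */
    #define EXAMPLE_TEXT_BASE   0x80008000
    #define EXAMPLE_SRAM_SIZE   0x00040000

    #endif /* _EXAMPLE_LAYOUT_H */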

The compiler flags are typically set in the top level Makefile and named as you might expect (CFLAGS, CXXFLAGS etc.). The process of building the kernel also requires building a number of smaller host programs. The C flags for those programs are controlled by a different set of variables (HOSTCFLAGS). It sounds incredibly obvious, but I've lost time from my day trying to figure out why options set in CFLAGS weren't being picked up by the host compiler. For more fun, it's possible to use environment variables to set different flags for compiling built-in vs. module code. The moral of the story is: know what you're setting.

Debugging build infrastructure isn't always pleasant, but the kernel build system isn't too bad overall. I'm at least beginning to understand more parts of it as I find increasingly obscure things to modify.


Fun with gcc plugins

| categories: fedora

One piece of infrastructure that's come in as part of the Kernel Self Protection Project (KSPP) is support for gcc plugins. I touched on this briefly in my DevConf talk, but I wanted to discuss a few more of the 'practicalities' of dealing with compiler plugins.

At an incredibly abstract level, a compiler transforms a program from some form A to another form A'. Your A might be C or C++, and you expect A' to be a binary you can run. Modern compilers like gcc produce the final result by transforming your program over several passes, so you end up with A to A' to A'' to A''' and so on. The gcc plugin architecture allows you to hook in at various points to make changes to the intermediate state of the program. gcc has a number of internal representations, so depending on where you hook in you may need to work with a different representation.
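The basic shape of a plugin is reasonably small. The sketch below is illustrative rather than a tested plugin (exact headers and available hooks vary between gcc versions): declare GPL compatibility, check the version, and register callbacks for the events you care about.

    /* Minimal plugin skeleton -- a sketch, not tested against any
     * particular gcc release. */
    #include "gcc-plugin.h"
    #include "plugin-version.h"

    int plugin_is_GPL_compatible;   /* gcc refuses to load plugins without this */

    /* Called at the end of each translation unit. */
    static void on_finish_unit(void *gcc_data, void *user_data)
    {
            /* inspect or transform the intermediate representation here */
    }

    int plugin_init(struct plugin_name_args *info,
                    struct plugin_gcc_version *version)
    {
            /* Bail out if we were built against a different gcc. */
            if (!plugin_default_version_check(version, &gcc_version))
                    return 1;

            register_callback(info->base_name, PLUGIN_FINISH_UNIT,
                              on_finish_unit, NULL);
            return 0;
    }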

Kernel development gets a (not undeserved) reputation for being poorly documented and difficult to get into. Writing even a self-contained kernel module requires some knowledge of the rest of the code base, and some familiarity makes things much easier. I've found compiler plugins to be similarly difficult. I'm not working with the gcc code base on a regular basis, so figuring out how to do something practical with the internal structures feels like an uphill battle. I played around with writing a toy plugin to look at the representation, and it took me forever to figure out how to get the root of the tree so I could do something as simple as call walk_tree. Once I figured that out, I spent more time figuring out how to do a switch on a node to see what type it was. Basically, I'm a beginner in an unfamiliar code base, so it takes me a while to do anything.
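For the curious, the kind of thing I was fumbling toward looks roughly like the sketch below. It's illustrative only: how you obtain the root node to walk, and which headers you need, depends on the gcc version and on which hook you registered.

    /* walk_tree() callback: gcc invokes this for every node under the
     * root you hand to walk_tree().  Returning NULL_TREE keeps walking. */
    static tree inspect_node(tree *node_ptr, int *walk_subtrees, void *data)
    {
            tree node = *node_ptr;

            switch (TREE_CODE(node)) {
            case VAR_DECL:
                    /* a variable declaration */
                    break;
            case FUNCTION_DECL:
                    /* a function declaration */
                    break;
            default:
                    break;
            }

            return NULL_TREE;
    }

    /* Typical use, given some root node 'body' dug out of a hook:
     *     walk_tree(&body, inspect_node, NULL, NULL);
     */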

Continuing the parallels between kernels and compilers, the internal ABI of gcc may change between versions, similar to how the kernel provides no stable internal ABI. If you want to support multiple compiler versions in your plugin, this results in an explosion of #ifdef VERSION >= BLAH all throughout the code. External kernel modules have the same problem, but I'd argue it's slightly worse for compiler plugins: kernel modules can be built and shipped for particular kernel versions, whereas it's harder to require specific compiler versions.
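The resulting compatibility code tends to look something like the snippet below. The function names and the version cutoff are made-up stand-ins; only the pattern is the point (BUILDING_GCC_VERSION comes from gcc's plugin headers, and the kernel collects a lot of this glue in gcc-common.h):

    /* The version-check pattern; names and cutoff are hypothetical. */
    #if BUILDING_GCC_VERSION >= 6000
    #define plugin_stmt_location(stmt)   new_spelling_of_the_api(stmt)
    #else
    #define plugin_stmt_location(stmt)   old_spelling_of_the_api(stmt)
    #endif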

With all this talk about how hard it is to use compiler plugins, there might be some question about whether it's really worth supporting them at all. My useless answer is "it depends" and "isn't that the ultimate question of any feature?". If a plugin can eliminate entire bug classes, is it worth the maintenance burden? I say yes. One long term option is to get features merged into the main gcc trunk so they don't have to be carried as plugins. Some of the tweaks are kernel specific though, so we're probably stuck carrying the more useful plugins. There is interest in new compiler flags and features, so we'll have to see what happens in the future.


DevConf 2018

| categories: fedora

DevConf 2018 happened. A grand time was had by me (and hopefully others).

Robbie Harwood gave an overview of Kerberos for Developers. Kerberos has a reputation for being difficult to use and manage. As far as I can tell, maintaining a server can still be tricky, but using it as a developer has improved significantly. There are several libraries available, including Python bindings, which were demoed. Although I don't usually do much with Kerberos applications, it's good to know there are easy to use APIs available.

There was a joint presentation on Hardware Root of Trust. This was an overview of current TPM support. TPMs have historically been somewhat controversial as they have been associated with reducing user freedom. TPMs are also very good at providing a secure way to store keys for protecting data, which was much of the focus of the talk. There's been ongoing work to make TPMs do useful things such as disk encryption. The TPM software support has come a long way and I look forward to seeing new uses.

Ulrich Drepper gave a talk on processor architectures. This seemed very timely given the recent speculative execution shenanigans. There was a lot of focus on the existing Intel architecture and its limitations. We're beginning to hit physical limits on increasing speed (see the slides about memory power use). As processor architectures get more complex, compilers and programmers have to improve as well. Sometimes I do miss working with hardware (until it breaks, of course).

Don Zickus talked about some ongoing Kernel CI work. The upstream kernel project has had some level of continuous integration (CI) for a while now; one of the best known efforts is Intel's 0-day testing. Don talked about why Red Hat is interested in supporting something similar for upstream patches (it's easier to prevent buggy patches from being merged than to fix them later!). I've been following this project for a while and look forward to seeing it come to fruition in the near future.

Randy Barlow and Patrick Uiterwijk talked about rebuilding containers. This seems like a very easy task (you just rebuild them, right?) but it turns out to be difficult to coordinate across the entire project. They talked about an abandoned approach and the current method using buildroot overrides.

Several members of the Fedora Council ran a Fedora panel. This was an open Q&A session and all the panelists gave thoughtful answers to the questions (as you'd expect). The video is worth watching to see the topics covered.

Thorsten Leemhuis talked about regressions in the Linux kernel. This is a task he's picked up somewhat recently and is important to me as both a kernel developer and a distro maintainer. His talk emphasized why users are so important to regression tracking and the basics of such work. This was a very good reference and I hope to link to it in the future.

There was a talk about out-of-tree modules. Fedora has a policy of not shipping out-of-tree modules, mostly for practical reasons. Sometimes users have reasons for wanting out-of-tree modules though, and they are free to use them. The biggest issue tends to be keeping the external module in sync with the kernel tree. The talk covered ways maintainers can keep modules in sync as well as methods for users to rebuild them (akmods etc.). Having good information on out-of-tree modules is important for those users who want/need them.

Transitioning packages from python2 to python3 is ongoing, and there was a talk about some of this work. It's easy to get supposedly simple changes, like a name change, wrong. I don't have nearly as much experience with packaging as some people, so this was a nice review of packaging in addition to a good set of lessons learned.

Patrick Uiterwijk talked about autosigning. Users rely on digital signatures to provide some level of trust in the packages they get from Fedora. Much of the signing work used to be done manually by humans, who are prone to human error. Patrick and others have worked hard over the last year to have more signing happen automatically. This talk was a nice overview of the Fedora root of trust and a discussion of what exactly it takes to keep that trust.

I had a good time meeting everyone and look forward to another DevConf.


When the canary breaks the coal mine

| categories: fedora

Nobody likes it when kernels don't work, even less so when they break on a Friday afternoon. Yet that's what happened last Friday. This was particularly unsettling because at -rc8 the kernel is expected to be rock solid, and a reboot very early in boot is even more so. Fortunately, the issue was at least bisected to a commit in the x86 tree. The bad commit changed code for an AMD-specific feature, but oddly the reboot was seen on non-AMD processors too.

It's easy to take debug logs for granted when you can get them. The kernel nominally has an 'early' printk, but even that requires some setup. If your kernel crashes before that point, you need to start looking at other debug options (and your life choices). This was unfortunately one of those crashes. Standard x86 laptops don't have a nice JTAG interface for hardware-assisted debugging, so debugging this particular crash wasn't feasible beyond changing code and seeing if it booted.

I ended up submitting the bisection results to the upstream developers. Nobody could immediately see anything wrong with the commit, and few people could reproduce the problem. Ingo Molnar suggested a bunch of reasons why very early boot code tends to break. One of his suggestions was to diff the good and bad object files to check the relocations. Interestingly, this showed a new call to __stack_chk_fail. When the kernel (or any code) is compiled with -fstack-protector, the compiler adds code to verify the stack canary; if the canary is overwritten, the code branches to __stack_chk_fail. On x86, the kernel's stack canary lives in per-cpu data, which also needs to be set up, so using it in very early code is not going to work. This explained why the crash happened on a seemingly innocuous commit: the refactoring put a structure on the stack which was big enough to trigger the stack protector checking. The developers who submitted the code probably weren't testing with the strong stack protector, so they would not have caught this. The fix ended up being simple: put __nostackprotector on the refactored function.
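Conceptually, the instrumentation looks something like the hand-written sketch below (the real code is emitted by the compiler, and on x86 the kernel's canary is read out of per-cpu data rather than a plain global), which is why it falls over if that storage hasn't been set up yet:

    /* Hand-written sketch of what -fstack-protector adds; not actual
     * compiler output. */
    extern unsigned long __stack_chk_guard;     /* canary value */
    extern void __stack_chk_fail(void);         /* does not return */

    void function_with_a_big_stack_frame(void)
    {
            unsigned long canary = __stack_chk_guard;  /* stash canary on the stack */
            char big_buffer[128];

            /* ... normal function body using big_buffer ... */

            if (canary != __stack_chk_guard)           /* canary smashed? */
                    __stack_chk_fail();
    }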

This code came in as part of the ongoing work for Spectre/Meltdown. All of this work has been an important reminder of why the kernel (usually) follows a set schedule for when new features are accepted. Bugs are always going to happen, but the goal is to find them during the merge window or at -rc1, not at -rc8. This kernel got a -rc9 release in part due to this bug; hopefully the final release still comes out on schedule this Sunday.


There will always be hardware bugs

| categories: hottakes, fedora

By now everyone has seen the latest exploits, Meltdown and Spectre, complete with logos and full academic papers. The gist of this is that side-channel attacks on CPUs are now actually practical instead of mostly theoretical. LWN (subscribe!) has a good collection of posts about the actual technical details and mitigations. Because this involves hardware and not just software, the fixes get more complicated.

In my previous job, I worked on kernels for mobile phones, which meant working with new hardware. I love working with hardware, but one thing you learn pretty quickly is that hardware will have bugs. Sometimes the hardware team has already found them and will give you a workaround. Other times you spend weeks chasing weird crashes and going back and forth with the hardware team. One of the challenging parts of working across teams in any area is communicating your own domain expertise while listening to others' expertise. There can be a lot of "well, how about we just..." and talking past each other. Once upon a time, some hardware was not working the way we expected and we were talking to the hardware team. They were having trouble reproducing the behavior seen on our complex Android stack, so we ran a series of experiments on our setups. Much of the actual work was figuring out how to take the requests from the hardware team and translate them into something reasonable for the kernel (e.g. where does "after each TLB flush" apply?). Sometimes the experiments weren't actually feasible because of how the kernel was written.

If you are lucky (or unlucky, depending on your view), you may find a hardware bug. The question then becomes what to do. There may, again, be back and forth about what's actually an acceptable workaround. "Just run this sequence of code sometimes" may sound simple to the hardware team but might be impractical to actually implement in the kernel. The performance penalties can be high if part of the microarchitecture needs to be turned off. Sometimes the answer turns out to be "pretty please don't run this sequence of code, which should never be generated by a reasonable compiler anyway". Obviously, if an issue has security implications you may need to just take the performance hit, but not implementing a workaround can be a valid decision.

Part of the discussion around all this has been a call for more open source hardware. This is absolutely a worthwhile goal. Most processors support adjusting various microarchitecture features. This is mostly for verification purposes but it's also useful if there's a need to disable a feature such as a prefetcher or branch predictor. The microarchitecture is usually considered proprietary and as such it's next to impossible to figure out how to make changes without consultation from the hardware team. So an open source hardware design would allow for better insight into the microarchitecture. What most people miss about open hardware is that you still have all the problems of hardware. Unless you're running on an FPGA, you can't just drop in a new hardware revision immediately. You're still going to have to implement software workarounds. The value of open hardware comes from freedom of licensing but not freedom from bugs.

Calling all this an "Intelocolypse" is deeply unfair as basically all modern processors from multiple vendors were affected here. It's a fundamental flaw in most implementations. It's certainly possible for each vendor/architecture to give a workaround but because of the severity here, there are proposals to fix this in generic kernel code. As has been mentioned though, many of the fixes are still under review so we'll have to see what happens. A big shout out to all the hardware and software developers who spent time coming up with proposals.

