Kernels need updates, no really

| categories: fedora

Google has been announcing new details about its next Android release, Oreo. One of the items that came out is a new requirement for a minimum kernel version. SoC manufacturers must now use kernel 4.4 or later, one of the long term stable (LTS) kernels maintained by Greg Kroah-Hartman. Android has long prided itself on differentiation and given device makers a lot of latitude. This has often led to fragmentation and difficulties with device upgrades. Google has started to work towards fixing this with efforts like Project Treble.

One aspect that many people like about mandatory kernel versions is increased security. The argument is that newer kernel versions already contain all the security fixes and features, so they are going to be more secure. This is true, to a degree: kernel 4.4.x should cover everything 3.18.x did, plus more. The problem with this argument is that the kernel does not stop updating. A mandatory kernel version ensures a base layer of protection but will not protect against new threats. A newer kernel makes it easier to apply fixes, but that still requires the device maker to actually push out updates. Requiring a 4.4.x kernel isn't going to help against StackCowBleed if your device never gets the update. Nor is mandating a newer kernel going to make device updates easier if you have to deal with a million lines of out-of-tree code.

A move towards a standard for kernels is a step in the right direction for the Android ecosystem. This needs to be coupled with a continual effort to get code upstream and deliver regular updates though. Hats off to the Android team and device makers who continually work to make this better.

Flock 2017

| categories: fedora

Last week was Flock 2017 in Hyannis, MA. I was there!

I ran a session on kernel process for Fedora. This was designed to be an open session for discussion of whatever topics people wanted. We spent quite a lot of time on the future of Fedora kernel testing. Fedora has been discussing continuous integration across the project as a way to improve overall quality. The kernel has a set of tests that get run on every kernel build. There's interest from within Red Hat (my employer) to expand on this further. Red Hat recently publicly released one of its basic test suites for kernel testing. The ultimate goal is to use this plus other test cases to run a service similar to Intel's 0-day testing for upstream kernels. This way, bugs can be found and hopefully fixed sooner.

There was some discussion about a potential increase in bugs with a move to CI. There are only two people working full time on the Fedora kernel versus a much larger pool of bugs and reporters. How do we make additional reports scale? This has been a problem for Fedora for a long time and there still isn't a good answer. Trying to turn all contributors into kernel developers isn't very practical. What is practical is supporting contributors who do have the time and skills to bisect and report bugs to the upstream community. The hope is also that bugs found by the CI effort will come with enough detail to reliably fix, or at least report upstream.

The kernel session was very productive. One of the items that came out of it was the idea for a kernel test day. Details about this will be coming as soon as they are arranged.

Apart from my own session, I went to a couple of talks about ARM given by Peter Robinson and Robert Wolff. Peter gave his usual "State of Fedora on ARM" talk. The state is pretty great, thanks to his hard work. More boards are enabled with each release and hardware features continue to be added. There's an ongoing project to make installation more 'boring' by adding support to U-Boot. Robert Wolff talked about supporting Fedora on 96Boards-based devices. As more devices get hardware support upstream, it becomes more plausible to support them in Fedora. I expect support will only continue to improve as newer versions of the hardware specification come out.

I spent most of the rest of my time in the hallway track. Highlights there:

  • Chatting about Outreachy. The next round is coming up shortly so look for the CFP soon.
  • i686 kernels. The i686 SIG is slowly getting started. Justin and I gave some suggestions on what it might take for them to be successful.
  • Syncing up on a couple of ongoing bug reports.
  • Stories about hardware older than me.

Thanks to all the organizers for putting together a great conference and giving me an excuse to eat delicious Cape Cod food.

Fun with stacks

| categories: fedora

Like much of the kernel, the stack is something most people don't think about until something goes wrong. Several topics have come up recently related to kernel stacks.

Back in June, a critical bug called stack clash was publicly disclosed by a security research firm. For purposes of this discussion, the runtime heap typically starts at low addresses and grows up. The stack starts at high addresses and grows down (shouting "the stack grows down" is a time honored tradition when working with the stack). The heap is typically managed by some memory manager (usually your libc malloc) with explicit calls to brk to increase the heap size. The program stack grows automatically as it is used. The logic for determining if an access is part of the automatic stack or a bogus access is approximately "if it's close enough to the bottom of the existing stack, it's probably fine. Trust me." As you might expect, things go poorly if the stack grows next to the heap memory and starts using that as a stack. Several years ago, the kernel added a guard page to help mitigate this problem. Instead of immediately growing into the heap right below the stack, the program would access an unmapped page and then fault. A "page" here is a region of memory of PAGE_SIZE bytes, typically 4K. The stack clash researchers discovered several vulnerabilities in userspace programs that allowed a stack jump larger than PAGE_SIZE, thus defeating the guard page.

The biggest issue with this vulnerability is that it's essentially a design limitation. There's no guarantee of any behavior that completely mitigates the problem. Userspace can allocate as much junk on the stack as it wants and call alloca to its heart's content until it runs out of space. The kernel added a workaround to increase the gap between the stack and other VMAs. The commit text freely admits this isn't a full fix since it only decreases the chance of some userspace program managing to grow the stack pointer into another region. The gcc developers have a proposal for an actual mitigation: probing the stack at regular intervals to make sure the guard page gets hit. This requires recompiling programs with the appropriate flag, so the kernel workaround is still important to have.
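The mechanics are easier to see in a small sketch. This is an illustrative C fragment, not code from the advisory: the frame size and guard size are assumptions chosen to show why one large stack jump can skip a single 4K guard page, and the flag named in the comment is the one gcc eventually adopted for the probing mitigation.

```c
#include <string.h>

/* Illustrative only: a frame much larger than a single 4K guard
 * page.  Without probing, the stack pointer moves 16 pages in one
 * jump, so the first access can land past the guard page entirely.
 * With gcc's probing mitigation (-fstack-clash-protection), the
 * compiler touches the frame at least once per page, so the guard
 * page is always hit first and the fault happens where it should. */
int big_frame(void)
{
    char buf[64 * 1024];            /* 16 pages in a single step */

    memset(buf, 0x5a, sizeof(buf)); /* touch the whole frame */
    return buf[0];                  /* 0x5a */
}
```

Under the default 8MB stack limit this is harmless to run; the point is where the first write lands relative to the guard page, not whether the allocation succeeds.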

In the self-protection area, Alexander Popov posted a port of the stackleak plugin from Grsecurity/PaX. Information leaks from the kernel to userspace can be combined with other bugs to build full kernel exploits. A common source of information leaks is copying uninitialized stack data to userspace. The stackleak plugin aims to mitigate this by clearing the stack after each system call, reducing the chance of kernel data getting leaked to userspace. The "plugin" part of stackleak is a gcc plugin that inserts calls to track_stack in functions with a stack frame over a certain size. track_stack updates the lowest value seen for the stack pointer. When a system call finishes, the area between the top of the stack and that lowest stack pointer is cleared. The Grsecurity/PaX version only included support for x86. I made a first-pass attempt at a version for arm64. Apart from being useful for full architecture support, this was a helpful exercise to figure out what assumptions the existing code was making. Hopefully feedback will continue to come in so the series can make progress towards merging.
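As a rough model of the idea, here is a userspace toy, not the kernel implementation: the array, sizes, and function bodies are invented, but the shape mirrors the plugin's track-then-erase scheme.

```c
#include <string.h>

#define STACK_SIZE 256

/* Simulated kernel stack; real stacks grow down, so "deeper" here
 * means a lower index. */
unsigned char sim_stack[STACK_SIZE];
int lowest_sp = STACK_SIZE;

/* Analogue of track_stack(): the plugin inserts a call like this
 * into functions with a large stack frame, recording the deepest
 * stack pointer value seen during the syscall. */
void track_stack(int sp)
{
    if (sp < lowest_sp)
        lowest_sp = sp;
}

/* A "syscall" that leaves stale secrets on the simulated stack. */
void syscall_body(void)
{
    track_stack(STACK_SIZE - 64);
    memset(sim_stack + STACK_SIZE - 64, 0xaa, 64);
}

/* Analogue of the syscall-exit pass: clear everything between the
 * lowest stack pointer reached and the top of the stack, so stale
 * data cannot leak into a later uninitialized-copy bug. */
void erase_stack(void)
{
    memset(sim_stack + lowest_sp, 0, STACK_SIZE - lowest_sp);
    lowest_sp = STACK_SIZE;
}
```

After syscall_body() the stale 0xaa bytes are still sitting in sim_stack; after erase_stack() the used region reads back as zero, which is the property the real plugin provides per system call.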

When tools break the kernel

| categories: fedora

The kernel is really self-contained. This makes it great for trying experiments and breaking things. It also means that most bugs are also going to be self-contained. I say most because the kernel still has dependencies on other core system packages and when those change, the kernel can break as well.

All the low level packages on your system are usually so well maintained you don't even realize they are present1. binutils provides tools for working with binary files. Its assembler gets updates for things such as new instruction set extensions, and changes like these can break the kernel unexpectedly. glibc is another package whose updates regularly break the kernel. The word 'break' here does not mean the changes from glibc/binutils were incorrect. The kernel makes a lot of assumptions about what's provided by external packages and things are bound to get out of sync occasionally. This is a big part of the purpose of rawhide: to find dependency problems and get them fixed as soon as possible.

Updates to the compiler can be more ambiguous about whether or not a change is a regression. Compiler optimizations are designed to improve code but may also change its behavior in unexpected ways. A good example is some recent optimizations related to constants. For those who haven't studied compilers, constant folding involves identifying expressions that can be evaluated to a constant at compile time. gcc provides a builtin function, __builtin_constant_p, to let code behave differently depending on whether an expression can be evaluated to a constant at compile time. This sounds fairly simple for cases such as __builtin_constant_p(0x1234), but it gets murkier for real code once deeper compiler analysis is involved. The end result is that a new compiler optimization broke some assumptions about how the kernel was using __builtin_constant_p. One of the risks of using compiler builtin functions is that the behavior is defined, but only to some degree. Developers may argue that the compiler is doing something incorrect, but it often turns out to be easier just to fix the kernel.
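A small example of the pattern in question. The names here are invented for illustration, but kernel helpers such as ilog2() use the same shape: one branch for when the compiler can prove the argument constant, one for when it cannot.

```c
/* Out-of-line fallback used when the argument is not a
 * compile-time constant. */
int popcount_runtime(unsigned int x)
{
    int n = 0;
    while (x) {
        n += x & 1;
        x >>= 1;
    }
    return n;
}

/* If gcc can prove x constant, use the builtin (which folds to a
 * literal at compile time); otherwise call the runtime helper. */
#define POPCOUNT(x) \
    (__builtin_constant_p(x) ? __builtin_popcount(x) : popcount_runtime(x))
```

The catch is that whether __builtin_constant_p returns true for a given expression depends on optimization level and surrounding analysis. Both branches must therefore compute the same answer, because which one is taken can change between compiler versions, and that is exactly the "defined, but only to some degree" behavior that bit the kernel.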

Sometimes the compiler is just plain wrong. New optimizations may eliminate critical portions of code. Identifying such bugs is a special level of debugging. Typically, you end up staring at the code wondering how it could end up in such a situation. Then you get an idea that staring at assembly will somehow be less painful at which point you notice that a critical code block is missing. This may be followed by yelling. For kernel builds, comparing what gets pulled into the buildroot of working and non-working builds can be a nice hint that something outside the kernel has gone awry.

As a kernel developer, I am appreciative to the fantastic maintainers of the packages the kernel depends on. All the times I've reported issues in Fedora the maintainers have been patient and helpful in helping me figure out how to get the right debugging information to determine whether an issue is in gcc/binutils/glibc or the kernel. The kernel may be self-contained but it still needs other packages to work.

  1. Until you remove them with the --force option, then you really miss them.

Boring rpm tricks

| categories: fedora

Several of my tasks over the past month or so have involved working with the monstrosity that is the kernel.spec file. The kernel.spec file is about 2000 lines of functions and macros to produce everything kernel related. There have been proposals to split the kernel.spec up into multiple spec files to make it easier to manage. This is difficult to accomplish since everything is generated from the same source packages, so for now we are stuck with the status quo, which is roughly macros all the way down. The wiki has a good overview of what all goes into the kernel.spec file. I'm still learning about how RPM and spec files work all the time but I've gotten better at figuring out how to debug problems. These are some miscellaneous tips that are not actually novel but were new to me.

Most .spec files override a set of default macros. The default macros are defined at @RPMCONFIGDIR@/macros, which typically expands to /usr/lib/rpm/macros. More usefully, you can put %dump anywhere in your spec file and it will dump out the current set of macros that are defined. While we're talking about macros, be very careful about whether you are checking if a macro is undefined or set to 0 (the 0%{?foo} idiom exists precisely to paper over this difference). This is a common mistake in general but I seem to get bit by it more in spec files than anywhere else.
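The distinction is the same one the C preprocessor makes between whether a macro is defined and what its value is, which makes a handy analogue for the spec-file pitfall. FEATURE here is a made-up name:

```c
/* FEATURE is never defined in this file. */
#if defined(FEATURE)        /* asks: is the macro defined at all? */
#define HAVE_FEATURE 1
#else
#define HAVE_FEATURE 0
#endif

/* Note: "#if FEATURE" would also take the 0 branch here, but only
 * because the preprocessor silently treats an undefined identifier
 * as 0 -- the same quiet fallback that makes the undefined-vs-0
 * mistake in spec files so easy to miss. */
```

In both worlds the safe habit is to decide up front whether "unset" and "set to 0" mean the same thing for your conditional, and write the test accordingly.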

Sometimes you just want to see what the spec file looks like when it's expanded. rpmspec -P <spec file> is a fantastic way to do this. You can use the -D option to override various macros. This is a cheap way to see what a spec file might look like on other architectures (is it the best way to see what a spec file looks like for another arch? I'll update this with a note if someone tells me a better way).

One of my projects has been looking at debuginfo generation for the kernel. The kernel spec invokes many of the debuginfo scripts directly for historical reasons. Putting bash -x before a script invocation makes it print each command as it runs, which makes it much easier to see what's going on.

Like I said, none of these are particularly new to experienced packagers but my day gets better when I have some idea of how to debug a problem.
