The cabbage patch for linker scripts

| categories: fedora

Quick quiz: what package provides ld? If you said binutils and not gcc, you are a winner! That's not actually the story, I just tend to forget which package to look at when digging into problems. This is actually a story about binutils, linker scripts, and toolchains.

Usually by -rc4, the kernel is fairly stable so I was a bit surprised when the kernel was failing on arm64:

ld: cannot open linker script file ldscripts/aarch64elf.xr: No such file or directory

There weren't many changes to arm64 so it was pretty easy to narrow down the problem to a seemingly harmless change. If you are running a toolchain on a standard system such as Fedora, you will probably expect it to "just work". And it should if everything goes to plan! binutils is a very powerful library though and can be configured to allow for emulating a bunch of less standard linkers, if you run ld -V you can see what's available:

$ ld -V
GNU ld version 2.29.1-23.fc28
  Supported emulations:

This is what's on my Fedora system. Depending on how your toolchain is compiled, the output may be different. A common variant toolchain setup is the 'bare metal' toolchain. This is (generally) a toolchain that's designed to compile binaries to run right on the hardware without an OS. The kernel technically meets this definition and provides all its own linker scripts so in theory you should be able to compile the kernel with a properly configured bare metal toolchain. What the harmless looking change did was switch the emulation mode from linux to one that works with bare metal toolchains.

So why wasn't it working? Looking across the system, I found no trace of the file aarch64elf.xr, yet clearly it was expecting it. Because this seemed to be something internal to the toolchain, I decided to try another one. Linaro helpfully provides toolchains for compiling arm targets. Turns out the Linaro toolchain worked. strace helpfully showed where it was picking up the file1:

lstat("/opt/gcc-linaro-7.1.1-2017.08-x86_64_aarch64-linux-gnu/aarch64-linux-gnu/lib/ldscripts/aarch64elf.xr", {st_mode=S_IFREG|0644, st_size=5299, ...}) = 0

So clearly the file was supposed to be included. Looking at the build log for Fedora's binutils, I could definitely see the scripts being installed. Further down the build log, there was also a nice rm -rf removing the directory where these scripts were installed to. This very deliberately exists in the spec file for building binutils with a comment about gcc. The history doesn't make it completely clear, but I suspect this was either intended to avoid conflicts with something gcc generated or it was 'borrowed' from gcc to remove files Fedora didn't care about. Linaro, on the other hand, chose to package the files with their toolchain. Given Linaro has a strong embedded background, it would make sense for them to care about emulation modes that might be used on more traditional embedded hardware.

For one last piece of the puzzle, if all the linker scripts are rm -rf'd why does the linker work at all, shouldn't it complain? The binutils source has the answer. If you trace through the source tree, you can find a folder with all the emulation options, along with the template they use for generating the structure representation. There's a nice check for $COMPILE_IN to actually build a linker script into the binary. The file is actually responsible for generating all the linker scripts and will compile in the default script. This makes sense, since you want the default case to be fast and not hit the file system.

I ended up submitting a revert of the patch since this was a regression, but it turns out Debian suffers from a similar problem. The real take away here is toolchains are tricky. Choose yours carefully.

  1. You also know a file is a bit archaic when it has a comment about the Solaris linker 

What's a kernel devel package anyway

| categories: fedora

One of the first concepts you learn when building open source software is the existance of -devel packages. You have package foo to provide some functionality and foo-devel for building other programs with the foo functionality. The kernel follows this pattern in its own special kernel way for building external modules.

First a little bit about how a module is built. A module is really just a fancy ELF file compiled and linked with the right options. It has .text, .data, and other kernel specific sections. Some parts of the build environment also get embedded in modules. Modules are also just a socially acceptable way to run arbitrary code in kernel mode. Modules are loaded via a system call (either by fd or an mmaped address). The individual sections (.text., .data etc.) get placed based on the ELF header. The kernel does some basic checks on the ELF header to make sure it's not complete crap (loading for an incorrect arch etc.) but can also do some more complicated verification. Each module gets a version magic embedded in the ELF file. This needs to match the running kernel but can be overridden with a force option. There's also CONFIG_MODVERSIONS which will generate a crc over functions and exported symbols to make sure they match the kernel that was built. If the CRC in the module and kernel don't match, the module loading will fail.

Now consider an out of tree module. The upstream Linux kernel doesn't provide an ABI guarantee. In order to build an external module, you need to use the same tree that was used to build the kernel. You might be able to get away with using a different base but it's not guaranteed to work. These requirements are well documented. Actually packaging the entire build tree would be large and unecessary. Fedora ends up packaging a subset of the build tree:

  • Kconfigs and Makefiles
  • header files, both generic and architecture specific
  • Some userspace binaries built at make modules_prepare time
  • The kernel symbol map
  • Module.symvers
  • A few linker files for some arches

Annoyingly, because each distribution does something different, all of this has to be done manually. This also means we find bugs when there are new dependencies that need to be packaged. I really wish we could just get away with building the module dependencies at runtime but doesn't work with the requirements.

More kbuild for reproducible builds

| categories: fedora

I'm still working on patches to deal with build ids for the kernel. One issue I spent way too long figuring out was that if you just do a basic make for the kernel, some local environment information will be picked up on each build. This means that the build id will not be the same between builds of the same source tree because the sha1 sum is going to be different. This has the funny effect of meaning that the problem of unique build ids is actually solved for the vmlinux itself but still not modules or the vDSO.

Among the list of common commands you learn for Linux is uname. If you run uname -a you'll see something like

Linux localhost.localdomain 4.17.0-0.rc3.git4.1.fc29.x86_64 #1 SMP
Fri May 4 19:41:58 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

What's most interesting for this discussion is a subset with uname -v

#1 SMP Fri May 4 19:41:58 UTC 2018

This is some version information about when this kernel was built. All this can technically be namespaced but by default these values come from generated defines at compile time, specifically UTS_VERSION. You can see how this gets generated from scripts/mkcompile_h

The timestamp is fairly obvious and the Kbuild infrastructure provides an easy override to set it to a fixed value (KBUILD_BUILD_TIMESTAMP= some string that can be passed to date -d). A bit more obtuse (at least for me) was the #1. This is a value stored in a file called .version. This gets updated every time scripts/ is run. It is, in fact, designed to be a release number to differentiate between builds. After too many hours of debugging it also ends up feeling like some sort of achievement for a video game ("You have managed to compile the kernel .version times while working on this particular issue.") This can also be set with KBUILD_BUILD_VERSION.

The short and sweet summary is that if I actually want to verify things with build ids I can set KBUILD_BUILD_TIMESTAMP and KBUILD_BUILD_VERSION to fixed values to get a consistent build id across compiles. It's worth noting that modules can end up with a consistent build id without setting anything extra because they (typically) don't use UTS_VERSION anywhere. Now all I need to do is finish cleaning up some patches.

Fantastic kernel patches and where to find them

| categories: fedora

I've griped before about kernel development being scattered and spread about. A quick grep of MAINTAINERS shows over 200 git trees and even more mailing lists. Today's discussion is a partial enumeration of some common mailing lists, git trees and patchwork instances. You can certainly find some of this in the MAINTAINERS file.

  • LKML. The main mailing list. This is the one everyone thinks of when they think 'kernel'. Really though, it mostly serves as an archive of everything at this point. I do not recommend e-mailing just LKML with no other lists or people. Sometimes you'll get a response but think of it more as writing to your blog that has 10 followers you've never met, 7 of which are bots. Or your twitter. There is a patchwork instance and various mail archives out there. I haven't found one I actually like as much as GMANE unfortunately. The closest corresponding git tree is the master where all releases happen.

  • The stable mailing list. This is where patches go to be picked up for stable releases. The stable release have a set of rules for how patches are picked up. Most important is that the patch must be in Linus' tree before it will be applied to stable. Greg KH is the main stable maintainer. He does a fantastic job for taking care of the large number of patches that come in. In general, if a patch is properly tagged for stable yes it will show up eventually. There is a tree for his queue of patches to be applied along with stable git trees

  • Linux -next. This is the closest thing to an integration tree right now. The goal is to find merge conflicts and bugs before they hit Linus' tree. All the work of merging trees is handled manually. Typically subsystem maintainers have a branch that's designated for -next which gets pulled in on a daily basis. Running -next is not usually recommended for anything more than "does this fix your problem" unless you are willing to actively report bugs. Running -next and learning how to report bugs is a great way to get involved though. There's a tree with tags per day.

  • The -mm tree. This gets its name from memory management but really it's Andrew Morton's queue. Lots of odd fixes end up getting queued through here. Officially, this gets maintained with quilt. The tree for -next "mmotm" (mm of the moment) is available as a series. If you just want the memory management part of the tree, there's a tree available for that.

  • Networking. netdev is the primary mailing list which covers everything from core networking infrastructure to drivers. And there's even a patchwork instance too! David Miller is the top level networking maintainer and has a tree for all your networking needs. He has a separate -next tree. One thing to keep in mind is that networking patches are sent to stable in batches and not just tagged and picked up by Greg KH. This sometimes means a larger gap between when a patch lands in Linus' branch and when it gets into a stable release.

  • Fedora tree. Most of the git trees listed above are "source git/src-git" trees, meaning it's the actual source code. Fedora officially distributes everything in "pkg-git" form. If you look at the official Fedora kernel repository, you'll see it contains a bunch of patches and support files. This is similar to the -mm and -stable-queue. Josh Boyer (Fedora kernel maintainer emeritus) has some scripts to take the Fedora pkg-git and put it on This gets updated automatically with each build.

  • DRM. This is for anything and everything related to graphics. Most everything is hosted a, including the mailing list. Recently, DRM has switched to a group maintainer model (Daniel Vetter has written about some of this philosophy before). Ultimately though, all the patches will come through the main DRM git repo. There's a DRM -tip for -next like testing of all the latest graphics work. Graphics maintainers may occasionally request you test that tree if you have graphics problems. There's also a patchwork instance.

Some notes on recent random numbers

| categories: fedora

By now people may have seen complaints of boot slowdown on newer kernels. I want to explain a little more about what's going on and why Fedora seems to be particularly hard hit.

The Linux kernel has a random number generator in drivers/char/random.c. This provides several interfaces for random numbers to the system. There are two main interfaces for random numbers: /dev/random and /dev/urandom. /dev/random is designed to be "secure", meaning it is sufficiently random that it can be used for things like cryptography keys. /dev/urandom is "random" in the sense that most humans won't detect a pattern but sufficient mathematical analysis might find a weakness.

Random number generators rely on entropy to work properly. You can't just make up entropy, the system has to get it from somewhere. At boot the kernel assumes it has no entropy and relies on various parts of the system (interrupts, timer ticks etc.) to give it entropy. /dev/random is supposed to block if there is not sufficient entropy in the system (entropy is a finite resource and it can be drained). Google Project 0 recently discovered several flaws in the Linux RNG, among them that the RNG was marked as being available for cyptographically secure generation earlier than it should have. They provided patches to fix this which were applied by the RNG maintainer.

And then people started seeing issues, mostly a lot of messages about crng_init. It turns out, there were a lot of places in the kernel that were trying to get random numbers early in the kernel boot process that weren't as random as they might expect. Fedora had a particularly nasty problem where the compose machines were getting stuck. Trying to get more logs from the systemd journal didn't help. Eventually after some debugging with the infrasturcture team (and the help of sendkey alt-sysrq- t in the qemu monitor window), we were able to see that init was blocked on the getrandom systemcall for secure entropy. Interestingly enough, systemd only made non-blocking (insecure) random calls in its code.

I was lucky I could re-build kernels to reproduce the issue, so I decided to experiment a bit and return something unexpected from getrandom (-ENOMEM). This gave me an error message that (luckily) uniquely mapped to gcrypt. systemd links against gcrypt for some features, such as calculating an HMAC for the journal entries. None of that involved random numbers at bootup though so it didn't explain why things were getting stuck. After some more back and forth, Patrick Uiterwijk found a patch that gcrypt was carrying. If FIPS mode is enabled, the cryptographic system is initalied at constructor time (i.e. when it gets loaded by systemd). It turns out, the default images ship with dracut-fips which will put gcrypt into FIPS mode. So the very first time systemd went to open the journal to write something, it would load gcrypt which would attempt to initialize the random number system. (Fun fact, it also looks like the default in systemd is to do a write to a journal before the commandline options are parsed. So even adding an option to not write to the journal didn't help this case. I might be wrong here though?)

Despite the fact that these patches have some side-effects, they do fix a real issue and can't exactly just permanently reverted. So what to do? One easy answer is to give the system more entropy. On virtualized systems, this can be provided by the CONFIG_HW_RANDOM_VIRTIO option. Part of the fix also involves making sure userspace isn't actually trying to rely on secure random number generation too early since randomness is hard to come by early in boot. At least for now in Fedora, we've temporarily reverted the random series on stable (F27/F28) releases. The plan is to continue working with upstream and userspace developers to find a workable solution and bring back the patches when things are fixed.

Next Page ยป