A rawhide debugging story

| categories: fedora, rawhide

Usually by this time in the kernel cycle, most of the major kernel work is done and rawhide 'just works'. I was grumpy to see that today's rawhide build failed:

scripts/recordmcount.c: In function 'do_file':
scripts/recordmcount.c:466:28: error: 'R_METAG_ADDR32' undeclared (first use in this function)
  case EM_METAG:  reltype = R_METAG_ADDR32;
                            ^~~~~~~~~~~~~~
scripts/recordmcount.c:466:28: note: each undeclared identifier is reported only once for each function it appears in
scripts/recordmcount.c:468:20: error: 'R_METAG_NONE' undeclared (first use in this function)
     rel_type_nop = R_METAG_NONE;
                    ^~~~~~~~~~~~

I expected this to be some last-minute change that snuck in, but nothing that came in recently would affect this file. So what gives?

This is at the top of scripts/recordmcount.c:

#ifndef EM_METAG
/* Remove this when these make it to the standard system elf.h. */
#define EM_METAG      174
#define R_METAG_ADDR32                   2
#define R_METAG_NONE                     3
#endif

The way this is set up, if EM_METAG is already defined, the code assumes the relocation symbols are defined as well. If it is not, recordmcount.c supplies all three definitions itself. Looking at the #defines and preprocessed output would be really helpful here. Generally the kernel makes it easy to do this. You can do

$ make path/to/file/name.i
$ make mm/page_alloc.i

and path/to/file/name.i (mm/page_alloc.i in the second example) will contain the preprocessed output. This file was a little bit different, so that target wasn't being picked up as expected. It was easier to run a modified version of the compile command. Passing V=1 to make shows the commands that are being run, which gave the command

gcc -Wp,-MD,scripts/.recordmcount.d -Wall -Wmissing-prototypes \
    -Wstrict-prototypes -O2 -fomit-frame-pointer -std=gnu89 \
    -I./tools/include -o scripts/recordmcount scripts/recordmcount.c

This also gave me the hint that no other unusual include paths were being added (another cause of "why is this #defined"). Adding -E to that command makes gcc stop after preprocessing, and -dM dumps all the #defines instead of the preprocessed source. Run this, and yup:

#define EM_METAG 174

There it is, without any of the relocation symbols defined. So what's defining this? There aren't many header files included in recordmcount.c, but a good candidate is <elf.h>, which is a system header file. The expanded preprocessor output shows it as /usr/include/elf.h. Running dnf provides /usr/include/elf.h says that glibc-headers provides this file.

glibc did get an update recently, which included a new glibc snapshot. Looking at the commit log for glibc, yes, there was a commit which added the EM_METAG macro but did not add the #defines for the relocation symbols. The workaround/fix is pretty simple: give each relocation symbol its own #ifndef check until the rest of the relocation symbols actually get added to glibc.
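
As a rough sketch of that workaround (my reconstruction of the idea, not necessarily the exact patch that went in), the block at the top of recordmcount.c turns into three independent guards. The values are the ones from the original block; the only assumption is that <elf.h> may define any subset of these names.

```c
#include <elf.h> /* may define EM_METAG with or without the R_METAG_* relocations */

/* Remove these when they all make it to the standard system elf.h. */
#ifndef EM_METAG
#define EM_METAG        174
#endif

/* Guard each relocation on its own name rather than on EM_METAG. */
#ifndef R_METAG_ADDR32
#define R_METAG_ADDR32  2
#endif

#ifndef R_METAG_NONE
#define R_METAG_NONE    3
#endif
```

With this layout it no longer matters which of the three names the system header provides. Anything it leaves out gets filled in locally, and anything it does define is left alone.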

Once again, the kernel is not an island. It depends on other packages. This is exactly why rawhide exists. We run at the bleeding edge so we can find these bugs before anything ever goes stable. Hopefully this will be the last actual work for this rawhide release.


Module filtering and depmod

| categories: fedora, rawhide

Rawhide has been quiet since the first week of the merge window. The second week had a smattering of kernel options to be enabled but almost no conflicts. -rc1 and -rc2 have been fairly easy as well. The most significant work was getting the module filtering correct last week.

Sometime in 2014 the kernel package was split up into kernel-core and kernel-modules subpackages. The motivation was that systems that wanted a smaller footprint (e.g. cloud) could install only the kernel-core package and get a reasonably running system. Kernel modules are just chunks of kernel code that get loaded at runtime. Modules are not completely self-contained though. They have dependencies on the core kernel and possibly other modules [1].

The depmod tool is designed to find dependency problems (among other uses). The Fedora kernel flow goes roughly like this:

  • build modules

  • generate a list of modules using some shell scripts. That list is what will go in kernel-modules.

  • take that list of modules out of the tree. What's left will go in kernel-core.

  • run depmod to verify all modules still in kernel-core are still loadable.

Typically problems arise when new modules are added or modules are renamed. Case in point: cxgbit from 4.7.0-0.rc1.git1.1.fc25 (edited slightly for ease of reading):

depmod: WARNING: drivers/target/iscsi/cxgbit/cxgbit.ko needs unknown symbol cxgb4_clip_get
depmod: WARNING: drivers/target/iscsi/cxgbit/cxgbit.ko needs unknown symbol cxgb4_l2t_send
depmod: WARNING: drivers/target/iscsi/cxgbit/cxgbit.ko needs unknown symbol cxgb4_port_viid
depmod: WARNING: drivers/target/iscsi/cxgbit/cxgbit.ko needs unknown symbol cxgb4_alloc_stid
depmod: WARNING: drivers/target/iscsi/cxgbit/cxgbit.ko needs unknown symbol cxgbi_ppm_init
depmod: WARNING: drivers/target/iscsi/cxgbit/cxgbit.ko needs unknown symbol cxgb4_ofld_send
depmod: WARNING: drivers/target/iscsi/cxgbit/cxgbit.ko needs unknown symbol cxgb4_remove_tid
depmod: WARNING: drivers/target/iscsi/cxgbit/cxgbit.ko needs unknown symbol cxgb4_port_chan
depmod: WARNING: drivers/target/iscsi/cxgbit/cxgbit.ko needs unknown symbol cxgb4_unregister_uld
depmod: WARNING: drivers/target/iscsi/cxgbit/cxgbit.ko needs unknown symbol cxgb4_free_stid
depmod: WARNING: drivers/target/iscsi/cxgbit/cxgbit.ko needs unknown symbol cxgbi_ppm_ppod_release
depmod: WARNING: drivers/target/iscsi/cxgbit/cxgbit.ko needs unknown symbol cxgb4_create_server6
depmod: WARNING: drivers/target/iscsi/cxgbit/cxgbit.ko needs unknown symbol cxgb4_l2t_release
depmod: WARNING: drivers/target/iscsi/cxgbit/cxgbit.ko needs unknown symbol cxgb4_clip_release
depmod: WARNING: drivers/target/iscsi/cxgbit/cxgbit.ko needs unknown symbol cxgbi_ppm_ppods_reserve

The cxgbit module was enabled in the kernel config, but other modules it depends on were filtered into the kernel-modules package, so cxgbit was left behind in kernel-core without the symbols it needs. The fix is usually simple: filter the cxgbit module into the kernel-modules subpackage as well. Sometimes it takes multiple tries to actually get it right. As of this writing there was still a similar issue with powerpc as well.
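
To make the failure mode concrete, here is a toy model of the check, not depmod's actual code. The struct and function names are hypothetical and exist only for this illustration: each exported symbol records whether its provider stayed in kernel-core, and a module left in kernel-core is only loadable if every symbol it needs has a provider there.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record: one exported symbol and where its provider ended up. */
struct sym_export {
    const char *symbol;
    int in_core; /* 1 if the provider stays in kernel-core, 0 if filtered out */
};

/*
 * Count how many of the symbols a kernel-core module needs have no provider
 * left in kernel-core. A nonzero result corresponds to the "needs unknown
 * symbol" warnings; this only models the idea, not how depmod does it.
 */
static size_t count_unresolved(const char *const *needed, size_t n_needed,
                               const struct sym_export *exports, size_t n_exports)
{
    size_t missing = 0;

    for (size_t i = 0; i < n_needed; i++) {
        int found = 0;

        for (size_t j = 0; j < n_exports; j++) {
            if (exports[j].in_core &&
                strcmp(needed[i], exports[j].symbol) == 0) {
                found = 1;
                break;
            }
        }
        if (!found)
            missing++;
    }
    return missing;
}
```

Fed the cxgbit symbols from the warnings, with their providers marked as filtered out of kernel-core, every one of them counts as unresolved. Moving cxgbit itself into kernel-modules takes it out of the set being checked, which is why that is the usual fix.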

Testing the module filtering tends to be a slow process because it comes at the end of the build. It's not easy to restart partially because modules are removed from the tree. Longer term, I'd like to figure out a better way to aid in debugging filtering problems.


  1. Those dependencies are one reason why out-of-tree modules are a royal pain. In-tree modules will get API/rename/whatever updates automatically. Out-of-tree modules will not. Consider this your periodic PSA about out-of-tree modules and why supporting them is hard.


Rawhide blogging

| categories: fedora, rawhide

April/May started the cycle of planning for the next year here in Red Hat. This means it's time to write goals. Goals are supposed to be SMART. This is easier for some parts of my job than others. One of my primary responsibilities is making Fedora releases. While discussing my goals, it was pointed out that just saying "Did I make releases?" doesn't fully capture what I do. Scripts can make releases. I can't yet replace myself with a script so this job must involve not being useless.

I'm now on rawhide for the 4.7 release so as an experiment I'm going to try writing about what work goes into making some of the rawhide releases. I have no idea if this will be valuable. At a minimum more people will get a chance to see how the sausage gets made. Let's see what happens.


Rawhide Week 5/16-5/20

| categories: fedora, rawhide

This was the first week of the merge window for the 4.7 kernel. This included the merges of at least 43 trees (probably a few more that I didn't list as being relevant to Fedora). Highlights for this work:

  • The secure boot patches needed to be adjusted several times. Being a large out-of-tree patch set, this is bound to occur sometimes. It usually happens because someone tweaked a little bit of context or added a new #define. This merge window brought in some new work done by David Howells which had major conflicts with one of the secure boot patches. That series reworked most of the code paths the secure boot implementation was touching. Part of the series implemented a similar feature to what the secure boot patches were trying to do (see line about "could also be used to provide blacklisting"). For now, I left the secure boot patch out in favor of what's in tree, with a plan to follow up later.

  • The cpupower library got an soname bump. The only package I could find that actually uses this library was part of the MATE desktop. Here's hoping any other users are doing the right thing.

  • This merge has once again brought in the deletion of a binary file. I was curious how many other binary files are in the kernel. The answer is not that many. I also found out that the file command gets really confused on some kernel files:

    $ file arch/s390/boot/compressed/misc.c

    arch/s390/boot/compressed/misc.c: Minix filesystem, V3, 20302 zones

    $ file arch/alpha/include/asm/atomic.h

    arch/alpha/include/asm/atomic.h: Embedded OpenType (EOT)

    $ file drivers/gpu/drm/amd/amdgpu/amdgpu_powerplay.h

    drivers/gpu/drm/amd/amdgpu/amdgpu_powerplay.h: TI-XX Graphing Calculator (FLASH)

The merge window brought in the usual set of Kconfig changes. Highlights there:

  • LEDs can now be triggered to blink on MTD activity and on kernel panic.

  • Not actually part of the merge but the Intel power clamp driver was turned on per request on the mailing list.

  • Asus i2c keyboard support for EeeBook X205TA and VivoBook E200HA

  • ASoC support for Broxton platforms with RT298 audio codec driver

  • INT3406 display thermal driver

  • Support for the schedutil governor. This ties the CPU frequency to output from the scheduler.