Complaining about the kingdom of kernel

| categories: complaining, fedora

Jonathan Corbet of LWN gave a keynote at Linaro Connect about "The kernel's limits to growth". The general summary was that the kernel had scaling problems in the late 90's (a single "B"DFL does not scale) but the developers figured out a method that was more sustainable. There's a growing concern that we're about to hit another scaling problem with insufficient maintainers. Solving this has gotten some attention of late. I have a lot of thoughts about maintainership and growing in the kernel (many of which can be summarized as "well nobody has told me to stop yet") but this is not that blog post. The talk mentioned that kernel development can be described as "a bunch of little fiefdoms". This is a superb metaphor for so many things in Linux kernel land.

The terrible secret of Linux kernel development is that there really isn't a single kernel project. Sending stuff to just LKML is unlikely to go anywhere. Using get_maintainer.pl will tell you who the maintainers are and what mailing lists to use[1], but it won't tell you how the maintainer actually maintains or what their preferences are. There are some common documented guidelines for getting things in, but there always seems to be an exception. The networking stack has a long list of the ways it is different. Some subsystems use patchwork as a method for tracking and acking patches. The ARM32 maintainer has his own separate system for tracking patches. DRM is embracing a group maintainer model.

The end result is that sending patches to different subsystems means figuring out a different set of procedures. This problem is certainly not unique to the kernel. The hardest part of open source is always going to be the social aspect and dealing with how others want to handle a project. No one tool is ever going to solve this problem. The kernel seems to be particularly in love with the idea of letting everyone do their own thing so long as it doesn't make anyone else too mad. I'm sure this worked great when all the kernel developers could fit in one room, but these days having one set of procedures for the entire kernel would make things run much more smoothly.

If the kernel community is made up of fiefdoms, then the kernel community itself is a strange archaic kingdom[2]. Many of Ye Olde kernel developers love to talk about why e-mail is the only acceptable method for kernel development. I'm going to pick on this talk for a bit. I can't deny that many of the other options aren't great. I refuse to believe that GitHub having pull requests separate from the mailing list is actually worse than each subsystem having a completely separate mailing list, though. Good luck if someone forgets to Cc LKML or if your mailing list[3] doesn't have patchwork. Having everything go to a mailing list also doesn't guarantee anyone will actually review it or learn from it. The way to learn from an open source community is to make deliberate time to read and review what's being submitted. People can learn whatever tool is available to make this happen if they want to be engaged with the community. Maybe this is e-mail, maybe this is GitHub. Whatever. The harder part is making sure people want to use the preferred communication method to review what's going on in the community.

Once again, I seem to have come around to the point of community building, something which the Linux kernel community still seems to struggle with. The kernel community's problems are well documented at this point and I don't feel like enumerating them again. The scaling problems of the kernel are only going to get worse if nobody actually wants to stick around long enough to become a maintainer.

  1. Among my list of petty grievances is that mailing lists can be hosted on a variety of servers so there isn't always a unified place to look at archives. RIP GMANE. 

  2. Insert Monty Python and the Holy Grail joke here 

  3. I love you linux-mm but either your patchwork is incredibly well hidden from me or it doesn't exist, both of which make me sad. 

That's not what I wanted Linux

| categories: complaining, fedora

Once upon a time, I was an intern at a company doing embedded Linux. This was a pretty good internship for a student. A lot of my work involved making builds of open source packages and fixing them when they failed in unusual embedded environments. One time, I was working in a new environment and halfway through a build of some package I got a message that was cryptic to me:

no: command not found

As a beginning developer, I was really confused by this message. It's saying "no the command isn't found". But what command? I don't remember much of how I debugged this but I eventually went through the build logs and came across

checking for perl

The autoconf script was set up incorrectly and set PERL=no instead of turning off perl or erroring out in the config stage. This was fixable by adding perl to my build environment. Alas, I don't think I ever fixed the autoconf script itself.
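The failure mode is easy to reproduce with a few lines of shell. This is a hypothetical reconstruction of the broken configure logic, not the actual script I was debugging:

```shell
#!/bin/sh
# A configure check along the lines of AC_CHECK_PROG(PERL, perl, perl, no)
# stores the literal string "no" when perl is missing, instead of
# aborting the configure stage.
PERL=no   # pretend the check failed on this build host

# The build later invokes $PERL unconditionally, so the shell tries to
# execute a program literally named "no" and prints its usual
# "no: ... not found" error.
msg=$("$PERL" --version 2>&1 || true)
echo "$msg"
```

Read in isolation, that error looks like the shell refusing a request rather than reporting a missing program called `no`, which is exactly why it stumped me.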

Fast forward to the present day. Someone was reporting a build failure when rebuilding the rawhide kernel locally. I was seeing the same issue on my system:

install: cannot create directory
Not a directory

Checking the build tree, /usr/lib64/ was indeed not a directory. It was a binary file. Disassembling the binary file showed it was part of perf and seemed to be related to java. The build logs had this line:

install '/home/labbott/rpmbuild/BUILDROOT/kernel-4.10.0-0.rc2.git3.1.fc26.x86_64/usr/lib64'

install here behaves in a very *NIX manner. Without any other options, if lib64 exists as a directory, the .so gets copied into the directory. This is what we expect to happen. If lib64 does not exist, the .so gets copied as a file named lib64. This is what was happening here. The fix is simple: create the directory before running the command. You could even add a trailing slash to the destination to ensure it's actually treated as a directory.
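Both behaviors, and the trailing-slash guard, can be demonstrated in a throwaway directory (the file names here are made up for the demo):

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d)
echo 'pretend this is a shared object' > "$tmp/libdemo.so"

# Case 1: the destination directory exists, so the file is copied
# into it, as expected.
mkdir -p "$tmp/good/usr/lib64"
install "$tmp/libdemo.so" "$tmp/good/usr/lib64"

# Case 2: usr exists but usr/lib64 does not, so install creates a
# regular FILE named lib64 -- the failure mode from the build above.
mkdir -p "$tmp/bad/usr"
install "$tmp/libdemo.so" "$tmp/bad/usr/lib64"

# A trailing slash forces install to treat the target as a directory,
# so it errors out instead of silently creating a file.
install "$tmp/libdemo.so" "$tmp/bad/usr/lib64/" 2>/dev/null || echo "refused"
```

The scary part of case 2 is that it succeeds: the build keeps going and the breakage only surfaces later, when something else expects lib64 to be a directory.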

So what is the moral of these stories? Laura enjoys complaining about Linux, obviously. More seriously: your failure modes can produce really non-obvious behaviors if they don't actually fail. Error checking can be hard, and Linux is cold and unfeeling when you screw up. Bugs will always happen, so review your code carefully.

Caching makes me cranky

| categories: complaining, fedora

Among the issues with Ion is its incorrect use of the DMA APIs. I've briefly mentioned this before. My educated opinion is that it's a complete mess and that time travel would be a great solution to fix this problem.

What the DMA APIs do underneath varies greatly depending on what the device is and what platform you are running on. DMA mapping can involve anything from setting up device page tables to just returning a physical address. With very high probability, your cell phone runs an ARM chip. With very high probability as well, your laptop is running some kind of x86 chip. The difference I'm going to highlight here is how the two architectures manage caches. Cache coherency is a topic worthy of PhD dissertations and many conference talks[1]. The key point for this post is that the ARM architecture does not have the same cache guarantees as x86, so it needs explicit cache operations when transferring buffers between devices. The DMA mapping code for arm and arm64 includes explicit cache operations as part of mapping and the sync APIs. Ideally, no driver should ever have to think about cache topology; if the DMA APIs are used properly, everything happens transparently. The DMA APIs work by establishing buffer ownership between the CPU and the device. When a driver calls dma_map_sg, the buffer now belongs to the device. The CPU may not touch the buffer again until dma_unmap_sg or dma_sync_sg_for_cpu is called, and must hand the buffer back with dma_sync_sg_for_device before the device uses it again. This ensures that the CPU and the device always see the appropriate data.
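The ownership hand-offs look roughly like this from a driver's point of view. This is a schematic sketch, not a complete driver: the function and the my_dev/sgl/nents names are hypothetical, and error handling is reduced to the bare minimum.

```c
#include <linux/dma-mapping.h>

/* Sketch of the DMA API ownership contract for a streaming mapping. */
static void dma_ownership_example(struct device *my_dev,
				  struct scatterlist *sgl, int nents)
{
	int mapped;

	/* CPU fills the buffer while it still owns it. */

	/* Hand the buffer to the device. On ARM this may clean or
	 * invalidate caches; on a coherent platform it may do nothing. */
	mapped = dma_map_sg(my_dev, sgl, nents, DMA_TO_DEVICE);
	if (!mapped)
		return;

	/* ... start the device; the CPU must not touch the pages ... */

	/* Take ownership back before the CPU reads or writes the buffer. */
	dma_sync_sg_for_cpu(my_dev, sgl, nents, DMA_TO_DEVICE);

	/* ... CPU may inspect or modify the buffer here ... */

	/* Return ownership to the device for another transfer. */
	dma_sync_sg_for_device(my_dev, sgl, nents, DMA_TO_DEVICE);

	/* Done with DMA entirely; CPU owns the buffer again. */
	dma_unmap_sg(my_dev, sgl, nents, DMA_TO_DEVICE);
}
```

The important part is the symmetry: every transfer of ownership in one direction has a matching call in the other, which is exactly the contract Ion sidesteps below.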

Enter Ion. Ion is not a driver for a particular hardware block. It is supposed to be an allocator for other drivers. Ion was written with Android and its stack in mind. Too many drivers written for Android do not conform to the traditional driver model and don't use the DMA APIs. This means those drivers have to manage their caches some other way. Cache operations are easy to get wrong and can be dangerous to the data of the system. Public APIs are carefully reviewed and curated. The near universal rule in the kernel is that drivers should rely on the DMA API (used correctly) to do their cache maintenance. The drivers that don't use the DMA APIs are typically written for one architecture and begrudgingly call the architecture's cache implementations directly.

Ion, being the central location for all hopes and dreams, became a pseudo-DMA layer. When a caller allocates memory from Ion, it is guaranteed to be clean in the cache, as if dma_map had been called. It does this by calling the dma_sync APIs without ever calling map. This is not allowed by the DMA APIs and just 'happens' to work for the devices Ion is used on (i.e. cellphones). Why not just call dma_map_sg and let that take care of the caches? Calling map would guarantee the memory would be synced appropriately with the cache. It would also kill performance. Buffers in the Android graphics framework are allocated and deallocated in response to almost any input. To save on the overhead of allocation, Ion keeps pages around in a pool that can be drained when under memory pressure. These pages are guaranteed to be clean in the cache, so calling map each time on every page would be unnecessary work. Even if performance weren't an issue, what device would be used for mapping? Ion has a device exported to userspace which could be used, but that ends up feeling forced, especially when all that's needed is the cache operations. Calling map starts the contract of ownership between device and CPU. The buffer isn't actually being passed off anywhere at allocation time, so the operations become meaningless. Ion is allocating the buffer for some other device to eventually map and pass off.

My attempt to pull caching directly into Ion was met with "No, don't do that. Do it properly." I've got a set of APIs that are worth reviewing, but I keep going back and forth on cleaning them up and sending them out. I'd still like to pull as much of the explicit caching out of Ion as possible and make those APIs unnecessary. Some of my uncertainty comes from not working on a whole framework. No vendor has a really complete Ion implementation easily available for me to hack on, so I'm making changes a bit blindly in hopes that others will pick them up. Focusing on one target would give me the direction of "let the framework support these needs". I could say "yes, we can stop with the explicit caching in most places and just require drivers to call begin_cpu_access or the equivalent userspace calls". Welcome to open source software, I guess. Anyone can fix anything with the bits and pieces available if they try hard enough. We'll see where this goes.

  1. A big thank you to everyone from ARM for continuing to give these types of presentations at conferences.