String copying in the kernel

| categories: fedora

One of the many areas that the Kernel Self Protection Project looks at is making sure kernel developers are using APIs correctly and safely. The string APIs, in particular the string copying APIs, seem to be one area that gets developers confused. Strings in C aren't real [1] in that there isn't a proper string type. For the purposes of this discussion, a C string is an array of characters with a terminating NUL (\0) character.

One of the obvious operations a programmer would want to do is copy a string. There's an API strcpy to do so:

char *strcpy(char *dest, const char *src);

From the man page:

   The  strcpy()  function  copies the string pointed to by src, including
   the terminating null byte ('\0'), to the buffer  pointed  to  by  dest.
   The  strings  may  not overlap, and the destination string dest must be
   large enough to receive the copy.  Beware of buffer overruns!  (See BUGS.)

That last sentence is important and the source of numerous bugs. Because C strings don't have an inherent length associated with them, it's up to the programmer to know or check the length everywhere. Otherwise, you may end up copying bytes outside the dest buffer. This is pretty annoying and error prone, so there's another API, strncpy:
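Before moving on to strncpy, the length check that strcpy leaves to the caller can be sketched as a small wrapper (checked_strcpy is a made-up helper for illustration, not a libc function):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy src into dest only if it provably fits.
 * Returns 0 on success, -1 if the copy would overflow dest -- the
 * check the strcpy() man page leaves entirely to the caller. */
int checked_strcpy(char *dest, size_t dest_size, const char *src)
{
	if (strlen(src) + 1 > dest_size)	/* +1 for the terminating NUL */
		return -1;
	strcpy(dest, src);			/* now known to fit */
	return 0;
}
```

With an 8-byte buffer, "hello" (6 bytes including the NUL) is copied, while anything longer is refused instead of silently scribbling past the end of dest.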

char *strncpy(char *dest, const char *src, size_t n);

This one takes a length parameter so it's getting better. From the man page:

   The  strncpy()  function is similar, except that at most n bytes of src
   are copied.  Warning: If there is no null byte among the first n  bytes
   of src, the string placed in dest will not be null-terminated.

   If  the  length of src is less than n, strncpy() writes additional null
   bytes to dest to ensure that a total of n bytes are written.

That last sentence in the first paragraph is, again, important. If your src string is longer than n bytes, your buffer will not be NUL terminated. You may not have written beyond the buffer, but the next time you access the string at dest C will happily keep reading into the next memory area until it sees a NUL character. It's also pretty easy to run into anti-patterns with strncpy. If you don't specify the bound n correctly, it's still possible to overrun the buffer. If your bound for n is a function of your src string, you haven't solved anything. gcc has started to warn on some of these issues, which is helpful (if annoying to clean up).
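The usual defensive pattern is to bound the copy at one byte less than the buffer size and write the terminator by hand. A minimal sketch, using a made-up wrapper name:

```c
#include <assert.h>
#include <string.h>

/* Illustrative wrapper: strncpy() bounded at size - 1, with the NUL
 * terminator written explicitly, since strncpy() itself won't add one
 * when src is at least as long as the bound. Assumes size > 0. */
void strncpy_terminated(char *dest, const char *src, size_t size)
{
	strncpy(dest, src, size - 1);
	dest[size - 1] = '\0';
}
```

Copying "kernel" into a 5-byte buffer this way yields the truncated but valid string "kern", rather than five bytes with no terminator in sight.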

There's also strlcpy:

size_t strlcpy(char *dst, const char *src, size_t size);

This one originated in OpenBSD and spread from there to the other BSDs. From the kernel's lib/string.c:

    Compatible with ``*BSD``: the result is always a valid
    NUL-terminated string that fits in the buffer (unless,
    of course, the buffer size is zero). It does not pad
    out the result like strncpy() does.

So strlcpy will solve the termination issue but will not pad the buffer. The padding may or may not be behavior that's wanted. strlcpy in the kernel also has the implementation detail of calling strlen(src), which means the entire source string is always read even if you only ask for a subset of it to be copied. This shouldn't matter for most uses, but it could result in unexpectedly reading memory if src is not NUL terminated.
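The semantics are easy to model in userspace. This is a sketch of the BSD behavior, not the kernel's implementation, named sketch_strlcpy to avoid clashing with any system-provided strlcpy:

```c
#include <assert.h>
#include <string.h>

/* Userspace sketch of BSD-style strlcpy(): copy at most size - 1
 * bytes, always NUL-terminate (when size > 0), and return strlen(src)
 * so the caller can detect truncation. Note the strlen(src) call --
 * like the kernel version, it reads the whole source string no matter
 * how small size is. */
size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
	size_t src_len = strlen(src);

	if (size > 0) {
		size_t copy = src_len < size - 1 ? src_len : size - 1;

		memcpy(dst, src, copy);
		dst[copy] = '\0';
	}
	return src_len;
}
```

Copying "kernel" into a 5-byte buffer returns 6; because the return value is >= the buffer size, the caller knows truncation happened, and the buffer holds the terminated string "kern".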

There's also strscpy, which was introduced in 2015 and is designed to address the shortcomings of both strncpy and strlcpy. It was not without controversy, but today the API is frequently preferred over either strncpy or strlcpy.
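The kernel's implementation is optimized (it copies a word at a time), but the contract can be modeled in userspace roughly like this; sketch_strscpy and SKETCH_E2BIG are stand-ins for illustration, not kernel code:

```c
#include <assert.h>
#include <string.h>

#define SKETCH_E2BIG (-7L)	/* stand-in for the kernel's -E2BIG */

/* Simplified model of the kernel's strscpy() contract: copy at most
 * size - 1 bytes, always NUL-terminate, never read src beyond the
 * first size bytes, and return the number of characters copied -- or
 * a negative error when src had to be truncated. */
long sketch_strscpy(char *dst, const char *src, size_t size)
{
	size_t i;

	if (size == 0)
		return SKETCH_E2BIG;

	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';

	return src[i] == '\0' ? (long)i : SKETCH_E2BIG;
}
```

A single return value gives the caller either the copied length or a truncation error, and src is never read beyond the given bound, unlike the strlen(src) detail of the kernel's strlcpy.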

More important than a general rule of "You should always use strscpy" is to make sure you understand what all the APIs do. There may be cases where it is appropriate to just use strcpy or you want the behavior of strncpy or strlcpy. If you're doing something unusual, please document your code for the benefit of others.

  [1] C strings are about as real as Linux containers.


Open Source Summit and Linux Security Summit 2018

| categories: fedora

Last week was Open Source Summit and Linux Security Summit in beautiful Vancouver, BC. Highlights:

  • There was a talk on security in Zephyr and Fuchsia. While the focus of the conference is Linux, there's a growing interest in running Linux in conjunction with processors running other operating systems. Zephyr is an open source RTOS targeted at processors with a smaller footprint than Linux. Most of the security improvements have been adding features to take advantage of the MMU/MPU. One of those features was userspace support, which is always a bit of a surprise to hear as a new feature. Fuchsia is Google's new microkernel operating system. There's some argument that microkernels offer more security than Linux since more parts can run in userspace. Much of the talk was about the resource and namespace model. There's been a good deal of work put into this but it was noted much of this is still likely to be reworked.

  • Kees Cook talked about how to make C less dangerous. I've seen bits and pieces of this talk before and LWN did a great writeup so I won't rehash it all. This did spawn a thread about how exactly VLAs are or aren't security issues.

  • Someone from Microsoft talked about Azure Sphere. Azure Sphere is Microsoft's attempt at an IoT-focused microcontroller platform that runs Linux. The real challenge is that the device has only 4MB of RAM. The talk focused on what kinds of optimizations they had to do to get Linux to run in that space. There have been similar attempts before, but 4MB is still incredibly impressive. I'll be keeping an eye out when the patches go upstream (and maybe buy a device).

  • Two people from the Android security team at Google gave a talk about the state of things there. Much of the talk was numbers and statistics. Standard recommendations such as "reduce your attack surface" and "use SELinux" are very effective at reducing the severity of bugs. Bounds checks were a very common root cause. It turns out the copy_*_user APIs are easy to get wrong. Features such as CONFIG_HARDENED_USERCOPY are very effective here (there was an all too familiar story about "well, if I turn on hardened usercopy my tests don't pass"). The Android security team does great work and it's good to see the data.

  • Alexander Popov gave a talk on his experience in upstreaming the stackleak plugin. This is a gcc plugin that's designed to clear the stack after every system call to reduce the chance of information leak. The talk covered the history of development from separating the plugin from grsecurity to its current form. Like many stories of contributing, this one was not easy. It took many iterations and has been dismissed by Linus. As of this writing it still hasn't been pulled in and I hope it gets taken in soon.

  • Greg KH ~~generated headlines~~ talked about Spectre and Meltdown response in the kernel. The most interesting part of the talk was outlining the time frame of when various parts got fixed. Also important was the discussion of stable kernels and pointing out that backporting is a huge pain.

  • Sasha Levin and Julia Lawall talked about using machine learning on stable trees. The current system for getting fixes into stable trees relies on a combination of maintainers and developers realizing a fix should go in. This leads to many fixes that could be useful not actually making it in. The new idea is to use machine learning to figure out what patches might be appropriate for stable. Like all machine learning work, it's not perfect but it's found a number of patches. Sasha has also done a lot of analysis on the stable trees and buggy patches (it turns out patches that come later in the -rc cycle are more likely to be buggy) so this work is overall beneficial to the kernel. I for one welcome our new bot maintainers.

  • Julia Cartwright talked about the state of the RT patches. These patches have been out of tree for a very long time and have been slowly getting merged. The good news is there may be a light at the end of the tunnel thanks to a renewed effort. The current patch set is a manageable size and the current work can be explained in a few slides. She also mentioned the RT group can always use more people to get involved for anyone who is interested in fun projects.

  • Casey Schaufler discussed the trade-offs in kernel hardening. Security is often a trade-off against some other aspect of system performance (speed, memory). Security is also harder to quantify than "it goes 20% faster". Casey gave some examples, similar to Kees's, of APIs that need to be corrected and of the problems with getting things merged. Ultimately, we are going to have to figure out how to make security work.

  • Amye Scavarda gave a talk about "rebooting a project", but it was really a short workshop on a method for doing planning. The target audience was community managers, but I found it useful for any project. She talked about things like short and near term goals and determining external vs. internal and technical vs. non-technical problems. Really helpful for framing problems.

  • Jim Perrin talked about making a move from engineering to management. I've seen various people talk about this before and the most important point to remember is that management is not a "promotion", it is a different track and set of skills than engineering. You need to learn and develop those skills. He gave some good examples of what he had to figure out and learn. He emphasized that you should not go into management unless you really want to. Once again, really good advice.

  • There was a panel discussion about writing for your career. All the panelists work in open source and write in some fashion, and they encouraged everyone to write. Much of the discussion was about how to work with professional editors and the common pitfalls people run into when writing. Having a clear point to your writing is important and makes writing easier (something I've certainly found when trying to blog). You also don't write a book to get rich. I appreciated the insight from all the panelists and have some more ideas for my own writing.

A big thank you to the organizers for giving me a chance to look at actual penguins.

Kernel community management

| categories: fedora

I was at Open Source Summit last week (full trip report forthcoming) and like always one of the keynotes was Linus being interviewed by Dirk Hohndel. The topic of the kernel community and community management came up and whether Linus thought the kernel needed to do anything more to grow. Paraphrasing, his response was the success of the kernel community shows that it's generally doing fine. I disagree with some aspects of this and have actually thought a lot about what community management would mean for the kernel.

Community manager is a job that many modern open source projects above a certain size seem to have. If you google for "open source community manager" you'll find lots of different descriptions of what the job entails. Lots of people who actually have experience with this (i.e. not me) have written and spoken about the work. The big thing for me is that community management is a deliberate choice to shape the community: you have to make the choice to build the community you want to see. Even if you don't have a community manager, developers are still doing community management every time they interact, because ultimately that's the community.

A better question than "does the kernel need a community manager" is "does the kernel need community management", to which I give an emphatic yes. The kernel has certainly been a successful project, but people have pointed out some issues. Again, community management is about making choices to actively build a community. You can't have a steady stream of new maintainers unless you actively work to make sure people are coming in. The kernel community is great at attracting people who already want to work on the kernel, but that may not be enough. The kernel is way behind in terms of continuous integration and other tooling most people expect from open source projects these days. One area we need to grow is people who work on tools to support the kernel, and that pool may need to come from outside the traditional kernel development community.

The role of the TAB in community management is an interesting one. If you look at the description on that page, "The Technical Advisory Board provides the Linux kernel community a direct voice into The Linux Foundation’s activities and fosters bi-directional interaction with application developers, end users, and Linux companies." I know there are some unfavorable opinions (and conspiracy theories) out there about the Linux Foundation. What the Linux Foundation does well is help guide corporations in doing open source which is very different from grassroots free software. There's a large number of companies who have become very active members of the kernel community thanks to guidance and support from developers like those who are on the TAB. Enabling companies to contribute successfully is a form of community building as a practicality; companies have different needs and requirements than individuals. I do believe the members of the TAB deeply care about the kernel community, including those who aren't part of any corporate entity. Figuring out how to set that direction may be less obvious though.

Anyone who says they have the magic solution to community management is lying and I certainly don't have one. I do believe you have to shape your community with intentionality and just focusing on the code will not achieve that.

Flock 2018

| categories: fedora

Last week was Flock 2018. Highlights:

  • I gave a talk on the relationship between the Fedora and RHEL kernels. The short summary is that the two kernels are not closely related, despite the common assumption that they are. I've been working with some people inside Red Hat to figure out ways to improve this situation. The goal is to have more Red Hat kernel developers participating in Fedora to make the Fedora kernel more beneficial for future RHEL work. I talked about some of the upcoming work, such as syncing up core kernel configuration and packaging. This all seemed fairly well received.

  • RHEL + Fedora was a theme throughout many presentations. Josh Boyer and Brendan Conoboy gave a talk about aligning Fedora and RHEL across the entire OS. Some of this was about what you would expect (more testing, etc.) but one of the more controversial points was a suggestion to redefine what makes up the system vs. the applications. RPMs are nominally the smallest unit of a distribution, but this doesn't quite mesh with the modern concept of self-contained applications. You want to be able to update applications independently of the underlying system and vice versa. The talk stayed fairly high level about what to actually do here, but it generated some discussion.

  • Kevin Fenzi gave a talk about rawhide. As a relative newcomer to the project, I enjoyed hearing the history of how rawhide came about and what's being done to keep it moving forward. I'll echo the sentiment that rawhide is typically fairly usable, so give it a shot!

  • Dusty Mabe and Benjamin Gilbert gave a talk about Fedora CoreOS. I've always thought the CoreOS concept was a great idea and I'm pleased to see it continue on. Some of the talk was a bit of a retrospective about what worked and didn't work for CoreOS. Certain parts are going to be re-written. I enjoyed hearing the upcoming plans as well as getting to meet the CoreOS team.

  • Peter Robinson ran an IoT BOF. IoT is now an official Fedora objective and has a regular release. Part of the goal of the BoF was to talk about what it currently supports and what people want to do. Several people had great plans for utilizing some older hardware and I look forward to seeing more projects.

  • Peter Robinson and Spot gave a talk on the Raspberry Pi. Support for this device has come a long way and there's always new things happening. If you have a Raspberry Pi give it a shot!

  • There was a session on Fedora in Google Summer of Code and Outreachy. Fedora was extremely successful with its interns this past summer and it was great to hear from everyone and the mentors. There is another round of Outreachy happening soon as well.

Once again, a great time. Thanks to the organizers for putting on a fantastic conference.

The cabbage patch for linker scripts

| categories: fedora

Quick quiz: what package provides ld? If you said binutils and not gcc, you are a winner! That's not actually the story; I just tend to forget which package to look at when digging into problems. This is actually a story about binutils, linker scripts, and toolchains.

Usually by -rc4 the kernel is fairly stable, so I was a bit surprised when the kernel failed to build on arm64:

ld: cannot open linker script file ldscripts/aarch64elf.xr: No such file or directory

There weren't many changes to arm64, so it was pretty easy to narrow the problem down to a seemingly harmless change. If you are running a toolchain on a standard system such as Fedora, you will probably expect it to "just work". And it should, if everything goes to plan! binutils is very flexible though, and ld can be configured to emulate a bunch of less standard linkers. If you run ld -V you can see what's available:

$ ld -V
GNU ld version 2.29.1-23.fc28
  Supported emulations:

This is what's on my Fedora system. Depending on how your toolchain is compiled, the output may be different. A common variant is the 'bare metal' toolchain. This is (generally) a toolchain designed to compile binaries that run right on the hardware without an OS. The kernel technically meets this definition and provides all its own linker scripts, so in theory you should be able to compile the kernel with a properly configured bare metal toolchain. What the harmless-looking change did was switch the emulation mode from linux to one that works with bare metal toolchains.

So why wasn't it working? Looking across the system, I found no trace of the file aarch64elf.xr, yet clearly the linker was expecting it. Because this seemed to be something internal to the toolchain, I decided to try another one. Linaro helpfully provides toolchains for compiling arm targets. It turns out the Linaro toolchain worked, and strace helpfully showed where it was picking up the file [1]:

lstat("/opt/gcc-linaro-7.1.1-2017.08-x86_64_aarch64-linux-gnu/aarch64-linux-gnu/lib/ldscripts/aarch64elf.xr", {st_mode=S_IFREG|0644, st_size=5299, ...}) = 0

So clearly the file was supposed to be included. Looking at the build log for Fedora's binutils, I could definitely see the scripts being installed. Further down the build log, there was also a nice rm -rf removing the directory where these scripts were installed to. This very deliberately exists in the spec file for building binutils with a comment about gcc. The history doesn't make it completely clear, but I suspect this was either intended to avoid conflicts with something gcc generated or it was 'borrowed' from gcc to remove files Fedora didn't care about. Linaro, on the other hand, chose to package the files with their toolchain. Given Linaro has a strong embedded background, it would make sense for them to care about emulation modes that might be used on more traditional embedded hardware.

For one last piece of the puzzle: if all the linker scripts are rm -rf'd, why does the linker work at all? Shouldn't it complain? The binutils source has the answer. If you trace through the source tree, you can find a folder with all the emulation options, along with the template used for generating the structure representation. There's a nice check for $COMPILE_IN to actually build a linker script into the binary. The script that generates all the linker scripts will compile in the default one. This makes sense, since you want the default case to be fast and not hit the file system.

I ended up submitting a revert of the patch since this was a regression, but it turns out Debian suffers from a similar problem. The real take away here is toolchains are tricky. Choose yours carefully.

  [1] You also know a file is a bit archaic when it has a comment about the Solaris linker.
