The bug in Valgrind

Valgrind is an essential tool for anyone who is working with native code, especially if you are running C or C++ code. I have a codebase that is about 15,000 lines of C code, and Valgrind is absolutely essential for me to check my work. It has caught quite a few of my slips.

I recently switched systems and when running the same code using Valgrind, I started to get annoying warnings, like this:

==16896==
--16896-- WARNING: Serious error when reading debug info
--16896-- When reading debug info from /tmp/test.txt:
--16896-- can't read file to inspect ELF header
==16896==

The key issue is that /tmp/test.txt is, as you can imagine, a data file. Why is Valgrind attempting to read ELF details from it?

It took me a while to narrow things down, but I found that I could reproduce this easily with the following code:
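The listing itself is not included here, so what follows is a minimal sketch reconstructed from the system calls in the strace output further down, not the original program. The function name repro_direct_mmap, the fail helper and the ftruncate() call are assumptions. The MAP_FIXED flag and fixed address in the trace are left out, since they are presumably added by Valgrind when it places the mapping.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* O_DIRECT is a GNU/Linux extension in glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: report the error, close fd if open, keep errno intact. */
static int fail(const char *what, int fd)
{
    int saved = errno;
    fprintf(stderr, "%s: %s\n", what, strerror(saved));
    if (fd != -1)
        close(fd);
    errno = saved;
    return -1;
}

/* Opens `path` with the flags seen in the strace output, then maps it
 * read-only and shared. Returns 0 on success, -1 on error (errno is set). */
int repro_direct_mmap(const char *path, size_t len)
{
    assert(path != NULL && len > 0);

    int fd = open(path,
                  O_RDWR | O_CREAT | O_TRUNC | O_DSYNC | O_DIRECT | O_CLOEXEC,
                  0600);
    if (fd == -1)
        return fail("open", -1);

    /* Assumption: size the file so the mapping is backed by real pages. */
    if (ftruncate(fd, (off_t)len) == -1)
        return fail("ftruncate", fd);

    /* Under Valgrind, this is the call that should trigger the debug-info probe. */
    void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return fail("mmap", fd);

    munmap(p, len);
    close(fd);
    return 0;
}
```

Saved as a.c with a main() that calls repro_direct_mmap("test.txt", 262144), this should issue the same kind of openat() and mmap() calls as in the trace. Some filesystems reject O_DIRECT outright, in which case open() fails with EINVAL before Valgrind gets involved.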

If you run this code with the following command, you should see the warning:

clang a.c && valgrind ./a.out

Note that this is with clang 10.0.0-4ubuntu1 and valgrind-3.16.1. I decided to check what Valgrind is doing using strace, which gave the following output:

Digging a little deeper, let’s highlight the root cause of this:

openat(AT_FDCWD, "test.txt", O_RDWR|O_CREAT|O_TRUNC|O_DSYNC|O_DIRECT|O_CLOEXEC, 0600) = 3
mmap(0x4a4d000, 262144, PROT_READ, MAP_SHARED|MAP_FIXED, 3, 0) = 0x4a4d000
pread64(3, 0x1002ea98a0, 1024, 0) = -1 EINVAL (Invalid argument)

I’m opening the test.txt file using the O_DIRECT flag, which limits the kind of things that you can do with the file. In particular, it means that reads and writes must use suitably aligned buffers, and page alignment is the safe choice. The pread64() call is reading into a buffer at 0x1002ea98a0, which is not page aligned, so it fails with EINVAL.
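To make the alignment point concrete, here is a small sketch that checks an address against an alignment and allocates a buffer that O_DIRECT would accept. The helper names is_aligned and alloc_direct_buffer are made up for illustration. The examples use 4096 as an assumed page size; sysconf(_SC_PAGESIZE) reports the real one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* True when `addr` is a multiple of `alignment`, which must be a power of two. */
bool is_aligned(uintptr_t addr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr & (alignment - 1)) == 0;
}

/* Allocates `size` bytes aligned to `alignment`, or returns NULL on failure.
 * The result is released with free(). */
void *alloc_direct_buffer(size_t alignment, size_t size)
{
    void *buf = NULL;
    if (posix_memalign(&buf, alignment, size) != 0)
        return NULL;
    return buf;
}
```

With these helpers, is_aligned(0x1002ea98a0, 4096) is false, because the low 12 bits of the address are 0x8a0 rather than zero. A buffer from alloc_direct_buffer(4096, 1024) passes the same check.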

What is interesting is that my code isn’t issuing any such call; it is coming from inside Valgrind itself. In particular, I believe that the offending piece of code is di_notify_mmap, which is called whenever we map code and is fairly complex. The basic issue is that it does not respect the limits of files opened with O_DIRECT, and that causes the pread() call to fail. At this point, Valgrind outputs the warning.

A brief look at the code indicates that this should be fine: this is a data mapping, not an executable mapping, but Valgrind still makes the attempt. Debugging into Valgrind is beyond the scope of what I want to do. For now, I changed things so that no mmap() call uses the file descriptor opened with O_DIRECT, and that resolved things for me.