Warning options for printf incantations

Martin Licht

The printf function in C is known for its terse syntax and for being error-prone, traits it shares with its numerous variants such as snprintf or vsprintf. This article discusses how your C compiler, GCC or Clang, can help you mitigate these difficulties.

Enter the printf dungeon

The printf function accepts a format string and a variable number of optional arguments. The printf format string compactly encodes how to print the values in optional arguments of the function. For example,

printf("%8s: %10d\n%8s: %10d\n%8s: %10d\n", "Income", 1234567, "Costs", -4000, "Balance", 1230567 );

will print the following text, with the particular arrangement of white space:

  Income:    1234567
   Costs:      -4000
 Balance:    1230567

Each line in this example begins with a string padded to a width of at least 8 characters, right-aligned with leading spaces, followed by a colon and a space, and finally a right-aligned decimal integer occupying at least 10 character positions.

As this example illustrates, the format string describes both the number and the types of the arguments passed to printf, as well as how each value should be formatted. After the format string, the corresponding values to be printed are passed as variadic arguments. Intuitively, these arguments must match the number and types expected by the format string.

Understanding how variadic functions are implemented is crucial here. The optional arguments in a variadic function call are pushed onto the stack (or passed via registers, depending on the calling convention) without any accompanying metadata. That is, the function receives no explicit information about how many arguments there are or what types they have. This metadata must be provided by other means. In the case of printf and related functions, the format string serves precisely that role: it encodes the number and types of the arguments in a compact and symbolic form.
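To make this concrete, here is a minimal sketch of a variadic function whose first argument carries the metadata. The helper describe_args and its one-letter codes ('d' for int, 's' for string) are invented for this illustration and are not part of any library. The function has no way to check the codes against what was actually passed, which is the same situation printf is in.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper, not part of any library: `spec` plays the role of
 * the format string. 'd' announces an int, 's' announces a const char *. */
static void describe_args(char *out, size_t size, const char *spec, ...)
{
    va_list ap;
    size_t used = 0;

    if (size == 0)
        return;
    out[0] = '\0';

    va_start(ap, spec);
    for (const char *p = spec; *p != '\0' && used < size; ++p) {
        int n = 0;
        if (*p == 'd')        /* trust the caller: next argument is an int */
            n = snprintf(out + used, size - used, "[int %d]", va_arg(ap, int));
        else if (*p == 's')   /* trust the caller: next argument is a string */
            n = snprintf(out + used, size - used, "[str %s]", va_arg(ap, const char *));
        if (n < 0)
            break;
        used += (size_t)n;    /* may reach size on truncation, which ends the loop */
    }
    va_end(ap);
}
```

Calling describe_args(buf, sizeof buf, "ds", 42, "hi") fills buf with "[int 42][str hi]". Swapping the two letters in the spec would make the function read the int as a pointer, and nothing in the function itself could detect that.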

Let us reflect upon just how low-level this feature is: the printf function receives a pointer to the format string. Based on that character sequence, it performs read and perhaps write operations throughout the stack.

The printf lower levels

The printf function, however, has no means of verifying that the optional arguments actually match the description in the format string. This may cause incorrect output by misinterpreting the stack. Even worse, stack corruption is possible if the format string contains write-back instructions. Getting the printf string correct requires careful attention to detail.

For instance, take a look at the following:

int temperature = -10;
printf("The temperature is %u degrees outside!%n", temperature );

At first glance, this code is meant to print an integer (here -10) followed by a line break. However, it will not work as expected. On the one hand, the bit pattern of the negative integer -10, which uses the two's complement representation, will be interpreted as an unsigned integer (%u instead of %d). As a result, a huge positive number is printed instead (4294967286 if int is 32 bits wide). On the other hand, the author probably meant to write a line break \n at the end but accidentally wrote %n. That format specifier tells printf to expect one more argument: a pointer to a memory location where it should store the number of characters printed so far. No such pointer has been passed, so printf will treat whatever happens to be in the corresponding register or stack slot as an address and write through it, possibly corrupting the stack. This is clearly a source of bugs and potential security vulnerabilities.
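For comparison, here is a sketch of what the author presumably intended, together with a small function that reproduces the number %u would typically show. The function names are invented for this illustration, and snprintf is used instead of printf only so that the result can be inspected.

```c
#include <limits.h>
#include <stdio.h>

/* Converting a negative int to unsigned wraps modulo UINT_MAX + 1.
 * This mirrors what %u typically prints when handed a negative int. */
static unsigned int seen_as_unsigned(int value)
{
    return (unsigned int)value;
}

/* Corrected call: %d matches the int argument, and \n replaces %n. */
static int format_temperature(char *buf, size_t size, int temperature)
{
    return snprintf(buf, size, "The temperature is %d degrees outside!\n",
                    temperature);
}
```

Here seen_as_unsigned(-10) equals UINT_MAX - 9, while format_temperature produces the expected text with -10 in it.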

That being said, the above code will compile just fine. Nothing in the language standard prevents us from misusing printf with a format string that does not match the arguments, leading to pathological examples such as the one above. The type system of C is not expressive enough to diagnose the mismatch between the format string and the arguments at compile time.

Warning options to the rescue

While the C standard does not require compilers to reject mismatched format strings, compilers offer warnings that help detect such issues. This gets us fairly close to catching such bugs at compile time. These warnings are enabled and controlled via numerous option flags.

-Wformat -Wno-format -Wformat=0 -Wformat=1 -Wformat=2

These compiler flags enable or disable the warnings for format functions at different levels of strictness. You can switch them off completely (-Wno-format or -Wformat=0), enable the most important warnings (-Wformat or -Wformat=1), or enable even stricter warnings (-Wformat=2). With GCC, -Wall already includes -Wformat.

For example, with -Wformat we receive a warning about the line:

printf("pi is ca. %d\n",3.14159);
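These checks are not limited to the standard library. GCC and Clang provide a format function attribute that marks your own functions as printf-like, so that -Wformat inspects their calls as well. The following is a minimal sketch: the wrapper format_tagged and the macro PRINTF_LIKE are names made up for this illustration.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical convenience macro: expands to the GCC/Clang format
 * attribute and to nothing on other compilers. */
#if defined(__GNUC__) || defined(__clang__)
#define PRINTF_LIKE(fmt_index, first_arg) \
    __attribute__((format(printf, fmt_index, first_arg)))
#else
#define PRINTF_LIKE(fmt_index, first_arg)
#endif

/* Argument 3 is the format string; the variadic arguments start at 4. */
static int format_tagged(char *buf, size_t size, const char *fmt, ...)
    PRINTF_LIKE(3, 4);

/* Hypothetical wrapper: writes "[log] " followed by the formatted text. */
static int format_tagged(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int prefix = snprintf(buf, size, "[log] ");
    if (prefix < 0 || (size_t)prefix >= size)
        return prefix;
    va_start(ap, fmt);
    int n = vsnprintf(buf + prefix, size - (size_t)prefix, fmt, ap);
    va_end(ap);
    return n < 0 ? n : prefix + n;
}
```

With the attribute in place, a call such as format_tagged(buf, sizeof buf, "pi is ca. %d", 3.14159) should draw the same kind of warning as the printf call above, whereas "pi is ca. %.2f" passes the check.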

Let us have a look at some specialized warning options in more detail.

-Wformat-contains-nul

This flag triggers a warning if the format string contains a null byte before its end, that is, if the string terminates earlier than expected due to an unintended \0.

Example:

printf("Hello\0World!\n");
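The effect of the embedded null byte can be made visible without calling printf at all. The sketch below stores the same literal in an array and compares the length that string functions see with the number of bytes actually stored. The identifiers are invented for this illustration.

```c
#include <string.h>

/* The same literal as in the example, stored in an array. */
static const char greeting[] = "Hello\0World!\n";

/* Length as seen by printf and strlen: counting stops at the first null byte. */
static size_t visible_length(void)
{
    return strlen(greeting);
}

/* Bytes actually stored, including the embedded and the terminating null byte. */
static size_t stored_size(void)
{
    return sizeof greeting;
}
```

Only the first 5 of the 14 stored bytes are visible to printf, so the call in the example prints just "Hello".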

-Wformat-extra-args

Typically, printf ignores any excess arguments passed in the variable argument list which are not referenced in the format string. Such mismatches are often unintentional. When this flag is enabled, a warning will be triggered in any such situation.

Example:

printf("Hello World! #%d\n", 1, -3.14159);

-Wformat-signedness

When this warning flag is set, the compiler will warn you whenever the signedness indicated by the format specifier does not match the argument's type. For example, we may use %d but pass an unsigned argument, or use %u and pass a signed argument. The output will then confuse negative signed numbers and very large unsigned integers.

Example:

unsigned int u = UINT_MAX; /* defined in <limits.h> */
printf("Variable u has value %d\n", u );
signed int s = -1;
printf("Variable s has value %u\n", s );

-Wno-format-zero-length -Wformat-zero-length

Enabling this flag makes the compiler emit a warning whenever the format string has length zero.

Example:

printf("", "Hello World!\n" );

-Wformat-nonliteral -Wformat-security

When the format string is not a literal, several bugs or security vulnerabilities can be introduced. For example, code of the form

const char* fmt_str = get_format_string();
printf(fmt_str);

may lead to undefined behavior if fmt_str contains format specifiers, since printf will then read arguments that were never passed. Specifiers such as %x or %s may expose secrets stored on the stack, and the specifier %n makes printf write to memory, which can corrupt the stack and poses a security risk.

The flag -Wformat-nonliteral makes the compiler issue a warning whenever the format string is not a string literal and hence cannot be checked against the arguments. The flag -Wformat-security is narrower: at the time of this writing, it enables a subset of the checks enabled by -Wformat-nonliteral, mainly calls such as printf(fmt_str) where a non-literal format string is passed without any further arguments. The official GCC documentation indicates that -Wformat-security may gain additional checks of its own in future releases.
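The usual remedy is to keep the format string a literal and pass the variable text as an argument. A minimal sketch, with the function name copy_untrusted invented for this illustration and snprintf used so that the result can be inspected:

```c
#include <stdio.h>

/* Safe pattern: the untrusted text is an argument, never the format string.
 * Any '%' characters inside it are copied verbatim and not interpreted. */
static int copy_untrusted(char *buf, size_t size, const char *untrusted)
{
    return snprintf(buf, size, "%s", untrusted);
}
```

The same idea applies to direct output: printf("%s", fmt_str) or fputs(fmt_str, stdout) prints the string as it is, even if it contains %n.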

-Wformat-truncation -Wformat-truncation=level -Wformat-overflow -Wformat-overflow=level

The -Wformat-truncation and -Wformat-overflow warnings are relevant for functions such as snprintf and sprintf, which write formatted output to memory buffers. These flags address subtle bugs due to output truncation or buffer overflows. The latter are a source of security vulnerabilities.

When -Wformat-truncation is set, then a warning will be triggered when the compiler can determine that a call to an output function such as snprintf or vsnprintf will truncate the output because the buffer is too small to contain the formatted string, including the null terminator. Such truncations are often unintentional.

Example:

char buf[8];
snprintf(buf, sizeof(buf), "Result: %d\n", 123456);
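When truncation cannot be ruled out at compile time, it can still be detected at run time, because snprintf returns the number of characters the complete output would have needed. The following sketch wraps the call from the example. The function name format_result is invented for this illustration.

```c
#include <stdio.h>

/* Returns 1 if the formatted text did not fit into buf, 0 if it fit,
 * and -1 if snprintf reported an error. */
static int format_result(char *buf, size_t size, int value)
{
    int needed = snprintf(buf, size, "Result: %d\n", value);
    if (needed < 0)
        return -1;
    /* needed excludes the null terminator, so needed == size is too long. */
    return (size_t)needed >= size;
}
```

With the 8-byte buffer from the example, the function reports truncation and the buffer holds only "Result:". A 32-byte buffer receives the full line.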

When -Wformat-overflow is set, a warning will be triggered when the compiler can determine that the output will exceed the size of the buffer. Such a buffer overflow compromises memory safety. This pertains to functions such as sprintf or vsprintf, which do not limit the number of characters written into the buffer.

Example:

char buf[16];
sprintf(buf, "This is a very long string: %d\n", 42);
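One way to see how far off the buffer in this example is: calling snprintf with a null buffer and size zero writes nothing and only reports the required length. This is a sketch with an invented function name, needed_size, not a recommendation for any particular code base.

```c
#include <stdio.h>

/* Bytes needed for the formatted text plus the terminating null byte,
 * or 0 if snprintf reports an error. Nothing is written anywhere. */
static size_t needed_size(int value)
{
    int n = snprintf(NULL, 0, "This is a very long string: %d\n", value);
    return n < 0 ? 0 : (size_t)n + 1;
}
```

For the value 42 the result is 32, so the 16-byte buffer in the example is too small by 16 bytes.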

These flags accept an optional level (1 or 2 in GCC) that determines how thoroughly the compiler tries to detect truncation or overflow errors. Generally speaking, switching on optimization improves the detection of such errors, because the compiler then knows more about buffer sizes and the possible values of the arguments.

Treasure chest

For completeness, I also mention the warning flag -Wformat-y2k, which concerns not the format strings of printf but a whole different class of format strings, namely those accepted by strftime. That function prints dates and times in human-readable form. The flag -Wformat-y2k warns whenever a format prints a year with only two digits instead of the full four digits.
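As a brief illustration, the following sketch formats a date with the four-digit conversion %Y. The function name format_date is invented for this illustration.

```c
#include <time.h>

/* Formats a date with a four-digit year (%Y). Using %y instead would
 * print only the last two digits, which is what -Wformat-y2k flags. */
static size_t format_date(char *buf, size_t size, const struct tm *when)
{
    return strftime(buf, size, "%Y-%m-%d", when);
}
```

For a struct tm with tm_year set to 99, tm_mon to 11 and tm_mday to 31, this yields "1999-12-31". With %y in place of %Y the year would appear as "99".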
