Escape sequences in C


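It is easy to write a string that contains only letters, digits, and ordinary punctuation. Characters that cannot be typed directly into a literal, such as a newline, a quotation mark that would end the literal, or the backslash itself, must be encoded with an escape sequence. Let us review the details of escape sequences in the C standard.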
Escape sequences are special character combinations within your C source code that appear within a string or when specifying a char value. For example:
char* str = "Hello World!\n";
char c = '\n';
Escape sequences begin with a backslash (\) followed by one or more characters that specify the escape sequence. An escape sequence consists of more than one character in your source code, but each escape sequence is converted into a single character during compilation (assuming characters are encoded in a single byte). If the escape sequence is not recognized, then the compiler must issue a diagnostic; depending on the compiler and its settings, this is a warning or an error.
Let us review the escape sequences, categorized into a few groups.
Special Characters
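The characters ', ", ?, and \ have the corresponding escape sequences \', \", \?, and \\. The backslash must always be escaped. The single quotation mark must be escaped inside a char constant, and the double quotation mark must be escaped inside a string. The escape sequence \? exists so that a question mark is not read as part of a trigraph; it is rarely needed otherwise.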
Escape Sequence | Meaning |
\\ | backslash |
\? | question mark |
\' | single quotation mark |
\" | double quotation mark |
Evidently, we need an escape sequence for the backslash character since the backslash is already part of any escape sequence.
Example:
#include <stdio.h>
int main(){
    printf("Hello\? Is \"\\\\\\\" a string of three backslashes in \'C\'\?");
    return 0;
}
A Note on Line Continuation
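The backslash character is also used to continue a string across lines. Specifically, any backslash that is immediately followed by a newline character in the source code is deleted together with that newline. Effectively, this connects two source lines into one logical line. This happens before escape sequences are processed by the compiler. In the following example, the last backslash of the first line is deleted, and the remaining backslash pairs with the one at the start of the second line: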
char* str = "This string ends with \\
\"; // ends with a backslash
Non-printable Characters
Many of these character sequences come from a time when terminals had limited display capabilities:
Escape Sequence | Meaning |
\a | alert |
\b | backspace |
\f | form feed |
\n | newline |
\r | carriage return |
\t | horizontal tab |
\v | vertical tab |
Let us summarize them with simple examples:
The escape sequence '\a' is supposed to produce an alert beep. This feature is less common on modern machines.
printf("This may trigger an alert beep!\a\n");
Example using '\t' (horizontal tab):
printf("Column1\tColumn2\tColumn3\n");
Example using '\b' (backspace); the output shown is what a typical terminal displays:
printf("12345\b\b67\n"); // Output: 12367
Example using '\r' (carriage return); again, the output shown is what a typical terminal displays:
printf("Hello, world!\rBye!\n"); // Output: Bye!o, world!
Example using '\v' (vertical tab):
printf("Line1\vLine2\n");
Example using '\f' (form feed, rarely used today):
printf("First page\fSecond page\n");
Numerical Escape Sequences
Escape sequences in string and character literals allow us to define characters directly using their numerical code value. This is done either in octal or hexadecimal form.
An escape sequence in octal form consists of \ followed by one, two, or three octal digits (0-7). The integer value determines the character that is put into that position. Notice that at most three octal digits are read; any further digit, and any non-octal digit, is not part of the escape sequence.
#include <stdio.h>
int main(){
    printf("\101\102\103\n"); // prints ABC
    printf("\17011\17111\17211\n"); // prints x11y11z11
    printf("\7789"); // prints ?89, since 8 is not an octal digit and the escape is \77
    return 0;
}
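Most importantly, the octal escape sequence \0 is the standard way to insert the null character manually into a string. Functions that expect a null-terminated string, such as printf, stop at that character.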
printf("Hallo\0Welt!\n"); // prints Hallo
Similarly, hexadecimal escape sequences begin with \x followed by one or more hexadecimal digits (0-9, a-f, A-F). Unlike octal escape sequences, hexadecimal escape sequences can have an arbitrary number of digits: the sequence ends only at the first character that is not a hexadecimal digit, such as whitespace.
printf("\x41\x42\x43\n\x41G\n"); // prints: ABC followed by AG
printf("\xAa\n"); // Mixed uppercase/lowercase is possible
In practice, two hexadecimal digits, enough for one 8-bit character, are sufficient for any English text. More hexadecimal digits are meaningful if the string's character type uses more than one byte.
A technical detail is that the value of a numerical escape sequence in a char constant must fit into an unsigned char. If it does not, the C standard requires the compiler to issue a diagnostic. Some compilers only warn and keep the low-order bits of the value, which then determine the character; others reject the program.
char c = '\777'; // out of range: octal 777 = 511; a compiler that truncates keeps 511 mod 256 = 255
printf("%d\n", (unsigned char)c); // prints: 255 with such a compiler
The same restriction applies within a string: the value of each numerical escape sequence must fit into a single element of the string, which is an unsigned char for ordinary strings.
printf("\x4142\n"); // 0x4142 = 16706 does not fit into one char. The result depends on the implementation
printf("\x41" "42\n"); // "A42" via string literal concatenation
The translation of escape sequences and the handling of out-of-range values depend on the implementation and the type of characters in the string. Different string types and character encodings will be a topic for another blog post.
Written by Martin Licht
