Segmentation Faults

Gabe Johnson

The embattled "segfault" is as hated as it is misunderstood. If you've ever written a program in any language, you've probably encountered some variant of this error. What is it, and how do you prevent it from happening?

A segmentation fault happens when a program attempts to access memory that it doesn't own. In my data structures course, 99% of the time this indicates that the code is dereferencing a pointer that is null. Or to put it in code:

int* myVar = nullptr;
cout << *myVar << endl;

Here we are asking to read the value contained in myVar, interpret that as a memory address, and then read the value contained at that address. But I've initialized it to null! And in C and C++, null is just a fancy word for zero. So this code really means "read the value contained at address zero" - which your program definitely doesn't own. The operating system notices these shenanigans, and puts a quick end to them by crashing your program.
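The usual defense is to check a pointer before you dereference it. Here is a minimal sketch of that idea; valueOr is a hypothetical helper name I made up for illustration, not something from a library.

```cpp
// Hypothetical helper (illustration only): returns the value the pointer
// refers to, or a caller-supplied fallback when the pointer is null.
int valueOr(const int* ptr, int fallback) {
    if (ptr == nullptr) {
        // Dereferencing here would be the segfault from the example above.
        return fallback;
    }
    return *ptr;  // safe: we just confirmed ptr is not null
}
```

With this in place, valueOr(myVar, -1) hands back -1 for a null myVar instead of crashing. Whether a fallback value is the right response depends on your program; sometimes reporting an error is the better choice.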

There are many ways beyond the above contrived example that will lead to a segfault. This Stack Overflow answer lists many more.
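Reading past the end of an array is one of those other ways. As a rough sketch (tryGet is a hypothetical helper, named here just for illustration), std::vector::at checks the index and throws std::out_of_range, where the plain [] operator with a bad index is undefined behavior and may segfault.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (illustration only): copies values[index] into out
// and returns true, or returns false if the index is out of range.
bool tryGet(const std::vector<int>& values, std::size_t index, int& out) {
    try {
        out = values.at(index);  // at() bounds-checks; operator[] does not
        return true;
    } catch (const std::out_of_range&) {
        return false;  // a bad index becomes a handled error, not a crash
    }
}
```

The point is not that you should wrap every access in a try block, only that a checked access turns a mysterious crash into an error you can see and handle.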

Tracking down segfaults

If your program is crashing (and this will definitely happen to you), there are strategies that help. First, try to figure out where the crash is happening. There might be a console statement with a line number. Or if you have console output that abruptly stops before the program explodes, you might be able to narrow it down. Some people frown on this approach, but don't be afraid to drop in cout statements!

cout << "first thing..." << endl;
first->foo();
cout << "second thing..." << endl;
second->foo();
cout << "third thing..." << endl;
third->foo();

If you see "first thing" and "second thing", but it blows up before printing the third thing, where do you think the issue is?

Look for the arrow operator. Segfaults can happen any time you dereference a pointer, and that's what the arrow operator is doing: second->foo() means "look at the memory pointed to by second and invoke its foo function". Sometimes there are multiple pointer dereferences on a single line of code, and that can make it harder to know which one it is. Consider splitting that code into several simpler lines.
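To make that concrete, here is a sketch of splitting a chained dereference. The Order, Customer, and Address types and the cityOrReason function are all hypothetical, invented for this example. A one-liner like order->customer->address->city has three dereferences, so a crash on that line could come from any of them.

```cpp
#include <string>

// Hypothetical types for illustration only.
struct Address { std::string city; };
struct Customer { Address* address = nullptr; };
struct Order { Customer* customer = nullptr; };

// Each dereference gets its own line and its own null check, so a problem
// points at exactly one pointer. For the sake of a short example, the
// function returns a description of the null pointer in place of a city.
std::string cityOrReason(const Order* order) {
    if (order == nullptr) return "order is null";

    const Customer* customer = order->customer;
    if (customer == nullptr) return "customer is null";

    const Address* address = customer->address;
    if (address == nullptr) return "address is null";

    return address->city;
}
```

Once the code is split like this, a line number from a crash report or a step in the debugger tells you which pointer was the culprit.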

You can use a debugger - just drop a breakpoint at some place before where the error happens, and step through. Your debugger should show you the variables on hand along with their values. A null pointer has a value of 0! If you step through slowly you might see it coming. If you don't see it coming, it will still go boom 💣 and you can adjust your breakpoint to get closer to the action and debug it again.

A segfault by any other name...

Various languages have their own ways of letting you know that you've committed memory crime. Technically, they're not necessarily exactly the same thing, but they feel similar: a segmentation fault is a low level error, typically originating from hardware protections and delivered by the operating system. Other languages might operate differently, such as in a virtual machine or interpreter, and those sandboxed environments manage their own memory. Even with those languages, you can write code that references symbols that don't exist.

I want to show you what invalid references look like in a few other languages, so you can see that they're all kinda the same despite using different names (null, undefined, nil, None, etc).

In JavaScript, referencing a property of an undefined or null object produces wonderfully arcane messages like this:

Cannot read properties of undefined (reading 'quantity')
  -or-
Cannot read properties of null (reading 'quantity')

JavaScript can be enhanced via TypeScript, which lets you write code that is deliberate about what types you're working with at compile time, and saves you loads of heartache down the road. (It should be noted that I am a raging TypeScript fan.)

In Python, it is called None, and error messages can look like this:

AttributeError: 'NoneType' object has no attribute 'quantity'

JavaScript, TypeScript, Python, and other interpreted languages don't give you direct memory access, but the variables that you're working with are symbols that can still cause referential errors.

Go is interesting because it is a statically typed language with safety features like bounds checking and the inability to do pointer arithmetic. It also has an adorable mascot. In Go the concept of "has no value" is called nil. It is still possible to dereference a variable that is nil, but this causes a program panic, which is recoverable within the program.

Rust is even more interesting, because it goes to great lengths to prevent you from building a program that has unsafe memory usage. Rust has a few language features that make it very difficult to produce a segfault like a C++ program would. In fact, the White House recently released a report that exhorts software engineers to get their act together and use memory safe languages such as Rust for national security reasons. As government documents go, it is a thrilling read! (Maybe start around page 7...)

Conclusion

Segfaults, null pointer exceptions, and whatever that weird message is that JavaScript gives you: they're all related, and you will encounter them many, many times. I hope this has helped you understand some of the ways you can write (and debug!) code that has memory or symbolic reference problems.
