Self-Referential Structs in Rust: Memory Allocation Explained

Introduction to Self-Referential Structs

Self-referential structs are structures where one or more fields in the struct contain references to another instance of the same struct type. This pattern is common in scenarios like creating linked data structures (e.g., linked lists, trees, or graphs), where elements need to reference each other.

A simple example of this is a Human struct representing a person with references to their parents.

đź’ˇ
THE EXAMPLE BELOW IS NOT WORKING! SHOWN TO CLEAR THE SELF-REFERENCE QUESTION!
struct Human { 
    name: String, 
    surname: String, 
    mom: Option<Human>, 
    dad: Option<Human>
}

Here, each Human contains references to their mother and father. However, this design won’t work in Rust as-is. Let’s explore why.

Why Rust Limits Self-Referential Structs

Rust’s memory management system relies on ownership and borrowing rules to ensure safety. Rust also enforces the following key rule: the size of a struct must be known at compile time.

If a struct directly contains another instance of itself (e.g., a Human struct containing fields of type Human), it creates an endless loop: Rust wouldn’t be able to determine how much memory is required because the struct’s size depends on itself infinitely.

Problematic Code

struct Human {
    name: String,
    surname: String,
    mom: Option<Human>,
    dad: Option<Human>,
}

This code will fail to compile with an error like:

recursive type Human has infinite size

Solution: Using Indirection

To work around this limitation, you can use pointers (or similar indirection mechanisms). In Rust, smart pointers like Box, Rc, or Arc allow you to reference data without directly embedding it into the struct. This solves the infinite-size issue because the pointer itself has a fixed size, typically the size of a machine word.

Example with Box

Box is a smart pointer in Rust used for heap allocation. By wrapping the recursive fields (mom and dad) in a Box, you ensure that the struct has a finite size, as it contains fixed-size pointers rather than directly embedding other instances.

struct Human {
    name: String,
    surname: String,
    mom: Option<Box<Human>>,
    dad: Option<Box<Human>>,
}

Now the Human struct works because the Box pointer is a fixed size, regardless of the size of the data it points to. The actual data (another Human) is stored on the heap, and the Box manages its memory.

How Memory Allocation Works

When you wrap a field in a Box, the field’s data is allocated on the heap, while the Box itself is stored on the stack. Here’s what happens in the example above:

1. Heap Allocation: Each Box<Human> allocates memory for the Human struct on the heap. This memory is managed by the Box, which takes care of deallocating it when it goes out of scope.

2. Stack Allocation: The Box itself (a fixed-size pointer) stores on the stack. It keeps a reference to the heap-allocated memory.

Comparison of Struct Definitions

Problematic Definition

struct Human {
    name: String,
    surname: String,
    mom: Option<Human>, // Error: infinite size
    dad: Option<Human>, // Error: infinite size
}

Rust cannot determine the size of Human at compile time.

Fixed Definition

struct Human {
    name: String,
    surname: String,
    mom: Option<Box<Human>>, // Fixed-size pointer
    dad: Option<Box<Human>>, // Fixed-size pointer
}

By using Box, we avoid infinite size issues. The size of the Human struct is now predictable, as the Box fields have a fixed size regardless of what they point to.

Conclusion

Self-referential structs in Rust require careful handling due to the language’s strict memory safety guarantees. By using smart pointers like Box(or other), you can implement these structures safely and effectively while adhering to Rust’s rules.

0
Subscribe to my newsletter

Read articles from Yelyzaveta Dymchenko directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Yelyzaveta Dymchenko
Yelyzaveta Dymchenko