Self-Referential Structs in Rust: Memory Allocation Explained


Introduction to Self-Referential Structs
Self-referential structs are structures where one or more fields in the struct contain references to another instance of the same struct type. This pattern is common in scenarios like creating linked data structures (e.g., linked lists, trees, or graphs), where elements need to reference each other.
A simple example of this is a Human struct representing a person with references to their parents.
struct Human {
name: String,
surname: String,
mom: Option<Human>,
dad: Option<Human>
}
Here, each Human contains references to their mother and father. However, this design won’t work in Rust as-is. Let’s explore why.
Why Rust Limits Self-Referential Structs
Rust’s memory management system relies on ownership and borrowing rules to ensure safety. Rust also enforces the following key rule: the size of a struct must be known at compile time.
If a struct directly contains another instance of itself (e.g., a Human struct containing fields of type Human), it creates an endless loop: Rust wouldn’t be able to determine how much memory is required because the struct’s size depends on itself infinitely.
Problematic Code
struct Human {
name: String,
surname: String,
mom: Option<Human>,
dad: Option<Human>,
}
This code will fail to compile with an error like:
recursive type Human has infinite size
Solution: Using Indirection
To work around this limitation, you can use pointers (or similar indirection mechanisms). In Rust, smart pointers like Box, Rc, or Arc allow you to reference data without directly embedding it into the struct. This solves the infinite-size issue because the pointer itself has a fixed size, typically the size of a machine word.
Example with Box
Box is a smart pointer in Rust used for heap allocation. By wrapping the recursive fields (mom and dad) in a Box, you ensure that the struct has a finite size, as it contains fixed-size pointers rather than directly embedding other instances.
struct Human {
name: String,
surname: String,
mom: Option<Box<Human>>,
dad: Option<Box<Human>>,
}
Now the Human struct works because the Box pointer is a fixed size, regardless of the size of the data it points to. The actual data (another Human) is stored on the heap, and the Box manages its memory.
How Memory Allocation Works
When you wrap a field in a Box, the field’s data is allocated on the heap, while the Box itself is stored on the stack. Here’s what happens in the example above:
1. Heap Allocation: Each Box<Human> allocates memory for the Human struct on the heap. This memory is managed by the Box, which takes care of deallocating it when it goes out of scope.
2. Stack Allocation: The Box itself (a fixed-size pointer) stores on the stack. It keeps a reference to the heap-allocated memory.
Comparison of Struct Definitions
Problematic Definition
struct Human {
name: String,
surname: String,
mom: Option<Human>, // Error: infinite size
dad: Option<Human>, // Error: infinite size
}
Rust cannot determine the size of Human at compile time.
Fixed Definition
struct Human {
name: String,
surname: String,
mom: Option<Box<Human>>, // Fixed-size pointer
dad: Option<Box<Human>>, // Fixed-size pointer
}
By using Box, we avoid infinite size issues. The size of the Human struct is now predictable, as the Box fields have a fixed size regardless of what they point to.
Conclusion
Self-referential structs in Rust require careful handling due to the language’s strict memory safety guarantees. By using smart pointers like Box(or other), you can implement these structures safely and effectively while adhering to Rust’s rules.
Subscribe to my newsletter
Read articles from Yelyzaveta Dymchenko directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by
