To collect or contain, that is the question...

Part of writing an operating system, or any freestanding software project, for that matter, is developing your own data structures, mostly from scratch. This implies a number of design decisions, many of which I had never considered before starting this project. A key decision that I have been grappling with for os559 is whether to use collections or containers for managing groups of like objects. I will begin by clarifying what I mean by each term.
Collections
I use collection to mean a data structure that links like objects without regard to where or how they are allocated. This is (generally) how the Linux Kernel manages like objects. The most common example is the Kernel’s Doubly-Linked List implementation:
struct list_head {
struct list_head *next, *prev;
};
Note that this structure doesn’t contain a reference to any object other than itself. To use this linked list, one embeds the struct in an object that will be linked, such as a clocksource
:
/* include/linux/clocksource.h */
struct clocksource {
/* Other members */
struct list_head list;
/* More members */
};
In this example, struct clocksource::list
is a link to the list of all clock sources in the system. Elsewhere, another instance of struct list_head
is instantiated to be the “head” node in the list:
/* kernel/time/clocksource.c */
static LIST_HEAD(clocksource_list);
/* The 'LIST_HEAD' macro expands to a declaration of a struct list_head
instance and assigns both `next` and `prev` to point to the instance: */
static struct list_head clocksource_list = { &clocksource_list, &clocksource_list };
A simplification of adding a clocksource to the list would look like this:
/* The real code is significantly more involved, this simply shows how nodes get linked */
void add_clocksource(struct clocksource *cs)
{
clocksource_list.prev = &cs->list;
cs->list.next = clock_source_list.next;
cs->list.prev = &clocksource_list;
clocksource_list.next = &cs->list;
}
/* The actual code has a combination of functions and function-like macros that simplify
the list operations. */
Something to note here is that clocksource_list
remains a distinct node in the list, but it is not contained in a struct clocksource
. This effectively turns its next
pointer into a pointer to the list’s head and its prev
into a pointer to the list’s tail.
Once a node is on a list, one accesses the associated data structure using the LIST_ENTRY
macro, which in turn uses the container_of
macro, which ultimately takes the list node’s address, and subtracts its offset within the containing struct to get its address.
Containers
I use container to mean a data structure that manages storage for like objects in addition to providing a means to access them, an array being the simplest form of container. This is the paradigm the C++ library uses. When adding an object to an STL container, the container allocates storage for the object and (typically) constructs it in place.
Deciding
After a few weeks of equivocating, I finally decided to go with collections as the primary means of associating related objects in the os559 kernel. I made this decision for four reasons:
Many kernel objects have static allocation, and using containers for associating static objects at least partly obviates some of the advantages of both static objects and containers. You basically end up with two choices, you can use containers of pointers to objects, which is basically a collection with extra steps, or you can make copy the static object into the container’s heap allocation.
Collections work without modification whether an object is allocated statically or dynamically.
Collections are generic by default without requiring templates. One can make generic containers without templates, but in the process you end up losing the type safety you get with templates, and you may lose the allocation advantage.
It takes extra work to ensure that pointers to container objects remain valid for their lifetime. The C++
std::vector
, for instance, makes no guarantee that a pointer or reference to a an object in the vector will be valid after another object is inserted. A collection, on the other hand, doesn’t store the objects, it just provides an organized means of retrieving them.
Conclusion
Containers and collections each have their advantages and disadvantages, but collections are the best fit for most use cases in the os559 kernel. I will update this post in the future with some code from the kernel showing how I’ve put them to use.
Subscribe to my newsletter
Read articles from James Osterhage directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by
