Data Forge # 3: Refactoring, Lots Of Refactoring...


The Broken Window Theory in Code
Lately, I’ve been reading The Pragmatic Programmer, and one of the many ideas that really stood out to me is the Broken Window Theory. It’s all about reducing entropy in your codebase. Here’s the gist:
If a single broken window in a building is left unrepaired, it signals neglect. Eventually, more windows get broken, and the building falls into disrepair. The same goes for code—if messy, poorly written, or buggy code is left untouched, it sets a precedent. Other developers (or your future self) start to think that quality doesn’t matter here. Over time, more “broken windows” appear, and the codebase becomes harder to maintain, test, and build on.
It’s a subtle psychological shift: once you cut one corner, it becomes easier to cut more. That’s why even small messes should be cleaned up early. Fix the little things. Refactor that confusing function. Rename the poorly named variable. Pay off technical debt incrementally before it compounds. Guard the integrity of your codebase like it’s your home—because in many ways, it is.
My Broken Window
After wrapping up the dynamic array API I showcased in Data Forge #2, I started doing deeper research into how C libraries should be structured. That’s when I realized—I had been building mine completely wrong.
Coming from a web development background, I approached library design the same way I’d build a typical application. That especially applied to how I handled errors—I was handling them internally. But that’s not how it should work when you’re writing tools for other developers. The user of the library should be the one handling the errors, not the library itself.
Better Error Handling
The first step in fixing my “broken window” was defining a custom error type: DfError, which would represent every error the library could return.
typedef enum {
    DF_OK = 0,
    DF_ERR_NULL_PTR,
    DF_ERR_ALLOC_FAILED,
    DF_ERR_INDEX_OUT_OF_BOUNDS,
    DF_ERR_OUT_OF_RANGE,
    DF_ERR_EMPTY,
    DF_ERR_ALREADY_FREED,
} DfError;
As you can see, DF_OK is assigned the value 0 to indicate success. Every function in the library now returns a DfError, allowing for clean and consistent error handling.
To improve usability even further, I implemented a helper function that turns an error code into a human-readable string:
const char *df_error_to_string(DfError err) {
    switch (err) {
    case DF_OK: return "No error";
    case DF_ERR_NULL_PTR: return "Null pointer";
    case DF_ERR_ALLOC_FAILED: return "Memory allocation failed";
    case DF_ERR_INDEX_OUT_OF_BOUNDS: return "Index out of bounds";
    case DF_ERR_OUT_OF_RANGE: return "Value out of range";
    case DF_ERR_EMPTY: return "Structure is empty";
    case DF_ERR_ALREADY_FREED: return "Memory has already been freed";
    default: return "Unknown error";
    }
}
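As a quick, hypothetical example of how a caller might surface one of these errors (a minimal sketch, assuming the enum and df_error_to_string above are visible through the library's header; the error value here is just a placeholder):

#include <stdio.h>

int main(void) {
    /* Placeholder: in real code this would come back from a library call. */
    DfError err = DF_ERR_ALLOC_FAILED;
    if (err != DF_OK) {
        fprintf(stderr, "data forge error: %s\n", df_error_to_string(err));
    }
    return 0;
}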
I plan to expand on these explanations in the future, but for now, this gives users enough clarity to understand what went wrong without digging into the source.
Uniform Return Type
After creating the custom error type, I ran into a new challenge: how could I return both an error and data from my functions? My solution was to create a standard return type that I could use across all applicable functions: DfResult.
typedef struct {
    DfError error;  /* DF_OK on success, otherwise the failure code */
    void *value;    /* pointer to the result, or NULL when there is none */
} DfResult;
This provides a clean and consistent way to return data from a function, while still enabling flexible error handling by the user. Along with the error and result types, I also created a few helper functions to reduce code duplication throughout the library. I’ll show how these helpers are used shortly, but the key takeaway here is that DfResult makes error handling clean and predictable.
Here are the helpers I wrote:
DfResult df_result_init(void) {
    return (DfResult){.error = DF_OK, .value = NULL};
}

void df_null_ptr_check(void *ptr, DfResult *res) {
    if (!ptr) {
        res->error = DF_ERR_NULL_PTR;
    }
}

/* For reads and writes, the index must be strictly less than the length. */
void df_index_check_access(size_t index, size_t length, DfResult *res) {
    if (index >= length) {
        res->error = DF_ERR_INDEX_OUT_OF_BOUNDS;
    }
}

/* For insertions, index == length is allowed (appending at the end). */
void df_index_check_insert(size_t index, size_t length, DfResult *res) {
    if (index > length) {
        res->error = DF_ERR_INDEX_OUT_OF_BOUNDS;
    }
}
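To give a feel for how these helpers compose, here’s a minimal, hypothetical sketch of an element-overwrite function. To be clear, dfarray_set isn’t a function I’m showcasing from the library; it’s just there to illustrate the pattern, and it assumes the same DfArray fields (items, length, elem_size) you’ll see in the real examples below.

#include <string.h>

DfResult dfarray_set(DfArray *array, size_t index, const void *src) {
    DfResult res = df_result_init();

    df_null_ptr_check(array, &res);
    if (res.error) return res;

    df_null_ptr_check((void *)src, &res);
    if (res.error) return res;

    df_index_check_access(index, array->length, &res);
    if (res.error) return res;

    /* Copy the caller's data over the existing element. */
    memcpy((char *)array->items + index * array->elem_size, src, array->elem_size);

    /* Nothing to hand back: res.error is DF_OK and res.value stays NULL. */
    return res;
}

Functions that don’t produce a value simply leave value as NULL, so callers only ever have to check error.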
Refactoring
Now that I had my DfError and DfResult types set up, it was time for the (very tedious) part: refactoring all the code I had written so far to use them properly. I started with my DfArray implementation, removing the old internal error handling logic and replacing it with the new, standardized approach.
Here’s an example of what that looked like:
Before:
void *DfArray_Get(DfArray *array, size_t index) {
    if (index >= array->length) {
        fprintf(stderr, "Error: Index %zu out of bounds (length: %zu)\n", index, array->length);
        exit(1);
    }
    void *dest = malloc(array->elem_size);
    memcpy(dest, (char *)array->items + index * array->elem_size, array->elem_size);
    return dest;
}
After:
DfResult dfarray_get(DfArray *array, size_t index) {
    DfResult res = df_result_init();

    df_null_ptr_check(array, &res);
    if (res.error) {
        return res;
    }

    df_index_check_access(index, array->length, &res);
    if (res.error) {
        return res;
    }

    /* Copy the element out so the caller owns the returned memory. */
    void *dest = malloc(array->elem_size);
    if (!dest) {
        res.error = DF_ERR_ALLOC_FAILED;
        return res;
    }
    memcpy(dest, (char *)array->items + index * array->elem_size, array->elem_size);

    res.value = dest;
    return res;
}
Usage:
DfResult get_result = dfarray_get(array, 1);
if (get_result.error) {
    printf("Get error: %s\n", df_error_to_string(get_result.error));
} else {
    int *retrieved = (int *)get_result.value;
    printf("Retrieved value: %d\n", *retrieved);
    free(retrieved); /* dfarray_get returns a copy, so the caller frees it */
}
As you can see, this new approach is more verbose, but the trade-off is worth it. The code is now clearer in its intent, easier to maintain, and far more flexible for users of the library. The helper functions like df_result_init, df_null_ptr_check, and df_index_check_access help cut down on the noise, too.
I refactored all of the functions in DfArray the same way. If you’d like to dive into the full refactored codebase and explore the finer details of the Data Forge library—especially the parts I haven’t yet covered in any article—you can check it out here. Just a heads up: at the time of writing, I'm still in the process of updating the documentation to reflect the changes, so some parts may be a bit out of sync for now.
Lessons Learned
This whole process turned out to be a solid learning experience on multiple fronts. I now have a much clearer picture of how to structure my library with users in mind. More importantly, I proved to myself that I can tackle big refactors—something that used to intimidate me and lead to a lot of “broken windows” left untouched.
I also got to apply several principles from The Pragmatic Programmer to a real-world codebase, which helped solidify that knowledge for me in a practical way. All in all, it was a fun and productive challenge, and I’m excited to keep building on this foundation across the rest of the library.
What’s Next
Keep an eye out for some shorter posts coming soon. I’ll be diving into how I implemented generic utility functions and a custom iterator type in Data Forge—stuff that got a bit overshadowed during the refactor but is definitely worth sharing.
Also, I’m currently wrapping up a singly linked list API for Data Forge. It’s almost done, so I’m hoping to have an article on that out by this time next month.
Thanks for hanging out—I'll see you in the next one!