Data Forge # 2: Adding A Dynamic Array To My C Library

Andrew ArcherAndrew Archer
13 min read

In higher level languages, it is generally assumed that an array is dynamic by default, that you can push, pop, and shift any element in or out of your array with ease. In C the process is not as straight forward, if you want to add an element to your array you have to manually allocate more memory, then manually copy the the correct number of bytes into the newly allocated space. If you want to grab an element from within an array, you have to ensure you dont accidently try to access an invalid index or you will get the dreaded “segmentation fault” because there is no bounds checking in C. These are only a few of the tedious things a C developer needs to be aware of when trying to work with static array’s.

The goal of adding a dynamic array to Data Forge is to provide a memory safe and feature rich option for those that need to use a dynamic array. The dynamic array in Data Forge provides the following core features.

  • Dynamic resizing

  • Bounds checking

  • Get and Set operations

  • Push and Pop operations

  • Shift and Unshift operations

  • Generic type storage via void *

Additionally I will be incorporating an iterator and adding utility functions in the future. Be on the lookout for articles covering those topics in the coming weeks! Now without further ado lets get into how i built a dynamic array for Data Forge!

Defining DfArray

First thing that I needed to do was define a type for my array. I have decided on the naming convention of prefixing all of my data structures with “Df” to prevent naming conflicts. Next, needed to store the following pieces of information.

  • items - void pointer (to allow for generic type storage) to where our array of items begins in memory

  • length - length of our array

  • elem_size - the size of the type of element in our array

  • capacity - number of elements (of size elem_size) that can be stored within our array

typedef struct DfArray {
  void *items;
  size_t length;
  size_t elem_size;
  size_t capacity;
} DfArray;

Originally I defined this type directly in my header file, but this caused an issue. With the full type being defined in the header file, that allowed users to modify any of the values within the type without using any of the methods I wanted them to use. In order to fix this I instead defined an opaque type in the header file and defined the full type in the .c file.

typedef struct DfArray DfArray;

Now a user has access to the type, but not direct access to the data stored inside it. With the type defined and declared in a safe way I could move on to defining my first functions.

Creation and Destruction

The first function I made was a creation function. Much like in an object oriented language, I wanted to have a function that returned a fully functional array that I could call with ease. In order to do this my function would need two pieces of information; the size of the element being stored in the array and the initial capacity required. Here is the function signature from my header file for reference.

DfArray* DfArray_Create(size_t elem_size, size_t initial_capcity);

Lets break down how this function works. The first step is allocating memory for the array and storing a pointer to it that we will return. Next we ensure that the memory for our array was allocated properly, if not then we throw an error and exit the program.

DfArray *array = malloc(sizeof(DfArray));
  if (!array) {
    perror("Struct memory allocation failed.");
    exit(EXIT_FAILURE);
  }

Next we allocate memory for array→items, this is where the pointer to the start of our elements begins in memory. We calculate how much space we need to allocate by multiplying our parameters, elem_size and initial_capacity together. So if an int is 8 bits and we need an initial capacity of 5 int’s, then we get 8 bits × 5 elements = 40 bits of memory allocated. I am breaking this down a bit here because it was very confusing for me at first, so if you are in that boat don’t worry you are not alone.

Next we check to ensure that the memory for array→items was allocated properly, if it wasn’t then we free the array from memory, throw an error and exit the program.

  array->items = malloc(initial_capacity * elem_size);
  if (!array->items){
    free(array);
    perror("Items memory allocation failed.");
    exit(EXIT_FAILURE);  
  }

Finally we initialize the length, element size and capacity fields then return a pointer to the array.

  array->length = 0;
  array->elem_size = elem_size;
  array->capacity = initial_capacity;

  return array;

With that we have a fully functional way to create a dynamic array!

DfArray* DfArray_Create(size_t elem_size, size_t initial_capacity) { 
  DfArray *array = malloc(sizeof(DfArray));
  if (!array) {
    perror("Struct memory allocation failed.");
    exit(EXIT_FAILURE);
  }

  array->items = malloc(initial_capacity * elem_size);
  if (!array->items){
    free(array);
    perror("Items memory allocation failed.");
    exit(EXIT_FAILURE);  
  }

  array->length = 0;
  array->elem_size = elem_size;
  array->capacity = initial_capacity;

  return array;
}

Now that I could create an array, I needed an efficient way to destroy one, or in other words free our array from memory without having to manually call free. The way I did this was pretty simple, I created a function that takes a pointer to the array and frees both our items and the struct itself from memory.

void DfArray_Destroy(DfArray* array) {
  if (array) {
    free(array->items);
    free(array);
  }
}

Once I completed both of these functions I could move on to functions that allow users to interact with the array.

Push, Pop & Resizing

Coming from web development, for me push is by far the most used method for a dynamic array. That fact made creating this functionality very interesting, it was my first real look under the hood of functionality Iv taken as a given for so long. I started out by defining the parameters I would need, those being a pointer to an array and a void pointer to a value.

void DfArray_Push(DfArray* array, void *value);

The most important part of the push function comes next, and that is checking if there is enough memory allocated to store the new value I want to push. I do this by checking if the length of the array is greater than or equal to the capacity of the array. If it is, then I call a separate resizing function that will allocate more memory. I will circle back to how the resize function works shortly.

  if (array->length >= array->capacity) {
    DfArray_Resize(array);
  }

If it is not the case that I need to resize the array, then using some pointer arithmetic I copy the value I passed in as a parameter into the next available space in memory within the array. Finally I increment the length.

  memcpy((char *)array->items + array->length * array->elem_size, value, array->elem_size);
  array->length++;

Here is the completed push function.

void DfArray_Push(DfArray* array, void *value) {
  if (array->length >= array->capacity) {
    DfArray_Resize(array);
  }

  memcpy((char *)array->items + array->length * array->elem_size, value, array->elem_size);
  array->length++;
}

Now lets circle back around to the resize function and how that works. As you can see above the only parameter it takes is a pointer to the array we want to resize. The logic within the function is pretty simple. The first thing it does is define variables to store our new capacity and resized array→items.

 size_t new_capacity;
  void *resized_items;

Next it check whether or not the current capacity is 0, the reason it need’s to check for this will become apparent when I go over the pop function. If the capacity is 0 then it set the new capacity to a value of 5 and allocates enough space for one item. If its not then it set’s the new capacity to double the old capacity and reallocates the elements stored in array→items into the resized_items variable with the new capacity.

 if (array->capacity == 0) {
    new_capacity = 5;
    resized_items = malloc(new_capacity * array->elem_size);
  } else {
    new_capacity = array->capacity * 2;
    resized_items = realloc(array->items, new_capacity * array->elem_size);
  }

Next it does similar error handling to what I showed before.

  if(!resized_items) {
    perror("Reallocation of IntArray->capacity failed");
    return;
  }

Finally it updates array→items and array→capacity.

  array->items = resized_items;
  array->capacity = new_capacity;

Here is the completed resize function.

void DfArray_Resize(DfArray* array) {
  size_t new_capacity;
  void *resized_items;
  if (array->capacity == 0) {
    new_capacity = 5;
    resized_items = malloc(new_capacity * array->elem_size);
  } else {
    new_capacity = array->capacity * 2;
    resized_items = realloc(array->items, new_capacity * array->elem_size);
  }

  if(!resized_items) {
    perror("Reallocation of IntArray->capacity failed");
    return;
  }

  array->items = resized_items;
  array->capacity = new_capacity;
}

Now on to the pop functionality. The pop function takes a pointer to an array as a parameter like the previous functions but unlike the rest it returns a void pointer. Coming from high level languages the idea of having to cast my returned value to the proper type seemed clunky and prone to user error but there really isnt a way around it. But I will say that the more I practiced at it and wrote tests for this function, the more natural it felt. Here is the function signature for pop.

void *DfArray_Pop(DfArray *array);

The first step in the function is to do some bounds checking, if the array is empty I do not want to attempt to grab memory from an empty array as this will cause errors.

if (array->length < 1) {
    fprintf(stderr, "Error: Array is empty, can not pop\n");
    exit(1);
  }

Next, after I confirm that I have an element to pop, I define the variable I am going to store the popped value in and allocate the proper amount of space for it. I did this by declaring a void pointer and using the stored array→elem_size to allocate the proper amount of space.

void *dest = malloc(array->elem_size);

Following the allocation of space for the element, like before I copy the memory that contains the element into the dest variable using pointer arithmetic, but instead of incrementing the length like before I decrement. Now this does not free the memory containing the element in our array, but due to the logic of the array and the way I access elements through methods, this element is no longer reachable. I will show how that logic is implemented when I go over to the get and set methods.

  memcpy(dest, (char *)array->items + (array->length - 1) * array->elem_size, array->elem_size);
  array->length--;

Once I have successfully removed the element from the array I check to see how much unused memory I have allocated for our array→items. If I am using less than or equal to half of the allocated space, I shrink the allocated space. Finally I return our popped element.

  if (array->length <= array->capacity/2 || array->length == 0) {
    DfArray_Shrink(array);
  }

  return dest;

Here is the completed pop function.

void *DfArray_Pop(DfArray *array) {
  if (array->length < 1) {
    fprintf(stderr, "Error: Array is empty, can not pop\n");
    exit(1);
  } 
  void *dest = malloc(array->elem_size);
  memcpy(dest, (char *)array->items + (array->length - 1) * array->elem_size, array->elem_size);
  array->length--;

  if (array->length <= array->capacity/2 || array->length == 0) {
    DfArray_Shrink(array);
  }

  return dest;
}

For shrinking the array the process is similar to increasing the allocated size. The first thing I do is check if the current length of the array is 0, if it is then I free array→items, set the pointer for items to NULL and set capacity to 0 then return. I do this to ensure there is no memory leakage.

 if (array->length == 0) {
    free(array->items);
    array->items = NULL;
    array->capacity = 0;
    return;
  }

The rest of the function is pretty much the same as resizing, define the variables, set the new capacity to the current length of the array, reallocate the memory, error check, and reassign the values in the array. This is the final product.

void DfArray_Shrink(DfArray* array) {
  if (array->length == 0) {
    free(array->items);
    array->items = NULL;
    array->capacity = 0;
    return;
  }

  size_t new_capacity = array->length;
  void *shrunk_items = realloc(array->items, new_capacity * array->elem_size);

  if (!shrunk_items) {
    perror("Realloc failed in dfArray_Shrink.");
  }

  array->items = shrunk_items;
  array->capacity = new_capacity;
}

That wraps up the push and pop functions. The next set of functions are very similar to push and pop, they just work with the head of the array instead of the tail, so I will keep it brief.

Shift & Unshift

Lets start with Unshift, this function adds a new element on to the front of the array similar to how the push function adds an element to the end.

It does the standard bounds checking, resizes if needed, moves all of the elements in the array to the right by one, copies the new elements into the first index and finally increments the length.

void DfArray_Unshift(DfArray* array, void *value) {
  if (array->length >= array->capacity) {
    DfArray_Resize(array);
  }

  memmove((char *)array->items + array->elem_size, array->items, array->length * array->elem_size);
  memcpy(array->items, value, array->elem_size);

  array->length++;
}

Moving on to the shift function, this removes an element from the front on an array. Again similar to pop, it does bounds checking, allocates memory for the destination variable, copies the element into the newly allocated space, shifts all elements in the array to left by one index, decrements the length and returns the element.

void *DfArray_Shift(DfArray* array) {
  if (array->length == 0) {
    fprintf(stderr, "Error: Cannot shift from an empty array\n");
    exit(1);
  }
  void *dest = malloc(array->elem_size);
  memcpy(dest, array->items, array->elem_size);
  memmove(array->items, (char *)array->items + array->elem_size, (array->length -1) * array->elem_size);
  array->length--;

  if (array->length <= array->capacity/2 || array->length == 0) {
    DfArray_Shrink(array);
  }

  return dest;
}

Get & Set

Bear with me, I know this is a long article but we are almost done! The last two functions are pretty simple. The get function returns an element from a specified index and the set function updates the element stored in a specified index.

Get, like I have shown before starts with a bounds check to ensure it’s not accessing memory where it shouldn’t. Next it defines a variable and allocates memory for that variable to store the element. Finally it copies the element from the array into the allocated space and returns the element.

void *DfArray_Get(DfArray *array, size_t index) {
  if (index >= array->length) {
    fprintf(stderr, "Error: Index %zu out of bounds (length: %zu)\n", index, array->length);
    exit(1);
  }
  void *dest = malloc(array->elem_size);
  memcpy(dest, (char *)array->items + index * array->elem_size, array->elem_size);
  return dest;
}

Set is the most simple function so far, it does a bounds check to ensure the specified index is valid then copies the value passed in as a parameter into the proper index, overwriting the element that is currently there.

void DfArray_Set(DfArray *array, size_t index, void *value) {
  if (index >= array->length) {
    fprintf(stderr, "Error: Index %zu out of bounds (length: %zu)\n", index, array->length);
    exit(1);
  }
  memcpy((char *)array->items + index * array->elem_size, value, array->elem_size);
}

Usage & Wrapping Up

If you made it this far, thank you for reading! I hope you learned something new and enjoyed doing so. If you have happened to notice a mistake or have a suggestion, please let me know! The whole point of me writing this blog is so I can learn new things, so please do not be shy about letting me know my mistakes or showing me a better way. With that I will leave you with some example usage of the dynamic array I built for Data Forge and see you in the next one!

#include <stdio.h>
#include <stdlib.h>
#include "df_array.h"

int main() {
    // Create an array for integers with an initial capacity of 5
    DfArray *array = DfArray_Create(sizeof(int), 5);

    // Push some values
    for (int i = 1; i <= 5; i++) {
        DfArray_Push(array, &i);
    }

    // Print the values
    printf("Array elements after pushing:\n");
    for (size_t i = 0; i < array->length; i++) {
        int *val = (int *)DfArray_Get(array, i);
        printf("%d ", *val);
        free(val);  // Free the returned allocated memory
    }
    printf("\n");

    // Pop a value
    int *popped = (int *)DfArray_Pop(array);
    printf("Popped value: %d\n", *popped);
    free(popped);

    // Shift a value
    int *shifted = (int *)DfArray_Shift(array);
    printf("Shifted value: %d\n", *shifted);
    free(shifted);

    // Unshift a value
    int unshiftValue = 100;
    DfArray_Unshift(array, &unshiftValue);
    printf("Array elements after unshifting 100:\n");
    for (size_t i = 0; i < array->length; i++) {
        int *val = (int *)DfArray_Get(array, i);
        printf("%d ", *val);
        free(val);
    }
    printf("\n");

    // Destroy the array
    DfArray_Destroy(array);

    return 0;
}

/*
Expected Output:

Array elements after pushing:
1 2 3 4 5 
Popped value: 5
Shifted value: 1
Array elements after unshifting 100:
100 2 3 4 
*/
0
Subscribe to my newsletter

Read articles from Andrew Archer directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Andrew Archer
Andrew Archer